Hima Lakkaraju XAI ShortCourse
Hima Lakkaraju
Schedule for Today
• 9am to 10:20am: Introduction; Overview of Inherently Interpretable Models
• 3pm to 4pm: Analyzing Model Interpretations and Explanations, and Future Research Directions
Motivation
[ Weller 2017, Lipton 2017, Doshi-Velez and Kim 2016 ]
When and Why Model Understanding?
ML is increasingly being employed in complex high-stakes settings
When and Why Model Understanding?
• High-stakes decision-making settings
• Impact on human lives/health/finances
• Settings relatively less well studied, models not extensively validated
Example: Why Model Understanding?
[Figure: Input → Predictive Model → Model Understanding: "This model is relying on incorrect features to make this prediction!! Let me fix the model."]
Model understanding facilitates debugging.
[ Larson et. al. 2016 ]
[Figure: recidivism prediction example highlighting the features "Crimes" and "Gender"]
Example: Why Model Understanding?
[Figure: Loan Applicant Details → Model Understanding: "Increase salary by 50K + pay credit card bills on time for next 3 months to get a loan." Applicant: "I have some means for recourse. Let me go and work on my promotion and pay my bills on time."]
Model understanding helps provide recourse to individuals who are adversely affected by model predictions.
Example: Why Model Understanding?
[Figure: Patient Data (e.g., "25, Female, Cold", "32, Male, No", "31, Male, Cough") → Predictive Model → Predictions (Healthy/Sick). Model understanding reveals the learned logic: "If gender = female, if ID_num > 200, then sick. If gender = male, if cold = true and cough = true, then sick." Doctor: "This model is using irrelevant features when predicting on the female subpopulation. I should not trust its predictions for that group."]
Model understanding helps assess if and when to trust model predictions when making decisions.
Example: Why Model Understanding?
[Figure: same setup as above. Regulator: "This model is using irrelevant features when predicting on the female subpopulation. This cannot be approved!"]
Model understanding allows us to vet models to determine if they are suitable for deployment in the real world.
Summary: Why Model Understanding?
[Figure: table mapping utilities (debugging, recourse, trust, vetting) to stakeholders]
[ Letham and Rudin 2015; Lakkaraju et. al. 2016 ]
[ Ribeiro et. al. 2016, Ribeiro et al. 2018; Lakkaraju et. al. 2019 ]
[ Cireşan et. al. 2012, Caruana et. al. 2006, Frosst et. al. 2017, Stewart 2020 ]
Inherently Interpretable Models vs. Post hoc Explanations
Agenda
❑ Inherently Interpretable Models
Inherently Interpretable Models
Bayesian Rule Lists: Generative Model
[Lakkaraju et. al. 2016]
IDS: Objective Function
IDS: Optimization Procedure
• The problem is a non-normal, non-monotone, submodular
optimization problem
Inherently Interpretable Models
[Ustun and Rudin, 2016]
• Recidivism
• Loan Default
Objective function to learn risk scores
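A sketch of the risk-score learning objective, in the style of Ustun and Rudin (logistic loss plus an ℓ0 sparsity penalty, with coefficients restricted to small integers):
\[
\min_{\lambda \in \mathcal{L}} \; \frac{1}{n} \sum_{i=1}^{n} \log\!\big(1 + \exp(-y_i \,\lambda^\top x_i)\big) \; + \; C_0 \,\lVert \lambda \rVert_0, \qquad \mathcal{L} \subset \mathbb{Z}^{d+1}
\]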
The above turns out to be a mixed integer program, and is optimized using a cutting plane method and a branch-and-bound technique.
Inherently Interpretable Models
Formulation and Characteristics of GAMs
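In its standard form, a GAM models a transformed expectation of the response as a sum of per-feature shape functions:
\[
g\big(\mathbb{E}[y]\big) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_p(x_p)
\]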
where each f_i is a shape function.
GAMs and GA²Ms
• While GAMs model first-order terms, GA²Ms model second-order feature interactions as well.
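Correspondingly, a GA²M adds pairwise interaction terms over a selected set of feature pairs:
\[
g\big(\mathbb{E}[y]\big) = \beta_0 + \sum_i f_i(x_i) + \sum_{i \neq j} f_{ij}(x_i, x_j)
\]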
GAMs and GA²Ms
• Learning:
• Represent each component as a spline (a fitting sketch follows below)
• Least squares formulation; optimization problem to balance smoothness and empirical error
• GA²Ms: Build a GAM first, and then detect and rank all possible pairs of interactions in the residual
• Choose top k pairs
• k determined by CV
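A compact backfitting sketch (illustrative only; uses low-degree polynomial shape functions as a stand-in for the penalized splines real GAM implementations use):

```python
# Backfitting a GAM: repeatedly fit each shape function to the
# partial residual left over by the other features' shape functions.
import numpy as np

def fit_gam(X, y, n_iters=20, degree=3):
    n, d = X.shape
    intercept = y.mean()
    coefs = [np.zeros(degree + 1) for _ in range(d)]

    def shape(j, x):
        return np.polyval(coefs[j], x)

    for _ in range(n_iters):
        for j in range(d):
            # Partial residual: remove all other features' contributions.
            others = sum(shape(k, X[:, k]) for k in range(d) if k != j)
            r = y - intercept - others
            # Refit this feature's shape function to the partial residual.
            coefs[j] = np.polyfit(X[:, j], r, degree)
    return intercept, coefs

# Usage: predictions are the intercept plus the sum of shape functions.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=500)
b0, fs = fit_gam(X, y)
y_hat = b0 + sum(np.polyval(fs[j], X[:, j]) for j in range(3))
```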
Inherently Interpretable Models
[Figure: encoder-decoder translation of "I am Bob" → "Je suis Bob"; encoder states h1–h3, decoder states s1–s3, with per-step context vectors c1–c3 in the attention variant]
[Bahdanau et. al. 2016; Xu et. al. 2015]
Attention Layers in Deep Learning Models
• Context vector corresponding to si can be written as follows:
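In the standard Bahdanau-style formulation, the context vector is an attention-weighted sum of the encoder states:
\[
c_i = \sum_j \alpha_{ij} h_j, \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_k \exp(e_{ik})}, \qquad e_{ij} = a(s_{i-1}, h_j)
\]
where a is a learned alignment (scoring) function.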
What is an Explanation?
Definition: Interpretable description of the model behavior
[ Lipton 2016 ]
[Figure: Classifier → explanation → User]
Local Explanations vs. Global Explanations
• Local: help unearth biases in the local neighborhood of a given instance; help vet if individual predictions are being made for the right reasons
• Global: help shed light on big picture biases affecting larger subgroups; help vet if the model, at a high level, is suitable for deployment
Approaches for Post hoc Explainability
Local Explanations:
• Feature Importances
• Rule Based
• Saliency Maps
• Prototypes/Example Based
• Counterfactuals
Global Explanations:
• Collection of Local Explanations
• Representation Based
• Model Distillation
• Summaries of Counterfactuals
LIME: Local Interpretable Model-Agnostic Explanations
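A minimal sketch of the LIME recipe (hypothetical code, not the authors' implementation, which lives in the `lime` package; assumes a tabular binary classifier exposing `predict_proba`):

```python
# LIME-style local surrogate: perturb, weight by proximity, fit linear model.
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(predict_proba, x, n_samples=1000, kernel_width=0.75):
    d = len(x)
    # 1. Perturb: sample points in the neighborhood of x.
    Z = x + np.random.normal(scale=0.5, size=(n_samples, d))
    # 2. Query the black box at each perturbation.
    y = predict_proba(Z)[:, 1]
    # 3. Weight perturbations by proximity to x (exponential kernel).
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Fit a weighted linear surrogate; its coefficients are the
    #    local feature importances.
    surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
    return surrogate.coef_
```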
[ Ribeiro et al. 2016 ]
Only 1 mistake!
[ Ribeiro et al. 2016 ]
[ Ribeiro et al. 2018 ]
Anchors
Salary Prediction
Saliency Map Overview
[Figure: Input image → Model → Prediction: "Junco Bird"]
What parts of the input are most relevant for the model's prediction: 'Junco Bird'?
Modern DNN Setting
[Figure: Input image → Model → Prediction: "Junco Bird"]
Input-Gradient
Input-gradient: the gradient of the logit with respect to the input, ∂(logit)/∂(input) — it has the same dimension as the input.
Baehrens et. al. 2010; Simonyan et. al. 2014.
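A small sketch of computing an input-gradient saliency map in PyTorch (illustrative; `model` is assumed to be any image classifier with the shape shown):

```python
# Input-gradient saliency: gradient of the predicted-class logit
# with respect to the input pixels.
import torch

def input_gradient(model, x):
    """x: input image tensor of shape (1, C, H, W)."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    cls = logits.argmax(dim=1).item()     # predicted class
    logits[0, cls].backward()             # d(logit) / d(input)
    return x.grad[0]                      # same shape as the input image
```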
Input-Gradient
Visualize the input-gradient as a heatmap over the input.
Input-Gradient: Challenges
● Visually noisy & difficult to interpret.
● 'Gradient saturation': the logit can be locally flat in the input, so gradients can be near zero even for important features.
Shrikumar et. al. 2017.
SmoothGrad
Average the input-gradients of several 'noisy' copies of the input (Gaussian noise added to the input).
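A sketch building on the `input_gradient` helper above (illustrative; the noise scale and sample count are tunable assumptions):

```python
# SmoothGrad: average input-gradients over noisy copies of the input.
def smoothgrad(model, x, n=50, sigma=0.15):
    grads = [input_gradient(model, x + sigma * torch.randn_like(x))
             for _ in range(n)]
    return torch.stack(grads).mean(dim=0)
```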
Gradient-Input
Element-wise product of the input-gradient and the input: x ⊙ ∂(logit)/∂(input).
[Figure: baseline input, input, and input-gradient heatmaps for "Junco Bird"]
Prototypes/Example Based Post hoc Explanations
Training Point Ranking via Influence Functions
[Figure: Input image → Model → Prediction: "Junco Bird"]
Which training data points have the most 'influence' on the test loss?
Influence Function: a classic tool used in robust statistics for assessing the effect of a sample on regression parameters (Cook & Weisberg, 1980).
Instead of refitting the model for every data point, Cook's distance provides an analytical alternative.
Koh & Liang (2017) extend the 'Cook's distance' insight to the modern machine learning setting.
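Their up-weighting influence of a training point z on the loss at a test point z_test:
\[
\mathcal{I}(z, z_{\text{test}}) = -\nabla_\theta L(z_{\text{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1} \nabla_\theta L(z, \hat{\theta}), \qquad H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat{\theta})
\]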
Applications:
Challenges and Other Approaches
Influence function challenges:
1. Scalability: computing Hessian-vector products can be tedious in practice.
Alternatives:
Activation Maximization
These approaches identify examples, synthetic or natural, that
strongly activate a function (neuron) of interest.
Implementation Flavors:
Counterfactual Explanations
What features need to be changed, and by how much, to flip a model's prediction (i.e., to reverse an unfavorable outcome)?
Counterfactual Explanations
[Figure: Loan Application → Predictive Model f(x) → "Deny Loan" → Counterfactual Generation Algorithm → recourse for the applicant]
Recourse: Increase your salary by 5K & pay your credit card bills on time for the next 3 months.
Generating Counterfactual Explanations: Intuition
[ Wachter et. al., 2018 ]
[Figure: original instance and a nearby counterfactual on the other side of the decision boundary]
Take 1: Minimum Distance Counterfactuals
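Wachter et. al. cast this as minimum-distance counterfactual search:
\[
\arg\min_{x'} \max_{\lambda} \; \lambda \big(f(x') - y'\big)^2 + d(x, x')
\]
where y' is the desired outcome and d(·,·) keeps the counterfactual x' close to the original instance x.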
[ Ustun et. al., 2019 ]
• Ustun et. al. only consider the case where the model is a linear classifier
• The objective is formulated as an integer program (IP) and optimized using CPLEX
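A common statement of the underlying recourse problem (a sketch; cost definitions vary):
\[
\min_{a \in A(x)} \; \text{cost}(a; x) \quad \text{subject to} \quad f(x + a) = +1
\]
where A(x) is the set of actionable feature changes available to individual x.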
[ Mahajan et. al., 2019, Karimi et. al. 2020 ]
[Figure: the model f(x), "After 1 year"]
[ Verma et. al., 2020, Pawelczyk et. al., 2020]
Global Explanations
● Help detect big picture model biases persistent across larger subgroups of the population
○ Impractical to manually inspect local explanations of several instances to ascertain big picture biases!
Global Explanation as a Collection of Local Explanations
• Representative: should summarize the model's global behavior
• Diverse: should not be redundant in their descriptions
SP-LIME uses submodular optimization and greedily picks k explanations (sketched below). Model agnostic.
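A sketch of the submodular pick step (illustrative; `W` and the helper names are mine): given per-instance LIME importance vectors, greedily choose k instances that maximize importance-weighted feature coverage.

```python
# SP-LIME-style submodular pick (sketch).
import numpy as np

def submodular_pick(W, k):
    """W: (n_instances, n_features) matrix of LIME importances."""
    importance = np.sqrt(np.abs(W).sum(axis=0))   # global feature importance
    chosen = []
    covered = np.zeros(W.shape[1], dtype=bool)
    for _ in range(k):
        # Marginal coverage gain of adding each candidate instance.
        gains = [((~covered) & (np.abs(W[i]) > 0)) @ importance
                 for i in range(W.shape[0])]
        best = int(np.argmax(gains))
        chosen.append(best)
        covered |= np.abs(W[best]) > 0
    return chosen
```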
[ Ribeiro et al. 2018 ]
Representation Based Approaches
• Derive model understanding by analyzing intermediate representations of a DNN.
Representation Based Explanations
[Figure: prediction "Zebra (0.97)"; concept examples vs. random examples]
Model Distillation for Generating Global Explanations
[Figure: Data → Predictive Model → Model Predictions; the data and predicted labels are fed to an Explainer]
Explainer: a simpler, interpretable model which is optimized to mimic the model predictions.
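A minimal distillation sketch (illustrative; the black box and tree depth are placeholders): fit an interpretable student to the black box's predicted labels rather than the ground-truth labels.

```python
# Global explanation by model distillation (sketch).
from sklearn.tree import DecisionTreeClassifier, export_text

def distill(black_box, X, max_depth=3):
    y_bb = black_box.predict(X)            # mimic the model, not the labels
    student = DecisionTreeClassifier(max_depth=max_depth).fit(X, y_bb)
    fidelity = student.score(X, y_bb)      # how well the student mimics
    return student, fidelity

# student, fidelity = distill(model, X_train)
# print(export_text(student))             # human-readable decision logic
```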
[Tan et. al., 2019]
[Figure: Data → Black Box Model → Model Predictions; a model-agnostic Explainer is fit to the data and the predicted labels]
\[
\hat{y} = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_p(x_p)
\]
Fit this model to the predictions of the black box to obtain the shape functions f_i.
[ Bastani et. al., 2019 ]
[Figure: Data → Black Box Model → Model Predictions; an Explainer is fit to mimic the predictions]
[ Lakkaraju et. al., 2019 ]
[Figure: Data → Black Box Model → Model Predictions; a model-agnostic Explainer produces interpretable decision logic]
[ Lakkaraju et. al., 2019 ]
Desiderata and how they are operationalized:
• Unambiguity: no contradicting explanations → minimize the number of duplicate rules applicable to each instance
• Simplicity: users should be able to look at the explanation and reason about model behavior → minimize the number of conditions in rules; constraints on the number of rules & subgroups
• Customizability: users should be able to understand model behavior across various subgroups of interest → outer rules should only comprise features of user interest (candidate set restricted)
[ Rawal and Lakkaraju, 2020 ]
Counterfactual Explanations
[Figure: Predictive Model f(x) → denied loans → recourses, presented to a decision maker or regulatory authority]
[ Rawal and Lakkaraju, 2020 ]
Recourse Rules
[ Rawal and Lakkaraju, 2020 ]
• Customizability: stakeholders should be able to understand model behavior across various subgroups of interest → outer rules should only comprise features of stakeholder interest (candidate set restricted)
Breakout Groups
• What concepts/ideas/approaches from our morning discussion stood out to you?
Evaluating Model Interpretations/Explanations
Evaluating Interpretability
• Functionally-grounded evaluation: quantitative metrics, e.g., number of rules or prototypes → lower is better!
Interface for Objective Questions
Interface for Descriptive Questions
User Study Results
• Descriptive task:
○ Human Accuracy: 0.81 (Our Approach) vs. 0.17 (Bayesian Decision Lists)
○ Avg. Time Spent (secs.): 113.4 vs. 396.86
○ Avg. # of Words: 31.11 vs. 120.57
• Objective task:
○ Human Accuracy: 0.97 (Our Approach) vs. 0.82 (Bayesian Decision Lists)
○ Avg. Time Spent (secs.): 28.18 vs. 36.34
• Would alternative attention weights yield different predictions? No!!
[Agarwal et. al., 2022]
Evaluating Faithfulness of Post hoc Explanations – Ground Truth
Spearman rank correlation coefficient computed over features of interest
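A tiny illustration of the metric using scipy (the values below are made up):

```python
# Rank agreement between explanation and ground-truth importances.
from scipy.stats import spearmanr

explanation = [0.40, 0.10, 0.30, 0.05]   # method's feature importances
ground_truth = [0.45, 0.05, 0.35, 0.02]  # e.g., known model coefficients
rho, _ = spearmanr(explanation, ground_truth)
print(f"Spearman rank correlation: {rho:.2f}")
```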
Evaluating Faithfulness of Post hoc Explanations – Explanations as Models
Evaluating Faithfulness of Post hoc Explanations
[ Qi, Khorram, Fuxin, 2020 ]
[Figure: deletion curve — prediction probability vs. % of pixels deleted, most-salient first]
[Figure: insertion curve — prediction probability vs. % of pixels inserted, most-salient first]
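A sketch of the deletion metric (illustrative; assumes the saliency map and image flatten to the same length): delete the most-salient pixels first and track the drop in the predicted-class probability.

```python
# Deletion metric sketch: faithful saliency maps should cause a fast drop.
import numpy as np

def deletion_curve(predict_prob, image, saliency, steps=20):
    """predict_prob: callable image -> probability of the predicted class."""
    order = np.argsort(saliency.ravel())[::-1]    # most salient first
    x = image.copy().ravel()
    curve = [predict_prob(x.reshape(image.shape))]
    chunk = len(order) // steps
    for s in range(steps):
        x[order[s * chunk:(s + 1) * chunk]] = 0.0  # delete pixels
        curve.append(predict_prob(x.reshape(image.shape)))
    return np.array(curve)  # area under this curve: lower = more faithful
```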
[Alvarez-Melis, 2018; Agarwal et. al., 2022]
Local Lipschitz estimate of explanation stability over a neighborhood of the input:
\[
\hat{L}(x) = \max_{x' \in B_\epsilon(x)} \frac{\lVert E(x) - E(x') \rVert_2}{\lVert x - x' \rVert_2}
\]
[ Ribeiro et al. 2018, Hase and Bansal 2020 ]
[Figure: Data → Classifier → predictions & explanations are shown to the user]
[ Poursabzi-Sangdeh et al. 2018 ]
[ Lai and Tan, 2019 ]
Human-AI Collaboration
• Are Explanations Useful for Making Decisions?
• For tasks where the algorithms are not reliable by themselves
[ Lai and Tan, 2019 ]
Human-AI Collaboration
• Deception Detection: Identify fake reviews online
• Are humans better detectors with explanations?
https://machineintheloop.com/deception/
Can we improve the accuracy of decisions using feature attribution-based explanations?
[Figure: two patients, both predicted "At Risk (0.91)". One explanation highlights clinically relevant features (ESR, Family Risk, Chronic Health Conditions); the other highlights irrelevant features (Appointment time, Appointment day, Zip code, Doctor ID > 150).]
Empirically Analyzing Interpretations/Explanations
• A lot of recent focus on analyzing the behavior of post hoc explanation methods.
[Figure: saliency maps from Gradient ⊙ Input, Guided Backprop, and Guided GradCAM]
Adebayo, Julius, et al. "Sanity checks for saliency maps." NeurIPS, 2018.
Limitations: Faithfulness
Do these saliency maps faithfully reflect the underlying model? No!!
[Figure: maps from Gradient ⊙ Input, Guided Backprop, and Guided GradCAM]
Adebayo, Julius, et al. "Sanity checks for saliency maps." NeurIPS, 2018.
Limitations: Stability
Are post-hoc explanations unstable w.r.t. small non-adversarial input perturbations?
[Figure: explanations vary with the input and with hyperparameters]
Alvarez-Melis, David et al. "On the robustness of interpretability methods." WHI, ICML, 2018.
[Slack et. al., 2020]
When you repeatedly run LIME on the same instance, you get different explanations (blue region).
Post-hoc Explanations are Fragile
Post-hoc explanations can be easily manipulated.
Building Adversarial Classifiers
• Input: the adversary provides us with the biased classifier f and an input dataset X sampled from the real-world input distribution Xdist
Building Adversarial Classifiers
• Adversarial classifier e can be defined as:
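A sketch of the construction from Slack et. al.: serve the biased model on real inputs, and an innocuous model ψ on inputs detected as explanation-style perturbations:
\[
e(x) = \begin{cases} f(x), & x \text{ detected as drawn from } X_{dist} \\ \psi(x), & \text{otherwise} \end{cases}
\]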
Limitations: Stability
Post-hoc explanations can be unstable to small, non-adversarial perturbations of the input.
Sensitivity to Hyperparameters
[Kaur et. al., 2020]
Utility: Explanations for Debugging
In a housing price prediction task, Amazon Mechanical Turk workers were unable to use linear model coefficients to diagnose model mistakes.
Conflicting Evidence on Utility of Explanations
● Mixed evidence:
● Simulation and benchmark studies show that explanations are useful for debugging;
● however, recent user studies show limited utility in practice.
• Characterizing disagreement:
• Top features are different
• Ordering among top features is different
• Direction of top feature contributions is different
• Relative ordering of features of interest is different
How do Practitioners Resolve Disagreements?
Empirical Analysis: Summary
● Faithfulness/Fidelity
■ Some explanation methods do not ‘reflect’ the underlying model.
● Fragility
■ Post-hoc explanations can be easily manipulated.
● Stability
■ Slight changes to inputs can cause large changes in explanations.
● Useful in practice?
Theoretically Analyzing Interpretable Models
▪ Two main classes of theoretical results:
• Local error of the surrogate model is bounded away from zero with high probability
[Agarwal et. al., 2020]
• In expectation, the resulting explanations are provably robust according to the notion of Lipschitz continuity
• Finite sample complexity bounds for the number of perturbed samples required for SmoothGrad and C-LIME to converge to their expected output
[Han et. al., 2022]
• But…
Function Approximation Perspective to Characterizing Post hoc Explanation Methods
Future of Model Understanding
Methods for More Reliable Model Understanding
Post hoc Explanations Beyond Classification
Alvarez-Melis, 2018
Slack et. al., 2020
Challenges with LIME: Consistency
Challenges with LIME: Scalability
• Querying complex models (e.g., Inception Network, ResNet, AlexNet) repeatedly for labels can be computationally prohibitive
Explanations with Guarantees: BayesLIME and BayesSHAP
• Intuition: instead of point estimates of feature importances, define these as distributions
BayesLIME and BayesSHAP
• Construct a Bayesian locally weighted regression that can accommodate LIME/SHAP weighting functions
[Figure: black box predictions on feature perturbations, combined with a weighting function, yield distributions over feature importances]
Estimating the Required Number of Perturbations
Improving Efficiency: Focused Sampling
• Instead of sampling perturbations randomly and querying the black box, choose the points the learning algorithm is most uncertain about and only query their labels from the black box (sketched below).
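A sketch of the focused-sampling loop (illustrative; uses a simple Bayesian ridge surrogate as a stand-in for the paper's Bayesian locally weighted regression):

```python
# Focused sampling: label only the most-uncertain perturbations.
import numpy as np
from sklearn.linear_model import BayesianRidge

def focused_sampling(black_box, seed_Z, candidate_Z, rounds=10, batch=5):
    """black_box: callable row -> scalar prediction; *_Z: (m, d) arrays."""
    Z = [np.asarray(z) for z in seed_Z]
    y = [black_box(z) for z in Z]
    candidates = np.asarray(candidate_Z)
    surrogate = BayesianRidge()
    for _ in range(rounds):
        surrogate.fit(np.array(Z), np.array(y))
        _, std = surrogate.predict(candidates, return_std=True)
        picks = np.argsort(std)[-batch:]          # most-uncertain candidates
        for i in picks:
            Z.append(candidates[i])               # query the black box
            y.append(black_box(candidates[i]))    # only at these points
        candidates = np.delete(candidates, picks, axis=0)
    return surrogate                              # local surrogate explainer
```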
Other Questions
Future of Model Understanding
Methods for More Reliable Model Understanding
Post hoc Explanations Beyond Classification
• Can we characterize the conditions under which each post hoc explanation method (un)successfully captures the behavior of the underlying model?
• Given the properties of the underlying model and the data distribution, can we theoretically determine which explanation method should be employed?
Future of Model Understanding
Methods for More Reliable Model Understanding
Post hoc Explanations Beyond Classification
• There is even less work on the empirical analysis of the correctness/utility of the interpretations generated by inherently interpretable models. For instance, are the prototypes generated by adding prototype layers correct/meaningful? Can they be leveraged in any real-world applications? What about attention weights?
Future of Model Understanding
Methods for More Reliable Model Understanding
Post hoc Explanations Beyond Classification
• Are there any inherent trade-offs between (certain kinds of) model interpretability and model robustness? Or do these aspects reinforce each other?
Future of Model Understanding
Methods for More Reliable Model Understanding
Post hoc Explanations Beyond Classification
• Conducting more empirical evaluations and user studies to determine how interpretations and explanations can complement statistical notions of fairness in identifying racial/gender biases
• How does the (statistical) fairness of inherently interpretable models compare with that of vanilla models? Are there any inherent trade-offs between (certain kinds of) model interpretability and model fairness? Or do these aspects reinforce each other?
Some Parting Thoughts…
● There has been renewed interest in model understanding over the past half decade, thanks to ML models being deployed in healthcare and other high-stakes settings
● You can approach the field of XAI from diverse perspectives: theory, algorithms, HCI, or interdisciplinary research – there is room for everyone! ☺
Thank You!
• Acknowledgements: Special thanks to Julius Adebayo, Chirag Agarwal, Shalmali Joshi, and Sameer Singh for co-developing and co-presenting sub-parts of this tutorial at NeurIPS, AAAI, and FAccT conferences.