The Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24)
independently quantify uncertainties for the predicted explanations and their corresponding final predictions. Nevertheless, we argue that within self-explaining neural networks, the high-level interpretation space is informative due to its powerful inductive bias in generating high-level basis concepts for interpretable representations. Imagine a scenario in which we engage in the binary classification task of distinguishing between zebras and elephants, and we hold a strong 95% confidence in identifying the presence of the "Stripes" concept within a test image sample. In this context, we can assert that the model exhibits high confidence in assigning the "Zebra" label to this particular test sample. This also aligns with existing class-wise explanations (Li et al. 2021; Zhao et al. 2021), which focus on uncovering the informative patterns that influence the model's decision-making process when classifying instances into a particular class category.

Our goal in this paper is to offer distribution-free uncertainty quantification for self-explaining networks via two indispensable and interconnected aspects: establishing a reliable uncertainty measure for the generated explanations, and effectively transferring the explanation prediction sets to the associated predicted class outcomes. Note that the recent work of (Slack et al. 2021) focuses exclusively on post-hoc explanations and relies on certain assumptions regarding the properties of these explanations, such as assuming a Gaussian distribution of explanation errors. On the other hand, (Kim et al. 2023) mainly presents probabilities and uncertainties for individual concepts or classes, and does not provide prediction sets with uncertainty, either for the concept layer or for the prediction layer. Therefore, in addition to providing distribution-free uncertainty quantification for the generated basis explanations, it is also essential to study how to transfer the constructed prediction sets in the interpretation space to the final prediction decision space.

To address the above challenges, in this paper, we design a novel uncertainty modeling framework for self-explaining neural networks (unSENN), which can make valid and reliable predictions or estimates with quantifiable uncertainty, [...] attacks. We also perform theoretical analysis for the final prediction sets. Extensive experimental results verify the effectiveness of our proposed method.

Problem Formulation

Let us consider a general multi-class classification problem with a total of M > 2 possible labels. For an input x ∈ R^d, its true label is represented as y ∈ R. Note that self-explaining networks generate high-level concepts in the interpretation layer prior to the output layer. In Definition 1, we give a general definition of a self-explaining model (Alvarez Melis and Jaakkola 2018; Elbaghdadi et al. 2020).

Definition 1 (Self-explaining Models). Let x ∈ X ⊂ R^d and Y ⊂ R^M be the input and output spaces. We say that f : X → Y is a self-explaining prediction model if it has the following form

f(x) = g(θ_1(x)h_1(x), · · · , θ_C(x)h_C(x)),  (1)

where the prediction head g depends on its arguments, the set {(θ_c(x), h_c(x))}_{c=1}^{C} consists of basis concepts and their influence scores, and C is small compared to d.

Example 1: Concept bottleneck models (Koh et al. 2020). Concept bottleneck models (CBMs) are interpretable neural networks that first predict labels for human-interpretable concepts relevant to the prediction task, and then predict the final label based on the concept label predictions. We denote raw input features by x, the labels by y, and the pre-defined true concepts by c ∈ R^C. Given the observed training samples D^tra = {(x_i, c_i, y_i)}_{i=1}^{N^tra}, a CBM (Koh et al. 2020) can be trained using the concept and class prediction losses via two distinct approaches: independent and joint.

Example 2: Prototype-based self-explaining networks (Alvarez Melis and Jaakkola 2018). Different from CBMs (Koh et al. 2020), prototype-based self-explaining networks usually use an autoencoder to learn a set of prototype-based concepts directly from the training data (i.e., D^tra = [...])
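To make Definition 1 concrete, the forward pass in Eqn. (1) can be sketched as a small NumPy toy. The linear maps `W_h` and `W_g` and the constant influence scores are illustrative placeholders (assumptions for this sketch), not the learned networks used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, C, M = 16, 5, 3              # input dim, number of concepts, number of classes

# Illustrative stand-ins for the learned components (assumptions, not the paper's nets):
W_h = rng.normal(size=(C, d))   # concept scorer h : R^d -> R^C
W_g = rng.normal(size=(M, C))   # prediction head g : R^C -> R^M

def h(x):
    return W_h @ x              # concept scores h_1(x), ..., h_C(x)

def theta(x):
    return np.ones(C)           # influence scores theta_c(x) (constant here)

def f(x):
    # Eqn. (1): f(x) = g(theta_1(x) h_1(x), ..., theta_C(x) h_C(x))
    return W_g @ (theta(x) * h(x))

x = rng.normal(size=d)
logits = f(x)                   # one logit per class
assert logits.shape == (M,)
```

The point of the structure is that the prediction head only sees the C interpretable concept activations, so uncertainty in the concept layer can later be propagated to the label layer.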
... that concept bottleneck models incorporate pre-defined concepts into a supervised learning procedure. We denote the available dataset as D = {(x_i, c_i, y_i)}_{i=1}^{N}, where x_i ∈ R^d, c_i = [c_{i,1}, · · · , c_{i,C}] ∈ R^C, and y_i ∈ [M]. These samples are drawn exchangeably from some unknown distribution P_XCY. To train a self-explaining network, we first split the available dataset D into a training set D^tra and a calibration set D^cal, where D^tra ∩ D^cal = ∅ and D^tra ∪ D^cal = D. Given the training set D^tra associated with the supervised concepts, we can train a self-explaining network f = g ◦ h, where h : R^d → R^C maps a raw sample x into the concept space and g : R^C → R^M maps concepts into a final class prediction.

Given the underlying well-trained self-explaining model f = g ◦ h, our goal is to provide validity guarantees on its explanation and final prediction outcomes in a post-hoc way. Note that for each x_i, its true relevant concepts are represented by a concept vector c_i = [c_{i,1}, · · · , c_{i,j}, · · · , c_{i,C}] ∈ {0, 1}^C, with c_{i,j} = 1 indicating that x_i is associated with the j-th concept. We also use C_cpt(x_i) = {j | c_{i,j} = 1} to represent the set of true concepts for x_i. For x_i, we can use h : R^d → R^C to obtain h(x_i) = [h_1(x_i), · · · , h_j(x_i), · · · , h_C(x_i)], where h_j(x_i) is the prediction score of x_i with regard to the j-th concept. We denote [h_[1](x_i), · · · , h_[j](x_i), · · · , h_[C](x_i)] as the sorted values of h(x_i) in descending order, i.e., h_[1](x_i) is the largest (top-1) score, h_[2](x_i) is the second largest (top-2) score, and so on. Furthermore, [j] corresponds to the concept index of the top-j concept prediction score, i.e., j′ = [j] if h_{j′}(x_i) = h_[j](x_i). In practice, researchers are usually interested in identifying the top-K important concepts to help users understand the key factors driving the model's predictions (Agarwal et al. 2022; Brunet, Anderson, and Zemel 2022; Yeh et al. 2020; Huai et al. 2022; Rajagopal et al. 2021; Hitzler and Sarker 2022). Here, we can easily convert a general concept predictor h into a top-K concept classifier by returning the set of concepts corresponding to the top-K prediction scores from h. For x_i, the top-K concept prediction function returns the set C̃_cpt(x_i) = {[1], · · · , [K]} for 1 ≤ K < C. Here, we assume the concept prediction scores are calibrated, i.e., taking values in the range [0, 1]. For Ψ ∈ R, we can use simple transforms such as σ(Ψ) = 1/(1 + e^{−Ψ}) to map it to the range [0, 1] without changing the ranking. Then, for x_i, we obtain the below calibrated concept importance scores

h̃(x_i) = [1/(1 + exp(−h_1(x_i))), · · · , 1/(1 + exp(−h_C(x_i)))] = [σ(h_1(x_i)), · · · , σ(h_C(x_i))],  (2)

where σ(h_j(x_i)) ∈ [0, 1] is the calibrated importance score of x_i with regard to the j-th concept. Note that x_i is associated with a concept subset C_cpt(x_i), which can be represented by a vector h*(x_i) = [h*_1(x_i), · · · , h*_j(x_i), · · · , h*_C(x_i)], where h*_j(x_i) = 1 if and only if concept j is associated with x_i, and 0 otherwise.

In order to quantify how "strange" the generated concept explanations are for a given sample x_i, we utilize the underlying well-trained self-explaining model f to construct the following non-conformity measure

s(x_i, h*(x_i), h̃(x_i)) = Σ_{j=1}^{K} (1 − 1/(1 + exp(−h_[j](x_i)))) + λ_2 · Σ_{j=K+1}^{C} (1/(1 + exp(−h_[j](x_i)))) = Σ_{j=1}^{K} |h*_[j](x_i) − h̃_[j](x_i)| + λ_2 · Σ_{j=K+1}^{C} |h*_[j](x_i) − h̃_[j](x_i)|,  (3)

where λ_2 is a pre-defined trade-off parameter, and [j] corresponds to the concept index of the top-j concept prediction score of h̃(x_i). The above non-conformity measure returns small values when the concept prediction outputs of the true relevant concepts are high and the concept outputs of the non-relevant concepts are low. Note that smaller conformal scores correspond to more model confidence (Angelopoulos and Bates 2021; Teng, Tan, and Yuan 2021). In particular, when λ_2 = 0, we only consider the top-ranked concepts.

Based on the above constructed non-conformity measure, for each calibration sample (x_i, c_i, y_i) ∈ D^cal, we apply s(x_i, h*(x_i), h̃(x_i)) to get N^cal non-conformity scores {s_i}_{i=1}^{N^cal}, where N^cal is the number of calibration samples. Then, we select a miscoverage rate ε ∈ (0, 1) and compute Q_{1−ε} as the ⌈(N^cal + 1)(1 − ε)⌉/N^cal quantile of the non-conformity scores {s_i}_{i=1}^{N^cal}, where ⌈·⌉ is the ceiling function. Formally, Q_{1−ε} is defined as

Q_{1−ε} = inf{Q : |{i : s(x_i, h*(x_i), h̃(x_i)) ≤ Q}| / N^cal ≥ ⌈(N^cal + 1)(1 − ε)⌉ / N^cal}.  (4)

Based on the above, given Q_{1−ε} and a desired coverage level 1 − ε ∈ (0, 1), for a test sample x^test (where x^test is known but the true top-K concepts are not), we can get the explanation prediction set Γ^ε_cpt(x^test) for x^test with 1 − ε coverage guarantee. Algorithm 1 gives the procedure to calculate the concept prediction sets.

Definition 2 (Data Exchangeability). Consider variables z_1, · · · , z_N. The variables are exchangeable if for every permutation τ of the integers 1, · · · , N, the variables w_1, · · · , w_N, where w_i = z_{τ(i)}, have the same joint probability distribution as z_1, · · · , z_N.

Theorem 1. Suppose that the calibration samples (D^cal = {(x_i, c_i, y_i)}_{i=1}^{N^cal}) and the given test sample x^test are exchangeable. Then, if we calculate the quantile value Q_{1−ε} and construct Γ^ε_cpt(x^test) as indicated above, for the above non-conformity score function s and any ε ∈ (0, 1), the derived Γ^ε_cpt(x^test) satisfies

P(C_cpt(x^test) ∈ Γ^ε_cpt(x^test)) ≥ 1 − ε,  (5)

where C_cpt(x^test) is the true relevant concept set for the given test sample x^test, and ε is a desired miscoverage rate.
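The calibration computation in Eqns. (2)-(4) can be sketched in NumPy as follows. The toy data generation at the bottom is an illustrative assumption (random binary true-concept vectors with roughly aligned scores), not the paper's datasets:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nonconformity(h_x, h_star, K, lam2=1.0):
    """Eqn. (3): |h*_[j] - h~_[j]| summed over the top-K ranked concepts,
    plus lam2 times the same sum over the remaining C - K concepts."""
    order = np.argsort(-h_x)                   # [1], ..., [C]: concepts by descending score
    diffs = np.abs(h_star[order] - sigmoid(h_x[order]))
    return diffs[:K].sum() + lam2 * diffs[K:].sum()

def conformal_quantile(scores, eps):
    """Eqn. (4): the ceil((N+1)(1-eps))-th smallest calibration score
    (infinity when eps < 1/(N+1))."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - eps)))
    return np.inf if k > n else np.sort(scores)[k - 1]

# Toy calibration pass: binary true-concept vectors h* and noisy scores h(x).
rng = np.random.default_rng(1)
C, K, eps = 4, 2, 0.1
cal_scores = np.array([
    nonconformity(3 * h_star + rng.normal(scale=0.5, size=C), h_star, K)
    for h_star in (rng.random((200, C)) < 0.5).astype(float)
])
q = conformal_quantile(cal_scores, eps)
# A candidate concept set for x_test enters the prediction set iff its score is <= q.
```

With this quantile in hand, the explanation prediction set Γ^ε_cpt(x^test) is simply the collection of candidate concept sets whose non-conformity score does not exceed q.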
Algorithm 1: Uncertainty quantification for self-explaining neural networks
1: Input: Calibration data D^cal, pre-trained model f = g ◦ h, non-conformity measure s(·), a test sample x^test, and a desired miscoverage rate ε ∈ (0, 1).
2: Output: The concept prediction set Γ^ε_cpt(x^test) and label set Γ^ε_lab(x^test) for sample x^test.
3: Calculate the non-conformity scores on the calibration samples (i.e., D^cal) based on Eqn. (3)
4: Compute the quantile value Q_{1−ε} based on Eqn. (4)
5: Construct the concept prediction set Γ^ε_cpt(x^test) for x^test with 1 − ε coverage guarantee
6: Initialize Γ^ε_lab(x^test) = ∅
7: for m = 1, 2, · · · , M do
8:   Obtain δ by optimizing the loss in Eqn. (10)
9:   if Σ_{m′∈[M]\m} I[g_m(v) > g_{m′}(v)] == M − 1 then
10:    Update Γ^ε_lab(x^test) = Γ^ε_lab(x^test) ∪ {m}
11:  end if
12: end for
13: Return: Γ^ε_cpt(x^test) and Γ^ε_lab(x^test).

Proof. Let s_1, s_2, · · · , s_{N^cal}, s^test denote the non-conformity scores of the calibration samples and the given test sample. To avoid handling ties, we consider the case where these non-conformity scores are distinct with probability 1. Without loss of generality, we assume the calibration scores are sorted so that s_1 < s_2 < · · · < s_{N^cal}. In this case, we have that Q_{1−ε} = s_{⌈(N^cal+1)(1−ε)⌉} when ε ≥ 1/(N^cal + 1), and Q_{1−ε} = ∞ otherwise. Note that in the case where Q_{1−ε} = ∞, the coverage property is trivially satisfied; thus, we only need to consider the case when ε ≥ 1/(N^cal + 1). We begin by observing the following equality of two events:

{C_cpt(x^test) ∈ Γ^ε_cpt(x^test)} = {s^test ≤ Q_{1−ε}}.  (6)

By combining this with the definition of Q_{1−ε}, we have

{C_cpt(x^test) ∈ Γ^ε_cpt(x^test)} = {s^test ≤ s_{⌈(N^cal+1)(1−ε)⌉}}.

Since the variables s_1, · · · , s^test are exchangeable, we have that P(s^test ≤ s_k) = k/(N^cal + 1) holds for any integer k; that is, s^test is equally likely to fall anywhere among the calibration scores s_1, · · · , s_{N^cal}. Notably, the randomness is over all variables. Building on this, we can derive that P(s^test ≤ s_{⌈(N^cal+1)(1−ε)⌉}) = ⌈(N^cal + 1)(1 − ε)⌉/(N^cal + 1) ≥ 1 − ε, which yields the desired result.

In the above theorem, we show that the proposed algorithm can output predictive concept sets that are rigorously guaranteed to satisfy the desired coverage property shown in Eqn. (5), no matter what (possibly incorrect) self-explaining model is used or what the (unknown) distribution of the data is. The theorem relies on the assumption that the data points are exchangeable (see Definition 2), which is widely adopted by existing conformal inference works (Cherubin, Chatzikokolakis, and Jaggi 2021; Tibshirani et al. 2019). The data exchangeability assumption is much weaker than i.i.d.; note that i.i.d. variables are automatically exchangeable (Sesia and Favaro 2022).

For a random variable v ∈ R^C, we denote [v_[1], v_[2], · · · , v_[C]] as the sorted values of v in descending order, i.e., v_[1] is the largest (top-1) score, v_[2] is the second largest (top-2) score, and so on. Then, we can construct the label prediction set for the test sample x^test as

Γ^ε_lab(x^test) = {y = g(v) : Σ_{j=1}^{K} |h*_[j](x^test) − h̃_[j](v)| + λ_2 · Σ_{j=K+1}^{C} |h*_[j](x^test) − h̃_[j](v)| ≤ Q_{1−ε}},  (7)

where h̃(v) = σ(v). In the above, since the true concepts are unknown for x^test, we instead calculate h*_[j](x^test) = h̃_[j](x^test) = σ(h_[j](x^test)). Without loss of generality, in the following we take the setting where λ_2 = 1 as an example to discuss how to transfer the confidence prediction sets in the concept space to the final class label prediction layer; the extension to other cases can be easily derived. In this case, we calculate the set for x^test as Γ^ε_lab(x^test) = {y = g(v) : ||h̃(v) − h*(x^test)||_1 ≤ Q_{1−ε}}, where v ∈ R^C, v̂ = σ(v) = [σ(v_1), · · · , σ(v_C)], g is the prediction head, h̃(x^test) = [σ(h_1(x^test)), · · · , σ(h_C(x^test))], and Q_{1−ε} is derived based on the calibration set. To derive the label sets, we propose to solve the following optimization:

max_{v ∈ R^C} Σ_{m′∈[M]\m} I[g_m(v) > g_{m′}(v)]  (8)
s.t. ||h̃(v) − h*(x^test)||_1 ≤ Q_{1−ε},

where the quantile value Q_{1−ε} is derived based on Eqn. (4). In the above, we want to find a v whose predicted label is "m" under the constraint ||h̃(v) − h*(x^test)||_1 ≤ Q_{1−ε}. If we can successfully find such a v, we include the label "m" in the label set; otherwise, we exclude it. Notably, we need to iteratively solve the above optimization over all possible labels to construct the final prediction set.

However, it is difficult to directly optimize the above problem, due to the non-differentiability of the discrete components in the above equation. Below, we use δ = [δ_1, · · · , δ_C] to denote the difference between v̂ and h̃(x^test), i.e., δ = v̂ − h̃(x^test), where v̂ = σ(v) = [σ(v_1), · · · , σ(v_C)] and h̃(x^test) = σ(h(x^test)). Based on this, we have

v = [v_1 = −log(1/(h̃_1(x^test) + δ_1) − 1), · · · , v_C = −log(1/(h̃_C(x^test) + δ_C) − 1)].  (9)

To address the above challenge, we propose to solve the below reformulated optimization:

min_{||δ||_1 ≤ Q_{1−ε}} max(max_{m′≠m} g_{m′}(v) − g_m(v), −β),  (10)

where m′ ∈ [M], β is a pre-defined value, v is derived based on Eqn. (9), and g : R^C → R^M maps v into a final class prediction. Here, g(v) represents the logit output of the underlying self-explaining model for each class when the input is v. The above adversarial loss enforces the actual prediction of [...]
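Assuming a toy linear prediction head (the matrix `W_g` below is an illustrative stand-in for the trained head g, not the paper's model), the per-label search in Eqns. (8)-(10) can be sketched with projected gradient descent over δ; the −β clamp in Eqn. (10) and the exact L1 projection are simplified here:

```python
import numpy as np

def label_in_set(m, h_tilde_test, W_g, q, steps=200, lr=0.1):
    """Sketch of Eqns. (8)-(10): search for delta with ||delta||_1 <= q such that
    the class-m logit of g(v) beats all others, where v is recovered from delta
    via Eqn. (9)."""
    delta = np.zeros_like(h_tilde_test)
    for _ in range(steps):
        p = np.clip(h_tilde_test + delta, 1e-6, 1 - 1e-6)  # sigma(v) = h~(x_test) + delta
        v = -np.log(1.0 / p - 1.0)                         # Eqn. (9): invert the sigmoid
        logits = W_g @ v
        rivals = np.delete(logits, m)
        if logits[m] > rivals.max():                       # condition in Eqn. (8) satisfied
            return True
        # Gradient step on max_{m' != m} g_{m'}(v) - g_m(v) w.r.t. delta.
        j = int(np.argmax(rivals))
        m2 = j if j < m else j + 1                         # map rival index back into [M]
        grad_v = W_g[m2] - W_g[m]
        grad_delta = grad_v / (p * (1.0 - p))              # chain rule: dv/d(delta) = 1/(p(1-p))
        delta = delta - lr * grad_delta
        norm = np.abs(delta).sum()                         # keep ||delta||_1 <= Q_{1-eps}
        if norm > q:                                       # (rescaling, not exact projection)
            delta *= q / max(norm, 1e-12)
    return False

# Toy head with C = 2 concepts and M = 3 classes; the label set is
# {m : label_in_set(m, h_tilde, W_g, q)}, mirroring the loop in Algorithm 1.
W_g = np.array([[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]])
h_tilde = np.array([0.9, 0.1])
assert label_in_set(0, h_tilde, W_g, q=0.5)      # the predicted class enters immediately
assert not label_in_set(1, h_tilde, W_g, q=0.05) # budget too small to flip the prediction
assert label_in_set(1, h_tilde, W_g, q=2.0)      # a larger quantile budget admits class 1
```

The quantile q plays the role of Q_{1−ε}: a larger q permits larger perturbations δ, hence more labels become attainable and the label set grows, directly reflecting the prediction uncertainty.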
P{V_{n+1} ≤ Quantile(γ; V_{1:n} ∪ {∞})} ≥ γ,  (11)

where Quantile(γ; V_{1:n} ∪ {∞}) denotes the level-γ quantile of V_{1:n} ∪ {∞}. In the above, V_{1:n} = {V_1, · · · , V_n}.

Theorem 3. Suppose the calibration samples and the test sample are exchangeable. If we construct Γ^ε_lab(x^test) as indicated above, then the following inequality holds for any ε ∈ (0, 1):

P(y^test ∈ Γ^ε_lab(x^test)) ≥ 1 − ε,  (12)

where y^test is the true label of x^test, and Γ^ε_lab(x^test) is the label prediction set derived based on Eqn. (7).

Due to space constraints, the complete proof of Theorem 3 is deferred to the full version of this paper. The proof is based on Lemma 2.

Discussions on prototype-based self-explaining neural networks. Here, we discuss how to generalize the above method to prototype-based self-explaining networks, which do not depend on pre-defined expert concepts (Alvarez Melis and Jaakkola 2018; Li et al. 2018). These networks usually adopt an autoencoder to learn a lower-dimensional latent representation of the data with an encoder network, and then use several network layers over the latent space to learn C prototype-based concepts, i.e., {p_j ∈ R^q}_{j=1}^{C}. After that, for each p_j, the similarity layer computes its distance from the learned latent representation (i.e., e_enc(x)) as h_j(x) = ||e_enc(x) − p_j||_2^2. The smaller the distance value, the more similar e_enc(x) and the j-th prototype (p_j) are. Finally, we can output a probability distribution over the M classes. Nonetheless, we have no access to the true basis explanations if we want to conduct conformal predictions at the prototype-based interpretation level. To address this, for prototype-based self-explaining networks, we construct the following non-conformity measure

s(x_i, y_i, f) = inf_{v ∈ {v : ℓ(g(v), y_i) ≤ α}} [Σ_{j=1}^{K} |h_[j](x_i) − v_[j]| + λ_2 · Σ_{j=K+1}^{C} |h_[j](x_i) − v_[j]|],  (13)

where ℓ(g(v), y_i) is the classification cross-entropy loss for v, and α is a predefined small threshold value that forces the prediction based on v to be close to the label y_i. Based on the above non-conformity measure, we can calculate the non-conformity scores for the calibration samples and then calculate the desired quantile value. Based on this, for the given test sample, following Eqn. (8) and (13), we can calculate the final prediction set. However, it is usually complicated to calculate the above score due to the infimum operator. To overcome this challenge, we apply gradient descent starting from the trained similarity vector to find a surrogate vector around it. Additionally, to obtain the final prediction sets, we follow Eqn. (10) to determine whether a specific class label should be included in the final prediction sets.

Figure 1: Data efficiency for concept bottleneck models with different learning strategies. (a) MNIST; (b) CIFAR-100 Super-class.

Experiments

In this section, we conduct experiments to evaluate the performance of the proposed method (i.e., unSENN). Due to space limitations, more experimental details and results (e.g., experiments on more self-explaining models and running time) can be found in the full version of this paper.

Experimental Setup

Real-world datasets. In experiments, we adopt the following real-world datasets: CIFAR-100 Super-class (Fischer et al. 2019) and MNIST (Deng 2012). Specifically, the CIFAR-100 Super-class dataset comprises 100 diverse classes of images, which are further categorized into 20 super-classes. For example, the five classes baby, boy, girl, man, and woman belong to the super-class people. In this case, we exploit each image class as a concept-based explanation for the super-class prediction (Hong and Pavlic 2023). The MNIST dataset consists of a collection of grayscale images of handwritten digits (0-9). In experiments, we consider the MNIST range classification task, wherein the goal is to classify handwritten digits into five distinct classes based on their numerical ranges. Specifically, the digits are divided into five non-overlapping ranges (i.e., [0, 1], [2, 3], [4, 5], [6, 7], [8, 9]), each assigned a unique label (the label set is denoted as {0, 1, 2, 3, 4}). For example, an input image depicting a handwritten digit "4" is classified as class 2 within the range of digits 4 to 5. Here, we exploit the identity of each digit as a concept-based explanation for the range prediction (Rigotti et al. 2021).

Baselines. In experiments, we adopt two baselines: the Naive baseline and Vanilla Conformal Predictors. These baselines rely exclusively upon X and Y. Specifically, the Naive baseline constructs the set by sequentially incorporating classes in decreasing order of their probabilities until the cumulative probability surpasses the predefined threshold 1 − ε. Notably, this Naive approach lacks a theoretical guarantee on its coverage. The Vanilla Conformal Predictors utilize the final decisions to construct the prediction sets. Following (Cherubin, Chatzikokolakis, and Jaggi 2021), we set the non-conformity measure for this Vanilla baseline as s(x, y) = −f_y(x), where f is a pre-trained model and f_y(x) is the softmax probability for x with true label y.
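The two baselines can be sketched as follows. The toy softmax vector `probs` is an illustrative assumption, and the calibration quantile q for the Vanilla predictor would come from applying Eqn. (4) to the scores s(x, y) = −f_y(x):

```python
import numpy as np

def naive_set(probs, eps):
    """Naive baseline: add classes in decreasing probability order until the
    cumulative probability surpasses 1 - eps (no coverage guarantee)."""
    order = np.argsort(-probs)
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, 1.0 - eps, side="right")) + 1
    return set(order[:cutoff].tolist())

def vanilla_set(probs, q):
    """Vanilla conformal baseline with s(x, y) = -f_y(x): include every label
    whose non-conformity score is within the calibration quantile q."""
    return {m for m, p in enumerate(probs) if -p <= q}

probs = np.array([0.70, 0.20, 0.08, 0.02])   # toy softmax output (an assumption)
assert naive_set(probs, 0.15) == {0, 1}      # 0.70 + 0.20 = 0.90 > 0.85
assert vanilla_set(probs, -0.5) == {0}       # only classes with f_y(x) >= 0.5
```

Note the structural difference: the Naive set depends only on the single test-time softmax vector, while the Vanilla set is thresholded by a quantile estimated on held-out calibration data, which is what yields its marginal coverage guarantee.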
Figure 2: Concept conformal sets on MNIST and CIFAR-100 Super-class. Phrases in bold and underlined mean true concepts.

Table 1: Concept error rates of concept conformal sets with different miscoverage rates.

Dataset                 Miscoverage rate ε   unSENN          Naive
MNIST                   0.05                 0.047 ± 0.001   0.003 ± 0.000
                        0.1                  0.094 ± 0.002   0.003 ± 0.000
                        0.15                 0.139 ± 0.006   0.004 ± 0.000
                        0.2                  0.174 ± 0.011   0.005 ± 0.000
CIFAR-100 Super-class   0.05                 0.044 ± 0.002   0.073 ± 0.001
                        0.1                  0.091 ± 0.003   0.100 ± 0.001
                        0.15                 0.137 ± 0.004   0.121 ± 0.001
                        0.2                  0.177 ± 0.003   0.136 ± 0.001

Table 2: Label coverage and average set size of prediction conformal sets on MNIST and CIFAR-100 Super-class.

Dataset                 Method    Label coverage   Average set size
MNIST                   unSENN    0.988 ± 0.002    1.00 ± 0.00
                        Naive     0.985 ± 0.002    1.01 ± 0.00
                        Vanilla   0.903 ± 0.001    0.90 ± 0.00
CIFAR-100 Super-class   unSENN    0.847 ± 0.002    1.03 ± 0.00
                        Naive     0.786 ± 0.004    1.28 ± 0.02
                        Vanilla   0.805 ± 0.006    1.15 ± 0.01

Figure 3: Boxplot of prediction sets. We display the lengths of conformal sets that are not equal to 1 over all the test data. (a) MNIST; (b) CIFAR-100 Super-class.

Implementation details. In experiments, we evaluate the proposed method (i.e., unSENN) with the following underlying models: ResNet-50 (He et al. 2016), a convolutional neural network (CNN), and a multi-layer perceptron (MLP). For the calibration set, we randomly hold out 10% of the original available dataset to compute the non-conformity scores. All the experiments are conducted 10 times, and we report the mean and standard errors.

Experimental Results

First, we measure the data efficiency (the amount of training data needed to achieve a desired level of accuracy) of the self-explaining model. For the CIFAR-100 Super-class, we apply ResNet-50 and MLP to learn the concepts and labels, respectively. Similarly, we use CNN and MLP for MNIST. In Figure 1, we present the test accuracy of the self-explaining model using independent and joint learning strategies with various data proportions. Note that the standard model ignores the concepts and directly maps input features to the final prediction. As we can see, the self-explaining models are particularly effective on the adopted datasets. The joint model exhibits comparable performance to the standard model, and the joint model is slightly more accurate in lower data regimes (e.g., 20%). Therefore, in the following experiments, we adopt the joint learning strategy.

Then, we explore the effectiveness of the concept conformal sets within the self-explaining model. In Table 1, we report the error rate (the percentage of true concepts that are not held in the prediction sets) of concept conformal sets with different miscoverage rates. From this table, we can find that our proposed method guarantees a probability of error of at most ε and fully exploits its error margin to optimize efficiency. In contrast, the Naive baseline simply outputs all possible concepts using the raw softmax probabilities and does not provide theoretical guarantees. In addition, we show concept conformal sets when ε = 0.1 in Figure 2. Intuitively, a blurred image tends to have a large prediction set due to its uncertainty during testing, while a clear image tends to have a small set. Thus, a smaller set means higher confidence in the prediction, and a larger set indicates higher uncertainty regarding the predicted concepts.

Next, we investigate the effectiveness of the final prediction sets within the self-explaining model. We provide insights into the label coverage (the percentage of times the prediction set size is 1 and contains the true label) as well as the average set size. Table 2 reports the obtained experimental results when ε = 0.1. From this table, we can see that our proposed method consistently outperforms both the Naive and Vanilla baselines. Specifically, our proposed
method achieves notably higher label coverage and much tighter confidence sets. A more comprehensive understanding of the conformal set characteristics is illustrated in Figure 3, where we display the lengths of conformal sets that are not equal to 1. We can find that the Vanilla baseline contains empty sets on MNIST, and the Naive baseline includes much looser prediction sets on the CIFAR-100 Super-class. In contrast, our proposed method generates reliable and tight prediction sets. Additionally, in Figure 4, we demonstrate prediction conformal sets when ε = 0.1. We can observe that all the methods correctly output the true label, while the prediction set of our proposed method is the tightest. From these reported experimental results, we can conclude that our proposed method effectively produces uncertainty prediction sets for the final predictions, leveraging the informative high-level basis concepts.

Figure 4: Prediction conformal sets on CIFAR-100 Super-class. Phrases in bold and underlined mean true labels.

Further, we perform an ablation study concerning the calibration data size. In Figure 5, we illustrate the error rate distribution for the concept conformal sets with a miscoverage rate ε = 0.1 on MNIST and CIFAR-100 Super-class, using different calibration data sizes. It is evident that utilizing a larger calibration set can enhance the stability of the prediction set. Therefore, we incorporate 10% of the original available dataset for calibration without tuning, which also works well in our experiments.

Figure 5: Distribution of error rate with different calibration sizes (splitting percentage of the original available data). (a) MNIST; (b) CIFAR-100 Super-class.

Related Work

Currently, many self-explaining networks have been proposed (Koh et al. 2020; Havasi, Parbhoo, and Doshi-Velez 2022; Chauhan et al. 2023; Yuksekgonul, Wang, and Zou 2022; Alvarez Melis and Jaakkola 2018; Li et al. 2018; Kim et al. 2023; Zarlenga et al. 2023). However, they fail to provide distribution-free uncertainty quantification for the two simultaneously generated prediction outcomes in self-explaining networks. On the other hand, to tackle uncertainty issues, traditional methods (e.g., Bayesian approximation) for establishing confidence in prediction models usually make strict assumptions and incur high computational costs. In this work, we build upon conformal prediction (Gibbs and Candes 2021; Chernozhukov, Wüthrich, and Yinchu 2018; Xu and Xie 2023; Tibshirani et al. 2019; Teng et al. 2022; Martinez et al. 2023; Ndiaye 2022; Lu et al. 2022; Jaramillo and Smirnov 2021; Liu et al. 2022), which does not require strict assumptions on the underlying data distribution and has been applied before to classification and regression problems to attain marginal coverage. However, our work differs from existing conformal works in that we first model explanation prediction uncertainties in both scenarios – with and without true explanations. We also transfer the explanation uncertainties to the final prediction uncertainties by designing new transfer functions and associated optimization frameworks for constructing the final prediction sets.

Conclusion

In this paper, we design a novel uncertainty quantification framework for self-explaining networks, which generates distribution-free uncertainty estimates for the two interconnected outcomes (i.e., the final predictions and their corresponding explanations) without requiring any additional model modifications. Specifically, in our approach, we first design non-conformity measures to capture the relevant basis explanations within the interpretation layer of self-explaining networks, considering scenarios with and without true concepts. Then, based on the designed non-conformity measures, we generate explanation prediction sets whose size reflects the prediction uncertainty. To obtain better label prediction sets, we also develop novel transfer functions tailored to scenarios where self-explaining networks are trained with and without true concepts. By leveraging the adversarial attack paradigm, we further design novel optimization frameworks to construct final prediction sets from the designed transfer functions. We provide theoretical analysis for the proposed method. Comprehensive experiments validate the effectiveness of our method.
References

Liu, M.; Ding, L.; Yu, D.; Liu, W.; Kong, L.; and Jiang, B. 2022. Conformalized Fairness via Quantile Regression. Advances in Neural Information Processing Systems, 35: 11561–11572.

Lu, C.; Lemay, A.; Chang, K.; Höbel, K.; and Kalpathy-Cramer, J. 2022. Fair conformal predictors for applications in medical imaging. In Proceedings of the AAAI Conference on Artificial Intelligence, 12008–12016.

Lundberg, S. M.; and Lee, S.-I. 2017. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30.

Martinez, J. A.; Bhatt, U.; Weller, A.; and Cherubin, G. 2023. Approximating Full Conformal Prediction at Scale via Influence Functions. In Proceedings of the AAAI Conference on Artificial Intelligence, 6631–6639.

Nam, W.-J.; Gur, S.; Choi, J.; Wolf, L.; and Lee, S.-W. 2020. Relative attributing propagation: Interpreting the comparative contributions of individual units in deep neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, 2501–2508.

Ndiaye, E. 2022. Stable conformal prediction sets. In International Conference on Machine Learning, 16462–16479. PMLR.

Rajagopal, D.; Balachandran, V.; Hovy, E.; and Tsvetkov, Y. 2021. Selfexplain: A self-explaining architecture for neural text classifiers. arXiv preprint arXiv:2103.12279.

Ribeiro, M. T.; Singh, S.; and Guestrin, C. 2016. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–1144.

Rigotti, M.; Miksovic, C.; Giurgiu, I.; Gschwind, T.; and Scotton, P. 2021. Attention-based interpretability with concept transformers. In International Conference on Learning Representations.

Rudin, C. 2018. Please stop explaining black box models for high stakes decisions. Stat, 1050: 26.

Sesia, M.; and Favaro, S. 2022. Conformal frequency estimation with sketched data. Advances in Neural Information Processing Systems, 35: 6589–6602.

Sinha, S.; Huai, M.; Sun, J.; and Zhang, A. 2023. Understanding and enhancing robustness of concept-based models. In Proceedings of the AAAI Conference on Artificial Intelligence, 15127–15135.

Slack, D.; Hilgard, A.; Singh, S.; and Lakkaraju, H. 2021. Reliable post hoc explanations: Modeling uncertainty in explainability. Advances in Neural Information Processing Systems, 34: 9391–9404.

Slack, D.; Hilgard, S.; Jia, E.; Singh, S.; and Lakkaraju, H. 2020. Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 180–186.

Teng, J.; Tan, Z.; and Yuan, Y. 2021. T-SCI: A two-stage conformal inference algorithm with guaranteed coverage for Cox-MLP. In International Conference on Machine Learning, 10203–10213. PMLR.

Teng, J.; Wen, C.; Zhang, D.; Bengio, Y.; Gao, Y.; and Yuan, Y. 2022. Predictive inference with feature conformal prediction. arXiv preprint arXiv:2210.00173.

Tibshirani, R. J.; Foygel Barber, R.; Candes, E.; and Ramdas, A. 2019. Conformal prediction under covariate shift. Advances in Neural Information Processing Systems, 32.

Xu, C.; and Xie, Y. 2023. Sequential predictive conformal inference for time series. In International Conference on Machine Learning, 38707–38727. PMLR.

Yao, L.; Li, Y.; Li, S.; Liu, J.; Huai, M.; Zhang, A.; and Gao, J. 2022. Concept-Level Model Interpretation From the Causal Aspect. IEEE Transactions on Knowledge and Data Engineering.

Yeh, C.-K.; Kim, B.; Arik, S.; Li, C.-L.; Pfister, T.; and Ravikumar, P. 2020. On completeness-aware concept-based explanations in deep neural networks. Advances in Neural Information Processing Systems, 33: 20554–20565.

Yuksekgonul, M.; Wang, M.; and Zou, J. 2022. Post-hoc Concept Bottleneck Models. In The Eleventh International Conference on Learning Representations.

Zarlenga, M. E.; Shams, Z.; Nelson, M. E.; Kim, B.; and Jamnik, M. 2023. TabCBM: Concept-based Interpretable Neural Networks for Tabular Data. Transactions on Machine Learning Research.

Zhang, Y.; Gao, N.; and Ma, C. 2023. Learning to Select Prototypical Parts for Interpretable Sequential Data Modeling. In Proceedings of the AAAI Conference on Artificial Intelligence, 6612–6620.

Zhao, C.; Qian, W.; Shi, Y.; Huai, M.; and Liu, N. 2023. Automated Natural Language Explanation of Deep Visual Neurons with Large Models. arXiv preprint arXiv:2310.10708.

Zhao, S.; Ma, X.; Wang, Y.; Bailey, J.; Li, B.; and Jiang, Y.-G. 2021. What do deep nets learn? Class-wise patterns revealed in the input space. arXiv preprint arXiv:2101.06898.