SpArX: Sparse Argumentative Explanations for Neural Networks
Note that the previous definition of influence from [Potyka, 2021] is a special case, because ln(x/(1-x)) is the inverse of the logistic function.

Using the clustering from Section 3, we create a corresponding clustered MLP µ whose neurons correspond to clusters in the original M. We call these neurons cluster-neurons and denote them by v_C, where C is the associated cluster. The graphical structure of µ is as follows.
Definition 3 (Graphical Structure of Clustered MLP). Given an MLP M and a clustering P = ⊎_{i=1}^d P_i of M, the graphical structure of the corresponding clustered MLP µ is a directed graph (V^µ, E^µ) such that:

• V^µ = ⊎_{l=0}^{d+1} V^µ_l consists of (ordered) layers of cluster-neurons such that:
  1. the input layer V^µ_0 consists of a singleton cluster-neuron v_{v_{0,i}} for every input neuron v_{0,i} ∈ V_0;
  2. the l-th hidden layer of µ (for 0 < l < d+1) consists of one cluster-neuron v_C for every cluster C ∈ P_l;
  3. the output layer V^µ_{d+1} consists of a singleton cluster-neuron v_{v_{d+1,i}} for every output neuron v_{d+1,i} ∈ V_{d+1};

• E^µ = ⋃_{l=0}^{d} V^µ_l × V^µ_{l+1}.

Figure 1: a) MLP for XOR, with cluster-neurons C_1 and C_2. b) Clustered MLP. c) Global explanation as a QAF. d) Word-cloud representation of the hidden cluster-neurons.

Example 1. For illustration, consider the MLP in Figure 1.a, trained to approximate the XOR function from the dataset ∆ = {(0,0), (0,1), (1,0), (1,1)} with target outputs 0, 1, 1, 0, respectively. The activation values of the hidden neurons for the four input pairs are (0,0,0,0), (1.7,0,1.8,0), (0,2.3,0,1.5), (0,0,0,0), respectively. Applying K-means clustering with δ as in Eq. 1 and K = 2 for the hidden layer results in two clusters C_1, C_2, indicated by rectangles in the figure.

We define global and local explanations for MLPs by translating their corresponding clustered MLPs into QAFs. The clustered MLPs for global and local explanations share the same graphical structure but differ in the parameters of the cluster-neurons, that is, (i) the biases of cluster-neurons and (ii) the weights of edges between cluster-neurons. We define these parameters in terms of aggregation functions, specifically a bias aggregation function Agg_b : P → ℝ, mapping clusters to biases, and an edge aggregation function Agg_e : P × P → ℝ ∪ {⊥}, mapping pairs of clusters to weights if the pairs correspond to edges in µ, or to ⊥ otherwise. Given any concrete such aggregation functions (as defined later), the parameters of µ can be defined as follows.

Definition 4 (Parameters of Clustered MLP). Given an MLP M = (V, E, B, W, ϕ), let (V^µ, E^µ) be the graphical structure of the corresponding clustered MLP µ. Then, for bias and edge aggregation functions Agg_b and Agg_e, respectively, µ is (V^µ, E^µ, B^µ, W^µ, ϕ) with parameters B^µ, W^µ as follows:

• for every cluster-neuron v_C ∈ V^µ, the bias (in B^µ) of v_C is Agg_b(C);

• for every edge (v_{C_1}, v_{C_2}) ∈ E^µ, the weight (in W^µ) of the edge is Agg_e((C_1, C_2)).

5.1 Sparse Argumentative Global Explanations
We consider the following aggregation functions. As we explain in the supplementary material (SM), they minimize the deviation (with respect to the least-squares error) between the bias and weights of the cluster-neuron and those of the neurons contained in the cluster.

Definition 5 (Global Aggregation Functions). The average bias and edge aggregation functions are, respectively:

$$\mathit{Agg}_b(C) = \frac{1}{|C|} \sum_{v_{l,i} \in C} b^l_i;$$

$$\mathit{Agg}_e((C_1, C_2)) = \frac{1}{|C_2|} \sum_{v_{l,i} \in C_1} \sum_{v_{l+1,j} \in C_2} W^l_{j,i}.$$

The average bias aggregation function simply averages the biases of the neurons in the cluster. Intuitively, the weight of an edge between cluster-neurons v_{C_1} and v_{C_2} has to capture the effects of all neurons summarized in C_1 on the neurons summarized in C_2. The neurons in C_1 all affect every neuron in C_2, therefore their effects have to be summed up. As v_{C_2} acts as a replacement for all neurons in C_2, it has to aggregate their activation. We achieve this aggregation by averaging again.

The following example illustrates global explanations drawn from clustered MLPs, with the average aggregation functions, via their understanding as QAFs.

Example 2. Figure 1.b shows the clustered MLP for our XOR example from Figure 1.a. The corresponding QAF can be visualised as in Figure 1.c, where attacks are given in red and supports in green, and the thickness of the edges reflects their weight. To better visualize the role of each cluster-neuron, we can use a word-cloud representation as in Figure 1.d (showing, e.g., that x_0 and x_1 play a negative and positive role, respectively, towards C_1, which supports the output). The word-cloud representation gives insights into the reasoning of the MLP (with the learned rule (x_0 ∧ ¬x_1) ∨ (¬x_0 ∧ x_1)).

5.2 Sparse Argumentative Local Explanations
While global explanations attempt to faithfully explain the behaviour of the MLP on all inputs, our local explanations focus on the behaviour in the neighborhood of an input x from the dataset ∆, similar to LIME [Ribeiro et al., 2016]. To do so, we generate random neighbors of x to obtain a sample dataset ∆′, and weigh them with an exponential kernel
from LIME [Ribeiro et al., 2016], assigning lower weight to a sample x′ ∈ ∆′ that is further away from the target x:

$$\pi_{x',x} = \exp(-D(x', x)^2 / \sigma^2)$$

where D is the Euclidean distance function and σ is the width of the exponential kernel.

We aggregate biases as before but replace the edge aggregation function with the following.

Definition 6 (Local Edge Aggregation Function). The local edge aggregation function with respect to the input x is

$$\mathit{Agg}^x_e(C_1, C_2) = \sum_{x' \in \Delta} \sum_{v_{l,i} \in C_1} \sum_{v_{l+1,j} \in C_2} \frac{\pi_{x',x} \cdot W^l_{j,i} \cdot O^M_{x'}(v_{l,i})}{|C_2| \cdot O^\mu_{x'}(v_{C_1})}$$

where O^M_{x′}(v_{l,i}) is the activation value of the neuron v_{l,i} in the original MLP and O^µ_{x′}(v_{C_1}) is the activation value of the cluster-neuron v_{C_1} in the clustered MLP.

Note that, by this definition, the edge weights are computed layer by layer from input to output.

6 Desirable Properties of Explanations
In order to evaluate SpArX, we consider three measures for assessing the faithfulness and comprehensibility of explanations. In this section, we assume as given an MLP M of depth d and a corresponding clustered MLP µ.

To begin with, we consider a faithfulness measure inspired by the notion of fidelity considered for LIME [Ribeiro et al., 2016], based on measuring the input-output difference between the original model (in our case, M) and the substitute model (in our case, the clustered MLP/QAF).

Definition 7 (Input-Output Unfaithfulness). The local input-output unfaithfulness of µ to M with respect to input x and dataset ∆ is

$$L^M(\mu) = \sum_{x' \in \Delta} \sum_{v \in V_{d+1}} \pi_{x',x} \, (O^M_{x'}(v) - O^\mu_{x'}(v))^2.$$

The global input-output unfaithfulness of µ to M with respect to dataset ∆ is

$$G^M(\mu) = \sum_{x' \in \Delta} \sum_{v \in V_{d+1}} (O^M_{x'}(v) - O^\mu_{x'}(v))^2.$$

The lower the input-output unfaithfulness of the clustered MLP µ, the more faithful µ is to the original MLP.

The input-output unfaithfulness measures deviations in the input-output behaviour of the substitute model but, since clustered MLPs maintain much of the structure of the original MLPs, we can define a more fine-grained notion of structural faithfulness by comparing the outputs of the individual neurons in the MLP to the outputs of the cluster-neurons summarizing them in the clustered MLP.

Definition 8 (Structural Unfaithfulness). Let K_l be the number of clusters at hidden layer l in µ (0 < l ≤ d) and K_{d+1} be the number of output neurons (in µ and M). Let K_{l,j} be the number of neurons in the j-th cluster-neuron C_{l,j} (0 < l ≤ d+1, with K_{d+1,j} = 1). Then, the local structural unfaithfulness of µ to M with respect to input x and dataset ∆ is:

$$L^M_s(\mu) = \sum_{x' \in \Delta} \sum_{l=1}^{d+1} \sum_{j=1}^{K_l} \sum_{v_{l,i} \in C_{l,j}} \pi_{x',x} \, (O^M_{x'}(v_{l,i}) - O^\mu_{x'}(C_{l,j}))^2.$$

The global structural unfaithfulness G^M_s(µ) is defined analogously by removing the similarity terms π_{x′,x}.

The lower the structural unfaithfulness of the clustered MLP µ, the more structurally faithful µ is to the original MLP. Note that our notion of structural faithfulness is different from the notions of structural descriptive accuracy of [Albini et al., 2022]: they characterise bespoke explanations, defined therein, of probabilistic classifiers equipped with graphical structures, and cannot be used in place of our notion, tailored to local and global explanations.

Finally, we consider the cognitive complexity of an explanation based on its size, inspired by the notion of cognitive tractability in [Cyras et al., 2019]. In our case, we use the number of cluster-neurons/arguments as a measure.

Definition 9 (Cognitive Complexity). Let K_l be the number of clusters at hidden layer l in µ (0 < l ≤ d). Then, the cognitive complexity of µ is defined as

$$\Omega(\mu) = \prod_{0 < l < d+1} K_l.$$

Note that there is a tradeoff between faithfulness and cognitive complexity in SpArX. By reducing the number of cluster-neurons, we reduce cognitive complexity. However, this also results in a higher variance among the neurons summarized in the clusters, so the faithfulness of the explanation can suffer. We will explore this trade-off experimentally next.

Finally, note that other properties of explanations by symbolic approaches, notably by [Amgoud and Ben-Naim, 2022], are unsuitable for our mechanistic explanations as QAFs. Indeed, these properties characterize the input-output behaviour of classifiers, rather than their mechanics.

7 Experiments
We conducted four sets of experiments to evaluate SpArX with respect to (i) the trade-off between its sparsification and its ability to maintain faithfulness (Section 7.1 for global and Section 7.2 for local explanations), (ii) SpArX's scalability (Section 7.3), and (iii) the tradeoff between faithfulness and cognitive complexity (Section 7.4). We used four datasets for classification: iris (https://archive.ics.uci.edu/ml/datasets/iris) with 150 instances, 4 continuous features and 3 classes; breast cancer (https://archive.ics.uci.edu/ml/datasets/cancer) with 569 instances, 30 continuous features and 2 classes; COMPAS [Jeff Larson and Angwin, 2016] with 11,000 instances and 52 categorical features, to classify two year recid; and forest covertype (https://archive.ics.uci.edu/ml/datasets/covertype) [Blackard and Dean, 1999] with 581,012 instances, 54 features (10 continuous, 44 binary), and 7 classes. We used the first three datasets to evaluate the (input-output and structural) unfaithfulness of the global and local explanations generated by SpArX. We then used the last dataset,
which is considerably larger and requires deeper MLPs with varying architectures, to evaluate the scalability of SpArX. Finally, we used COMPAS to assess cognitive complexity.

For the experiments with the first three datasets we used MLPs with 2 hidden layers and 50 hidden neurons each, whereas for the experiments with the last we used 1-5 hidden layers with 100, 200 or 500 neurons. For all experiments, we used the ReLU activation function for the hidden neurons and the softmax activation function for the output neurons. We give classification performances for all MLPs and average run-times for generating local explanations in the SM.

When experimenting with SpArX, one needs to choose the number of clusters/cluster-neurons at each hidden layer: we do so by specifying a compression ratio (for example, a compression ratio of 0.5 amounts to obtaining half as many cluster-neurons as original neurons).

Compression Ratio | Method | Iris | Cancer | COMPAS
0.2 | HAP   | 0.23 |   9.54 | 0.10
0.2 | SpArX | 0.00 |   0.83 | 0.02
0.4 | HAP   | 0.89 |  61.57 | 0.24
0.4 | SpArX | 0.00 |   1.04 | 0.03
0.6 | HAP   | 1.37 |  61.57 | 0.46
0.6 | SpArX | 0.00 |   1.40 | 0.04
0.8 | HAP   | 3.00 | 116.20 | 1.20
0.8 | SpArX | 0.02 |   2.34 | 0.05

Table 2: Global structural unfaithfulness of sparse MLPs generated by HAP vs our SpArX method.
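To illustrate the sparsification step, the clustering and global aggregation of Definitions 3-5 can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: the helper names (cluster_layer, aggregate_parameters) are our own, the layer's weights are assumed stored as a matrix W with W[j, i] = W^l_{j,i}, and scikit-learn's KMeans stands in for the paper's K-means step.

```python
import numpy as np

def cluster_layer(activations, k, seed=0):
    """Cluster the neurons of one hidden layer by the similarity of their
    activation vectors over the dataset (activations: samples x neurons).
    Returns one cluster label per neuron."""
    from sklearn.cluster import KMeans  # assumed available
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(activations.T)

def aggregate_parameters(W, b, labels_in, labels_out):
    """Global aggregation functions (Definition 5).
    W[j, i] is the weight from neuron i in layer l to neuron j in layer l+1;
    b[j] is the bias of neuron j in layer l+1."""
    k_in, k_out = labels_in.max() + 1, labels_out.max() + 1
    W_mu = np.zeros((k_out, k_in))
    b_mu = np.zeros(k_out)
    for c2 in range(k_out):
        in_c2 = labels_out == c2
        b_mu[c2] = b[in_c2].mean()  # Agg_b: average bias over the cluster
        for c1 in range(k_in):
            in_c1 = labels_in == c1
            # Agg_e: sum all weights between the two clusters, divide by |C2|
            W_mu[c2, c1] = W[np.ix_(in_c2, in_c1)].sum() / in_c2.sum()
    return W_mu, b_mu
```

With a compression ratio r as defined above, one would call cluster_layer with k = round((1 - r) * n_l) for a hidden layer of n_l neurons.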
by A_{l,i} < A_{l′,i′} if l ≤ l′ and i < i′. The update order then corresponds to the standard forward propagation procedure in MLPs. Then, the claim follows by induction over the depth d of M. For the input arguments at depth d = 0, we do not have any attackers or supporters. Therefore, the aggregation function will just return 0 and we have σ(A_{0,i}) = ϕ(ϕ^{-1}(x_i)) = x_i = O^M_x(v_{0,i}), which proves the base case. For the induction step, consider an argument at depth d+1. We have

$$\sigma(A_{d+1,i}) = \varphi\Big(\varphi^{-1}(\varphi(b^{d+1}_i)) + \sum_{(A_{d,i'},A_{d+1,i}) \in E} W^d_{i',i} \cdot \sigma(A_{d,i'})\Big) = \varphi\Big(b^{d+1}_i + \sum_{(A_{d,i'},A_{d+1,i}) \in E} W^d_{i',i} \cdot O^M_x(v_{d,i'})\Big) = O^M_x(v_{d+1,i}),$$

which completes the proof.

Motivation of Aggregation Functions
Our global aggregation function is motivated as follows. Every neuron in the original MLP M is represented by a cluster in the corresponding clustered MLP µ. Ideally, we want the output of a cluster in µ to equal the outputs of all the neurons in M that it represents. However, obviously, this is only possible if all neurons in the cluster had the same output in the first place. The next best thing to do is then to minimize the expected deviation. However, due to the layerwise dependencies (the activation of a neuron depends on all weights and biases in all previous layers) and non-linear activation functions, the resulting objective function does not possess a simple closed-form solution, and the solution is not necessarily unique. We therefore choose a heuristic approach to define the aggregation functions, which should result in a small, but not necessarily minimal, deviation.

As usual, we use the fact that the partial derivatives at a minimum must be 0. By setting the derivative of the squared deviation $\sum_{v_{l,i} \in C} (b^l_i - b)^2$ with respect to b to 0 and solving for b, we find that we must have

$$b = \frac{\sum_{v_{l,i} \in C} b^l_i}{|C|}, \tag{2}$$

which corresponds to our bias aggregation function.

Thinking about the weights is slightly more complicated because the weights represent the connection strength between two cluster-neurons. When considering an edge (v_{C_1}, v_{C_2}) in the clustered MLP, we have to take account of the fact that every neuron v_2 ∈ C_2 is potentially influenced by all neurons in the previous layer. In particular, it is potentially influenced by all v_1 ∈ C_1. Hence, while v_2 will potentially receive |C_1| inputs from the neurons in C_1, v_{C_2} will receive only one input from v_{C_1}. When comparing the weight w of the edge (v_{C_1}, v_{C_2}) with a weight W^l_{j,i} between neurons v_{l,i} ∈ C_1 and v_{l+1,j} ∈ C_2, it is therefore important to divide w by |C_1| (since (v_{C_1}, v_{C_2}) represents |C_1| edges, it should be |C_1| times larger than the weight of the individual edges). Hence, given a pair of clusters (C_1, C_2) with corresponding weight w, we want to minimize

$$\sum_{v_{l,i} \in C_1} \sum_{v_{l+1,j} \in C_2} \Big(W^l_{j,i} - \frac{w}{|C_1|}\Big)^2. \tag{3}$$

Setting the derivative with respect to w to 0 yields the necessary condition

$$0 = -2 \cdot \sum_{v_{l,i} \in C_1} \sum_{v_{l+1,j} \in C_2} \Big(W^l_{j,i} - \frac{w}{|C_1|}\Big). \tag{4}$$
Dividing both sides of the equation by 2 and rearranging terms yields

$$\sum_{v_{l,i} \in C_1} \sum_{v_{l+1,j} \in C_2} W^l_{j,i} = \sum_{v_{l,i} \in C_1} \sum_{v_{l+1,j} \in C_2} \frac{w}{|C_1|} = |C_2| \cdot w. \tag{5}$$

Dividing by |C_2| leads to the expression in our weight aggregation function:

$$w = \frac{1}{|C_2|} \sum_{v_{l,i} \in C_1} \sum_{v_{l+1,j} \in C_2} W^l_{j,i}. \tag{6}$$

#Layers | 100 neurons | 200 neurons | 500 neurons
1 | 0.7009 | 0.7064 | 0.6996
2 | 0.6934 | 0.6660 | 0.6622
3 | 0.6699 | 0.6738 | 0.6690
4 | 0.6715 | 0.6561 | 0.6701
5 | 0.6623 | 0.6873 | 0.6714

Table 3: Average F1-scores for trained MLPs with different numbers of hidden layers (#Layers) and hidden neurons (#Neurons) using the COMPAS dataset.
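The closed-form minimizer (6) of objective (3) can be sanity-checked numerically. The following self-contained snippet (ours, for illustration only; the cluster sizes are arbitrary example values) compares (6) against a brute-force grid search over the objective:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 4, 3                      # |C1| and |C2| (arbitrary example sizes)
W = rng.normal(size=(n2, n1))      # W[j, i] for v_i in C1, v_j in C2

def objective(w):
    """Objective (3): squared deviation of each W[j, i] from w / |C1|."""
    return ((W - w / n1) ** 2).sum()

# Closed-form solution (6): sum all weights between the clusters, divide by |C2|
w_closed = W.sum() / n2

# Brute-force minimization on a fine grid around the closed-form value
grid = np.linspace(w_closed - 5, w_closed + 5, 100001)
w_grid = grid[np.argmin([objective(w) for w in grid])]
assert abs(w_grid - w_closed) < 1e-3
```

Since (3) is a strictly convex quadratic in w, the grid minimizer coincides with (6) up to the grid resolution.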
2 Experiments
This section includes more detailed experimental results for Section 7. The code to run the experiments is part of this submission and will be made publicly available upon acceptance.

Performance of Trained Models
Tables 1, 2, 3, and 4 show the average F1-scores of the trained MLPs, using 90% of the data for training the models. Only for the forest covertype dataset does increasing the number of hidden layers and the number of hidden neurons in an MLP lead to a higher F1-score.

Table 2: Average F1-scores for trained MLPs with different numbers of hidden layers (#Layers) and hidden neurons (#Neurons) using the cancer dataset.

#Layers | 100 neurons | 200 neurons | 500 neurons
1 | 0.8035 | 0.8324 | 0.8795
2 | 0.8622 | 0.9068 | 0.9457
3 | 0.8862 | 0.9272 | 0.9519
4 | 0.9025 | 0.9356 | 0.9507
5 | 0.9100 | 0.9400 | 0.9529

Table 4: Average F1-scores for trained MLPs with different numbers of hidden layers (#Layers) and hidden neurons (#Neurons) using the forest covertype dataset.

Table 5: Evaluating scalability of SpArX: local input-output unfaithfulness of SpArX (with 20% compression ratio) and LIME using various MLP architectures with different numbers of hidden layers (#Layers) and neurons (#Neurons) using the forest covertype dataset.

#Layers | Method | 100 neurons | 200 neurons | 500 neurons
1 | LIME  | 0.2375 | 0.2919 | 0.3123
1 | SpArX | 0.0000 | 0.0006 | 0.0000
2 | LIME  | 0.2509 | 0.2961 | 0.3638
2 | SpArX | 0.0012 | 0.0026 | 0.0031
3 | LIME  | 0.3130 | 0.3285 | 0.3127
3 | SpArX | 0.0028 | 0.0021 | 0.0000
4 | LIME  | 0.3395 | 0.3459 | 0.3243
4 | SpArX | 0.0001 | 0.0044 | 0.0000
5 | LIME  | 0.3665 | 0.3178 | 0.3288
5 | SpArX | 0.0030 | 0.0045 | 0.0000

Table 6: Evaluating scalability of SpArX: local input-output unfaithfulness of SpArX (with 40% compression ratio) and LIME using various MLP architectures with different numbers of hidden layers (#Layers) and neurons (#Neurons) using the forest covertype dataset.

#Layers | Method | 100 neurons | 200 neurons | 500 neurons
1 | LIME  | 0.1612 | 0.1633 | 0.1645
1 | SpArX | 0.0747 | 0.1047 | 0.3999
2 | LIME  | 0.1612 | 0.1633 | 0.1645
2 | SpArX | 0.1865 | 0.2269 | 0.8985
3 | LIME  | 0.1612 | 0.1633 | 0.1645
3 | SpArX | 0.2881 | 0.3809 | 1.5887
4 | LIME  | 0.1612 | 0.1633 | 0.1645
4 | SpArX | 0.3430 | 0.4368 | 2.5342
5 | LIME  | 0.1612 | 0.1633 | 0.1645
5 | SpArX | 0.3729 | 0.5275 | 3.3333

Table 8: Average run-times in seconds (over compression ratios 20%, 40%, 60%, 80%) of SpArX and LIME for generating local explanations from MLPs with different numbers of hidden layers (#Layers) and neurons (#Neurons) for the forest covertype dataset.
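The unfaithfulness measures reported above are direct to compute once the output-layer activations of both networks are available. The sketch below (our own simplification; function and argument names are hypothetical) implements the local input-output unfaithfulness of Definition 7 with the exponential kernel π:

```python
import numpy as np

def kernel(x_prime, x, sigma=1.0):
    """pi_{x',x} = exp(-D(x', x)^2 / sigma^2), with D the Euclidean distance."""
    return np.exp(-np.sum((x_prime - x) ** 2) / sigma ** 2)

def local_io_unfaithfulness(samples, x, outputs_mlp, outputs_clustered, sigma=1.0):
    """Definition 7: kernel-weighted squared differences between the output-layer
    activations of the original MLP and of the clustered MLP, summed over samples."""
    return sum(
        kernel(x_prime, x, sigma) * np.sum((o_m - o_mu) ** 2)
        for x_prime, o_m, o_mu in zip(samples, outputs_mlp, outputs_clustered)
    )

# Example: a single sample equal to x itself, so its kernel weight is 1
x = np.array([0.0, 1.0])
value = local_io_unfaithfulness(
    samples=[x],
    x=x,
    outputs_mlp=[np.array([1.0, 0.0])],
    outputs_clustered=[np.array([0.5, 0.0])],
)
assert np.isclose(value, 0.25)
```

The global variant of Definition 7 drops the kernel factor, and the structural variants of Definition 8 apply the same weighted squared difference to every hidden layer instead of only the output layer.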