NIST IR 8367
Psychological Foundations of
Explainability and Interpretability in
Artificial Intelligence
David A. Broniatowski
Information Technology Laboratory
April 2021
This publication is available free of charge from: https://doi.org/10.6028/NIST.IR.8367

Abstract
In this paper, we make the case that interpretability and explainability are distinct requirements for machine learning systems. To make this case, we provide an overview of the literature in experimental psychology pertaining to interpretation (especially of numerical stimuli) and comprehension. We find that interpretation refers to the ability to contextualize
a model’s output in a manner that relates it to the system’s designed functional purpose, and
the goals, values, and preferences of end users. In contrast, explanation refers to the ability
to accurately describe the mechanism, or implementation, that led to an algorithm’s output,
often so that the algorithm can be improved in some way. Beyond these definitions, our
review shows that humans differ from one another in systematic ways that affect the extent
to which they prefer to make decisions based on detailed explanations versus less precise
interpretations. These individual differences, such as personality traits and skills, are asso-
ciated with their abilities to derive meaningful interpretations from precise explanations of
model output. This implies that system output should be tailored to different types of users.
Key words
Table of Contents
1 Introduction 1
1.1 Why Define Interpretability and Explainability? 1
1.2 Proposed Definitions of Interpretability and Explainability 2
References 40
List of Tables
Table 1 Performance of three hypothetical models to detect online malicious behaviors. 11
Table 2 Example of SBRL output, which seeks to explain whether a customer will leave the service provider. PP = Probability that the label is positive. Source: [147] 27
List of Figures
Fig. 1 Diagram of the interaction between a machine learning model and human
cognition. For example, the initial training dataset may contain records of
the hourly rainfall at a nearby beach. This trained model is then used to
Fig. 7 An example of output from SHAP which indicates the model’s
baseline value, the marginal contributions of each of its features, and the
final prediction. Such an approach is analogous to a graphical interpreta-
tion of linear regression coefficients. The original image may be found at
https://github.com/slundberg/shap 24
1. Introduction
This paper draws upon literature in computer science, systems engineering, and experi-
mental psychology to better define the concepts of interpretability and explainability for
complex engineered systems. Our specific focus is on systems enabled by artificial intelli-
perform tasks like evaluation, trusting, predicting, or improving a model.” (Singh, as cited
in [61]). Gilpin et al. [54] posit that a good explanation occurs when modelers or consumers “can no longer keep asking why” with regard to some ML model behavior. Finally,
Rudin [128] defines an interpretable machine learning model as one that is “constrained
in model form so that it is either useful to someone, or obeys structural knowledge of the
2 A mental representation of a stimulus (e.g., data or model output) is a symbolic image of that stimulus in a human’s mind.
Fig. 1. Diagram of the interaction between a machine learning model and human cognition. For
example, the initial training dataset may contain records of the hourly rainfall at a nearby beach.
This trained model is then used to generate predictions from new evaluation data, such as a
probability distribution over the amount of rain that a beachgoer might expect in a given hour.
These predictions and other model output are then provided to the human as a stimulus. The
human encodes the stimulus into multiple mental representations. The verbatim representation is a
detailed symbolic representation of the stimulus, such as a graphical representation of the
probability distribution over rainfall amounts per hour. In parallel, humans employ their
background knowledge to encode a meaningful gist interpretation from the stimulus. For example,
a simple categorical gist might be the distinction between “essentially no chance of rain” vs. “some
chance of rain”. Additionally, humans with appropriate expertise might be able to examine the
form of the model to determine how it arrived at its conclusion. For example, a meteorologist with
domain expertise might be able to examine the coefficients of a model’s time series equations and
recognize them as indicating an incoming cold front. A human would then make a decision (e.g.,
whether or not to go to the beach) based on a combination of these representations. For example, a
human without technical expertise might look at the stimulus and determine that the probability of
rain is essentially nil, leading them to go to the beach (since the beach with no rain is fun, and the
beach with some rain is not fun, and having fun is good). On the other hand, a human with
meteorology expertise and data science expertise might recognize the signs of an oncoming cold
front and realize that rain is a non-negligible possibility, leading them to choose another activity.
appropriate for technical practitioners, who can rely on extensive background knowledge
to enable debugging tasks.
Although they are not necessarily verbatim processes themselves, explanations are, in
this way, closer to verbatim mental representations than interpretations are. Whereas an
interpretation seeks to make sense of a stimulus presented to a human subject, an expla-
nation seeks to describe the process that generated an output.3 Thus, an explanation of an
algorithm’s output is justified relative to an implementation, or technical process, that was
used to generate a specific output. In contrast, an interpretation is justified relative to the
functional purpose of the algorithm.
Explanations of ML algorithms can provide implementation details for how the algo-
rithm carries out a known set of requirements. In contrast, interpretations justify these
implementations in terms of the system’s functional purpose. For example, the purpose of
a Support Vector Machine classifier is to map datapoints into discrete classes, a task that
must be justified in terms of the utility of the classification to a human decision-maker, as
when this classifier is used to assign job applicants’ resumes to merit-based categories during
an interview process. The quality of the classification would then be evaluated relative
to the requirements of this interview process – a classifier that is biased (e.g., that makes
classifications based upon categories that are not merit-based, such as age, race, ethnicity,
etc.) or that has a high error rate would be considered a poor classifier because it does not
meet its functional purpose. In contrast, the explanation for why a particular classification
decision was made is typically justified relative to its implementation. For example, when
asking how a particular job candidate was classified as “not eligible”, one must seek an
explanation in terms of the algorithm’s details, such as that the algorithm selected a set of
candidates’ profiles as “minimally acceptable” – i.e., they were support vectors – based on
training data, and that this particular candidate’s qualifications were, on the whole, inferior
to those reference candidates. An even more detailed explanation would entail examining
specific mathematical parameter values, such as the algorithm’s regularization weights, to
understand how specific attributes were combined and how support vectors were chosen.
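The implementation-level objects such an explanation refers to can be inspected directly. The sketch below uses scikit-learn and invented applicant features (both assumptions; the paper does not prescribe any implementation) to show where a linear SVM’s support vectors and learned weights live:

```python
import numpy as np
from sklearn.svm import SVC

# Invented stand-ins for applicant attributes (e.g., experience, skills score).
X = np.array([[1.0, 2.0], [2.0, 1.5], [6.0, 7.0], [7.5, 6.5]])
y = np.array([0, 0, 1, 1])  # 0 = "not eligible", 1 = "eligible"

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# The "minimally acceptable" reference profiles are the support vectors.
reference_profiles = clf.support_vectors_
# The learned weights describe how specific attributes were combined; the
# regularization parameter C above plays the role of the "regularization
# weights" mentioned in the text.
weights, bias = clf.coef_, clf.intercept_
```

An explanation justified relative to this implementation would cite `reference_profiles`, `weights`, and `C`; an interpretation would instead ask whether classifications built on these attributes serve the merit-based purpose of the interview process.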
In this paper, we make the case that explanations and interpretations are distinct mental
representations that are encoded simultaneously, and in parallel, in the minds of the sys-
tem’s users. Furthermore, users differ from one another in the degree to which they are
willing and able to utilize their own background knowledge to interpret detailed technical
information. In effect, interpretable systems should provide no more detail than necessary
to make a consequential decision, with the information provided justified relative to sys-
3 Consider a car with a “check engine” light that is illuminated. An explanation might indicate that the check
engine light turned on because the car’s internal programming detected fuel flow irregularities. However,
the interpretation for the driver is that the car needs to be taken to a mechanic for further evaluation.
The above definitions suggest that the efficacy of interpretations and explanations may
differ between individuals, and indeed, we will review literature showing that they do so
in systematic ways. That is, the audiences for these different types of outputs are likely to
differ, such that developers who lack domain knowledge would be able to use a detailed
mechanistic explanation to ensure that their design meets a specific functional requirement
(e.g., a certain accuracy target), but may not understand the implications of this requirement
for human users. In contrast, users who lack machine learning expertise but possess domain
knowledge would likely find these detailed mechanistic explanations confusing, instead
preferring a simple description of model output in terms of constructs with which they
are familiar. Finally, a developer with domain knowledge can often bring this combined
expertise to bear to make sense of a detailed mechanistic explanation in terms of its ultimate
use case, thus ensuring that the algorithm moves beyond the rote requirements to best
address the user’s needs.
Disentangling explainability – whether one can provide a mechanistic description of
how a system made a specific prediction – from interpretability – whether a human can
derive meaning from a system’s output for a specific use case – may form the basis for ro-
bust and reliable standards for explainable and interpretable ML system design, and should
allow the development of standards that isolate technical design features from specific sys-
tem functional requirements. This, in turn, should allow developers to segment the design
process, such that system requirements may be defined at the appropriate level of abstrac-
tion. Additionally, we expect that better definitions of these terms will allow the ultimate
development of metrics to assure compliance with these standards, thus enabling the cre-
ation of coherent regulatory policies for artificial intelligence that promote innovation while
building public trust.
denied because of improper data cleaning, recording, transportation, etc., and the difficulties that
some rental applicants face in correcting these errors and, consequently, their personal records:
https://themarkup.org/series/locked-out
For example, consider an algorithm that recommends rejection of a rental applicant. The
algorithm would make this determination based on a family of mathematical models fit
to training data, followed by evaluation of a model output generated from an additional
held-out datapoint that represents the applicant’s case. An interpretation of the algorithm’s
recommendation would contextualize the datapoint representing the applicant. A human
would use their background knowledge to generate this context. For example, a human as-
sessor might conclude that the applicant was a risk, based on the absence of the applicant’s
rental history. In contrast, a machine learning model would use a combination of training
data and the model selected by the machine learning algorithm (including any associated
sources of bias). Here, the algorithm might associate length of rental history with success,
and therefore categorize the applicant with a short history as a financial risk.
As will be discussed below, human interpretations differ from algorithmic ones in that the
former are flexible whereas the latter tend to be brittle. Importantly, both interpretations
are justified relative to a higher-level construct – a “rental history” – that contextualizes
the decision relative to domain knowledge. Furthermore, this output provides the user with
actionable insight. The solution to the problem is not to change the algorithm’s implemen-
tation, but rather for the applicant to establish a rental history. In order to understand the
meaning of this output, the applicant does not need to have any experience in AI or ML;
rather, they must possess sufficient domain expertise to understand why rental histories
are important indicators of approvals (we will discuss how interpretability may vary with
domain expertise in section 2.3).
In contrast, an explanation of the same algorithm’s output would start with the obser-
vation that the applicant was rejected and then seek to answer the question of how that
decision came to be. For example, the explanation might specify that the algorithm was
trained using a logistic regression classifier with specific coefficient values. Given the ap-
plicant’s datapoint, one could then plug the values into the logistic regression equation,
generate the model’s probability of success for the applicant, and then observe that it is
below the decision threshold. This explanation would not necessarily highlight the specific
role of rental history, but a human analyst with access to this equation, and the expertise to
interpret it, might observe that the largest marginal contribution to the algorithm’s decision
is the rental history.6 Similarly, a human, asked to explain a rejection, might provide a
causal explanation (“Your application was rejected because you don’t have a rental history.
People without rental histories are higher risk because they don’t have any experience with
paying rent on time, and because we don’t have any evidence that they are responsible. As
a rule, we prefer to rent to people with a reliable record of payments”). However, as we
will discuss below, humans, and especially subject-matter experts, routinely violate such
causal rules when making judgments. Arguably, this is because they are able to recognize
necessary exceptions through the application of educated intuition (however, these same
processes may also be a source of systematic bias if the underlying intuition is uneducated
or otherwise inapplicable).
6 Importantly, many modern machine learning algorithms, and especially deep neural networks, are not as easily analyzed by humans.
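The plug-in explanation described above can be sketched directly. The coefficient values, feature names, and decision threshold below are all hypothetical, chosen only to illustrate how an analyst would recover the model’s probability and the marginal role of rental history:

```python
import math

# Hypothetical logistic regression coefficients (not from any real system):
# [years_of_rental_history, income_to_rent_ratio, normalized_credit_score]
coefs = [0.9, 0.4, 0.3]
intercept = -2.0
threshold = 0.5

def probability_of_success(features):
    """Plug feature values into the logistic regression equation."""
    logit = intercept + sum(c * x for c, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-logit))

applicant = [0.0, 1.2, 0.8]  # no rental history
p = probability_of_success(applicant)
decision = "approve" if p >= threshold else "reject"  # p ≈ 0.22, so: reject

# The analyst's observation: holding everything else fixed, adding a
# three-year rental history flips the decision.
counterfactual = [3.0, 1.2, 0.8]
p_cf = probability_of_success(counterfactual)  # p ≈ 0.81
```

Note that this is exactly the verbatim, implementation-level account the text describes: it identifies which term moved the logit, but it does not by itself supply the interpretation that the applicant should establish a rental history.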
that this model is implemented using a logistic regression classifier with two classes cor-
responding to recommendations for, and against, antibiotic prescription. Finally, given the
data reported to the system, suppose that the model has determined that the probability
that a patient has a bacterial illness is 5%-10% [Cen]. The system would then provide the
prescribing physician with a stimulus – the recommendation not to prescribe.
An explanation of this recommendation would reference the model’s implementation.
For example, the system might list the coefficients of the logistic regression model and
the values of all of the model’s variables (whether the patient has a sore throat, pain when
swallowing, fever, red and swollen tonsils with white patches or streaks of pus, tiny red
spots on the roof of the mouth, swollen lymph nodes in the front of the neck, cough, runny
nose, hoarseness, or conjunctivitis [Wor]). Given these coefficients, the system might fur-
ther explain that, when one multiplies the coefficients by the variable values, and then sums
the result, the combined probability that the illness is bacterial is 5%-10%, indicating “No
further testing nor antibiotics” [Wor]. This is where the explanation would stop.8
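Under the same hedged assumptions (the symptom list comes from the text, but the coefficient values, intercept, and patient data below are invented for illustration), the arithmetic this explanation walks through looks like the following:

```python
import math

# Invented coefficients over a subset of the symptoms listed in the text;
# negative weights on cough and runny nose reflect signs that point away
# from a bacterial cause.
coefs = {"sore_throat": 0.6, "fever": 0.5, "swollen_tonsils": 0.8,
         "cough": -1.2, "runny_nose": -1.0}
intercept = -1.8

patient = {"sore_throat": 1, "fever": 1, "swollen_tonsils": 0,
           "cough": 1, "runny_nose": 1}

# Multiply the coefficients by the variable values, sum the result, and pass
# it through the logistic function -- the step at which the explanation stops.
logit = intercept + sum(coefs[k] * patient[k] for k in coefs)
p_bacterial = 1.0 / (1.0 + math.exp(-logit))  # ≈ 0.05, i.e., in the 5%-10% band

recommendation = ("No further testing nor antibiotics"
                  if p_bacterial < 0.10 else "Consider testing")
```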
In contrast, an interpretation of the system’s recommendation would reference simple,
categorical representations of the relative risk and then link these to values. For example,
the following values could apply: when the patient is sick, getting better is good, whereas
staying sick is bad. Additionally, uncomfortable side effects are bad (they will make the
patient feel worse) and no side effects are good. Finally, unnecessary prescription promotes
antibiotic resistance, potentially harming others (bad), whereas not prescribing has no ef-
fect on others. Given these values, the system would state: 1) the likelihood antibiotics
would help is virtually nil; 2) antibiotics, if prescribed, could lead to uncomfortable side
effects; 3) using antibiotics when they are not necessary could harm others [23, 24, 72],
indicating “No further testing nor antibiotics”.
Despite these recommendations, there are several reasons why an expert physician
might prescribe antibiotics under these circumstances. For example, the expert might rec-
ognize that a patient is especially susceptible to bacterial infection, or might simply make
the gist-based strategic choice that “it’s better to be safe than sorry” [23].
7 For example, a recent report [132] indicates how an attempt to adjust a statistical model to account for racial
disparities in training data may have led to under-treatment for African-American patients needing kidney
transplants.
8 In practice an expert physician would probably not rely on rote output from this system, instead perhaps
acknowledging that patient followup is needed if the symptoms don’t clear up, since there is a possibility
that the illness could be bacterial. Depending on the specific circumstances and context, this possibility
could even lead the physician to prescribe “just in case” [24].
intensive algorithms – sometimes trained on terabytes of data – that are being deployed to
solve real-world problems.
This development is not unique to AI; rather it is the consequence of an increasingly
complex infusion of technology into all aspects of society. Although our focus here is
restricted to computational, and especially machine learning, technologies, these develop-
ments are part of a larger trend that extends throughout all areas of technology.9 Pervasive
embedded computation has accelerated this trend. It is rare to find a piece of technology
that doesn’t have some kind of computational component – from learning thermostats to
credit score adjudications to visa applications. These technologies also require several dif-
ferent types of expertise to regulate appropriately. First, technical expertise is required to
understand how these technologies operate and, because the technologies are so complex,
this expertise is restricted to a relatively small number of people, whereas the number of
people whose lives are directly impacted has grown significantly. However, many types of
expertise are relevant. For example, evaluating the legal consequences of AI/ML technolo-
gies requires in-depth familiarity with relevant areas of law. Similar concerns may apply to
financial credit assessment, job application review, issues of political and social equity, and
other ethical concerns. Thus, it is not sufficient to query the experts in a specific area. Ef-
fectively assessing, and thus regulating, interpretable and explainable AI systems requires
pooling expertise from across fields that have not traditionally interacted.
We can expect the pace of this trend towards increasingly complex systems to acceler-
ate. This evolution in major technological trends has been documented by the Engineer-
ing Systems movement. [39] Started in the early 2000s, this movement recognized that
technology and modern society are highly intertwined and that the pace of technological
and social change requires that the design of complex systems must adapt to account for
what these scholars have called the “ilities”.10 Explainability and interpretability are both
“ilities” and exhibit similar difficulties associated with their measurement. “Ilities” have
historically been subject to problems of both polysemy, meaning that the same terms are
frequently used to describe distinct concepts, and synonymy, meaning that different terms
9 Consider automotive technologies: A mechanically-inclined person in the 1950s could often diagnose and
fix problems with their own automobiles. However, as these automotive systems increased in complexity,
professional expertise became increasingly necessary. By the early 2000s, with the rise of embedded com-
puting, specialized tools became necessary to even diagnose, let alone fix, problems. For example, a car’s
internal computer may trigger a check-engine light, but provide no additional information to the driver. Fur-
ther diagnosing this problem instead requires a specialized tool, which provides a diagnostic code that must
be interpreted by trained mechanics who specialize in that particular brand of automobile.
10 So called because several of them end with the suffix “ility”. Key “ilities” include flexibility – the need for
a system to adapt to changes in its environment – survivability – the need for a system to continue operation
despite disruption, etc.
sometimes refer to the same underlying construct [28]. Furthermore, these terms entail a
significant social component that cannot be disentangled from core values of users, design-
ers, and decision-makers. Finally, “ilities” have a strong policy component because they
cannot be studied in isolation from their effects on the public, and especially on vulnerable
populations. Thus, attempts to define explainability and interpretability in artificial intelli-
gence are comparable to challenges faced by scholars studying other complex engineered
systems [39], for which definitions of abstract, yet important, concepts such as flexibility,
resilience, etc., strongly depend on social evaluations. Two decades of research in this area
have discovered that these highly-abstract requirements may be difficult to measure in a
standard way because of their highly context-sensitive, socially-contingent nature. Never-
theless, their importance justifies the challenge of establishing standards that are flexible
enough to be responsive in different contexts.
The definitions of interpretability and explainability proposed in this paper build upon
decades of empirical research in experimental psychology (see [115, 118] for reviews).
We draw upon this extensive literature to posit a distinction between interpretation – the
process of deriving meaning from a stimulus (such as a model’s output) – and explanation
– the process of generating a detailed description of how an outcome was reached (see [55]
for preliminary data supporting this claim). We argue that the relationship between human
decision making and algorithmic decision making is analogous to different levels of mental
representation. Human individual differences also consistently predict which human sub-
jects prefer to rely on these different representations when making consequential decisions,
especially about how to use numerical information [27].11
11 For example, a recent preprint provides some preliminary data suggesting that one leading algorithm for
explainable AI [126] did not improve decision-makers’ objective accuracy – i.e., their ability to retrieve
the right (verbatim) number – whereas a more accurate AI system did. Similarly, a recent human subjects
experiment [107] shows that model transparency – i.e., a model with a small number of features – did not
improve humans’ prediction accuracy and may have actually impaired their ability to correct inaccurate
predictions due to information overload (see also [55]).
Fig. 2. Multiple Levels of Mental Representation Encoded in Parallel.
Humans tend to make decisions based on the simplest of these representations – the
gist interpretations of stimuli. Humans may encode multiple gists at varying levels of pre-
cision, forming a hierarchy of gists [27]. In contrast, algorithms follow only rote verbatim
processes when making predictions. Humans also encode verbatim mental representations,
which are simply detailed representations of the stimulus itself (e.g., raw system output).
These different levels of mental representation of a stimulus are encoded simultaneously
and in parallel [115]. Furthermore, these representations may compete with, or build upon
one another [22] when providing input into human decision-making.
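As a toy illustration of these parallel levels (the numeric cutoffs are invented; the hierarchy is a psychological claim, not an algorithm), a single rain-probability stimulus can be encoded at several levels of precision at once:

```python
def encode_representations(p_rain):
    """Encode one stimulus at multiple levels of precision, in parallel."""
    verbatim = p_rain  # the detailed representation: the raw number itself
    # An ordinal gist ranks the risk without retaining the exact value.
    ordinal = "low" if p_rain < 0.3 else ("moderate" if p_rain < 0.7 else "high")
    # The simplest categorical gist, echoing the Fig. 1 example.
    categorical = ("essentially no chance of rain" if p_rain < 0.05
                   else "some chance of rain")
    return {"verbatim": verbatim, "ordinal": ordinal, "categorical": categorical}
```

On this view, a decision such as whether to go to the beach is typically driven by the categorical entry, with the more precise entries held in reserve.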
indicator of a problem in the algorithm’s implementation. For example, one might achieve
perfect precision if only a very small number of cases are being correctly classified. The
corresponding gist would be “too good to be true”. As described above, the kNN classifier
would be ruled out because it has a gist of “no accuracy” which, regardless of precision, is
problematic. Thus, the human expert might rely on an ordinal gist to choose the model
with “better” precision – the Logistic Regression model – because contextual cues indicate
that the other two models are inferior for the functional purpose – i.e., the goal – of the
machine learning task. Research based on Fuzzy-Trace Theory has shown that models
that emphasize the gist – such as by displaying output in ways that more readily allow
users to draw meaningful conclusions – inspire greater trust, confidence, and understanding
[37, 146]. This implies a clear design goal for ML system designers who are concerned
with interpretability – system output must communicate the gist.
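The “too good to be true” gist in the preceding example can be made concrete with a few lines of arithmetic; the labels and predictions below are invented solely to show how perfect precision can coexist with negligible coverage:

```python
# Invented evaluation data: 50 malicious cases, 50 benign.
y_true = [1] * 50 + [0] * 50
# A hypothetical model that flags only two cases -- both correctly.
y_pred = [1, 1] + [0] * 98

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)  # 1.0: every flagged case is correct
recall = tp / (tp + fn)     # 0.04: but almost nothing is flagged
```

A reader scanning only the verbatim precision value would see a perfect score; the gist-level reading is that a detector flagging two of fifty malicious cases does not meet its functional purpose.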
and diversity of stimuli can help subjects derive general principles that go beyond the surface forms of the
causal structures that one might infer. These general principles draw upon extensive prior knowledge that
make certain explanations more probable than others. This implies that attempts to define “explainability”
(here, understood as encompassing both explanation and interpretation) entirely based on causality (or
conversely, on counterfactual reasoning) may be missing an essential element – generalizability/gist.
14 Notably, these authors identify several structures beyond causal chains, including abstractions, “swarm”,
etc., and relate these to different types of explanations, concluding that “The property of ’being an ex-
planation’ is not a property of statements: It is an interaction of statements with knowledge, context, and
intention.”
Fig. 4. Representational Hierarchy for a Support Vector Machine.
users seeking to understand or explain model output would be assisted by coherent causal
explanations that would explain how a system came to a certain conclusion.15
This interpretation of the literature is supported by an extensive body of work on mental
models, which studies how technical experts represent, and make decisions about, com-
plex systems. Although a full review of this literature is beyond the scope of this paper,
15 These structures, in turn, can facilitate extraction of the gist of the story (see [25] for a review); however, the causal structures are not, themselves, the gist.
matters of skill. For example, a professional computer scientist with years of training is
endowed with a skillset that is quite different from that of a professional legal scholar.
Individuals with relevant skills may prefer to rely on more precise levels of mental repre-
sentation if they have the ability to process them. For example “numeracy” – mathematical
ability [44, 84, 104] – allows individuals to make sense of complex numeric data such as
percentages and fractions, such that they are less susceptible to statistical bias when making
decisions. Similarly, in a machine learning context, [67] found that users with computer
science backgrounds (and especially doctoral-level training) were more likely to agree that
the system was useful and trustworthy if they understood how the system worked (and vice
versa), and Linderholm et al. [85] found that more skilled readers, and those with more
relevant background knowledge, were better able to extract the gist from narratives with
poorly-defined causal structures. These interpretive processes are associated with domain
expertise [115] – a hallmark of gist processing [121].
Other differences are matters of personality traits. For example, some individuals pre-
fer to rely on their “gut feelings” – i.e., their intuitive judgments – when making a decision,
whereas others prefer to engage in extensive deliberation. The Cognitive Reflection Test
(CRT; [46]) is a measure of this trait (although it is also correlated with numeracy and in-
telligence [84, 104]), and researchers have found that individuals with high CRT are less
susceptible to decision biases that oppose intuitive to deliberative modes of thought (such
as the well-known “framing effect”; [141]). Similarly, the Need for Cognition (NFC) Scale
[31, 32] measures subjects’ preferences to exert mental effort. For example, [57] describes
evidence for a model of narrative comprehension in which multiple levels of mental rep-
resentation are encoded, with some readers preferring to use coherence-building strategies
relying on effortful “close-to-the-text” reading and those who utilize a more interpretive
strategy that is "farther" from the text. In the domain of decision-making under risk, researchers have found that individuals possessing high NFC scores are more likely to answer several risky-choice framing problems consistently [38, 82], presumably because they exert effort to notice similarities or contradictions between different problems with similar
structure. This account of these findings is supported by evidence that within-subjects
comparisons between stimuli can lead subjects to censor gist-based responses when con-
tradictions are detected, thus encouraging subjects to focus on more detailed features [27].
Similarly, research has shown that some human subjects have difficulty making determinations about whether models are "fair" or "just" – both categorical gists – in the abstract (i.e., absent important context), and instead compare those explanations to prior experience or to a second system, enabling ordinal comparisons ("more fair/just" vs. "less fair/just") [12]. For this reason, Mittelstadt and colleagues [97] argue that explanations should be contrastive to facilitate interpretability. However, these authors also take pains to emphasize that such
contrastive explanations frequently miss important context – i.e., they spur reliance on de-
contextualized verbatim representations.
The above discussion implies that there is no single measure for interpretability or ex-
plainability that applies to all humans; however, there may be a measure that can be de-
fined relative to the expected distribution of skills and personality traits for each intended
audience. Future work should therefore focus on characterizing these factors within user
communities.
Fuzzy-Trace Theory explains these contradictory findings with the core theoretical distinction between gist and verbatim mental representations, which are encoded separately, yet in parallel (gist representations are not derived from verbatim representations). Although
humans prefer to rely on gists, they also encode, and can therefore recognize, verbatim
representations. In contrast, algorithms are verbatim by their very nature. Thus, humans
working together with ML algorithms may get the best of both worlds – applying gist-based
structured background knowledge to interpret association-based algorithmic output.
mental representation [15] meaning that recognition does not guarantee that a decision will
rely on expert intuition. Whereas recognition is a rote verbatim strategy (theorized by as-
sociationism), gist representations bring background knowledge to bear, contextualizing
scenarios such that they make sense and therefore providing insight to the human decision-
maker. In fact, an extensive body of literature shows that humans can recognize both gist
and verbatim representations in parallel, and yet prefer to rely on the gist when making
decisions [18, 19, 115, 119].
Thus, an extensive body of literature supports the contention that Fuzzy-Trace Theory
is both more parsimonious and more predictive than competing theoretical accounts of the
role of interpretation in judgments and decisions. These findings apply to both texts, such
as are found in the legal reasoning domain, and numerical stimuli such as are found in the
engineering domain [90] or generated by machine learning models.
The above discussion emphasizes that interpretability and explainability are functions of
the user, the use case, and other contextual factors, as much as they are functions of the
system being used. However, the psychometric properties of users are generally not under
designers’ control. Here, we discuss the state-of-the-art for explainable AI algorithms, and
how systems might be designed to promote interpretability and explainability.
though explainability and interpretability are sometimes used interchangeably in the com-
puter science literature, this review provides data supporting the contention that “in [the]
ML community the term ‘interpretable’ is more used than ‘explainable”’ (p. 52141; see
also [42]), especially when compared to the usage of these terms by the general public.
Consistent with the psychological definitions outlined above, this finding may indicate that
producers of AI products are more able to interpret the output of these systems because
they possess specialized background knowledge. Indeed, Bhatt et al. [11] posit that this distinction may reflect a difference in the design goals of these user groups: algorithm developers generally seek explanations so that they may debug or otherwise improve their
algorithms, and they might therefore develop explainable AI tools for that purpose. Thus,
an explanation is generally understood by computer scientists to indicate how a computa-
tional system arrived at, or generated, a certain output. A good explanation is often causal
and justified relative to a system’s implementation – e.g., “the algorithm is biased towards
visa application refusal BECAUSE the training data are unbalanced”. This sort of expla-
nation is quite useful for debugging these complex systems, but only if the user has the
appropriate background knowledge and technical expertise to do so.16 For example, the
explanation given above would lead a developer to gather more balanced data and retrain
the algorithm, but would not suggest an immediate action to an end user, except perhaps to
abandon use of the algorithm.
16 Consider our prior analogy of an automotive system. Just as a diagnostic code on a modern automobile can
be used to determine whether a car’s check engine light is on because of an electrical problem in the sensor,
or because of a legitimate engine malfunction, a tradition within computer science identifies an explanation
with an outcome that enables other trained technical experts to debug, assess, or otherwise improve a faulty
system. Nevertheless, interpreting a car’s diagnostic code, and knowing what to do next, usually requires
some specialized expertise (either mechanical, computational, or both – or sometimes, just deep familiarity
with the system).
Fig. 5. An example of output from LIME which emphasizes text features that led a specific
paragraph to be classified as about atheism, rather than Christianity. The original image may be
found at this URL: https://github.com/marcotcr/lime/blob/master/doc/images/twoclass.png
Fig. 6. An example of output from LIME which emphasizes input features that are diagnostic of a
specific image classification. In this case, a picture of a husky was incorrectly classified as a wolf
because of the presence of snow in the background. Such information may help tool developers
debug overfit classifiers. This image was originally presented in [125].
classifier in Figure 6 would make accurate predictions on images that did not have snow in
their backgrounds.
The authors of LIME argue that these simplified models (e.g., regression models with
a small number of coefficients) are inherently more interpretable because they “provide
qualitative understanding between the input variables and the response.” While this goal
is broadly consistent with Fuzzy-Trace Theory’s definition of gist, gists, when educated,
capture a human expert’s insight regarding which features are most likely to generalize.
Techniques such as LIME may aid humans in generating these representations, and indeed,
preliminary experiments seem to suggest that human subjects could use these techniques
to remove features that interfered with predictive accuracy – i.e., they could make a better
classifier – and that a small sample of human subjects with data science expertise
(and, in particular, familiarity with the concept of spurious correlation) might be able to
use LIME to derive better explanations.
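The core procedure behind LIME can be sketched in a few lines. The sketch below is a minimal illustration of the idea – perturb, weight by proximity, fit a local linear surrogate – not the library's implementation; the black-box function, neighborhood size, and kernel width are all hypothetical choices.

```python
import numpy as np

# Hypothetical "black box": a nonlinear function of two features.
def black_box(X):
    return 1.0 / (1.0 + np.exp(-(X[:, 0] ** 2 - X[:, 1])))

rng = np.random.default_rng(0)
x0 = np.array([1.0, 0.5])  # the instance whose prediction we want to explain

# 1. Sample perturbed points in a local neighborhood of x0.
Z = x0 + rng.normal(scale=0.1, size=(500, 2))
y = black_box(Z)

# 2. Weight each sample by its proximity to x0 (exponential kernel).
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.05)

# 3. Fit a weighted linear surrogate; its coefficients are the local explanation.
A = np.hstack([np.ones((len(Z), 1)), Z])
coef = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * y))
print(coef[1], coef[2])  # local effect of each feature on the prediction
```

As the text notes, these coefficients describe the model only in the immediate neighborhood of `x0`; nothing in the procedure warns the user when that neighborhood is left.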
SHapley Additive exPlanations (SHAP). Like LIME, the SHAP [87] family of models starts from the premise that "The best explanation of a simple model is the model itself" and therefore attempts to represent complex models with simpler ones. SHAP returns importance scores for each feature, which are analogous to regression coefficients.
For a given prediction, SHAP scores indicate how much any of these features contributed
to that prediction, as in Figure 7.
As such, SHAP models possess many of the same strengths and weaknesses as LIME,
Fig. 7. An example of output from SHAP, which indicates the model's baseline value, the marginal contributions of each of its features, and the final prediction. Such an approach
is analogous to a graphical interpretation of linear regression coefficients. The original image may
be found at https://github.com/slundberg/shap
albeit with the ability to generalize to a larger class of machine-learning models. These
models are verbatim in the most concrete sense – they output a set of rules (feature impor-
tance scores), which may be applied in a rote manner to generate a post-hoc description of
the desired prediction. However, they do not communicate causal mechanisms, and they
are prone to unknown error as the model is applied outside of the local neighborhood of a
specific prediction. Individual human subjects – such as informed practitioners – that have
the willingness and the ability to examine these findings in depth may be able to leverage
their own background knowledge to generate an explanation, but SHAP does not provide
enough information to help these practitioners figure out when the model no longer applies.
In effect, these techniques provide human users with a stimulus that they must then explain
or interpret whereas true “black box” models do not even provide this stimulus.
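The Shapley values that SHAP approximates can be computed exactly for a tiny model by brute-force enumeration over coalitions. The three-feature model below is invented for illustration; real implementations approximate this sum because it is exponential in the number of features.

```python
import itertools
import math

FEATURES = ["a", "b", "c"]

# Hypothetical model over binary features: main effects for a and b,
# plus an interaction between a and c.
def model(present):
    return (2.0 * ("a" in present) + 1.0 * ("b" in present)
            + 1.0 * ("a" in present and "c" in present))

def shapley(feature):
    """Weighted average of `feature`'s marginal contribution over all coalitions."""
    n = len(FEATURES)
    others = [f for f in FEATURES if f != feature]
    value = 0.0
    for r in range(n):
        for coalition in itertools.combinations(others, r):
            s = set(coalition)
            weight = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
            value += weight * (model(s | {feature}) - model(s))
    return value

scores = {f: shapley(f) for f in FEATURES}
print(scores)  # credit for the a-c interaction is split evenly between a and c
```

Note what the output does and does not provide: the scores sum exactly to the model's prediction (a verbatim, rule-like decomposition), but they say nothing about causal mechanism, which is precisely the limitation discussed above.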
Explainable Neural Networks. Whereas SHAP and LIME seek to explain complex
models using a regression-like paradigm (i.e., a linear additive function), Explainable Neu-
ral Networks (XNNs) [144] use a more general formulation based on an “additive index
model” [127]. Here, the algorithm seeks to return a function that describes how model pre-
dictions vary with changes to individual parameters (or, more recently, pairs of parameters
[148]). As in LIME and SHAP, these models can help data scientists with the appropriate training to understand how changing a specific feature might change the model's prediction, albeit at the risk of inferring spurious correlations. These approaches have especially been applied to deep neural network models, in which one neural network is used to provide a simplified representation of another; the result is then rendered into a table analogous to an Analysis of Variance, showing main effects and, in some cases, two-way interactions [35].
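The "function describing how predictions vary with individual parameters" can be illustrated with a hand-rolled partial-dependence computation, which is analogous in spirit to these per-feature summaries (though far simpler than the additive index model itself). The model and data here are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(500, 3))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=500)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

def partial_dependence(model, X, feature, grid):
    """Average prediction as one feature is swept, marginalizing over the rest."""
    out = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        out.append(model.predict(Xv).mean())
    return np.array(out)

grid = np.linspace(-1, 1, 21)
pd0 = partial_dependence(model, X, 0, grid)  # recovers the U-shape of x0**2
pd1 = partial_dependence(model, X, 1, grid)  # recovers the linear trend in x1
```

The resulting curves are correlational summaries of the fitted model, not causal claims about the data-generating process – the same caveat raised for LIME and SHAP.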
However, LIME is not without limitations: the explanations that analysts might draw
from applying these tools may, themselves, be based on spurious correlations or may engen-
der false confidence in model predictions outside the scope of the immediate neighborhood
of the datapoint that LIME is attempting to explain.17 Worse, these misleading explanations may be engineered by adversaries seeking to take advantage of humans' tendency to
17 Precisely this sort of inappropriate model generalization led NASA engineers to conclude that an impact
from foam insulation upon the space shuttle Columbia’s wing from its external tank did not pose a threat,
precipitating the loss of the space shuttle upon re-entry into Earth's atmosphere [16].
Fig. 8. An example of output from Grad-CAM, indicating which pixels in an image are diagnostic
of the predicted class (dog or cat). The original image may be found at [129]
Table 2. Example of SBRL output, which seeks to explain whether a customer will leave the
service provider. PP = Probability that the label is positive. Source: [147]
primarily correlational in nature and may help domain experts to select features – for exam-
ple, the authors note that pneumonia readmission risk decreases, rather than increases, with
asthma – a counterintuitive finding. This model surfaces that finding. However, domain
experts must then explain that finding post-hoc, as follows:
[P]atients with a history of asthma who presented with pneumonia usually were
admitted not only to the hospital but directly to the ICU (Intensive Care Unit).
The good news is that the aggressive care received by asthmatic pneumonia pa-
tients was so effective that it lowered their risk of dying from pneumonia com-
pared to the general population. The bad news is that because the prognosis
for these patients is better than average, models trained on the data incorrectly
learn that asthma lowers risk, when in fact asthmatics have much higher risk
(if not hospitalized) [34].
The discussion above indicates that these concerns apply to explainability – where the
goal is to assist a data scientist to understand how a model works – but may apply less to
interpretability, where the aim is fundamentally to assist a decision maker to link a model
output to a meaningful distinction that allows them to leverage their values, goals, and
preferences to make a choice. Specifically, the explanation given above might help a user
to debug the model, or even to decide whether or not to trust the model; however, it may
not explicitly provide the user with meaningful information that can inform their ultimate
treatment decision.
Monotonically Constrained Gradient-Boosting Machines. Gradient-Boosting Machines seek to use an ensemble of "weak learners" – i.e., models with low predictive
accuracy – to jointly make accurate predictions. This approach leads to significant im-
provements in predictive capability, at the cost of model complexity. In order to cope with
this complexity, Monotonically Constrained Gradient-Boosting Machines impose a con-
straint that any given feature in the model must have a monotonic relationship with the
output. This is theorized to increase explainability because these monotonic relationships
restrict the relationship between features and predictions to have clear qualitative directions
– an increase in the feature must consistently lead to either an increase or decrease in the
Fig. 9. An example of output from a GA2M which indicates how several features (horizontal axes)
vary with relative risk of readmission for pneumonia at 30 days (vertical axis). Pairwise
interactions are shown in the heatmaps at the bottom of the figure. The original image is in [34].
prediction. As above, these models assume that simpler functional forms are inherently
more explainable. However, these models, in their current form, may simply apply a form
of regularization that is not necessarily grounded in domain knowledge. Monotonicity may
be appropriate in some cases – such as a dose-response curve – but not in others – such as
in modeling waves or other sinusoidal behavior. Domain knowledge is required to deter-
mine whether monotonicity constraints, or any other constraints, are appropriate. Absent
that domain knowledge, application of such constraints may indeed simplify the model, but
may do so in a misleading way that can foster inference of incorrect explanations.
relationships may mislead users into imputing a causal mechanism within the model where
none exists. This problem is not limited to computational systems, but is a general feature of complex engineered systems with multiple interacting parts [29]. Thus, future work
in explainable artificial intelligence may productively focus on how to help data scientists and domain experts to accurately impute causal claims while avoiding inferences based on spurious correlations.
Fig. 10. A visualization of Latent Dirichlet Allocation output from [13]. Probabilistic topic models
such as LDA map each word in a text corpus to a topic. The most frequent words in that topic are
then presented to humans for interpretation.
that are supposed to contain semantic content held in common across multiple documents.
In practice, these topics are actually a probability distribution over words in the text corpus
upon which the LDA model is trained. Humans use topic models by inspecting the top
words, or top documents, for any given topic and then assigning meaning to those top-
ics [59], with some even going so far as to claim that topic models explicitly measure the
gist of text [60]. However, more recent work has shown that humans have difficulty inter-
preting some topic model output [36], especially when they are not familiar with how the
algorithm works [83]. Although computer scientists have developed measures to improve
the coherence (hypothesized to increase interpretability) of topic model output [81, 96],
the resulting output does not explicitly provide an interpretation to human users, but re-
mains a list of words with associated topic probabilities, which humans must interpret (see
Figure 10). Nevertheless, topic models are perhaps unique among ML algorithms in that
their users have attempted to explicitly engineer interpretability into their structure and
output using tasks that are evaluated by non-expert humans with no knowledge of how the
algorithm works. Future work should focus on evaluating this approach and potentially
applying it to other algorithmic paradigms.
How might one evaluate the explainability and interpretability of AI systems in a manner
that is psychologically plausible? How might we design systems that satisfy these psycho-
proaches to evaluating the quality of mathematical models, and “rational” behavior more
broadly.
tained. This is analogous to the machine learning paradigm, which emphasizes prediction
over explanation [131, 149]. Standard machine learning techniques aim to optimize spe-
cific predictive metrics, such as accuracy, precision, recall, F-score, etc. Furthermore, any
number of algorithms may be employed regardless of whether the underlying theory of the
algorithm is a good description of the process generating the data. This approach is consis-
tent with Hammond’s definition of correspondence since it privileges predictive accuracy
over a specific causal theory. Deep neural nets, in particular, have been criticized – but also lionized – because they often achieve significant predictive performance at the cost of explainability. Thus, like ML, the weaknesses of the correspondence approach are fundamentally connected to low explainability: a method may achieve the right answers for the wrong reasons – i.e., due to spurious correlation – so there is no confidence that
future model results will be correct. As Hammond [62] states,
Scientific research seeks both coherence and correspondence, but gets both
only in advanced, successful work. Most scientific disciplines are forced to
tolerate contradictory facts and competitive theories...But policymakers find
it much harder than researchers to live with this tension because they are ex-
pected to act on the basis of information. (p. 54).
a black box model) is also not required. This middle road is achieved by experts’ com-
municating the gists of their decision-making processes, rather than trying to explain all of
the details of their structured mental models. Specifically, experts can communicate how
what they are doing is consistent with the values of users in easy-to-understand categori-
cal terms while not necessarily possessing the ability to describe the exact mechanisms in
every detail. We propose grey box model design as a goal for interpretable AI.
Human decision-making exhibits varying degrees of coherence and correspondence.
Specifically, experts’ gist representations tend to correspond to better real-world outcomes,
exhibiting correspondence; however, experts may violate coherence [7] – i.e., they can
provide an explanation for an action in a particular context, but that explanation may not
necessarily generalize to all contexts, roughly analogous to a linear estimate of a nonlinear
model. Unlike these linear estimates which provide explanatory power over a narrow pa-
rameter range, experts can be queried for their rationales. The above discussion emphasizes
how, far from being a unique feature of machine learning models, high correspondence with
low explainability may also be a feature of some types of human expertise. In fact, there is
a significant body of literature in engineering management that discusses the “tacit” nature
of human expertise [100–102, 105, 106]. In other words, like the most complex models,
human experts may not be consciously aware of how they have obtained a certain outcome.
Nevertheless, they are often able to describe why they did what they did – for example, expert tennis players were more likely to justify their actions relative to the goals of the game
whereas novices focused more conscious attention on the mechanics of executing specific
maneuvers [92]. Thus, experts’ decisions show a high degree of empirical correspondence
despite being subject to “biases” under predictable circumstances [112]. Furthermore, sub-
ject matter experts tend to rely on, consume, and prefer to use model output that interprets,
rather than explains, relevant results.
This discussion suggests that designers may be able to enhance interpretability by cre-
ating “grey-box models”, that can provide the rationale for a given decision relative to a
set of functional requirements. In section 4.2, we argue that this goal is aligned with systems engineering, which has long dealt with similar problems when attempting to integrate large-scale systems from across several different complex domains of expertise into a common artifact to be used by consumers, including policymakers, with varying levels of technical sophistication [21].
supporting this claim. This literature primarily focuses on drawing an analogy between the
concept of mental representation and levels of abstraction.
Computer science depends on successive levels of abstraction for its successful oper-
ation simply because the operations of computational systems are far too complex, at the
physical level, to explain to even the most expert computer scientists. Consider that a com-
Finally, recent findings [47] show that increasing psychological distance – including by providing more abstract representations – leads to better decisions because these more distant representations make use of gist interpretations. This has direct implications for design:
interpretable AI systems – i.e., those that help end users to make meaningful decisions –
may, in fact, be less explainable, and vice versa, at least for end users without data science
expertise.
[Figure 11: an Abstraction Hierarchy with five levels, connected by "Why?" links pointing upward and "How?" links pointing downward: Functional Purpose (loan recommendation); Abstract Function (credit score determination); General Function (deep neural net implementation); Physical Function (computation); Physical Form (workstation and data storage units).]
Fig. 11. An example of an Abstraction Hierarchy for an ML system designed to make loan recommendations. Each higher level is implemented by the level immediately below it, and each lower level implements a technical solution to carry out a function specified by the level above it.
edge base (usually, a programming language, but see also [20, 45, 91]). However, the term
abstraction is itself polysemous (for a review, see [30]) and we emphasize that the form of
abstraction that we refer to is not just simplification, but rather is augmented by the user’s
background knowledge [120].
that have a meaning in terms of the system’s ultimate function; not its implementation.
This form of design is familiar to machine learning researchers who frequently use several
different algorithms to obtain the same function. For example, the function may be classi-
fication, which has well-defined metrics – precision, recall, accuracy, etc. – with the best
metric selected based on the task. Given the function required, and the metric benchmark
that the system is supposed to achieve (i.e., the requirement), system designers may use
any number of algorithms to achieve this function. For classification, candidate algorithms might include logistic regression classifiers, Naive Bayes classifiers, support vector classifiers, k-nearest neighbor classifiers, convolutional neural nets, etc., all performing the same function using very different implementations.
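This solution neutrality is easy to demonstrate: the classifiers below are interchangeable behind scikit-learn's common fit/predict interface, and any of them could satisfy a requirement stated purely in terms of an accuracy benchmark. The synthetic dataset is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Three very different implementations of the same function: classification.
results = {}
for clf in (LogisticRegression(max_iter=1000), GaussianNB(), KNeighborsClassifier()):
    results[type(clf).__name__] = accuracy_score(yte, clf.fit(Xtr, ytr).predict(Xte))
print(results)
```

A requirement written only against the metric cannot distinguish among these implementations; choosing among them on other grounds (explainability, robustness, cost) is exactly the design activity the surrounding text describes.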
The concept of solution neutrality is equally familiar to regulators and legal scholars, who have significant experience with evaluating complex systems, such as human-drug interactions, in other highly technical domains [80, 140]. Thus, the key to
a good design is matching the system’s evaluation metric to the end-user’s goals. Stan-
dard ML metrics, such as those defined above, may be insufficient to address the system’s
functional requirements. Systems engineering, and especially requirements engineering, is
the discipline that focuses on evaluating end-user needs and translating these into a suite
of metrics. However, for sufficiently complex systems, groups of engineers must pool
their knowledge to achieve better design outcomes. Attempts to generate unified models
seamlessly integrating input from different engineering fields ("Model-Based Systems Engineering") have yielded mixed results [21]. In contrast, techniques that seek to translate meaningful information between engineering specialties are more widely accepted by practicing engineers. Although there is no guarantee that these techniques generate optimal outcomes, they are, at a minimum, acceptable [22, 69, 70]. Thus, other fields of
engineering have developed methods to ensure that expert input from multiple fields can
be integrated into a larger, complex project. Specifically, systems engineering is the field
of engineering that is concerned with coordinating these multiple experts from multiple
domains. Since, nowadays, no one person can be an expert in all fields of human inquiry,
systems engineers have developed a set of tools and techniques to increase the confidence
with which expertise is deployed when developing complex systems.
5. Conclusion
Our review indicates that explainability and interpretability should be distinct requirements for ML systems, with the former primarily of value to developers who seek to debug systems or otherwise improve their design, while the latter is more useful to regulators, policymakers, and general users. However, this dichotomy is not absolute – individual differences may be associated with reliance on one representation of a system more so than
another. For developers, the explanation may be very similar to the interpretation – similar
to how numerate individuals derive decisions that are informed by verbatim calculations
– however, if developers lack domain expertise, they may be unable to contextualize their
interpretations in terms that are meaningful to end users. Similarly, end users lacking de-
velopers’ expertise may be unable to understand how the system arrived at its conclusion
(but they also may not desire to).
The history of modern engineering demonstrates that systems can be designed to facilitate
both explainability and interpretability and, indeed, there are examples of such throughout
recent history, ranging from drug approvals to the automotive sector. Although significant
effort has been devoted to developing automated approaches to create explanations of AI
algorithms, comparatively little attention has focused on interpretability. Since interpreta-
tions differ between individuals, more research is needed in order to determine how best
to link model output to specific gists so that users can appropriately contextualize this out-
put. The extent to which this process can be fully automated, or would require curation by
domain experts, remains an open question.
The above discussion indicates that interpretable algorithms are those that contextual-
ize data by placing it in the context of structured background knowledge, representing it
in a simple manner that captures essential, insightful distinctions, and then justifying the
corresponding output relative to values elicited from human users. Such representations
contextualize the model’s output and provide meaning to the human user in terms of val-
ues stored in long-term memory. Typically, these values (or similar preference-generating
constructs, such as goals) cannot be elicited directly from data based on rote, brittle, ver-
batim association. Thus, techniques to simplify complex models are likely to share this
brittleness [98]. In contrast, gist representations are simple, yet flexible and insightful; they
bring to bear contextual elements – such as goals and values – that are not explicitly rep-
resented in the data. Future work may therefore productively focus on eliciting these gist
representations from experts in the form of mappings from structured background knowl-
edge to meaningful, yet simple, categories – associated with goals, values, principles, or
other preferences. Previous “expert systems” lacked the ability to scale precisely because
of the difficulty eliciting these essential distinctions [33]. To move beyond this impasse,
the discussion in this paper highlights the need to account for multiple levels of mental
representation when generating interpretable AI output. In short, rather than assimilating
human cognition to machine learning, we might benefit from designing machine learning
models to better reflect empirical insights about human cognition [116]. In between a ver-
batim, data-driven approach, and an inflexible top-down schematic approach lies one in
which human users engage in a process of contextualizing model output that is then used
to select between, and refine, existing background knowledge structures. Preliminary
evidence suggests that such "communicative" approaches, in which human users interact
with and curate the output of AI systems, show promise. Furthermore,
users’ needs vary with individual differences, e.g., in metacognition and training. Future
work should therefore focus on characterizing these factors within user communities. This
review provides the theoretical basis for such an approach, and provides explicit directions
for future work: such approaches must communicate the gist of the data to the user.
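One way to picture the expert-elicited gist mappings proposed above is as a small lookup from precise model output to a few meaningful, action-linked categories. The sketch below is illustrative only: the category bounds, labels, and associated actions are invented assumptions, standing in for distinctions that would in practice be elicited from domain experts.

```python
# Minimal sketch of an expert-curated gist mapping: structured background
# knowledge (here, expert-chosen category bounds) maps a verbatim probability
# onto a few simple categories tied to goals or actions. All values invented.

GIST_CATEGORIES = [
    # (exclusive upper bound, gist label, action reflecting user values)
    (0.10, "essentially nil", "no action needed"),
    (0.50, "some risk", "discuss options"),
    (1.01, "serious risk", "act now"),
]

def to_gist(probability):
    """Map a precise model output to an expert-curated gist and action."""
    for upper, label, action in GIST_CATEGORIES:
        if probability < upper:
            return label, action
    raise ValueError("probability out of range")

print(to_gist(0.03))  # -> ('essentially nil', 'no action needed')
print(to_gist(0.72))  # -> ('serious risk', 'act now')
```

The design choice worth noting is that the categories, not the numbers, carry the meaning: the same mapping can be reused, audited, or revised by experts independently of the underlying model.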
Acknowledgments
The author thanks Andrew Burt, Carina Hahn, Patrick Hall, Sharon Laskowski, P. Jonathon
Phillips, Mark Przybocki, Valerie F. Reyna, Reva Schwartz, Brian , and Paul Witherell for
their insightful comments and discussions.
This publication is available free of charge from: https://doi.org/10.6028/NIST.IR.8367
References
[Cen] Centor Score (Modified/McIsaac) for Strep Pharyngitis. MDCalc. https://
www.mdcalc.com/centor-score-modified-mcisaac-strep-pharyngitis. (Accessed on
10/07/2020).
[Wor] Worried Your Sore Throat May Be Strep? CDC. https://www.cdc.gov/groupastrep/
diseases-public/strep-throat.html. (Accessed on 10/07/2020).
[4] Abbott, R. (2006). Emergence explained: Abstractions: Getting epiphenomena to do
real work. Complexity, 12(1):13–26.
[5] Abbott, R. (2007). Putting complex systems to work. Complexity, 13(2):30–49.
[6] Adadi, A. and Berrada, M. (2018). Peeking Inside the Black-Box: A Survey on Ex-
plainable Artificial Intelligence (XAI). IEEE Access, 6:52138–52160.
[7] Adam, M. B. and Reyna, V. F. (2005). Coherence and correspondence criteria for
rationality: Experts’ estimation of risks of sexually transmitted infections. Journal of
Behavioral Decision Making, 18(3):169–186.
[8] Alba, J. W. and Hasher, L. (1983). Is memory schematic? Psychological Bulletin,
93(2):203.
[Ammermann] Ammermann, S. Adverse Action Notice Requirements Under the ECOA and
the FCRA. Consumer Compliance Outlook: Second Quarter 2013. Federal Reserve Bank of Philadelphia.
[10] Axelrod, R. (2015). Structure of decision: The cognitive maps of political elites.
Princeton university press.
[11] Bhatt, U., Xiang, A., Sharma, S., Weller, A., Taly, A., Jia, Y., Ghosh, J., Puri, R.,
Moura, J. M. F., and Eckersley, P. (2019). Explainable machine learning in deployment.
arXiv:1909.06342 [cs, stat]. arXiv: 1909.06342.
[12] Binns, R., Van Kleek, M., Veale, M., Lyngs, U., Zhao, J., and Shadbolt, N. (2018).
"It's reducing a human being to a percentage": Perceptions of justice in algorithmic
decisions. Proceedings of the 2018 CHI Conference on Human Factors in Computing
Systems - CHI '18, pages 1–14. arXiv: 1801.10408.
[13] Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM,
55(4):77–84.
[14] Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal
of Machine Learning Research, 3(Jan):993–1022.
[15] Bless, H., Betsch, T., and Franzen, A. (1998). Framing the framing effect: The impact
of context cues on solutions to the 'Asian disease' problem. European Journal of Social
Psychology, 28(2):287–291.
[35] Chen, J., Vaughan, J., Nair, V. N., and Sudjianto, A. (2020). Adaptive explainable
neural networks (axnns). arXiv:2004.02353 [cs, stat]. arXiv: 2004.02353.
[36] Cheng, H.-F., Wang, R., Zhang, Z., O’Connell, F., Gray, T., Harper, F. M., and Zhu, H.
(2019). Explaining Decision-Making Algorithms through UI: Strategies to Help Non-
Expert Stakeholders. In Proceedings of the 2019 CHI Conference on Human Factors
in Computing Systems, CHI '19, pages 1–12, Glasgow, Scotland, UK. Association for
Computing Machinery.
[37] Cozmuta, R., Wilhelms, E., Cornell, D., Nolte, J., Reyna, V., and Fraenkel, L. (2018).
Influence of explanatory images on risk perceptions and treatment preference. Arthritis
care & research, 70(11):1707–1711.
[38] Curseu, P. L. (2006). Need for cognition and rationality in decision-making. Studia
Psychologica, 48(2):141.
[39] De Weck, O. L., Roos, D., and Magee, C. L. (2011). Engineering systems: Meeting
human needs in a complex technological world. Mit Press.
[40] Dick, J., Hull, E., and Jackson, K. (2017). Requirements engineering. Springer.
[41] Diehl, J. J., Bennetto, L., and Young, E. C. (2006). Story recall and narrative coher-
ence of high-functioning children with autism spectrum disorders. Journal of abnormal
child psychology, 34(1):83–98.
[42] Doshi-Velez, F. and Kim, B. (2017). Towards a rigorous science of interpretable
machine learning. arXiv preprint arXiv:1702.08608.
[43] Edwards, L. and Veale, M. (2018). Enslaving the algorithm: From a “right to an
explanation” to a “right to better decisions”? IEEE Security Privacy, 16(3):46–54.
[44] Fagerlin, A., Zikmund-Fisher, B. J., Ubel, P. A., Jankovic, A., Derry, H. A., and
Smith, D. M. (2007). Measuring numeracy without a math test: development of the
subjective numeracy scale. Medical Decision Making, 27(5):672–680.
[45] Feldman, J. (2009). Bayes and the simplicity principle in perception. Psychological
Review, 116(4):875.
[46] Frederick, S. (2005). Cognitive reflection and decision making. The Journal of Eco-
nomic Perspectives, 19(4):25–42.
[47] Fukukura, J., Ferguson, M. J., and Fujita, K. (2013). Psychological distance can
improve decision making under information overload via gist memory. Journal of Ex-
perimental Psychology: General, 142(3):658.
[48] Galesic, M. and Garcia-Retamero, R. (2011). Graph literacy: A cross-cultural com-
parison. Medical Decision Making, 31(3):444–457.
[49] Gallo, D. (2013). Associative illusions of memory: False memory research in DRM
and related tasks. Psychology Press.
[50] Gernsbacher, M. A. (1996). The structure-building framework: What it is, what it
The diverse representational substrates of the predictive brain. Behavioral and Brain
Sciences, 43:e121.
[53] Gilovich, T., Griffin, D., and Kahneman, D. (2002). Heuristics and biases: The
psychology of intuitive judgment. Cambridge university press.
[54] Gilpin, L. H., Bau, D., Yuan, B. Z., Bajwa, A., Specter, M., and Kagal, L. (2018).
Explaining explanations: An overview of interpretability of machine learning. In 2018
IEEE 5th International Conference on data science and advanced analytics (DSAA),
pages 80–89. IEEE.
[55] Gleaves, L. P., Schwartz, R., and Broniatowski, D. A. (2020). The role of individ-
ual user differences in interpretable and explainable machine learning systems. arXiv
preprint arXiv:2009.06675.
[56] Goebel, R., Chander, A., Holzinger, K., Lecue, F., Akata, Z., Stumpf, S., Kieseberg,
P., and Holzinger, A. (2018). Explainable ai: The new 42? In Holzinger, A., Kieseberg,
P., Tjoa, A. M., and Weippl, E., editors, Machine Learning and Knowledge Extraction,
Lecture Notes in Computer Science, page 295–303. Springer International Publishing.
[57] Goldman, S. R., McCarthy, K. S., and Burkett, C. (2015). 17 interpretive inferences
in literature. Inferences during reading, page 386.
[58] Graesser, A. C., Singer, M., and Trabasso, T. (1994). Constructing inferences during
narrative text comprehension. Psychological review, 101(3):371.
[59] Griffiths, T. L. and Steyvers, M. (2004). Finding scientific topics. Proceedings of the
National Academy of Sciences, 101(suppl 1):5228–5235.
[60] Griffiths, T. L., Steyvers, M., and Tenenbaum, J. B. (2007). Topics in semantic repre-
sentation. Psychological review, 114(2):211.
[61] Hall, P., Gill, N., and Schmidt, N. (2019). Proposed guidelines for the responsible use
of explainable machine learning. arXiv:1906.03533 [cs, stat]. arXiv: 1906.03533.
[62] Hammond, K. R. (2000). Coherence and correspondence theories in judgment and
decision making. Judgment and decision making: An interdisciplinary reader, page
53–65.
[63] Hans, V. P., Helm, R. K., and Reyna, V. F. (2018). From meaning to money: Trans-
lating injury into dollars. Law and human behavior, 42(2):95.
[64] Hoffman, R., Miller, T., Mueller, S. T., Klein, G., and Clancey, W. J. (2018). Explain-
ing explanation, part 4: a deep dive on deep nets. IEEE Intelligent Systems, 33(3):87–95.
Publisher: IEEE.
[65] Hoffman, R. R. and Klein, G. (2017). Explaining explanation, part 1: theoretical
foundations. IEEE Intelligent Systems, 32(3):68–73. Publisher: IEEE.
[66] Hoffman, R. R., Mueller, S. T., and Klein, G. (2017). Explaining explanation, part 2:
Empirical foundations. IEEE Intelligent Systems, 32(4):78–86. Publisher: IEEE.
[67] Hoffman, R. R., Mueller, S. T., Klein, G., and Litman, J. (2019). Metrics for explain-
able ai: Challenges and prospects. arXiv:1812.04608 [cs]. arXiv: 1812.04608.
[68] Jones, N. A., Ross, H., Lynam, T., Perez, P., and Leitch, A. (2011). Mental models:
an interdisciplinary synthesis of theory and methods. Ecology and Society, 16(1).
[69] Katsikopoulos, K. V. (2009). Coherence and correspondence in engineering design:
informing the conversation and connecting with judgment and decision-making
research. Judgment and Decision Making, 4(2):147–153.
[70] Katsikopoulos, K. V. (2012). Decision methods for design: insights from psychology.
Journal of Mechanical Design, 134(8):084504–1–084504–4.
[71] Kintsch, W. (1974). The representation of meaning in memory.
[72] Klein, E. Y., Martinez, E. M., May, L., Saheed, M., Reyna, V., and Broniatowski,
D. A. (2017). Categorical risk perception drives variability in antibiotic prescribing in
the emergency department: A mixed methods observational study. Journal of General
Internal Medicine.
[73] Klein, G. (2008). Naturalistic decision making. Human factors, 50(3):456–460.
[74] Klein, G. (2018). Explaining explanation, part 3: The causal landscape. IEEE Intel-
ligent Systems, 33(2):83–88. Publisher: IEEE.
[Klein et al.] Klein, G., Hoffman, R., and Mueller, S. Naturalistic Psychological
Model of Explanatory Reasoning: How People Explain Things to Others and to
Themselves.
[76] Klein, G. A., Orasanu, J., Calderwood, R., Zsambok, C. E., et al. (1993). Decision
making in action: Models and methods. Ablex Norwood, NJ.
[77] Kleinberg, J. and Mullainathan, S. (2019). Simplicity creates inequity: implications
for fairness, stereotypes, and interpretability. In Proceedings of the 2019 ACM Confer-
ence on Economics and Computation, pages 807–808.
[78] Kolmogorov, A. N. (1965). Three approaches to the definition of the concept “quan-
tity of information”. Problemy peredachi informatsii, 1(1):3–11.
[79] Kroll, J. A. (2018). The fallacy of inscrutability. Philosophical Transac-
tions of the Royal Society A: Mathematical, Physical and Engineering Sciences,
376(2133):20180084.
[80] Kroll, J. A., Barocas, S., Felten, E. W., Reidenberg, J. R., Robinson, D. G., and Yu, H.
(2016). Accountable Algorithms. University of Pennsylvania Law Review, 165(3):633–
706.
[81] Lau, J. H., Newman, D., and Baldwin, T. (2014). Machine reading tea leaves: Auto-
matically evaluating topic coherence and topic model quality. In Proceedings of the 14th
Conference of the European Chapter of the Association for Computational Linguistics,
pages 530–539.
[82] LeBoeuf, R. A. and Shafir, E. (2003). Deep thoughts and shallow frames: On the
susceptibility to framing effects. Journal of Behavioral Decision Making, 16(2):77–92.
[83] Lee, T. Y., Smith, A., Seppi, K., Elmqvist, N., Boyd-Graber, J., and Findlater, L.
(2017). The human touch: How non-expert users perceive, interpret, and fix topic mod-
els. International Journal of Human-Computer Studies, 105:28–42.
[84] Liberali, J. M., Reyna, V. F., Furlan, S., Stein, L. M., and Pardo, S. T. (2012). Individ-
ual differences in numeracy and cognitive reflection, with implications for biases and fal-
lacies in probability judgment. Journal of behavioral decision making, 25(4):361–381.
[85] Linderholm, T., Everson, M. G., van den Broek, P., Mischinski, M., Crittenden, A.,
and Samuels, J. (2000). Effects of causal text revisions on more- and less-skilled readers’
[103] Pennington, N. and Hastie, R. (1993). The story model for juror decision making.
Cambridge University Press Cambridge.
[104] Peters, E., Västfjäll, D., Slovic, P., Mertz, C. K., Mazzocco, K., and Dickert, S.
(2006). Numeracy and decision making. Psychological Science, 17(5):407–413.
[105] Polanyi, M. (1962). Tacit knowing: Its bearing on some problems of philosophy.
Reviews of Modern Physics, 34(4):601.
[106] Polanyi, M. (1967). The tacit dimension.
[107] Poursabzi-Sangdeh, F., Goldstein, D. G., Hofman, J. M., Vaughan, J. W., and Wal-
lach, H. (2019). Manipulating and measuring model interpretability. arXiv:1802.07810
[cs]. arXiv: 1802.07810.
[108] Rasmussen, J. (1985). The role of hierarchical knowledge representation in decision-
making and system management. IEEE Transactions on systems, man, and cybernetics,
(2):234–243.
[109] Rasmussen, J. and Lind, M. (1981). Coping with complexity. Technical Report
Risø-M-2293, Risø National Laboratory, Roskilde, Denmark.
[110] Reese, E., Haden, C. A., Baker-Ward, L., Bauer, P., Fivush, R., and Ornstein, P. A.
(2011). Coherence of personal narratives across the lifespan: A multidimensional model
and coding method. Journal of Cognition and Development, 12(4):424–462.
[111] General Data Protection Regulation (GDPR) (2018). Intersoft Consulting. Accessed
October 24.
[112] Reyna, V. (2018). When irrational biases are smart: A fuzzy-trace theory of complex
decision making. Journal of Intelligence, 6(2):29.
[113] Reyna, V. F. (2004). How people make decisions that involve risk: A dual-processes
approach. Current directions in psychological science, 13(2):60–66.
[114] Reyna, V. F. (2008). A theory of medical decision making and health: fuzzy trace
theory. Medical decision making, 28(6):850–865.
[115] Reyna, V. F. (2012). A new intuitionism: Meaning, memory, and development in
fuzzy-trace theory. Judgment and Decision making.
[116] Reyna, V. F. (2020). Of viruses, vaccines, and variability: Qualitative meaning
matters. Trends in Cognitive Sciences.
[117] Reyna, V. F. and Adam, M. B. (2003). Fuzzy-trace theory, risk communication, and
product labeling in sexually transmitted diseases. Risk Analysis, 23(2):325–342.
[118] Reyna, V. F. and Brainerd, C. J. (1995). Fuzzy-trace theory: An interim synthesis.
Learning and individual Differences, 7(1):1–75.
[119] Reyna, V. F. and Brainerd, C. J. (1998). Fuzzy-trace theory and false memory: New
frontiers. Journal of experimental child psychology, 71(2):194–209.
[120] Reyna, V. F. and Broniatowski, D. A. (2020). Abstraction: An alternative neurocognitive
account of recognition, prediction, and decision making. Behavioral and Brain
Sciences, 43.
[121] Reyna, V. F., Chick, C. F., Corbin, J. C., and Hsia, A. N. (2014). Developmental
reversals in risky decision making: Intelligence agents show larger decision biases than
college students. Psychological Science, 25(1):76–84.
[122] Reyna, V. F. and Farley, F. (2006). Risk and rationality in adolescent decision making:
Implications for theory, practice, and public policy. Psychological Science in the Public
Interest, 7(1):1–44.
[123] Reyna, V. F. and Lloyd, F. J. (2006). Physician decision making and cardiac risk:
effects of knowledge, risk perception, risk tolerance, and fuzzy processing. Journal of
Experimental Psychology: Applied, 12(3):179.
[124] Reyna, V. F., Lloyd, F. J., and Brainerd, C. J. (2003). Memory, development, and
rationality: An integrative theory of judgment and decision making. Emerging perspec-
tives on judgment and decision research, pages 201–245.
[125] Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why should I trust you?":
Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pages 1135–1144.
[126] Ribeiro, M. T., Singh, S., and Guestrin, C. (2018). Anchors: High-precision model-
agnostic explanations. In AAAI, volume 18, pages 1527–1535.
[127] Ruan, L. and Yuan, M. (2010). Dimension reduction and parameter estimation for
additive index models. Statistics and its Interface, 3(4):493–499.
[128] Rudin, C. (2019). Stop explaining black box machine learning models for high
stakes decisions and use interpretable models instead. arXiv:1811.10154 [cs, stat].
arXiv: 1811.10154.
[129] Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D.
(2019). Grad-cam: Visual explanations from deep networks via gradient-based localiza-
tion. arXiv:1610.02391 [cs]. arXiv: 1610.02391.
[130] Shadish, W. R., Cook, T. D., and Campbell, D. T. (2002). Experimental and quasi-
experimental designs for generalized causal inference. Wadsworth Cengage learning.
[131] Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3):289–310.
Zbl: 1329.62045.
[132] Simonite, T. (2020). How an algorithm blocked kidney transplants to black patients.
Wired.
[133] Singer, M. and Remillard, G. (2008). Veridical and false memory for text: A multi-
process analysis. Journal of Memory and Language, 59(1):18–35.
[134] Slack, D., Hilgard, S., Jia, E., Singh, S., and Lakkaraju, H. (2020). Fooling lime
and shap: Adversarial attacks on post hoc explanation methods. arXiv:1911.02508 [cs,
stat]. arXiv: 1911.02508.
[135] Slovic, P., Finucane, M. L., Peters, E., and MacGregor, D. G. (2004). Risk as anal-
ysis and risk as feelings: Some thoughts about affect, reason, risk, and rationality. Risk
Analysis: An International Journal, 24(2):311–322.
[136] Tomsett, R., Braines, D., Harborne, D., Preece, A., and Chakraborty, S. (2018).