
DRAFT

A Review on Explainable Artificial Intelligence


for Healthcare: Why, How, and When?
Subrato Bharati, M. Rubaiyat Hossain Mondal, Member, IEEE, and Prajoy Podder


Abstract—Artificial intelligence (AI) models are increasingly finding applications in the field of medicine. Concerns have been raised about the explainability of the decisions that are made by these AI models. In this article, we give a systematic analysis of explainable artificial intelligence (XAI), with a primary focus on models that are currently being used in the field of healthcare. The literature search is conducted following the preferred reporting items for systematic reviews and meta-analyses (PRISMA) standards for relevant work published from 1 January 2012 to 02 February 2022. The review analyzes the prevailing trends in XAI and lays out the major directions in which research is headed. We investigate the why, how, and when of the uses of these XAI models and their implications. We present a comprehensive examination of XAI methodologies as well as an explanation of how a trustworthy AI can be derived from describing AI models for healthcare fields. The discussion of this work will contribute to the formalization of the XAI field.

Impact Statement—XAI is emerging in various sectors and the literature has recognized numerous concerns and problems. Security, performance, vocabulary, evaluation of explanation, and generalization of XAI in the field of healthcare are the most significant problems recognized and discussed in the literature. The paper has identified a number of obstacles associated with XAI in the healthcare area, including system evaluation, organizational, legal, socio-relational, and communication issues. Although a few studies in the literature have proposed some answers for the stated problems, others have characterized it as a future research gap that must be filled. In healthcare, the organizational difficulty posed by XAI can be mitigated via AI deployment management plans. The doctor's understanding of how patients would interpret the system and double-checking health facts with patients might help alleviate communication problems. The socio-organizational issue or difficulty can be overcome through patient instruction that facilitates the use of artificial intelligence. The applications of AI in healthcare, including diagnosis and prognosis, medication discovery, population health, healthcare organization, and patient-facing apps, have been reviewed critically. This article addresses a number of AI's limitations, including the difficulties of authenticating AI's output. Active collaboration between physicians, theoretical researchers, medical imaging specialists, and medical image analysis specialists will be crucial for the future development of machine learning and deep learning techniques. This paper contributes to the body of knowledge in identifying different techniques for XAI, in determining the issues and difficulties associated with XAI, and finally in reviewing XAI in healthcare.

Index Terms—Explainable Artificial Intelligence, healthcare, deep learning, medical care, medicine, medical imaging

I. INTRODUCTION

Advances in Artificial Intelligence (AI) have taken place in recent years, with the technology becoming more complicated, more autonomous, and more sophisticated. As data grows exponentially, so does the speed at which computer power advances. This combination gives a massive boost to the development of AI. People's lives have already been impacted by AI's practical accomplishments in a range of fields (e.g., voice recognition, self-driving cars, and recommendation systems) [1-4]. AI can improve people's well-being and health by augmenting clinicians' diagnostic work, highlighting prevention options, and delivering tailored treatment suggestions based on electronic health records (EHRs). Despite the employment of several useful technologies in health care [5, 6], AI is not widely deployed [7, 8]. This is partly because AI, machine learning (ML), and deep learning (DL) algorithms continue to be mysterious in many cases [9-11]. Decision-making processes are generally hard to explain, and moral and practical challenges arise. Without understanding AI diagnoses, it is impossible to tell whether diagnostic disparities between patients reflect diagnostically important variations or bias, diagnostic errors, or under/over-diagnosis.

A number of governmental and commercial organizations have recently released recommendations and guidelines for the appropriate use of AI in response to these issues. The literature, however, reveals a considerable discrepancy in the alleged transparency characteristics. Many critical difficulties in medicine may be resolved with the use of AI. Numerous cases have gained pace in recent years as a result of significant research on automated prognosis, diagnosis, testing, and medicine formulation [12-14]. Along with genetic data, biosensors, and medical imaging, these sources create a large quantity of data [15-17]. A critical objective of AI in medicine is to personalize medical decisions, health practices, and medications for unique people. Nevertheless, the current state of AI in medicine has been described as "strong on promise but rather lacking in evidence and demonstration".

Subrato Bharati was with the Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh, and is now with the Department of Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada (e-mail: subratobharati1@gmail.com) (Corresponding Author).
M. Rubaiyat Hossain Mondal is with the Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh (e-mail: rubaiyat97@iict.buet.ac.bd).
Prajoy Podder is with the Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh (e-mail: prajoypodder@gmail.com).
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current
or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective
works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

AI systems have been tested in real-world clinical settings to detect wrist fractures, diabetic retinopathy, cancer metastases, histologic breast cancer, congenital cataracts, and colonic polyps; however, many of the systems that have been demonstrated to be equivalent or superior to experts in experimental settings have shown high false-positive rates in real-world clinical settings [18]. Additionally, prejudice, security, privacy, and a deficit of transparency exist as concerns, as do trust, fairness, informativeness, transferability, and causality [19]. Numerous critical difficulties in medicine may be resolved with the use of AI [3, 20-25]. Together with medical imaging, biosensors, genetic data, and electronic medical records, these sources create a large quantity of data [18]. To make a precise diagnosis in precision medicine, doctors need substantially more information than a simple binary prediction [19, 27]. ML in medicine has grown more concerned with explainability, interpretability, and openness [19]. There is sufficient evidence that ML-based methods are helpful; however, their widespread adoption is unlikely until these issues are solved, most likely by providing systems with adequate explanations for their judgments [28]. Because of this, it is difficult to develop generalizable solutions that can be used in a variety of contexts. For example, various applications often have distinct interpretability and explainability demands [19, 26-28].

Explainable AI (XAI) and interpretable AI (IAI) are not the same concepts. Although some studies used the terms XAI and IAI interchangeably, the difference and relations between these two are described in the literature in multiple ways, as presented in the following. XAI outlines why the choice was made but not how the decision was reached. The term IAI outlines how the choice was made but not why the criteria used were reasonable [171]. Explainability means that a ML model and its results can be explained in a way that a person can understand. XAI permits the explanation of learning models and focuses on why the system made a given decision, analyzing its logical paradigms. On the other hand, the interpretability of ML enables users to comprehend the results of the learning models by revealing the rationale for its decisions [171, 172]. According to some literature [173], explainability is the concept that a ML model and its findings may be communicated to a person in words that make sense. In one work, Adadi and Berrada [173] described that interpretable systems are explainable if their processes can be comprehended by humans, demonstrating that the idea of explainability is closely related to that of interpretability. Nonetheless, Gilpin et al. [174] noted that interpretability and fidelity are both essential components of explainability. They stated that a good explanation must be human-comprehensible (interpretability) and accurately characterize model behavior over the whole feature space (fidelity). Interpretability is necessary to handle the social interaction of explainability, whereas fidelity is necessary to assist in confirming other model prerequisites or discovering new explanations. In other words, the fidelity of an explanation relates to how reliably and exactly the behavior of a model is explained. So, an explanation is explainable if it is easily understood by humans and accurately explains the model.

Figure 1 depicts the literature search conducted for this article in accordance with the PRISMA (preferred reporting items for systematic reviews and meta-analyses) standards. The PRISMA guidelines and the expanded version of PRISMA for scoping reviews (PRISMA-ScR) were used to create this scoping review. The number of records considered, records included, and records deleted after further evaluation are shown in a flow diagram in Figure 1. After conducting a thorough literature review, all relevant research publications were found. The period from 1 January 2012 to 02 February 2022 was used as the search window. Records in English were considered, while those in other languages were eliminated. Records were examined to see whether they met the requirements of the PICO (patient, intervention, comparator, and outcome) framework. Initial attention was given to studies that met all four of these criteria: using XAI for screening or diagnosing patients, as well as comparing it to conventional approaches, was shown to be effective. Two Boolean operators, OR and AND, were used to find the most important terms. Many terms were searched for, including XAI, explainable AI, interpretable AI, medicine, medical, etc. In addition, the publications' titles, abstracts, and keywords were all taken into account while doing the search. No papers on the fundamental scientific, epidemiological, or clinical aspects of XAI alone were reviewed. It was then decided to eliminate any content that covered the same topic twice. Extracting information and synthesizing data elements, such as research objectives and performance outcomes, was the next step in the process.

Elsevier (ScienceDirect), IEEE, MDPI, SpringerNature, Bentham Science, Wiley, Medrxiv, Arxiv and many more were among the reputable publishers and preprint servers that yielded a total of 17,300 publications on XAI for medical care or medicine purposes. The Appendix contains the search syntax used in this article. Out of a total of 17,300 documents, 16,850 were manually excluded because they did not correspond to our primary research area; that is, their healthcare, medicine, or medical treatment application of XAI did not match the necessary criteria. In total, 450 papers were evaluated; however, only 195 were chosen for inclusion in the research and the report. From the 195 papers assessed, 110 were chosen for inclusion in the summary of the findings, as these reported the performance measurements and the performance outcomes of XAI for healthcare or medicine. To put it another way, any of the following problems have been covered by any of these 110 publications. Our classification of XAI in healthcare and medicine into five categories, synthesized following the method of the authors of [135] and comprising (1) dimension reduction using XAI, (2) feature selection using XAI, (3) attention mechanism using XAI, (4) knowledge distillation using XAI, and (5) surrogate representations using XAI, is presented in this review. Although some recent studies have reviewed XAI, ongoing surveys and studies are necessary because the field of XAI in healthcare is always changing. With this in consideration, this review paper extends the findings reported by previous survey papers in the field of XAI [19, 26, 73, 80, 81, 87, 195].

Fig. 1. PRISMA guideline for manuscript selection for this review

The research mainly focuses on the emerging field of XAI and its challenges in health care. This paper also reports some IAI models, as some literature uses the terms XAI and IAI interchangeably. The main contributions of this paper are in presenting the answers to the following questions:
• Why use XAI in health care and medicine, and what problems cannot be solved with standard AI but can only be solved using XAI?
• What are the best ways to accomplish XAI?
• When is the best time to perform XAI?
The remaining sections are organized as follows. Background information is provided in Section II of this review work. In Section III, we go through the benefits of utilizing XAI as well as the appropriate times to do so. Section IV presents the XAI methods, describing how XAI is used. In Section V, several different XAI application scenarios are broken down and examined. The overview of various XAI aspects and the most recent research developments in XAI are covered in the next part, Section VI. In Section VII, the final thoughts are presented.

II. BACKGROUND

By mimicking human mental processes, AI presents a paradigm shift in healthcare, driven by increasing data availability in healthcare as well as fast developments in mathematical methodologies. This article examines healthcare AI applications that are currently being developed. General AI techniques for healthcare are discussed in the literature [29], [30], [31], while medical image analysis is described in [32-34]. The consequences of AI have been disputed in the medical literature [35-37]. An AI-powered device [38] may assist in clinical decision-making by delivering recent medical information from conferences, manuals, and journals. AI systems may also assist in eliminating therapeutic and medical errors that are inescapable in human clinical practice, since such systems are more repeatable and objective [38-40]. An AI system may also use data from a large number of patients to provide real-time insights for outcome estimation and health risk warning. For decades, rule-based algorithms have made great strides and have been used to interpret ECGs, detect disorders, choose the best treatment option for each patient, and support clinicians in establishing diagnostic theories and hypotheses in difficult patient cases. Because they need the explicit specification of decision rules and necessitate human-authored updates, rule-based systems are both time- and cost-intensive to construct. Thus, the effectiveness of these structures is limited by the extent to which prior medical knowledge encompasses higher-order relationships between material published in different specialties [20, 41]. Likewise, it was challenging to include a strategy that combines probabilistic and deterministic reasoning methods to narrow down the suitable recommended therapy, prioritize medical theories, and account for the psychological environment [42-43]. In contrast to the first generation of AI programs, which depended only on medical information curation by specialists as well as the construction of strict decision rules, state-of-the-art AI research has utilized ML methodologies, which can be used to explain complex relationships [44]. By studying the patterns produced from all the labeled input-output pairings, ML algorithms learn to offer the appropriate output for a provided input in new cases [45]. Algorithms for supervised ML are designed to find the model parameters that minimize the discrepancy between the predicted impacts in their training cases and the observed effects in those situations, in the hope that the relationships established will be transferable to other datasets. The test dataset may be used to determine the generalizability of the model. For an unlabeled collection of data, unsupervised learning may help to uncover sub-clusters of the main dataset or recognize outliers in the dataset. It is also worth noting that low-dimensional depictions of labeled datasets may be recognized in a supervised way. It is possible to design applications of AI that allow for the earlier discovery of unrecognized patterns in data, without the requirement to specify decision-making rules for every task or take into consideration sophisticated relations between input elements. Because of this, ML has become the most commonly used approach for constructing AI applications [2, 46-48].

The current resurgence of AI is largely due to the active application of DL, which comprises training a deep neural network on large datasets [49]. Fortunately, the processing resources necessary for DL have been enabled by recent improvements in computer processor design [50]. Due to the establishment of a range of large-scale research [51, 52] and data collection platforms [53], these algorithms have only recently become available. A wide range of DL-based models has been created for prediction and diagnosis, image classification, biomarker identification, genomic interpretation, and monitoring via automated robotic surgery, wearables, and life-logging devices to improve digital healthcare [54].
Due to the quick expansion of AI, healthcare data may be used to construct strong DL-based algorithms that can automate diagnosis and provide a more precise technique of medicine and health by customizing pharmaceuticals and aiming for services in a dynamic and timely manner. A non-exhaustive list of potential uses is shown in Figure 2. Medical AI malpractice claims will continue to get precise instructions from the legal system as to which agency is liable if they arise. As part of the AI framework, healthcare decisions are made in part by healthcare practitioners with malpractice insurance [31, 55, 56]. It is important to minimize bias while maximizing patient satisfaction with the implementation of AI modules in these areas [57, 58]. It is shown in Figure 2 that an auxiliary advancement of XAI might possibly solve the problem of limited sample learning by removing clinically insignificant characteristics. DL models, on the other hand, may often provide results that are beyond the comprehension of non-experts. A DL model may have millions of parameters, which might make understanding what the machine monitors in clinical health data, such as CT scans, challenging [3, 48, 59].

Fig. 2. Applications of XAI

Researchers have specifically noted that the inability to deliver a tailored assessment, as a doctor would, is a major restriction for AI in dermatology [60]. There must be proof that a high-performance DL algorithm genuinely recognizes the correct region of the image and does not overemphasize insignificant discoveries. In recent years, new ways to describe AI models have been created, including methods for visualizing them. Attention maps [61], class activation maps [62-63], salience maps [64], and occlusion maps [65] are some of the most frequently used levers in the field of visual perception. Because the result is a picture, localization and segmentation techniques are easier to understand. A key problem in understanding deep neural network (DNN) based models trained on non-imaging medical data is yet unsolved [66]. It can be noted that risk and obligations are only two examples of concerns that are not recognized in other domains when it comes to clinical interpretations [30, 67]. Letting AI algorithms make life-or-death medical choices without providing enough transparency and accountability is irresponsible [68]. A number of recent studies [6, 69-73] have examined the issue of explainability in medical AI. COVID-19 classification and detection, emotion analysis, chest radiography, as well as the relevance of interpretability in the medical industry have all been studied in detail [74-75]. Figure 3 depicts an overview of an XAI system. Some research works rely mostly on keeping the interpretability of AI models intact while enhancing their performance via optimization and refining procedures [76-79]. The authors of [80] studied XAI in healthcare, whereas the authors of [81] studied XAI for electronic health records. Therefore, the purpose of this study is to provide a synthesis of the categories of XAI in healthcare and medicine by analyzing them in terms of dimension reduction, feature selection, attention mechanism, knowledge distillation, and surrogate representations utilizing XAI.

Fig. 3. An overview of an XAI system step by step

III. REASONS FOR USING XAI

Recently, AI methods like DL have played a revolutionary role in healthcare, particularly in the areas of diagnostics and surgery. Some deep-learning-based diagnosis jobs are even more accurate than those performed by doctors. However, DL models' lack of transparency makes it difficult to explain their results and implement them in clinical settings. A growing number of experts at the intersection of AI and healthcare have concluded that an AI model's capacity to be explained to humans is more important than the model's accuracy when it comes to its practical application in a clinical setting. It is important to understand how medical AI applications work before they are adopted and used. As a result, there is an incentive to conduct a survey of medical XAI, and XAI is necessary for the acceptability of AI applications in medicine.

IV. XAI METHODS

This section describes the methods of XAI described in the literature [12, 84, 85-87]. For example-based, attribution-based, and model-based explanations, the qualities of explainability, such as interpretability characterized by simplicity and clarity, as well as fidelity characterized by soundness and completeness, are evaluated in this section.
Figure 3 depicts an overview of an XAI system. Some research works rely primarily on keeping the interpretability of AI models intact while enhancing their performance via optimization and refining procedures [76-79]. The work in [80] and [81] studied XAI for healthcare and electronic health records, respectively. The works of [82] and [83] focus on specialized applications, such as digital pathology.

A. Analyzing explanations based on models

Model-based descriptions meet the completeness criteria since they offer enough information to calculate the outcome for a given input. Fidelity can be quantified by looking at the proportion of predictions that are consistent with the task model's post-hoc explanations. Global explanations, which give a single reason for the whole model, are more likely to meet the criterion of clarity. Because of this, an example-based, attribution-based, or global model-based approach is the best option for providing clarity. Using rule overlap and feature space coverage, the researchers of [132] established a measure of unambiguity that may be applied to this particular scenario. On the other hand, local explanations are plagued by the issue of the great variation in explanations for the same or comparable cases. Local model-based explanations' clarity could not be measured using any existing metric. As a measure of model interpretability and as an estimate of model complexity [85, 94, 133], model size is often utilized. For example, the number of features (the number of rules, non-zero weights utilized in splits), as well as the complexity of relationships (the depth of the tree, the length of rules, interactions), are examples of such metrics; however, they are model dependent. For example, the authors of [91] suggested assessing model complexity by counting the number of arithmetic and boolean operations required for the given input. They also examined how complexity affected model accuracy. Molnar et al. [95] calculated the level of model complexity by looking at the number of features, the intensity of interactions, and the complexity of the principal effects.

B. Analyzing explanations based on attribution

Attribution-based descriptions only give a limited description of the model, and hence do not meet the requirement of completeness. There were no metrics available to measure the completeness of an explanation based on attribution. However, the majority of attribution-based explanation evaluation approaches may be utilized to quantify soundness. The effectiveness of the attribution technique may be empirically evaluated. In certain cases, this is accomplished via the use of artificially created ground truth [66, 96]. Other approaches use perturbation analysis to measure performance decline [97, 98]. It is the goal of axiomatic evaluations to establish what an ideal attribution technique should accomplish, and then to assess whether this has been done. In order to see whether an explanation is affected by a nonsensical change in the input data, the authors of [98] offer a test. Sensitivity and implementation invariance are also examined by Sundararajan, Taly, and Yan [99]. When a feature's non-zero attribution leads to various forecasts for a given baseline, they designate an attribution approach as sensitive. Implementation invariance means that an attribution method always returns the same attribution for functionally similar models (i.e., similar output is produced by similar input). Explanation continuity and selectivity are essential qualities discussed by Montavon et al. [100]. The authors of [101] report that an attribution approach meets sensitivity-N when the total of the attributions for any subset of cardinality N features is equal to the change in the output produced by eliminating the features in the subset.
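As a concrete illustration of the perturbation-style and sensitivity-N checks described above, the following minimal Python sketch removes random feature subsets and compares the change in the model score with the summed attributions. The dataset, model, and attribution rule are illustrative assumptions rather than the setups used in the cited works.

```python
# Minimal sketch of a sensitivity-N style soundness check (illustrative only).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000).fit(X, y)

x = X[0]                                   # instance to explain
baseline = X.mean(axis=0)                  # "feature removed" reference value
# Simple attribution for a linear model: coefficient times deviation from baseline.
attribution = model.coef_[0] * (x - baseline)

def score(v):
    """Linear model score (log-odds) for a single instance."""
    return model.decision_function(v.reshape(1, -1))[0]

rng = np.random.default_rng(0)
n = 5                                      # subset size for the sensitivity-N test
errors = []
for _ in range(200):
    subset = rng.choice(len(x), size=n, replace=False)
    x_removed = x.copy()
    x_removed[subset] = baseline[subset]   # "eliminate" the subset of features
    delta_score = score(x) - score(x_removed)
    delta_attr = attribution[subset].sum()
    errors.append(abs(delta_score - delta_attr))

# For this linear score the attributions match the output change exactly; for a
# black-box model the residual quantifies how sound the explanation is.
print(f"mean |output change - attribution sum| for subsets of size {n}: "
      f"{np.mean(errors):.6f}")
```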
C. Analyzing explanations based on example

An evaluation methodology for describing examples may be insufficient because there are fewer tools for obtaining influential cases and prototypes. Global examples are clear, and example-based explanations are efficient if the case is understandable. Other conditions were not met, and no assessment techniques were found. Table I summarizes the results of this section.
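A simple way to produce the example-based explanations described above is to retrieve the training cases most similar to the instance being explained. The sketch below is an illustrative, assumption-laden example rather than a reconstruction of any cited system.

```python
# Minimal sketch: example-based explanation via nearest "prototype" cases.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import NearestNeighbors
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

x = X[10]                                  # instance whose prediction we explain
pred = model.predict(x.reshape(1, -1))[0]

# Nearest training examples serve as human-checkable evidence:
# "the model predicts this class because similar past cases had it too".
nn = NearestNeighbors(n_neighbors=3).fit(X)
dist, idx = nn.kneighbors(x.reshape(1, -1))
for d, i in zip(dist[0], idx[0]):
    print(f"prototype index={i}, distance={d:.2f}, true label={y[i]}")
print("model prediction:", pred)
```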

TABLE I
AI APPROACHES THAT CAN BE EXPLAINED IN TERMS OF THE CHARACTERISTICS OF EXPLAINABILITY

Method type | Explanation | Scope | Fidelity: Soundness | Fidelity: Completeness | Interpretability: Parsimony | Interpretability: Clarity
Post-hoc explanation | Attribution | Local | General quantitative metrics | Unavailable | Satisfied if an instance or feature is human-intelligible | General quantitative metrics
Post-hoc explanation | Attribution | Global | General quantitative metrics | Unavailable | Satisfied if an instance or feature is human-intelligible | Satisfied
Post-hoc explanation | Model | Local | General quantitative metrics | Satisfied | General quantitative metrics | Unavailable metrics
Post-hoc explanation | Model | Global | General quantitative metrics | Satisfied | General quantitative metrics | Satisfied if the model is incapable of providing various rationales
Post-hoc explanation | Example | Local | Unavailable metrics | Unavailable | Satisfied if an instance or feature is human-intelligible | Unavailable metrics
Post-hoc explanation | Example | Global | Unavailable metrics | Unavailable | Satisfied if an instance or feature is human-intelligible | Satisfied

V. APPLICATION SCENARIOS

This section presents the current and potential application scenarios of XAI.

A. Dimension reduction using XAI

Deciphering AI models by depicting their most essential aspects is regularly and customarily accomplished via the use of dimension reduction methods such as Laplacian Eigenmaps [102, 103], independent component analysis (ICA) [104], and principal component analysis (PCA) [105]. The ideal dimensions of the input characteristics may be used to estimate side effects pharmacologically by merging multi-label as well as k-nearest-neighbor approaches [106]. Unsupervised classification of 1H MR spectroscopy brain tumor data was improved using a nonlinear dimension reduction approach suggested by the authors of [107], which used Laplacian Eigenmaps to identify the most significant characteristics. Patients with stage-one lung cancer were categorized by the authors of [108] using a supervised learning approach, which defined the most informative cases. A hybrid technique for dimension reduction, integrating pattern recognition and regression analytics, was presented to recognize a collection of "exemplars" in order to build a "dense data matrix"; in the end, the most accurate instances were used in the model. On the basis of their expertise in the field of cell-type-specific enhancers, Kim and colleagues [109] used DL to identify which traits were most significant and ranked them in order of importance in the model. The researchers of [110] presented a pathway-associated sparse DL approach to investigate gene pathways as well as their correlations in brain tumor patients. Using dimension reduction techniques, it is possible to understand the model's underlying behavior by reducing the information to a tiny subset.
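The following minimal sketch illustrates the general pattern behind such dimension-reduction explanations: project a standardized clinical feature matrix with PCA and report which original variables dominate each component. The dataset and parameter choices are placeholders, not those of the cited studies.

```python
# Minimal sketch: PCA-based dimension reduction with component inspection.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)      # standardize features first

pca = PCA(n_components=2).fit(X)
Z = pca.transform(X)                                # low-dimensional representation

print("explained variance ratio:", pca.explained_variance_ratio_)
for k, component in enumerate(pca.components_):
    top = np.argsort(np.abs(component))[::-1][:5]   # most influential original features
    names = [data.feature_names[i] for i in top]
    print(f"PC{k + 1} is dominated by: {names}")
```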
B. Rule extraction and knowledge distillation using XAI

Learning a student model that is easy to understand from a trainer model that is difficult to explain is called knowledge distillation. This is a special kind of model-specific XAI that involves extracting information from a complex model and simplifying it. Dimension reduction and tree regularization [111], or a combination technique of model compression and dimension reduction [112], may be used to achieve this goal [113]. Hinton et al. [114] are among the researchers who have studied this strategy for some time; however, recent advancements in XAI and IAI [115-117] have given it new life. In addition to knowledge distillation, rule extraction is a commonly applied XAI technology that is well-suited for usage in digital healthcare. For example, the Model Understanding via Subspace Explanations (MUSE) technique [118] has been created to characterize the projections of the global model by examining several subgroups of instances. Rule sets or decision sets have also been explored for interpretability [89]. In the context of XAI and IAI, mimic-learning is a fundamental strategy for distilling information that uses gradient-boosting trees to learn interpretable structures and make the original model understandable. An interpretable model for ICU outcomes, such as death or ventilator use, was built using the distilled information from this technique. An explainable description of the risk of pneumonia-related mortality was proposed by Caruana et al. in the work of XAI and IAI [119]. Bayesian rule lists were introduced by Letham et al. as IAI [120] and claimed to be able to predict strokes. A model induction technique developed by the authors of [121] applied visualization to create rules from the probabilities of a complex algorithm in a variety of tasks, including the classification of diabetes and the detection of breast cancer. By converting EHR occurrences into embedded clinical principles, DL was used by the authors of [122] to break down the dynamic connections between potential risk factors and hospital readmission for patients. An XAI system based on rule extraction was created by Davoodi and Moradi [123] to predict ICU mortality, while the authors of [124] employed a comparable XAI technique for Alzheimer's disease diagnosis. Textual reasoning was applied to the LSTM-based breast mass categorization by the authors of [125]. Argumentation theory was used in [126], where the XAI algorithm was applied to the training process for risk and stroke prediction.
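The mimic-learning idea described above can be sketched as follows: a gradient-boosting "teacher" is trained on the labels, and a small decision tree "student" is fitted to the teacher's soft predictions so that its rules can be read directly. All modeling choices here are illustrative assumptions, not those of the cited studies.

```python
# Minimal sketch of mimic learning (teacher -> interpretable student surrogate).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeRegressor, export_text

data = load_breast_cancer()
X_train, X_test, y_train, _ = train_test_split(data.data, data.target, random_state=0)

teacher = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
soft_labels = teacher.predict_proba(X_train)[:, 1]       # distilled knowledge

student = DecisionTreeRegressor(max_depth=3, random_state=0)
student.fit(X_train, soft_labels)                        # mimic the teacher

# Fidelity: how closely the interpretable student reproduces the teacher.
agreement = ((student.predict(X_test) >= 0.5) ==
             (teacher.predict_proba(X_test)[:, 1] >= 0.5)).mean()
print(f"student/teacher agreement on held-out data: {agreement:.2%}")
print(export_text(student, feature_names=list(data.feature_names)))
```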
C. Feature importance or selection using XAI

Using feature importance as a tool, researchers have been able to better describe the properties and importance of extracted features, as well as the relationships between those features and the outcomes they predict [12, 113, 127]. The authors of [128] employed feature weights to identify the top 10 extractable characteristics or features for predicting mortality in the critical care unit. The researchers of [129] developed an algorithm for clinically relevant risk calculation in prostate cancer [130] by estimating the importance of features through Shapley values. The relevance of a feature may be gauged by doing a sensitivity analysis on the extracted features; the more sensitive the result, the more significant the feature. To identify the most important elements of a microbiota-based diagnostic job, Eck et al. [131] reduced the features and tested their influence on model performance. The authors of [132] used a backpropagation-based technique to achieve interpretability through deep learning important features (DeepLIFT). As a result of the backpropagation method, it is possible to determine the importance of a feature. By calculating the nucleotide contribution score for each nucleotide, the authors of [133] created interpretable DL (IAI) models for predicting splice sites using the DeepLIFT approach. Layer-wise relevance propagation (LRP) [134], guided backpropagation (GBP) [135], SHapley Additive exPlanations (SHAP) [136], DeepLIFT [132], and other XAI and IAI methods have also been compared for ophthalmic diagnosis [137].
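A minimal sketch of this feature-importance workflow is given below, using permutation importance as a model-agnostic stand-in for the weighting, Shapley-value, and sensitivity analyses cited above; the dataset and model are placeholders.

```python
# Minimal sketch: permutation feature importance for a fitted classifier.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target,
                                                    random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure the drop in test accuracy.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
ranking = np.argsort(result.importances_mean)[::-1][:5]
for i in ranking:
    print(f"{data.feature_names[i]}: "
          f"{result.importances_mean[i]:.4f} +/- {result.importances_std[i]:.4f}")
```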

D. Attention mechanism in XAI

Global and local attention mechanisms use all words of an input to produce context, and self-attention mechanisms use several mechanisms simultaneously to locate every link between words [138]. There is evidence that the attention mechanism aids in both the improvement of interpretability and the advancement of visualization technology [139]. It has been shown for ICU patients, utilizing the attention mechanism, by the work of [140] that some input characteristics have a greater impact on clinical event predictions than others. Patients in the intensive care unit (ICU) may be assessed for severity using an interpretable acuity score framework developed by Shickel et al. [141] that makes use of attention and DL-based evaluation of sequential organ failure. HIV genome integration sites were accurately predicted by Hu and colleagues [142] using mechanistic theories. In addition, using a similar XAI strategy, the authors of [143] constructed a system for learning how to represent EHR data in a way that might explain the relationship between clinical results for individual patients. The Reverse Time Attention Model (RETAIN) was built by Choi et al. [144] and comprised two sets of attention weights: one at the visit level to account for the effect of each visit, and another at the variable level to account for the influence of each variable. In order to retain interpretability, reproduce clinical activities, and integrate sequential information, RETAIN was developed as a reverse attention mechanism. Predictive models for heart failure and cataract risk have been suggested by Kwon and colleagues [145] based on RETAIN (RetainVis). DL models' interpretability may be improved by identifying certain positions (e.g., DNA, visits, time) in a sequence where those input attributes might impact the output of the prediction. Since 2016, XAI researchers have been studying the class activation mapping (CAM) [146] approach and its modifications [147], which have since been put to use in digital healthcare, particularly in medical picture analysis. According to the authors of [148], a limited dataset of CT scans was utilized for training Grad-CAM to detect appendicitis. To diagnose hypoglycemia, Porumb et al. [149] used a combination of RNN and CNN for an electrocardiogram (ECG) experiment with Grad-CAM to identify the heartbeat segments that were most relevant. A coronavirus classification method was combined with a multiscale CAM to identify the diseased regions in the work of [150]. These saliency maps are suggested because of their ease of visual interpretation. Clinical end users are not given precise responses by attention-based XAI approaches; however, these methods do indicate regions of increased concern, making it simpler for them to make decisions. However, there are primary drawbacks, such as information overload and warning fatigue, and it may not be desirable to convey this information too strongly to a clinical end user.
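A minimal Grad-CAM sketch, in the spirit of the CAM-based visual explanations cited above, is shown below. The tiny CNN and the random input tensor are stand-ins for a real imaging model and scan, so every architectural detail is an assumption.

```python
# Minimal Grad-CAM sketch in PyTorch (illustrative model and input only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(16, n_classes)

    def forward(self, x):
        fmap = self.features(x)                      # (B, 16, H, W)
        pooled = F.adaptive_avg_pool2d(fmap, 1).flatten(1)
        return self.head(pooled), fmap

model = TinyCNN().eval()
image = torch.randn(1, 1, 64, 64)                    # stand-in for a CT slice

logits, fmap = model(image)
fmap.retain_grad()                                   # keep gradients of the activations
class_idx = int(logits.argmax())
logits[0, class_idx].backward()                      # gradient of the target class score

# Grad-CAM: weight each feature map by the spatial mean of its gradient,
# combine, and keep only the positive evidence.
weights = fmap.grad.mean(dim=(2, 3), keepdim=True)   # (B, C, 1, 1)
cam = F.relu((weights * fmap).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
print("heatmap shape:", cam.shape)                   # overlayable on the input image
```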
E. Surrogate representation using XAI

Individual health-related factors that lead to disease prediction are effectively recognized by using XAI in the medical field through the local interpretable model-agnostic explanation (LIME) method [66]. This method provides explanations for any classifier by approximating it with a surrogate interpretable and locally faithful representation. Linear models in the neighborhood of an instance are learned by LIME in order to explain the perturbation of that instance [151]. LIME was utilized by Pan et al. [152] to examine the impact of additional cases on the prediction of central precocious puberty in children. For AI survival models, Kovalev et al. [153] came up with an approach called SurvLIME to describe them. According to the authors of [154], a local post-hoc explanation model was applied in their investigation of segmented suspicious lung objects. An AI system built by Panigutti and his colleagues [155] can forecast the patient's return to the hospital as well as their diagnosis and prescription order. The implemented system uses a rule-based explanation to train a local surrogate model, which may be mined employing a multi-label decision tree; this system is similar to LIME's implementation. For the purpose of predicting acute critical illness via the use of EHRs, Lauritsen et al. [156] investigated an XAI technique employing Layer-wise Relevance Propagation [157]. The white-box estimation needs to precisely reflect the black-box algorithm in order to provide a trustworthy description, which is why XAI uses surrogate representation so often. Comprehension by the physician may be compromised if the surrogate algorithms are very complex. Table II depicts an overview of the above methods. In addition, Table III presents different XAI algorithms, and Table IV describes particularly the SHAP XAI algorithm as reported in the literature.
TABLE II
XAI APPROACHES USED IN MEDICINE AND HEALTHCARE AS WELL AS THEIR TYPES
Type Method Post-hoc Intrinsic Local Global Application Ref.
Dimension Optimum feature ✘ ✔ ✘ ✔ Prediction of enhancers that are cell type specific [109]
reduction selection
using XAI Laplacian Eigenmaps ✘ ✔ ✘ ✔ Classification of Brain tumor from MRI [107]
Sparse-balanced SVM ✘ ✔ ✘ ✔ Diagnosis of type 2 diabetes at an early stage [158]
Cluster analysis and ✘ ✔ ✘ ✔ Classification of lung cancer patients [108]
LASSO
Sparse DL ✘ ✔ ✘ ✔ Glioblastoma multiforme long-term survival prediction [110]
Optimal feature ✘ ✔ ✘ ✔ Estimation of side effect for drug [106]
selection
Rule Decision rules ✘ ✔ ✘ ✔ Stroke Prediction [126]
extraction and Rule-based system ✘ ✔ ✘ ✔ Forecasts for 30-day readmissions and pneumonia risk. [14]
knowledge Mimic learning ✔ ✘ ✔ ✔ Predicting outcomes in the ICU for severe lung injury [120]


distillation Textual or visual ✔ ✘ ✔ ✔ Classification of breast mass [125]
using XAI justification
Visualization rules ✔ ✘ ✘ ✔ Clinical diagnosis of diabetes and breast cancer [121]
Lists of Bayesian rule ✘ ✔ ✘ ✔ Stroke prediction [120]
Fuzzy rules ✘ ✔ ✘ ✔ Prediction of in-hospital mortality for all cases [123]
Feature DeepLIFT ✔ ✘ ✘ ✔ Detection of splice site [133]
importance or Feature weighting ✔ ✘ ✘ ✔ Prediction of ICU mortality for all-causes [128]
selection
using XAI
DeepLIFT ✔ ✘ ✔ ✔ Ophthalmic diagnosis [137]
Feature marginalization ✔ ✘ ✔ ✔ Diagnosis of inflammatory or microbiota bowel diseases [131]
from skin and gut
Attention Attention ✘ ✔ ✔ ✘ Prediction of future hospitalization from EHR [143]
mechanism in ✘ ✔ ✔ ✔ ICU clinical events predictions [140]
XAI
✘ ✔ ✔ ✘ HIV genome integration site prediction [142]
✘ ✔ ✔ ✔ Prediction of clinical risk for cardiac cataract or failure [145]
✘ ✔ ✔ ✘ Prediction of heart failure [144]
MLCAM ✘ ✔ ✔ ✘ Localization of brain tumor [145]
Grad-CAM ✘ ✔ ✔ ✘ Appendicitis diagnosis [147]
Grad-CAM ✘ ✔ ✔ ✘ Hypoglycaemia detection using ECG [149]
Surrogate LIME ✔ ✘ ✔ ✘ Prediction of early puberty in the central region [152]
representation ✔ ✘ ✔ ✘ Construction of survival models [153]
using XAI
✔ ✘ ✔ ✘ Diagnosis of autism spectrum disorder [159]
Rule-based XAI ✔ ✘ ✔ ✘ Patient medications, diagnosis and readmission prediction [155]

TABLE III
APPLICATIONS OF XAI ALGORITHMS IN DIFFERENT SCENARIOS
Algorithm Ref. Location Mode Remarks
LRP [175] Brain MRI employs LRP to locate Alzheimer's disease-causing brain areas, where LRP is proved to be
more specific than guided backpropagation in detecting the disease areas.
Multiple instance [176] Dental Eye Fundus Image Presents a cutting-edge DL-based diabetic retinopathy grading system based
learning on a medically comprehensible explanation; and infers an image grade.
Trainable attention [177] Dental Eye Fundus Image proposes an attention-based CNN for removing high redundancy in fundus images for
glaucoma detection.
Trainable attention [178] Chest X-ray Proposes two novel neural networks for the detection of chest radiographs containing
pulmonary lesions; both the architectures make use of a large number of weakly-labelled
images combined with a smaller number of manually annotated X-rays.
Grad-CAM , LIME [179] Chest X-ray Presents several examples of CXRs with pneumonia infection, together with the results from
the grad-CAM and LIME; LIME's explanatory output is presented as the superpixels with
the highest positive weights superimposed on the original input.
CAM [180] Prostate Histology constructs a patch-wise predictive model using convolutional neural networks to detect
cancerous Gleason patterns, and examines the interpretability of trained models using Class
Activation Maps (CAMs).
Saliency propagation [181] Skin Vitiligo image Proposes a unique weakly supervised approach to segmenting vitiligo lesions with preserved
fine borders; the activation map can be paired with superpixel-based saliency propagation to
generate segmentation with well-preserved edges.
Guided [182] Gastrointestinal Endoscopy Uses Guided Backpropagation based Fully Convolutional Networks to investigate the pixels
backpropagation necessary for identifying polyps (FCNs).
TCAV [183] Cardiovascular MRI using TCAV (Testing with Concept Activation Vectors) to show which biomarkers are
already known to be linked to cardiac disease.
TCAV [184] Breast and Eye histopathology and proposing the addition of regression concept vector to TCAV in identifying tumor size, while
retinopathy TCAV represented binary ideas, regression concept vectors indicated continuous-valued
measurements.

TABLE IV
DISCUSSION OF SHAP XAI ALGORITHMS IN THE LITERATURE
Ref Disease Diagnosis ML Models Used Work Summary
[185] Chronic Obstructive Gradient boosting machine (GBM) The work ranks the attributes crucial to the GBM model according to their average
Pulmonary Disease absolute SHAP values that reflect the impact of each feature on a prediction.
[186] Parkinson's disease deep forest (gcForest), extreme The classifiers determine how much the SHAP value contributes to the features.
diagnosis gradient boosting (XGBoost), light When 150 features are considered, SHAP-gcForest achieves 91.78 percent
gradient boosting machine classification accuracy. With 50 features, SHAP-LightGBM combined with
(LightGBM) and random forest (RF) LightGBM yields 91.62% accuracy.
[187] predicting acute XGboost , Logistic Regression (LR) The SHAP value increases as creatinine levels climb until it reaches about 5 mg/dL.
kidney injury As the Furosemide Stress Test (FST) is increased until it reaches 100 ml/min, the
progression SHAP result decreases.

[188] COVID-19 Prediction XGboost model, LGBM, gradient Harris hawks optimization (HHO) approach is employed to fine-tune the
boosting classifier (GBC), hyperparameters of a few cutting-edge classifiers, including the HHO-RF, HHO-
categorical boosting (CatBoost), RF XGB, HHO-LGB, and ensemble methods, in order to increase classification
accuracy.
[189] Prediction of COVID- GBM A gradient-boosting machine model created with decision-tree base-learners was
19 diagnosis based on used to produce predictions. SHAP values calculate the contribution of each feature
symptoms to overall model predictions by average across samples.
[190] Efficient analysis of Support Vector Machine (SVM), Simple classification techniques like random forest can outperform keras with
COVID-19 Naive Bayes (NB), Multiple Linear Boruta for feature selection when the dataset size is not too large. Computing the
Regression (MLP), K-Nearest information gain values for each attribute in Clinical Data1 shows the importance of
Neighbors (KNN), RF, LR, and each attribute.
Decision Tree (DT), Keras Classifier
[191] COVID-19 Diagnosis SqueezeNet, LIME LIME and SHAP are compared for COVID diagnosis using SqueezeNet to recognize
pneumonia, COVID-19, and normal lung imaging. Results show LIME and SHAP
can boost the model's transparency and interpretability.
[192] COVID-19 vaccine Random Forest, XGBoost classifiers, CovidXAI predicts a user's risk group using Random Forest and XGBoost classifiers.
prioritization XAI CovidXAI uses 24 criteria to define an individual's risk category and vaccine
urgency.
[193] COVID-19 XGBoost, Random Forest The goal is to provide grounds for understanding the distinctive COVID-19
Pneumonia radiographic texture features using supervised ensemble ML methods based on trees
Classification through the interpretable SHAP approach. SHAP recursive feature elimination with
cross-validation is used to select features. The best classification model was
XGBoost, with an accuracy of 0.82 and a sensitivity of 0.82.
[194] Lung cancer hospital Random Forest, logistic regression This paper introduces a predictive LOS framework for lung cancer patients utilizing
length of stay ML models. Using SHAP, the output of the predictive model (RF) with SMOTE
prediction class balancing approach is understood demonstrating the most relevant clinical
factors that contributed to predicting lung cancer LOS using the RF model.
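Most of the SHAP analyses summarized in Table IV follow a common pattern, sketched below with placeholder data and model; the sketch assumes the shap package is installed and is not taken from any specific cited study.

```python
# Minimal sketch of a typical SHAP workflow for a tree-based clinical model.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)        # fast SHAP values for tree ensembles
shap_values = explainer.shap_values(X)

# Mean absolute SHAP value per feature gives a global importance ranking,
# which is how several of the works in Table IV report their results.
importance = abs(shap_values).mean(axis=0)
top5 = sorted(zip(X.columns, importance), key=lambda kv: -kv[1])[:5]
print(dict(top5))
```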

VI. DISCUSSION ON DIFFERENT ASPECTS OF XAI

It is still a work in progress to design and govern AI systems that can be trusted. We looked at the function that explainability plays in constructing trustworthy AI, since the two concepts are intertwined. A framework with practical suggestions for choosing between classes of XAI approaches is provided by us as an extension of previous recent surveys [12, 19, 85, 92, 113]. As a result, we presented useful definitions and made contributions to the literature on quantitative assessment measures. XAI is being discussed as a part of a broader effort to build confidence in AI.

Much work on automated diagnosis and prognosis has been done in the medical area using ML techniques [29]. We can see from grand-challenge.org that a number of medical issues have surfaced and energized ML and AI researchers. DL models [160], [161] that use U-Net for medical segmentation are effective. U-Net, on the other hand, is still a mystery since it is a DL neural network. There are several more approaches and special transformations (denoising, for example) that have been described in MICCAI papers, such as [162]. The issue of interpretability in medicine goes much beyond idle speculation. Medical interpretability contains risks and duties that other areas do not take into consideration [67]. There is a risk of death associated with medical decisions that are made. The pursuit of medical explainability has therefore resulted in several further publications [71]. To raise awareness about the significance of interpretability in the medical sector, these works summarize prior studies [163], consisting of subfield-specific reviews, e.g., [74] for sentiment analysis and [75] for chest radiography in medicine. As stated explicitly in [60], AI in dermatology is severely limited because it cannot deliver individualized assessments.

A. Developing XAI to create trustworthy AI

The reason for requiring explainability is that it dictates what ought to be communicated; a step-by-step guide for developers, with concrete diagram suggestions, was offered. The development of post-hoc explanation techniques that contain argumentation evidence for their assertions [164] might be a solution to the problem of misleading post-hoc explanations. Model-based explanations can be suitable when looking for a post-hoc explanation since they are complete and contain quantitative proxy criteria to assess their soundness. Many more studies are required to better understand the performance of explainable models in the healthcare industry [165] and to develop explainable modeling approaches such as rule-based systems or GAMs with or without interaction terms. Thus, the importance of feature engineering in the context of feature interpretation is further highlighted. Another potential research approach is to create hybrid techniques that combine knowledge-driven and data-driven aspects for feature engineering or feature selection. The quality of explanations is critical when utilizing XAI to build trustworthy AI. According to Poursabzi-Sangdeh et al. [84], greater interpretability is not always accompanied by improved results on human-machine tasks. We discovered that evaluating the clarity of local explanations is challenging and that there are currently no quantitative assessment measures for example-based methodologies. Even though interpretability is commonly recognized to be user-dependent, it has not been measured as such when assessing the quality of explanations [85]. Future research should focus on establishing a criterion for identifying when AI systems can be explained to a wide range of people.

B. Complementary approaches to the development of trustworthy healthcare AI

In some cases (e.g., [165]), explanations are neither essential nor adequate to build faith in AI. Perceived system capabilities, control, and predictability are other crucial factors [166]. Because of this, even though XAI has the potential to contribute to reliable AI, it is not without its limitations [167]. The following are some additional approaches that may be implemented in conjunction with AI in healthcare to ensure its trustworthiness:

The first one regards the quality of data. It is important to provide information on the quality of the data. Real-world data may include biases and inaccuracies, or be incomplete, since it was not gathered for study reasons. This means that knowing the quality of the data obtained and how it was acquired is at least as crucial to the final model's constraints as its explainability [168]. Kahn et al. [169] provide a generally recognized approach for determining whether EHR data is appropriate for a given use case. The second aspect is about conducting comprehensive testing. The use of standard data formats in health care has made external validation more viable, even if it is still considered an area for development in models of clinical risk prediction [87, 170]. Stability, fairness, and privacy are only a few of the topics being studied in this field. Another aspect is the governing law.

C. Trends in Existing Research

Some interesting trends are visible in the works reported in the literature. These studies have been described in detail in the previous section, and some of the insights are described in the following. The literature indicates that post-hoc and model-specific explanations can be used to categorize XAI model explanations. Gradient-based, perturbation-based, and rule-based attribution are three different methods. XAI can assist in explaining how the reduced features are connected to the original variables and how they impact the model's performance in the context of dimension reduction. Laplacian Eigenmaps, ICA, and PCA techniques for dimension reduction utilizing XAI are available. Other similar techniques are feature importance and partial dependence plots. Studies so far indicate that in the field of XAI, rule extraction and knowledge distillation are two crucial concepts. The permutation importance method is one of the most widely used XAI strategies for feature importance. Other XAI techniques for determining feature importance are SHAP, partial dependence plots, and methods based on decision trees, including the Gini index and information gain. An electrocardiogram experiment with Grad-CAM can be used to diagnose hypoglycemia by combining RNN and CNN. In XAI, examples of surrogate models found in the literature are decision trees and linear regression models.

Several XAI techniques are computationally demanding and may be challenging to scale to large datasets or complex models. There is still a shortage of interaction between XAI methods and useful systems, and there is not enough information on how humans interpret and apply XAI explanations. However, some recent studies discuss the idea of federated learning of XAI models, which was created specifically to fulfill these two criteria simultaneously. Recent studies also consider the application of XAI in the field of medical data standardization, which is still a challenging issue in many underdeveloped or developing countries. Moreover, some studies are focused on developing hybrid approaches of XAI where individual XAI techniques are combined for better performance. Furthermore, some recent works are concentrating on research intersecting quantum computing and XAI, as quantum computing has the potential to improve the performance of XAI for healthcare.

D. Challenges and Future Prospects

Intractable systems may be simplified using models. A lack of utilization of DNNs with a large number of parameters may result in under-utilized ML capabilities. It is feasible to take advantage of the excitement around predictive science by examining whether it is viable to add new, more complex components to an existing model, such that sections of these components may be recognized as fresh insights. The enhanced model must, of course, be similar to earlier demonstrations and models, with a clear interpretation of why the additional components correlate to formerly ignored insights.

Some approaches that are deeply embedded in particular industries are still widely used, while powerful ML algorithms are finding new uses. Fragmented and experimental implementations of existing or custom-made interpretable techniques are common in medical ML, which is still in its infancy. We may still have large undiscovered prospects in interpretability research despite the current emphasis on increasing accuracy and performance in feature selection and extraction.

VII. CONCLUSIONS

This review paper summarizes the state-of-the-art, evaluates the level of quality of recent research, identifies the research gap, and makes recommendations for future studies that can significantly develop XAI. In particular, this review focuses on the application of XAI techniques to the healthcare industry. Discussions are provided on the importance of using XAI, the scenarios where XAI is suitable, and the methods of implementing XAI. In this regard, this paper provides an overview of the literature, connecting various perspectives and advocating specific design solutions. It is discussed here that AI has the potential to alter medical practice and the role of physicians. Moreover, explainable models may be favored over post-hoc explanations when XAI is used to develop trustworthy AI for healthcare. Rigorous regulation and validation may be required in addition to reporting data quality and extensive validation, as limited evidence supports the practical utility of explainability. In the future, efforts will be necessary to address the problems and concerns that have been raised with XAI.

APPENDIX

((''Explainable AI'') OR (''Interpretable AI'') OR (''Explainable Artificial Intelligence'') OR (''Interpretable Artificial Intelligence'') OR (''XAI'')) AND ((''medical'') OR (''healthcare''))

ACKNOWLEDGMENT
ACKNOWLEDGMENT
This work has been carried out at the Institute of Information and Communication Technology (IICT) of Bangladesh University of Engineering and Technology (BUET), Bangladesh. Hence, the authors would like to thank BUET for providing the support.

REFERENCES
[19] A. B. Arrieta et al., "Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI," Information Fusion, vol. 58, pp. 82-115, 2020.
[20] R. Basnet, M. O. Ahmad, and M. N. S. Swamy, "A deep dense residual network with reduced parameters for volumetric brain tissue segmentation from MR images," Biomedical Signal Processing and Control, vol. 70, p. 103063, 2021.
[21] S. Bharati and M. R. Hossain Mondal, "12 Applications and
challenges of AI-driven IoHT for combating pandemics: a
[1] A. Gupta, A. Anpalagan, L. Guan, and A. S. Khwaja, "Deep learning reviewComputational Intelligence for Managing Pandemics," A.
for object detection and scene perception in self-driving cars: Khamparia, R. Hossain Mondal, P. Podder, B. Bhushan, V. H. C. d.
Survey, challenges, and open issues," Array, vol. 10, p. 100057, Albuquerque, and S. Kumar, Eds.: De Gruyter, 2021, pp. 213-230.
2021. [22] P. Podder, S. Bharati, M. R. H. Mondal, and U. Kose, "Application
[2] S. Bharati, P. Podder, and M. R. H. Mondal, "Artificial Neural of Machine Learning for the Diagnosis of COVID-19," in Data
Network Based Breast Cancer Screening: A Comprehensive Science for COVID-19: Elsevier, 2021, pp. 175-194.
Review," International Journal of Computer Information Systems [23] E. Jabason, M. O. Ahmad, and M. N. S. Swamy, "Classification of
and Industrial Management Applications, vol. 12, pp. 125-137, Alzheimer’s Disease from MRI Data Using a Lightweight Deep
2020. Convolutional Model," In 2022 IEEE International Symposium on
[3] M. R. H. Mondal, S. Bharati, and P. Podder, "CO-IRv2: Optimized Circuits and Systems (ISCAS), pp. 1279-1283. IEEE, 2022.
InceptionResNetV2 for COVID-19 detection from chest CT [24] S. Bharati, P. Podder, M. Mondal, and V. B. Prasath, "CO-ResNet:
images," PloS one, vol. 16, no. 10, p. e0259179, 2021. Optimized ResNet model for COVID-19 diagnosis from X-ray
[4] S. Bharati, P. Podder, and M. R. H. Mondal, "Hybrid deep learning images," International Journal of Hybrid Intelligent Systems, vol.
for detecting lung diseases from X-ray images," Informatics in 17, pp. 71-85, 2021.
Medicine Unlocked, p. 100391, 2020. [25] M. R. H. Mondal, S. Bharati, P. Podder, and P. Podder, "Data
[5] A. Rajkomar et al., "Scalable and accurate deep learning with analytics for novel coronavirus disease," Informatics in Medicine
electronic health records," NPJ Digital Medicine, vol. 1, no. 1, pp. Unlocked, vol. 20, p. 100374, 2020.
1-10, 2018. [26] G. Yang, Q. Ye, and J. Xia, "Unbox the black-box for the medical
[6] S. Tonekaboni, S. Joshi, M. D. McCradden, and A. Goldenberg, explainable AI via multi-modal and multi-centre data fusion: A
"What clinicians want: contextualizing explainable machine mini-review, two showcases and beyond," Information Fusion, vol.
learning for clinical end use," presented at the Machine learning for 77, pp. 29-52, 2022/01/01/ 2022.
healthcare conference, 2019, 2019. [27] B. Mahbooba, M. Timilsina, R. Sahal, and M. Serrano, "Explainable
[7] E. D. Peterson, "Machine learning, predictive analytics, and clinical artificial intelligence (xai) to enhance trust management in intrusion
practice: can the past inform the present?," Jama, vol. 322, no. 23, detection systems using decision tree model," Complexity, vol.
pp. 2283-2284, 2019. 2021, 2021.
[8] J. He, S. L. Baxter, J. Xu, J. Xu, X. Zhou, and K. Zhang, "The [28] S. R. Islam, W. Eberle, and S. K. Ghafoor, "Towards quantification
practical implementation of artificial intelligence technologies in of explainability in explainable artificial intelligence methods," In
medicine," Nature medicine, vol. 25, no. 1, pp. 30-36, 2019. The thirty-third international flairs conference 2020, 2020.
[9] Z. C. Lipton, "The Mythos of Model Interpretability: In machine [29] F. Jiang et al., "Artificial intelligence in healthcare: past, present and
learning, the concept of interpretability is both important and future," Stroke and vascular neurology, vol. 2, no. 4, pp. 230-243,
slippery," Queue, vol. 16, no. 3, pp. 31-57, 2018. 2017.
[10] J. Burrell, "How the machine ‘thinks’: Understanding opacity in [30] T. Panch, H. Mattie, and L. A. Celi, "The “inconvenient truth” about
machine learning algorithms," Big Data & Society, vol. 3, no. 1, p. AI in healthcare," NPJ digital medicine, vol. 2, no. 1, pp. 1-3, 2019.
2053951715622512, 2016. [31] K.-H. Yu, A. L. Beam, and I. S. Kohane, "Artificial intelligence in
[11] F. Doshi-Velez and B. Kim, "Considerations for evaluation and healthcare," Nature biomedical engineering, vol. 2, no. 10, pp. 719-
generalization in interpretable machine learning," in Explainable 731, 2018.
and interpretable models in computer vision and machine learning: [32] D. Shen, G. Wu, and H.-I. Suk, "Deep learning in medical image
Springer, 2018, pp. 3-17. analysis," Annual review of biomedical engineering, vol. 19, pp.
[12] E. J. Hwang et al., "Development and validation of a deep learning– 221-248, 2017.
based automated detection algorithm for major thoracic diseases on [33] G. Litjens et al., "A survey on deep learning in medical image
chest radiographs," JAMA network open, vol. 2, no. 3, pp. e191095- analysis," Medical image analysis, vol. 42, pp. 60-88, 2017.
e191095, 2019. [34] J. Ker, L. Wang, J. Rao, and T. Lim, "Deep learning applications in
[13] J. G. Nam et al., "Development and validation of deep learning– medical image analysis," Ieee Access, vol. 6, pp. 9375-9389, 2017.
based automatic detection algorithm for malignant pulmonary [35] S. E. Dilsizian and E. L. Siegel, "Artificial intelligence in medicine
nodules on chest radiographs," Radiology, vol. 290, no. 1, pp. 218- and cardiac imaging: harnessing big data and advanced computing
228, 2019. to provide personalized medical diagnosis and treatment," Current
[14] K. J. Geras et al., "High-resolution breast cancer screening with cardiology reports, vol. 16, no. 1, p. 441, 2014.
multi-view deep convolutional neural networks," arXiv preprint [36] V. L. Patel et al., "The coming of age of artificial intelligence in
arXiv:1703.07047, 2017. medicine," Artificial intelligence in medicine, vol. 46, no. 1, pp. 5-
[15] A. Khamparia et al., "Diagnosis of Breast Cancer Based on Modern 17, 2009.
Mammography using Hybrid Transfer Learning," Multidimensional [37] S. Jha and E. J. Topol, "Adapting to artificial intelligence:
Systems and Signal Processing, 2020. radiologists and pathologists as information specialists," Jama, vol.
[16] M. R. H. Mondal, S. Bharati, and P. Podder, "Diagnosis of COVID- 316, no. 22, pp. 2353-2354, 2016.
19 Using Machine Learning and Deep Learning: A Review," [38] E. Strickland, "IBM Watson, heal thyself: How IBM overpromised
Current Medical Imaging, 2021. and underdelivered on AI health care," IEEE Spectrum, vol. 56, no.
[17] S. Bharati, P. Podder, M. Mondal, and V. B. Prasath, "Medical 4, pp. 24-31, 2019.
Imaging with Deep Learning for COVID-19 Diagnosis: A [39] N. S. Weingart, R. M. Wilson, R. W. Gibberd, and B. Harrison,
Comprehensive Review," International Journal of Computer "Epidemiology of medical error," Bmj, vol. 320, no. 7237, pp. 774-
Information Systems and Industrial Management Applications, vol. 777, 2000.
13, pp. 91 - 112, 2021. [40] M. L. Graber, N. Franklin, and R. Gordon, "Diagnostic error in
[18] E. J. Topol, "High-performance medicine: the convergence of internal medicine," Archives of internal medicine, vol. 165, no. 13,
human and artificial intelligence," Nature medicine, vol. 25, no. 1, pp. 1493-1499, 2005.
pp. 44-56, 2019. [41] I. Doroniewicz et al., "Computer-based analysis of spontaneous
infant activity: A pilot study," in Information Technology in
Biomedicine: Springer, 2021, pp. 147-159.
[42] S. Sachan, F. Almaghrabi, J.-B. Yang, and D.-L. Xu, "Evidential [64] C. Roggeman, W. Fias, and T. Verguts, "Salience maps in parietal
reasoning for preprocessing uncertain categorical data for cortex: imaging and computational modeling," Neuroimage, vol. 52,
trustworthy decisions: An application on healthcare and finance," no. 3, pp. 1005-1014, 2010.
Expert Systems with Applications, vol. 185, p. 115597, 2021. [65] M. Aminu, N. A. Ahmad, and M. H. M. Noor, "Covid-19 detection
[43] D. Li et al., "Semi-supervised variational reasoning for medical via deep neural network and occlusion sensitivity maps," Alexandria
dialogue generation," presented at the Proceedings of the 44th Engineering Journal, vol. 60, no. 5, pp. 4829-4855, 2021.
International ACM SIGIR Conference on Research and [66] M. T. Ribeiro, S. Singh, and C. Guestrin, "Why should i trust you?"
Development in Information Retrieval, 2021, 2021. Explaining the predictions of any classifier," presented at the
[44] A. Rajkomar, J. Dean, and I. Kohane, "Machine learning in Proceedings of the 22nd ACM SIGKDD international conference on
medicine," New England Journal of Medicine, vol. 380, no. 14, pp. knowledge discovery and data mining, 2016, 2016.
1347-1358, 2019. [67] P. Croskerry, K. Cosby, M. L. Graber, and H. Singh, Diagnosis:
[45] K.-H. Yu and M. Snyder, "Omics profiling in precision oncology," Interpreting the shadows. CRC Press, 2017.
Molecular & Cellular Proteomics, vol. 15, no. 8, pp. 2525-2536, [68] T. P. Quinn, S. Jacobs, M. Senadeera, V. Le, and S. Coghlan, "The
2016. three ghosts of medical AI: Can the black-box present deliver?,"
[46] K. Roberts et al., "Biomedical informatics advancing the national Artificial Intelligence in Medicine, p. 102158, 2021.
health agenda: the AMIA 2015 year-in-review in clinical and [69] Z. Zhang, Y. Xie, F. Xing, M. McGough, and L. Yang, "Mdnet: A
consumer informatics," Journal of the American Medical semantically and visually interpretable medical image diagnosis
Informatics Association, vol. 24, no. e1, pp. e185-e190, 2017. network," presented at the Proceedings of the IEEE conference on
[47] W. Rogers et al., "Radiomics: from qualitative to quantitative computer vision and pattern recognition, 2017, 2017.
imaging," The British journal of radiology, vol. 93, no. 1108, p. [70] A. Holzinger, C. Biemann, C. S. Pattichis, and D. B. Kell, "What do
20190948, 2020. we need to build explainable AI systems for the medical domain?,"
[48] S. Bharati, P. Podder, M. R. H. Mondal, and N. Gandhi, "Optimized arXiv preprint arXiv:1712.09923, 2017.
NASNet for Diagnosis of COVID-19 from Lung CT Images," [71] A. Holzinger, G. Langs, H. Denk, K. Zatloukal, and H. Müller,
presented at the 20th International Conference on Intelligent "Causability and explainability of artificial intelligence in
Systems Design and Applications (ISDA 2020), 2020. medicine," Wiley Interdisciplinary Reviews: Data Mining and
[49] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. MIT Knowledge Discovery, vol. 9, no. 4, p. e1312, 2019.
press, 2016. [72] M. S. Hossain, G. Muhammad, and N. Guizani, "Explainable AI and
[50] Y. E. Wang, G.-Y. Wei, and D. Brooks, "Benchmarking tpu, gpu, mass surveillance system-based healthcare framework to combat
and cpu platforms for deep learning," arXiv preprint COVID-I9 like pandemics," IEEE Network, vol. 34, no. 4, pp. 126-
arXiv:1907.10701, 2019. 132, 2020.
[51] K. Tomczak, P. Czerwińska, and M. Wiznerowicz, "The Cancer [73] E. Khodabandehloo, D. Riboni, and A. Alimohammadi,
Genome Atlas (TCGA): an immeasurable source of knowledge," "HealthXAI: Collaborative and explainable AI for supporting early
Contemporary oncology, vol. 19, no. 1A, p. A68, 2015. diagnosis of cognitive decline," Future Generation Computer
[52] C. Sudlow et al., "UK biobank: an open access resource for Systems, vol. 116, pp. 168-189, 2021.
identifying the causes of a wide range of complex diseases of middle [74] K. Kallianos et al., "How far have we come? Artificial intelligence
and old age," PLoS medicine, vol. 12, no. 3, p. e1001779, 2015. for chest radiograph interpretation," Clinical radiology, vol. 74, no.
[53] E. Williams et al., "Image Data Resource: a bioimage data 5, pp. 338-345, 2019.
integration and publication platform," Nature methods, vol. 14, no. [75] C. Zucco, H. Liang, G. Di Fatta, and M. Cannataro, "Explainable
8, pp. 775-781, 2017. sentiment analysis with applications in medicine," presented at the
[54] A. Esteva et al., "A guide to deep learning in healthcare," Nature 2018 IEEE International Conference on Bioinformatics and
medicine, vol. 25, no. 1, pp. 24-29, 2019. Biomedicine (BIBM), 2018, 2018.
[55] D. Schönberger, "Artificial intelligence in healthcare: a critical [76] G. Stiglic, S. Kocbek, I. Pernek, and P. Kokol, "Comprehensive
analysis of the legal and ethical implications," International Journal decision tree models in bioinformatics," PloS one, vol. 7, no. 3, p.
of Law and Information Technology, vol. 27, no. 2, pp. 171-203, e33812, 2012.
2019. [77] X. Peng, Y. Li, I. Tsang, H. Zhu, J. Lv, and J. T. Zhou, "XAI Beyond
[56] D. Higgins and V. I. Madai, "From bit to bedside: a practical Classification: Interpretable Neural Clustering," Journal of Machine
framework for artificial intelligence product development in Learning Research, vol. 22, pp. 1-28, 2021.
healthcare," Advanced intelligent systems, vol. 2, no. 10, p. [78] D. Gunning, M. Stefik, J. Choi, T. Miller, S. Stumpf, and G.-Z.
2000052, 2020. Yang, "XAI—Explainable artificial intelligence," Science Robotics,
[57] M. DeCamp and C. Lindvall, "Latent bias and the implementation vol. 4, no. 37, p. eaay7120, 2019.
of artificial intelligence in medicine," Journal of the American [79] G. Valdes, J. M. Luna, E. Eaton, C. B. Simone, L. H. Ungar, and T.
Medical Informatics Association, vol. 27, no. 12, pp. 2020-2023, D. Solberg, "MediBoost: a patient stratification tool for interpretable
2020. decision making in the era of precision medicine," Scientific reports,
[58] P. Esmaeilzadeh, "Use of AI-based tools for healthcare purposes: a vol. 6, no. 1, pp. 1-8, 2016.
survey study from consumers’ perspectives," BMC medical [80] E. Tjoa and C. Guan, "A survey on explainable artificial intelligence
informatics and decision making, vol. 20, no. 1, pp. 1-19, 2020. (xai): Toward medical xai," IEEE transactions on neural networks
[59] J. R. England and P. M. Cheng, "Artificial intelligence for medical and learning systems, vol. 32, no. 11, pp. 4793-4813, 2020.
image analysis: a guide for authors and reviewers," American [81] S. N. Payrovnaziri et al., "Explainable artificial intelligence models
journal of roentgenology, vol. 212, no. 3, pp. 513-519, 2019. using real-world electronic health record data: a systematic scoping
[60] A. Gomolin, E. Netchiporouk, R. Gniadecki, and I. V. Litvinov, review," Journal of the American Medical Informatics Association,
"Artificial intelligence applications in dermatology: where do we vol. 27, no. 7, pp. 1173-1185, 2020.
stand?," Frontiers in medicine, vol. 7, p. 100, 2020. [82] M. Pocevičiūtė, G. Eilertsen, and C. Lundström, "Survey of XAI in
[61] D. C. Somers and S. L. Sheremata, "Attention maps in the brain," digital pathology," in Artificial intelligence and machine learning
Wiley Interdisciplinary Reviews: Cognitive Science, vol. 4, no. 4, for digital pathology: Springer, 2020, pp. 56-88.
pp. 327-340, 2013. [83] A. B. Tosun, F. Pullara, M. J. Becich, D. Taylor, S. C.
[62] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and Chennubhotla, and J. L. Fine, "Histomapr™: An explainable ai (xai)
D. Batra, "Grad-cam: Visual explanations from deep networks via platform for computational pathology solutions," in Artificial
gradient-based localization," presented at the Proceedings of the Intelligence and Machine Learning for Digital Pathology: Springer,
IEEE international conference on computer vision, 2017, 2017. 2020, pp. 204-227.
[63] B. N. Patro, M. Lunayach, S. Patel, and V. P. Namboodiri, "U-cam: [84] F. Doshi-Velez and B. Kim, "Towards a rigorous science of
Visual explanation using uncertainty based class activation maps," interpretable machine learning," arXiv preprint arXiv:1702.08608,
presented at the Proceedings of the IEEE/CVF International 2017.
Conference on Computer Vision, 2019, 2019. [85] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, and
D. Pedreschi, "A survey of methods for explaining black box
models," ACM computing surveys (CSUR), vol. 51, no. 5, pp. 1-42, [109] S. G. Kim, N. Theera-Ampornpunt, C.-H. Fang, M. Harwani, A.
2018. Grama, and S. Chaterji, "Opening up the blackbox: an interpretable
[86] M. Nazar, M. M. Alam, E. Yafi, and M. S. Mazliham, "A Systematic deep neural network-based classifier for cell-type specific enhancer
Review of Human-Computer Interaction and Explainable Artificial predictions," BMC systems biology, vol. 10, no. 2, pp. 243-258,
Intelligence in Healthcare with Artificial Intelligence Techniques," 2016.
IEEE Access, 2021. [110] J. Hao, Y. Kim, T.-K. Kim, and M. Kang, "PASNet: pathway-
[87] A. Das and P. Rad, "Opportunities and challenges in explainable associated sparse deep neural network for prognosis prediction from
artificial intelligence (xai): A survey," arXiv preprint high-throughput data," BMC bioinformatics, vol. 19, no. 1, pp. 1-13,
arXiv:2006.11371, 2020. 2018.
[88] J. Huysmans, K. Dejaeger, C. Mues, J. Vanthienen, and B. Baesens, [111] M. Wu, M. Hughes, S. Parbhoo, M. Zazzi, V. Roth, and F. Doshi-
"An empirical evaluation of the comprehensibility of decision table, Velez, "Beyond sparsity: Tree regularization of deep models for
tree and rule based predictive models," Decision Support Systems, interpretability," presented at the Proceedings of the AAAI
vol. 51, no. 1, pp. 141-154, 2011. conference on artificial intelligence, 2018, 2018.
[89] I. Lage et al., "An evaluation of the human-interpretability of [112] A. Polino, R. Pascanu, and D. Alistarh, "Model compression via
explanation," arXiv preprint arXiv:1902.00006, 2019. distillation and quantization," arXiv preprint arXiv:1802.05668,
[90] F. Poursabzi-Sangdeh, D. G. Goldstein, J. M. Hofman, J. W. 2018.
Wortman Vaughan, and H. Wallach, "Manipulating and measuring [113] D. V. Carvalho, E. M. Pereira, and J. S. Cardoso, "Machine learning
model interpretability," presented at the Proceedings of the 2021 interpretability: A survey on methods and metrics," Electronics, vol.
CHI conference on human factors in computing systems, 2021, 8, no. 8, p. 832, 2019.
2021. [114] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a
[91] D. Slack, S. A. Friedler, C. Scheidegger, and C. D. Roy, "Assessing neural network," arXiv preprint arXiv:1503.02531, vol. 2, no. 7,
the local interpretability of machine learning models," arXiv 2015.
preprint arXiv:1902.03501, 2019. [115] N. Frosst and G. Hinton, "Distilling a neural network into a soft
[92] S. Mohseni, N. Zarei, and E. D. Ragan, "A survey of evaluation decision tree," arXiv preprint arXiv:1711.09784, 2017.
methods and measures for interpretable machine learning," arXiv [116] X. Yang, H. Zhang, and J. Cai, "Auto-encoding and distilling scene
preprint arXiv:1811.11839, vol. 1, 2018. graphs for image captioning," IEEE Transactions on Pattern
[93] H. Lakkaraju, E. Kamar, R. Caruana, and J. Leskovec, "Interpretable Analysis and Machine Intelligence, 2020.
& explorable approximations of black box models," arXiv preprint [117] X.-H. Li et al., "A survey of data-driven and knowledge-aware
arXiv:1707.01154, 2017. explainable ai," IEEE Transactions on Knowledge and Data
[94] M. Barandas, D. Folgado, R. Santos, R. Simão, and H. Gamboa, Engineering, vol. 34, no. 1, pp. 29-49, 2020.
"Uncertainty-Based Rejection in Machine Learning: Implications [118] H. Lakkaraju, E. Kamar, R. Caruana, and J. Leskovec, "Faithful and
for Model Development and Interpretability," Electronics, vol. 11, customizable explanations of black box models," presented at the
no. 3, p. 396, 2022. Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and
[95] C. Molnar, G. Casalicchio, and B. Bischl, "Quantifying Society, 2019, 2019.
interpretability of arbitrary machine learning models through [119] R. Caruana, Y. Lou, J. Gehrke, P. Koch, M. Sturm, and N. Elhadad,
functional decomposition," 2019. "Intelligible models for healthcare: Predicting pneumonia risk and
[96] L. Arras, A. Osman, K.-R. Müller, and W. Samek, "Evaluating hospital 30-day readmission," presented at the 21th ACM SIGKDD
recurrent neural network explanations," arXiv preprint international conference on knowledge discovery and data mining,
arXiv:1904.11829, 2019. 2015.
[97] G. Montavon, S. Lapuschkin, A. Binder, W. Samek, and K.-R. [120] B. Letham, C. Rudin, T. H. McCormick, and D. Madigan,
Müller, "Explaining nonlinear classification decisions with deep "Interpretable classifiers using rules and bayesian analysis: Building
taylor decomposition," Pattern recognition, vol. 65, pp. 211-222, a better stroke prediction model," The Annals of Applied Statistics,
2017. vol. 9, no. 3, pp. 1350-1371, 2015.
[98] S. Hooker, D. Erhan, P.-J. Kindermans, and B. Kim, "A benchmark [121] Y. Ming, H. Qu, and E. Bertini, "Rulematrix: Visualizing and
for interpretability methods in deep neural networks," Advances in understanding classifiers with rules," IEEE transactions on
neural information processing systems, vol. 32, 2019. visualization and computer graphics, vol. 25, no. 1, pp. 342-352,
[99] M. Sundararajan, A. Taly, and Q. Yan, "Axiomatic attribution for 2018.
deep networks," pp. 3319-3328: PMLR. [122] C. Xiao, T. Ma, A. B. Dieng, D. M. Blei, and F. Wang,
[100] G. Montavon, W. Samek, and K.-R. Müller, "Methods for "Readmission prediction via deep contextual embedding of clinical
interpreting and understanding deep neural networks," Digital concepts," PloS one, vol. 13, no. 4, p. e0195024, 2018.
Signal Processing, vol. 73, pp. 1-15, 2018. [123] R. Davoodi and M. H. Moradi, "Mortality prediction in intensive
[101] M. Ancona, E. Ceolini, C. Öztireli, and M. Gross, "Towards better care units (ICUs) using a deep rule-based fuzzy classifier," Journal
understanding of gradient-based attribution methods for deep neural of biomedical informatics, vol. 79, pp. 48-59, 2018.
networks," arXiv preprint arXiv:1711.06104, 2017. [124] D. Das, J. Ito, T. Kadowaki, and K. Tsuda, "An interpretable
[102] S. Zhang, J. R. Leistico, R. J. Cho, J. B. Cheng, and J. S. Song, machine learning model for diagnosis of Alzheimer's disease,"
"Spectral clustering of single-cell multi-omics data on multilayer PeerJ, vol. 7, p. e6543, 2019.
graphs," bioRxiv, 2022. [125] H. Lee, S. T. Kim, and Y. M. Ro, "Generation of multimodal
[103] M. Belkin and P. Niyogi, "Laplacian eigenmaps and spectral justification using visual word constraint model for explainable
techniques for embedding and clustering," Advances in neural computer-aided diagnosis," in Interpretability of Machine
information processing systems, vol. 14, 2001. Intelligence in Medical Image Computing and Multimodal Learning
[104] P. Comon, "Independent component analysis, a new concept?," for Clinical Decision Support: Springer, 2019, pp. 21-29.
Signal processing, vol. 36, no. 3, pp. 287-314, 1994. [126] N. Prentzas, A. Nicolaides, E. Kyriacou, A. Kakas, and C. Pattichis,
[105] R. Bro and A. K. Smilde, "Principal component analysis," "Integrating machine learning with symbolic reasoning to build an
Analytical methods, vol. 6, no. 9, pp. 2812-2831, 2014. explainable ai model for stroke prediction," presented at the 2019
[106] W. Zhang, F. Liu, L. Luo, and J. Zhang, "Predicting drug side effects IEEE 19th International Conference on Bioinformatics and
by multi-label learning and ensemble learning," BMC Bioengineering (BIBE), 2019, 2019.
bioinformatics, vol. 16, no. 1, pp. 1-11, 2015. [127] P. Linardatos, V. Papastefanopoulos, and S. Kotsiantis,
[107] G. Yang, F. Raschke, T. R. Barrick, and F. A. Howe, "Manifold "Explainable ai: A review of machine learning interpretability
Learning in MR spectroscopy using nonlinear dimensionality methods," Entropy, vol. 23, no. 1, p. 18, 2021.
reduction and unsupervised clustering," Magnetic resonance in [128] W. Ge, J.-W. Huh, Y. R. Park, J.-H. Lee, Y.-H. Kim, and A. Turchin,
medicine, vol. 74, no. 3, pp. 868-878, 2015. "An Interpretable ICU Mortality Prediction Model Based on
[108] L. P. Zhao and H. Bolouri, "Object-oriented regression for building Logistic Regression and Recurrent Neural Networks with LSTM
predictive models with high dimensional omics data from units," presented at the AMIA Annual Symposium Proceedings,
translational studies," Journal of biomedical informatics, vol. 60, 2018, 2018.
pp. 431-445, 2016.
[129] M. Stojadinovic, B. Milicevic, and S. Jankovic, "Improved for hypoglycemic events detection based on ECG," Scientific
predictive performance of prostate biopsy collaborative group risk reports, vol. 10, no. 1, pp. 1-16, 2020.
calculator when based on automated machine learning," Computers [150] S. Hu et al., "Weakly supervised deep learning for covid-19
in Biology and Medicine, vol. 138, p. 104903, 2021. infection detection and classification from ct images," IEEE Access,
[130] S. M. Lundberg et al., "From local explanations to global 2020.
understanding with explainable AI for trees," Nature machine [151] Y. Liang, S. Li, C. Yan, M. Li, and C. Jiang, "Explaining the black-
intelligence, vol. 2, no. 1, pp. 56-67, 2020. box model: A survey of local interpretation methods for deep neural
[131] A. Eck et al., "Interpretation of microbiota-based diagnostics by networks," Neurocomputing, vol. 419, pp. 168-182, 2021.
explaining individual classifier decisions," BMC bioinformatics, [152] L. Pan et al., "Development of prediction models using machine
vol. 18, no. 1, pp. 1-13, 2017. learning algorithms for girls with suspected central precocious
[132] A. Shrikumar, P. Greenside, and A. Kundaje, "Learning important puberty: retrospective study," JMIR medical informatics, vol. 7, no.
features through propagating activation differences," presented at 1, p. e11728, 2019.
the International conference on machine learning, 2017, 2017. [153] M. S. Kovalev, L. V. Utkin, and E. M. Kasimov, "SurvLIME: A
[133] J. Zuallaert, F. Godin, M. Kim, A. Soete, Y. Saeys, and W. De Neve, method for explaining machine learning survival models,"
"SpliceRover: interpretable convolutional neural networks for Knowledge-Based Systems, vol. 203, p. 106164, 2020.
improved splice site prediction," Bioinformatics, vol. 34, no. 24, pp. [154] A. Meldo, L. Utkin, M. Kovalev, and E. Kasimov, "The natural
4180-4188, 2018. language explanation algorithms for the lung cancer computer-aided
[134] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, and diagnosis system," Artificial Intelligence in Medicine, vol. 108, p.
W. Samek, "On pixel-wise explanations for non-linear classifier 101952, 2020.
decisions by layer-wise relevance propagation," PloS one, vol. 10, [155] C. Panigutti, A. Perotti, and D. Pedreschi, "Doctor XAI: an
no. 7, p. e0130140, 2015. ontology-based approach to black-box sequential data classification
[135] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, explanations," presented at the Proceedings of the 2020 conference
"Striving for simplicity: The all convolutional net," arXiv preprint on fairness, accountability, and transparency, 2020, 2020.
arXiv:1412.6806, 2014. [156] S. M. Lauritsen et al., "Explainable artificial intelligence model to
[136] H. Chen, S. Lundberg, and S.-I. Lee, "Explaining models by predict acute critical illness from electronic health records," Nature
propagating Shapley values of local components," in Explainable AI communications, vol. 11, no. 1, pp. 1-11, 2020.
in Healthcare and Medicine: Springer, 2021, pp. 261-270. [157] G. Montavon, A. Binder, S. Lapuschkin, W. Samek, and K.-R.
[137] A. Singh, A. R. Mohammed, J. Zelek, and V. Lakshminarayanan, Müller, "Layer-wise relevance propagation: an overview,"
"Interpretation of deep learning using attributions: application to Explainable AI: interpreting, explaining and visualizing deep
ophthalmic diagnosis," presented at the Applications of machine learning, pp. 193-209, 2019.
learning 2020, 2020. [158] M. Bernardini, L. Romeo, P. Misericordia, and E. Frontoni,
[138] L. Putelli, A. E. Gerevini, A. Lavelli, and I. Serina, "Applying self- "Discovering the type 2 diabetes in electronic health records using
interaction attention for extracting drug-drug interactions," the sparse balanced support vector machine," IEEE Journal of
presented at the International Conference of the Italian Association Biomedical and Health Informatics, vol. 24, no. 1, pp. 235-246,
for Artificial Intelligence, 2019, 2019. 2019.
[139] D. Mascharka, P. Tran, R. Soklaski, and A. Majumdar, [159] S. Ghafouri-Fard, M. Taheri, M. D. Omrani, A. Daaee, H.
"Transparency by design: Closing the gap between performance and Mohammad-Rahimi, and H. Kazazi, "Application of single-
interpretability in visual reasoning," presented at the Proceedings of nucleotide polymorphisms in the diagnosis of autism spectrum
the IEEE conference on computer vision and pattern recognition, disorders: a preliminary study with artificial neural networks,"
2018, 2018. Journal of Molecular Neuroscience, vol. 68, no. 4, pp. 515-521,
[140] D. A. Kaji et al., "An attention based deep learning model of clinical 2019.
events in the intensive care unit," PloS one, vol. 14, no. 2, p. [160] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional
e0211057, 2019. networks for biomedical image segmentation," pp. 234-241:
[141] B. Shickel, T. J. Loftus, L. Adhikari, T. Ozrazgat-Baslanti, A. Springer.
Bihorac, and P. Rashidi, "DeepSOFA: a continuous acuity score for [161] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O.
critically ill patients using clinically interpretable deep learning," Ronneberger, "3D U-Net: learning dense volumetric segmentation
Scientific reports, vol. 9, no. 1, pp. 1-12, 2019. from sparse annotation," presented at the International conference
[142] H. Hu et al., "DeepHINT: understanding HIV-1 integration via deep on medical image computing and computer-assisted intervention,
learning with attention," Bioinformatics, vol. 35, no. 10, pp. 1660- 2016, 2016.
1667, 2019. [162] C. Ulas, G. Tetteh, S. Kaczmarz, C. Preibisch, and B. H. Menze,
[143] J. Zhang, K. Kowsari, J. H. Harrison, J. M. Lobo, and L. E. Barnes, "DeepASL: Kinetic model incorporated loss for denoising arterial
"Patient2vec: A personalized interpretable deep representation of spin labeled MRI via deep residual learning," presented at the
the longitudinal electronic health record," IEEE Access, vol. 6, pp. International conference on medical image computing and
65333-65346, 2018. computer-assisted intervention, 2018, 2018.
[144] E. Choi, M. T. Bahadori, J. Sun, J. Kulas, A. Schuetz, and W. [163] Y. Xie, G. Gao, and X. A. Chen, "Outlining the design space of
Stewart, "Retain: An interpretable predictive model for healthcare explainable intelligent systems for medical diagnosis," arXiv
using reverse time attention mechanism," Advances in neural preprint arXiv:1902.06019, 2019.
information processing systems, vol. 29, 2016. [164] B. Mittelstadt, C. Russell, and S. Wachter, "Explaining explanations
[145] B. C. Kwon et al., "Retainvis: Visual analytics with interpretable in AI," presented at the Proceedings of the conference on fairness,
and interactive recurrent neural networks on electronic medical accountability, and transparency, 2019, 2019.
records," IEEE transactions on visualization and computer [165] P. Hall, N. Gill, and N. Schmidt, "Proposed guidelines for the
graphics, vol. 25, no. 1, pp. 299-309, 2018. responsible use of explainable machine learning," arXiv preprint
[146] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, arXiv:1906.03533, 2019.
"Learning deep features for discriminative localization," presented [166] D. Holliday, S. Wilson, and S. Stumpf, "User trust in intelligent
at the Proceedings of the IEEE conference on computer vision and systems: A journey over time," presented at the Proceedings of the
pattern recognition, 2016, 2016. 21st international conference on intelligent user interfaces, 2016,
[147] H. Lee et al., "An explainable deep-learning algorithm for the 2016.
detection of acute intracranial haemorrhage from small datasets," [167] M. Sendak et al., "" The human body is a black box" supporting
Nature biomedical engineering, vol. 3, no. 3, pp. 173-182, 2019. clinical decision-making with deep learning," presented at the
[148] P. Rajpurkar et al., "AppendiXNet: Deep learning for diagnosis of Proceedings of the 2020 conference on fairness, accountability, and
appendicitis from a small dataset of CT exams using video transparency, 2020, 2020.
pretraining," Scientific reports, vol. 10, no. 1, pp. 1-7, 2020. [168] Q. V. Liao, M. Singh, Y. Zhang, and R. Bellamy, "Introduction to
[149] M. Porumb, S. Stranges, A. Pescapè, and L. Pecchia, "Precision explainable AI," presented at the Extended Abstracts of the 2021
medicine and artificial intelligence: a pilot study on deep learning CHI Conference on Human Factors in Computing Systems, 2021,
2021.
[169] M. G. Kahn et al., "A harmonized data quality assessment segmentation of colorectal polyps," Medical image analysis 60
terminology and framework for the secondary use of electronic (2020): 101619.
health record data," Egems, vol. 4, no. 1, 2016. [183] J. R. Clough et al., "Global and local interpretability for cardiac MRI
[170] B. A. Goldstein, A. M. Navar, M. J. Pencina, and J. Ioannidis, classification," International Conference on Medical Image
"Opportunities and challenges in developing risk prediction models Computing and Computer-Assisted Intervention. Springer, Cham,
with electronic health records data: a systematic review," Journal of 2019.
the American Medical Informatics Association, vol. 24, no. 1, pp. [184] M. Graziani et al., "Concept attribution: Explaining CNN decisions
198-208, 2017. to physicians," Computers in biology and medicine 123 (2020):
[171] V. Vishwarupe, P. M. Joshi, N. Mathias, S. Maheshwari, S. 103865.
Mhaisalkar, V. Pawar, “Explainable AI and Interpretable Machine [185] C.-T. Kor, Y.-R. Li, P.-R. Lin, S.-H. Lin, B.-Y. Wang, and C.-H. Lin,
Learning: A Case Study in Perspective,” Procedia Computer “Explainable Machine Learning Model for Predicting First-Time
Science, Volume 204, 2022, Pages 869-876. Acute Exacerbation in Patients with Chronic Obstructive Pulmonary
[172] T. Miller, Explanation in Artificial Intelligence: Insights from the Disease,” Journal of Personalized Medicine, vol. 12, no. 2, p. 228,
Social Sciences. Artif. Intell. 2019, 267, 1–38. Feb. 2022, doi: 10.3390/jpm12020228.
[173] A. Adadi, M. Berrada, "Peeking Inside the Black-Box: A Survey on [186] Y. Liu, Z. Liu, X. Luo, H. Zhao, "Diagnosis of Parkinson's disease
Explainable Artificial Intelligence (XAI)," IEEE Access 2018, 6, based on SHAP value feature selection," Biocybernetics and
52138–52160. Biomedical Engineering, Volume 42, Issue 3, 2022, Pages 856-869.
[174] L.H. Gilpin, D. Bau, B. Z. Yuan, A. Bajwa, M. Specter, L. Kagal, [187] C. Wei, L. Zhang, Y. Feng et al., "Machine learning model for
"Explaining Explanations: An Overview of Interpretability of predicting acute kidney injury progression in critically ill patients,"
Machine Learning," In Proceedings of the 2018 IEEE 5th BMC Med Inform Decis Mak 22, 17 (2022).
International Conference on Data Science and Advanced Analytics [188] K. Debjit et al., “An Improved Machine-Learning Approach for
(DSAA), Turin, Italy, 1–4 October 2018; pp. 80–89. COVID-19 Prediction Using Harris Hawks Optimization and
[175] M. Böhle, et al., "Layer-wise relevance propagation for explaining Feature Analysis Using SHAP,” Diagnostics, vol. 12, no. 5, p. 1023,
deep neural network decisions in MRI-based Alzheimer's disease Apr. 2022.
classification," Frontiers in aging neuroscience 11 (2019): 194. [189] Y. Zoabi, , S. Deri-Rozov, N. Shomron, "Machine learning-based
[176] T. Araújo et al., "DR| GRADUATE: Uncertainty-aware deep learning- prediction of COVID-19 diagnosis based on symptoms," npj Digit.
based diabetic retinopathy grading in eye fundus images," Medical Med. 4, 3 (2021).
Image Analysis 63 (2020): 101715. [190] S. Ali, Y. Zhou, M. Patterson, "Efficient analysis of COVID-19
[177] L. Li et al., "A large-scale database and a CNN model for attention- clinical data using machine learning models," Med Biol Eng Comput
based glaucoma detection," IEEE transactions on medical imaging 60, 1881–1896 (2022).
39.2 (2019): 413-424. [191] J. H. Ong, K. M. Goh and L. L. Lim, "Comparative Analysis of
[178] E. Pesce et al., "Learning to detect chest radiographs containing Explainable Artificial Intelligence for COVID-19 Diagnosis on
pulmonary lesions using visual attention networks," Medical image CXR Image," 2021 IEEE International Conference on Signal and
analysis 53 (2019): 26-38. Image Processing Applications, 2021, pp. 185-190.
[179] S. Rajaraman et al., "Visualizing and explaining deep learning [192] D. Chowdhury, S. Poddar, S. Banarjee et al., "CovidXAI: explainable
predictions for pneumonia detection in pediatric chest radiographs," AI assisted web application for COVID-19 vaccine prioritization,"
Medical Imaging 2019: Computer-Aided Diagnosis. Vol. 10950. Internet Technology Letters, 2022; 5(4):e381.
SPIE, 2019. [193] L. V. De Moura et al., "Explainable Machine Learning for COVID-
[180] J. Silva-Rodríguez et al., "Going deeper through the Gleason scoring 19 Pneumonia Classification With Texture-Based Features
scale: An automatic end-to-end system for histology prostate Extraction in Chest Radiography," Frontiers in digital health 3
grading and cribriform pattern detection," Computer Methods and (2021).
Programs in Biomedicine 195 (2020): 105637. [194] B. Alsinglawi, O .Alshari, M. Alorjani et al., "An explainable machine
[181] Z. Bian et al., "Weakly supervised Vitiligo segmentation in skin learning framework for lung cancer hospital length of stay
image through saliency propagation," 2019 IEEE International prediction," Sci Rep 12, 607 (2022).
Conference on Bioinformatics and Biomedicine (BIBM). IEEE, [195] A. F. Markus, J. A. Kors, and P. R. Rijnbeek, "The role of explainability
2019. in creating trustworthy artificial intelligence for health care: a
[182] K. Wickstrøm, M. Kampffmeyer, and R. Jenssen, "Uncertainty and comprehensive survey of the terminology, design choices, and
interpretability in convolutional neural networks for semantic evaluation strategies," Journal of Biomedical Informatics, vol. 113,
2021.
AUTHORS’ BIOGRAPHIES

Subrato Bharati was with the Institute of Information and Communication Technology, BUET, Bangladesh, and is now with the Department of Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada. He has authored/co-authored over 50 journal articles, conference proceedings, and book chapters published by IEEE, Elsevier, Springer, and others. He is an Associate Editor of the Journal of the International Academy for Case Studies and a Guest Editor of a special issue in the Journal of Internet Technology. His research interests include health informatics, medical imaging, digital signal and image processing, eXplainable AI, and machine and deep learning.

M. Rubaiyat Hossain Mondal obtained the PhD degree in 2014 from the Department of Electrical and Computer Systems Engineering, Monash University, Melbourne, Australia. From 2005 to 2010, and from 2014 to date, he has been working as a Faculty Member at the Institute of Information and Communication Technology (IICT) in BUET, Bangladesh. He has published a number of papers in journals, conferences, and book chapters, and edited several books published by reputed publishers. His research interests include artificial intelligence, image processing, bioinformatics, wireless communications, and cryptography.

Prajoy Podder is currently a Researcher at the Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology. He has authored/co-authored over 45 journal articles, conference proceedings, and book chapters published by IEEE, Elsevier, Springer, and others. His research interests include digital image processing, data mining, IoT, machine learning, and health informatics.