Finance Research Letters 58 (2023) 104309

A user-centered explainable artificial intelligence approach for financial fraud detection
Ying Zhou a, Haoran Li a, Zhi Xiao a, b, *, Jing Qiu a
School of Economics and Business Administration, Chongqing University, Chongqing 400030, China
Chongqing Key Laboratory of Logistics, Chongqing University, Chongqing 400030, China


Keywords: Financial fraud detection; Explainable artificial intelligence

Abstract: This paper aims to produce user-centered explanations for financial fraud detection models based
on explainable artificial intelligence (XAI) methods. By combining an ensemble predictive model
with an explainable framework based on Shapley values, we develop a financial fraud detection
approach that is accurate and explainable at the same time. Our results show that the explainable
framework can meet the requirements of different external stakeholders by producing local and
global explanations. Local explanations can help understand why a specific prediction is identified
as fraud, and global explanations reveal the overall logic of the whole ensemble model.

1. Introduction

In recent years, artificial intelligence (AI), and machine learning (ML) in particular, has been increasingly
used in financial fraud detection to improve predictive performance (Abbasi et al., 2012; Bao et al., 2020; Bertomeu et al., 2021; Perols
et al., 2017). The high predictive performance of ML models often comes at the expense of explainability, that is, of understanding why a
model generates a certain output (Gunning et al., 2019; Meske et al., 2022). Both regulators and ethicists are concerned with the explainability
of AI systems (EU, 2016; European Commission, 2019; FSB, 2017; Kim and Routledge, 2022). The lack of explainability has
become one of the major challenges for stakeholders in the practical application of ML-based fraud detection models.
Explainable artificial intelligence (XAI) refers to methods that aim to achieve both explainability and high predictive performance (Bauer
et al., 2023). Providing explanations is the key element that enables human users to understand how ML models mine and leverage
information. Implementing and using XAI methods helps build justified confidence, enable enhanced control,
improve applied models, and discover new facts (Adadi and Berrada, 2018). XAI methods assist stakeholders in making more
informed decisions instead of blindly trusting the output of the ML model.
Producing user-centered explanations is important for applications where complex ML models are part of how stakeholders come
to make decisions. However, most studies have designed XAI approaches without evaluating whether the explanations satisfy
the needs of real users (Miller, 2019). To produce explanations that satisfy users, we seek to link the needs of intended users with
state-of-the-art XAI techniques in financial fraud detection. Furthermore, previous XAI studies focused more on helping developers
improve the development process (Bhatt et al., 2020). In this study, we focus on stakeholders who are concerned about financial fraud
but are external to model development.

Received 24 May 2023; Received in revised form 2 August 2023; Accepted 6 August 2023
Available online 9 August 2023

This paper proposes a systematic analytical framework in response to the challenge that ML-based fraud detection models have to
be human-understandable. Our contribution is based on linking the XAI methods with the decision-making requirements of external
stakeholders concerned about financial fraud. The objective is to provide a bridge between ML applications in fraud detection and
external stakeholders to meet their decision needs. We develop an accurate and explainable financial fraud detection approach for
tackling the task. By combining a predictive model with an explainable framework, the approach can provide external stakeholders
with accurate prediction results and user-centered explanations.
Specifically, we build an ensemble model based on raw financial statement data to predict financial fraud. Furthermore, we
describe external stakeholders concerned about financial fraud and analyze their explainability requirements for ML-based detection
models, and then develop an explainable framework based on Shapley values. We present four types of explanations in a meaningful
context, which can be summarized in local and global explanations. Local explanations provide information about why a specific
prediction is identified as fraud and how stakeholders could make an investigation plan according to the machine’s logic. Global
explanations show degrees of confidence for the whole model and the relationships between predictions and features from a global
perspective.

2. Data and sample selection

The dataset spans from 2007 to 2020 with 37,502 firm-year observations of Chinese non-financial companies, including 432
fraudulent observations that were convicted by regulators and 37,070 nonfraudulent observations.
Following Bao et al. (2020) and Achakzai and Juan (2022), we use raw financial data in financial statements as our features.
Machines may capture possibly unknown patterns from complex data by themselves (Agarwal and Dhar, 2014). In this study, we use a
more comprehensive feature set of raw financial data, directly taken from three key financial statements (balance sheet, income
statement, and cash flow statement), to involve more information and unleash the power of the ML model (Bertomeu et al., 2021; Chen
et al., 2022a). The feature list of raw financial data is shown in the Appendix. The data in this study are collected from the China Stock Market
and Accounting Research (CSMAR) database. As XGBoost can deal with missing values automatically, we do not handle this problem.
We separate the dataset into a training dataset (2007–2017) and a testing dataset (2018–2020). The training dataset is further
divided into a training data subset (2007–2015) and a validation dataset (2016–2017). We use the training data subset to train a model
for each candidate combination of hyperparameters. A grid search then selects the combination that generates the highest
out-of-sample AUC on the validation dataset. The whole training dataset is used to train the final model with the selected hyperparameters.
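
The split and selection procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the row layout (a `year` key), the helper names `split_by_year`, `auc` and `grid_search`, and the toy scoring function are all hypothetical. In practice `fit_and_score` would train a model on the 2007–2015 subset and return its AUC on the 2016–2017 validation data.

```python
from itertools import product

def split_by_year(rows, year_key="year"):
    """Partition firm-year rows into the three periods described in the text."""
    train_sub = [r for r in rows if 2007 <= r[year_key] <= 2015]
    valid = [r for r in rows if 2016 <= r[year_key] <= 2017]
    test = [r for r in rows if 2018 <= r[year_key] <= 2020]
    return train_sub, valid, test

def auc(labels, scores):
    """Rank-based AUC: the share of (fraud, non-fraud) pairs in which the
    fraud case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def grid_search(grid, fit_and_score):
    """Return the hyperparameter combination with the highest validation score."""
    names = sorted(grid)
    candidates = [dict(zip(names, values))
                  for values in product(*(grid[n] for n in names))]
    return max(candidates, key=fit_and_score)

# Toy usage with made-up rows and a made-up scoring function.
rows = [{"year": y} for y in range(2007, 2021)]
train_sub, valid, test = split_by_year(rows)
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
best = grid_search(
    {"eta": [0.1, 0.5], "max_depth": [3, 5]},
    lambda p: -abs(p["eta"] - 0.5) - abs(p["max_depth"] - 5),  # stand-in for validation AUC
)
```

Splitting by calendar year rather than at random keeps the validation and test periods strictly after the training period, which is what makes the reported AUC an out-of-sample figure.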

3. Methodology

3.1. An explainable framework for external stakeholders

There are four external stakeholder groups concerned about financial fraud (Dechow et al., 2011). (i) Regulators. They need to
identify the companies suspected of being involved in fraud to strengthen investor protection and improve regulatory policies. (ii)
Auditors. They want to obtain more reliable audit evidence about financial fraud to support their audit opinions and reduce audit risks.
(iii) Analysts. They desire to improve the assessment accuracy of financial fraud to prevent reputational damage. (iv) Investors, e.g.
outside shareholders, banks, or other creditors. They hope to select companies with low fraud risks to avoid investment losses.
We further analyze the underlying explainability requirements of these external stakeholders. Table 1 outlines external stakeholder
groups and their requirements for explanations in financial fraud detection, e.g. the regulators may seek an explanation to assess
confidence for the whole detection model as well as an explanation to discover outliers in a specific prediction as further investigation
clues. The approach proposed in the next section is performed to tackle these explainability requirements.

Table 1
The explainability requirements for external stakeholders in financial fraud detection.
Explainability requirements Regulators Auditors Analysts Investors

Local explanations for a given observation

Which features contribute to individual predictions? ✓ ✓ ✓ ✓
How do models arrive at their predictions? ✓ ✓

Global explanations for the whole model

What are the impacts of important features? ✓ ✓ ✓ ✓
What are the relationships between outputs and features? Is there feature interaction? ✓ ✓ ✓

3.2. Proposed approach

This section introduces the procedure of the proposed approach, as shown in Fig. 1.

Fig. 1. Schematic overview of the proposed approach.

Step 1: Undersampling multiple training subsets

Financial fraud detection faces the class imbalance problem. We generate multiple undersampled training subsets. Each
undersampled training subset comprises all the fraudulent observations and a random subset of nonfraudulent observations of the
same size.
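
The undersampling in Step 1 can be sketched as below. This is an illustrative sketch only: the function name `make_undersampled_subsets`, the fixed random seed and the toy label vector are assumptions, and sampling without replacement within each subset is our reading of "a random subset".

```python
import numpy as np

def make_undersampled_subsets(y, n_subsets=10, seed=0):
    """Return one index array per subset: every fraud row plus an
    equally sized random draw of non-fraud rows."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    fraud_idx = np.flatnonzero(y == 1)
    clean_idx = np.flatnonzero(y == 0)
    subsets = []
    for _ in range(n_subsets):
        # Draw as many non-fraud rows as there are fraud rows, without repeats.
        sampled = rng.choice(clean_idx, size=fraud_idx.size, replace=False)
        subsets.append(np.concatenate([fraud_idx, sampled]))
    return subsets

# Toy labels: 5 fraudulent and 95 nonfraudulent observations.
y = np.array([1] * 5 + [0] * 95)
subsets = make_undersampled_subsets(y, n_subsets=10)
```

Each subset is balanced, while different subsets see different nonfraudulent observations, so the ensemble as a whole uses more of the majority class than a single undersampled model would.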

Step 2: Training base classifiers

For each training subset, an XGBoost classifier is trained. XGBoost is a variant of gradient boosting that has shown good
performance in diverse domain applications.

Step 3: Combining base classifiers and selecting the right models

We use majority voting, which is proven to be robust compared to other combination methods (Genre et al., 2013; Sesmero et al.,
2021; Wang et al., 2022), to combine base classifiers. The final prediction result would be generated as follows:

\[ \hat{y} = \arg\max_{c} \sum_{t=1}^{T} \chi_A\big(f_t(x) = c\big) \tag{1} \]

where χA(ft(x) = c) is the indicator function that equals 1 when base classifier ft assigns class c and 0 otherwise, so that the sum
counts the votes for class c, T is the number of base classifiers, and ft(x) is the predicted output of base classifier ft.
Then, for each observation, every base classifier whose prediction is identical to the final result is considered a right
model. The right model set is expressed as R = {f1, f2, …, fH}, where fh(x) = ŷ.
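
A minimal sketch of the voting rule and of the right model set for binary labels follows. The function name `majority_vote` and the toy predictions are hypothetical, and the tie-breaking rule (a tie goes to the non-fraud class) is an assumption, since the text does not say how ties among an even number of base classifiers are resolved.

```python
import numpy as np

def majority_vote(preds):
    """preds: (T, n) array of 0/1 labels from T base classifiers.

    Returns the final labels (n,) and a boolean (T, n) mask marking, for
    each observation, which base classifiers agree with the final label.
    """
    preds = np.asarray(preds)
    n_classifiers = preds.shape[0]
    votes_for_fraud = preds.sum(axis=0)
    # Fraud wins only with a strict majority; ties fall to class 0 (assumption).
    y_hat = (2 * votes_for_fraud > n_classifiers).astype(int)
    right_mask = preds == y_hat  # broadcasts (T, n) against (n,)
    return y_hat, right_mask

# Three toy base classifiers scoring three observations.
preds = np.array([[1, 0, 1],
                  [1, 0, 0],
                  [0, 1, 0]])
y_hat, right_mask = majority_vote(preds)
```

For the first observation two of the three classifiers vote fraud, so the final label is 1 and those two classifiers form its right model set.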

Step 4: Computing explanations based on Shapley values

SHAP (SHapley Additive exPlanations) is a model-agnostic method that can provide explanations for complex models in various
fields, such as credit risk management (Bussmann et al., 2021; Wu et al., 2022) or crypto asset allocation (Babaei et al., 2022).
Among additive feature attribution methods, SHAP is the only explanation model that guarantees three desirable properties:
local accuracy, consistency, and missingness (Lundberg and Lee, 2017). Therefore, we use SHAP, which is based on game-theoretic
Shapley values, to generate explanations:

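\[ f(x) = \phi_0(f) + \sum_{i=1}^{M} \phi_i(f, x) \tag{2} \]

where ϕ0(f) is the expectation of the model output values, M is the number of features, and ϕ0(f) plus the sum of the ϕi(f,x) matches
the original tree-based model output f(x). The ϕi(f,x) is the contribution of feature xi, which is calculated as follows:

\[ \phi_i(f, x) = \frac{1}{M} \sum_{S \subseteq N \setminus \{i\}} \frac{v(S \cup \{i\}) - v(S)}{C_{M-1}^{|S|}} \tag{3} \]

where N is the set of all M features, v(S) is the value of a feature subset S of inputs, and C_{M-1}^{|S|} is the number of subsets of
size |S| that can be drawn from the other M − 1 features.

Equation (3) is a weighted mean, over all feature subsets S that do not contain xi, of the difference in value between S with xi
added and S alone.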
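
To make the weighting in Eq. (3) concrete, the sketch below evaluates it by brute force for a three-feature toy game. The value function `toy_value` is entirely made up for illustration (additive payoffs plus one interaction term) and is not the fraud model; the point is only to show the subset enumeration and the local accuracy property of Eq. (2).

```python
from itertools import combinations
from math import comb

def shapley_values(value, n_features):
    """Exact Shapley values by enumerating every subset S that excludes i."""
    players = range(n_features)
    phis = []
    for i in players:
        others = [j for j in players if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            # Each subset of a given size gets weight 1 / C(M-1, |S|).
            weight = 1.0 / comb(n_features - 1, size)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phis.append(total / n_features)  # leading factor 1 / M
    return phis

def toy_value(subset):
    """Hypothetical value function: additive terms plus a 0-1 interaction."""
    base = {0: 1.0, 1: 2.0, 2: 3.0}
    v = sum(base[j] for j in subset)
    if 0 in subset and 1 in subset:
        v += 4.0
    return v

phis = shapley_values(toy_value, 3)
```

In this toy game the interaction payoff of 4 is split equally between features 0 and 1, giving contributions of 3, 4 and 3, and the contributions sum to the value of the full feature set minus the value of the empty set. The enumeration visits 2^(M−1) subsets per feature, which is why it is only practical for very small M.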
The classic SHAP algorithm is computationally expensive because the number of feature subsets in Eq. (3) grows exponentially with
the number of features. The Tree SHAP algorithm computes explanations for tree-based models in low-order polynomial
time. Hence, we use the Tree SHAP algorithm (Lundberg et al., 2020) to generate local explanations.
For each observation, we obtain the explanation set E = {ϕi(f1,x), ϕi(f2,x), …, ϕi(fT,x)} for each feature xi, where T is the number of
base classifiers. The explanations generated from the right models are expressed as RE = {ϕi(f1,x), ϕi(f2,x), …, ϕi(fH,x)}.

Step 5: Generating appropriate explanations

Explanations are generated at two levels:

(1) Local explanations for a given observation. Shapley values satisfy the additivity property, so averaging the explanations of
several models yields the explanation of their averaged output. We therefore calculate the mean of the right models'
explanations and take it as the final local explanation. The explanations generated by different base classifiers are also
compared for further analysis.
(2) Global explanation for the whole ensemble model. We average the absolute values of the final Shapley values over all observations of
interest to generate a global explanation of the ensemble model. Then, we reveal the impact of a feature on the final results and
the interaction between two features on the outputs.
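
The two aggregation steps can be sketched with plain arrays as follows. The array shapes, the function names `local_explanation` and `global_importance`, and the numbers are illustrative assumptions; in practice the per-classifier attributions would come from Tree SHAP.

```python
import numpy as np

def local_explanation(shap_values, right_mask):
    """Average the attributions of the right models for one observation.

    shap_values: (T, M) array, one row of feature attributions per base classifier.
    right_mask:  (T,) boolean array, True where the classifier agrees with the vote.
    """
    return shap_values[right_mask].mean(axis=0)

def global_importance(final_shap):
    """Mean absolute final attribution per feature over n observations.

    final_shap: (n, M) array of final local explanations.
    """
    return np.abs(final_shap).mean(axis=0)

# One observation, three base classifiers, two features (made-up numbers).
shap_values = np.array([[0.4, -0.2],
                        [0.2, -0.4],
                        [-0.3, 0.1]])
right_mask = np.array([True, True, False])
local = local_explanation(shap_values, right_mask)

# Final explanations for two observations, aggregated into a global ranking.
final_shap = np.array([[0.3, -0.3],
                       [-0.1, 0.5]])
importance = global_importance(final_shap)
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling, so a feature that pushes some observations towards fraud and others away from it still ranks as important.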

4. Empirical analysis

In this section, we apply the proposed approach for financial fraud detection to actual Chinese data. The hyperparameters used for
XGBoost are eta: 0.5, gamma: 0.2, n_estimators: 700, max_depth: 5, min_child_weight: 1, subsample: 0.6, nthread: 3, colsample_bytree: 0.9,
reg_alpha: 1e-5, scale_pos_weight: 1.¹ In our experiments, the proposed ensemble model has 10 base classifiers and generates an AUC of
0.79, which is higher than that of XGBoost used on its own (AUC = 0.58). We next examine the usefulness of random undersampling,
random oversampling and SMOTE methods in our dataset.² The results show that random undersampling (AUC = 0.76),
random oversampling (AUC = 0.63) and SMOTE (AUC = 0.69) do not perform better than our model. The explanation results
for the financial fraud detection model are presented in the following sub-sections.

4.1. Local explanations for a given observation

In this section, we randomly pick observation A and observation B from the predicted fraudulent observations in the testing set.

(1) Which features contribute to individual predictions?

Fig. 2 shows which features contribute more to the model output for each observation, and indicates the positive or negative
contribution of each feature. Red represents a positive contribution, implying that the feature increases fraud risk. In contrast,
blue indicates a negative impact on the probability of fraud. Fig. 2 shows that the local explanations are specific to each
observation. For example, “Prepayments” has the largest positive contribution for observation A, while the positive contribution
of “Goodwill” is the largest for observation B.

¹ Detailed explanations about hyperparameters can be found on the following page:
² Other methods to address class imbalance problems, such as extreme value models (Calabrese and Giudici, 2015), may be effective in the fraud
setting. Future research can examine this and other alternative methods in a fraud context.

Fig. 2. The local bar plots for samples predicted to be fraud.

(2) How do models arrive at their predictions?

We use decision plots to further interpret the predictions of the fraud detection ensemble model. In this part, we compare
predictions from the base classifiers of the ensemble model as clues for further investigation by stakeholders. Fig. 3 shows the decision plots
of local explanations, which depict how ML models arrive at their outputs. A legend identifies the prediction of each base classifier, in
which “1” represents the predicted label of fraud and “0” represents the predicted label of non-fraud. As shown in Fig. 3(b), different
base classifiers generate inconsistent results. It might be worth investigating the difference between the decision paths that lead to different
predicted labels. These decision plots may provide richer information about the local explanations of the ensemble model to
support users’ decision-making.
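
The idea behind a decision plot can be sketched numerically: starting from the expected model output, the contributions of the features are accumulated one by one until the model output for the observation is reached. The sketch below is an illustration under assumptions, not the plotting routine used for Fig. 3: the function name `decision_path`, the ordering from the least to the most important feature, and the numbers are all hypothetical.

```python
import numpy as np

def decision_path(base_value, shap_row):
    """Cumulative path from the expected output to the model output.

    base_value: expected model output, the phi_0 term of Eq. (2).
    shap_row:   (M,) feature attributions for one observation.
    Returns the feature order used and the M + 1 points along the path.
    """
    order = np.argsort(np.abs(shap_row))  # least important feature first
    steps = np.cumsum(shap_row[order])
    path = base_value + np.concatenate([[0.0], steps])
    return order, path

# Made-up attributions for one observation with three features.
order, path = decision_path(0.5, np.array([0.2, -0.05, 0.1]))
```

The last point of the path equals the base value plus the sum of all attributions, which is the model output by Eq. (2). Plotting one such path per base classifier shows where the classifiers that predict fraud diverge from those that do not.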

4.2. Global explanations for the whole model

(1) What are the impacts of important features?

Stakeholders may want to understand how features contribute to fraud predictions across the whole model. Fig. 4 gives an overall view of
the global explanation for all observations in the testing dataset. Red means the value of the feature is high, while
blue indicates the feature has a low value. Fig. 4 shows the positive or negative impacts of the features, sorted by feature importance,
and depicts the distributions of SHAP values for these features. The explanation for important features is roughly consistent with
existing domain knowledge from experts. For example, features associated with related-party transactions, such as prepayment, are
also used as important features in previous fraud detection studies (Achakzai and Juan, 2022; Wei et al., 2017). Moreover, we show
that goodwill, rarely used in previous fraud detection studies, is an important feature for fraud detection. This conclusion is in line with
the findings of the accounting literature, which show that managers could manipulate goodwill impairment to inflate earnings (Han
et al., 2021; Li and Sloan, 2017).

Fig. 3. The decision plots for samples predicted to be fraud.

Fig. 4. Overall contributions of the most important features to model outputs.

Fig. 5. Dependence scatter plots of single features.

Fig. 6. Dependence scatter plots of two features.

(2) What are the relationships between outputs and features? Is there feature interaction?

We take the top four features shown in Fig. 4 as examples and present their dependence scatter plots based on SHAP values in
Fig. 5. The distribution of data values is depicted by the grey area. Fig. 5 shows the relationships between predictions and
features of interest from a global perspective. For example, Fig. 5(a) reveals the impact of “Goodwill” on the predicted outcome. When
“Goodwill” is greater than about 2e8, a further increase in “Goodwill” raises the predicted probability of fraud.
The vertical dispersion in Fig. 5(d) shows that the same value for “Operating profit” can make a different contribution to the model
output for different observations. This may indicate there are interaction effects between “Operating profit” and other features. To
further understand how the feature interactions affect the model predictions, we present the interactions between “Operating profit”
and two other features. Fig. 6 shows that different values of “Goodwill” or “Total assets” change the impact of “Operating profit” on
the model results because of the interactions between features.

5. Conclusions

This paper proposes an accurate and explainable financial fraud detection approach to meet the needs of external stakeholders. We
demonstrate the adoption of SHAP to provide stakeholders with user-centered explanations, including local explanations and global
explanations. The generated explanations in this study are not equivalent to causal drivers of fraud; rather, they are the main drivers of
ML-based predictions with desirable theoretical properties. These explanations can then be supervised by accounting experts for further
Future research can apply other XAI techniques in our setting, such as Shapley Lorenz values (Giudici and Raffinetti, 2021), for a
detailed comparison. Future research can also extend our analysis by considering other potential informational requirements, such as
measuring fairness (see e.g. Chen et al., 2022b) in fraud detection models. Furthermore, an interesting extension would be to improve
our approach by developing a selection procedure based on predictive performance, which may need appropriate statistical testing.


This work was supported by the National Natural Science Foundation of China [grant numbers 72071021, 71671019]

CRediT authorship contribution statement

Ying Zhou: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing –
original draft. Haoran Li: Writing – review & editing. Zhi Xiao: Funding acquisition, Project administration, Resources, Supervision.
Jing Qiu: Validation.

Declaration of Competing Interest

The authors declare no competing interests.

Data availability

Data will be made available on request.

Appendix. Feature list of raw financial data


(Balance sheet items)
Monetary funds; Financial assets held for trading; Derivative financial assets; Notes receivable; Accounts receivable;
Accounts receivable financing; Prepayments; Other receivables; Inventories; Contract assets; Current portion of non-current assets;
Other current assets; Total current assets; Debt investments; Other debt investments; Long-term receivables;
Long-term equity investment; Investment in other equity; Other non-current financial assets; Investment properties; Fixed assets;
Construction in progress; Productive biological assets; Oil and gas assets; Right-of-use assets; Intangible assets;
Development expenditure; Goodwill; Long-term deferred expenses; Deferred tax assets; Other non-current assets;
Total non-current assets; Total assets; Short-term borrowings; Financial liabilities held for trading; Derivative financial liabilities;
Notes payable; Accounts payable; Payments received in advance; Contract liabilities; Employee benefits payable; Tax payables;
Other payables; Current portion of non-current liabilities; Other current liabilities; Total current liabilities; Long-term borrowings;
Bonds payable; Lease liabilities; Long-term payables; Estimated Liabilities; Deferred tax liabilities; Other non-current liabilities;
Deferred income; Total non-current liabilities; Total liabilities; Share capital; Other equity instruments; Preferred stock;
Perpetual debt; Capital reserves; Treasury stock; Other comprehensive income; Special reserves; Surplus reserve; Retained earnings;
Equity attributable to parent company; Non-controlling interests; Total shareholders’ equity

(Income statement items)
Total Operating revenue; Operating revenue; Total Operating costs; Operating costs; Taxes and surcharges; Selling expenses;
Administrative expenses; Research and development expenses; Finance expenses; Interest expenses; Interest income; Other income;
Income from investments; Investment income from associates and joint ventures; Derecognition of financial assets at amortized cost;
Gains or losses from net exposure hedging; Gains or losses from changes in fair values; Credit impairment losses; Impairment losses;
Gains or losses from asset disposals; Operating profit; Non-operating income; Non-operating expenses; Profit before tax; Income tax;
Net profit; Net profit attributable to parent company; Net profit attributable to non-controlling interests; Basic earnings per share;
Diluted earnings per share; Total comprehensive income; Total comprehensive income attributable to parent company;
Total comprehensive income attributable to non-controlling interests

(Cash flow statement items)
Cash received from sales and services; Tax and surcharge refunds; Other cash receipts related to operating activities;
Cash paid for goods and services; Cash paid to and for employees; Taxes and surcharges paid;
Other cash payments related to operating activities; Net cash flows from operating activities;
Cash received from withdrawal of investments; Cash received from investment income;
Net proceeds from disposals of fixed assets, intangible assets and other long-term assets;
Net proceeds from disposal of subsidiaries and other business units; Other cash receipts related to investing activities;
Cash paid for fixed assets, intangible assets and other long-term assets; Cash paid for investments;
Net cash paid for acquiring subsidiaries and other business units; Other cash payments related to investing activities;
Net cash flows from investing activities; Cash received from investments by others;
Cash received by subsidiaries from non-controlling investors; Cash received from borrowings;
Other cash receipts related to other financing activities; Cash repayments for debts;
Cash paid for distribution of dividends and profit and for interest expenses;
Dividends or profit paid by subsidiaries to non-controlling investors; Other cash payments related to financing activities;
Net cash flows from financing activities; Effect of changes in foreign exchange rates on cash and cash equivalents;
Net increase in cash and cash equivalents; Opening balance of cash and cash equivalents;
Closing balance of cash and cash equivalents
