Research Article
Keywords: Sanctions, Anti-Money Laundering, Machine Learning, Artificial Neural Network, Basel
DOI: https://doi.org/10.21203/rs.3.rs-2511798/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
1. Introduction
The Basel Anti-Money Laundering (AML) Index combines Financial Action Task
Force (FATF) and World Bank statistics to analyze country-specific money laundering risks. The
Basel Institute on Governance ("Basel Institute") created the AML Index to combat money
laundering and terrorist financing (ML/TF). This paper aims to predict Basel money laundering
sanctions using machine learning (ML) and artificial intelligence (AI) algorithms. This aim is
decomposed into two objectives: (1) predict the likelihood of AML sanctions using risk
indicators from Basel and the World Bank, and (2) find the best predictors for forecasting the
This paper makes two contributions to the literature on money laundering in finance.
First, this paper shows how ML and AI models can be employed to fight money laundering by
identifying the important features in a set of financial and economic indicators. Second, many
countries have AML laws to detect and prevent ML/TF. These AML regulations are not
enforced, limiting their value. AML teams fine-tuned their parameters and risk models to
improve detection, which made compliance less agile and law enforcement more difficult. ML
and AI can fix these problems by using data to find risk factors, rank ML/TF risks, and make
predictions.
classification problems in financial crime detection, the literature on the use of these algorithms
to detect money laundering is still limited (Canhoto, 2021; Chen et al., 2018; Jullum et al., 2020;
Shou et al., 2021). The literature that has been assembled mostly focuses on statistical data
mining techniques to conduct fraud risk profiling and flag suspicious money laundering
transactions (Ferwerda et al., 2019; King et al., 2018; Lokanan, 2019; Sudjianto et al., 2010).
Even though there is a rich scholarship on the application of ML and AI to predict money
laundering transactions, this stream of research has mostly concentrated on data from banks and
financial institutions (Lokanan, 2022; Jullum et al., 2020; Zhang & Trubey, 2019) and is often
descriptive and suggestive, with very little detail on the models and the performance metrics
This paper is different for several reasons. First, it uses the same ML and AI-based
techniques to predict countries on the Basel AML sanction blocklist. Second, this project
measures the risks of ML/TF using indicators of countries' adherence to anti-money laundering
and combating the financing of terrorism (AML/CFT) regulations that were collated from FATF,
Transparency International, the World Bank, and the World Economic Forum.
2. Research Methodology
The Basel AML Index is a prominent source of data for this project. Established by the
systems in place across 203 countries. Global economic and development performance data were
retrieved from the World Bank portal. The World Bank data provided more financial and
economic information about the countries and improved the predictive power of the ML and AI
algorithms. The data cover the year 2021.
economic and risk metrics for evaluating a country's AML performance. These variables were
selected for two reasons. Firstly, they comprise the risk scores contributing to the Basel AML
index of high-risk ML/TF across countries (see Basel, 2021). Secondly, the World Bank data
consists of standalone risk evaluation solutions and can be employed for independent
benchmarking to validate in-house ML/TF risk assessments (see Jullum et al., 2020; Lokanan,
2022). Taken together, these variables contribute to the factors that predict the probability of a
The dependent variable is whether the country was sanctioned. Sanctions are levelled at countries that
ended up on the Basel AML blocklist. Sanctions were coded as 0 (no) when the country was not
2.3. SMOTE+ENN
This project involves working with an imbalanced dataset. Class imbalance occurs when
one label of the target variable has far more instances than the other, leading to an imbalance in
the model's ability to generalize on unseen data (Lokanan, 2022). Countries sanctioned account
for 27% of the data, while countries not sanctioned account for 73%. To address the class
imbalance problem, the synthetic minority over-sampling technique (SMOTE) and the edited nearest
neighbour (ENN) algorithm were used to resample the data. SMOTE reduces bias towards the
majority class by generating synthetic minority-class instances from existing observations, which
allows the algorithm to better learn from and classify the minority class. The ENN algorithm then
removes, where necessary, observations that could contaminate the resampled data, namely those
whose class label disagrees with most of their nearest neighbours (Lokanan, 2022).
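The two resampling steps can be illustrated with a minimal, standard-library sketch. It is not the implementation used in this study: the toy feature vectors, the function names (`smote`, `enn`, `nearest`), the neighbour counts, and the majority-vote removal rule are all assumptions made for illustration.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(index, rows, k):
    """Return the k rows closest to rows[index], excluding that row itself."""
    target = rows[index][0]
    others = [r for i, r in enumerate(rows) if i != index]
    return sorted(others, key=lambda r: distance(target, r[0]))[:k]


def smote(rows, minority, n_new, k=2, seed=0):
    """Add n_new synthetic minority rows, each interpolated between a
    minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    pool = [r for r in rows if r[1] == minority]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(pool))
        base = pool[i][0]
        mate = rng.choice(nearest(i, pool, k))[0]
        gap = rng.random()  # position along the segment from base to mate
        features = tuple(b + gap * (m - b) for b, m in zip(base, mate))
        synthetic.append((features, minority))
    return rows + synthetic


def enn(rows, k=3):
    """Keep only rows whose label agrees with most of their k nearest neighbours."""
    kept = []
    for i, row in enumerate(rows):
        votes = [neighbour[1] for neighbour in nearest(i, rows, k)]
        if votes.count(row[1]) * 2 > len(votes):
            kept.append(row)
    return kept


# Hypothetical two-feature data: label 1 = sanctioned (minority), 0 = not sanctioned.
rows = [
    ((1.0, 1.0), 0), ((1.2, 0.9), 0), ((0.8, 1.1), 0),
    ((1.1, 1.3), 0), ((0.9, 0.7), 0), ((1.3, 1.1), 0),
    ((5.0, 5.0), 1), ((5.2, 4.8), 1), ((4.9, 5.3), 1),
]
balanced = smote(rows, minority=1, n_new=3)  # oversample the minority class
cleaned = enn(balanced)                      # then remove ambiguous observations
print(len(rows), len(balanced), len(cleaned))
```

In practice, resampling of this kind would be applied to the training split only, so that synthetic observations do not leak into the data used for evaluation.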
Some of the more representative classifiers used in money laundering prediction studies
are linear and probabilistic algorithms such as logistic regression, support vector machine
(SVM), and Naive Bayes (Cerulli, 2021; Chen et al., 2018; Zhang and Trubey, 2019). Others
have employed non-linear and more complex algorithms, namely ANN and tree-based
classifiers, to provide new insights into classification and predictive money laundering problems
(Chen et al., 2018; Jullum et al., 2020; Zhang and Trubey, 2019). ML and AI-based algorithms
can apply pre-set risk criteria to train, test, and validate the models on unseen data (Lokanan,
2022). These advanced ML and AI techniques can recognize patterns in large volumes of data
across various formats, making them highly effective for uncovering risks and abnormal
transactions (Bao et al., 2022; Jullum et al., 2020). Their ability to continually train and improve
through reinforcement learning allows these algorithms to become even more reliable over time
(Chen et al., 2018). Consequently, ML and AI methods are crucial for consolidating and
SVM, and Naive Bayes classifiers were employed to address linearity in the data. Random
forest, gradient descent, and ANN algorithms were used to approximate non-linear patterns
within the data (Breiman, 2001; Chen et al., 2018; Zhang and Trubey, 2019). These algorithms
and their variations allow for robust data analysis while preventing potential bias or unreasonable
assumptions (Cerulli, 2020). Appendix 1 shows the parameters used to optimize and tune the
models.
The confusion matrix is used to calculate the performance of the ML and ANN models
(Lokanan, 2022). For a binary classification model, the confusion matrix has four cells:
True Positive (TP): The algorithm predicted that the country would be sanctioned, and it was
sanctioned.
True Negative (TN): The algorithm predicted that the country would not be sanctioned, and it was
not sanctioned.
False Positive (FP): The algorithm predicted that the country would be sanctioned, but it was
not sanctioned.
False Negative (FN): The algorithm predicted that the country would not be sanctioned, but it was
sanctioned.
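As a small illustration of these four cells, the sketch below tallies them from paired actual and predicted labels. The label coding (1 = sanctioned, 0 = not sanctioned), the function name `confusion_counts`, and the eight example labels are assumptions for illustration and are not taken from the study's data.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN for binary labels where 1 = sanctioned."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, guess in zip(actual, predicted):
        if guess == 1 and truth == 1:
            counts["TP"] += 1  # predicted sanctioned, was sanctioned
        elif guess == 0 and truth == 0:
            counts["TN"] += 1  # predicted not sanctioned, was not sanctioned
        elif guess == 1 and truth == 0:
            counts["FP"] += 1  # predicted sanctioned, was not sanctioned
        else:
            counts["FN"] += 1  # predicted not sanctioned, was sanctioned
    return counts


# Hypothetical labels for eight countries (illustrative only).
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]
counts = confusion_counts(actual, predicted)
print(counts)
```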
Table 2 presents information on the evaluation metrics used in this study. The most
important criterion for benchmarking the performance of ML and ANN models is accuracy;
however, because of the imbalanced dataset used in this study, balanced accuracy (BAC)
included the allowance of non-Gaussian errors, nonzero means, and serially and
more recent applications, population estimates, and model coefficients have been used to
evaluate and improve performance predictions (Clark and McCracken, 2013). While it exhibits
limitations, such as sensitivity to sample size and length of data used to estimate the model, the
Diebold–Mariano test offers a viable solution for comparing two predictive models for
forecasting accuracy without making assumptions about the underlying data distribution
(Diebold, 2015).
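A minimal sketch of the Diebold–Mariano statistic follows. It assumes one-step-ahead forecasts, squared-error loss, and a standard normal reference distribution; none of these settings is stated in the text, so they should be read as illustrative choices. The function name `diebold_mariano` and the error series are hypothetical.

```python
import math


def diebold_mariano(errors_a, errors_b):
    """Return the DM statistic and a two-sided normal p-value.

    Assumes one-step-ahead forecasts and squared-error loss, so the
    variance of the loss differential needs no autocovariance terms.
    Raises ZeroDivisionError if every loss differential is identical.
    """
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    stat = mean_d / math.sqrt(var_d / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|stat|))
    return stat, p_value


# Hypothetical prediction errors from two competing models.
errors_a = [0.1, -0.2, 0.1, 0.3, -0.1, 0.2]
errors_b = [0.4, -0.5, 0.3, 0.6, -0.4, 0.5]
stat, p_value = diebold_mariano(errors_a, errors_b)
print(stat, p_value)
```

A negative statistic indicates that the first model has the lower average loss; with a sample this small, the normal approximation is rough, which reflects the sensitivity to sample size noted above.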
With imbalanced datasets, accuracy is not the best metric for classification problems
because it does not distinguish between FPs and FNs (Cerulli, 2020; Lokanan, 2022). Sensitivity,
specificity, precision, F-scores, and the receiver operating characteristic (ROC) curve are better
metrics for imbalanced datasets (Shou et al., 2021). Sensitivity, or the True Positive Rate (TPR),
is the proportion of actual positive observations that the model classifies correctly.
Specificity, or the True Negative Rate (TNR), is the proportion of actual negative observations
that the model classifies correctly. Precision measures the proportion of predicted positives
that are actually positive, and the F-measure combines precision and
sensitivity/recall. The ROC curve is combined with the area under the curve (AUC) and the ROC
Metrics              Formulae
Accuracy             (TP + TN) / (TP + FP + FN + TN)
Balanced Accuracy    (TP / (TP + FN) + TN / (TN + FP)) / 2
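The evaluation metrics discussed in this section can all be computed from the four confusion-matrix counts. The sketch below is illustrative: the function name `evaluation_metrics` and the counts are hypothetical (chosen only to mirror a 27% / 73% class split), and balanced accuracy is taken as the mean of sensitivity and specificity, which is the standard definition.

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Compute common binary classification metrics from confusion counts."""
    sensitivity = tp / (tp + fn)  # True Positive Rate (recall)
    specificity = tn / (tn + fp)  # True Negative Rate
    precision = tp / (tp + fp)    # share of predicted positives that are correct
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical counts for 100 countries, 27 of them sanctioned.
metrics = evaluation_metrics(tp=20, tn=65, fp=8, fn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, balanced accuracy is lower than plain accuracy because the model is weaker on the minority (sanctioned) class, which is the reason for reporting both on imbalanced data.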
3. Empirical Results
Table 3 presents the summary statistics of the numerical features. Countries with high
unemployment rates and low total reserves are likelier to be on the Basel sanctions list than
others. Basel assigns a risk score ranging from 0 (low) to 10 (high) (see Basel, 2021). The
average overall risk score for most countries was 5.4. Countries with poor ML/TF risk scores had
a mean of 5.7 and a standard deviation of 1.4, indicating significant variations between countries.
Over 75% of the sanctioned countries' scores exceed this mean value. Regarding financial
transparency, about 75% of the sanctioned nations scored higher on the risk scale (6.95)
3.2. Model Performance
3.2.1 Accuracy
performance accuracy’s training and testing scores. As seen in Table 4, except for the ANN
model, there are no significant discrepancies in the training and testing scores for the other
algorithms. The logistic regression (84%) and SVM (84%) models scored highest in accuracy,
indicating they both performed exceptionally well in predicting sanctions. Additionally, the
logistic regression classifier had a slightly higher BAC (85%) than the SVM (84%), further
solidifying its superiority among these models. Together, these results suggest that logistic
regression and SVM can be effective tools for predicting money laundering sanctions.
Table 5 presents the evaluation metrics for money laundering sanctions. The ANN (84%),
followed by the logistic regression (81%) model, achieved the highest sensitivity scores,
meaning they correctly identified most of the countries that were sanctioned. Logistic regression
and SVM had the highest specificity score (88%) when classifying countries that were not
sanctioned (TN). Furthermore, when identifying which countries were sanctioned, the logistic
regression (88%) and the SVM (87%) models had the highest precision scores. Likewise, logistic
regression (84%) and SVM (83%) had the best F-scores for predicting sanctions.
The ROC curve is insensitive to the class distribution (Chen et al., 2018; Lokanan, 2022). As
shown in Figure 1, the AI sequential ANN model outperformed all other classifiers with an
AUROC score of 90%. Of the ML classifiers, the SVM (87%), gradient descent (86%), and
logistic regression (85%) models had the highest AUROC scores in predicting sanctions. In this
case, the AUROC provides insights into the effectiveness of each classifier and calculates the
Figure 1: AUROC Scores
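To make the AUROC concrete, it can be read as the probability that a randomly chosen sanctioned country receives a higher predicted score than a randomly chosen non-sanctioned country. The sketch below computes it by direct pairwise comparison; the function name `auroc`, the label coding (1 = sanctioned), and the scores are hypothetical and do not reproduce the values in Figure 1.

```python
def auroc(actual, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the share of (positive, negative) pairs in which the positive
    case has the higher score; ties count as half a win.
    """
    positives = [s for y, s in zip(actual, scores) if y == 1]
    negatives = [s for y, s in zip(actual, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities for eight countries.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
area = auroc(actual, scores)
print(round(area, 3))
```

Because the calculation depends only on how positives rank relative to negatives, the result does not change when the proportion of the two classes changes.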
Figure 2 indicates which features of a country are most relevant to predict sanctions.
Financial transparency, political and legal risks, unemployment rate, and ML/TF risks are the
top indicators that a nation will end up on a Basel AML sanction list. While inflation, public
transparency, accountability, and total reserves are significant predictors, they play lesser
roles than the top four. These features offer invaluable insights into detecting potential hot
spots and forecasting countries that face an increased chance of receiving sanctions.
Figure 2: Variable Importance
4. Conclusion
Altogether, these findings demonstrate that money laundering sanctions can be accurately
predicted using ML and AI algorithms, with logistic regression and SVM being the best-
performing models. Notably, the AI sequential model performed very well in this regard; the
sensitivity scores and AUROC measures showed that the ANN classifier successfully predicted
countries at risk of money laundering sanctions from Basel. Financial transparency, political and
legal risks, the unemployment rate, and factors associated with ML/TF risks were among the
strongest predictors of sanctions. Future research could investigate other algorithms using
different datasets to identify money laundering risks and allow for timely interventions.
Algorithm             Hyperparameters
Logistic Regression   regularisation strength (C) = 1.0 (default); class_weight = {0: 1, 1: 1} (all classes have weight 1); solver = lbfgs; max_iterations = 100
Random Forests        n_estimators = 200; max_features = log2; max_depth = 7; criterion = gini
SVM                   kernel = linear; default = 1.0; probability = true
Naive Bayes           default = none
Gradient Descent      loss = hinge; penalty = l2; alpha = 0.01; weight = balance
ANN                   activation = relu; model = crossentropy; optimizer = adam
References
Basel Governance. (2021). Basel AML Index | Basel Institute on Governance. Basel Institute on
Governance. https://baselgovernance.org/taxonomy/term/483
Bao, Y., Hilary, G., & Ke, B. (2022). Artificial intelligence and fraud detection. In Innovative
technology at the interface of finance and operations (pp. 223-247). Springer, Cham.
Canhoto, A. I. (2021). Leveraging machine learning in the global fight against money laundering
and terrorism financing: An affordances perspective. Journal of Business Research, 131,
441-452. https://doi.org/10.1016/j.jbusres.2020.10.012
Clark, T., & McCracken, M. (2013). Advances in forecast evaluation. Handbook of economic
forecasting, 2, 1107-1201.
Chen, Z., Van Khoa, L. D., Teoh, E. N., Nazir, A., Karuppiah, E. K., & Lam, K. S. (2018).
Machine learning techniques for anti-money laundering (AML) solutions in suspicious
transaction detection: a review. Knowledge and Information Systems, 57(2), 245-285.
Diebold, F. X., & Mariano, R. S. (2002). Comparing predictive accuracy. Journal of Business &
Economic Statistics, 20(1), 134-144. https://doi.org/10.1198/073500102753410444
Ferwerda, J., Deleanu, I. S., & Unger, B. (2019). Strategies to avoid blacklisting: The case of
statistics on money laundering. PLoS ONE, 14(6), e0218532.
Jullum, M., Løland, A., Huseby, R. B., Ånonsen, G., & Lorentzen, J. (2020). Detecting money
laundering transactions with machine learning. Journal of Money Laundering Control,
23(1), 173-186.
King, C., Walker, C., & Gurulé, J. (Eds.). (2018). The Palgrave handbook of criminal and
terrorism financing law. Cham: Palgrave Macmillan.
Lokanan, M. E. (2022). Predicting Money Laundering Using Machine Learning and Artificial
Neural Networks Algorithms in Banks. Journal of Applied Security Research, 1-25.
https://doi.org/10.1080/19361610.2022.2114744
Lokanan, M. E. (2019). Data mining for statistical analysis of money laundering transactions.
Journal of Money Laundering Control, 22(4), 753-763.
Shou, M., Bao, X., & Yu, J. (2021). An optimal weighted machine learning model for detecting
financial fraud. Applied Economics Letters, 1-6.
https://doi.org/10.1080/13504851.2021.1989367
Sudjianto, A., Nair, S., Yuan, M., Zhang, A., Kern, D., & Cela-Díaz, F. (2010). Statistical
methods for fighting financial crimes. Technometrics, 52(1), 5-19.
Zhang, Y., & Trubey, P. (2019). Machine learning and sampling scheme: An empirical study of
money laundering detection. Computational Economics, 54(3), 1043-1063.