Professional Documents
Culture Documents
Novel Feature Selection Framework For Credit Card Fraud Detection
Novel Feature Selection Framework For Credit Card Fraud Detection
Novel Feature Selection Framework For Credit Card Fraud Detection
Shahrood University of
Technology
Research paper
X-SHAoLIM: Novel Feature Selection Framework for Credit Card
Fraud Detection
Sajjad Alizadeh Fard and Hossein Rahmani*
School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran.
Article History:
Fraud in financial data is a significant concern for both businesses and
Received 09 October 2023 individuals. Credit card transactions involve numerous features, some
Revised 22 November 2023 of which may lack relevance for classifiers and could lead to
Accepted 24 January 2024
overfitting. A pivotal step in the fraud detection process is feature
DOI:10.22044/jadm.2024.13630.2480 selection, which profoundly impacts model accuracy and execution
Keywords:
time. In this paper, we introduce an ensemble-based, explainable
Fraud Detection, Machine feature selection framework founded on SHAP and LIME algorithms,
Learning, Feature Selection, called "X-SHAoLIM". We applied our framework to diverse
Ensemble Learning, Explainable combinations of the best models from previous studies, conducting
AI, Data Mining. both quantitative and qualitative comparisons with other feature
selection methods. The quantitative evaluation of the "X-SHAoLIM"
*Corresponding author: framework across various model combinations revealed consistent
h_rahmani@iust.ac.ir (H. Rahmani).
accuracy improvements on average including increases in Precision
(+5.6), Recall (+1.5), F1-Score (+3.5), and AUC-PR (+6.75). Beyond
enhanced accuracy, our proposed framework, leveraging explainable
algorithms like SHAP and LIME, provides a deeper understanding of
features' importance in model predictions, delivering effective
explanations to system users.
1. Introduction
In the recent years, numerous studies have explored domains with sensitive data. Clear explanations for
the use of machine learning methods to identify and system users are crucial, emerging as an ethical and
prevent fraudulent transactions [1,2]. Some of the legal imperative in many applications [8, 9].
fundamental challenges in detecting financial fraud To address the challenges mentioned earlier, we
are outlined below [3-7]: introduce an "ensemble-based explainable feature
1. Limited access to real-world data sets selection framework" known as "X-SHAoLIM.".
2. Imbalanced class distribution We employ an ensemble approach and thoroughly
3. Feature engineering assess its effectiveness across various
4. Feature selection combinations of the best models in the state-of-the-
5. Sequence modeling art.
6. Explainability SHAP and LIME are two key explainability
Credit card transactions typically have a large algorithms. The SHAP (SHapley Additive
number of features. Some features may not be exPlanations) algorithm, developed by Lundberg
meaningful to the classifiers or lead to overfitting and Lee in 2017, employs concepts from game
(features that have many categorical values or are theory to provide localized explanations for
too sparse). Additionally, the feature selection step forecasting models. In the context of game theory,
can enhance both the speed and performance of the model serves as the rules of the game, and
classifiers [7]. features are akin to potential players. SHAP
Explaining the operations of complex models poses calculates the Shapley value by evaluating the
a significant challenge, especially in security model across various combinations of input
Rahmani & Alizadeh Frad/ Journal of AI and Data Mining, x(x): xxx-xxx, xxxx
features, quantifying the average difference in fraud detection, and demonstrated robust
predictions when a specific feature is present performance even in the presence of noisy data.
versus when it is absent—a measure that reveals Figuerola et al. [19] investigated the performance
the contribution of each feature to the model's of tree-based ensemble learning algorithms in
prediction [10, 11]. detecting fraudulent transactions. They specifically
On the other hand, the LIME (Local Interpretable examined random forest, bagging, XGBoost,
Model-agnostic Explanations) algorithm serves as LightGBM, and CatBoost classifiers. They used
a local explainability method by systematically ANOVA for the feature selection step. Their
adjusting input parameters and observing resulting findings revealed that boosting classifiers
changes in the output. This approach enhances outperformed bagging classifiers in fraud
understanding of the model's predictions, detection, with LightGBM achieving the most
pinpointing, which input variables significantly favorable results across multiple metrics including
influenced the outcome for a specific sample [10]. F1, MCC, and AUC-PR.
From a technical perspective, LIME generates a Since raw input features are not sufficient to detect
new dataset centered around the examined sample, fraudulent transactions, feature engineering
obtains model predictions for these perturbed strategies have been proposed. Feature selection
instances, and subsequently trains an interpretable both removes redundant features and increases
model, such as linear regression, on this augmented learning accuracy [15, 20]. Saheed et al. [21] dealt
dataset. The process assigns greater weight to with fraud detection using a genetic algorithm as a
instances closer to the original sample, providing feature selection method. They first selected the top
local explainability for the analyzed data point 8 features, and used those features in NB, RF, and
[12]. SVM algorithms for fraud detection on the German
The structure of this paper is as what follows. credit card dataset. The experimental results
Section 2 reviews prior research across ensemble showed that the random forest performs better than
learning, feature engineering, and explainability. In NB and SVM.
Section 3, we describe our fraud detection process. Emmanuel Ileberi et al. [22] introduces a Genetic
The main contribution of our paper is in the feature Algorithm (GA)-based feature selection method
selection stage. Section 4 evaluates our framework combined with Random Forest (RF), Decision Tree
for quantitative and qualitative comparisons with (DT), Artificial Neural Network (ANN), Naive
other feature selection algorithms like ANOVA, Bayes (NB), and Linear Regression (LR)
random forest, and XGBoost. Section 5 includes a classifiers for credit card fraud detection. Results
summary and conclusion, and proposes promising on a European cardholders dataset reveal superior
directions for future research. performance, with GA-RF achieving 99.98%
accuracy. Validation results demonstrate GA-DT's
2. Background 100% accuracy and GA-ANN's AUC of 0.94,
In this section, we review previous works in the showcasing the framework's effectiveness for fraud
field of ensemble learning, feature engineering, and detection. Bharat Padhi et al. [23] addressed
explainability. Ensemble learning models, challenges in credit card fraud detection by
comprising multiple sub-models, have consistently proposing a novel feature selection method, Rock
demonstrated superior performance when Hyrax Swarm Optimization Feature Selection
compared to individual models such as logistic (RHSOFS), inspired by natural swarm behavior.
regression, artificial neural networks, support Employing supervised machine learning, the
vector machines, and k-nearest neighbors [13]. approach enhances fraud identification by selecting
Several studies have found that random forests are optimal features from high-dimensional datasets
among the most effective ensemble methods [14- generated from European cardholder dataset. In a
17]. comparative analysis, RHSOFS surpasses existing
Randhawa et al. [18] initially employed standalone methods including Differential Evolutionary
standard models for credit card fraud detection. Feature Selection (DEFS), Genetic Algorithm
Subsequently, they explored the combination of Feature Selection (GAFS), Particle Swarm
models using the AdaBoost and Majority Vote Optimization Feature Selection (PSOFS), and Ant
techniques. To assess the algorithms' resilience Colony Optimization Feature Selection (ACOFS),
against noisy data, noisy data samples were demonstrating superior efficiency. SHAP and
introduced. The experimental outcomes ultimately LIME algorithms are among the explainability
indicated that the Majority Voting method algorithms utilized in various studies [24-29].
exhibited commendable accuracy in credit card
X-SHAoLIM: Novel Feature Selection Framework for Credit Card Fraud Detection
The attributes “trans_num”, “unix_time”, Figure 4. Top 10 frequent values of category for fraud (in
“merch_lat”, and “merch-long” were removed percentage). Approximately 50% of the fraudulent
transactions belong to the "grocery_pos" or
initially, and the attribute “trans_date_trans_time” "shopping_net" category.
was subsequently removed after extracting
meaningful features from it. According to Figure 3, 3.3. Sampling
the average amount of fraudulent transactions was In order to train and evaluate our models, we
$530, while the average amount of legal divided the original dataset into the following three
transactions was $67. sets in a stratified manner:
Train set (including 60% of the original
set)
Test set (including 20% of original
datasets)
Validation set (including 20% of the
original set)
After dividing the original dataset into training,
testing, and validation sets, we performed sampling
only on the training set. In order to create the
possibility of comparison with the same conditions,
Figure 3. The amount of fraudulent transactions is nearly
10 times that of legitimate transactions.
we follow Figuerola et al. [18] sampling approach,
X-SHAoLIM: Novel Feature Selection Framework for Credit Card Fraud Detection
and combine RUS and Borderline-SMOTE (both algorithms form the main cores of the
methods. Initially, the samples of the majority class proposed framework and we used them in two
were reduced to 20 times the number of samples in different stages).
the minority class. Subsequently, the samples of We introduce our proposed framework as an
the minority class were increased to 90% of the "ensemble-based explainable framework" called
number of samples in the majority class. Figure 5 "X-SHAoLIM". We used the word "framework"
illustrates the resulting imbalance ratio because we have actually introduced a new
(legal/fraudulent) after sampling. "structure" for feature selection. In every structure,
the main components and how they interact and
aggregate should be specified. Also, any structure
should have the necessary flexibility to be used in
any problem. Considering the previous two points,
our framework includes three main components
(according to Figure 6).
We called it “X-SHAoLIM” because SHAP and It is possible to combine other feature selection
LIME are the core algorithms of this framework algorithms in the candidate selection stage and
and the first “X” character refers to the flexibility various base models for both SHAP and LIME.
of this framework. Figure 6 shows how we used the X-SHAoLIM
framework in our fraud detection process.
Figure 6. In our fraud detection process, we used X-SHAoLIM framework with above configuration in three stages (FS
refers to Feature Selection algorithm and BM refers to Base Model).
X-SHAoLIM: Novel Feature Selection Framework for Credit Card Fraud Detection
Table 3. Results of feature selection algorithms based on four evaluation metrics, highlighting X-SHAoLIM as the top
performer.
RF + XGB
+ LGBM
91 86 89 86 94 87 90 90 95 88 91 90 97(+6) 88(+2) 92(+3) 93(+7)
RF + XGB
+ KNN
91 85 88 83 93 86 89 86 95 86 90 88 97(+6) 86(+1) 91(+3) 90(+7)
RF +
LGBM + 92 84 88 84 94 85 89 88 96 86 90 89 98(+6) 86(+2) 91(+3) 91(+7)
KNN
XGB +
LGBM + 85 90 87 85 88 90 89 88 90 91 90 89 93(+8) 91(+1) 92(+5) 91(+6)
KNN
amount of transactions between the fraudulent and selected features of XGBoost algorithm. The
legitimate labels. analysis of sum_amt_last_60_days attribute values
According to Table 5, both the “amt” and shows that, unlike the amt attribute, there is no high
“sum_amt_last_7_days” features emerge as the difference between the average values of this
most important features across all three algorithms. attribute in fraudulent and legitimate transactions
Furthermore, the analysis of the (Figure 8).
“trans_hour_category” feature values in Figure 7
reveals variations between different times of the
day and night in terms of the occurrence of fraud.
sum_amt_last_7
1 amt amt
_days
sum_amt_last_7_day sum_amt_las
2 amt
s t_7_days
Figure 8. Average of “sum_amt_last_60_days” per class
sum_amt_last_1 sum_amt_last_14_da
3 category label. Note the smaller gap between fraudulent and
4_days ys
legitimate values compared to the “amt” feature.
sum_amt_last_3 trans_hour_c
4 diff_amt
0_days ategory
sum_amt_last_30_da
In the next phase, we examine the performance of
5 merchant
ys
diff_amt the “X-SHAoLIM” framework. As depicted in
6 category category full_name
Figure 9, similar to the XGBoost algorithm, the
“trans_hour_category” feature is among the top 5
7
sum_amt_last_6 sum_amt_last_60_da sum_amt_las features of “X-SHAoLIM”, while the
0_days ys t_14_days
“sum_amt_last_60_days” feature does not make it
8 street merchant merchant into the top 10 features. Additionally, unlike the
9 full_name trans_hour_category street
previous three algorithms, the “full_name” feature
is not included among the final features.
elapsed_time According to Figure 9, analysis of the Filtering
10 city full_name
_seconds
component revealed that, in both false positive and
It's worth noting that, unlike other algorithms, this false negative samples, the “street” feature has the
feature does not appear among the top 10 features most negative effect on these cases. Consequently,
of the ANOVA algorithm (weakness of the it was removed from the set of selected features.
ANOVA algorithm). Interestingly, this feature was among the top 10
features in the ANOVA and XGBoost algorithms.
Furthermore, as shown in Figure 10, both the
“street” and “full_name” features have nearly 1000
unique values, and these attributes are not included
in the “X-SHAoLIM” attribute set.
Figure 9. Traversal of features in the stages of the X-SHAoLIM framework. The street feature was identified in the filtering
stage and removed from the final feature set.
application of our proposed framework to different
datasets, diverse models, and varied combinations
of models. Additionally, comparative studies with
other feature selection algorithms can provide
deeper insights into its performance and versatility.
References
[1] S. Mittal and S. Tyagi, “Performance Evaluation of
Machine Learning Algorithms for Credit Card Fraud
Detection,” IEEE Xplore, Jan. 01, 2019.
[2] S. Beigi and M. R. Amin Naseri, “Credit Card Fraud
Detection using Data mining and Statistical
Figure 10. Number of unique values for “street” and Methods,” Journal of AI and Data Mining, vol. 8, no. 2,
“full_name” features. They have an excessive number of pp. 149–160, Apr. 2020.
unique values and are not included in the X-SHAoLIM
[3] R. Yan, Y. Liu, R. Jin, and A. Hauptmann, “On
feature set.
predicting rare classes with SVM ensembles in scene
classification,” IEEE Xplore, Apr. 01, 2003.
The quantitative evaluation of the “X-SHAoLIM”
framework across diverse model combinations has [4] N. V. Chawla, N. Japkowicz, and A. Kotcz,
demonstrated substantial enhancements in model “Editorial,” ACM SIGKDD Explorations Newsletter,
accuracy. Notably, we have observed significant Vol. 6, No. 1, p. 1, Jun. 2004.
improvements in Precision (+5.6), Recall (+1.5), [5] Y. Russac, O. Caelen, and L. He-Guelton,
F1-Score (+3.5), and AUC-PR (+6.75) compared “Embeddings of Categorical Variables for Sequential
to other feature selection algorithms, establishing Data in Fraud Context,” The International Conference
its superiority. Furthermore, this paper has on Advanced Machine Learning Technologies and
underscored the value of incorporating SHAP and Applications (AMLTA2018), pp. 542–552, 2018.
LIME explainability algorithms into the feature [6] C. Guo and F. Berkhahn, “Entity Embeddings of
selection process. Beyond enhancing model Categorical Variables,” arXiv.org, Apr. 22, 2016.
performance, these algorithms offer effective [7] S. Bhattacharyya, S. Jha, K. Tharakunnel, and J. C.
explanations of model behavior, adding an Westland, “Data mining for credit card fraud: A
invaluable layer of transparency and comparative study,” Decision Support Systems, Vol. 50,
interpretability to the fraud detection process. No. 3, pp. 602–613, Feb. 2011.
Looking ahead, future works can explore the
Rahmani & Alizadeh Fard/ Journal of AI and Data Mining, x(x): xxx-xxx, xxxx
[8] A. Adadi and M. Berrada, “Peeking Inside the Black- [19] E. Figuerola Ullastres, “Credit Card Fraud
Box: A Survey on Explainable Artificial Intelligence Detection using Ensemble Learning
(XAI),” IEEE Access, Vol. 6, pp. 52138–52160, 2018. Algorithms,” norma.ncirl.ie. May 30, 2022.
[9] Małgorzata Magdziarczyk, “RIGHT TO BE [20] A. Correa Bahnsen, D. Aouada, A. Stojanovic, and
FORGOTTEN IN LIGHT OF REGULATION (EU) B. Ottersten, “Feature engineering strategies for credit
2016/679 OF THE EUROPEAN PARLIAMENT AND card fraud detection,” Expert Systems with Applications,
OF THE COUNCIL OF 27 APRIL 2016 ON THE Vol. 51, pp. 134–142, Jun. 2016.
PROTECTION OF NATURAL PERSONS WITH
[21] Y. K. Saheed, M. A. Hambali, M. O. Arowolo, and
REGARD TO THE PROCESSING OF PERSONAL
Y. A. Olasupo, “Application of GA Feature Selection on
DATA AND ON THE FREE MOVEMENT OF SUCH
Naive Bayes, Random Forest and SVM for Credit Card
DATA, AND REPEALING DIRECTIVE
Fraud Detection,” 2020 International Conference on
95/46/EC,” SGEM International Multidisciplinary
Decision Aid Sciences and Application (DASA), Nov.
Scientific Conferences on Social Sciences and Arts, Apr.
2020.
2019.
[22]. E. Ileberi, Y. Sun, and Z. Wang, “A machine
[10] R. Omobolaji Alabi, A. Almangush, M. Elmusrati, learning based credit card fraud detection using the GA
I. Leivo, and A. A. Mäkitie, “An interpretable machine algorithm for feature selection,” Journal of Big Data,
learning prognostic system for risk stratification in Vol. 9, No. 1, Feb. 2022.
oropharyngeal cancer,” International Journal of
[23] Bharat Kumar Padhi, S. Chakravarty, B. Naik,
Medical Informatics, Vol. 168, p. 104896, Dec. 2022.
Radha Mohan Pattanayak, and H. Das, “RHSOFS:
[11] Rasheed Omobolaji Alabi, M. Elmusrati, Ilmo Feature Selection Using the Rock Hyrax Swarm
Leivo, Alhadi Almangush, and Antti Mäkitie, “Machine Optimization Algorithm for Credit Card Fraud
learning explainability in nasopharyngeal cancer Detection System,” Vol. 22, No. 23, pp. 9321–9321,
survival using LIME and SHAP,” Scientific Reports, Nov. 2022.
Vol. 13, No. 1, Jun. 2023.
[24] van, Process Mining Handbook. Springer Nature.
[12] A. Gramegna and P. Giudici, “SHAP and LIME:
An Evaluation of Discriminative Power in Credit [25] W. Rizzi, C. Di Francescomarino, and F. M. Maggi,
Risk,” Frontiers in Artificial Intelligence, Vol. 4, Sep. “Explainability in Predictive Process Monitoring: When
2021. Understanding Helps Improving,” Lecture Notes in
Business Information Processing, pp. 141–158, 2020.
[13] I. Sohony, R. Pratap, and U. Nambiar, “Ensemble
[26] R. Sindhgatta, C. Ouyang, and C. Moreira,
learning for credit card fraud detection,” Proceedings of
the ACM India Joint International Conference on Data “Exploring Interpretability for Predictive Process
Science and Management of Data - CoDS-COMAD ’18, Analytics,” Service-Oriented Computing, pp. 439–447,
2018. 2020.
[27] I. Psychoula, A. Gutmann, P. Mainali, S. H. Lee, P.
[14] A. Dal Pozzolo, O. Caelen, Y.-A. Le Borgne, S.
Dunphy, and F. Petitcolas, “Explainable Machine
Waterschoot, and G. Bontempi, “Learned lessons in
credit card fraud detection from a practitioner Learning for Fraud Detection,” Computer, vol. 54, no.
perspective,” Expert Systems with Applications, Vol. 41, 10, pp. 49–59, Oct. 2021.
No. 10, pp. 4915–4928, Aug. 2014. [28] W. E. Marcilio and D. M. Eler, “From explanations
to feature selection: assessing SHAP values as feature
[15] V. Van Vlasselaer et al., “APATE: A novel
selection mechanism,” 2020 33rd SIBGRAPI
approach for automated credit card transaction fraud
Conference on Graphics, Patterns and Images
detection using network-based extensions,” Decision
Support Systems, vol. 75, pp. 38–48, Jul. 2015. (SIBGRAPI), Nov. 2020.
[16] M. Zareapoor and P. Shamsolmoali, “Application [29] T.-Y. Wu and Y.-T. Wang, “Locally Interpretable
One-Class Anomaly Detection for Credit Card Fraud
of Credit Card Fraud Detection: based on Bagging
Detection,” IEEE Xplore, Nov. 01, 2021.
Ensemble Classifier,” Procedia Computer Science, Vol.
48, pp. 679–685, 2015. [30] Kaggle, “Credit Card Fraud Detection,”
[17] S. D. Penmetsa and S. Mohammed, “Ensemble www.kaggle.com, 2018.
Techniques for Credit Card Fraud https://www.kaggle.com/datasets/mlg-
ulb/creditcardfraud.
Detection,” International Journal of Smart Business and
Technology, Vol. 9, No. 2, pp. 33–48, Sep. 2021.
[31] K. Shenoy, “Credit Card Transactions Fraud
[18] K. Randhawa, C. K. Loo, M. Seera, C. P. Lim, and Detection Dataset,” Kaggle.com, 2019.
A. K. Nandi, “Credit Card Fraud Detection using https://www.kaggle.com/datasets/kartik2112/fraud-
AdaBoost and Majority Voting,” IEEE Access, Vol. 6, detection.
pp. 14277–14284, 2018.
مجله هوش مصنوعی و دادهکاوی ،شماره ،xسال .x x x x رحمانی و علیزاده فرد
:X-SHAoLIMیک چارچوب انتخاب ویژگی جدید بهمنظور کشف تقلب در تراکنشهای کارتهای
اعتباری
*
سجاد علیزاده فرد و حسین رحمانی
چکیده:
تقلب در دادههای مالی یک نگرانی جدی برای سازمانهای تجاری و افراد است .تراکنشهای کارتهای اعتباری ویژگیهای متعددی دارند که برخی از
آنها ممکن است برای ردهبندها ،نامرتبط باشند و منجر به بیش برازش شوند .یک مرحله اساسی در فرآیند کشف تقلب ،مرحله انتخاب ویژگیها است
که بهشدت بر دقت و زمان اجرای مدلها مؤثر است .ما در این مقاله ،به ارائه یک چارچوب انتخاب ویژگی جمعی و توضیحپذیر مبتنی بر الگوریتمهای
SHAPو LIMEمیپردازیم (تحت عنوان .)X-SHAoLIMما چارچوب خود را بر روی ترکیبات متنوع از بهترین مدلها در کارهای پیشین اعمال کرده
و به مقایسه کمی و کیفی آن با الگوریتمهای انتخاب ویژگی رایج پرداختیم .ارزیابی کمی چارچوب « »X-SHAoLIMبر روی ترکیبات مختلف از
مدلهای منتخب نشان داد ،چارچوب پیشنهادی بهطور میانگین باعث افزایش دقت مدلها بر اساس معیارهای صحت ( ،)۵.۶+فراخوانی ( ،)1.۵+معیار
)3.۵+( F1و )۶.۷۵+( AUC-PRشده و همچنین به دلیل بهکارگیری الگوریتمهای توضیحپذیر SHAPو ،LIMEدرک عمیقتری از اهمیت ویژگیها
فراهم کرده و توضیحات مؤثری به کاربران سیستم ارائه میدهد.
کلمات کلیدی :تشخیص تقلب ،یادگیری ماشینی ،انتخاب ویژگی ،یادگیری گروهی ،هوش مصنوعی قابل توضیح ،دادهکاوی.