2020 International Conference on Decision Aid Sciences and Application (DASA)
978-1-7281-9677-0/20/$31.00 ©2020 IEEE | DOI: 10.1109/DASA51403.2020.9317228

Application of GA Feature Selection on Naïve Bayes, Random Forest and SVM for Credit Card Fraud Detection

Yakub K. Saheed, Dept. of Computer Science, Al-Hikmah University, Ilorin, Nigeria. yksaheed@alhikmah.edu.ng
Moshood A. Hambali, Dept. of Computer Science, Federal University Wukari, Wukari, Nigeria. hambali@fuwukari.edu.ng
Micheal O. Arowolo, Dept. of Computer Science, Landmark University, Omu-Aran, Nigeria. arowolo.olaolu@lmu.edu.ng
Yinusa A. Olasupo, Dept. of Computer Science, Federal University Wukari, Wukari, Nigeria. yinusa@fuwukari.edu.ng

Abstract - Credit card fraud (CCF) has been a serious challenge facing credit card holders and credit card issuing companies in the past decades. CCF is performed at two levels: transaction-level fraud and application-level fraud. This paper focuses on application-level CCF detection using the Genetic Algorithm (GA) as a feature selection technique. The GA feature selection technique has two phases. The first phase yields the first priority features, where eight (8) attributes were selected as the fittest attributes. The second phase yields the second priority features, where another set of eight (8) attributes was considered and selected. The Naïve Bayes (NB), Random Forest (RF) and Support Vector Machine (SVM) supervised machine learning techniques were used to detect CCF on the German credit card dataset, which is an imbalanced dataset. The experimental findings of the proposed model revealed that the first priority features are the most important features. The results also showed that the RF algorithm outperformed NB and SVM in terms of accuracy, fraud detection rate and precision.

Keywords: Credit card, Imbalanced dataset, Genetic algorithm, Fraud detection, NB, SVM, Random forest

I. INTRODUCTION
Banking and finance are a vital sector in our day-to-day activities; hardly a day passes without dealing with a bank, either through an online platform or a physical transaction [1]. The efficiency and viability of both the private and public sectors have risen greatly because of banking information systems. Credit cards are extensively employed as a medium of payment due to the general acceptance of e-commerce, internet technology, online banking and advances in intelligent mobile devices, especially for online transactions done through web payment gateways such as Alipay, PayPal and others. As credit card transactions have become the dominant medium of payment for both online and offline transactions, the rate of credit card fraud (CCF) is also increasing at a very alarming rate [2], [3].

Financial fraud poses a serious menace with far-reaching consequences for individuals, corporate organizations, government and the finance industry. Fraud is an unlawful or criminal deception with the aim of attaining financial or personal gain [4]. CCF is concerned with the illegitimate use of credit card data for transactions. Transactions with a credit card can be performed either digitally or physically [5]. In physical transactions, the transaction is performed with the credit card either through physical contact or by scanning it with a device, while in digital transactions cardholders usually supply the card number, expiry date and card verification number by phone or online. Even with the several authorization techniques put in place, CCF has not been stopped efficiently. To prevent losses to fraudsters, two mechanisms are commonly employed: fraud detection and fraud prevention. Fraud detection is a technique of monitoring cardholders' transaction activities with the intent of detecting whether an incoming transaction comes from the cardholder or from a fraudster [6]. Fraud prevention is a defensive technique, where a fraudulent transaction is halted from taking place in the first instance. Generally, fraud detection is of two types: anomaly detection and misuse detection. Misuse detection employs classification techniques to determine whether an incoming transaction is a fraud or not. Typically, this type of approach has a model that learns the various existing fraud patterns. Anomaly detection instead builds a historical transaction model of the behavioral profile of a cardholder's normal transactions, and flags an incoming transaction as a potential fraud once it deviates from the usual transaction pattern. However, an anomaly detection technique requires sufficient successive training data to model the normal transaction pattern of a cardholder [7].

Detecting fraudulent transactions using traditional manual approaches is time-consuming and inefficient; with the advent of big data, manual procedures have become even more impracticable. Consequently, financial organizations have recently moved toward computational approaches for the control and prevention of CCF. Also, the increase in the number of users and online transactions has brought serious workloads to these systems [8].

Data mining is one of the outstanding techniques employed in preventing and detecting CCF [9]. According to [10], CCF detection is the technique of classifying transactions into two classes: genuine and fraudulent transactions. Fraud detection on credit cards is modeled on the analysis of a card's spending pattern. Many techniques have been employed for credit card fraud detection, among them artificial neural network [11], decision tree [12], frequent itemset mining [13], genetic algorithm (GA) [14], [15], migrating birds
optimization algorithm [16], Naïve Bayes (NB) [17] and support vector machine (SVM) [18]. A comparative analysis of logistic regression and NB was done by [19]. The performance of Bayesian and neural networks was evaluated on CCF data in [20], while [21] tested the capability of decision tree, neural networks and logistic regression in fraud detection. Ref. [22] appraised random forests and SVM together with logistic regression, in an effort to attain better detection of CCF. There are a number of difficulties related to credit card fraud detection, among them: dynamism in fraudulent behavior patterns, that is, fraudulent transactions sometimes look like legitimate ones; the availability of credit card transaction datasets, which are also highly imbalanced in nature; optimal feature selection for the models; and the choice of an appropriate performance evaluation metric for skewed CCF data [10]. The nature of the sampling approach, the selection of features and the detection technique(s) employed also have a great effect on CCF detection performance. The objective of this work is to examine the effect of GA feature selection priority on selected variables of highly skewed CCF data, using NB, Random Forest (RF) and SVM as classifiers to evaluate the approach.

II. RELATED STUDY
Credit card transaction classification is usually regarded as a binary classification problem. A credit card transaction is categorized either as a genuine transaction (negative class) or as a fraudulent transaction (positive class).

A. Feature selection
Analysis of the cardholder's spending profile is the main backbone of CCF detection. This spending behavior is analyzed by selecting optimal features that capture the uniqueness of a credit card profile. Transaction profiles (both legitimate and fraudulent) tend to change dynamically. Hence, an optimal feature selection approach that strongly differentiates between both profiles is required to accomplish efficient credit card transaction classification. The selected features that represent the card usage profiles, together with the algorithms used, determine the performance of a CCF detection system. These features are derived from both kinds of transactions (legitimate and fraudulent) and from the historical transactions of a credit card. They can be grouped into five major types, namely all transactions statistics, merchant type statistics, regional statistics, time-based amount statistics and time-based number of transactions statistics [10], [23].

The features that represent the general profile of card usage are referred to as the all transactions statistics type. The features that take into account the spending profile of the card in relation to geographical regions are known as the regional statistics type. The merchant statistics type refers to the features that describe card usage with different merchant types. The time-based statistics types are the features that identify the profile of card usage with respect to amounts against time ranges or usage frequencies against time ranges. The cardholder profile, rather than the card profile, is the major focus of most of the literature. However, an individual can use multiple credit cards for several purposes and can therefore exhibit different spending patterns on such cards. In this study, we focus more on the card profile than on the cardholder alone, since a single credit card can only exhibit one distinct spending profile while a cardholder can exhibit various behavior patterns on different cards. Ref. [24] used a total of 30 features in their study and 27 features were used in [23], while [13] used 16 relevant features out of 20 variables in their study.

B. Credit Card Fraud
Existing CCFs can be grouped into two types, external fraud and inner card fraud [7], [18], while a wider classification presented by [10] has three categories, namely card-related frauds (application, account takeover, fake, counterfeit, identity theft and stolen), Internet frauds (credit card generators, site cloning and fake sites) and merchant-related frauds (merchant conspiracy and triangulation) [10]. It was reported in [25] that the amount lost to CCF every year is worth 500 million US dollars.

Credit card transaction data has some peculiar characteristics. Genuine and illicit transactions tend to share a similar profile. Fraudsters invent new tricks to mimic the spending pattern of the genuine (legitimate) cardholder. Hence, the profiles of legitimate and fraudulent activities are continuously changing. This characteristic tends to reduce the number of true fraudulent cases identified in a pool of credit card transaction data, giving a distribution that is highly skewed toward the negative class (legitimate transactions). The credit card data investigated by [23] contains 0.025% positive cases, while in [16] it is below 0.005% positive cases.

Ref. [10] examined the performance of NB, k-nearest neighbor (KNN) and logistic regression for fraud detection on highly skewed credit card data. A European cardholders' dataset containing 284,807 transactions was used for the experiments. They employed a hybrid technique of under-sampling and oversampling to pre-process the skewed data. Their results showed that KNN performs better than the other approaches; the accuracy obtained for the NB, KNN and logistic regression classifiers is 97.92%, 97.69% and 54.86% respectively.

Ref. [26] proposed an Enhanced LINGO clustering algorithm for Fraud Miner. The improvement replaces the Apriori algorithm employed in Fraud Miner with frequent pattern creation in the LINGO clustering algorithm, and summarizes a customer's profile within either his genuine or his fraudulent transactions. The results of their simulated test transactions showed that LINGO produced significantly better summarized patterns than the Apriori algorithm.

Ref. [27] employed hybrid techniques of AdaBoost and majority voting and evaluated the efficacy of the model. A publicly available credit card dataset, in addition to a real-world credit card dataset obtained from a financial institution, was analyzed. The best MCC score of 0.823 was achieved with the majority voting approach.

Ref. [7] examined the performance of two types of random forest models on a real-life B2C credit card transaction dataset. The models are Random-tree-based random forest (I) and CART-based random forest (II). Their results showed that Random Forest I yielded 91.96% accuracy while Random Forest II yielded 96.77% accuracy.

Ref. [25] proposed a hypothetical framework for a novel data mining algorithm known as the condensed nearest neighbor (CNN) algorithm to detect CCF. The CNN algorithm is a nonparametric classification technique that aims to form a condensed set retaining the samples that are vital for decision making. Its target is to minimize the number of attributes for comparison, thereby creating a condensed training set with improved query and memory requirements.
III. RESEARCH METHODOLOGY
This research work is presented in three phases. Firstly, the dataset was collected from the UCI repository. Secondly, data pre-processing was carried out to remove redundancy in the dataset. The third phase of CCF detection applies three machine learning techniques. The goal of this research work is to improve the machine learning model's ability to distinguish CCF, based on GA as a feature selection method. NB, RF and SVM are used to evaluate the performance of the GA on the German credit dataset. Figure 1 presents the framework of the proposed system and summarizes the approach used in this work.

[Figure 1: flowchart. German credit card dataset -> data preprocessing -> feature selection with GA -> GA-fitted features selected into two priorities -> training set (train NB, RF and SVM models) and test set (predict the class label of the set) -> performance evaluation.]
Figure 1: Framework of the proposed credit card fraud model

A. German Dataset
The German credit dataset has been utilized in this study to classify the transactions into genuine or fraud. The available dataset from the UCI repository is imbalanced [28]. The dataset comprises 1000 instances (credit applicants) and 21 features (seven numerical and 14 categorical/nominal). Each instance describes the credit standing of a specific applicant, either good or bad; an entry in the dataset represents an individual who takes credit from a bank. Every individual is classified as good or bad credit with respect to a set of features.

B. Feature Selection
A feature selection (FS) method removes redundant features from the dataset, thereby improving the learning performance [25]. FS minimizes the impact of irrelevant features in CCF data and enhances the fraud detection rate.

C. Genetic Algorithm
GA is a very popular technique in evolutionary computation research. It is a replica of natural selection [26], [27]. It is applied widely in business, engineering and many other domains. The idea of the approach is to find an optimal solution to a problem [28], [29]. GA consists of three essential operators: selection, crossover and mutation. Selection identifies the fittest individuals in the available population, based on a fitness function [33]. Crossover merges the second half of the first record with the first half of the second record. Mutation randomly exchanges 0 bits with 1 bits and vice versa.

The steps involved in GA are as follows:
Step 1: Create a random population of n chromosomes, each encoding a different candidate solution to the problem.
Step 2: Evaluate the fitness of each chromosome x.
Step 3: Generate a new population by repeating the following until the new population is complete:
   a. Selection: Select two parent chromosomes, favouring those with higher fitness values.
   b. Crossover: Cross over the parents to generate new offspring. The crossover can be one-point or multi-point.
   c. Mutation: Flip a few bits at random to alter the new offspring.
   d. Acceptance: Place the new offspring in the new population.
Step 4: Replacement: Use the new population for the next run of the algorithm.
Step 5: Test: If the end condition is met, stop and return the best solution in the current population.
Step 6: Loop: Go to Step 2.

The GA parameters are configured as follows.
Population size: 20
Number of generations: 20
Probability of crossover: 0.6
Probability of mutation: 0.033
Report frequency: 20
Random number seed: 1
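The steps and parameter values above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the paper: the fitness function `toy_fitness`, the two-way tournament used for selection, the one-point crossover, and the choice of 20 candidate attributes are all assumptions made for the example. In a real run the fitness would score a feature subset on the credit data, for instance with a classifier.

```python
import random

# Values taken from the GA configuration listed above.
POPULATION_SIZE = 20
GENERATIONS = 20
CROSSOVER_PROB = 0.6
MUTATION_PROB = 0.033
SEED = 1
N_FEATURES = 20  # assumption: 20 candidate attributes, one bit per attribute


def toy_fitness(mask):
    """Hypothetical fitness: reward an illustrative 'useful' subset, penalise size."""
    useful = {0, 1, 3, 4, 5, 8, 17, 19}  # made-up target subset (zero-based)
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in useful)
    return hits - 0.25 * sum(mask)


def select_parent(population, scores, rng):
    """Selection (assumed scheme): the fitter of two randomly drawn individuals."""
    a, b = rng.sample(range(len(population)), 2)
    return population[a] if scores[a] >= scores[b] else population[b]


def crossover(p1, p2, rng):
    """One-point crossover, applied with probability CROSSOVER_PROB."""
    if rng.random() < CROSSOVER_PROB:
        point = rng.randint(1, len(p1) - 1)
        return p1[:point] + p2[point:]
    return p1[:]


def mutate(child, rng):
    """Flip each bit independently with probability MUTATION_PROB."""
    return [1 - bit if rng.random() < MUTATION_PROB else bit for bit in child]


def run_ga(fitness=toy_fitness):
    rng = random.Random(SEED)
    # Step 1: random initial population of bit masks.
    population = [[rng.randint(0, 1) for _ in range(N_FEATURES)]
                  for _ in range(POPULATION_SIZE)]
    best = max(population, key=fitness)
    for _ in range(GENERATIONS):
        # Step 2: evaluate the fitness of every chromosome.
        scores = [fitness(ind) for ind in population]
        # Step 3: selection, crossover, mutation, acceptance.
        new_population = []
        while len(new_population) < POPULATION_SIZE:
            p1 = select_parent(population, scores, rng)
            p2 = select_parent(population, scores, rng)
            new_population.append(mutate(crossover(p1, p2, rng), rng))
        # Step 4: replacement.
        population = new_population
        # Keep the best chromosome seen so far (Step 5 returns it at the end).
        generation_best = max(population, key=fitness)
        if fitness(generation_best) > fitness(best):
            best = generation_best
    return best


best_mask = run_ga()
selected = [i + 1 for i, bit in enumerate(best_mask) if bit]  # 1-based feature numbers
print("selected features:", selected, "fitness:", toy_fitness(best_mask))
```

Each chromosome is a bit mask over the candidate attributes, so a selected subset is simply the set of positions holding a 1. Because the random generator is seeded inside `run_ga`, repeated calls return the same subset.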
D. Naïve Bayes
Naïve Bayes (NB) classification is built on the posterior probability of Bayes' theorem. The NB model works very well when the predictors are independent, although it sometimes does well even when the predictors have
no distinctly independent classes [34]. The NB technique classifies data in two stages. The first is the learning stage, where the training dataset is input to the model and the parameters of a probability distribution are computed, under the assumption that the predictors are conditionally independent. The second is the prediction stage, where unseen data (the test dataset) is supplied to the NB classifier to estimate the posterior probability of each class. The test data is then classified according to the highest posterior probability.

NB is a good method for large datasets because it involves no complex iterative parameter estimation [35]. It performs efficiently in both the training and classification stages [36]. It assumes that all features in the feature vector are equally important and independent [33].

E. Support Vector Machine
SVM constructs a hyperplane that maximizes the distance between the two classes. It offers good generalization power from a small number of parameters [34].

F. Performance Metrics
In this research work, positives (P) denotes the number of fraud transactions and negatives (N) denotes the number of non-fraud, that is, genuine, transactions.

Precision (hit rate): the accuracy on the cases classified as positive.

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{1} \]

Recall (sensitivity or fraud detection rate): the accuracy of classification on the positive (fraud) cases.

\[ \text{Recall (Sensitivity)} = \frac{TP}{TP + FN} \tag{2} \]

Specificity: the accuracy of classification on the non-fraud cases.

\[ \text{Specificity} = \frac{TN}{TN + FP} \tag{3} \]

Accuracy: the number of correct predictions divided by the total number of predictions.

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4} \]

Note: TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative.

G. Experimental Setup
In this experiment, the German credit card data was considered. An efficient feature selection based on the Genetic Algorithm (GA) is proposed. The GA feature selection technique has two phases. The first phase yields the first priority features, where eight (8) attributes were selected as the fittest attributes. The second phase yields the second priority features, where another set of eight (8) attributes was considered and selected. Table 1 and Table 2 list the selected features.

The priority features in the dataset are considered the most important features in the dataset.

Table 1: First Priority Features

| Feature number | Feature | Type |
|---|---|---|
| 1 | Status of existing checking account | Nom |
| 2 | Duration in months | Num |
| 4 | Purpose of the credit | Nom |
| 5 | Credit amount | Num |
| 6 | Savings status | Nom |
| 9 | Personal status | Nom |
| 18 | Number of dependents | Num |
| 20 | Foreign worker | Nom |

Where Nom means Nominal and Num means Numerical.

The GA is also used to select another set of eight (8) features in the second stage. The selected attributes are shown in Table 2.

Table 2: Second Priority Features

| Feature number | Feature | Type |
|---|---|---|
| 4 | Purpose of the credit | Nom |
| 5 | Credit amount | Num |
| 6 | Savings status | Nom |
| 7 | Employment | Nom |
| 8 | Installment rate in percentage of disposable income | Num |
| 12 | Property (for example, real estate) | Nom |
| 15 | Housing (own or rent) | Nom |
| 18 | Number of people liable to provide maintenance for | Num |

Where Nom means Nominal and Num means Numerical.

IV. EXPERIMENTAL ANALYSIS OF THE PROPOSED MODEL
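The performance metrics of Eqs. (1) to (4) can be computed directly from the four confusion-matrix counts. The following is a minimal sketch; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, specificity and accuracy as fractions in [0, 1].

    A ratio with a zero denominator is reported as 0.0.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "precision": ratio(tp, tp + fp),                # Eq. (1)
        "recall": ratio(tp, tp + fn),                   # Eq. (2), sensitivity
        "specificity": ratio(tn, tn + fp),              # Eq. (3)
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),  # Eq. (4)
    }


# Hypothetical counts for illustration only (fraud is the positive class).
example = classification_metrics(tp=30, tn=140, fp=20, fn=10)
for name, value in example.items():
    print(f"{name}={value:.3f}")
# precision=0.600, recall=0.750, specificity=0.875, accuracy=0.850
```

On imbalanced data, accuracy alone can look high even when few frauds are caught, which is why recall and specificity are reported alongside it.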
The number of fraud cases is usually small relative to the total number of transactions. Performance was evaluated using ten-fold cross-validation since the dataset is imbalanced.

A. Experimental Results Analysis on the GA-Fitted First Priority Features
We performed experimental analysis on the first priority features in Table 1 using the classifiers NB, RF and SVM, as presented in Table 3.

Table 3: Performance measures of the first priority features

| Performance metrics (%) | NB | RF | SVM |
|---|---|---|---|
| Sensitivity | 94.3 | 96.4 | 96.3 |
| Specificity | 96.7 | 96.3 | 96.3 |
| Precision | 94.7 | 96.5 | 92.7 |
| Accuracy | 94.3 | 96.40 | 96.3 |

As shown in Table 3, NB gave a sensitivity of 94.3%, a specificity of 96.7%, a precision of 94.7% and an accuracy of 94.3%. RF gave a sensitivity of 96.4%, a specificity of 96.3%, a precision of 96.5% and an accuracy of 96.40%. SVM gave a sensitivity of 96.3%, a specificity of 96.3%, a precision of 92.7% and an accuracy of 96.3%. Therefore, RF outperformed the other algorithms in terms of sensitivity, precision and accuracy, while NB has the highest specificity.

[Figure 2: bar chart titled "Classification Accuracy for the two Priorities", showing NB, RF and SVM for the 1st and 2nd priority features on a 0 to 120 axis.]
Figure 2. Classification accuracy comparison for the two feature priorities

B. Experimental Results Analysis on the GA-Fitted Second Priority Features
We performed experimental analysis on the second priority features in Table 2 using the classifiers NB, RF and SVM, as presented in Table 4. The NB, RF and SVM classifiers learn from the training set and are then used on the test set to classify the data into fraud and genuine transactions. The results are presented in Table 4.

As shown in Table 4, NB gave a sensitivity of 64.2%, a specificity of 49.1%, a precision of 58.2% and an accuracy of 64.2%. RF gave a sensitivity of 60%, a specificity of 46.6%, a precision of 56.1% and an accuracy of 60%. SVM gave a sensitivity of 59%, a specificity of 52.8%, a precision of 56.2% and an accuracy of 59%.

Table 4: Performance measures of the second priority features

| Performance metrics (%) | NB | RF | SVM |
|---|---|---|---|
| Sensitivity | 64.2 | 60.0 | 59.0 |
| Specificity | 49.1 | 46.6 | 52.8 |
| Precision | 58.2 | 56.1 | 56.2 |
| Accuracy | 64.2 | 60.0 | 59.0 |

[Figure 3: bar chart titled "Comparison of First and Second features Priority for Precision, Specificity and Sensitivity", showing SVM, RF and NB for the 1st and 2nd priority features on a 0 to 120 axis.]
Figure 3: Performance comparison for the two feature priorities

On the second priority features, the NB classifier outperforms RF and SVM in terms of sensitivity, precision and accuracy. SVM has the highest specificity when compared to NB and RF, which it trades off against lower sensitivity, precision and accuracy.

Figure 2 shows a comparison of the two approaches based on accuracy, and it reveals that the first priority performed better than the second priority. Also, Figure 3 presents a comparison based on
specificity, sensitivity and precision. Therefore, the results revealed that the first priority features fitted by GA are the essential features in the imbalanced credit card fraud data.

V. CONCLUSION
With recent advances in technology, credit cards have become a popular means of making payments. Due to security deficiencies in their operation, fraud has been on the rise, resulting in the loss of millions of dollars yearly. This necessitates fraud detection and prevention approaches to minimize the fraud rate in credit card payment. This study presented CCF detection using a machine learning approach. The proposed technique used Genetic Algorithm feature selection priority to enhance the classification algorithms. The results obtained revealed that RF performed better than the other classifiers based on accuracy with the first priority feature selection. Also, first priority feature selection is the approach recommended for a credit card fraud detection system. Our future work will focus on other feature selection approaches for learning, such as the embedded feature selection method.

REFERENCES
[1] S. N. John, O. K. O, and C. G. Kennedy, “Realtime Fraud Detection in the Banking Sector using Data Mining Techniques / Algorithm,” in International Conference on Computational Science and Computational Intelligence, 2016, pp. 1186–1191, doi: 10.1109/CSCI.2016.223.
[2] A. A. Taha and S. J. Malebary, “An Intelligent Approach to Credit Card Fraud Detection Using an Optimized Light Gradient Boosting Machine,” IEEE Access, vol. 8, pp. 25579–25587, 2020.
[3] R. Sailusha, V. Gnaneswar, R. Ramesh, and G. R. Rao, “Credit Card Fraud Detection Using Machine Learning,” in Proceedings of the International Conference on Intelligent Computing and Control Systems (ICICCS 2020) IEEE, 2020, no. Iciccs, pp. 1264–1270.
[4] Y. Sahin, S. Bulkan, and E. Duman, “A Cost-sensitive decision tree approach for fraud detection,” Expert Syst. Appl., vol. 40, no. 15, pp. 5916–5923, 2013.
[5] A. O. Adewumi and A. A. Akinyelu, “A survey of machine-learning and nature-inspired based credit card fraud detection techniques,” Int. J. Syst. Assur. Eng. Manag., vol. 8, no. 2, pp. 937–953, 2017.
[6] E. Duman and M. H. Ozcelik, “Detecting credit card fraud by genetic algorithm and scatter search,” Expert Syst. Appl., vol. 38, no. 10, pp. 13057–13063, 2011.
[7] S. Xuan, G. Liu, Z. Li, L. Zheng, S. Wang, and C. Jiang, “Random Forest for Credit Card Fraud Detection,” in IEEE 15th International Conference on Networking, Sensing and Control (ICNSC), 2018, pp. 1–6.
[8] S. Patil, V. Nemade, and P. Soni, “Predictive Modelling For Credit Card Fraud Detection Using Data Analytics,” Procedia Comput. Sci., vol. 132, pp. 385–395, 2018, doi: 10.1016/j.procs.2018.05.199.
[9] Y. Lucas et al., “Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs,” Futur. Gener. Comput. Syst., vol. 102, pp. 393–402, 2020.
[10] J. O. Awoyemi, A. O. Adetunmbi, and S. A. Oluwadare, “Credit card fraud detection using Machine Learning Techniques: A Comparative Analysis,” in International Conference on Computing Networking and Informatics (ICCNI), 2017, pp. 1–9.
[11] F. N. Ogwueleka, “Data Mining Application in Credit Card Fraud Detection System,” J. Eng. Sci. Technol., vol. 6, no. 3, pp. 311–322, 2011.
[12] S. Patil, H. Somavanshi, J. Gaikwad, A. Deshmane, and R. Badgujar, “Credit Card Fraud Detection Using Decision Tree Induction Algorithm,” Int. J. Comput. Sci. Mob. Comput., vol. 4, no. 4, pp. 92–95, 2015.
[13] K. R. Seeja and M. Zareapoor, “FraudMiner: A Novel Credit Card Fraud Detection Model Based on Frequent Itemset Mining,” Sci. World Journal, Hindawi Publ., vol. 2014, pp. 1–10, 2014, doi: 10.1155/2014/252797.
[14] K. RamaKalyani and D. UmaDevi, “Fraud Detection of Credit Card Payment System by Genetic Algorithm,” Int. J. Sci. Eng. Res., vol. 3, no. 7, pp. 1–6, 2012.
[15] P. L. Meshram and P. Bhanarkar, “Credit and ATM Card Fraud Detection Using Genetic Approach,” Int. J. Eng. Res. Technol., vol. 1, no. 10, pp. 1–5, 2012.
[16] E. Duman, A. Buyukkaya, and I. Elikucuk, “A novel and successful credit card fraud detection system implemented in a turkish bank,” in Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference, 2013, pp. 162–171.
[17] A. C. Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “Improving credit card fraud detection with calibrated probabilities,” in 2014 SIAM International Conference on Data Mining, 2014, pp. 677–685.
[18] G. Singh, R. Gupta, A. Rastogi, M. D. S. Chandel, Riyaz, and A., “A Machine Learning Approach for Detection of Fraud based on SVM,” Int. J. Sci. Eng. Technol., vol. 1, no. 3, pp. 194–198, 2012.
[19] A. Y. Ng and M. I. Jordan, “On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes,” Adv. Neural Inf. Process. Syst., vol. 2, pp. 841–848, 2002.
[20] S. Maes, K. Tuyls, B. Vanschoenwinkel, and B. Manderick, “Credit card fraud detection using Bayesian and neural networks,” in 1st international naiso congress on neuro fuzzy technologies, 2002, pp. 261–270.
[21] A. Shen, R. Tong, and Y. Deng, “Application of classification models on credit card fraud detection,” in 2007 International Conference on Service Systems and Service Management, IEEE, 2007, pp. 1–4.
[22] S. Bhattacharyya, S. Jha, K. Tharakunnel, and J. C. Westland, “Data mining for credit card fraud: A comparative study,” Decis. Support Syst., vol. 50, no. 3, pp. 602–613, 2011.
[23] A. C. Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten, “Cost sensitive credit card fraud detection using Bayes minimum risk,” in 12th International Conference on Machine Learning and Applications (ICMLA), 2013, pp. 333–338.
[24] S. Stolfo, D. W. Fan, W. Lee, A. Prodromidis, and P. Chan, “Credit card fraud detection using meta-learning: Issues and initial results,” in AAAI-97 Workshop on Fraud Detection and Risk Management, 1997.
[25] P. R. Vardhani, Y. I. Priyadarshini, and Y. Narasimhulu, CNN Data Mining Algorithm for Detecting Credit Card Fraud. Singapore: Springer Singapore, 2019.
[26] M. Hegazy, A. Madian, and M. Ragaie, “Enhanced Fraud Miner: Credit Card Fraud Detection using Clustering Data Mining Techniques,” Egypt. Comput. Sci. J., vol. 40, no. 03, pp. 72–81, 2016.
[27] K. Randhawa, C. K. Loo, et al., “Credit Card Fraud Detection Using AdaBoost and Majority Voting,” IEEE Access, vol. 6, pp. 14277–14284, 2018, doi: 10.1109/ACCESS.2018.2806420.
[28] D. H. Hofmann, “UCI Machine Learning Repository,” Irvine, CA: University of California, School of Information and Computer Science, 1994. https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data).
[29] Y. Saheed and A. Babatunde, “Genetic Algorithm Technique In Program Path Coverage For Improving Software Testing,” vol. 7, no. 5, pp. 151–158, 2014.
[30] A. H. Hamamoto, L. F. Carvalho, L. D. H. Sampaio, T. Abrão, and M. L. Proença, “Network Anomaly Detection System using Genetic Algorithm and Fuzzy Logic,” Expert Syst. Appl., vol. 92, pp. 390–402, 2018, doi: 10.1016/j.eswa.2017.09.013.
[31] Y. Zhang, P. Li, and X. Wang, “Intrusion Detection for IoT Based on Improved Genetic Algorithm and Deep Belief Network,” IEEE Access, vol. 7, no. c, pp. 31711–31722, 2019, doi: 10.1109/ACCESS.2019.2903723.
[32] A. Chaudhary and G. Shrimal, “Intrusion Detection System Based on Genetic Algorithm for Detection of Distribution Denial of Service Attacks in MANETs,” SSRN Electron. J., pp. 370–377, 2019, doi: 10.2139/ssrn.3351807.
[33] B. Chakrabarty, O. Chanda, and M. Saiful, “Anomaly based Intrusion Detection System using Genetic Algorithm and K-Centroid Clustering,” Int. J. Comput. Appl., vol. 163, no. 11, pp. 13–17, 2017, doi: 10.5120/ijca2017913762.
[34] M. A. Hambali, Y. K. Saheed, T. O. Oladele, and M. D. Gbolagade, “Adaboost Ensemble Algorithms for Breast Cancer,” J. Adv. Comput. Res., vol. 10, no. 2, pp. 31–52, 2019.
[35] K. Suresh and R. Dillibabu, “Designing a Machine Learning Based Software Risk Assessment Model Using Naïve Bayes Algorithm,” TAGA J., vol. 14, pp. 3141–3147, 2018.
[36] I. D. Dinov, Probabilistic Learning: Classification Using Naive Bayes. 2018.
[37] L. Li et al., “A robust hybrid between genetic algorithm and support vector machine for extracting an optimal feature gene subset,” Genomics, vol. 85, no. 1, pp. 16–23, 2005.