
Detection of Transaction Fraud Using Deep Learning

A project report submitted for the fulfilment of the
Bachelor of Technology Degree
in
Computer Science and Engineering
under
Maulana Abul Kalam Azad University of Technology

Under the supervision of
Prof. Bavrabi Ghosh

Submitted by:
Adrineel Saha
(University Roll No. 10400319150, Reg. No. 014117)
Ayush Kumar Jha
(University Roll No. 10400719072, Reg. No. 012073)

Academic session: 2019-2023

Department of Computer Science & Engineering
Institute of Engineering & Management
Y-12, Salt Lake, Sector V, Kolkata, Pin 700091, West Bengal, India

Affiliated to
Maulana Abul Kalam Azad University of Technology, West Bengal
BF 142, BF Block, Sector 1, Kolkata, West Bengal 700064

June 2021
Abstract :-
Background- Online transactions have become more popular during the
COVID-19 pandemic. Authorities in many countries have asked people to use
cashless transactions as far as possible, although in practice this is not
feasible for every transaction. As the number of cashless transactions
increased during the lockdown period, fraudulent transactions also
increased rapidly. Fraud can be analysed by examining the series of
transactions a customer has made in the past. If a new transaction
deviates from the available patterns, it is treated as possibly
fraudulent.
Objective- To detect fraud during COVID-19, banks and credit card
companies are applying various methods such as data mining, decision
trees, rule-based mining, neural networks, fuzzy clustering and machine
learning. These approaches try to find the normal usage pattern of
customers based on their past activities. The objective of this project is
to detect such fraudulent transactions during this difficult period.
Method- Digital payment schemes are often threatened by fraudulent
activities. Detecting fraudulent transactions during money transfer may
save customers from financial loss. This project focuses on fraud
detection in mobile-based money transactions. A Deep Learning (DL)
framework is proposed that monitors and detects fraudulent activities. A
recurrent neural network is implemented and applied to a synthetic
financial dataset generated by PaySim to identify deceptive transactions.
The proposed method detects deceptive transactions with an accuracy of
99.87%, an F1-score of 0.99 and an MSE of 0.01.
Keywords- Fraud Detection, Recurrent Neural Network, PaySim, Financial
Transactions, Deep Learning.
Contents
1. Introduction
1.1 Rise in Transaction Fraud During Covid
1.2 Challenges to overcome
1.3 Objective of the proposed work Organization of the project report

2. Literature Survey

3. Proposed Framework
3.1 Data-Set Used
3.2 Proposed Methodology
3.3 Implementation

4. Implementation
4.1 Import Libraries
4.2 Load Dataset
4.3 Exploring Dataset and describe numerical attributes
4.4 Check Data Type of the Features and missing values
4.5 Exploring all types of data by Plotting
4.5.1 Univariate Analysis
4.5.2 Bivariate Analysis
4.5.3 Multivariate Analysis
4.6 Machine Learning Modelling
4.6.1 Baseline
4.6.2 Logistic Regression
4.6.3 K Nearest Neighbours
4.7 Comparing Model Performance

4.8 Hyperparameter Fine Tuning

4.9 Training Process of RNN

4.10 Final model


5. Future Work

6. Conclusion

7. References

8. List of Tables
8.1 Summary of collected dataset
8.2 Architecture of proposed RNN Model
8.3 Exploring Dataset and describe numerical attributes

9. List Of Figures
9.1 Exploring Dataset and describe numerical attributes
9.2 Check Data Type of the Features and missing values
9.3 Exploring all types of data by Plotting
9.3.1 Univariate Analysis
9.3.2 Bivariate Analysis
9.3.3 Multivariate Analysis
9.4 Machine Learning Modelling
9.4.1 Baseline
9.4.2 Logistic Regression
9.4.3 K Nearest Neighbours

9.5 Comparing Model Performance

9.6 Hyperparameter Fine Tuning

9.7 Training Process of RNN

9.8 Final model


Introduction

Rise in Transaction Fraud During Covid

Fraud has been a rising problem during the global pandemic. Ever since
the fight against coronavirus began, it has been clear that it also
brought a fight against digital fraud. Fraudsters are always looking to
take advantage of significant world events. The COVID-19 pandemic and the
rapid digital acceleration brought about by “stay at home” orders was a
global event unrivalled in the online age. With the pandemic having pushed
people further towards contactless payments, a lack of awareness and
vulnerabilities in the handling of confidential card details are
increasing digital fraud. TrustCheckr (a start-up founded by IIM Lucknow
students that warns against fraud calls) identified over 1 million frauds
across the business-to-business (B2B) and business-to-consumer (B2C)
segments in the last 15 months: 25% of scams involved KYC and 20%
involved QR codes, while B2B scams largely involved fake identities (30%)
and synthetic identity fraud (25%). TrustCheckr’s findings are based on
350,000 data points.

TransUnion issued a statement saying “There has been an over 28%
increase in suspected fraudulent digital transaction attempts against
businesses originating from India in the pandemic year”. According to the
data, the highest number of suspect cases was reported from Mumbai,
followed by Delhi and Chennai. The company statement said the rate of
digital fraud attempts originating from India against businesses
increased 28.32% in the year to March 10, 2021, which marked one year
since the WHO declared the COVID-19 pandemic, compared with the year-ago
period. The largest number of Unified Payments Interface (UPI) scams take
place on payment apps and marketplaces, with 41% of the fraud distribution
accounted for by the eastern parts of India and the states of West Bengal,
Odisha, Bihar, Assam, Kashmir, Arunachal Pradesh, Meghalaya, Tripura,
Nagaland, Mizoram, Manipur, Himachal Pradesh and Sikkim.

Challenges to Overcome

The sheer volume of fraud claims isn’t the only challenge facing fraud
management teams during the pandemic. Fraud, dispute and financial crime
teams that were surveyed said that increased fraud attacks during COVID-19
had drastically impacted their operations. Survey respondents
specifically noted challenges related to obtaining skilled staff, having
insufficient resources, strained customer experience, and disrupted
cross-departmental communication channels. The ability to find
experienced staff in the fraud and compliance industry has been a pressing
issue. FICO reported that 38% of their operational challenges stemmed from
the inability to obtain skilled and experienced staff.
The preference for e-commerce websites for purchasing various products at
a more economical or reasonable price has a positive impact on the growth
of the target market. Mobile payment systems facilitate nearly any type of
payment. Most merchants prefer an online system of operations for online
shopping during COVID-19. Any payment can be made if a person has an
online transaction facility and a mobile phone. The mobile phone is
required for receiving the One Time Password (OTP). Mobile wallets help to
increase the overall use of mobile payment. It is found that mobile
payments have reached $194.1 billion in 2017 and $30.2 billion in 2017 as
compared to $18.7 billion in 2016.
Objective Of The Proposed Work
The objective of this paper is to establish an effective and accurate
model for detecting fraudulent mobile money transactions, with high
efficiency and a low error rate. It uses Deep Learning (DL) techniques to
implement this model. These techniques are beneficial because they
automatically capture hierarchical features present in the financial
dataset. The Recurrent Neural Network (RNN), a DL architecture, is used in
this paper. A stacked RNN model is proposed as a recommender system for
the detection of fraudulent transactions. Automatic recognition of
suspicious activities that indicate illegal attempts will alert the
customers so that economic loss can be prevented. Analysis of the proposed
algorithms includes determination of quantitative, qualitative,
comparative and complexity measures. The proposed methods have been
rigorously tested using the dataset.

Literature Survey

A. A. Taha and S. J. Malebary described how advances in e-commerce and
communication technology have made credit cards a more popular way of
payment, and how the fraud associated with these transactions is also
increasing. They used an optimized light gradient boosting machine
(LightGBM), in which Bayesian-based hyper-parameter optimization is used
to tune the parameters of LightGBM. In this approach they used two
real-world public datasets consisting of fraudulent and non-fraudulent
transactions. In comparison with other techniques, their proposed system
performed best in terms of accuracy. The proposed system achieves an
accuracy of 98.40%, an area under the receiver operating characteristic
curve (AUC) of 92.88%, a precision of 97.34% and an F1-score of 56.95%
[1].
S. Makki, Z. Assaghir, Y. Taher, R. Haque, M. Hacid and H. Zeineddine
describe how credit card fraud causes huge financial loss. Many
researchers have been working on innovative ways to eliminate this loss,
but most of the available methods are costly, time-consuming and
labor-intensive. After many experimental studies, the authors found that
imbalanced classification is the main reason for inaccurate results. An
imbalanced dataset causes the model to predict inaccurately, which in turn
leads to financial loss. They found that LR, the C5.0 decision tree
algorithm, SVM and ANN are the best algorithms based on accuracy, AUCPR
and sensitivity. They used a balanced dataset to train these models [2].
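
The effect of class imbalance on reported accuracy can be illustrated with a small sketch. The labels below are made up for illustration and are not taken from any of the cited studies; the helper names `confusion_counts` and `summarize` are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def summarize(y_true, y_pred):
    """Return accuracy, precision, recall and F1 (0.0 where undefined)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 98 genuine transactions and 2 fraudulent ones.
y_true = [0] * 98 + [1] * 2
# A degenerate model that never flags fraud.
always_genuine = [0] * 100

report = summarize(y_true, always_genuine)
print(report)
```

In this toy case the degenerate model is right on 98 of 100 transactions yet detects no fraud at all, which is why accuracy alone can be misleading on imbalanced data and why measures such as recall (sensitivity) and F1 are reported alongside it.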

D. Prusti and S. K. Rath designed an application that applies machine
learning approaches such as Decision Tree (DT), k-nearest neighbours
(kNN), Extreme Learning Machine (ELM), Multilayer Perceptron (MLP) and
Support Vector Machine (SVM) and compares their accuracy in fraud
identification. They proposed a model that hybridizes the DT, SVM and kNN
techniques. They used two web-based protocols, Simple Object Access
Protocol (SOAP) and Representational State Transfer (REST), for efficient
exchange of data across multiple heterogeneous platforms. They compared
the results of the five machine learning algorithms based on the accuracy
metric. SVM performed best among the individual algorithms with an
accuracy of 81.63%, but the hybrid system they proposed achieved a higher
accuracy of 82.58% [3].

In Chouiekha and El Haj’s paper [4], Convolutional Neural Networks (CNN)
are used for fraud detection. A database was created with 18000 artificial
images representing the activity of 300 customers over 60 days. They used
Customer Details Records in such a way that a long conversation or an
unusual number of vouchers used would be detected. The CNN is applied to
the images to detect fraudulent activity. 50% of the data set was used for
training, 25% for validation and 25% for testing. Images were rescaled to
improve classifier performance. The proposed Deep CNN (DCNN) contained 7
layers: 3 convolutional layers, 2 pooling layers, 1 fully connected layer
and finally 1 SoftMax regression layer. Results were evaluated using
accuracy. The Deep CNN’s performance is compared against SVM, Random
Forest and Gradient Boosting Classifier (GBC). The results show that DCNN
outperforms SVM by 5%, Random Forest by 10% and GBC by 3%. The Deep CNN
was also found to train almost twice as fast as the other methods.

Tom Sweers, in his bachelor thesis [5], describes autoencoders as an
effective neural network that learns to encode the data while also
learning to decode it. In this approach the autoencoder is trained on
non-anomalous points and is then presented with new points, which are
classified as ‘fraud’ or ‘no fraud’ according to the reconstruction
error. This error is expected to be high for anomalies that the system
has not been trained on. Here, any value above the upper bound or
threshold could be considered an anomaly.
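
The thresholding step described above can be sketched as follows. This is only an illustration: the reconstruction errors are made-up numbers, and the rule used for the upper bound (mean plus `k` standard deviations of the errors on non-fraud data) is an assumption, not necessarily the rule used in [5]. The names `fit_threshold` and `classify` are hypothetical.

```python
import statistics


def fit_threshold(normal_errors, k=3.0):
    """Assumed rule: upper bound = mean + k * std of errors on non-fraud data."""
    mean = statistics.mean(normal_errors)
    std = statistics.pstdev(normal_errors)
    return mean + k * std


def classify(errors, threshold):
    """Label each reconstruction error as 'fraud' if it exceeds the threshold."""
    return ["fraud" if e > threshold else "no fraud" for e in errors]


# Made-up reconstruction errors from an autoencoder trained on non-fraud data.
normal_errors = [0.02, 0.03, 0.025, 0.04, 0.035]
threshold = fit_threshold(normal_errors)

# Two new transactions: one reconstructs well, one reconstructs poorly.
labels = classify([0.03, 0.5], threshold)
print(threshold, labels)
```

The choice of `k` trades false alarms against missed fraud; in practice it would be tuned on a validation set.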

Proposed Framework
Data-Set Used
The financial dataset is simulated by PaySim, which generates mobile money
transactions based on a sample of real transactions. These transactions
are drawn from one month of financial logs of a mobile money service. The
dataset consists of 6362620 online transaction records during COVID-19,
and each record is a collection of several attributes. The non-numeric
data present in the dataset is transformed into numeric data. Next, all
the numeric data are scaled into the range from 0 to 1. This completes the
pre-processing of the dataset to which the proposed classifier is applied.
Suspicious transactions occur in the cash-out and transfer transaction
types. The attribute ‘isFraud’ is kept as the target variable of the
classification procedure.
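
The two pre-processing steps above (turning non-numeric values into numbers, then scaling to the range 0 to 1) can be sketched in plain Python. The three records are made up for illustration, and the column names `type` and `amount` are assumptions about the PaySim schema; only `isFraud` is named in this report. The helper names are hypothetical.

```python
def encode_categories(values):
    """Map each distinct string to an integer code (sorted for reproducibility)."""
    codes = {name: i for i, name in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def min_max_scale(values):
    """Rescale numbers linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up records; 'type' and 'amount' are assumed column names.
records = [
    {"type": "PAYMENT", "amount": 100.0, "isFraud": 0},
    {"type": "TRANSFER", "amount": 900.0, "isFraud": 1},
    {"type": "CASH_OUT", "amount": 500.0, "isFraud": 1},
]

type_codes, mapping = encode_categories([r["type"] for r in records])
scaled_type = min_max_scale([float(c) for c in type_codes])
scaled_amount = min_max_scale([r["amount"] for r in records])
targets = [r["isFraud"] for r in records]

print(mapping, scaled_type, scaled_amount, targets)
```

On the full dataset the same steps would more typically be done with library routines, and the scaling parameters would be computed on the training split only so that no information leaks from the test split.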
Proposed Methodology

The aim of the paper is to detect suspicious activities in money
transactions during COVID-19.
Deep learning (DL) belongs to the broader family of Machine Learning.
These techniques consist of algorithms that are inspired by the operation
of the human brain. The popularity of DL techniques rests on their
self-learning structure, which needs only a minimal amount of manual
processing. Deep neural networks (DNN) are often considered an improvement
over the traditional artificial neural network (ANN) in the sense that
they incorporate multiple layers into their architecture. A DNN can learn
a hierarchical feature representation from the data itself by building
higher-level features from lower-level features. Any deep model can be
thought of as a multi-layer architecture that accepts input vectors and
maps them to corresponding output labels. A Recurrent Neural Network (RNN)
is a kind of deep model that allows a feedback loop in its architecture.
The word ‘recurrent’ is used because the same function is performed for
every input, and the output for the current input depends on the previous
computation. RNNs are effective because they can model sequences by
considering the inter-dependent relationships among the samples of a
sequence [4]. While designing a deep model, it is necessary to consider
the activation function, which is the step that maps an input signal to an
output signal.
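
The recurrence described above can be made concrete with a deliberately simplified sketch: a single recurrent unit with scalar weights and a tanh activation. The weights and inputs are made-up values, and the function name `rnn_forward` is hypothetical; a real layer uses weight matrices and many units.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a one-unit recurrent cell over a sequence of scalar inputs.

    Each step applies the same function:
        h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)
    so the output at step t depends on the previous computation.
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Only the first input is non-zero, yet later states are still non-zero
# because the hidden state carries information forward.
with_memory = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)

# With the recurrent weight set to zero the cell has no memory.
without_memory = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.0, b=0.0)

print(with_memory)
print(without_memory)
```

Comparing the two runs shows the role of the feedback loop: with `w_h = 0` the second and third states are exactly zero, while with `w_h = 0.5` the effect of the first input decays gradually over the following steps.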
A classifier model maps input data to output classes after learning from
training data. A stacked RNN based model is proposed as the classifier
that identifies transactions that may be deceptive. Multiple RNN layers
are stacked to obtain the proposed model. Four simple RNN layers along
with four dropout layers are incorporated into a sequential model.
Initially the neural network model is configured and the training process
is started. One complete pass of the training process over the training
data is known as an epoch. Within an epoch the dataset is partitioned into
smaller sections called batches, and the model is updated iteratively, one
batch at a time, until the epoch is complete. Since the task is a binary
classification problem, the binary cross-entropy function is used as the
training criterion.
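
A minimal Keras sketch of such a stacked model is given below. Only the overall shape (four SimpleRNN layers, four dropout layers, a sequential model, binary cross-entropy) comes from the description above. The number of units, the dropout rate, the sigmoid output layer, the Adam optimizer and the input shape are all assumptions made for illustration, and `build_stacked_rnn` is a hypothetical helper name.

```python
import tensorflow as tf


def build_stacked_rnn(timesteps, n_features, units=32, dropout_rate=0.2):
    """Four SimpleRNN layers, each followed by a Dropout layer.

    units, dropout_rate, the output layer and the optimizer are assumed
    values, not the configuration reported for the proposed model.
    """
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(timesteps, n_features)))
    for i in range(4):
        is_last = i == 3
        # All but the last recurrent layer return the full sequence so
        # that the next recurrent layer receives one vector per time step.
        model.add(tf.keras.layers.SimpleRNN(units,
                                            return_sequences=not is_last))
        model.add(tf.keras.layers.Dropout(dropout_rate))
    # Assumed output layer: a single sigmoid unit for the 'isFraud' label.
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


# Assumed input shape: each transaction is a length-1 sequence of 10 features.
model = build_stacked_rnn(timesteps=1, n_features=10)
model.summary()
```

Training would then be a call such as `model.fit(x_train, y_train, epochs=..., batch_size=...)`, where `x_train` and `y_train` are hypothetical arrays holding the scaled features and the ‘isFraud’ labels; the `batch_size` argument controls the partitioning of each epoch into batches.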
Implementation
Import Libraries:-
Numpy
Pandas
Matplotlib
Seaborn
Warnings
Sklearn
Inflection
Joblib
Scipy
Tensorflow

Load Dataset:-
fraud_0.1origbase.csv
Describe Numeric Features and check Data Type of the Features and missing values:-
Exploring all types of data by Plotting:-
Univariate Analysis:-
Numerical Variables:-
Categorical Variables:-

Bivariate Analysis:-
The majority of fraud transactions occur for the same user:-
All fraud amounts are greater than 10,000:-
60% of fraud transactions occur using the cash-out-type method:-

Values greater than 100,000 occur using the transfer-type method:-
Fraud transactions occur at least in 3 days:-

Multivariate Analysis:-
Machine Learning Modelling:-
Baseline:-
Logistic Regression:-

K Nearest Neighbours:-
Comparing Model Performance:-
HyperParameter Fine Tuning:-

Training Process of RNN:-


Final Model:-
Future Work:-
1) We will be incorporating a website
implementation for the model deployment.

Conclusion:-
With more people shopping online during the pandemic, more goods have to
be shipped, and fraudsters know this. They either redirect genuine orders
or place fraudulent orders from compromised consumer accounts to genuine
customer addresses and then redirect them once shipped. Due to the
increasing demand for mobile money transfer, it is necessary to discover
fraudulent activities during transactions. This has become unavoidable
during COVID-19. Discovering illegal attempts will protect customers from
the harassment of financial disputes. The study covers the period from
the announcement of COVID-19 to the first unlock period announced by the
Government. The main aim of the study is to minimize fraud as far as
possible. It shows that the method is practical and highly suitable for
implementation in the present scenario. The proposed method is favourable
because of its applicability to large financial datasets. An efficient,
low-error system is required in the field of mobile transactions since it
will notify customers when deceptive transactions are detected. It is
quite clear that the proposed model is capable of recognizing suspicious
transactions with promising efficiency.

References
[1] A.A. Taha, S.J. Malebary. “An intelligent approach to credit card fraud
detection using an optimized light gradient boosting machine”. IEEE Access, 8
(2020), pp. 25579-25587, 10.1109/ACCESS.2020.2971354
[2] S. Makki, Z. Assaghir, Y. Taher, R. Haque, M. Hacid, H. Zeineddine. “An
experimental study with imbalanced classification approaches for credit card
fraud detection”. IEEE Access, 7 (2019), pp. 93010-93022,
10.1109/ACCESS.2019.2927266
[3] D. Prusti, S.K. Rath. “Web service based credit card fraud detection by
applying machine learning techniques”. Proceedings of the TENCON 2019 - 2019
IEEE Region 10 Conference (TENCON), Kochi, India (2019), pp. 492-497,
10.1109/TENCON.2019.8929372
[4] Alae Chouiekha, EL Hassane Ibn EL Haj. “ConvNets for Fraud Detection
analysis”. Procedia Computer Science, 127 (2018), pp. 133-138.
[5] Tom Sweers. “Autoencoding Credit Card Fraud”. Bachelor Thesis, Radboud
University. June 2018.
