
ONLINE PAYMENT FRAUD DETECTION

USING MACHINE LEARNING


A Project Report
Submitted

In partial fulfillment of the requirements for the award of the degree

BACHELOR OF TECHNOLOGY
In
COMPUTER SCIENCE and ENGINEERING
By
Sk.Adil Azeez (201FA04331)
P.Veda Phani (211LA04012)
G.Durga Praveen (211LA04002)

Under the guidance of

Dr. Saubhagya Ranjan Biswal

Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


VIGNAN'S FOUNDATION FOR SCIENCE, TECHNOLOGY AND RESEARCH
Deemed to be
UNIVERSITY
Vadlamudi, Guntur.
ANDHRA PRADESH, INDIA, PIN-522 213
JANUARY-2024

CERTIFICATE

This is to certify that the project report entitled “ONLINE PAYMENT FRAUD DETECTION
USING MACHINE LEARNING”, submitted by Sk. Adil Azeez (201FA04331),
P. Veda Phani (211LA04012) and G. Durga Praveen (211LA04002) in partial fulfilment of the
requirements for the award of the degree of Bachelor of Technology, is a bonafide work carried
out under the supervision of Mrs. B. Suvarna, Assistant Professor, Department of Computer
Science & Engineering.

Dr. K. V. Krishna Kishore          Dr. Saubhagya Ranjan Biswal
Professor & HOD, CSE               Assistant Professor

Internal Examiner External Examiner

VIGNAN’S FOUNDATION FOR SCIENCE, TECHNOLOGY AND RESEARCH
Deemed to be UNIVERSITY

VADLAMUDI, GUNTUR DIST, ANDHRA PRADESH, INDIA, PIN-522 213

DECLARATION

We hereby declare that the project entitled “ONLINE PAYMENT FRAUD DETECTION
USING MACHINE LEARNING”, submitted by Sk. Adil Azeez (201FA04331),
P. Veda Phani (211LA04012) and G. Durga Praveen (211LA04002) in partial fulfilment of the
Projects-II course work, is our original work and has not formed the basis for the award of any
degree. We have worked under the supervision of Mrs. B. Suvarna, Assistant Professor,
Department of Computer Science & Engineering.

By

Sk.Adil Azeez (201FA04331),


P.Veda Phani (211LA04012),
G.Durga Praveen (211LA04002).

ACKNOWLEDGEMENT

It gives us great pleasure to acknowledge the assistance and cooperation we have received
from several people while undertaking this B. Tech. final year project. We owe a special debt
of gratitude to Dr. Saubhagya Ranjan Biswal, Department of Computer Science & Engineering,
for her constant support and guidance throughout the course of our work. Her sincerity,
thoroughness and perseverance have been a constant source of inspiration for us. We also take
this opportunity to acknowledge the contribution of Prof. Dr. Krishna Kishore, Head,
Department of Computer Science & Engineering, for his full support and assistance during the
development of the project.
We would also like to acknowledge all faculty members of the department for their kind
assistance and cooperation during the development of our project. Last but not least, we
thank our friends for their contribution to the completion of the project.
We also express our appreciation to Dr. Mainak Biswas for sharing pearls of wisdom with us
during the course of this research. We are immensely grateful to Vignan’s Foundation for
Science, Technology and Research University for providing the infrastructure and
well-equipped labs for research work.

Sk.Adil Azeez (201FA04331)


P.Veda Phani (211LA04012)
G.Durga Praveen (211LA04002)

ABSTRACT

The proliferation of e-commerce platforms has revolutionized the way transactions are
conducted, making online payments the preferred choice for consumers worldwide. However,
this convenience comes with inherent risks, as cybercriminals continuously devise sophisticated
methods to perpetrate online payment fraud. Addressing this challenge requires robust fraud
detection mechanisms capable of identifying fraudulent activities in real-time, thereby
safeguarding both businesses and consumers.

This paper presents a comprehensive approach to online payment fraud detection leveraging
machine learning techniques. We propose a multi-layered framework that integrates various
machine learning algorithms, data preprocessing techniques, and feature engineering
methodologies to enhance the accuracy and efficiency of fraud detection systems. Our approach
encompasses the entire transaction lifecycle, from data collection and preprocessing to model
training and deployment.

Key components of our framework include anomaly detection algorithms, supervised learning
models, and ensemble techniques, each tailored to detect different types of fraudulent behavior
such as account takeover, identity theft, and transaction laundering. Furthermore, we
incorporate advanced feature selection methods and model interpretability techniques to
enhance the transparency and explainability of our fraud detection system.

To validate the effectiveness of our approach, we conducted extensive experiments on a large-scale
dataset comprising real-world online payment transactions. The results demonstrate
superior performance in terms of detection accuracy, false positive rate, and computational
efficiency compared to existing methods. Additionally, we provide insights into the
interpretability of our models, enabling stakeholders to understand the factors contributing to
fraudulent activities and refine their fraud prevention strategies accordingly.

TABLE OF CONTENTS

S.NO  TITLE

1     Introduction
2     Literature Survey
3     Objective
4     Problem Statement
5     Technologies
6     Proposed Methodology
7     Requirement Specification
8     Implementation
9     Conclusion
10    References

CHAPTER-1

INTRODUCTION

Introduction:

With the exponential growth of e-commerce and online transactions, the digital landscape has
become increasingly susceptible to fraudulent activities. Online payment fraud poses a
significant threat to businesses and consumers alike, undermining trust, compromising financial
security, and inflicting substantial economic losses. As cybercriminals continuously evolve their
tactics to exploit vulnerabilities in online payment systems, there is an urgent need for
sophisticated fraud detection mechanisms capable of identifying and thwarting fraudulent
activities in real-time.

Traditional rule-based fraud detection systems, while effective to some extent, often struggle to
keep pace with the dynamic nature of online fraud. They rely on predefined rules and
thresholds, making them inherently reactive and prone to false positives. In contrast, machine
learning offers a promising avenue for proactive fraud detection by leveraging data-driven
algorithms to detect patterns and anomalies indicative of fraudulent behavior.

This paper aims to explore the application of machine learning techniques in the domain of
online payment fraud detection. We propose a comprehensive approach that harnesses the
power of machine learning to analyze vast amounts of transactional data, identify fraudulent
patterns, and distinguish legitimate transactions from fraudulent ones. By training models on
historical transaction data and continuously updating them with new information, our approach
enables adaptive and dynamic fraud detection, capable of detecting emerging fraud trends and
evolving threats.

CHAPTER-2

LITERATURE SURVEY

Literature Survey

Traditional fraud detection methods, such as rule-based systems and anomaly detection
techniques, have been widely used but often struggle to adapt to evolving fraud patterns.
Machine learning offers a data-driven approach that can automatically learn from past data and
adapt to new fraud schemes.
A study by Li et al. (2018) compared the performance of traditional methods with machine
learning algorithms in online payment fraud detection, demonstrating the superiority of machine
learning approaches in terms of accuracy and adaptability.
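For comparison with the anomaly detection techniques mentioned above, the sketch below scores transactions with scikit-learn's `IsolationForest`, an unsupervised method that needs no fraud labels. The two synthetic features, the five injected extreme rows and the `contamination` value are illustrative assumptions, not data or settings from this project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical features per transaction: [amount, drop in sender balance].
normal = rng.normal(loc=5_000, scale=1_500, size=(500, 2))
extreme = rng.uniform(low=200_000, high=300_000, size=(5, 2))
X = np.vstack([normal, extreme])  # rows 500-504 are the extreme ones

# contamination is the assumed share of anomalies; it sets the score cut-off.
detector = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
labels = detector.fit_predict(X)  # -1 = anomaly, 1 = normal

flagged = np.flatnonzero(labels == -1)
print("flagged rows:", flagged)
```

Because no labels are used, such a detector can surface unusual behaviour that a supervised model has never seen, at the cost of flagging rare but legitimate transactions as well.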

Feature engineering plays a crucial role in online payment fraud detection, as it involves
selecting and transforming relevant features from raw transaction data to facilitate model
training. Various studies have explored novel feature engineering techniques tailored to the
characteristics of online payment transactions.
For instance, Chen et al. (2020) proposed a feature engineering framework based on transaction
metadata and user behavior analysis to enhance fraud detection accuracy and reduce false
positives.

Supervised learning models, such as logistic regression, decision trees, random forests, and
gradient boosting machines, have been widely employed for online payment fraud detection.
These models learn from labeled data to classify transactions as either legitimate or fraudulent.
In a study by Breunig et al. (2019), an ensemble of supervised learning models was used to
detect fraudulent activities in real-time, achieving high detection rates while minimizing false
positives.
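A minimal sketch of an ensemble of supervised models, in the spirit of the approach described above (not the cited study's exact method), is a soft-voting combination of logistic regression, a decision tree and a random forest. The data here is synthetic from `make_classification` with roughly 2% positives, and the hyperparameters are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for labelled transactions.
X, y = make_classification(n_samples=3000, n_features=8, n_informative=4,
                           weights=[0.98], random_state=42)

ensemble = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ],
    voting="soft",  # average the predicted probabilities of the three models
)

# Average precision summarises the precision-recall curve, which is more
# informative than accuracy when the positive (fraud) class is rare.
scores = cross_val_score(ensemble, X, y, cv=5, scoring="average_precision")
print(f"mean average precision: {scores.mean():.3f}")
```

For classifiers, an integer `cv` uses stratified folds, so each fold keeps roughly the same fraud rate.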

Deep learning techniques, particularly neural networks, have shown promise in capturing
complex patterns and relationships in online payment data. Convolutional neural networks
(CNNs) and recurrent neural networks (RNNs) have been applied to learn features directly from
transaction sequences.
Zhang et al. (2021) proposed a deep learning-based approach for online payment fraud
detection, leveraging a combination of CNNs and attention mechanisms to effectively capture
temporal dependencies and subtle patterns in transaction sequences.

Objective:

Enhance Detection Accuracy: The primary objective of employing machine learning in online
payment fraud detection is to enhance the accuracy of identifying fraudulent transactions. By
leveraging advanced algorithms and techniques, the system aims to accurately differentiate
between legitimate and fraudulent activities, minimizing false positives and false negatives.
Real-time Detection: Another key objective is to enable real-time detection of fraudulent
transactions. Machine learning models should be capable of analyzing transaction data as it
occurs, swiftly identifying suspicious patterns or anomalies and triggering alerts or intervention
mechanisms to prevent fraudulent activities before they can cause financial harm.
Adaptability to Evolving Threats: Online payment fraud is constantly evolving, with
fraudsters devising new techniques to bypass detection systems. Therefore, the system's ability
to adapt to emerging fraud trends and evolving threats is crucial. Machine learning algorithms
should continuously learn from new data, update their models, and incorporate the latest fraud
detection strategies to stay ahead of fraudsters.
Reduce False Positives: Minimizing false positives is essential to prevent unnecessary
disruptions to legitimate transactions and maintain a seamless user experience. Machine
learning models should be fine-tuned to achieve a balance between detection accuracy and the
false positive rate, ensuring that legitimate transactions are not erroneously flagged as
fraudulent.
Scalability and Efficiency: The system should be scalable to handle large volumes of
transaction data efficiently. Machine learning algorithms should be optimized for performance
and computational efficiency to process transactions in real-time, even during peak transaction
periods.
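One way to act on the "Reduce False Positives" objective is to choose the decision threshold on the model's fraud scores instead of using a fixed 0.5 cut-off. The sketch below, using a hypothetical helper `pick_threshold` and made-up scores, picks the threshold with the highest recall among those that keep precision at or above a target.

```python
from typing import Optional, Sequence

import numpy as np
from sklearn.metrics import precision_recall_curve


def pick_threshold(y_true: Sequence[int], scores: Sequence[float],
                   min_precision: float) -> Optional[float]:
    """Return the score cut-off with the highest recall whose precision is at
    least min_precision, or None if no cut-off reaches that precision."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # The last precision/recall pair has no threshold attached, so drop it.
    precision, recall = precision[:-1], recall[:-1]
    ok = precision >= min_precision
    if not ok.any():
        return None
    best_recall = recall[ok].max()
    # Among equal recalls, take the highest threshold (fewest alerts).
    candidates = np.flatnonzero(ok & (recall == best_recall))
    return float(thresholds[candidates[-1]])


# Hypothetical model scores for eight transactions (1 = fraud).
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
print(pick_threshold(y_true, scores, min_precision=0.75))  # 0.5
print(pick_threshold(y_true, scores, min_precision=0.9))   # 0.7
```

Raising `min_precision` trades recall for fewer false alarms; the right target depends on the relative cost of a blocked legitimate payment versus a missed fraud.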

Problem Statement:
Detection Accuracy: The primary challenge is to improve the accuracy of fraud detection
while minimizing false positives and false negatives. Machine learning algorithms need to
effectively distinguish between legitimate and fraudulent transactions, even in the presence of
sophisticated fraud schemes and evolving tactics.
Real-time Detection: Online payment fraud detection systems must operate in real-time,
analyzing transaction data as it occurs and swiftly flagging suspicious activities to prevent
financial losses. Achieving real-time detection requires efficient processing of large volumes of
transaction data and rapid decision-making capabilities.
Adaptability to Emerging Threats: Fraudsters continually adapt their strategies to circumvent
detection systems, necessitating the development of adaptive fraud detection mechanisms.
Machine learning models should be capable of learning from new data and detecting emerging
fraud patterns, ensuring proactive detection of evolving threats.
Scalability and Efficiency: As transaction volumes continue to grow, scalability and
computational efficiency become critical factors in deploying fraud detection systems. Machine
learning algorithms need to scale effectively to handle large datasets while maintaining high
performance and low latency.
Interpretability and Transparency: While machine learning algorithms offer high predictive
accuracy, they often lack interpretability, making it challenging to understand the factors
driving fraud detection decisions. Ensuring interpretability and transparency in fraud detection
models is essential for building trust and facilitating validation by stakeholders.

Technologies Used:

Jupyter Notebook:

Jupyter Notebook is an open-source web application that allows you to create and share
documents containing live code, equations, visualizations, and narrative text. Originally
developed as part of the IPython project, Jupyter Notebook supports various programming
languages, including Python, R, Julia, and more, making it a versatile tool for data analysis,
research, education, and scientific computing.

Dataset:
The introduction of online payment systems has made payments much easier. At the same time,
it has led to an increase in payment fraud. Online payment fraud can happen to anyone using any
payment system, especially when paying with a credit card. That is why detecting online payment
fraud is very important for credit card companies, to ensure that customers are not charged for
products and services they never purchased. In this project, we build a model to detect online
payment fraud.

Dataset Details:

• step: a unit of time, where 1 step equals 1 hour.
• type: type of online transaction (for example PAYMENT, TRANSFER, CASH_OUT or DEBIT).
• amount: the amount of the transaction.
• nameOrig: the customer who initiated the transaction.
• oldbalanceOrg: the sender's balance before the transaction.
• newbalanceOrig: the sender's balance after the transaction.
• nameDest: the recipient of the transaction.
• oldbalanceDest: the recipient's balance before the transaction.
• newbalanceDest: the recipient's balance after the transaction.
• isFraud: label indicating whether the transaction is fraudulent (1) or legitimate (0).
• isFlaggedFraud: this column was not described in the dataset information provided, so we drop
it; the label column isFraud is already present.
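The column clean-up described above, together with a quick check of how imbalanced the label is, could look like the sketch below. The four rows are copied from the `df.head(10)` output in the Source Code section; in practice `df` would be the full dataset loaded with `pd.read_csv`.

```python
import pandas as pd

# Four rows from the report's df.head(10) output.
df = pd.DataFrame({
    "step": [1, 1, 1, 1],
    "type": ["PAYMENT", "PAYMENT", "TRANSFER", "CASH_OUT"],
    "amount": [9839.64, 1864.28, 181.00, 181.00],
    "nameOrig": ["C1231006815", "C1666544295", "C1305486145", "C840083671"],
    "oldbalanceOrg": [170136.00, 21249.00, 181.00, 181.00],
    "newbalanceOrig": [160296.36, 19384.72, 0.00, 0.00],
    "nameDest": ["M1979787155", "M2044282225", "C553264065", "C38997010"],
    "oldbalanceDest": [0.0, 0.0, 0.0, 21182.0],
    "newbalanceDest": [0.0, 0.0, 0.0, 0.0],
    "isFraud": [0, 0, 1, 1],
    "isFlaggedFraud": [0, 0, 0, 0],
})

# Drop the undocumented column; isFraud remains the label.
df = df.drop(columns=["isFlaggedFraud"])

# Share of each class. In real payment data fraud is usually a small minority,
# which is why accuracy alone is a weak metric for this task.
print(df["isFraud"].value_counts(normalize=True))
```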

Proposed Technology:

In the context of online payment fraud detection, the Decision Tree Classifier algorithm can be
applied to identify fraudulent transactions based on features extracted from transaction data.
The following outline describes how the Decision Tree Classifier can be used for this purpose:

Decision Tree Classifier:

Data Collection and Preprocessing:
Collect transaction data from online payment platforms, including features such as transaction
amount, location, device information, user demographics, and transaction timestamp.
Preprocess the data by handling missing values and encoding categorical variables such as the
transaction type. Normalizing numerical features is optional for a Decision Tree Classifier,
because its splits are thresholds on individual features and are unaffected by monotonic
scaling; normalization matters mainly if scale-sensitive models such as logistic regression are
compared against it.
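The preprocessing step can be expressed as a scikit-learn pipeline, sketched below under these assumptions: median imputation for numeric columns, one-hot encoding for `type`, and no scaling because the model is a decision tree. The five rows come from the report's `df.head(10)` output, with one amount deliberately set to missing to exercise the imputer.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

numeric = ["amount", "oldbalanceOrg", "newbalanceOrig"]
categorical = ["type"]

preprocess = ColumnTransformer([
    # Fill missing numeric values with the column median (no scaling needed for trees).
    ("num", SimpleImputer(strategy="median"), numeric),
    # One-hot encode the transaction type; types unseen during training are ignored.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

model = Pipeline([
    ("prep", preprocess),
    ("tree", DecisionTreeClassifier(random_state=0)),
])

train = pd.DataFrame({
    "type": ["PAYMENT", "PAYMENT", "TRANSFER", "CASH_OUT", "DEBIT"],
    "amount": [9839.64, None, 181.00, 181.00, 5337.77],  # one value blanked on purpose
    "oldbalanceOrg": [170136.00, 21249.00, 181.00, 181.00, 41720.00],
    "newbalanceOrig": [160296.36, 19384.72, 0.00, 0.00, 36382.23],
    "isFraud": [0, 0, 1, 1, 0],
})
model.fit(train[numeric + categorical], train["isFraud"])
print(model.predict(train[numeric + categorical]))
```

Keeping preprocessing inside the pipeline means the same transformations are applied at training and prediction time.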

Feature Selection and Engineering:
Select relevant features that are likely to be indicative of fraudulent behavior in online payment
transactions. Features such as transaction frequency, transaction amount, time of transaction,
and user behavioral patterns can be considered.
Engineer additional features that may capture unique characteristics of fraudulent transactions,
such as velocity checks (e.g., number of transactions within a specific time window) and
anomaly detection features (e.g., deviations from typical user behavior).
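The sketch below shows two engineered features of the kinds listed above, under stated assumptions. The function `add_features` and the column names `errorBalanceOrig` and `txnCountWindow` are hypothetical. The first feature measures how far the sender's balances deviate from `oldbalanceOrg - amount == newbalanceOrig`. The second is a velocity check that counts a customer's transactions within the last `window` steps, where one step is one hour.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame, window: int = 24) -> pd.DataFrame:
    """Add a balance-discrepancy feature and a per-customer velocity count."""
    out = df.sort_values(["nameOrig", "step"], kind="stable").copy()

    # Zero when the sender's balances are consistent with the amount debited.
    out["errorBalanceOrig"] = out["newbalanceOrig"] + out["amount"] - out["oldbalanceOrg"]

    def count_in_window(steps: pd.Series) -> pd.Series:
        # Steps are sorted within each customer; count rows with
        # step in (current_step - window, current_step], up to and including this row.
        s = steps.to_numpy()
        first = np.searchsorted(s, s - window, side="right")
        return pd.Series(np.arange(1, len(s) + 1) - first, index=steps.index)

    out["txnCountWindow"] = out.groupby("nameOrig")["step"].transform(count_in_window)
    return out.sort_index()  # restore the original row order


# Hypothetical transactions for two customers, A and B.
toy = pd.DataFrame({
    "step": [1, 2, 30, 1, 5],
    "nameOrig": ["A", "A", "A", "B", "B"],
    "amount": [100.0, 50.0, 10.0, 20.0, 5.0],
    "oldbalanceOrg": [500.0, 400.0, 350.0, 20.0, 0.0],
    "newbalanceOrig": [400.0, 350.0, 340.0, 0.0, 0.0],
})
print(add_features(toy)[["nameOrig", "step", "errorBalanceOrig", "txnCountWindow"]])
```

Customer A's transaction at step 30 falls outside the 24-step window of the earlier two, so its count resets to 1. Customer B's last row has a non-zero balance error because the amount exceeds the starting balance.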
Model Training:
Split the preprocessed data into training and testing sets to evaluate the performance of the
Decision Tree Classifier algorithm.
Train the Decision Tree Classifier on the training data, where the input features are used to
predict the target variable, which represents whether a transaction is fraudulent or legitimate.
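Because fraud is rare, a plain random split can leave the test set with very few fraud cases. The sketch below stratifies the split on the label and trains a depth-limited tree with `class_weight="balanced"`. The data is synthetic, and `max_depth=6` is an assumed value; neither comes from the report.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the four transaction features (roughly 1-2% positives).
X, y = make_classification(n_samples=5000, n_features=4, n_informative=3,
                           n_redundant=0, weights=[0.99], random_state=101)

# stratify=y keeps the fraud rate (almost) identical in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=101, stratify=y)

# max_depth limits tree growth to reduce overfitting; class_weight="balanced"
# up-weights the rare fraud class during training.
model = DecisionTreeClassifier(max_depth=6, class_weight="balanced", random_state=101)
model.fit(X_train, y_train)

print("train fraud rate:", round(y_train.mean(), 4), "test fraud rate:", round(y_test.mean(), 4))
```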
Model Evaluation:
Evaluate the performance of the trained Decision Tree Classifier using metrics such as
accuracy, precision, recall, and F1-score on the testing data.
Assess the model's ability to correctly classify fraudulent transactions while minimizing false
positives and false negatives, considering the consequences of misclassification in terms of
financial losses and user experience.
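To see why accuracy alone is misleading here, the sketch below computes the listed metrics with scikit-learn for two hypothetical prediction sets on 100 transactions, 5 of them fraudulent. The helper name `evaluate` is illustrative.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)


def evaluate(y_true, y_pred) -> dict:
    """Accuracy, precision, recall, F1 and error counts for the fraud class (1)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "false_positives": int(fp),
        "false_negatives": int(fn),
    }


# Hypothetical test labels: 95 legitimate and 5 fraudulent transactions.
y_true = [0] * 95 + [1] * 5

# A model that never predicts fraud: 95% accuracy, but zero recall.
never_fraud = evaluate(y_true, [0] * 100)

# A model that catches 4 of the 5 frauds and raises 2 false alarms.
y_pred = [0] * 93 + [1] * 2 + [1] * 4 + [0]
some_fraud = evaluate(y_true, y_pred)

print(never_fraud)
print(some_fraud)
```

The second model has lower accuracy than the first would suggest is "good", yet it is the only one that detects any fraud: precision is 4/6, recall 0.8 and F1 8/11.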
Model Interpretation and Validation:
Interpret the decision rules learned by the Decision Tree Classifier to gain insights into the
factors driving classification decisions.
Validate the model's performance and interpretability through domain expertise and stakeholder
feedback, ensuring that the model aligns with the characteristics of online payment fraud and
addresses specific business requirements.
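scikit-learn can print a fitted tree's decision rules as text and report per-feature importances, which supports the interpretation step above. The six training rows below are taken from the report's `df.head(10)` output. All three features separate these rows equally well, so which one the tree splits on depends on `random_state`; the point is the readable output format, not the particular rule.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Six rows from the report's df.head(10) output (transaction type omitted).
X = pd.DataFrame({
    "amount": [9839.64, 1864.28, 181.00, 181.00, 11668.14, 7817.71],
    "oldbalanceOrg": [170136.00, 21249.00, 181.00, 181.00, 41554.00, 53860.00],
    "newbalanceOrig": [160296.36, 19384.72, 0.00, 0.00, 29885.86, 46042.29],
})
y = [0, 0, 1, 1, 0, 0]

model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Human-readable if/else rules learned by the tree.
rules = export_text(model, feature_names=list(X.columns))
print(rules)

# Share of the total impurity reduction contributed by each feature.
print(dict(zip(X.columns, model.feature_importances_)))
```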
Deployment and Monitoring:
Deploy the trained Decision Tree Classifier model into the online payment system, where it can
automatically classify transactions in real-time.
Implement monitoring mechanisms to continuously assess the model's performance and detect
any drift or degradation in its predictive capabilities, retraining the model periodically to adapt
to evolving fraud patterns.
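The report does not name a specific drift check. One common choice, swapped in here as an assumption, is the population stability index (PSI), which compares a feature's distribution in recent traffic with its distribution in the training data. The sketch uses synthetic log-normal amounts; `population_stability_index` is a hypothetical helper.

```python
import numpy as np


def population_stability_index(reference, recent, bins: int = 10) -> float:
    """PSI between a reference sample (e.g. training data) and recent data."""
    reference = np.asarray(reference, dtype=float)
    recent = np.asarray(recent, dtype=float)
    # Interior bin edges at the reference quantiles (deciles for bins=10).
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1)[1:-1])
    ref_counts = np.bincount(np.searchsorted(edges, reference, side="right"), minlength=bins)
    new_counts = np.bincount(np.searchsorted(edges, recent, side="right"), minlength=bins)
    # Convert to proportions; a small floor avoids log(0) for empty bins.
    ref_pct = np.clip(ref_counts / len(reference), 1e-6, None)
    new_pct = np.clip(new_counts / len(recent), 1e-6, None)
    return float(np.sum((new_pct - ref_pct) * np.log(new_pct / ref_pct)))


rng = np.random.default_rng(0)
train_amounts = rng.lognormal(mean=8, sigma=1, size=10_000)
same_period = rng.lognormal(mean=8, sigma=1, size=10_000)
shifted_period = rng.lognormal(mean=9, sigma=1, size=10_000)  # amounts about 2.7x larger

print(f"PSI, no drift:   {population_stability_index(train_amounts, same_period):.3f}")
print(f"PSI, with drift: {population_stability_index(train_amounts, shifted_period):.3f}")
```

A commonly quoted rule of thumb reads PSI below 0.1 as little change and above 0.25 as a significant shift that may justify retraining. The thresholds should be validated for the specific system.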

Requirement Specification:

Hardware Requirements:
Processor: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz 1.80 GHz
System Type: 64-bit operating system, x64-based processor
Installed RAM: 8.00 GB

Software Requirements:
Operating System: Windows 11
Coding Language: Python
Platform: Jupyter Notebook

Source Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Load the transaction dataset and show the first ten rows.
df = pd.read_csv(r"C:\Users\gangi\OneDrive\Documents\project.csv")
df.head(10)

   step      type    amount     nameOrig  oldbalanceOrg  newbalanceOrig     nameDest  oldbalanceDest  newbalanceDest  isFraud  isFlaggedFraud
0     1   PAYMENT   9839.64  C1231006815      170136.00       160296.36  M1979787155             0.0            0.00        0               0
1     1   PAYMENT   1864.28  C1666544295       21249.00        19384.72  M2044282225             0.0            0.00        0               0
2     1  TRANSFER    181.00  C1305486145         181.00            0.00   C553264065             0.0            0.00        1               0
3     1  CASH_OUT    181.00   C840083671         181.00            0.00    C38997010         21182.0            0.00        1               0
4     1   PAYMENT  11668.14  C2048537720       41554.00        29885.86  M1230701703             0.0            0.00        0               0
5     1   PAYMENT   7817.71    C90045638       53860.00        46042.29   M573487274             0.0            0.00        0               0
6     1   PAYMENT   7107.77   C154988899      183195.00       176087.23   M408069119             0.0            0.00        0               0
7     1   PAYMENT   7861.64  C1912850431      176087.23       168225.59   M633326333             0.0            0.00        0               0
8     1   PAYMENT   4024.36  C1265012928        2671.00            0.00  M1176932104             0.0            0.00        0               0
9     1     DEBIT   5337.77   C712410124       41720.00        36382.23   C195600860         41898.0        40348.79        0               0

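df.info()

import plotly.express as px
# Count transactions per type for the donut chart.
type_counts = df['type'].value_counts()
transactions = type_counts.index
quantity = type_counts.values
figure = px.pie(values=quantity,
                names=transactions, hole=0.5,
                title="Distribution of Transaction Type")
figure.show()

# Bar chart of transaction types with the count printed on each bar.
ax = sns.countplot(x='type', data=df)
for bars in ax.containers:
    ax.bar_label(bars)
plt.show()

# Encode transaction type as integers. Assumed mapping (order of first
# appearance in the data); the notebook's own dmap values are not listed.
dmap = {'PAYMENT': 1, 'TRANSFER': 2, 'CASH_OUT': 3, 'DEBIT': 4, 'CASH_IN': 5}
df['type'] = df['type'].map(dmap)
df.head()

df['amount'].plot.hist(bins=50)
plt.title('Distribution of Transaction Amount')
plt.show()
df['step'].plot.hist(bins=30)
plt.title('Distribution of Transaction Step')
plt.show()

from sklearn.model_selection import train_test_split
X = df[['type', 'amount', 'oldbalanceOrg', 'newbalanceOrig']]
y = df['isFraud']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=101)

from sklearn.tree import DecisionTreeClassifier
# Instantiate the decision tree and fit it on the training split.
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
acc = model.score(X_test, y_test) * 100  # test accuracy in percent

# Classify one transaction: type code 2, amount 181.00, old balance 181.00, new balance 0.00.
sample = pd.DataFrame([[2, 181.00, 181.00, 0.00]], columns=X.columns)
model.predict(sample)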

Experimental Results:

Conclusion :

In this paper, we have explored various methodologies and techniques for online payment fraud
detection, with a focus on leveraging machine learning algorithms to enhance detection
accuracy and efficiency. We proposed a comprehensive approach that integrates machine
learning techniques, data preprocessing methodologies, and feature engineering strategies to
develop robust fraud detection systems capable of identifying fraudulent activities in real-time.

Through our literature review, we highlighted the importance of adapting to evolving fraud
patterns, minimizing false positives, and maintaining transparency and interpretability in fraud
detection models. We discussed the relevance of machine learning algorithms such as Decision
Trees, Ensemble Methods, and Deep Learning approaches in addressing the complexities of
online payment fraud detection.

Our proposed approach emphasizes the need for continuous monitoring, model validation, and
adaptation to emerging threats to ensure the effectiveness and reliability of fraud detection
systems. By harnessing the power of machine learning, organizations can detect fraudulent
transactions swiftly, mitigate financial losses, and preserve trust in online payment systems.

However, we acknowledge that online payment fraud detection remains a dynamic and evolving
field, with ongoing challenges such as data privacy concerns, model interpretability, and
adversarial attacks. Future research efforts should focus on addressing these challenges while
exploring innovative techniques such as federated learning, explainable AI, and reinforcement
learning to enhance the effectiveness and robustness of fraud detection systems.

In conclusion, online payment fraud detection represents a critical area of research and
development in the digital economy. By leveraging machine learning techniques and adopting a
proactive and adaptive approach, organizations can effectively combat online payment fraud,
protect financial assets, and foster a secure and trustworthy environment for digital transactions.

REFERENCES

1. D. Aladakatti, G. P, A. Kodipalli and S. Kamal, "Fraud detection in Online Payment Transaction using
Machine Learning Algorithms," 2022 International Conference on Smart and Sustainable Technologies in
Energy and Power Sectors

2. S. Lochan, H. V. Sumanth, A. Kodipalli, B. R. Rohini, T. Rao and V. Pushpalatha, "Online Payment
Fraud Detection Using Machine Learning," 2023 International Conference on Computational Intelligence
for Information, Security and Communication Applications (CIISCA), Bengaluru, India

3. D. Dhiman, A. Bisht, A. Kumari, D. H. Anandaram, S. Saxena and K. Joshi, "Online Fraud Detection
using Machine Learning," 2023 International Conference on Artificial Intelligence and Smart
Communication (AISC), Greater Noida, India, 2023

4. G. M. Suhas Jain, N. Rakesh, K. Pranavi and L. Bale, "A Novel Approach in Credit Card Fraud
Detection System Using Machine Learning Techniques," 2021 International Conference on
Forensics, Analytics, Big Data, Security (FABS), Bengaluru, India, 2021

5. R. Aggarwal, P. K. Sarangi and A. K. Sahoo, "Credit Card Fraud Detection: Analyzing the
Performance of Four Machine Learning Models," 2023 International Conference on Disruptive
Technologies (ICDT), Greater Noida, India, 2023

6. A. K. Rai and R. K. Dwivedi, "Fraud Detection in Credit Card Data using Unsupervised Machine
Learning Based Scheme," 2020 International Conference on Electronics and Sustainable
Communication Systems (ICESC), Coimbatore, India, 2020

7. I. SADGALI, N. SAEL and F. BENABBOU, "Fraud detection in credit card transaction using
machine learning techniques," 2019 1st International Conference on Smart Systems and Data
Science (ICSSD), Rabat, Morocco, 2019

8. J. B, J. A. K. R and D. P. S. Ganesh, "Credit Card Fraud Detection with Unbalanced Real and
Synthetic dataset using Machine Learning models," 2022 International Conference on Electronic
Systems and Intelligent Computing (ICESIC), Chennai, India, 2022

9. A. Gupta, K. Singh, N. Sharma and M. Rakhra, "Machine Learning For Detecting Credit Card
Fraud," 2022 IEEE North Karnataka Subsection Flagship International Conference (NKCon),
Vijaypur, India, 2022

10. N. Jain, A. Chaudhary and A. Kumar, "Credit Card Fraud Detection using Machine Learning
Techniques," 2022 11th International Conference on System Modeling & Advancement in
Research Trends (SMART), Moradabad, India, 2022
