
EMPOWERING FINANCIAL SECURITY:

DETECTING FRAUDULENT
TRANSACTIONS USING ADVANCED
MACHINE LEARNING TECHNIQUES
AND PREDICTIVE ANALYTICS
PRESENTED BY: ARUN KUMAR R
DATA SCIENCE TRAINEE
LEARNBAY
AGENDA

1. Introduction to the problem
2. Data Collection and Preprocessing
3. Exploratory Data Analysis
4. Model Selection and Evaluation
5. Results and Conclusion
6. Q&A
1. INTRODUCTION
 According to the Market Statsville Group (MSG), the global e-commerce fraud prevention market
size is expected to grow from USD 38,714.0 million in 2022 to USD 303,870.4 million by 2033,
at a CAGR of 20.6% from 2023 to 2033.
 Indian banks reported losses of Rs 4.69 lakh crore on account of frauds between June 1, 2014,
and March 31, 2023, from around 65,017 frauds reported across banks.
 In FY2023, the total number of fraud cases in the banking system was 13,530. Of these, almost 49
per cent (6,659 cases) were in the digital payment (card/internet) category.
 India lost at least Rs 100 crore every day to bank fraud or scams over the past seven years.
 In financial year 2023, the Reserve Bank of India (RBI) reported a total of more than 13
thousand bank fraud cases across India. The total value of bank frauds decreased from 1.38
trillion Indian rupees to 302 billion Indian rupees.
10 TYPES OF BANKING FRAUDS IN INDIA

1. Phishing: creating fake websites to gather important information
2. Vishing: fraudsters call customers, posing as banks or institutions, to gather information
3. Frauds using online sales platforms
4. Frauds due to the use of unknown/unverified mobile apps
5. ATM card skimming
6. Frauds using screen sharing apps/remote access
7. SIM swap or SIM cloning
8. Frauds by compromising credentials on results through search engines
9. Scams through QR code scans
10. Impersonation on social media
PROBLEM STATEMENT

 Develop a machine learning model to detect potentially
fraudulent transactions based on the provided features.
 The dataset contains information about various transactions,
including account age, payment method, time of transaction,
and category.
 The goal is to build a classification model that can accurately
classify transactions as either legitimate or potentially
fraudulent.
DATA DICTIONARY
 accountAgeDays: The number of days the account has been active.
 numItems: The number of items associated with the account.
 localTime: Some measure of time, possibly in hours or a similar unit.
 paymentMethod: The method used for payment (e.g., PayPal, store credit, credit
card).
 paymentMethodAgeDays: The number of days since the current payment method
(e.g., PayPal, credit card) was linked to the account.
 isWeekend: A binary indicator of whether the transaction occurred on a weekend (1
for yes, 0 for no).
 Category: The category of the transaction (e.g., electronics, shopping, food).
 label (target column): A binary label (0 for legitimate, 1 for potentially fraudulent).
DATA STRUCTURE
 Number of columns: 8
 Number of rows: 38662
DATA DISTRIBUTION
2. DATA CLEANING AND PREPROCESSING
 Duplicate values
 Treating missing values
 Encoding
 Outlier Treatment
 Feature Scaling
 Imbalanced data treatment – Random Over Sampler
DUPLICATE VALUES

 3033 duplicate rows


 7.73% of total data
 Made two models with and without duplicate values
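The duplicate check can be sketched with pandas. This is a minimal illustration on a small made-up frame, not the project's actual code: the column names follow the data dictionary, and the values are assumptions chosen so that two rows repeat.

```python
import pandas as pd

# Toy data (assumed values); column names follow the data dictionary.
df = pd.DataFrame({
    "accountAgeDays": [29, 725, 29, 845, 725],
    "numItems": [1, 1, 1, 1, 1],
    "paymentMethod": ["paypal", "storecredit", "paypal", "creditcard", "storecredit"],
    "label": [0, 0, 0, 0, 0],
})

# A row counts as a duplicate when every column matches an earlier row.
n_duplicates = int(df.duplicated().sum())
share = 100 * n_duplicates / len(df)
print(f"{n_duplicates} duplicate rows ({share:.2f}% of the data)")

# Keep the first occurrence of each row for the "without duplicates" model.
df_dedup = df.drop_duplicates().reset_index(drop=True)
```

The same arithmetic reproduces the figure on the slide if the total is taken as the sum of the two class counts reported later: 3033 out of 38661 + 560 = 39221 rows is about 7.73%.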
MISSING VALUES
 The variables ‘isWeekend’ and ‘Category’ have 560 and 95 missing
values respectively.
 The missing values of ‘isWeekend’ coincide with the ‘fraud’ class of
‘label’ (i.e. 1), so filling them with 0 or 1 (weekday or weekend)
would mislead the model. This variable is therefore dropped.
 Missing values of ‘Category’ are filled with the mode.
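The two treatments can be sketched as follows. The frame is a made-up example in which ‘isWeekend’ is missing only on fraud rows, mirroring the pattern described; the values themselves are assumptions.

```python
import pandas as pd

# Toy data (assumed values); column names follow the data dictionary.
df = pd.DataFrame({
    "isWeekend": [0.0, 1.0, None, 0.0, None],
    "Category": ["shopping", "electronics", "shopping", None, "food"],
    "label": [0, 0, 1, 0, 1],
})

# Count missing values per column.
missing_counts = df.isna().sum()

# Check how the rows with a missing isWeekend are labelled.
labels_when_missing = df.loc[df["isWeekend"].isna(), "label"].value_counts()

# isWeekend is missing only on fraud rows here, so drop the column
# instead of imputing it.
df = df.drop(columns=["isWeekend"])

# Fill missing Category values with the most frequent category (the mode).
category_mode = df["Category"].mode()[0]
df["Category"] = df["Category"].fillna(category_mode)
```

Inspecting `labels_when_missing` before imputing is what reveals that missingness is tied to the target rather than random.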
ENCODING
 The variables Category and paymentMethod hold categorical
values.
 They are one-hot encoded, and the redundant (duplicate) dummy
variable is dropped.
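A minimal sketch of this step, assuming that "dropping the duplicate variable" means dropping one dummy column per encoded feature (the slides do not say which tool was used; `pandas.get_dummies` is used here for brevity, and the toy values are assumptions):

```python
import pandas as pd

# Toy data (assumed values); column names follow the data dictionary.
df = pd.DataFrame({
    "paymentMethod": ["paypal", "storecredit", "creditcard", "paypal"],
    "Category": ["shopping", "electronics", "food", "food"],
    "numItems": [1, 2, 1, 3],
})

# One-hot encode both categorical columns. drop_first=True removes one
# dummy per feature, since it is implied by the remaining dummies.
encoded = pd.get_dummies(
    df, columns=["paymentMethod", "Category"], drop_first=True
)
print(list(encoded.columns))
```

With three levels per feature, each feature contributes two dummy columns, so the encoded frame has five columns in total.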
OUTLIER TREATMENT
 The variables numItems and paymentMethodAgeDays have outlier
values.
 Since these outliers represent natural variation in the
population, they were left as they are.
FEATURE SCALING
 The variables accountAgeDays and paymentMethodAgeDays have
values ranging up to 2000.
 Since these values have no upper limit, I scaled the dataset
using standardization.
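Standardization rescales each feature to zero mean and unit variance, z = (x - mean) / std. A small standard-library sketch, with assumed toy values for accountAgeDays:

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]

account_age_days = [1, 30, 365, 2000]  # assumed toy values
scaled = standardize(account_age_days)
print([round(z, 3) for z in scaled])
```

The sketch divides by the population standard deviation, which is also what scikit-learn's `StandardScaler` does. In practice the mean and standard deviation would usually be computed on the training split only and then reused for the test split.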
IMBALANCED DATASET
 The dependent variable ‘label’ contains 38661 zeros (legitimate) and
560 ones (fraudulent).
 Huge imbalance (98.57% vs 1.43%)
 Used SMOTE method to balance the data.
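The slides name both a Random Over Sampler and SMOTE. The sketch below shows only the simpler of the two, plain random oversampling, written with the standard library on assumed toy data. SMOTE differs in that it synthesizes new minority points by interpolating between neighbours rather than copying existing rows.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # The majority class needs zero extra rows, so this list is empty for it.
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels

# Toy data (assumed): 8 legitimate rows and 2 fraudulent rows.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

A common practice is to resample only the training split, so that copies of the same fraud row cannot land in both train and test sets.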
EXPLORATORY DATA ANALYSIS
MODEL SELECTION
 After splitting the data into train and test sets,
I built models with almost all the
classification algorithms.
 Out of all the classifiers, I chose
the model with the highest accuracy,
i.e. the random forest (RF) model.
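When models are ranked by accuracy, a useful reference point is the accuracy of a trivial baseline on the original class distribution. The sketch below uses the class counts reported for ‘label’; the baseline itself is an illustration and not part of the project.

```python
# Class counts reported for the 'label' column.
n_legit, n_fraud = 38661, 560

# A baseline that predicts "legitimate" for every transaction.
true_negatives = n_legit    # every legitimate row is classified correctly
false_negatives = n_fraud   # every fraudulent row is missed

accuracy = true_negatives / (n_legit + n_fraud)
recall = 0 / n_fraud  # no fraud is ever caught

print(f"baseline accuracy: {accuracy:.4f}, recall: {recall:.2f}")
# prints "baseline accuracy: 0.9857, recall: 0.00"
```

Any candidate model should therefore be judged against roughly 98.57% accuracy on unbalanced data, and on recall for the fraud class, rather than against 50%.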
MODEL EVALUATION

Model      Accuracy   Precision   Recall   F1 score
Training   1.00       1.00        1.00     1.00
Test       1.00       1.00        1.00     1.00
MODEL EVALUATION

 Here, our focus should be on the Type II error, i.e. false negatives
(fraudulent transactions classified as legitimate).
 The number of false negatives is lower than the number of false positives.
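How false negatives and false positives feed into the reported metrics can be sketched in plain Python. The labels and predictions below are assumed toy values, chosen so that there are fewer false negatives than false positives; they are not the project's confusion matrix.

```python
def classification_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and derived metrics (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # Type I error
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # Type II error
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Toy labels (assumed): 1 = fraudulent, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Recall is the metric that falls when false negatives rise, which is why it is the one to watch for fraud detection.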
MODEL EVALUATION
 The ROC curve also shows
scores of 1.00 and 0.99 for
the training and test sets
respectively.
 The area under the curve
(AUC) is also 0.99.
 To check for overfitting,
I ran cross-validation
on this RF model.
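The slides do not state which cross-validation scheme was used. The sketch below assumes stratified k-fold, which keeps the fraud ratio similar in every fold, and shows only how the index splits could be generated; model fitting is omitted, and the labels are assumed toy values.

```python
import random
from collections import defaultdict

def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with similar class ratios in each fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx

labels = [0] * 20 + [1] * 5  # assumed toy labels
splits = list(stratified_k_fold(labels, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

The cross-validated score is then the metric averaged over the k held-out folds, with the model refit on each training part.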
RESULTS
 The final accuracy after cross-validation: 99.63% and 99.31%
 Business impact: the model could help the customers of our bank
avoid losses worth crores of rupees.
CONCLUSION
 Summary: Successfully implemented the bank fraud detection
model.
 Future work:
1. Integrate with real-time data by deploying the model in the cloud.
2. Explore anomaly detection models.
THANK YOU
