
ONE CLASS CLASSIFICATION APPROACHES FOR

DETECTING AUTOMOBILE INSURANCE FRAUD

A MINOR PROJECT REPORT

In partial fulfilment for the award of the degree


Of
BACHELOR IN TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING (CSE)

Submitted by:
Atul Kumar Agrawal
Registration no. : 1602040031

Under the supervision of


Associate Professor
Dr. Suvasini Panigrahi

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


VEER SURENDRA SAI UNIVERSITY OF TECHNOLOGY, BURLA
SEPTEMBER 2019
Research objective : “One Class Classification approaches for detecting Automobile
Insurance Fraud”

I. INTRODUCTION:

Automobile insurance fraud is a major problem for insurance companies and has caused losses of
several crores through fraudulent and false claims. It is a serious crime in most parts of the
world, and scammers may be sentenced to at least 1 year and up to 20 years in jail. Fraudsters
often involve fake patients, fake doctors and fake lawyers working together. Various machine
learning and deep learning techniques have been developed to detect this kind of fraud, and
research is ongoing to find the most suitable method for detecting new fraud patterns over time.
As the number of frauds is low in comparison to the legitimate transactions, we use one-class
classification for the minority class to get better results, so as to minimise the classification
of non-fraudulent data as fraud, i.e. to minimise the false positive alarm rate.

II. MOTIVATION :

Automobile Fraud Statistics:

The Insurance Fraud Bureau in the UK estimated there were more than 20,000 fake collisions and
false insurance claims across the UK from 1999 to 2006. One tactic fraudsters use is to drive to a
busy junction or roundabout and brake sharply, causing a motorist to drive into the back of them.
They claim the other motorist was at fault for driving too fast or too close behind them, and make
a false and inflated claim to the motorist's insurer for injury and damage, which can pay the
fraudsters up to 30 lakhs. In the Insurance Fraud Bureau's first year of operation, data mining
initiatives exposed insurance fraud networks and led to 74 arrests and a five-to-one return on
investment. The Insurance Research Council estimated that in 1996, 21 to 36 percent of
auto-insurance claims contained elements of suspected fraud. There is a wide variety of schemes
used to defraud automobile insurance providers.
According to data released by the Beijing bureau of China, 10% of all insurance claims were
fraudulent. The Coalition Against Insurance Fraud estimates that in 2006 a total of about $80
billion was lost in the United States due to insurance fraud. According to estimates by the
Insurance Information Institute, insurance fraud accounts for about 10 percent of the
property/casualty insurance industry's incurred losses and loss adjustment expenses. Indiaforensic
Center of Studies estimates that insurance fraud in India costs about $6.25 billion annually.

Data Source : Wikipedia

Various types of fraudulent claims are:

1. Staged collisions: Fraudsters use a motor vehicle to stage fake accidents with an innocent
party.
2. Exaggerated claims: Fake claims involving injuries and damages that may already have been
present before the actual accident took place.
3. False stolen reports: The claimant might have sold the vehicle or gifted it to a relative and
then claims insurance by reporting it as stolen.
4. Hidden information: Claimants may hide information about the driver at the time of the
accident, who might be a driver excluded under the terms of the insurance.
5. Multiple claims: People who claim multiple times for the same loss.

Types of automobile fraud:

1. Soft auto-insurance fraud: Examples of soft auto-insurance fraud include filing more than one
claim for a single injury, filing claims for injuries not related to an automobile accident,
misreporting wage losses due to injuries, and reporting higher costs for car repairs than those that
were actually paid.
2. Hard auto-insurance fraud: It includes activities such as staging automobile collisions, filing
claims when the claimant was not actually involved in the accident, submitting claims for medical
treatments that were not received, inventing injuries, and filing false stolen reports.

III. Literature Review :

S.No | Name of the Research Paper | Year of Publication | Technique(s) used | Result
-----|----------------------------|---------------------|-------------------|-------
1. | An Experimental Study With Imbalanced Classification Approaches for Credit Card Fraud Detection | 2019 | - | LR, C5.0 decision tree algorithm, SVM and ANN are the best methods according to the 3 considered performance measures (Accuracy, Sensitivity and AUPRC).
2. | Predicting Fraudulent Claims in Automobile Insurance | 2018 | Random forest + J48 + Naive Bayes | Random forest outperforms the remaining two algorithms.
3. | One-class support vector machine based undersampling: Application to churn prediction and insurance fraud detection | 2015 | OCSVM | OCSVM based undersampling improves the performance of classifiers.
4. | The Identification Algorithm and Model Construction of Automobile Insurance Fraud Based on Data Mining | 2015 | Outlier detection method based on kNN | Data mining had the advantages of low time complexity, high recognition rate and high accuracy.
5. | Random Rough Subspace based Neural Network Ensemble for Insurance Fraud Detection | 2011 | Random rough subspace based neural network ensemble method | Random subspace method can be used for an online fraud detection system.

Various deep learning, machine learning and data mining techniques have been applied to
automobile insurance fraud detection. These are:

1. Decision tree based
2. Machine learning
   i. Supervised learning
      A. Classification
         a. Support vector machine (SVM) [4]
         b. Recursive neural network (RNN)
         c. Radial Basis Function neural network [5]
      B. Regression and statistics
         a. Logistic regression
         b. Binary regression
   ii. Unsupervised learning
      A. Clustering
         a. K-means clustering
         b. Hierarchical clustering
      B. Spectral Ranking Anomaly (SRA) [14]
   iii. Semi-supervised learning: Combination of supervised and unsupervised learning
   iv. Reinforcement learning
3. Multi-layered perceptron (MLP) based
4. Data mining
5. Random forest
6. Naive Bayes Tree [4]
7. Probabilistic neural network
8. Group method of data handling (GMDH)
9. Synthetic Minority Oversampling Technique [2]
10. kRNN and K-Means hybrid for outlier elimination and undersampling [3]
11. Geometric mean based [6]
12. Fuzzy Gaussian membership based oversampling [7]
13. Data gravitation [8]
14. Fuzzy logic
15. Genetic algorithms
16. Hybrid of back-propagation neural networks (ANN) and self-organizing maps (SOM) [9]
17. 10-fold cross validation method
18. Oversampling techniques (for minority class):
   a. ADASYN (Adaptive Synthetic Sampling)
   b. SMOTE (Synthetic Minority Oversampling Technique)
19. Backpropagation
20. C4.5 algorithm [15]
21. Meta learning approaches [15]
DATASET USED: mycarclaims.csv [15]
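To illustrate the minority-class oversampling idea that appears in the list above (SMOTE [2]), the following is a minimal sketch in plain Python. It creates synthetic minority samples by interpolating between a sample and one of its nearest minority neighbours. The function name, the toy data and the parameter values are assumptions made for illustration; this is not the reference SMOTE implementation and not the code used in any of the surveyed papers.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (hypothetical, already-scaled feature vectors).
fraud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_oversample(fraud, n_new=6, k=2)
```

Each synthetic point lies on a line segment between two existing minority samples, so the new points stay inside the region already occupied by the minority class rather than being exact duplicates.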

Various attributes that are used frequently for automobile insurance fraud detection are:
occurrence date and report date, claim open date and claim report date, count of customer
communications, policy effective date and claim occurrence date, and a check for claims on the
same vehicle.
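The date pairs listed above are typically useful as differences rather than as raw dates. The sketch below shows one possible way to turn them into numeric features. All field names and the example claim are hypothetical; they are not taken from mycarclaims.csv or any other dataset.

```python
from datetime import date


def claim_features(claim):
    """Derive numeric features from raw claim fields (hypothetical names)."""
    return {
        # delay between the incident and its being reported
        "days_to_report": (claim["report_date"] - claim["occurrence_date"]).days,
        # delay between the report and the claim being opened
        "days_report_to_open": (claim["claim_open_date"] - claim["report_date"]).days,
        # how soon after the policy started the incident occurred
        "days_since_policy_start": (
            claim["occurrence_date"] - claim["policy_effective_date"]
        ).days,
        "communication_count": claim["communication_count"],
        # 1 if there is an earlier claim on the same vehicle, else 0
        "claim_on_same_vehicle": int(claim["prior_claims_on_vehicle"] > 0),
    }


# Made-up claim used only to exercise the function.
example_claim = {
    "policy_effective_date": date(2019, 1, 1),
    "occurrence_date": date(2019, 1, 11),
    "report_date": date(2019, 1, 25),
    "claim_open_date": date(2019, 1, 27),
    "communication_count": 7,
    "prior_claims_on_vehicle": 1,
}
features = claim_features(example_claim)
```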

IV. Proposed system :


The majority of the above-mentioned work was limited by the data imbalance problem. The model
proposed in this paper is a one-class classifier designed to deal with the minority class
imbalance problem.
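To make the one-class idea concrete, here is a deliberately simple sketch: a model that is fitted on samples of a single target class and then accepts or rejects new points by their distance to that class's centroid. This is a simplified stand-in chosen for readability, not a one-class SVM and not the model proposed in this report; the class name, the quantile parameter and the toy data are all assumptions.

```python
import math


class CentroidOneClass:
    """Toy one-class model: learns from one class only and rejects points
    that lie farther from its centroid than most training samples do."""

    def __init__(self, quantile=0.95):
        self.quantile = quantile

    def fit(self, samples):
        n_features = len(samples[0])
        self.centroid = [
            sum(s[j] for s in samples) / len(samples) for j in range(n_features)
        ]
        distances = sorted(self._distance(s) for s in samples)
        # radius covering roughly `quantile` of the training samples
        index = min(len(distances) - 1, int(self.quantile * len(distances)))
        self.radius = distances[index]
        return self

    def _distance(self, x):
        return math.dist(x, self.centroid)

    def is_target(self, x):
        """True if x looks like the class the model was trained on."""
        return self._distance(x) <= self.radius


# Hypothetical two-feature samples, all drawn from the single target class.
target_samples = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8)]
model = CentroidOneClass().fit(target_samples)
near = model.is_target((1.05, 0.95))
far = model.is_target((5.0, 5.0))
```

The key property shared with other one-class methods is that training never sees labelled examples of the other class; the decision boundary is built around the target class alone.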
As mentioned in the proposed methodology, we split the data into two subsets in the ratio 80:20,
stratified so that each subset has the same proportion of positive and negative samples.
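A split that keeps the class proportions equal in both subsets is usually called a stratified split. The sketch below shows one way it could be done in plain Python; the function name, the random seed and the toy label counts are assumptions for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, test_idx) so that every class keeps roughly the
    same proportion in both subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class separately
        cut = round(train_fraction * len(indices))
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 1 = fraud (10 claims), 0 = legitimate (90 claims).
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)
```

With these toy labels the 80% subset holds 8 fraud and 72 legitimate claims and the 20% subset holds 2 and 18, so both keep the original 10% fraud rate.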

Proposal:

Flow Chart:
V. Conclusion:

Thus, a survey of the various techniques used in automobile insurance fraud detection was carried
out. It shows that there is still scope for improvement: minority class samples can be
oversampled to further increase accuracy, which will help prevent losses to insurance companies
to a great extent.

VI. Future Work:

The proposed system shall be implemented with different types of algorithms, whose accuracies
will be compared and tested to find the method that gives the highest accuracy and the lowest
false positive alarm rate.
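The two comparison criteria named above, accuracy and false positive alarm rate, can be computed from the confusion counts of any classifier's predictions. The following is a small sketch under the assumption that fraud is encoded as the positive label 1; the function names and the example predictions are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1  # legitimate claim flagged as fraud (false alarm)
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def false_positive_rate(tp, fp, tn, fn):
    negatives = fp + tn
    return fp / negatives if negatives else 0.0


# Made-up ground truth and predictions: 2 fraud claims, 8 legitimate ones.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
fpr = false_positive_rate(*counts)
```

In this toy case accuracy is 0.8 even though only half of the fraud cases are caught, which shows why accuracy alone can be misleading on imbalanced data and why the false positive rate is tracked alongside it.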

VII. References:

1. Sundarkumar, G. Ganesh, Vadlamani Ravi, and V. Siddeshwar. "One-class support vector
machine based undersampling: Application to churn prediction and insurance fraud detection." In
2015 IEEE International Conference on Computational Intelligence and Computing Research
(ICCIC), pp. 1-7. IEEE, 2015.

2. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic Minority
Oversampling Technique”, Journal of Artificial Intelligence Research, Vol. 16(1), pp. 321-357, 2002.

3. M. Vasu and V. Ravi, “A hybrid undersampling approach for mining unbalanced datasets:
Application to Banking and insurance”, International Journal of Data Mining Modeling and
Management, Vol. 3(1), pp. 75-105, 2011.

4. M. A. H. Farquad, V. Ravi and S. Bapi Raju, “Analytical CRM in banking and finance using SVM:
a modified active learning-based rule extraction approach”, International Journal of Electronic
Customer Relationship Management, Vol. 6(1), pp. 48-73, 2011.

5. M. D. Pérez-Godoy, A. J. Rivera, C. J. Carmona, M. J. D. Jesus, “Training algorithm for radial
basis function network to tackle learning process with imbalanced datasets”, Applied Soft
Computing, Vol. 25, pp. 26-39, 2014.

6. M. J. Kim, D. K. Kang, and H. B. Kim, “Geometric mean based boosting algorithm with
oversampling to resolve data imbalance problem for bankruptcy prediction”, Expert Systems with
Applications, Vol. 41(3), pp. 1074-1082, 2015.

7. D. C. Li, C. W. Liu, S. C. Hu, “A learning method for the class imbalance problem with medical
datasets”, Computers in Biology and Medicine, Vol. 40(5), pp. 509-518, 2010.

8. L. Peng, H. Zhang, B. Yang, and Y. Chen, “A new approach for imbalanced data classification
based on data gravitation”, Information Sciences, Vol. 288, pp. 347-373, 2014.

9. C. F. Tsai and Y. H. Lu, “Customer churn prediction by hybrid neural networks”, Expert Systems
with Applications, Vol. 36(10), pp. 12547-12553, 2009.

10. Makki, Sara, Zainab Assaghir, Yehia Taher, Rafiqul Haque, Mohand-Saïd Hacid, and Hassan
Zeineddine. "An Experimental Study With Imbalanced Classification Approaches for Credit Card
Fraud Detection." IEEE Access 7 (2019): 93010-93022.

11. Kowshalya, G., and M. Nandhini. "Predicting Fraudulent Claims in Automobile Insurance." In
2018 Second International Conference on Inventive Communication and Computational
Technologies (ICICCT), pp. 1338-1343. IEEE, 2018.

12. Yan, Chun, and Yaqi Li. "The Identification Algorithm and Model Construction of Automobile
Insurance Fraud Based on Data Mining." In 2015 Fifth International Conference on
Instrumentation and Measurement, Computer, Communication and Control (IMCCC), pp. 1922-
1928. IEEE, 2015.

13. Xu, Wei, Shengnan Wang, Dailing Zhang, and Bo Yang. "Random rough subspace based neural
network ensemble for insurance fraud detection." In 2011 Fourth International Joint Conference on
Computational Sciences and Optimization, pp. 1276-1280. IEEE, 2011.

14. K. Nian, H. Zhang, A. Tayal, T. Coleman, and Y. Li, “Auto insurance fraud detection using
unsupervised spectral ranking for anomaly,” The Journal of Finance and Data Science, vol. 2, no. 1,
pp. 58–75, 2016.

15. C. Phua, D. Alahakoon, and V. Lee, “Minority report in fraud detection: classification of
skewed data,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 50–59, 2004.
