Project Submission GRP1 TEAM2
Fraud Detection
Analysis
GROUP- 1
TEAM- 2
Hariharan 2001132
Arunachaleshwar 2001179
Yazhini Samyuktha 2001230
Anjali Krishna 2001173
Karthik Venkatachalam 2001117
Kaushik Nathan 2001235
CASE CONTEXT
Detecting provider fraud in the healthcare industry.
Industries face fraud, and manual detection is almost impossible given the large amount of data generated on a daily basis.
Healthcare fraud results in a loss of money for insurance companies.
Some of the common types of frauds that are committed by providers are:
Billing for services not provided
Duplicate submission of a claim
Misrepresenting the service provided
Charging for a more complex or expensive service than the one actually provided
Billing for a covered service when the service provided was not covered
PROBLEM STATEMENT
To identify fraudulent insurance providers based on the claims data of the beneficiaries
DATA SUMMARY
Inpatient data: Contains claim details, provider details, physician details, procedures performed on the patient, diagnostic details, amount paid by the patient, admission date, discharge date and insurer details
Outpatient data: Contains claim details, provider details, physician details, procedures performed on the patient, diagnostic details, amount paid by the patient and insurer details
Beneficiary data: Contains the beneficiary ID, basic details of the beneficiary, chronic conditions, and annual claim and reimbursement amounts
Provider fraudulent data: Contains provider details and whether the provider is potentially fraudulent or not
Process flow
Raw data
Exploratory data analysis
VARIABLE CREATION
Claim days = claimed discharge date - claimed admission date
Actual admission days = discharge date - admission date
Age = current date - date of birth (date of death is not available)
Is dead or not
Number of claims
Number of diagnostics
Number of procedures
Number of chronic conditions dealt with
States
AGGREGATION
Master data is prepared by aggregating (grouping) all the inpatient and outpatient data based on provider details.
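The variable creation and aggregation steps can be sketched as follows. This is a minimal illustration using only the Python standard library: the records, field names and the reference date used as "current date" are all hypothetical and are not taken from the project's data set.

```python
from collections import defaultdict
from datetime import date

# Assumed stand-in for the "current date" used in the age calculation.
REFERENCE_DATE = date(2009, 12, 31)

# Hypothetical claim records; every field name and value is made up.
claims = [
    {"provider": "P1", "claim_start": date(2009, 1, 1), "claim_end": date(2009, 1, 8),
     "admitted": date(2009, 1, 2), "discharged": date(2009, 1, 6),
     "birth": date(1940, 5, 1), "is_dead": 0, "procedures": 2},
    {"provider": "P1", "claim_start": date(2009, 3, 10), "claim_end": date(2009, 3, 15),
     "admitted": date(2009, 3, 10), "discharged": date(2009, 3, 13),
     "birth": date(1950, 12, 31), "is_dead": 1, "procedures": 1},
    {"provider": "P2", "claim_start": date(2009, 6, 1), "claim_end": date(2009, 6, 3),
     "admitted": date(2009, 6, 1), "discharged": date(2009, 6, 3),
     "birth": date(1935, 2, 20), "is_dead": 0, "procedures": 0},
]


def age_in_years(birth, today=REFERENCE_DATE):
    """Completed years between the date of birth and the reference date."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


def derive(claim):
    """Per-claim variables: claim days, actual admission days and age."""
    return {
        "claim_days": (claim["claim_end"] - claim["claim_start"]).days,
        "admission_days": (claim["discharged"] - claim["admitted"]).days,
        "age": age_in_years(claim["birth"]),
    }


def aggregate_by_provider(records):
    """Group claims by provider and build one master row per provider."""
    totals = defaultdict(lambda: {"n_claims": 0, "claim_days": 0, "admission_days": 0,
                                  "procedures": 0, "deaths": 0, "age_sum": 0})
    for claim in records:
        row = totals[claim["provider"]]
        derived = derive(claim)
        row["n_claims"] += 1
        row["claim_days"] += derived["claim_days"]
        row["admission_days"] += derived["admission_days"]
        row["procedures"] += claim["procedures"]
        row["deaths"] += claim["is_dead"]
        row["age_sum"] += derived["age"]

    master = {}
    for provider, row in totals.items():
        master[provider] = {
            "n_claims": row["n_claims"],
            "claim_days": row["claim_days"],
            "admission_days": row["admission_days"],
            # Days claimed beyond the actual admission period.
            "extra_days": row["claim_days"] - row["admission_days"],
            "procedures": row["procedures"],
            "deaths": row["deaths"],
            "mean_age": row["age_sum"] / row["n_claims"],
        }
    return master


master = aggregate_by_provider(claims)
```

Each entry of `master` is one provider-level row, which is the grain at which a fraud label could then be joined.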
EXPLORATORY DATA ANALYSIS
[Figures: number of fraudulent transactions plotted against number of claim days, number of chronic diseases, diagnosis and number of procedures, each split into Fraudulent and Not fraudulent]
MODEL CREATION
LOGISTIC REGRESSION MODEL
A logistic regression model was built to identify potentially fraudulent insurance
providers based on the master data set.
Variables such as the amount paid by the beneficiary, whether the beneficiary is dead or not,
and the claimed number of procedures performed on the beneficiary were highly significant
in predicting fraudulent insurance providers.
AIC value: 2105.7
People who claimed more admission days than the actual admission days were
most likely to be from a fraudulent service provider.
Variables            Odds (increase or decrease)
Chronic Diseases     11%
IsDead               32%
No of procedures     9%
Claim Days           7%
Age                  1%
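Percentage changes in odds like the ones listed above are usually derived from logistic regression coefficients as (exp(coefficient) - 1) x 100. The sketch below shows that conversion in both directions. The function names are illustrative and the coefficients are back-calculated examples, not the fitted values of the model described here.

```python
import math


def odds_change_percent(coefficient):
    """Percent change in the odds for a one-unit increase in a predictor."""
    return (math.exp(coefficient) - 1.0) * 100.0


def coefficient_for_change(percent):
    """Coefficient that would produce the given percent change in the odds."""
    return math.log(1.0 + percent / 100.0)


# A 32% increase in odds corresponds to a coefficient of ln(1.32).
beta_increase = coefficient_for_change(32)
# A 7% decrease in odds corresponds to a negative coefficient, ln(0.93).
beta_decrease = coefficient_for_change(-7)
```

A positive coefficient raises the odds of a provider being flagged and a negative one lowers them, which is why the table separates increases from decreases.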
MODEL ANALYSIS
CONFUSION MATRIX
Predicted vs actual fraudulent claims were verified
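First matrix (rows: actual, columns: predicted)
            0      1
ACTUAL 0    4810   66
ACTUAL 1    307    199

Second matrix (rows: actual, columns: predicted)
            0      1
ACTUAL 0    964    11
ACTUAL 1    52     49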
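Accuracy, precision and recall can be read off the two confusion matrices (4810, 66, 307, 199 and 964, 11, 52, 49). The sketch below assumes that rows are actual classes, columns are predicted classes, and class 1 means fraudulent. The slide does not state what distinguishes the two matrices, so they are simply called `first` and `second`.

```python
def metrics(tn, fp, fn, tp):
    """Summary metrics from the four cells of a binary confusion matrix."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,   # share of all providers classified correctly
        "precision": tp / (tp + fp),     # share of predicted frauds that are actual frauds
        "recall": tp / (tp + fn),        # share of actual frauds that the model catches
    }


# Cell assignment assumes rows = actual, columns = predicted.
first = metrics(tn=4810, fp=66, fn=307, tp=199)
second = metrics(tn=964, fp=11, fn=52, tp=49)
```

Under these assumptions both matrices show high accuracy but recall below 50%, which is typical when the fraudulent class is much smaller than the non-fraudulent one.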
[Charts: percentage by decile (y-axis 0% to 60%) and lift by decile (y-axis 0.00 to 3.00); x-axis: deciles 1 to 10]
INSIGHTS:
• The number of fraudulent providers in the top 2 deciles of the predictive model is 4 times the number
of fraudulent providers found in a random sample of 20%.
• The top 20% of providers ranked by the model captures 80% of the target population.
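The two insights are linked: capturing 80% of the fraudulent providers in the top 20% of ranked providers gives a cumulative lift of 80% / 20% = 4. The sketch below shows one way to compute cumulative gain and lift by decile. The scores and labels are made-up toy values chosen to reproduce that ratio, not the project's predictions.

```python
def cumulative_gain_and_lift(scores, labels, n_bins=10):
    """Rank by score (highest first) and return (gain, lift) per cumulative bin.

    gain = share of all positives captured up to the bin
    lift = gain divided by the share of the population covered up to the bin
    """
    ranked = [label for _, label in
              sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
    total_positives = sum(ranked)
    n = len(ranked)
    rows = []
    for b in range(1, n_bins + 1):
        cutoff = round(b * n / n_bins)
        gain = sum(ranked[:cutoff]) / total_positives
        lift = gain / (cutoff / n)
        rows.append((gain, lift))
    return rows


# Toy example: 20 providers, 5 fraudulent, 4 of them among the 4 highest scores.
toy_scores = list(range(20, 0, -1))
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
deciles = cumulative_gain_and_lift(toy_scores, toy_labels)
gain_top2, lift_top2 = deciles[1]  # cumulative values for the top 2 deciles
```

By construction the last decile always has a gain of 1 and a lift of 1, since the whole population has been covered at that point.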
OBSERVATIONS & RECOMMENDATIONS
OBSERVATIONS
Providers were more often found to be fraudulent when they serviced customers:
• with a lower mean age
• with a higher annual reimbursement amount
• for whom more admission days were claimed than the actual admission days
• with a higher death rate
• on whom a higher number of hospital procedures was performed
RECOMMENDATIONS
Insurance claims should be scrutinised for fraudulent submissions when they come from providers or hospitals:
• whose customers have a lower mean age
• with higher annual reimbursement amounts
• with a higher number of admission days
• with a higher death rate
• that have performed a higher number of hospital procedures