Project Submission GRP1 TEAM2


Healthcare Provider Fraud Detection Analysis
GROUP 1
TEAM 2
Hariharan 2001132
Arunachaleshwar 2001179
Yazhini Samyuktha 2001230
Anjali Krishna 2001173
Karthik Venkatachalam 2001117
Kaushik Nathan 2001235
CASE CONTEXT
• Fraud detection of providers in the healthcare industry.
• Industries face fraud, and manual detection is almost impossible.
• A large amount of data is generated on a daily basis.
• Healthcare fraud results in a loss of money for insurance companies.
• Some of the common types of fraud committed by providers are:
  ◦ Billing for services not provided
  ◦ Duplicate submission of a claim
  ◦ Misrepresenting the service provided
  ◦ Charging for a more complex or expensive service than the one provided
  ◦ Billing for a covered service when the service provided was not covered

PROBLEM STATEMENT
To identify potentially fraudulent healthcare providers based on the claims data of beneficiaries.
DATA SUMMARY

Inpatient data: Contains claim details, provider details, physician details, procedures performed on the patient, diagnostic details, amount paid by the patient, admission date, discharge date and insurer details.

Outpatient data: Contains claim details, provider details, physician details, procedures performed on the patient, diagnostic details, amount paid by the patient and insurer details.

Beneficiary data: Contains the beneficiary ID, basic details of the beneficiary, chronic conditions, and annual claim and reimbursement amounts.

Provider fraudulent data: Contains provider details and whether the provider is potentially fraudulent or not.
PROCESS FLOW
[Flowchart. Stages shown: Raw data, Data quality check, Eliminating null values, Variable creation, Aggregation, Master data creation, Exploratory data analysis, Logistic regression, Prediction, Confusion matrix, Akaike's Information Criteria, Gain & Lift, Inference]
DATA PREPROCESSING
DATA CLEANSING
Variables such as beneficiary ID, claim ID, operating physician code and diagnostic codes have been removed from statistical modelling, as they are not expected to have any effect on the prediction.

VARIABLE CREATION
Claim days = claimed discharge date - claimed admission date
Age = current date - date of birth (date of death is not available)
Actual admission days = discharge date - admission date
Is dead or not
Number of claims
Number of diagnostics
Number of procedures
Number of chronic conditions dealt with

AGGREGATION
Master data is prepared by aggregating or grouping all the inpatient and outpatient data based on provider details.
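
The variable creation and aggregation steps could be sketched roughly as follows. This is only an illustration using the Python standard library: the field names (`claimed_admission`, `dob`, `dod` and so on), the sample records and the `REFERENCE_DATE` standing in for the "current date" are all assumptions, not the team's actual column names or code.

```python
from collections import defaultdict
from datetime import date

# Assumed stand-in for the "current date" used in the age calculation.
REFERENCE_DATE = date(2009, 12, 31)

# Hypothetical claim records; every field name here is illustrative.
claims = [
    {"provider": "PRV001", "claimed_admission": date(2009, 1, 1),
     "claimed_discharge": date(2009, 1, 8), "admission": date(2009, 1, 2),
     "discharge": date(2009, 1, 6), "dob": date(1940, 5, 1), "dod": None,
     "n_procedures": 2},
    {"provider": "PRV001", "claimed_admission": date(2009, 3, 10),
     "claimed_discharge": date(2009, 3, 12), "admission": date(2009, 3, 10),
     "discharge": date(2009, 3, 12), "dob": date(1950, 12, 31),
     "dod": date(2009, 6, 1), "n_procedures": 1},
    {"provider": "PRV002", "claimed_admission": date(2009, 2, 1),
     "claimed_discharge": date(2009, 2, 4), "admission": date(2009, 2, 1),
     "discharge": date(2009, 2, 4), "dob": date(1935, 7, 15), "dod": None,
     "n_procedures": 0},
]


def derive_variables(claim):
    """Add the derived variables listed on the slide to one claim record."""
    ref, dob = REFERENCE_DATE, claim["dob"]
    # Completed years between date of birth and the reference date.
    age = ref.year - dob.year - ((ref.month, ref.day) < (dob.month, dob.day))
    return {
        **claim,
        "claim_days": (claim["claimed_discharge"] - claim["claimed_admission"]).days,
        "admission_days": (claim["discharge"] - claim["admission"]).days,
        "age": age,
        "is_dead": int(claim["dod"] is not None),
    }


def aggregate_by_provider(rows):
    """Group claim-level rows into one master record per provider."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["provider"]].append(row)
    master = {}
    for provider, items in grouped.items():
        n = len(items)
        master[provider] = {
            "n_claims": n,
            "mean_age": sum(r["age"] for r in items) / n,
            # Claimed days in excess of the actual admission days.
            "extra_days": sum(r["claim_days"] - r["admission_days"] for r in items),
            "death_rate": sum(r["is_dead"] for r in items) / n,
            "n_procedures": sum(r["n_procedures"] for r in items),
        }
    return master


master = aggregate_by_provider([derive_variables(c) for c in claims])
```

The choice of aggregates (counts, means, rates) is a guess at what "aggregating or grouping" involved; other summaries would fit the same structure.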

NULL VALUE REMOVAL
Null values are eliminated based on their significance, and columns with a large share of null values have been removed from the model creation process.
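
One possible reading of this step, sketched with the standard library: drop columns whose share of nulls exceeds a cutoff, then drop rows that still contain a null. The 50% cutoff, the function names and the sample columns are assumptions for illustration; the slides do not state the rule that was actually applied.

```python
def drop_sparse_columns(rows, max_null_share=0.5):
    """Keep only columns whose share of None values is at most max_null_share."""
    n = len(rows)
    keep = [
        col for col in rows[0]
        if sum(r[col] is None for r in rows) / n <= max_null_share
    ]
    return [{col: r[col] for col in keep} for r in rows]


def drop_incomplete_rows(rows):
    """Remove rows that still contain a None in any remaining column."""
    return [r for r in rows if all(v is not None for v in r.values())]


# Hypothetical records: 'procedure_code_6' is mostly empty, 'deductible' rarely.
records = [
    {"provider": "PRV001", "deductible": 100.0, "procedure_code_6": None},
    {"provider": "PRV001", "deductible": None, "procedure_code_6": None},
    {"provider": "PRV002", "deductible": 50.0, "procedure_code_6": None},
    {"provider": "PRV003", "deductible": 0.0, "procedure_code_6": "4019"},
]

cleaned = drop_incomplete_rows(drop_sparse_columns(records))
```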
EXPLORATORY DATA ANALYSIS
[Chart: number of fraudulent providers by state, fraudulent vs. not fraudulent]
[Charts: number of fraudulent transactions by number of claim days, number of chronic diseases, diagnosis and number of procedures, each split into fraudulent vs. not fraudulent]
MODEL CREATION
LOGISTIC REGRESSION MODEL
A logistic regression model has been built on the master data set to identify potentially fraudulent providers.

Variables such as the amount paid by the beneficiary, whether the beneficiary is dead, and the claimed number of procedures performed on the beneficiary were highly significant in predicting fraudulent providers.

AIC value: 2105.7

People who claimed more admission days than the actual admission days were most likely to be from a fraudulent service provider.

Variables            Change in odds (increase or decrease)
Chronic Diseases     11%
IsDead               32%
No of procedures     9%
Claim Days           7%
Age                  1%
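
For readers unfamiliar with how such percentages arise, the usual conversion from a logistic regression coefficient to a percentage change in odds, and the standard definition of Akaike's Information Criterion, can be sketched as below. The function names are hypothetical and the numbers in the usage lines are illustrative; the slides do not give the fitted coefficients, log-likelihood or parameter count.

```python
import math


def pct_change_in_odds(beta):
    """Percentage change in the odds for a one-unit increase in a predictor.

    A coefficient beta multiplies the odds by exp(beta), so the change is
    (exp(beta) - 1) * 100. Positive values raise the odds, negative lower them.
    """
    return (math.exp(beta) - 1.0) * 100.0


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Illustrative values only: a coefficient of ln(1.11) corresponds to an
# 11% increase in the odds.
example_change = pct_change_in_odds(math.log(1.11))
example_aic = aic(log_likelihood=-1000.0, n_params=6)
```

AIC is mainly useful for comparing candidate models fitted to the same data, so a single value such as the one reported is most informative next to the AIC of an alternative specification.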
MODEL ANALYSIS
CONFUSION MATRIX
Predicted vs actual fraudulent claims were verified

TRAIN
              PREDICTED 0   PREDICTED 1
ACTUAL 0      4810          66
ACTUAL 1      307           199

TEST
              PREDICTED 0   PREDICTED 1
ACTUAL 0      964           11
ACTUAL 1      52            49

Sensitivity = true positives / (true positives + false negatives)

Specificity = true negatives / (true negatives + false positives)

Accuracy      Sensitivity   Specificity
94%           0.75          0.93
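
The metrics can be recomputed directly from the confusion-matrix counts. The sketch below assumes that rows are actual classes, columns are predicted classes, and class 1 (fraudulent) is the positive class; the slides do not state this. Under that reading the test accuracy rounds to the reported 94%, while sensitivity and specificity depend on which class is treated as positive and on the probability cutoff, so values computed this way need not match the reported figures.

```python
def classification_metrics(tn, fp, fn, tp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


# Counts from the slide, read as actual (rows) by predicted (columns),
# with class 1 assumed to be the positive (fraudulent) class.
train_metrics = classification_metrics(tn=4810, fp=66, fn=307, tp=199)
test_metrics = classification_metrics(tn=964, fp=11, fn=52, tp=49)

for name, values in (("train", train_metrics), ("test", test_metrics)):
    print(name, {k: round(v, 3) for k, v in values.items()})
```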
MODEL DIAGNOSTICS
[Gain chart: gain (%) by decile 1 to 10, model vs. random]
[Lift chart: cumulative lift by decile 1 to 10, model vs. random]

INSIGHTS:
• The number of fraudulent providers in the top 2 deciles of the predictive model is 4 times the number of fraudulent providers in a random sample of 20%.
• The top 20% of providers ranked by the model capture 80% of the target population.
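
A gain and lift table of this kind is typically built by ranking providers by predicted probability, splitting them into deciles, and accumulating the share of fraudulent providers captured. The sketch below uses made-up scores and labels chosen so that the top 2 deciles capture 80% of the positives; the function name and data are assumptions, not the team's output. It also assumes at least as many records as bins and at least one positive label.

```python
def gain_lift_table(scores, labels, n_bins=10):
    """Cumulative gain and lift per bin, ranking by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    total_pos = sum(labels)
    rows, cum_pos, start = [], 0, 0
    for b in range(1, n_bins + 1):
        end = round(b * n / n_bins)          # cumulative records through bin b
        cum_pos += sum(label for _, label in ranked[start:end])
        start = end
        gain = cum_pos / total_pos           # share of positives captured so far
        lift = gain / (end / n)              # relative to random selection
        rows.append({"decile": b, "gain": gain, "lift": lift})
    return rows


# Synthetic example: 20 providers, 5 fraudulent, 4 of them ranked in the top 4.
scores = [i / 20 for i in range(20, 0, -1)]
labels = [1, 1, 1, 1] + [0] * 10 + [1] + [0] * 5

table = gain_lift_table(scores, labels)
```

In this synthetic case the second decile has a cumulative gain of 0.8 and a lift of about 4, which is the same relationship the insights describe: capturing 80% of the target in 20% of the population is a fourfold improvement over random selection.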
OBSERVATIONS & RECOMMENDATIONS
OBSERVATIONS
Providers who serviced customers:
• With a lower mean age were found to be fraudulent
• Whose annual reimbursement amount is higher were found to be fraudulent
• Who claimed a higher number of admission days than the actual admission days were found to be fraudulent
• Whose death rate is high were mostly found to be fraudulent
• On whom a higher number of hospital procedures was performed were found to be fraudulent

RECOMMENDATION
Insurance claims:
• From providers whose customers have a lower mean age are to be scrutinised for fraudulent submissions
• With a higher annual reimbursement are to be scrutinised for fraud
• With a higher number of admission days are to be scrutinised
• From providers with a higher death rate are to be scrutinised
• From hospitals that have performed a higher number of hospital procedures are to be scrutinised
