Professional Documents
Culture Documents
Healthcare Fraud Detection: Suhita Acharya
Healthcare Fraud Detection: Suhita Acharya
DETECTION
SUHITA ACHARYA
What is Medicare fraud?
• Double Billing
• Phantom Billing
• Upcoding
• Unbundling
• Kickbacks
• Unnecessary Services
• False price reporting
MEDICARE CLAIMS DATASET
Beneficiary:
Inpatient: Inpatient
• Claim level data (with provider/doctor info) for
the patients that have stayed at the hospital for
the medical service. Outpatient
Outpatient:
• Fraud/Non-Fraud providers:
before modeling.
DATA PREPROCESSING
Data was robust-scaled:
Missing Data Imputed:
Due to the inclusion of outliers
Beneficiary:
Two different sets were evaluated for each model:
Inpatient:
• One using SMOTE and one using Borderline
Outpatient:
SMOTE upsampling
Label Encoding: All categorical features
Different model types attempted:
New Feature Creation:
Logistic Regression
Deceased, Tot_Reimbursed_Amt,
Random Forest
Hospital_Stay, Claim_Duration,
Linear SVC
Physician_Count, Claim_Count,
ADABoost
Chr_Cond_Count, etc.
XGBoost
Dropped features:
LightGBM
With high null values, ones from which other
Hyperparameters were tuned based on the F1 metric.
features were created, etc.
Final parameters are chosen by Recursive Feature
Combined all datasets with fraud labels.
Selection
BEFORE WE PROCEED…
Basic Patient Information:
High average claim reimbursement amounts. Some of these Low average claim reimbursement amounts
providers have the highest reimbursement amounts in the
dataset.
High average number of patient insurance claims. Low average number of patient insurance claims.
A narrow range of total patient chronic condition counts. A wider range of total patient chronic condition counts.
Outpatient – high number of diagnosis codes listed on Outpatient - low number of diagnosis codes listed on claims
claims
Future work: Duplicate claim investigation, Doctor-Hospital Network analysis, studying patterns in beneficiaries,
and conducting a market basket analysis.
THANK YOU!