MEDICAL INSURANCE COST PREDICTION SYSTEM
Dharesh Bahety
EN18EL301057
Under the Guidance of
Mr. Parag Ravekar Sir
Table of Contents
Introduction
Literature Review
Objectives
Methodology
Tools used in the project
Work done till date
Summary
Reference
INTRODUCTION
Many people assume that medical insurance is expensive and therefore avoid buying
it for themselves and their families, even though it has become a necessity.
This project designs an automatic system that predicts what a person's medical
insurance cost will be.
In classification, a learning algorithm takes the input data and maps it to a
discrete output such as True or False. In regression, a learning algorithm maps the
input data to a continuous output such as weight or cost.
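The difference between the two output types can be illustrated with a minimal Python sketch. The function names, the BMI threshold and the coefficients are hypothetical and serve only to show a discrete versus a continuous output; they are not taken from the project.

```python
def classify_high_bmi(bmi: float) -> bool:
    """Classification: maps the input to a discrete label (True or False)."""
    # Hypothetical threshold, used only to illustrate a discrete output.
    return bmi >= 30.0


def estimate_cost(age: int, bmi: float) -> float:
    """Regression: maps the inputs to a continuous value (a cost)."""
    # Hypothetical coefficients, used only to illustrate a continuous output.
    return 250.0 * age + 300.0 * bmi


label = classify_high_bmi(bmi=27.5)      # a bool: one of two classes
cost = estimate_cost(age=30, bmi=27.5)   # a float: any value on a continuous scale
```

An insurance cost can take any value in a range, so predicting it is a regression problem.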
• This will help motivate parents to insure their children from an early age, as
insurance prices are lower for infants and children than for adults.
• People are free to decide which policy they wish to buy and can estimate the cost of
the policy, which is important when deciding the coverage they will get and wish to
have.
Linear Regression Model
• Linear regression attempts to model the relationship between two variables by
fitting a linear equation to observed data. One variable is considered to be an
independent variable, and the other is considered to be a dependent variable.
• Let us understand the algorithm for linear regression through an
example. Say we have a data set of years of experience and salary per year, as
shown in the figure. We plot the data in a graph and try to determine the best fit (the
blue line in fig. 2).
• Since the graph is linear, the best fit will be in the form Y = mX + c. The slope
is determined from two points on the line through the formula m = (y2 - y1)/(x2 - x1).
So here the slope will be
m = 200000
and the intercept will be
c = 200000
So our equation for the best fit comes out to be Y = 200000X + 200000 = 200000 (X + 1).
Once we determine the slope and intercept, we need to design a function that
replicates the best fit and hence helps in prediction.
Fig: Example of linear regression curve
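The slope, intercept and prediction function described above can be sketched in Python as follows. The two points are assumptions chosen to lie on Y = 200000 (X + 1); they are not the data points from the figure, and the function names are illustrative.

```python
def slope(x1, y1, x2, y2):
    """Slope of the line through two points: m = (y2 - y1) / (x2 - x1)."""
    if x2 == x1:
        raise ValueError("x1 and x2 must differ")
    return (y2 - y1) / (x2 - x1)


def intercept(m, x, y):
    """Intercept c of Y = m*X + c for a line with slope m passing through (x, y)."""
    return y - m * x


def predict(m, c, x):
    """Evaluate the best-fit line Y = m*X + c at x."""
    return m * x + c


# Assumed points (years of experience, salary per year) on Y = 200000 * (X + 1).
p1 = (1, 400000)
p2 = (3, 800000)

m = slope(*p1, *p2)            # (800000 - 400000) / (3 - 1)
c = intercept(m, *p1)          # 400000 - m * 1
salary_5 = predict(m, c, 5)    # predicted salary at 5 years of experience
```

With these points the sketch recovers m = 200000 and c = 200000, so the predicted salary at 5 years of experience is 200000 × (5 + 1) = 1200000.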
METHODOLOGY
Input data → Model design → Testing of trained model → Prediction of insurance cost
Tools Used in the Project
• Python 3: Python is an interpreted, high-level, general-purpose programming language.
Its language constructs and its object-oriented approach aim to help
programmers write clear, logical code for small and large-scale projects.
We collected the raw data set and uploaded the .csv file to the IDE. We then identified the
dependencies to import, that is, the libraries and functions needed: the libraries are numpy, pandas,
matplotlib and seaborn, and the scikit-learn utilities used are train_test_split and LinearRegression.
The collected data was then analysed and we plotted graphs of the analysed data set, for example the
distributions of age, gender and BMI.
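A minimal sketch of the loading and analysis step is given below. The column names (age, sex, bmi, charges) and the four rows are hypothetical stand-ins for the project's .csv file, which is not reproduced here; the rows are read from an in-memory string so that the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical rows standing in for the project's .csv file (not real data).
csv_text = """age,sex,bmi,charges
25,male,22.0,3000.0
40,female,30.5,8000.0
33,male,27.0,5500.0
58,female,35.0,14000.0
"""

# With a file on disk this would be pd.read_csv("<path to the .csv file>").
df = pd.read_csv(io.StringIO(csv_text))

# Numeric summary (count, mean, std, min, quartiles, max) of selected columns.
summary = df[["age", "bmi", "charges"]].describe()

# Distribution of a categorical column: number of rows per category.
sex_counts = df["sex"].value_counts()

mean_age = df["age"].mean()
```

The same columns can then be passed to a plotting library to draw the distribution graphs.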
I then carried out data pre-processing, which makes the raw data compatible with the machine learning
algorithm, and split the data into training and testing data. The training data was fed to the
machine learning model, which makes it a trained model, and its performance was evaluated using the
test data. The trained model now gives the estimated insurance cost as output based on the input
data.
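The split, train, evaluate and predict steps can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the project's code: the data is synthetic (a made-up linear rule plus noise), only two numeric features are used, and the 80/20 split and the random seeds are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data set (assumption): cost follows a made-up linear rule plus noise.
rng = np.random.default_rng(0)
n = 200
age = rng.integers(18, 65, size=n)
bmi = rng.uniform(18.0, 40.0, size=n)
cost = 250.0 * age + 300.0 * bmi + rng.normal(0.0, 100.0, size=n)

# Feature matrix with one row per person and one column per input parameter.
X = np.column_stack([age, bmi])

# Split into training data (80%) and testing data (20%).
X_train, X_test, y_train, y_test = train_test_split(
    X, cost, test_size=0.2, random_state=2
)

# Train the model on the training data only.
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out test data with the R^2 score.
r2_test = r2_score(y_test, model.predict(X_test))

# Estimate the cost for one new person (age 30, BMI 27.5).
estimate = model.predict(np.array([[30, 27.5]]))[0]
```

Because the synthetic target is nearly linear, the fitted coefficients land close to the values used to generate it and the test R^2 is close to 1; real insurance data would need the pre-processing step first and would not be expected to fit this well.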
Conclusion
This project uses the designed system to predict the medical insurance cost of
an individual from their input parameters.
The model gives high accuracy and is therefore suitable for adoption in the
health care and insurance sectors.
THANK YOU