ML ABA On ZOMATO Data Analysis


MACHINE LEARNING

SUBJECT CODE: 20CS61

ACTIVITY BASED ASSESSMENT

TOPIC: ZOMATO DATA ANALYSIS

SUBMITTED BY:

CHINMAYI H 4VV20CS024

FACULTY NAME:

DR. PARAMESHA K
INTRODUCTION

PROBLEM STATEMENT:
TO DETERMINE WHETHER OR NOT A PERSON WILL HAVE A STROKE
BASED ON HIS/HER HISTORICAL MEDICAL INFORMATION

IMPLEMENTATION

• We implemented this model in a Jupyter notebook.
• First, we import the libraries: numpy as np, pandas as pd, matplotlib.pyplot as plt, and seaborn as sns.
• Create a CSV file containing the training dataset, with the attributes id, gender, age, hypertension, heart_disease, ever_married, work_type, Residence_type, avg_glucose_level, bmi, and smoking_status; the target attribute is stroke.
• Preprocess the data.
• Fill in the missing values using the mean.
• Drop the unnecessary columns.
• Check for outliers.
• Find the distribution of stroke for each attribute.
• Split the data into independent and dependent variables.
• Apply logistic regression and evaluate it.
• Apply the KNN classifier and the decision tree algorithm.
• Evaluate the decision tree.
• Plot the tree.
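The modelling steps at the end of this list can be sketched roughly as follows. This is only an illustration, not the notebook's actual code: a small synthetic dataset stands in for the stroke CSV, and the feature names, the 80/20 split, `n_neighbors=5`, `max_depth=3`, and the use of accuracy as the evaluation metric are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the preprocessed stroke data (assumption):
# three numeric features and a binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
noise = rng.normal(scale=0.5, size=300)
y = (X[:, 0] + 0.5 * X[:, 1] + noise > 0).astype(int)

# Hold out 20% of the rows for evaluation (ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# The three classifiers named in the report; hyperparameters are assumptions.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

# Fit each model and record its accuracy on the held-out rows.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")

# Text rendering of the fitted tree; feature names are placeholders.
tree_text = export_text(
    models["decision_tree"],
    feature_names=["age", "avg_glucose_level", "bmi"],
)
print(tree_text)
```

`export_text` prints the tree as indented rules; `sklearn.tree.plot_tree` draws the same tree graphically when matplotlib is available. If the stroke classes are imbalanced, accuracy alone can look good for a model that rarely predicts a stroke, so precision and recall are worth checking as well.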

DATASET

DATA PREPROCESSING CODES


As we can see, BMI has 201 missing values, so we fill them with the mean.
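The preprocessing steps listed under IMPLEMENTATION (mean imputation, dropping unnecessary columns, checking for outliers) might look like the sketch below. It assumes the data is held in a pandas DataFrame called `df` (a hypothetical name), that `id` is the column being dropped, and that outliers are flagged with the 1.5 × IQR rule; the report states none of these. The rows are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the stroke dataset (values are illustrative only).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "age": [67, 61, 80, 49, 79, 81, 74],
    "bmi": [36.6, np.nan, 32.5, np.nan, 28.0, 30.1, 97.6],
})

# Count missing values per column before imputation.
missing_before = df.isnull().sum()
print(missing_before)

# Replace missing BMI values with the mean of the observed values.
df["bmi"] = df["bmi"].fillna(df["bmi"].mean())
missing_after = df.isnull().sum()

# Drop an identifier column that carries no predictive information
# (which columns the report dropped is an assumption).
df = df.drop(columns=["id"])

# Flag BMI outliers with the 1.5 * IQR rule (the rule is an assumption).
q1 = df["bmi"].quantile(0.25)
q3 = df["bmi"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["bmi"] < q1 - 1.5 * iqr) | (df["bmi"] > q3 + 1.5 * iqr)
n_outliers = int(is_outlier.sum())
print(f"BMI outliers: {n_outliers}")
```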
DATA ANALYSIS
SIMILARLY, WE APPLY THE DATA ANALYSIS TO EACH ATTRIBUTE AND FIND
THE DISTRIBUTION OF STROKE FOR EACH ONE.

From the data analysis, we find that:

• Based on 'Ever Married': unmarried people are less prone to strokes.
• Based on 'Work Type': private jobs have the largest number of people prone to stroke, followed by the self-employed and then government jobs.
• Based on 'Smoking': people who have never smoked will most likely not have a stroke.
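One way to obtain per-attribute distributions like these is to cross-tabulate each categorical column against `stroke`. The sketch below assumes a pandas DataFrame named `df` (hypothetical); the rows are invented for illustration and do not reproduce the report's figures.

```python
import pandas as pd

# Toy rows standing in for the stroke dataset (illustrative only).
df = pd.DataFrame({
    "ever_married": ["Yes", "Yes", "No", "Yes", "No", "No"],
    "smoking_status": [
        "smokes", "never smoked", "never smoked",
        "formerly smoked", "never smoked", "smokes",
    ],
    "stroke": [1, 0, 0, 1, 0, 0],
})

# Counts of stroke (1) and no-stroke (0) cases within each category.
counts = pd.crosstab(df["ever_married"], df["stroke"])
print(counts)

# Share of stroke cases within each category.
stroke_rate = df.groupby("ever_married")["stroke"].mean()
print(stroke_rate)
```

Comparing rates rather than raw counts guards against a large category looking most at risk simply because it contains the most rows.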
Encoding the categorical variables
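The report does not say which encoding was used. As one possibility, the sketch below assumes one-hot encoding with `pandas.get_dummies`; the DataFrame name `df` and its rows are made up for illustration.

```python
import pandas as pd

# Toy rows with two categorical columns (illustrative only).
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female"],
    "work_type": ["Private", "Self-employed", "Govt_job"],
    "age": [67, 61, 80],
    "stroke": [1, 0, 1],
})

# Replace each categorical column with one 0/1 column per category.
categorical_cols = ["gender", "work_type"]
encoded = pd.get_dummies(df, columns=categorical_cols, dtype=int)
print(encoded.columns.tolist())
```

Label encoding (mapping each category to an integer) is a common alternative, but it imposes an ordering on the categories that distance-based models such as KNN will treat as meaningful.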

Splitting data into independent and dependent variables
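A minimal sketch of this step, assuming a pandas DataFrame `df` (hypothetical name) whose target column is `stroke`. The further train/test split, its 75/25 ratio, and the stratification are assumptions not stated in the report; the rows are invented.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy rows standing in for the encoded stroke dataset (illustrative only).
df = pd.DataFrame({
    "age": [67, 61, 80, 49, 79, 81, 74, 69],
    "hypertension": [0, 0, 0, 0, 1, 0, 1, 0],
    "avg_glucose_level": [228.69, 202.21, 105.92, 171.23,
                          174.12, 186.21, 70.09, 94.39],
    "stroke": [1, 1, 1, 0, 0, 0, 1, 0],
})

X = df.drop(columns=["stroke"])  # independent variables (features)
y = df["stroke"]                 # dependent variable (target)

# Hold out part of the data for evaluation, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```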
