Exploratory Data Analysis - Python June 16th, 2019


#LifeKoKaroLift

Data Science Certification Program

16/06/2019
Course : Machine Learning
Lecture On : EDA
Instructor : BIKASH

23/05/19 2
Why is EDA so important?

Exploratory Data Analysis and data preprocessing can account for 80-90% of the time and resources in Machine Learning projects.

If EDA and data preprocessing are not done properly, guess what: the ML results will not be any good either.

Today’s Agenda
1 Familiarity with the problem
2 What should we explore?
3 Data to Intuition
4 Data Distribution
5 Handling Categorical vs Numerical Features
6 Handling missing values, outliers and duplicates
7 Feature Engineering
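
As a rough illustration of agenda items 4 to 6, a first pass over a dataset with pandas might look like the sketch below. The DataFrame, its column names, and its values are made up for this example and are not from the lecture; the 1.5 x IQR outlier rule is one common rule of thumb, not the only option.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 32, 29, 120],
    "city": ["Pune", "Delhi", "Delhi", None, "Delhi", "Mumbai", "Pune"],
    "income": [30000.0, 52000.0, None, 61000.0, 52000.0, 45000.0, 58000.0],
})

# Data distribution: summary statistics for the numerical columns.
summary = df.describe()

# Categorical vs numerical features: split the columns by dtype.
numerical_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Missing values: number of nulls in each column.
missing_counts = df.isna().sum()

# Duplicates: rows that exactly repeat an earlier row.
duplicate_count = int(df.duplicated().sum())

# Outliers: values further than 1.5 * IQR outside the quartiles.
q1, q3 = df["age"].quantile(0.25), df["age"].quantile(0.75)
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]

print(summary)
print(missing_counts)
print(f"duplicate rows: {duplicate_count}")
print(outliers)
```

Each of these checks only reports what is in the data; deciding whether to drop, impute, or cap a value is a separate step that depends on the problem.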

Poll 1

1. Can we take the raw data and feed it directly into a Machine Learning algorithm?

Option 1 - Yes
Option 2 - No

Poll 2

2. Can we simply drop any records that have null values?

Option 1 - Yes
Option 2 - No
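
To make the trade-off behind this poll concrete, the sketch below compares dropping rows with nulls against a simple imputation. The data and column names are hypothetical, and median/mode imputation is only one possible strategy, chosen here for brevity.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "age": [25.0, None, 47.0, 51.0, 38.0],
    "city": ["Pune", "Delhi", None, "Delhi", "Mumbai"],
    "income": [30000.0, 52000.0, 48000.0, None, 45000.0],
})

# Fraction of missing values in each column.
missing_fraction = df.isna().mean()

# Option A: drop every row that contains at least one null.
dropped = df.dropna()

# Option B: impute instead, using the median for numerical columns
# and the most frequent value (mode) for the categorical column.
imputed = df.copy()
for col in ["age", "income"]:
    imputed[col] = imputed[col].fillna(imputed[col].median())
imputed["city"] = imputed["city"].fillna(imputed["city"].mode().iloc[0])

print(missing_fraction)
print(f"rows kept after dropna: {len(dropped)} of {len(df)}")
print(f"rows kept after imputation: {len(imputed)} of {len(df)}")
```

In this toy case each column is only 20% missing, yet dropping rows discards three of the five records, because the nulls fall in different rows.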

Poll 3

3. Why do we need to convert Categorical features into Numerical?

Option 1 - Most ML algorithms only accept numerical values
Option 2 - There is no requirement to convert them
Option 3 - If we convert them, processing time will be lower
Option 4 - It’s an optional step
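
For reference, two common ways of turning categorical features into numbers are sketched below with pandas: one-hot encoding for categories without a natural order, and an integer mapping for ordered categories. The columns, values, and the `size_order` mapping are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Mumbai", "Delhi"],
    "size": ["small", "large", "medium", "small"],
})

# Nominal feature (no natural order): one-hot encode into 0/1 indicator columns.
one_hot = pd.get_dummies(df["city"], prefix="city", dtype=int)

# Ordinal feature (natural order): map each category to an ordered integer.
size_order = {"small": 0, "medium": 1, "large": 2}
size_encoded = df["size"].map(size_order).rename("size_level")

# Combine both encodings into one all-numerical table.
encoded = pd.concat([one_hot, size_encoded], axis=1)
print(encoded)
```

One-hot encoding avoids implying an order between cities, while the integer mapping keeps the order that the size labels already have.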

Poll 4

4. Why is feature engineering important?

Option 1 - It is, again, an optional step
Option 2 - Feature engineering helps achieve zero error on both training and test data
Option 3 - It helps in better understanding the underlying data patterns and the importance of the variables
Option 4 - Feature engineering is done when data volume is low

Further explanation

ANOVA Method by hand -

Link 1 - https://www.youtube.com/watch?v=fNsGo_HM8bA
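
The hand calculation for a one-way ANOVA can also be written out step by step in plain Python. This is a minimal sketch of the standard F statistic (between-group mean square divided by within-group mean square); the function name `one_way_anova` and the sample numbers are made up here and are not taken from the linked video.

```python
def one_way_anova(groups):
    """Return (F statistic, df_between, df_within) for a list of samples."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)

    # Mean of all observations and mean of each group.
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova(groups)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

For these numbers the between-group sum of squares is 26 and the within-group sum of squares is 6, so the mean squares are 13 and 1. The resulting F value would then be compared against an F distribution with the printed degrees of freedom to obtain a p-value.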

Facebook Group:

Search - Data Science with Python
