Import As Import As: "Data - Cleaning - CSV"

The document discusses different techniques for handling missing data in a dataset: 1. Ignoring rows with a small number of missing values. 2. Filling in missing values manually. 3. Using a global constant like 0 to replace missing values. 4. Taking the mean of existing values in a column to fill in missing values of integer/continuous variables. 5. Using the most frequent existing value in a column to fill in missing categorical variables. The techniques of taking the mean, median or mode are generally better approaches than using a constant when the amount of missing data is large.

Original Title

Lab 4

Uploaded by Anjana Maganti

AP19110010012_Lab Assignment 4 - Jupyter Notebook 05/09/21, 4:11 PM

In [25]: import pandas as pd
         import numpy as np
         df = pd.read_csv("data_cleaning.csv")
         df.head()

Out[25]:
     Make Colour  Odometer (KM)  Doors    Price
0   Honda  White        35431.0    4.0  15323.0
1     BMW   Blue            NaN    5.0  19943.0
2   Honda  White        84714.0    4.0  28343.0
3  Toyota  White       154365.0    NaN  13434.0
4  Nissan   Blue       181577.0    3.0  14043.0

Checking which columns contain missing values, and how many

In [26]: df.isna().sum()

Out[26]: Make 1
Colour 1
Odometer (KM) 4
Doors 1
Price 2
dtype: int64
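The count above can be complemented with the share of missing values per column and the number of affected rows. The following is a minimal sketch; since `data_cleaning.csv` is not included here, `sample` is a hypothetical stand-in built from the five rows shown by `df.head()`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_cleaning.csv (only the five rows shown above).
sample = pd.DataFrame({
    "Make": ["Honda", "BMW", "Honda", "Toyota", "Nissan"],
    "Colour": ["White", "Blue", "White", "White", "Blue"],
    "Odometer (KM)": [35431.0, np.nan, 84714.0, 154365.0, 181577.0],
    "Doors": [4.0, 5.0, 4.0, np.nan, 3.0],
    "Price": [15323.0, 19943.0, 28343.0, 13434.0, 14043.0],
})

missing_count = sample.isna().sum()               # missing values per column
missing_share = sample.isna().mean() * 100        # same, as a percentage of rows
rows_with_gaps = sample.isna().any(axis=1).sum()  # rows with at least one NaN

summary = pd.DataFrame({"missing": missing_count, "percent": missing_share})
print(summary)
print("rows with at least one missing value:", rows_with_gaps)
```

Looking at the percentage and the affected row count together helps decide between dropping rows and filling values.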

Data Preprocessing

1. Ignoring (dropping) rows when only a few values are missing.

If, as in the data frame above, only the 2nd row had missing data, we could simply
drop that row. But this dataset currently has a large number of missing values, so this
method is not viable.
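For reference, dropping rows could look like the sketch below. It uses `DataFrame.dropna` on a hypothetical `sample` frame that mirrors the five rows shown earlier; the `thresh` and `subset` choices are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_cleaning.csv (only the five rows shown above).
sample = pd.DataFrame({
    "Make": ["Honda", "BMW", "Honda", "Toyota", "Nissan"],
    "Colour": ["White", "Blue", "White", "White", "Blue"],
    "Odometer (KM)": [35431.0, np.nan, 84714.0, 154365.0, 181577.0],
    "Doors": [4.0, 5.0, 4.0, np.nan, 3.0],
    "Price": [15323.0, 19943.0, 28343.0, 13434.0, 14043.0],
})

# Drop every row that has at least one missing value.
dropped_any = sample.dropna()

# Keep rows with at least 4 non-missing values (tolerates one gap per row).
dropped_thresh = sample.dropna(thresh=4)

# Drop a row only when 'Odometer (KM)' itself is missing.
dropped_subset = sample.dropna(subset=["Odometer (KM)"])

print(len(sample), len(dropped_any), len(dropped_thresh), len(dropped_subset))
```

Each call returns a new frame and leaves `sample` unchanged, so the amount of data lost can be compared before committing to one option.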

2. Filling in the missing data manually

In [46]: df.loc[1,['Odometer (KM)']] = 100000

In [47]: df.loc[1,['Odometer (KM)']]

Out[47]: Odometer (KM)    100000
         Name: 1, dtype: object

This method is not effective when a large number of values are missing.
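When several known corrections are available, they can be collected in one place and applied in a loop. This is only a sketch: the `sample` frame and the replacement values in `corrections` are hypothetical and chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_cleaning.csv (numeric columns of the five rows shown above).
sample = pd.DataFrame({
    "Make": ["Honda", "BMW", "Honda", "Toyota", "Nissan"],
    "Odometer (KM)": [35431.0, np.nan, 84714.0, 154365.0, 181577.0],
    "Doors": [4.0, 5.0, 4.0, np.nan, 3.0],
})

# (row label, column name) -> value to write; the values are assumed, not real data.
corrections = {
    (1, "Odometer (KM)"): 100000.0,
    (3, "Doors"): 4.0,
}

for (row, col), value in corrections.items():
    sample.at[row, col] = value  # .at sets a single cell by label

print(sample)
```

Keeping the corrections in a dictionary makes the manual edits easy to review, which matters because each value is a human judgment rather than something derived from the data.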


3. Using a global constant to replace the missing values.

In [48]: df = pd.read_csv('data_cleaning.csv')
df.head()

Out[48]:
     Make Colour  Odometer (KM)  Doors    Price
0   Honda  White        35431.0    4.0  15323.0
1     BMW   Blue            NaN    5.0  19943.0
2   Honda  White        84714.0    4.0  28343.0
3  Toyota  White       154365.0    NaN  13434.0
4  Nissan   Blue       181577.0    3.0  14043.0

In [49]: df.fillna(0.0).head()

Out[49]:
     Make Colour  Odometer (KM)  Doors    Price
0   Honda  White        35431.0    4.0  15323.0
1     BMW   Blue            0.0    5.0  19943.0
2   Honda  White        84714.0    4.0  28343.0
3  Toyota  White       154365.0    0.0  13434.0
4  Nissan   Blue       181577.0    3.0  14043.0

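Here, we replaced the missing values with the constant 0. Note that df.fillna(0.0) applies to every column, so missing entries in the text columns Make and Colour also become 0.0.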
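A single constant rarely suits every column. One possible refinement is to pass `fillna` a dictionary with one placeholder per column. The sketch below assumes a hypothetical `sample` frame (the missing `Make` in row 4 is invented for illustration) and assumed placeholder values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the missing 'Make' in row 4 is invented for illustration.
sample = pd.DataFrame({
    "Make": ["Honda", "BMW", "Honda", "Toyota", None],
    "Odometer (KM)": [35431.0, np.nan, 84714.0, 154365.0, 181577.0],
    "Doors": [4.0, 5.0, 4.0, np.nan, 3.0],
})

# One placeholder per column: a label for text, 0 for numbers (assumed choices).
constants = {"Make": "Unknown", "Odometer (KM)": 0.0, "Doors": 0.0}

# fillna returns a new frame; 'sample' itself keeps its NaN values.
filled = sample.fillna(value=constants)
print(filled)
```

A constant such as 0 still distorts statistics like the column mean, so this is mainly useful as an explicit "missing" marker.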

4. Using the mean to fill the missing values.

In [50]: df = pd.read_csv('data_cleaning.csv')
df.head()

Out[50]:
     Make Colour  Odometer (KM)  Doors    Price
0   Honda  White        35431.0    4.0  15323.0
1     BMW   Blue            NaN    5.0  19943.0
2   Honda  White        84714.0    4.0  28343.0
3  Toyota  White       154365.0    NaN  13434.0
4  Nissan   Blue       181577.0    3.0  14043.0

Using the mean to fill missing values in numeric columns


In [52]: df.columns[0:2]

Out[52]: Index(['Make', 'Colour'], dtype='object')

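In [53]: for i in df.columns[2:4]:  # 'Odometer (KM)' and 'Doors'
             # Fill with the truncated column mean; assign back rather than using inplace=True
             df[i] = df[i].fillna(int(df[i].mean()))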

In [54]: df.head()

Out[54]:
     Make Colour  Odometer (KM)  Doors    Price
0   Honda  White        35431.0    4.0  15323.0
1     BMW   Blue       112890.0    5.0  19943.0
2   Honda  White        84714.0    4.0  28343.0
3  Toyota  White       154365.0    4.0  13434.0
4  Nissan   Blue       181577.0    3.0  14043.0
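The loop above picks columns by position and truncates the mean with `int()`. An alternative sketch is shown below: it selects numeric columns by dtype and fills them in one step, with the median as a variant that is less sensitive to extreme values. The `sample` frame is a hypothetical stand-in for the five rows shown earlier, so its means differ from those of the full dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_cleaning.csv (only the five rows shown above).
sample = pd.DataFrame({
    "Make": ["Honda", "BMW", "Honda", "Toyota", "Nissan"],
    "Odometer (KM)": [35431.0, np.nan, 84714.0, 154365.0, 181577.0],
    "Doors": [4.0, 5.0, 4.0, np.nan, 3.0],
    "Price": [15323.0, 19943.0, 28343.0, 13434.0, 14043.0],
})

# Select numeric columns by dtype instead of by position.
numeric_cols = sample.select_dtypes(include="number").columns

# Fill each numeric column with its own mean (NaN is skipped when averaging).
mean_filled = sample.copy()
mean_filled[numeric_cols] = sample[numeric_cols].fillna(sample[numeric_cols].mean())

# Same idea with the median.
median_filled = sample.copy()
median_filled[numeric_cols] = sample[numeric_cols].fillna(sample[numeric_cols].median())

print(mean_filled)
print(median_filled)
```

Selecting by dtype also covers `Price`, which the positional slice `df.columns[2:4]` leaves out.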

5. Using the most frequent value to fill the missing values.

In [55]: df = pd.read_csv('data_cleaning.csv')
df.head()

Out[55]:
     Make Colour  Odometer (KM)  Doors    Price
0   Honda  White        35431.0    4.0  15323.0
1     BMW   Blue            NaN    5.0  19943.0
2   Honda  White        84714.0    4.0  28343.0
3  Toyota  White       154365.0    NaN  13434.0
4  Nissan   Blue       181577.0    3.0  14043.0

Using the most frequent value from each column


In [56]: df.fillna(df.mode().iloc[0])

Out[56]:
      Make Colour  Odometer (KM)  Doors    Price
0    Honda  White        35431.0    4.0  15323.0
1      BMW   Blue        17119.0    5.0  19943.0
2    Honda  White        84714.0    4.0  28343.0
3   Toyota  White       154365.0    4.0  13434.0
4   Nissan   Blue       181577.0    3.0  14043.0
5    Honda    Red        42652.0    4.0  23883.0
6   Toyota   Blue       163453.0    4.0   8473.0
7    Honda  White        17119.0    4.0  20306.0
8    Honda  White       130538.0    4.0   9374.0
9    Honda   Blue        51029.0    4.0  26683.0
10  Nissan  White       167421.0    4.0   6010.0
11  Nissan  Green        17119.0    4.0   6160.0
12  Nissan  White       102303.0    4.0  16909.0
13     BMW  White       134181.0    4.0  11121.0
14   Honda   Blue       199833.0    4.0  18946.0
15  Toyota   Blue        17119.0    4.0  16290.0
16  Toyota    Red        96742.0    4.0  34465.0
17     BMW  White       194189.0    5.0  17177.0
18  Nissan  White        67991.0    3.0   6010.0
19  Nissan   Blue        17119.0    4.0   6010.0
20  Toyota  Green       124844.0    4.0  24130.0
21   Honda  White        30615.0    4.0  29653.0
22  Toyota  White       148744.0    4.0  22489.0
23   Honda  Green       130075.0    4.0  21242.0

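The missing values have been replaced by the most frequent value in each column. This approach suits categorical columns such as Make and Colour best; in a continuous column such as Odometer (KM), values seldom repeat, so the most frequent value is a weak estimate.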
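One way to combine the techniques is to use the mode only for non-numeric columns and a numeric statistic for the rest. The sketch below is an assumption about how that could be done, not the notebook's method; the `sample` frame is hypothetical, and its missing `Make` and `Colour` entries are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the missing 'Make' and 'Colour' entries are invented.
sample = pd.DataFrame({
    "Make": ["Honda", "BMW", "Honda", "Toyota", None],
    "Colour": ["White", None, "White", "White", "Blue"],
    "Odometer (KM)": [35431.0, np.nan, 84714.0, 154365.0, 181577.0],
    "Doors": [4.0, 5.0, 4.0, np.nan, 3.0],
})

numeric_cols = sample.select_dtypes(include="number").columns
categorical_cols = sample.select_dtypes(exclude="number").columns

filled = sample.copy()

# mode() can return several rows when values tie; iloc[0] takes the first one.
most_frequent = sample[categorical_cols].mode().iloc[0]
filled[categorical_cols] = sample[categorical_cols].fillna(most_frequent)

# Numeric columns get the median instead of the mode.
filled[numeric_cols] = sample[numeric_cols].fillna(sample[numeric_cols].median())

print(filled)
```

Splitting by dtype keeps the fill rule matched to the kind of variable: the most frequent label for categories, a central value for measurements.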

