Import As Import As: "Data - Cleaning - CSV"
Out[25]:
Make Colour Odometer (KM) Doors Price
In [26]: df.isna().sum()
Out[26]: Make 1
Colour 1
Odometer (KM) 4
Doors 1
Price 2
dtype: int64
Data Preprocessing
If only a single row, such as the 2nd row of the data frame above, had missing data, we
could simply drop that row. But the data currently has a large number of missing values,
so dropping rows is not a viable method here.
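To make the cost of dropping rows concrete, here is a minimal sketch. It assumes a small made-up frame named `toy` whose values are hypothetical and are not the contents of data_cleaning.csv. It only illustrates how `DataFrame.dropna()` discards every row that has at least one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only (not data_cleaning.csv).
toy = pd.DataFrame({
    "Make": ["Toyota", None, "Honda", "BMW"],
    "Odometer (KM)": [150043.0, 87899.0, np.nan, 11179.0],
    "Price": [4000.0, 5000.0, 7000.0, np.nan],
})

rows_before = len(toy)
dropped = toy.dropna()  # drops every row containing at least one missing value
rows_after = len(dropped)

# Fraction of rows lost by dropping incomplete rows.
lost_fraction = 1 - rows_after / rows_before
print(f"kept {rows_after} of {rows_before} rows ({lost_fraction:.0%} lost)")
```

When the missing values are spread across many rows, as in this toy frame, most of the data disappears, which is why filling the gaps is preferred over dropping rows.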
http://localhost:8888/notebooks/AP19110010012_Lab%20Assignment%204.ipynb Page 1 of 5
AP19110010012_Lab Assignment 4 - Jupyter Notebook 05/09/21, 4:11 PM
In [48]: df = pd.read_csv('data_cleaning.csv')
df.head()
Out[48]:
Make Colour Odometer (KM) Doors Price
In [49]: df.fillna(0.0).head()
Out[49]:
Make Colour Odometer (KM) Doors Price
In [50]: df = pd.read_csv('data_cleaning.csv')
df.head()
Out[50]:
Make Colour Odometer (KM) Doors Price
In [52]: df.columns[0:2]
In [54]: df.head()
Out[54]:
Make Colour Odometer (KM) Doors Price
In [55]: df = pd.read_csv('data_cleaning.csv')
df.head()
Out[55]:
Make Colour Odometer (KM) Doors Price
In [56]: df.fillna(df.mode().iloc[0])
Out[56]:
Make Colour Odometer (KM) Doors Price
The missing values have been replaced by the most frequent value of each column.
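As a small sketch of how this most-frequent-value fill behaves, the example below uses a hypothetical frame named `toy` (made-up values, not the lab's CSV). `DataFrame.mode()` can return several rows when a column has ties, so `.iloc[0]` takes the first mode of each column. Note also that `fillna` returns a new frame and leaves the original unchanged unless the result is assigned.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only.
toy = pd.DataFrame({
    "Colour": ["White", "Blue", None, "White"],
    "Doors": [4.0, 4.0, 3.0, np.nan],
})

most_frequent = toy.mode().iloc[0]  # first mode of each column
filled = toy.fillna(most_frequent)  # new DataFrame; toy itself is not modified

print(filled)
print("missing before:", int(toy.isna().sum().sum()))
print("missing after: ", int(filled.isna().sum().sum()))
```

To keep the imputed values for later cells, the result would need to be assigned, for example `df = df.fillna(df.mode().iloc[0])`.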