Que.37210q4ckcDMPM Lab 03 -Data Cleaning and Preprocessing (1)

Uploaded by

maithili9100

0% found this document useful (0 votes)

1 views1 page

Copyright

Available Formats

DOCX, PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as docx, pdf, or txt

0% found this document useful (0 votes)

1 views1 page

Que.37210q4ckcDMPM Lab 03 -Data Cleaning and Preprocessing (1)

Uploaded by

maithili9100

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as docx, pdf, or txt

Jump to Page

You are on page 1of 1

Search inside document

Data Mining & Predictive Modelling

Lab Assign-03- Data Pre-processing & Data Cleaning Functions

●
Read the datasets "ToyotaCorolla.csv" and "ToyotaCorolla-Dirty.csv" as per the
requirements.
●
Test following data pre-processing and data cleaning functions
●
You need to install following packages and use following libraries

 install.packages("tidyverse") install.packages("Hmisc")

 library(tidyverse) library(dplyr) library(ggplot2) library(Hmisc)

 Save your R command file.

 Save your code and results in a word file and submit it on the VOLP.

 FUNCTIONS

 rename() - for rename columns

 select() - for selecting columns
 slice() - for selecting rows
 sample_n() -- for sampling rows
 arrange() -- for arranging rows by column values
 filter() -- for filtering rows
 mutate() -- for transforming values in a field
 is.na() --- find missing data
 sum(is.na()) -- find number of missing values
 mean(is.na()) -- find % of missing values
 na.omit() -- deleting incomplete observations
 mean(<column>, na.rm = TRUE) -- find mean leaving records with missing data
 complete.cases() --- to detect rows with complete data
 clean=na.omit() --- to remove incomplete rows or data
 glimpse(data) -- summarise dataset
 boxplot.stats() - for detecting outlier stats
 boxplot() - for comparing boxplots
 boxplot.stats()$out -- for setting outlier detection limit

Numerical imputation
 dirty$column[is.na(column)] <- mean(column, na.rm = TRUE) --- mean imputation
 dirty$column <- impute(dirty$column, fun = mean) # mean imputation
 dirty$column <- impute(dirty$column, fun = median) # median imputation

You are required to pre-process 1 more data-set of your choice

** ** **

Lab-2 Data Cleaning and Preprocessing
Document1 page
Lab-2 Data Cleaning and Preprocessing
moumitashopping0
No ratings yet
Aditya Garg DMDW
Document40 pages
Aditya Garg DMDW
Raj Nish
No ratings yet
Introduction To Python (Part III)
Document29 pages
Introduction To Python (Part III)
Subhradeep Pal
No ratings yet
Lab Manual Page No 1
Document32 pages
Lab Manual Page No 1
R.R.Rao
No ratings yet
Datasets
Document40 pages
Datasets
Asmatullah Khan
No ratings yet
Lecture 1
Document167 pages
Lecture 1
Ny Sata Andrianirina
No ratings yet
CRM Cheat Sheet
Document7 pages
CRM Cheat Sheet
Kurozato Candy
No ratings yet
Uipath Guide PDF
Document6 pages
Uipath Guide PDF
Ibrahim Syed
No ratings yet
Intro R
Document38 pages
Intro R
bhyjed35
No ratings yet
Introduction To R
Document36 pages
Introduction To R
Refael Lav
No ratings yet
10-Visualization of Streaming Data and Class R Code-10!03!2023
Document19 pages
10-Visualization of Streaming Data and Class R Code-10!03!2023
G Krishna Vamsi
No ratings yet
MCP Lab-2023 ContentForPythonLibrariesTopic
Document9 pages
MCP Lab-2023 ContentForPythonLibrariesTopic
Nihad Ahmed
No ratings yet
Data Handing Using Pandas-I
Document46 pages
Data Handing Using Pandas-I
Priyanka Porwal Jain
100% (2)
Java8features Streamapi
Document13 pages
Java8features Streamapi
mohammed.rana
No ratings yet
Quandl - R Cheat Sheet
Document4 pages
Quandl - R Cheat Sheet
seanrwcrawford
No ratings yet
List of Codes
Document2 pages
List of Codes
kalsi2rav
No ratings yet
Class XII Data Handlinng Using PandasI
Document46 pages
Class XII Data Handlinng Using PandasI
Pushkar Chaudhary
No ratings yet
Numpy&pandas
Document17 pages
Numpy&pandas
Saif Ali Khan
No ratings yet
Data Handling Using Pandas-I-ORG
Document44 pages
Data Handling Using Pandas-I-ORG
Ayush Kumar
No ratings yet
Functions in R Sem-III 2021 PDF
Document30 pages
Functions in R Sem-III 2021 PDF
rajveer shah
No ratings yet
12 Pandas
Document9 pages
12 Pandas
Sanchit Singla
No ratings yet
Python Data Analysation and Visualisation
Document4 pages
Python Data Analysation and Visualisation
Jagmohan
No ratings yet
R Programming Cheat Sheet: by Via
Document2 pages
R Programming Cheat Sheet: by Via
Kimondo King
No ratings yet
Lab 2 - Lists, Vectors, Dataframes
Document16 pages
Lab 2 - Lists, Vectors, Dataframes
MANYAPURI POORNA CHANDER VU21MATH0400045
No ratings yet
Commands SQL, Python (BASICS)
Document7 pages
Commands SQL, Python (BASICS)
Kuldeep Gangwar
No ratings yet
Chapter 3 Data Management in R
Document12 pages
Chapter 3 Data Management in R
nailofar
No ratings yet
CAMERON, C. e TRIVEDI, P.K. Microeconometrics Using Stata. Cambridge: CUP, 2010
Document3 pages
CAMERON, C. e TRIVEDI, P.K. Microeconometrics Using Stata. Cambridge: CUP, 2010
ari araujo
No ratings yet
Introduction To Data Science With R Programming
Document91 pages
Introduction To Data Science With R Programming
Vimal Kumar
No ratings yet
Data Science
Document60 pages
Data Science
Arya
100% (1)
Java - Streams
Document10 pages
Java - Streams
bikashghoshh41
No ratings yet
Introduction To Numpy Pandas and Matplotlib
Document2 pages
Introduction To Numpy Pandas and Matplotlib
rk73462002
No ratings yet
PL SQL Questions
Document193 pages
PL SQL Questions
Raja Mohan
100% (1)
What Is A Data Structure?: Data Structures in Data Science
Document24 pages
What Is A Data Structure?: Data Structures in Data Science
Meghna Choudhary
No ratings yet
Content From Jose Portilla's Udemy Course Learning Python For Data Analysis and Visualization Notes by Michael Brothers, Available On
Document13 pages
Content From Jose Portilla's Udemy Course Learning Python For Data Analysis and Visualization Notes by Michael Brothers, Available On
sam egoroff
No ratings yet
Rstudio Study Notes For PA 20181126
Document6 pages
Rstudio Study Notes For PA 20181126
Trong Nghia Vu
No ratings yet
Exercise and Experiment 3
Document14 pages
Exercise and Experiment 3
h8792670
No ratings yet
SQL
Document111 pages
SQL
shamanth br
100% (1)
3 Data-Manipulation
Document47 pages
3 Data-Manipulation
Pratyush Jain
No ratings yet
R Syntax Examples 1
Document6 pages
R Syntax Examples 1
Pedro Cruz
No ratings yet
Python For DS Cheat Sheet
Document6 pages
Python For DS Cheat Sheet
Sebastián Emdef
100% (2)
Pandas in Python
Document59 pages
Pandas in Python
Aamna Raza
No ratings yet
DataFrame Statistics
Document41 pages
DataFrame Statistics
rohan jha
No ratings yet
#Create Vector of Numeric Values #Display Class of Vector
Document10 pages
#Create Vector of Numeric Values #Display Class of Vector
Anooj Srivastava
No ratings yet
CH 2
Document7 pages
CH 2
Rashi Mehta
No ratings yet
Python Pandas Demo PDF
Document23 pages
Python Pandas Demo PDF
Rakshit Kukreja
100% (2)
Data Ingestion Using Unifi
Document30 pages
Data Ingestion Using Unifi
Kartik Sharma
No ratings yet
An Introduction To R: Alessio Farcomeni
Document38 pages
An Introduction To R: Alessio Farcomeni
Valerio Zarrelli
No ratings yet
Miller
Document9 pages
Miller
Loku Bappa
No ratings yet
Imp Details
Document6 pages
Imp Details
Jyotirmay Sahu
No ratings yet
Statistics With R Programming For Bigdata (Autosaved)
Document41 pages
Statistics With R Programming For Bigdata (Autosaved)
rohithmahendran1305
No ratings yet
Data Wrangling PDF
Document14 pages
Data Wrangling PDF
Edgar Guerrero
No ratings yet
Exploratory Data Analysis
Document5 pages
Exploratory Data Analysis
Cyd Duque
No ratings yet
R Programming For NGS Data Analysis
Document5 pages
R Programming For NGS Data Analysis
Abcd
No ratings yet
R Programming - Lecture3
Document30 pages
R Programming - Lecture3
Azuyi Xr
No ratings yet
RSQLML Final Slide 15 June 2019 PDF
Document196 pages
RSQLML Final Slide 15 June 2019 PDF
Thanthirat Thanwornwong
No ratings yet
Combining Spatial Data in R
Document6 pages
Combining Spatial Data in R
bord02
No ratings yet
Daftar Lampiran: Music Signal Analysis
Document7 pages
Daftar Lampiran: Music Signal Analysis
jeremi kucing
No ratings yet
R Lab Program
Document21 pages
R Lab Program
Sachin Shimogha
No ratings yet
The Essential R Reference
From Everand
The Essential R Reference
Mark Gardener
No ratings yet
Introduction to PHP, Part 2, Second Edition
From Everand
Introduction to PHP, Part 2, Second Edition
Adam Majczak
No ratings yet