Cleaning and Preparing Data
TE AIML (Hon)
Athang Joshi

Athang Joshi 1
What is data cleaning?

• The process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset.
• Combining multiple data sources creates many opportunities for data to be duplicated or mislabeled.
• If data is incorrect, outcomes and algorithms are unreliable, even though they may look correct.
• The exact data cleaning process varies from dataset to dataset.
• A template should therefore be established for the data cleaning process so that it is carried out the same way each time.
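One way to make such a template concrete is to store the cleaning steps as an ordered list and apply them the same way to every dataset. The sketch below assumes records are plain dictionaries; the two steps (`drop_empty_rows`, `strip_whitespace`) are hypothetical placeholders for a project's own steps.

```python
from typing import Any, Callable, Dict, List, Tuple

# Assumed representation: a record maps column names to values, and a
# cleaning step is any function from a list of records to a new list.
Record = Dict[str, Any]
Step = Callable[[List[Record]], List[Record]]


def run_template(records: List[Record], steps: List[Tuple[str, Step]]) -> List[Record]:
    """Apply the named steps in a fixed order and log row counts after each."""
    data = list(records)  # shallow copy so the caller's list is untouched
    for name, step in steps:
        before = len(data)
        data = step(data)
        print(f"{name}: {before} -> {len(data)} rows")
    return data


# Two illustrative steps (hypothetical; a real template lists the project's own steps).
def drop_empty_rows(rows: List[Record]) -> List[Record]:
    return [r for r in rows if any(v is not None for v in r.values())]


def strip_whitespace(rows: List[Record]) -> List[Record]:
    return [{k: v.strip() if isinstance(v, str) else v for k, v in r.items()} for r in rows]


TEMPLATE: List[Tuple[str, Step]] = [
    ("drop empty rows", drop_empty_rows),
    ("strip whitespace", strip_whitespace),
]

raw = [{"name": " Asha ", "city": "Pune"}, {"name": None, "city": None}]
clean = run_template(raw, TEMPLATE)
print(clean)  # [{'name': 'Asha', 'city': 'Pune'}]
```

Keeping the steps as data rather than as ad hoc code means the order is explicit, and the same template can be rerun on the next batch with a log of how many rows each step touched.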

Steps to clean data

• Remove duplicate or irrelevant observations
• Fix structural errors
• Filter unwanted outliers
• Handle missing data
• Validate and QA
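The first step, removing duplicate or irrelevant observations, can be sketched as two small filters. This assumes records are dictionaries, that duplicates are identified by a chosen key (here a hypothetical `order_id`), and that relevance is an analysis-specific predicate (here, an assumed scope of orders from India only).

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def remove_duplicates(rows: List[Record], key_fields: Iterable[str]) -> List[Record]:
    """Keep the first row for each combination of key_fields (values must be hashable)."""
    fields = tuple(key_fields)
    seen = set()
    kept: List[Record] = []
    for row in rows:
        key = tuple(row.get(f) for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def remove_irrelevant(rows: List[Record], is_relevant: Callable[[Record], bool]) -> List[Record]:
    """Drop rows that fall outside the scope of the analysis."""
    return [row for row in rows if is_relevant(row)]


orders = [
    {"order_id": 1, "country": "IN", "amount": 250},
    {"order_id": 1, "country": "IN", "amount": 250},  # same order arriving from a second source
    {"order_id": 2, "country": "US", "amount": 90},
    {"order_id": 3, "country": "IN", "amount": 120},
]

deduped = remove_duplicates(orders, ["order_id"])
# Assumed scope for the example: only orders from India are relevant.
relevant = remove_irrelevant(deduped, lambda r: r["country"] == "IN")
print([r["order_id"] for r in relevant])  # [1, 3]
```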

Pratibha Sharma 4
Filter unwanted outliers

• Outliers are observations that do not appear to fit with the rest of the data.
• An outlier should be removed only if there is a legitimate reason to do so.
• Sometimes the appearance of an outlier is exactly what supports a theory.
• The existence of an outlier does not mean that it is incorrect.
• Only if an outlier proves to be irrelevant to the analysis, or is a mistake, should it be removed.
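Because outliers should not be deleted automatically, a safer pattern is to flag candidates and remove only those confirmed as mistakes after review. The sketch below uses Tukey's interquartile-range (IQR) fences as the flagging rule; that rule, the factor `k = 1.5`, and the example readings are assumptions, not something the slides prescribe.

```python
import statistics
from typing import List, Sequence, Tuple


def iqr_fences(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Return (low, high) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values: Sequence[float], k: float = 1.5) -> List[int]:
    """Return the indices of values outside the fences; nothing is deleted here."""
    low, high = iqr_fences(values, k)
    return [i for i, v in enumerate(values) if v < low or v > high]


readings = [12, 14, 13, 15, 14, 13, 98, 14]
flagged = flag_outliers(readings)
print(flagged)  # [6]

# Only after review: suppose index 6 is confirmed as a data-entry mistake (hypothetical).
confirmed_mistakes = {6}
cleaned = [v for i, v in enumerate(readings) if i not in confirmed_mistakes]
```

Separating flagging from removal keeps a human decision between the statistic and the deletion, which matches the rule that an outlier is not incorrect merely because it exists.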

Fix structural errors

• Structural errors are strange naming conventions, typos, or incorrect capitalization in the data.
• For example, “N/A” and “Not Applicable” may both appear, but they should be analyzed as the same category.
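A minimal sketch of fixing such structural errors: trim stray whitespace, compare case-insensitively, and map known variants to one canonical label. The `CANONICAL` lookup table and the choice of title case for unmapped values are hypothetical; each column needs its own agreed conventions.

```python
from collections import Counter
from typing import Dict

# Hypothetical lookup of known spellings -> one canonical label (keys are lowercase).
CANONICAL: Dict[str, str] = {
    "n/a": "Not Applicable",
    "na": "Not Applicable",
    "not applicable": "Not Applicable",
}


def normalize_label(value: str) -> str:
    """Trim and collapse whitespace, map known variants, else apply title case."""
    cleaned = " ".join(value.split())
    mapped = CANONICAL.get(cleaned.lower())
    if mapped is not None:
        return mapped
    # Assumption: title case is the agreed style for this column (e.g. city names).
    return cleaned.title()


raw = ["N/A", "Not Applicable", " not  applicable", "mumbai", "MUMBAI ", "Pune"]
labels = [normalize_label(v) for v in raw]
print(Counter(labels))
# Counter({'Not Applicable': 3, 'Mumbai': 2, 'Pune': 1})
```

Counting the labels before and after normalization is a quick way to see whether categories that should be one are still split.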

Handle missing data

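• Many algorithms do not accept missing values.
• Observations that have missing values can be dropped, but this should be done very carefully because the rest of the information in those observations is lost as well.
• Missing values can be imputed based on other observations, but the filled-in values are assumptions rather than measurements, so there is a risk of losing the integrity of the data.
• Alternatively, the way the data is used can be changed so that null values are handled explicitly instead of being dropped or filled.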
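The first two options can be sketched side by side. Median imputation is one assumed choice of fill value, not the only one, and the extra `<field>_imputed` flag is a hypothetical convention that keeps filled values distinguishable from observed ones, which limits the loss of integrity mentioned above.

```python
import statistics
from typing import Any, Dict, List

Record = Dict[str, Any]


def drop_missing(rows: List[Record], field: str) -> List[Record]:
    """Option 1: drop every observation whose `field` is missing (None)."""
    return [r for r in rows if r.get(field) is not None]


def impute_median(rows: List[Record], field: str) -> List[Record]:
    """Option 2: fill missing values with the median of the known values and
    add a `<field>_imputed` flag. Raises StatisticsError if no value is known."""
    known = [r[field] for r in rows if r.get(field) is not None]
    fill = statistics.median(known)
    out = []
    for r in rows:
        new = dict(r)  # copy, so the original rows are not modified
        missing = r.get(field) is None
        if missing:
            new[field] = fill
        new[field + "_imputed"] = missing
        out.append(new)
    return out


people = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 3, "age": 40},
    {"id": 4, "age": 35},
]
print([r["id"] for r in drop_missing(people, "age")])  # [1, 3, 4]
print(impute_median(people, "age")[1])  # {'id': 2, 'age': 35, 'age_imputed': True}
```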

Validate and QA

• At the end of the process, you should be able to answer the following questions:
1) Does the data make sense?
2) Does the data follow the appropriate rules for its field?
3) Does it prove or disprove the working theory, or bring any insight to light?
4) Does the data reveal trends that help you form your next theory?
5) If not, is that because of a data quality issue?
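Question 2 lends itself to an automated check: write one rule per field and list every value that breaks it. The fields, ranges, and the deliberately crude email rule below are hypothetical examples; real rules come from the domain the data describes.

```python
from typing import Any, Callable, Dict, List, Tuple

Rule = Callable[[Any], bool]

# Hypothetical field rules for the example; real rules come from the domain.
# The email check is deliberately crude (exactly one "@").
RULES: Dict[str, Rule] = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "email": lambda v: isinstance(v, str) and v.count("@") == 1,
}


def validate(rows: List[Dict[str, Any]], rules: Dict[str, Rule]) -> List[Tuple[int, str, Any]]:
    """Return (row index, field, value) for each value that breaks its field's rule."""
    problems = []
    for i, row in enumerate(rows):
        for field, rule in rules.items():
            value = row.get(field)
            if not rule(value):
                problems.append((i, field, value))
    return problems


rows = [
    {"age": 34, "email": "asha@example.com"},
    {"age": 250, "email": "ravi@example.com"},
    {"age": 28, "email": "no-at-sign"},
]
for problem in validate(rows, RULES):
    print(problem)
# (1, 'age', 250)
# (2, 'email', 'no-at-sign')
```

An empty result does not answer the other questions (whether the data makes sense or supports the theory), but a non-empty one points directly at a data quality issue.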

Advantages and benefits of data cleaning

• Removal of errors when multiple sources of data are at play.
• Fewer errors make for happier clients and less-frustrated employees.
• Ability to map the different functions.
• Monitoring errors and better reporting to see where errors are coming from,
making it easier to fix incorrect or corrupt data for future applications.
• Using tools for data cleaning makes for more efficient business practices and
quicker decision-making.

Characteristics of quality data

• Validity (The degree to which the data conforms to defined rules or constraints)
• Accuracy (The data is close to the true values)
• Completeness (The degree to which all required data is known)
• Consistency (Data is consistent within the same dataset and/or across multiple
datasets)
• Uniformity (The degree to which the data is specified using the same unit of
measure)
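Several of these characteristics can be expressed as simple ratios. The definitions below are one assumed way to quantify them: completeness as the share of required values that are known, validity as the share of known values passing a rule, and uniformity as the share of values recorded in the most common unit. The weight data and the 0 to 300 range rule are hypothetical.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Sequence

Record = Dict[str, Any]


def completeness(rows: List[Record], required: Sequence[str]) -> float:
    """Share of required (row, field) values that are known, i.e. not None."""
    total = len(rows) * len(required)
    known = sum(1 for r in rows for f in required if r.get(f) is not None)
    return known / total if total else 1.0


def validity(rows: List[Record], field: str, rule: Callable[[Any], bool]) -> float:
    """Share of known values in `field` that satisfy `rule`."""
    values = [r[field] for r in rows if r.get(field) is not None]
    return sum(1 for v in values if rule(v)) / len(values) if values else 1.0


def uniformity(rows: List[Record], unit_field: str) -> float:
    """Share of known values recorded in the most common unit."""
    units = [r[unit_field] for r in rows if r.get(unit_field) is not None]
    if not units:
        return 1.0
    return Counter(units).most_common(1)[0][1] / len(units)


weights = [
    {"id": 1, "weight": 70, "unit": "kg"},
    {"id": 2, "weight": None, "unit": "kg"},
    {"id": 3, "weight": 154, "unit": "lb"},
    {"id": 4, "weight": 65, "unit": "kg"},
    {"id": 5, "weight": -5, "unit": "kg"},
]
print(completeness(weights, ["weight", "unit"]))           # 0.9
print(validity(weights, "weight", lambda w: 0 < w < 300))  # 0.75
print(uniformity(weights, "unit"))                         # 0.8
```

Accuracy and consistency are harder to score this way, since they need a trusted reference or a second dataset to compare against.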

Thank You!
(athangj@sies.edu.in)
