Lesson 7 Data Description and Diagnostics
PREPROCESSING TECHNIQUES
Preprocessing techniques are the steps and methods
applied to raw data before it is analyzed. The primary
goal is to clean, organize, and enhance the data so that
it is better suited to accurate and efficient analysis.
This matters because real-world data is often messy,
containing errors, missing values, or unnecessary details
that can reduce the effectiveness of analytical tools.
1. DATA CLEANING
Data cleaning is the process of identifying and fixing
errors or inconsistencies in a dataset to ensure that it is
accurate, reliable, and ready for analysis. Imagine you
have a bunch of puzzle pieces, but some are missing,
others are damaged, and a few don't quite fit. Data
cleaning is like sorting through these pieces, fixing or
removing the problematic ones, and making sure
everything fits together smoothly.
Handling Missing Data:
Sometimes, data is incomplete, and certain values are
missing. Data cleaning involves figuring out how to fill
in these missing pieces, either by estimating values or
removing incomplete records.
Dealing with Noisy Data:
Noisy data includes outliers or errors that don't reflect
the actual trends in the dataset. Data cleaning identifies
and corrects these anomalies, ensuring that they don't
skew the analysis.
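The two options for handling missing data, removing incomplete records or estimating the missing values, can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the records, the field name "age", and the choice of the median as the estimate are all assumptions made for the example.

```python
from statistics import median

# Hypothetical records; None marks a missing value.
records = [
    {"customer": "A", "age": 34},
    {"customer": "B", "age": None},
    {"customer": "C", "age": 28},
    {"customer": "D", "age": 45},
]

def drop_incomplete(rows, field):
    """Option 1: remove records where the field is missing."""
    return [row for row in rows if row[field] is not None]

def fill_with_median(rows, field):
    """Option 2: estimate missing values using the median of the known ones."""
    known = [row[field] for row in rows if row[field] is not None]
    fill = median(known)
    # Build new dicts so the original records are left untouched.
    return [
        {**row, field: fill if row[field] is None else row[field]}
        for row in rows
    ]

dropped = drop_incomplete(records, "age")
filled = fill_with_median(records, "age")
```

Dropping records is simpler but shrinks the dataset, while filling keeps every record at the cost of introducing estimated values. Which trade-off is acceptable depends on how much data is missing and how the field will be used.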
Real-Life Example
Imagine you are managing a customer database for an e-commerce platform. In
this database, there's a column for customer ages, and upon closer inspection,
you notice a few entries that seem off. Some customers have ages recorded as
150 or even 999.
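A simple range check is one way to catch entries like these. The sketch below assumes a plausible age range of 0 to 120; those bounds, the sample values, and the function names are illustrative choices rather than part of the lesson.

```python
# Assumed plausible bounds for a customer's age (illustrative, not from the lesson).
MIN_AGE, MAX_AGE = 0, 120

# Hypothetical values from the age column.
ages = [25, 150, 41, 999, 33]

def flag_implausible(values, low=MIN_AGE, high=MAX_AGE):
    """Return the values that fall outside the plausible range."""
    return [v for v in values if not (low <= v <= high)]

def mask_implausible(values, low=MIN_AGE, high=MAX_AGE):
    """Replace out-of-range values with None so they can be treated as missing."""
    return [v if low <= v <= high else None for v in values]

suspicious = flag_implausible(ages)
cleaned = mask_implausible(ages)
```

Masking the suspicious entries rather than deleting them keeps the rest of each customer record intact, and the resulting gaps can then be handled like any other missing value.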