DATA SCIENCE

ANALYTICS

DATA CLEANING AND PREPROCESSING IN EXCEL
Anapi/Celorico/Domingo/Pascua/Rosiete

TABLE OF CONTENTS
• Data cleaning
• Advantages and benefits
• Basic steps for data cleaning
• Data Preprocessing
• Steps in Data Preprocessing
• Advantages and Disadvantages of Data Preprocessing
WHAT IS DATA CLEANING
Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset.
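As a rough illustration of the kinds of problems named above, the sketch below flags duplicate, incomplete, and incorrectly formatted rows without changing them. It is written in Python rather than Excel, and the sample rows, the column names, and the helper `find_issues` are all hypothetical.

```python
from datetime import datetime

# Hypothetical sample rows; column names and values are made up for illustration.
rows = [
    {"id": "1", "name": "Ana", "joined": "2023-01-15"},
    {"id": "2", "name": "  ben ", "joined": "15/01/2023"},  # incorrectly formatted
    {"id": "2", "name": "  ben ", "joined": "15/01/2023"},  # duplicate of the row above
    {"id": "3", "name": "", "joined": "2023-02-01"},         # incomplete
]


def find_issues(rows):
    """Return (row_index, issue) pairs; the data itself is left unchanged."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        key = tuple(row.items())
        if key in seen:
            issues.append((i, "duplicate"))
        seen.add(key)
        if any(value.strip() == "" for value in row.values()):
            issues.append((i, "incomplete"))
        if row["name"] != row["name"].strip():
            issues.append((i, "incorrectly formatted name"))
        try:
            # Assumption: dates are expected in YYYY-MM-DD form.
            datetime.strptime(row["joined"], "%Y-%m-%d")
        except ValueError:
            issues.append((i, "incorrectly formatted date"))
    return issues


issues = find_issues(rows)
for index, issue in issues:
    print(f"row {index}: {issue}")
```

Flagging first and fixing later keeps the original data intact until each problem has been reviewed.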
ADVANTAGES AND BENEFITS OF DATA CLEANING
Having clean data will ultimately increase overall productivity and allow for the highest quality information in your decision-making. Benefits include:

• Removal of errors when multiple sources of data are at play.
• Fewer errors make for happier clients and less-frustrated employees.
• Ability to map the different functions and what your data is intended to do.
• Monitoring errors and better reporting to see where errors are coming from, making it easier to fix incorrect or corrupt data for future applications.
• Using tools for data cleaning will make for more efficient business practices and quicker decision-making.
THE BASIC STEPS FOR CLEANING DATA ARE AS FOLLOWS:
1. Import the data from an external data source.
2. Create a backup copy of the original data in a separate workbook.
3. Ensure that the data is in a tabular format of rows and columns with: similar data in each
column, all columns and rows visible, and no blank rows within the range. For best results,
use an Excel table.
4. Do tasks that don't require column manipulation first, such as spell-checking or using the
Find and Replace dialog box.
5. Next, do tasks that do require column manipulation. The general
steps for manipulating a column are:

a. Insert a new column (B) next to the original column (A) that needs cleaning.
b. Add a formula that will transform the data at the top of the new column (B).
c. Fill down the formula in the new column (B). In an Excel table, a calculated column is
automatically created with values filled down.
d. Select the new column (B), copy it, and then paste as values into the new column (B).
e. Remove the original column (A), which converts the new column from B to
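
The column-manipulation steps above can be sketched outside Excel as well. The Python example below is only an illustration under stated assumptions: the worksheet contents are hypothetical, `clean_name` and `clean_column` are made-up helpers, and `clean_name` only roughly mirrors what an Excel formula such as =PROPER(TRIM(A2)) would do.

```python
import copy

# Hypothetical worksheet: a header row plus data rows. Column A (index 0)
# holds names that need trimming and consistent capitalization.
sheet = [
    ["Name", "City"],
    ["  ana  LOPEZ", "Manila"],
    ["ben cruz ", "Cebu"],
]

# Step 2: keep an untouched backup copy of the original data.
backup = copy.deepcopy(sheet)


def clean_name(text):
    """Collapse extra spaces and capitalize each word."""
    return " ".join(text.split()).title()


def clean_column(sheet, col, transform):
    """Return a new sheet in which column `col` is replaced by transformed values."""
    header, *data = sheet
    result = [header[:]]
    for row in data:
        new_row = row[:]
        # Steps a-c: add a helper column next to the original and fill it down.
        new_row.insert(col + 1, transform(row[col]))
        # Step d: the helper already holds plain values, not formulas.
        # Step e: remove the original column so the helper takes its place.
        del new_row[col]
        result.append(new_row)
    return result


cleaned = clean_column(sheet, 0, clean_name)
print(cleaned)
```

Because `clean_column` builds a new list, both `sheet` and `backup` stay unchanged, which matches the advice to keep a copy of the original data.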
DATA PREPROCESSING
Data preprocessing is a stage of data analysis. It cleans and transforms raw data into a consistent format that computers can process and analyze.
STEPS IN DATA PREPROCESSING
1. Collecting the data
2. Cleaning the data
3. Handling missing values
4. Removing duplicates
5. Standardizing formats
6. Filtering and sorting
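
A minimal sketch of these steps, written in Python for illustration rather than in Excel, is shown below. The records, field names, and the helpers `standardize_date` and `preprocess` are hypothetical. Filling a missing amount with a default of 0 and reading slash dates as day/month/year are assumptions, not rules from the slides.

```python
from datetime import datetime

# Step 1 (collection): hypothetical raw records; all values are made up.
raw = [
    {"name": "Ana", "city": "manila", "date": "2023-03-01", "amount": "120"},
    {"name": "Ben", "city": "CEBU ", "date": "01/02/2023", "amount": ""},
    {"name": "Ana", "city": "manila", "date": "2023-03-01", "amount": "120"},
    {"name": "Cara", "city": "Davao", "date": "2023-01-20", "amount": "80"},
]

# Assumption: slash-separated dates put the day first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def standardize_date(text):
    """Convert a recognized date string to YYYY-MM-DD; leave others untouched."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return text


def preprocess(records, default_amount=0.0, min_amount=0.0):
    cleaned, seen = [], set()
    for rec in records:
        amount_text = rec["amount"].strip()
        row = {
            "name": rec["name"].strip(),
            # Step 5: standardize text and date formats.
            "city": rec["city"].strip().title(),
            "date": standardize_date(rec["date"].strip()),
            # Step 3: handle missing values with a default (an assumption).
            "amount": float(amount_text) if amount_text else default_amount,
        }
        # Step 4: remove duplicates, compared after standardizing.
        key = tuple(row.values())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    # Step 6: filter by amount, then sort by date.
    kept = [r for r in cleaned if r["amount"] >= min_amount]
    return sorted(kept, key=lambda r: r["date"])


result = preprocess(raw)
for row in result:
    print(row)
```

Duplicates are removed after standardizing so that rows differing only in spacing or capitalization are treated as the same record.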
ADVANTAGES OF DATA PREPROCESSING
There are several advantages of data preprocessing in Excel:

• Excel provides a user-friendly interface, so data preprocessing and other data analysis tasks are easy to carry out.
• Excel offers a wide range of functions and features that help with different data preprocessing needs.
• Excel is widely available, which is why it is commonly used for data preprocessing.
• Excel integrates well with other Microsoft Office applications, facilitating seamless data transfer and collaboration.
DISADVANTAGES OF DATA PREPROCESSING
There are several disadvantages of data preprocessing in Excel:

• Excel may not be suitable for handling large datasets; a worksheet holds at most 1,048,576 rows.
• Excel’s analytical capabilities are robust but may not match those offered by
specialized statistical or data analysis software.
• Data preprocessing tasks in Excel often require manual execution.
THANK YOU