Chapter 2.1 Data Cleaning in MS Excel
MS Excel
“Dirty” data can actually do your business
more harm than good.
Types of problems
Extra Spaces from text
Problem:
Trailing spaces in Excel are spaces that appear
at the end of a cell's contents. They are usually
introduced when the cell's contents are entered or by
formatting issues. Trailing spaces can
interfere with formulas and other functions.
Treatment:
Select the entire column and remove the extra spaces with the TRIM function
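For readers who want to see the trimming logic outside Excel, the following is a minimal Python sketch of what TRIM does: it removes leading and trailing spaces and collapses repeated spaces between words into one. The function name excel_trim and the sample names are hypothetical, and the sketch only approximates the worksheet function.

```python
def excel_trim(text: str) -> str:
    """Approximate Excel's TRIM: drop leading/trailing spaces and
    collapse runs of spaces between words into a single space."""
    # Split on the ordinary space character only; tabs and
    # non-breaking spaces are left untouched in this sketch.
    return " ".join(part for part in text.split(" ") if part)


# Hypothetical sample values with extra spaces
names = ["  Ana  Cruz ", "Ben Reyes   ", "Carla Diaz"]
cleaned = [excel_trim(n) for n in names]
print(cleaned)
```

A value that is already clean, such as "Carla Diaz", passes through unchanged, so the function is safe to apply to a whole column.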
The #REF! error shows when a formula
refers to a cell that's not valid. This
happens most often when cells that were
referenced by formulas get deleted, or
pasted over.
#NAME? is a common Excel error
that appears when Excel does not recognize
a name used in a formula. This can be caused
by a few different
things, such as a misspelled function name or
a reference to a name that is not defined.
Types of problems
Empty rows
Problem:
Empty rows break the data into multiple tables instead of
one single table
A table should not contain any empty rows
Treatment:
Select the entire column and apply a filter
Filter for empty cells in any column
Delete the rows that appear after filtering
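The same idea can be sketched in Python. This example assumes the table is a list of rows and treats a row as empty only when every cell is blank or contains nothing but spaces. The function name drop_empty_rows and the sample records are hypothetical.

```python
def drop_empty_rows(rows):
    """Keep only rows that have at least one non-blank cell."""
    return [
        row for row in rows
        if any(cell is not None and str(cell).strip() != "" for cell in row)
    ]


# Hypothetical sample table with two empty rows
table = [
    ["001", "Ana Cruz", "90"],
    ["", "", ""],
    ["002", "Ben Reyes", "85"],
    [None, "  ", ""],
]
no_blanks = drop_empty_rows(table)
print(no_blanks)
```

Rows that are only partly filled are kept here, so missing values in an otherwise valid record still need to be reviewed separately.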
Types of problems
Duplicate Data
Problem:
The entire record is repeated (every field is the same)
Treatment:
Highlight duplicates using Conditional Formatting
Remove Duplicates
Filter the data using Advanced Filter
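As an illustration of duplicate removal, here is a minimal Python sketch that keeps the first occurrence of each record and drops later copies. It compares values exactly, which may differ from how Excel matches text, and the function name remove_duplicates and the sample records are hypothetical.

```python
def remove_duplicates(rows):
    """Return rows with exact duplicate records removed, keeping the
    first occurrence and preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # a tuple is hashable, so it can go in a set
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample records; the third repeats the first
records = [
    ["001", "Ana Cruz", "90"],
    ["002", "Ben Reyes", "85"],
    ["001", "Ana Cruz", "90"],
]
deduped = remove_duplicates(records)
print(deduped)
```

Trimming extra spaces before this step matters, because "Ana Cruz" and "Ana Cruz " would otherwise count as different records.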
Types of problems
Data Types and Data Consistency
Problem:
Data is spelled incorrectly
Some columns may have inconsistent data types
Treatment:
Find and Replace
Text to Columns
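The two treatments above can be sketched in Python as follows. The sketch assumes whole-cell corrections taken from a lookup of known misspellings and a comma as the delimiter. The misspellings, function names, and sample values are all hypothetical.

```python
# Hypothetical misspellings mapped to their corrected values
corrections = {"Regulr": "Regular", "Iregular": "Irregular"}


def find_and_replace(value, mapping):
    """Replace a whole-cell value if it appears in the corrections mapping."""
    return mapping.get(value, value)


def text_to_columns(value, delimiter=","):
    """Split one cell into several columns on a delimiter, trimming each piece."""
    return [piece.strip() for piece in value.split(delimiter)]


statuses = ["Regulr", "Regular", "Iregular"]
fixed = [find_and_replace(s, corrections) for s in statuses]
print(fixed)

columns = text_to_columns("001, Ana Cruz, 90")
print(columns)
```

Note that the split pieces are still text, so a value such as "90" would need a separate conversion before it can be used as a number.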
Excel functions for changing text case
Microsoft Excel has three special functions that you can use to change the case
of text: UPPER, LOWER and PROPER.
The UPPER() function converts all lowercase letters in a text string to
uppercase.
The LOWER() function converts all uppercase letters in a text string to lowercase.
The PROPER() function capitalizes the first letter of each word and makes
the other letters lowercase (Proper Case).
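For comparison, Python's built-in string methods behave in a similar way. In this small sketch, str.upper and str.lower correspond to UPPER and LOWER, and str.title is used as a rough stand-in for PROPER, although edge cases may differ from Excel. The sample name is hypothetical.

```python
text = "juan dela CRUZ"  # hypothetical sample value

upper_text = text.upper()   # like UPPER: every letter capitalized
lower_text = text.lower()   # like LOWER: every letter lowercase
proper_text = text.title()  # like PROPER: first letter of each word capitalized

print(upper_text)   # JUAN DELA CRUZ
print(lower_text)   # juan dela cruz
print(proper_text)  # Juan Dela Cruz
```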
1. Trim extra spaces
2. Delete/remove blank rows using filter
3. Remove duplicate
4. All “regular” and “irregular” values must be in upper
case
5. All names in column C must be in proper case
6. Font: Panroman
Font size: 11
Text alignment: Align Left
7. ID number must be placed in column B
Name must be placed in column C
Grade must be placed in column D