02 - Data Preparation and Cleaning


Data Preparation and Cleansing
Data Analytics for Accounting
OBJECTIVES

• Understand how data are organized in an accounting
information system
• Understand how data are stored in a relational database
• Explain and apply extraction, transformation, and loading
(ETL) techniques
Mastering the Data

• Mastering the data requires a firm understanding of what data
are available and where they are stored, as well as being skilled
in the process of extracting, transforming, and loading (ETL) the
data in preparation for data analysis
How Data are Used and Stored in the Accounting Cycle

• A basic understanding of accounting processes and their associated
data, how those data are organized, and why, can help you
request the right data and facilitate that request so that you
know exactly where each piece of data is held
• While it is often preferred to analyze data from a flat file, when
it comes to storing data and maintaining data integrity, a
relational database is preferred because of its ability to maintain
“one version of the truth” across multiple data elements
FLAT FILE VS RELATIONAL DATABASE

[Figure: a flat file containing Customer Data, Sales Data, Product
Data, and Payment Data, compared with a relational database
containing a Customer Table, Sales Table, Product Table, and
Payment Table]
Data and Relationships in a Relational Database

Storing data in a normalized, relational database instead of a flat
file ensures that data are complete, not redundant, and that
business rules are enforced; it also aids communication and
integration across business processes
THREE TYPES OF COLUMNS IN A RELATIONAL DATABASE

• Primary key: a column whose purpose is to ensure that each
row in the table is unique, so it is often referred to as a “unique
identifier.”
• Foreign key: a column that relates one table to another. Two
tables are connected by placing the primary key of the source
table in the destination table
• Descriptive attribute: a column that holds descriptive data
• Composite primary key: a special case of a primary key, made
up of two or more foreign keys from different tables
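
As a minimal sketch of a composite primary key, the Python snippet below uses the standard-library sqlite3 module. The purchase_order, product, and po_line_item tables, their columns, and their values are hypothetical and are not taken from the slides; they only illustrate how two foreign keys together can identify a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE purchase_order (po_no INTEGER PRIMARY KEY);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_name TEXT);
-- Hypothetical line-item table: neither column is unique on its own,
-- but the pair (po_no, product_id) identifies each row.
CREATE TABLE po_line_item (
    po_no      INTEGER NOT NULL REFERENCES purchase_order(po_no),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (po_no, product_id)
);
INSERT INTO purchase_order VALUES (1787), (1788);
INSERT INTO product VALUES (10, 'Paper'), (11, 'Ink');
INSERT INTO po_line_item VALUES (1787, 10, 5), (1787, 11, 2), (1788, 10, 1);
""")

# Repeating an existing (po_no, product_id) pair violates the composite key.
try:
    conn.execute("INSERT INTO po_line_item VALUES (1787, 10, 9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

line_item_count = conn.execute("SELECT COUNT(*) FROM po_line_item").fetchone()[0]
print(duplicate_rejected, line_item_count)
```

The same purchase order can appear on several rows and so can the same product, but each combination can appear only once.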
EXAMPLE

Supplier Table
Supplier ID   Supplier Name   Address
1             PT ABC          Jl. Maju Jaya 5
2             CV XYZ          Jl. Merdeka no 10

Purchase Order Table
PO No   Date        Amount      Supplier ID
1787    11/1/2020   5.000.000   1
1788    12/1/2020   725.000     2
1789    15/1/2020   3.250.000   1

• Supplier ID is the primary key in the Supplier Table
• Supplier ID acts as a foreign key in the Purchase Order Table to
connect the Supplier Table and the Purchase Order Table
• Supplier Name and Address are descriptive columns in the
Supplier Table
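
The example tables can be sketched in code to show how the keys behave. The snippet below is an illustration only, using Python's standard-library sqlite3 module; the lowercase table and column names are assumptions, the dates are assumed to be day/month/year and are stored in ISO form, and the amounts are stored as plain integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE supplier (
    supplier_id   INTEGER PRIMARY KEY,   -- primary key
    supplier_name TEXT NOT NULL,         -- descriptive attribute
    address       TEXT                   -- descriptive attribute
);
CREATE TABLE purchase_order (
    po_no       INTEGER PRIMARY KEY,
    po_date     TEXT NOT NULL,
    amount      INTEGER NOT NULL,
    supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id)  -- foreign key
);
INSERT INTO supplier VALUES
    (1, 'PT ABC', 'Jl. Maju Jaya 5'),
    (2, 'CV XYZ', 'Jl. Merdeka no 10');
INSERT INTO purchase_order VALUES
    (1787, '2020-01-11', 5000000, 1),
    (1788, '2020-01-12', 725000, 2),
    (1789, '2020-01-15', 3250000, 1);
""")

# Joining on the foreign key produces a flat, analysis-ready view of both tables.
flat_rows = conn.execute("""
    SELECT po.po_no, po.po_date, po.amount, s.supplier_name, s.address
    FROM purchase_order AS po
    JOIN supplier AS s ON s.supplier_id = po.supplier_id
    ORDER BY po.po_no
""").fetchall()
for row in flat_rows:
    print(row)

# A purchase order that points at a nonexistent supplier is rejected.
try:
    conn.execute("INSERT INTO purchase_order VALUES (1790, '2020-01-16', 100000, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

Each supplier's name and address are stored once in the supplier table, while the join repeats them on every purchase order row. That repetition is the redundancy a flat file carries.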
Data Dictionary

• A data dictionary, also called a data definition matrix, provides
detailed information about the business data, such as standard
definitions of data elements, their meanings, and allowable
values
• The data dictionary is very important because it records
information such as what is in the database, who is allowed to
access it, and where the database is physically stored
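
To make the idea concrete, here is a small sketch of a data dictionary expressed as a Python structure, with a helper that checks a record against it. The field names, the "status" field, its allowable values, and the check_record function are all hypothetical; a real data dictionary would come from the system's own documentation.

```python
# Hypothetical data dictionary for a purchase order record.
data_dictionary = {
    "po_no": {"type": int, "definition": "Purchase order number (primary key)"},
    "amount": {"type": int, "definition": "Purchase order amount", "minimum": 0},
    "status": {
        "type": str,
        "definition": "Processing status of the order",
        "allowed": {"open", "closed"},
    },
}


def check_record(record, dictionary):
    """Return a list of problems found when comparing a record to the dictionary."""
    problems = []
    for field, rule in dictionary.items():
        if field not in record:
            problems.append(f"{field}: missing")
            continue
        value = record[field]
        if not isinstance(value, rule["type"]):
            problems.append(f"{field}: expected {rule['type'].__name__}")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{field}: {value!r} is not an allowable value")
        if "minimum" in rule and value < rule["minimum"]:
            problems.append(f"{field}: below minimum {rule['minimum']}")
    return problems


good = {"po_no": 1787, "amount": 5000000, "status": "open"}
bad = {"po_no": "1788", "amount": -5, "status": "pending"}
print(check_record(good, data_dictionary))
print(check_record(bad, data_dictionary))
```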
Extraction, Transformation and Loading (ETL) of Data

• ETL is the process of extracting data from a data source,
transforming the data to meet requirements, and loading the
transformed data into a defined format
Extraction

• Extraction is the process of obtaining data from a data source
• The better prepared you are when requesting data in the first
place, the more time you will save your organization and the
database administrators in the long run
• The first two steps of ETL are part of extraction
1. Determine the purpose and scope of the data request
2. Obtain the data
Extraction (2)

• There are a variety of methods for gathering data
• The two most common ways are
• Using SQL (for a relational database)
• Using Excel (for a flat file stored in a spreadsheet)
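
Both routes can be sketched in Python. This is an illustration under stated assumptions: an in-memory SQLite database stands in for the relational source, a small CSV text stands in for a spreadsheet export, and the table, columns, and filter values are hypothetical. The point is that the scope of the request (which fields, which supplier, which period) is applied at extraction time.

```python
import csv
import io
import sqlite3

# Relational source: an in-memory SQLite database with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase_order (
    po_no INTEGER PRIMARY KEY, po_date TEXT, amount INTEGER, supplier_id INTEGER
);
INSERT INTO purchase_order VALUES
    (1787, '2020-01-11', 5000000, 1),
    (1788, '2020-01-12', 725000, 2),
    (1789, '2020-01-15', 3250000, 1);
""")

# SQL extraction: request only the fields, supplier, and period in scope.
sql_rows = conn.execute(
    """
    SELECT po_no, po_date, amount
    FROM purchase_order
    WHERE supplier_id = ? AND po_date BETWEEN ? AND ?
    ORDER BY po_no
    """,
    (1, "2020-01-01", "2020-01-31"),
).fetchall()
print(sql_rows)

# Flat-file source: CSV text standing in for a spreadsheet export.
flat_file = io.StringIO(
    "po_no,po_date,amount,supplier_id\n"
    "1787,2020-01-11,5000000,1\n"
    "1788,2020-01-12,725000,2\n"
    "1789,2020-01-15,3250000,1\n"
)
# Every value read from a CSV is text, so the filter compares strings.
file_rows = [row for row in csv.DictReader(flat_file) if row["supplier_id"] == "1"]
print(file_rows)
```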
Transformation

• Anytime data are moved from one location to another, it is
possible that some of the data could have been lost during the
extraction
• It is critical to ensure that the extracted data are complete and
that the integrity of the data remains intact
• This is the third step of the ETL process, which is called
Validating the Data for Completeness and Integrity
Transformation (2)

• Four steps of the validation process
1. Compare the number of records
2. Compare descriptive statistics for numeric fields
3. Validate date/time fields
4. Compare string limits for text fields
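
The four checks above can be sketched as a comparison between a summary of the source data and a summary of the extract. This is only an illustration: the records, the field names, the day/month/year date format, and the summarize and compare helpers are assumptions, and the "damaged" extract is constructed so that every check fails.

```python
from datetime import datetime
from statistics import mean


def is_valid_date(text, fmt="%d/%m/%Y"):
    """Return True if the text parses as a date in the assumed format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def summarize(rows):
    """Summarize a non-empty list of records for the four validation checks."""
    amounts = [r["amount"] for r in rows]
    return {
        "record_count": len(rows),                                          # step 1
        "amount_stats": (sum(amounts), min(amounts), max(amounts), mean(amounts)),  # step 2
        "dates_valid": all(is_valid_date(r["date"]) for r in rows),         # step 3
        "max_address_length": max(len(r["address"]) for r in rows),         # step 4
    }


def compare(source_rows, extracted_rows):
    """Return the names of the checks on which source and extract disagree."""
    s, e = summarize(source_rows), summarize(extracted_rows)
    return [name for name in s if s[name] != e[name]]


source = [
    {"po_no": 1787, "date": "11/1/2020", "amount": 5000000, "address": "Jl. Maju Jaya 5"},
    {"po_no": 1788, "date": "12/1/2020", "amount": 725000, "address": "Jl. Merdeka no 10"},
    {"po_no": 1789, "date": "15/1/2020", "amount": 3250000, "address": "Jl. Maju Jaya 5"},
]
# Hypothetical damaged extract: one record lost, one date garbled, one address truncated.
damaged = [
    source[0],
    {"po_no": 1788, "date": "12/13/2020", "amount": 725000, "address": "Jl. Merdek"},
]

print(compare(source, list(source)))
print(compare(source, damaged))
```

An empty result means the extract agrees with the source on all four summaries; it does not prove the two are identical row by row.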
Transformation (3)

• Once the data have been validated, the data will likely need to
be cleaned. Some of the more common ways to clean data are
1. Remove headings or subtotals
2. Clean leading zeros and nonprintable characters
3. Format negative numbers
4. Correct inconsistencies across data
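
A rough sketch of these four cleaning steps in Python follows. The raw rows imitate a report export and are invented for illustration, as are the helper names and the supplier-name mapping. The sketch assumes "." is a thousands separator, that negative amounts appear in parentheses or with a minus sign, and that leading zeros on PO numbers carry no meaning.

```python
# Hypothetical raw export: repeated headings, a subtotal, leading zeros,
# a nonprintable character, a parenthesized negative, inconsistent names.
raw_rows = [
    ["PO No", "Supplier", "Amount"],
    ["001787", "PT ABC\u00a0", "5.000.000"],
    ["001788", "cv xyz", "(725.000)"],
    ["Subtotal", "", "4.275.000"],
    ["PO No", "Supplier", "Amount"],
    ["001789", "PT. ABC", "3.250.000"],
]

# Assumed canonical spellings, keyed by a lowercase name without periods.
SUPPLIER_NAMES = {"pt abc": "PT ABC", "cv xyz": "CV XYZ"}


def clean_text(text):
    """Drop nonprintable characters and surrounding whitespace (step 2)."""
    return "".join(ch for ch in text if ch.isprintable()).strip()


def parse_amount(text):
    """Convert '(725.000)' or '725.000-' style text to a signed integer (step 3)."""
    text = clean_text(text)
    negative = (text.startswith("(") and text.endswith(")")) or "-" in text
    value = int(text.strip("()-").replace(".", ""))
    return -value if negative else value


def standardize_supplier(name):
    """Map spelling variants to one canonical supplier name (step 4)."""
    key = clean_text(name).lower().replace(".", "")
    return SUPPLIER_NAMES.get(key, clean_text(name))


cleaned = []
for po_no, supplier, amount in raw_rows:
    # Step 1: keep only rows whose first cell is a PO number,
    # which skips headings and subtotals.
    if not clean_text(po_no).isdigit():
        continue
    # int() also removes the leading zeros (step 2).
    cleaned.append((int(clean_text(po_no)), standardize_supplier(supplier), parse_amount(amount)))

for row in cleaned:
    print(row)
```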
Loading

• The data analysis technique you plan to implement, the subject
matter of the business questions you intend to answer, and the
way in which you wish to communicate results will all drive the
choice of which tool you use to perform your analysis
• Loading is the last step of ETL