Gathering & Cleaning Data: Baygan Casuela Dionisio Gayo Lagunday Mejilla Ragual Tan
Gathering & Cleaning Data
Once you’ve collected all of your data, it’s time to analyze it and organize your findings.
Let us now discuss the
WAYS OF OBTAINING DATA
1. Literature Sources
It is a qualitative method of obtaining data whose results are based on intensive engagement with respondents about
4. Observations
Observation gathers data by monitoring participants in a specific situation or environment at a given time and day.
5. Documents & Records
This is the process of examining the existing documents and records of an organization to track changes over a period of time. Records can be tracked by examining call logs, email logs, databases, minutes of meetings, staff
6. Experiments
Structured Interview
A structured interview is a type of interview in which the interviewer asks a particular set of predetermined questions.
Advantages of Structured Interview
• Standardized questions.
• Potential to pre-code answers using computers to analyze the data.
• The interviewer is present to explain the question to avoid misunderstanding.
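To illustrate what pre-coding answers might look like in practice, here is a minimal Python sketch. The code book, the answer options and the function names are all hypothetical, chosen only for illustration: each allowed answer to a closed question gets a numeric code in advance, so the responses can be counted by computer afterwards.

```python
from collections import Counter

# Hypothetical code book for one closed question: each allowed answer
# is assigned a numeric code before the interviews take place.
CODE_BOOK = {"yes": 1, "no": 2, "not sure": 3}
UNCODED = 0  # assumed code for answers that are not in the code book


def code_answers(answers):
    """Translate raw answers into their pre-assigned numeric codes."""
    return [CODE_BOOK.get(answer.strip().lower(), UNCODED) for answer in answers]


def tally(codes):
    """Count how often each code occurs."""
    return Counter(codes)


if __name__ == "__main__":
    raw_answers = ["Yes", " no", "maybe", "YES"]  # made-up sample responses
    codes = code_answers(raw_answers)
    print(codes)
    print(tally(codes))
```

Answers that fall outside the code book are kept under a separate code rather than dropped, so they can be reviewed by hand later.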
Disadvantages of Structured Interview
• The interviewer can potentially affect the answers through tone of voice and body language.
• The interview is only as good as the questions it contains; if the interviewer or the respondent misinterprets a question, the data becomes invalid.
• Time consuming and costly.
Unstructured Interview
An unstructured interview is a type of interview in which the interviewer asks questions that are not prepared in advance.
• Costly process
• Time consuming process
• Introvert nature
• Subjectivity in decision-making
COMPONENTS OF THE COMPLETE & ACCURATE DATA SET
Content just needs to be right
Frustrated by inexplicable extra spaces that follow the value you want? This formula “trims” those out for you.
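Outside a spreadsheet, the same clean-up can be sketched in Python. This is only an illustration, assuming the formula described here behaves like the spreadsheet TRIM function (leading and trailing spaces removed, repeated spaces between words collapsed to one). The function name `trim_spaces` and the sample value are hypothetical.

```python
def trim_spaces(value: str) -> str:
    """Remove leading/trailing spaces and collapse repeated inner spaces.

    Hypothetical helper that mimics a spreadsheet TRIM on ordinary
    space characters only; tabs and non-breaking spaces are left alone.
    """
    # Splitting on a single space yields an empty string for every extra
    # space, so dropping the empty pieces removes the surplus.
    words = [piece for piece in value.split(" ") if piece]
    return " ".join(words)


if __name__ == "__main__":
    print(repr(trim_spaces("  Quezon   City  ")))
```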
7. CONCATENATE
Formula: the “&” operator joins text values.
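For comparison, joining values with "&" in a spreadsheet corresponds roughly to string concatenation in Python. The sketch below is illustrative only; the `concatenate` helper, its `sep` parameter and the sample values are assumptions, not part of any spreadsheet API.

```python
def concatenate(*parts: object, sep: str = "") -> str:
    """Join values into one string, like chaining "&" in a spreadsheet."""
    # Convert each part to text first, so numbers can be joined too.
    return sep.join(str(part) for part in parts)


if __name__ == "__main__":
    first_name = "Ana"    # made-up sample values
    last_name = "Reyes"
    print(concatenate(first_name, " ", last_name))
    print(concatenate("Batch", 7, sep="-"))
```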
1. Drake
Drake is a simple-to-use, extensible, text-based data workflow tool that organizes command execution around data and its dependencies. Data processing steps are defined along with their inputs and outputs, and Drake automatically resolves their dependencies and calculates:
• which commands to execute (based on file timestamps)
• in what order to execute the commands (based on dependencies)
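The two calculations above can be sketched in Python. This is not Drake's implementation or its workflow syntax; it is a minimal illustration of the general idea, assuming each step is described only by an output file and the input files it is built from. The names `is_stale` and `plan` are hypothetical.

```python
import os
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def is_stale(output, inputs):
    """An output is stale if it is missing or older than any existing input."""
    if not os.path.exists(output):
        return True
    out_time = os.path.getmtime(output)
    # Simplification: inputs that do not exist on disk are ignored here.
    return any(
        os.path.getmtime(path) > out_time
        for path in inputs
        if os.path.exists(path)
    )


def plan(steps):
    """Return the outputs that need rebuilding, in dependency order.

    `steps` maps each output file to the list of input files it is built from.
    """
    # Only inputs produced by another step create an ordering constraint.
    graph = {
        output: [path for path in inputs if path in steps]
        for output, inputs in steps.items()
    }
    to_run = []
    rebuilt = set()
    # static_order() yields every step after the steps it depends on.
    for output in TopologicalSorter(graph).static_order():
        inputs = steps[output]
        upstream_rebuilt = any(path in rebuilt for path in inputs)
        if upstream_rebuilt or is_stale(output, inputs):
            to_run.append(output)
            rebuilt.add(output)
    return to_run
```

With a two-step workflow such as `{"report.txt": ["clean.csv"], "clean.csv": ["raw.csv"]}`, touching `raw.csv` would put `clean.csv` and then `report.txt` back on the plan, while an up-to-date workflow yields an empty plan.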
2. OpenRefine
OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
3. DataWrangler
Wrangler is an interactive tool for data cleaning and transformation. Spend less time formatting and more time analyzing your data. Wrangler allows interactive transformation of messy, real-world data into the data tables analysis tools expect. Export data for use in Excel, R, Tableau
4. DataCleaner
The heart of DataCleaner is a strong data profiling engine for discovering and analyzing the quality of your data. Find the patterns, missing values, character sets and other characteristics of your data values. Profiling is an essential activity of any Data Quality, Master Data Management or Data Governance program.
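To give a feel for what profiling a column involves, here is a small Python sketch. It does not use DataCleaner or reproduce its engine; it only illustrates the kinds of characteristics mentioned above (missing values, value patterns, unusual characters). The function names and the sample column are hypothetical.

```python
import re
from collections import Counter


def pattern_of(value: str) -> str:
    """Reduce a value to its shape: ASCII letters -> 'A'/'a', digits -> '9'."""
    shape = re.sub(r"[A-Z]", "A", value)
    shape = re.sub(r"[a-z]", "a", shape)
    return re.sub(r"[0-9]", "9", shape)


def profile_column(values):
    """Summarize missing values, value patterns and non-ASCII characters."""
    # Treat None and blank strings as missing.
    present = [v for v in values if v is not None and v.strip() != ""]
    return {
        "total": len(values),
        "missing": len(values) - len(present),
        "patterns": Counter(pattern_of(v) for v in present),
        "non_ascii": sorted({ch for v in present for ch in v if ord(ch) > 127}),
    }


if __name__ == "__main__":
    column = ["0917-555", "0918-123", "", None, "N/A", "Peña"]  # made-up data
    for name, result in profile_column(column).items():
        print(name, result)
```

A pattern that occurs only once or twice in a long column, such as the stray "N/A" here, is often the first hint of an entry that needs cleaning.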
5. Winpure Data Cleaning Tool
Data quality is an important contributor to the overall success of a project or campaign. Inaccurate data leads to wrong assumptions and analysis, and consequently to the failure of the project or campaign. Duplicate data can also cause all sorts of hassles, such as slow loading and accidental deletion.
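As a rough illustration of duplicate removal, the Python sketch below keeps the first occurrence of each record and treats records as duplicates when they differ only in letter case or spacing. This is an assumed matching rule for demonstration, not the method used by the tool named above; the function names and sample records are hypothetical.

```python
def normalize(record: dict) -> tuple:
    """Build a comparison key that ignores letter case and extra whitespace."""
    return tuple(
        (field, " ".join(str(record[field]).lower().split()))
        for field in sorted(record)
    )


def drop_duplicates(records):
    """Keep the first occurrence of each record, preserving the original order."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


if __name__ == "__main__":
    contacts = [  # made-up sample records
        {"name": "Ana Reyes", "email": "ana@example.com"},
        {"name": "ana  reyes ", "email": "ANA@example.com"},
        {"name": "Ben Cruz", "email": "ben@example.com"},
    ]
    for contact in drop_duplicates(contacts):
        print(contact)
```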