Gathering & Cleaning Data: Baygan Casuela Dionisio Gayo Lagunday Mejilla Ragual Tan


Chapter 2: Gathering & Cleaning Data

Baygan Casuela Dionisio Gayo Lagunday Mejilla Ragual Tan


Let us first learn HOW TO COLLECT DATA
1. Determine What Information You Want to Collect
The first thing you need to do is choose what details you want to collect.
2. Set a Timeframe for Data Collection
In the early stages of your planning process, you should establish a timeframe for your data collection.
3. Determine Your Data Collection Method
At this step, you will choose the data collection method that will make up the core of your data-gathering strategy.
4. Collect the Data
Once you have finalized your plan, you can implement your data collection strategy and start collecting data.
5. Analyze the Data & Implement Your Findings
Once you’ve collected all of your data, it’s time to analyze it and organize your findings.
Let us now discuss the WAYS OF OBTAINING DATA
1. Literature Sources
This involves the collection of data from already published text available in the public domain.
2. Surveys
This is another way of gathering data for research purposes. Information is gathered through questionnaires, mostly based on individual or group experiences regarding a particular
3. Interviews
This is a qualitative method of obtaining data whose results are based on intensive engagement with respondents about
4. Observations
In this method, data is gathered by monitoring participants in a specific situation or environment at a given time and day.
5. Documents & Records
This is the process of examining existing documents and records of an organization for tracking changes over a period of time. Records can be tracked by examining call logs, email logs, databases, minutes of meetings, staff
6. Experiments
Data are mostly collected based on the cause and effect of the two variables being studied.
ADVANTAGES & DISADVANTAGES OF DATA SOURCES
Structured Interview
A structured interview is a type of interview in which the interviewer asks a particular set of predetermined questions.
Advantages of Structured Interview
• Standardized questions.
• Potential to pre-code answers using computers to analyze the data.
• The interviewer is present to explain the question to avoid misunderstanding.
Disadvantages of Structured Interview
• The interviewer can potentially affect the answers through tone of voice and body language.
• The interview is only as good as the questions it contains; if the interviewer or the respondent misinterprets them, the data becomes invalid.
• Time consuming and costly.
Unstructured Interview
An unstructured interview is a type of interview in which the interviewer asks questions which are not prepared in advance.
In unstructured interviews, questions arise spontaneously in free-flowing conversation, which means that different respondents are asked different questions.
Advantages of Unstructured Interview
• Flexible and more comfortable.
• Better understanding of the respondents than in a structured interview.
• Very practical method to analyze a certain respondent.
• It breaks the communication gap between the interviewer and the respondent.
Disadvantages of Unstructured Interview
• There is a chance of getting diverted from the purpose of the interview.
• Time consuming.
• Not suitable for certain candidates.
• There is a risk of speaking about confidential matters during the interview.
Face to face Interview
Also known as a one-on-one interview. It is a data collection method in which the interviewer directly communicates with the respondent in accordance with a prepared questionnaire,
Advantages of Face to face Interview
• Accurate answers or screening
• Captures verbal and non-verbal cues.
• The interviewer is the one that has control over the interview.
• Captures emotions and behaviors
Disadvantages of Face to face Interview
• Costly process
• Time consuming process
• Introvert nature
• Subjectivity in decision-making
COMPONENTS OF THE COMPLETE & ACCURATE DATA SET
Content just needs to be right
Any errors make room for inaccurate and incomplete usage of data.
Form eliminates ambiguities about the content
There’s a certain degree of discipline needed to create consistent and standardized forms in obtaining data.
10 TIPS ON CLEANING YOUR DATA
1. Read the data documentation
This will tell you what each component of the data file represents and help you identify what data is most relevant to your research interests and what data you can avoid.
2. Excel’s “Text-to-Columns” feature
Large data files in particular are often stored in “csv” (“comma-separated values”) format and can be imported into Excel using this handy feature.
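Outside Excel, the same splitting of comma-separated text into columns can be sketched in Python with the standard csv module. The sample text and column names below are hypothetical, made up only for illustration.

```python
import csv
import io

# Hypothetical sample standing in for the contents of a .csv file.
raw_text = "name,month,score\nAna,092017,88\nBen,102017,91\n"

# DictReader splits each line on commas and labels the values with the header row.
rows = list(csv.DictReader(io.StringIO(raw_text)))

for row in rows:
    print(row["name"], row["month"], row["score"])
```

Every value is read as text, so a code such as 092017 keeps its leading zero.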
3. VLOOKUP formula
Do you want to pull multiple values from one workbook into another? VLOOKUP has your back.
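For readers who think in code, an exact-match VLOOKUP behaves much like a dictionary lookup. The sketch below is only an analogy, not how Excel works internally; the tables, the item codes and the helper name vlookup are all hypothetical.

```python
# Hypothetical lookup table from "Workbook 1": item code -> price.
prices = {"A100": 2.50, "B200": 4.00, "C300": 1.25}

# Hypothetical column of codes from "Workbook 2" that need a price.
orders = ["B200", "A100", "Z999"]

def vlookup(key, table, default=None):
    """Return the value stored for key, or default when the key is absent."""
    return table.get(key, default)

looked_up = [vlookup(code, prices) for code in orders]
print(looked_up)
```

A code that is missing from the table comes back as None here, where Excel would show #N/A.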
4. COUNTIF formula
Are you looking for duplicate values in a range or checking whether values in one workbook are present in another workbook? COUNTIF counts the number of times a value occurs in a range.
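Both uses of COUNTIF described above can be sketched in Python with collections.Counter. The two lists are hypothetical stand-ins for a range in each workbook.

```python
from collections import Counter

# Hypothetical range of values from one workbook.
values = ["apple", "pear", "apple", "fig", "pear", "apple"]

# Count how many times each value occurs, like COUNTIF over the range.
counts = Counter(values)

# Any value counted more than once is a duplicate.
duplicates = sorted(v for v, n in counts.items() if n > 1)

# Check whether values from another (hypothetical) workbook appear in the range.
other_workbook = ["fig", "kiwi"]
present = {v: counts[v] > 0 for v in other_workbook}

print(duplicates, present)
```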
5. LEFT and RIGHT formulas
These are very useful when you need to parse out specific characters from the beginning or end of a value. For instance, if “092017” represents September 2017 but I only need the year, then I can use the RIGHT formula to collect the last four digits.
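The “092017” example can be reproduced with Python string slicing. The helper names left and right are hypothetical and only mirror the Excel formulas for readability.

```python
def left(text, n):
    """Return the first n characters, like Excel's LEFT."""
    return text[:n]

def right(text, n):
    """Return the last n characters, like Excel's RIGHT."""
    return text[-n:] if n > 0 else ""

value = "092017"         # September 2017, stored as text
month = left(value, 2)   # the first two characters
year = right(value, 4)   # the last four characters

print(month, year)
```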
6. TRIM formula
Frustrated by inexplicable extra spaces that follow the value you want? This formula “trims” those out for you.
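A rough Python equivalent of TRIM is sketched below. It removes leading and trailing spaces and collapses repeated spaces between words; note that, as an assumption of this sketch, it also treats tabs and line breaks as spaces, which is slightly broader than Excel's TRIM.

```python
def trim(text):
    """Strip outer whitespace and collapse inner runs of whitespace to one space."""
    return " ".join(text.split())

messy = "  New   York  "
clean = trim(messy)

print(repr(clean))
```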
7. CONCATENATE formula = “&”
Concatenate is a fancy word for linking two values together – you can use the formula for this or insert an ampersand between the two cell references, e.g. =A1&B1.
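In Python, the same linking of two values is done with the + operator or str.join. The variables a1 and b1 below are hypothetical stand-ins for the two cells.

```python
a1 = "New"   # stands in for cell A1
b1 = "York"  # stands in for cell B1

joined = a1 + b1                 # like =A1&B1
with_space = " ".join([a1, b1])  # like =A1&" "&B1

print(joined, with_space)
```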
8. Excel’s filters don’t get enough credit
Are you looking for multiple misspellings of New York? The filters help you quickly identify and correct them.
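A filter drop-down is useful because it lists every distinct value in a column, so misspellings stand out. A minimal Python sketch of the same idea follows; the column and the corrections table are hypothetical, and the corrections are written by hand after reading the distinct values.

```python
# Hypothetical column containing several spellings of New York.
cities = ["New York", "new york", "New Yrok", "Boston", "NewYork", "New York"]

# Step 1: list the distinct values, as a filter drop-down would.
distinct = sorted(set(cities))
print(distinct)

# Step 2: map each misspelling found in that list to the correct value.
corrections = {"new york": "New York", "New Yrok": "New York", "NewYork": "New York"}

# Step 3: apply the corrections, leaving all other values unchanged.
cleaned = [corrections.get(city, city) for city in cities]
print(cleaned)
```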
9. Nest your formulas
Find ways to combine formulas to reduce the number of steps you have to complete! For instance, do you need to look up values in Workbook 1 that are associated with a value’s last five characters in Workbook 2? Nest the RIGHT and VLOOKUP formulas to quickly get your answer.
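The nested RIGHT and VLOOKUP idea can be sketched in one Python expression: take the last five characters of each value, then look them up. The workbook contents and the "not found" placeholder are hypothetical.

```python
# Hypothetical "Workbook 1": five-character code -> region.
workbook1 = {"00123": "North", "00456": "South"}

# Hypothetical "Workbook 2": longer values whose last five characters are the code.
workbook2 = ["INV-2017-00123", "INV-2017-00456", "INV-2017-00999"]

# value[-5:] plays the role of RIGHT, and .get() plays the role of VLOOKUP.
regions = [workbook1.get(value[-5:], "not found") for value in workbook2]

print(regions)
```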
10. Work off a copy of the original data file
You don’t want to be in a situation where you have mistakenly deleted data values and then have to download the data file again. Keep the original version handy as a backup.
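If the cleaning is scripted, the copy can be made in code before anything else happens. This is a minimal sketch; the function name make_working_copy, the "_working" suffix and the file survey.csv are hypothetical choices, and the demo runs in a temporary folder so it touches no real files.

```python
import shutil
import tempfile
from pathlib import Path

def make_working_copy(original):
    """Copy the file next to the original with a '_working' suffix and return the new path."""
    original = Path(original)
    working = original.with_name(original.stem + "_working" + original.suffix)
    shutil.copy2(original, working)  # copy2 also preserves file timestamps
    return working

# Demo in a temporary folder with a hypothetical data file.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "survey.csv"
    source.write_text("id,answer\n1,yes\n")

    copy_path = make_working_copy(source)
    copy_path.write_text("id,answer\n")  # a mistaken deletion hits only the copy

    original_intact = source.read_text() == "id,answer\n1,yes\n"
    copy_name = copy_path.name

print(copy_name, original_intact)
```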
TOOLS ON CLEANING YOUR DATA
1. Drake
Drake is a simple-to-use, extensible, text-based data workflow tool that organizes command execution around data and its dependencies. Data processing steps are defined along with their inputs and outputs, and Drake automatically resolves their dependencies and calculates:
• which commands to execute (based on file timestamps)
• in what order to execute the commands (based on dependencies)
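The two calculations listed above can be illustrated with a small Python sketch. This is not Drake's own code or syntax, only the general idea under stated assumptions: the workflow, the file names and the modification times are hypothetical, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each output file lists the input files it is built from.
steps = {
    "clean.csv": ["raw.csv"],
    "summary.csv": ["clean.csv"],
    "report.txt": ["summary.csv", "notes.txt"],
    "codes.csv": ["lookup.csv"],
}

# Hypothetical modification times (a larger number means a newer file).
mtimes = {
    "raw.csv": 50, "clean.csv": 40, "summary.csv": 60,
    "notes.txt": 10, "report.txt": 70,
    "lookup.csv": 5, "codes.csv": 20,
}

# Order from dependencies: every input comes before the outputs built from it.
order = [name for name in TopologicalSorter(steps).static_order() if name in steps]

# Which steps to run, from timestamps: an output is stale if it is missing,
# older than one of its inputs, or built from an input that is itself stale.
stale = set()
to_run = []
for output in order:
    inputs = steps[output]
    out_time = mtimes.get(output)
    needs_run = (
        out_time is None
        or any(i in stale for i in inputs)
        or any(mtimes.get(i, 0) > out_time for i in inputs)
    )
    if needs_run:
        stale.add(output)
        to_run.append(output)

print(to_run)
```

Here raw.csv is newer than clean.csv, so clean.csv and everything downstream of it is rebuilt, while codes.csv is already up to date and is skipped.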
2. OpenRefine
OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
3. DataWrangler
Wrangler is an interactive tool for data cleaning and transformation. Spend less time formatting and more time analyzing your data. Wrangler allows interactive transformation of messy, real-world data into the data tables analysis tools expect. Export data for use in Excel, R, Tableau
4. DataCleaner
The heart of DataCleaner is a strong data profiling engine for discovering and analyzing the quality of your data. Find the patterns, missing values, character sets and other characteristics of your data values. Profiling is an essential activity of any Data Quality, Master Data Management or Data Governance program.
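To make “profiling” concrete, here is a small Python sketch that reports missing values, value patterns and non-ASCII characters for one column. It is not DataCleaner's engine or API; the function names, the pattern notation (A for a letter, 9 for a digit) and the sample column are all hypothetical.

```python
from collections import Counter

def pattern_of(value):
    """Replace letters with 'A' and digits with '9' so values of the same shape group together."""
    return "".join("A" if ch.isalpha() else "9" if ch.isdigit() else ch for ch in value)

def profile(values):
    """Summarize one column: size, missing values, value patterns, non-ASCII values."""
    present = [v for v in values if v is not None and v.strip() != ""]
    return {
        "total": len(values),
        "missing": len(values) - len(present),
        "patterns": Counter(pattern_of(v) for v in present),
        "non_ascii": [v for v in present if not v.isascii()],
    }

# Hypothetical column; "\u00eb" is an e with a diaeresis, outside the ASCII character set.
column = ["AB-123", "CD-456", "", None, "EF456", "Zo\u00eb-789"]
report = profile(column)

print(report["missing"], dict(report["patterns"]), report["non_ascii"])
```

The pattern counts make an odd value such as EF456 (no hyphen) easy to spot.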
5. Winpure Data Cleaning Tool
Data quality is an important contributor to the overall success of a project or campaign. Inaccurate data leads to wrong assumptions and analysis, and consequently to the failure of the project or campaign. Duplicate data can also cause all sorts of hassles, such as slow load-ups, accidental deletion, etc.
