
Gathering Data

Instructor: Mr. Abdul Basit


SOURCES OF INFORMATION

Primary Source
•Data is collected by researcher himself
•Data is gathered through questionnaire, interviews, observations etc.

Secondary Source
•Data collected, compiled or written by other researchers e.g. books, journals, newspapers
•Any reference must be acknowledged
Data Collection Methods
Secondary Data Analysis
 A type of research in which data collected by others
are reanalyzed.
Primary Data Analysis
 Original analysis of the data collected in a study.
Where do data come from?
 We’ve seen our data for this lab, all nice and
collated in a database – from:
 Insurance companies (claims, medications,
procedures, diagnoses, etc.)
 Firms (demographic data, productivity data, etc.)
Pre-Data Collection Steps
1. Clearly define the goals and objectives of the data
collection
2. Reach understanding and agreement on operational
definitions and methodology for the data collection plan
3. Ensure data collection (and measurement) repeatability,
reproducibility, accuracy, and stability
Secondary Data – Examples of
Sources
 County health departments
 Vital Statistics – birth, death certificates
 Hospital, clinic, school nurse records
 Private and foundation databases
 City and county governments
 Surveillance data from state government programs
 Federal agency statistics - Census, NIH, etc.
Secondary Data – Advantages
 It may be very accurate.
 Especially when a government agency has collected
the data, a great deal of time and money went into it,
so it is probably highly accurate.
Secondary Data – Limitations
 When was it collected? For how long?
 May be out of date for what you want to analyze.
 May not have been collected long enough for
detecting trends.
Secondary Data – Limitations
 Is the data set complete?
 There may be missing information on some
observations
 Unless such missing information is caught and
corrected for, analysis will be biased.
Secondary Data – Limitations
 Are there confounding problems?
 Sample selection bias?
 Source choice bias?
 In time series, did some observations drop out
over time?
Secondary Data – Limitations
 Are the data consistent/reliable?
 Did variables drop out over time?
 Did variables change in definition over time?
 E.g. number of years of education versus highest
degree obtained.
Secondary Data – Limitations
 Is the information exactly what you need?
 In some cases, may have to use “proxy variables” –
variables that may approximate something you really
wanted to measure. Are they reliable? Is there
correlation to what you actually want to measure?
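Whether a proxy variable tracks the quantity of interest can be checked on a small subsample where both were measured. The sketch below computes a Pearson correlation by hand. The variable names and values are invented for illustration, and Pearson's r is just one possible measure of association.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical validation subsample: the proxy and the variable we really want.
proxy_years_of_education = [10, 12, 12, 14, 16, 18]
target_skill_score = [48, 55, 60, 63, 72, 80]

r = pearson_r(proxy_years_of_education, target_skill_score)
print(f"correlation between proxy and target: {r:.3f}")
```

A strong correlation in the subsample supports using the proxy, but it does not show that the relationship holds in the full data set.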
DATA COLLECTION METHODS & DATA
SOURCES
 Data collection method: a detailed plan of
procedures that aims to gather data for the
purpose of answering a research question
 Data source: the “who” (or “what”) that supplies
the data
 Firsthand data: data provided by people who have
experienced some phenomenon directly
 Secondhand data: an indirect account of a
phenomenon (e.g., case notes, bystander)
DATA COLLECTION AND THE
RESEARCH PROCESS
 Data collection supplies the critical link
between theory and practice
 Data collection is a consideration for each
phase of the research process
 Phase 1: Problem area and research question
 Phase 2: Research design
 Phase 3: Data analysis
 Phase 4: Writing the report
Selecting a Problem Area and Research
Question
 Rethinking the research question from the
data collection point of view adds depth and
dimension to the underlying intention of the
research question
 After the research problem is selected and the
research question formulated, consider
 different data sources available to the study
 different data collection methods suitable
Formulating a Research Design
 Thinking about “research design” from the
data collection point of view increases the
likelihood that the data collection method will
fit well with the study context and sample
 The research design specifies when, where,
and how often data are to be collected
Analyzing Data
 Thinking about “data analysis” from the data
collection point of view will produce results
that have greater clarity
 All data collected should have an obvious
place in the data analysis
Writing the Report
 Thinking about “writing a research report”
from the data collection point of view brings
clarity to the purpose of data collection
 Consider who is to be the expected audience of
the report
Criteria for Selecting a Data Collection
Method
 Eight practical criteria
 Size of study
 Scope of study
 Program participation
 Worker cooperation
 Intrusion into the lives of research participants
 Resources
 Time
 Previous research findings
Decision-making grid

Data Collection Methods
_____________________________________________________________________________________________
                           Survey        Observation   Secondary     Content       Existing
                           Research                    Analysis      Analysis      Statistics
General Criteria           (Chapter 17)  (Chapter 14)  (Chapter 18)  (Chapter 19)  (Chapter 20)
_____________________________________________________________________________________________
1. Size                    +             0             +             +             +
2. Scope                   +             –             –             –             –
3. Program participation   +             0             +             +             +
4. Worker cooperation      +             –             +             +             +
5. Intrusion to clients    –             –             +             +             +
6. Resources               +             –             +             +             +
7. Time                    +             –             +             +             +
8. Previous research       +             0             –             –             –
Previous Research Studies
 Learn from existing research studies
 Which data collection methods worked best to
study the problem
 Expand upon earlier research by trying different
data collection approaches
Selection of a Data Collection
Method
 Create a decision-making grid to choose the
best data collection method
 List the criteria for selection
 List possible data collection methods
 With the research question in mind, assess each
data collection method according to the set
criteria
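The decision-making grid shown earlier can be tallied in a few lines. The sketch below copies the ratings from that grid and scores + as 1, 0 as 0, and – as -1. Treating every criterion as equally important and simply summing the ratings is an assumption of this sketch. The slides do not say how the ratings should be combined.

```python
# Ratings copied from the decision-making grid; "-" stands for the grid's minus sign.
METHODS = [
    "Survey research",
    "Observation",
    "Secondary analysis",
    "Content analysis",
    "Existing statistics",
]

GRID = {
    "Size": "+ 0 + + +",
    "Scope": "+ - - - -",
    "Program participation": "+ 0 + + +",
    "Worker cooperation": "+ - + + +",
    "Intrusion to clients": "- - + + +",
    "Resources": "+ - + + +",
    "Time": "+ - + + +",
    "Previous research": "+ 0 - - -",
}

# Assumption: equal weights, so a method's total is a plain sum of its ratings.
SCORE = {"+": 1, "0": 0, "-": -1}

totals = {method: 0 for method in METHODS}
for criterion, row in GRID.items():
    for method, mark in zip(METHODS, row.split()):
        totals[method] += SCORE[mark]

best = max(totals, key=totals.get)
for method, total in totals.items():
    print(f"{method}: {total:+d}")
print("Highest total:", best)
```

In a real study the researcher would usually weight the criteria according to the research question rather than rely on an unweighted sum.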
Characteristics of quantitative and
qualitative research
Key features of
Qualitative Research
1. Collection primarily of qualitative rather than quantitative data
Qualitative methods emphasize observations about natural behavior and
artifacts that capture social life as it is experienced by the participants
rather than the numerical representations of the categories predetermined
by the researcher.

2. Exploratory research question.
Qualitative researchers typically begin their projects seeking to discover
what people think and how they act, and why, in some social setting.

3. Inductive reasoning (reasoning that moves from more specific kinds
of statements to more general ones)
Only after immersing themselves in many observations do qualitative
researchers try to develop general principles to account for their observations.
Key features of
Qualitative Research
4. A focus on human subjectivity.
Qualitative methods emphasize the meanings that participants
attach to events and that people give to their lives.

5. Reflexive research design.
In qualitative methods, the research design may need to be
reconsidered or modified in response to new developments, or to
changes in some other component as research progresses.

6. Sensitivity to the subjective role of the researcher.
Qualitative researchers should be sensitive to the role they play in
the process of data collection: the “researcher as an instrument”.
Ways to collect qualitative data
1. Participant Observation
2. Individual Interview
a) Semi-structured interview
b) Unstructured interview
3. Textual Analysis
4. Focus Group Discussion
Quantitative data
 Data are used to classify groups.
 Examples: numbers, quantity, prevalence,
incidence.
 Variables can be classified as physical
(population, infrastructure), social (poverty,
slums), spatial (land use, proximity) etc.
Quantitative data – example
Quantitative analysis...
No of events: 219
No of people killed: 191,344
Average killed per year: 6,598
No of people affected: 317,454,534
Average affected per year: 10,946,708
Economic Damage (US$ X 1,000): 16,802,500
Quantitative analysis...
[Figure: Distance of migration destination. Axes: distance (100 Km to 20 Km) against migrants (0 to 50)]
Methods Used To Collect Primary Source Data

1. Interviews
2. Questionnaires
3. Survey
4. Experimentation
5. Case Study
6. Observation

However, for a small-scale study, the most commonly used
methods are interviews, survey questionnaires and observations.
Observations
 Observe verbal & non-verbal communication,
surrounding atmosphere, culture & situation
 Need to keep meticulous records of the observations
 Can be done through discussions, observations of habits,
rituals, review of documentation, experiments
Steps To An Effective Observation
1. Determine what needs to be observed
(Plan, prepare checklist, how to record data)
2. Select your participants
(Random/Selected)
3. Conduct the observation
(venue, duration, recording materials, take photographs)
4. Compile data collected
5. Analyze and interpret data collected

What kind of data should be
collected?
 The information you collect is the evidence you will
have available to answer the evaluation questions.
 Poor evidence is information which cannot be
trusted, is limited, or simply is not relevant to the
questions asked.
 Good evidence is information that comes from
reliable sources
and through trustworthy methods that address important
questions
Developing a data collection plan
 Identify types of data needed for the study
 Select the types of measures to measure each variable
 Select and/or develop instruments
 Secure written permission to use each instrument
 Pilot test researcher-developed instrument & revise plan
 Develop data collection forms and procedures
 Implement data collection plan
Identify types of data needed for the
study
1-Testing hypothesis or answering research questions
2-Describe characteristics of sample
Demographics - age, gender, ethnic origin, education
background, marital status
Health-related variables - health habits, diet, exercise,
illness, length of illness,
3- Control for extraneous variables
Measure as many as possible
Intrinsic and extrinsic factors (variables)
May want to see if main effects also apply to
Select and/or develop instruments
Identify existing instruments
Fit with conceptual definition of variable
Quality of instrument - validity & reliability
Validity:

Will the information collection methods you have designed produce information that
measures what you say you are measuring? Be sure that the information you
collect is relevant to the evaluation questions you are intending to answer.
Reliability:

Will the evaluation process you have designed consistently measure what you want
it to measure? If you use multiple interviews, settings, or observers, will they
consistently measure the same thing each time? If you design an instrument, will
people interpret your questions the same way each time?
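Consistency between two observers can be put into a number. The sketch below uses Cohen's kappa, which compares the observed agreement with the agreement expected by chance. The slides do not name a particular statistic, so the choice of kappa is an assumption, and the observer codes are invented.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Share of items on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own code frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two observers to the same eight sessions.
observer_1 = ["on-task", "on-task", "off-task", "on-task",
              "off-task", "on-task", "off-task", "on-task"]
observer_2 = ["on-task", "off-task", "off-task", "on-task",
              "off-task", "on-task", "on-task", "on-task"]

kappa = cohens_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance. Values in between need to be judged against the standards of the field.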
Cont.
 Resources - costs
 Instrument use & scoring
 Data collectors' salary
 Subject compensation
 Availability & familiarity
 Researcher expertise
 Equipment
 Norms - comparability
 Established norms for instrument - provide comparison group
 Replication - use same instruments
 Population appropriateness
 Reading level & writing ability
 Cultural, ethnic origin
 Gender bias
 Translations for non-English-speaking subjects
Secure written permission to use each
instrument
 Look for employer and write to author at
place of employment
 Find most recent publication to identify
current employer
 Request a copy of the instrument and
information on scoring, procedures, validity,
and reliability
Develop data collection forms and
procedures
 Forms
 Screening potential subjects
 Consent & assent forms
 Explanations to potential subjects for people
referring subjects
 Advertisements to recruit subjects
 Records for tracking contacts with subjects
 Mailing lists and logs for receipts

Cont.
 Procedures
 Specific conditions for data collection
 Specific procedures and sequencing for
experiments
 Standard information for subject's
questions
 Procedures for risks if they occur
 List of all materials needed
 Interview guidelines, instruments,
observation directions
Implement data collection plan
 Select who will collect data
 Researcher or neutral agent
 Staff
 Experience
 Background similar to subject
 Personality - pleasant, sociable, non-
judgmental, non-threatening
 Available to collect data for the entire study
period
Things to Consider
 All data collection methods are capable of gathering quantitative and
qualitative data, although some may be better suited towards one task or
the other

 There is no single data collection method that can guarantee credible data

 All data collection methods can be consciously manipulated

 All data collection methods can be ‘contaminated’ by unrecognized bias

 All data collection methods require conscious deliberation on the part of
the researcher to ensure credibility

Meaning of Data Quality (1)
 Generally, you have a problem if the data doesn’t
mean what you think it does, or should
 Data not up to spec: garbage in, glitches, etc.
 You don’t understand the spec: complexity, lack of
metadata.
 Many sources and manifestations
 As we will see.
 Data quality problems are expensive and pervasive
 DQ problems cost hundreds of billions of dollars each year.
 Resolving data quality problems is often the biggest effort
in a data mining study.
Example
T.Das|97336o8327|24.95|Y|-|0.0|1000
Ted J.|973-360-8779|2000|N|M|NY|1000

 Can we interpret the data?
 What do the fields mean?
 What is the key? The measures?
 Data glitches
 Typos, multiple formats, missing / default values
 Metadata and domain expertise
 Field three is Revenue. In dollars or cents?
 Field seven is Usage. Is it censored?
 Field four is a censored flag. How to handle censored data?
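The two example records can be split on the pipe character and checked for glitches. In the sketch below only the names revenue, censored, and usage follow the slide. The other field names are placeholders. Reading the second field as a 10-digit phone number is also an assumption.

```python
import re

# Field names: revenue (3), censored (4) and usage (7) follow the slide;
# the remaining names are placeholders because the slide does not define them.
FIELDS = ["name", "phone", "revenue", "censored", "field5", "field6", "usage"]

RAW = [
    "T.Das|97336o8327|24.95|Y|-|0.0|1000",
    "Ted J.|973-360-8779|2000|N|M|NY|1000",
]


def parse(line):
    """Split one pipe-delimited record into a dict keyed by FIELDS."""
    values = line.split("|")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def phone_problem(phone):
    """Return a description of the glitch, or None if the value looks valid.

    Assumes a 10-digit number once common separators are removed.
    """
    stripped = re.sub(r"[-\s().]", "", phone)
    if not stripped.isdigit():
        return "non-digit characters"
    if len(stripped) != 10:
        return "unexpected length"
    return None


for line in RAW:
    record = parse(line)
    problem = phone_problem(record["phone"])
    print(record["name"], "->", problem or "phone looks valid")
```

The check flags the typo in the first record and accepts the differently formatted number in the second. Questions such as whether revenue is in dollars or cents cannot be answered by code alone and need metadata or domain expertise.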
Data Glitches
 Systemic changes to data which are external to the
recorded process.
 Changes in data layout / data types
 Integer becomes string, fields swap positions, etc.
 Changes in scale / format
 Dollars vs. euros
 Temporary reversion to defaults
 Failure of a processing step
 Missing and default values
 Application programs do not handle NULL values well …
 Gaps in time series
 Especially when records represent incremental changes.
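Gaps in a time series can be found by comparing consecutive timestamps against the expected spacing. The sketch below assumes one record per day and uses invented dates. Both are assumptions for illustration.

```python
from datetime import date, timedelta


def find_gaps(days, step=timedelta(days=1)):
    """Return (first_missing, last_missing) pairs for each gap in the series."""
    ordered = sorted(set(days))
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > step:
            gaps.append((earlier + step, later - step))
    return gaps


# Hypothetical daily records with two days missing.
observed = [
    date(2024, 1, 1),
    date(2024, 1, 2),
    date(2024, 1, 3),
    date(2024, 1, 6),
    date(2024, 1, 7),
]

gaps = find_gaps(observed)
for first_missing, last_missing in gaps:
    print(f"missing from {first_missing} to {last_missing}")
```

When records represent incremental changes, every gap found this way means the running totals after it cannot be trusted until the missing records are recovered.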
