Unit 3: Research Design for Data Acquisition

M Krishna
Presentation Agenda
➢ Introduction
➢ Measurement Design
➢ Primary Types of Measurement Scales
➢ Validity and Reliability Measurement
➢ Sample Design
➢ Non-Probability Sampling
➢ Probability Sampling
➢ Data Collection Procedure
➢ Source of secondary Data
➢ Primary data Collection Methods
➢ Validity and Reliability of data Collection Procedures
➢ Measurement error
There are four methods of acquiring data:
✓ Collecting new data;
✓ Converting/transforming legacy data;
✓ Sharing/exchanging data; and
✓ Purchasing data.
This includes automated collection
(e.g., of sensor-derived data), the
manual recording of empirical
observations, and obtaining existing
data from other sources.
✓ Business Needs: The first thing to always consider is the business need - why are
these data required? What will be done with them?
✓ Business Rules: A business rule identifies the constraints under which the
business operates.
✓ Data Standards: Any Government, USGS, or industry standards that apply will
need consideration.
✓ Accuracy Requirements: Among the most familiar accuracy requirements is the
locational accuracy for spatial data, but there are other accuracy requirements that
may need to be considered as well.
✓ Cost: Cost is always a consideration. Sometimes it's cheaper to buy than to collect.
✓ Format: In what format are the data needed: spatial data, photos, flat files, Excel
files, or XML files?
Collection Methods
1. Questionnaire: a research instrument consisting of a series of questions and
other prompts for the purpose of gathering information from respondents.
A questionnaire typically addresses five basic questions: how, why, who, when, and what.
✓ Structured Questionnaire.
✓ Unstructured Questionnaire.
✓ Open ended Questionnaire.
✓ Close ended Questionnaire.
2. Interviews: A method of data collection that involves two or more people
exchanging information through a series of questions and answers.

3. Observation Methods: Studying behavior, interactions, and phenomena in
their natural settings.

4. Physical Measures: Direct quantification of various physical properties,
characteristics, or variables.

5. Attitude Scale: A tool prepared for the purpose of measuring people's attitude
towards an issue.

6. Psychological Tests: Standardized instruments used to assess various
psychological constructs and characteristics in individuals.

7. Q-sorts: Used in psychology and the social sciences to study the subjective
opinions, attitudes, or perceptions of individuals.

8. Delphi Technique: A structured and iterative data collection method used to
gather expert opinions and insights on a specific topic.

9. Visual Analogue Scale (VAS): A psychometric tool used to measure subjective or
psychological attributes, such as pain intensity, emotional states, satisfaction,
or other subjective experiences.

10. Pre-existing Data: Data that were collected for a specific purpose or study
before the current research project.
➢ Measurement design refers to the systematic process of planning and
developing appropriate methods and tools to measure specific variables in a
research study.
➢ A well-designed measurement process is essential for obtaining accurate and
reliable data to answer research questions effectively.
➢ Measurement is the process of assigning numbers to variables. It includes
counting, ranking, and comparing objects or events.
➢ Some qualitative studies gather data in narrative form. Numbers are not
associated with these data, so they are not included in the concept of
measurement.
1. Nominal: Attributes are only names; this is the weakest level (e.g., gender,
religious affiliation, marital status, and job).
2. Ordinal: A type of categorical data that has a natural order or ranking
between its categories, but the differences between the categories are not
quantifiable or evenly spaced.
3. Interval: A quantitative scale that has meaningful and equal intervals
between its values but no true zero point.
4. Ratio: A measurement scale that possesses all the properties of an interval
scale, with the additional feature of having a true zero point.

Note: Data can always be converted from one level to a lower level of measurement, but not to a higher
level. E.g. Interval and ratio data can be converted to nominal or ordinal data.
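
The note above can be illustrated with a small sketch. The ages and the cut-off values below are made-up assumptions chosen only for illustration: ratio data (age in years) are collapsed first into ordered categories (ordinal) and then into unordered labels (nominal). The reverse conversion is not possible, because the exact ages cannot be recovered from the labels.

```python
# Hypothetical ratio-level data: ages in years (true zero, equal intervals).
ages = [15, 22, 37, 45, 68, 71]


def to_ordinal(age):
    """Collapse a ratio value into an ordered category (cut-offs are illustrative)."""
    if age < 18:
        return "young"
    if age < 60:
        return "adult"
    return "senior"


def to_nominal(age):
    """Collapse further into an unordered label: working age (18-59) or not."""
    return "working-age" if 18 <= age < 60 else "not working-age"


ordinal = [to_ordinal(a) for a in ages]
nominal = [to_nominal(a) for a in ages]

print(ordinal)
print(nominal)
```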
Primary Data may be collected through:
➢ Experiments
➢ Surveys (sample surveys or census surveys)
➢ Observation
➢ Interviews
➢ Questionnaires
➢ Schedules
✓ Data from the field are collected through observation, with the observer
personally going to the field.

✓ Observation may be defined as systematic viewing, coupled with
consideration of the seen phenomenon.

✓ Structured Observation: The observation is characterized by a careful
definition of the units to be observed, the style of recording the observed
information, standardized conditions of observation, and the selection of
related data of observation.

✓ Unstructured Observation: The observation takes place without the above
characteristics.

✓ Participant Observation: The observer is a member of the group being
observed.

✓ Non-Participant Observation: The observer observes people without
giving any information to them.

✓ Uncontrolled Observation: The observation takes place in natural
conditions. It is done to get a spontaneous picture of life and persons.

✓ Controlled Observation: The observation takes place according to
pre-arranged plans and experimental procedure, generally in a laboratory
under controlled conditions.
Advantages of Observation
✓ Produces large quantities of data
✓ All data obtained from observations are usable
✓ The observation technique can be stopped or begun at any time
✓ Relatively inexpensive
Disadvantages of Observation
✓ Interviewing selected subjects may provide more information, more
economically, than waiting for events to occur spontaneously
✓ It is an expensive method
✓ Limited information
✓ Extensive training is needed
➢ The interview method of collecting data involves the presentation of oral-verbal
stimuli and replies in terms of oral-verbal responses.
➢ The interviewer asks the respondent questions aimed at getting the information
required for the study.

➢ Structured Interviews: A set of pre-decided questions is used.

➢ Unstructured Interviews: No system of pre-determined questions is followed.

➢ Focused Interviews: Attention is focused on the given experience of the
respondent and its possible effects.

➢ Clinical Interviews: Concerned with broad underlying feelings or
motivations, or with the course of an individual's life experience.

➢ Group Interviews: A group of 6 to 8 individuals is interviewed.

➢ Qualitative and Quantitative Interviews: Divided on the basis of subject
matter, i.e., whether qualitative or quantitative.

➢ Individual Interviews: The interviewer meets a single person and interviews
that person.

➢ Selection Interviews: Done for the selection of people for certain jobs.

Advantages of Interviews
➢ More information at greater depth can be obtained
➢ Resistance may be overcome by a skilled interviewer
➢ Personal information can be obtained
Disadvantages of Interviews
➢ It is an expensive method
➢ Interviewer bias
➢ Respondent bias
➢ Time consuming
✓ A questionnaire is sent (by post or by mail) to the persons concerned
with a request to answer the questions and return the questionnaire.
✓ A questionnaire consists of a number of questions printed in a definite
order on a form.
Essentials of Good Questionnaire
➢ Should be short and simple
➢ Follow a sequence of questions from easy to difficult one
➢ Technical terms should be avoided
➢ Should provide adequate space for answers in questionnaire
➢ Directions regarding the filling of the questionnaire should be given
➢ Physical appearance: quality of paper, colour
➢ Sequence must be clear
✓ Open-ended questions: These give respondents the ability to respond
in their own words.
✓ Closed-ended or fixed-alternative questions: These allow
respondents to choose one of the given alternatives.
Advantages of Questionnaires
✓ Low cost, even when the universe is large
✓ Free from interviewer bias
✓ Respondents have adequate time to think through their answers
✓ Respondents who are not easily approachable can also be reached conveniently
✓ Large samples can be used
Disadvantages of Questionnaires
✓ Time consuming
✓ The respondents need to be educated and cooperative
✓ This method is slow
✓ Possibility of unclear replies
➢ The schedule method is very similar to the questionnaire method.
➢ The main difference is that a schedule is filled in by an enumerator who is
specially appointed for the purpose.
➢ The enumerator goes to the respondents, asks them the questions in the
order listed, and records the responses in the space provided.
➢ The enumerator must be trained in administering the schedule.
Questionnaire vs. Schedule
Questionnaire:
➢ Generally sent through mail, with no further assistance from the sender
➢ Cheaper method
➢ Non-response is high
➢ Incomplete and wrong information is more common
➢ Depends on the quality of the questionnaire
Schedule:
➢ Filled in by the enumerator or research worker
➢ Costly; requires field workers
➢ Non-response is low
➢ Information is relatively more correct and complete
➢ Depends on the honesty of the enumerator
Reliability is concerned with the extent to which an experiment, test, or
measurement procedure yields consistent results on repeated trials. Reliability
is the degree to which a measure is free from random errors.

Validity is defined as the ability of an instrument to measure what the
researcher intends to measure.
Reliability Measurement

✓ Test-Retest Reliability: Test-retest reliability assesses the consistency of a
measurement over time.
✓ Internal Consistency Reliability: Internal consistency reliability measures
the extent to which items within a measurement instrument are consistent
with one another.
✓ Inter-Rater Reliability: Inter-rater reliability examines the agreement
between different raters or observers when using the same measurement
instrument.
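
As a rough illustration of how two of these reliability types are often quantified, the sketch below computes a Pearson correlation between two administrations of the same test (a common test-retest index) and Cronbach's alpha (a common internal-consistency index). Neither statistic is named in the slides, so both are assumptions about what one might use, and all scores are made up.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def cronbach_alpha(items):
    """items: one list of respondent scores per questionnaire item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var = sum(variance(col) for col in items)    # sum of item variances
    return k / (k - 1) * (1 - item_var / variance(totals))


# Made-up scores for five respondents.
test = [10, 12, 15, 18, 20]      # first administration
retest = [11, 12, 14, 19, 21]    # same test, later administration
items = [
    [3, 4, 4, 5, 2],             # item 1
    [3, 5, 4, 4, 2],             # item 2
    [2, 4, 5, 5, 1],             # item 3
]

print("test-retest r:", round(pearson_r(test, retest), 3))
print("Cronbach's alpha:", round(cronbach_alpha(items), 3))
```

Values close to 1 suggest consistent measurement; what counts as acceptable depends on the field and the purpose of the instrument.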
Validity Measurement
✓ Content Validity: Content validity refers to the degree to which the items or questions
in a measurement instrument adequately represent the entire domain or content of
the construct being measured
✓ Criterion Validity: Criterion validity assesses how well a measurement instrument
correlates with an external criterion or gold standard that is already established to
measure the same construct.
✓ Concurrent Validity: Concurrent validity is a type of criterion validity that measures
how well a new measurement instrument compares to an existing instrument that
measures the same construct.
✓ Predictive Validity: Predictive validity examines the extent to which a measurement
instrument predicts or forecasts future outcomes related to the construct being
measured.
✓ Construct Validity: Construct validity evaluates the degree to which a measurement
instrument accurately assesses the theoretical construct it intends to measure.
✓ Sampling is the process of selecting a subset of individuals or items from a
larger population to represent and make inferences about the entire
population.
✓ In research, sampling is a critical step as it helps researchers gather data
efficiently, economically, and with an acceptable level of accuracy.
✓ The population refers to the entire group of individuals, items, or events
that the researcher is interested in studying and generalizing the findings to.
✓ A sample is a smaller, manageable subset of the population that is selected
for data collection.
✓ The sampling frame is a list or collection of all the members of the
population from which the sample will be drawn.
Sampling methods are ways to select a sample of data from a
given population (every individual in the whole group).
It is unrealistic to collect data from the entire population because it:
✓ is too big
✓ takes too much time
✓ costs too much money

We therefore take an appropriately sized sample as a way of representing the
population.

Sampling Process
➢ Defining the population of concern.
➢ Specifying a sampling frame, a set of items or events possible
to measure.
➢ Specifying a sampling method for selecting items or events
from the frame.
➢ Determining the sample size.
➢ Implementing the sampling plan.
➢ Sampling and data collection
✓ The principle of randomization is a fundamental concept in research.

✓ It involves the random assignment of participants or subjects to different
groups or conditions in a study.

✓ Randomization is used to minimize bias and ensure that the groups being
compared are as similar as possible at the beginning of the study, except
for the specific intervention or treatment being tested.

✓ This allows researchers to draw valid and reliable conclusions about
cause-and-effect relationships.
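
A minimal sketch of random assignment, assuming two groups named "treatment" and "control" and twenty hypothetical participant IDs (none of these names come from the slides). The participants are shuffled and then dealt out in turn, so group sizes stay balanced.

```python
import random


def randomly_assign(participants, groups=("treatment", "control"), seed=None):
    """Shuffle participants, then deal them out to the groups in turn."""
    rng = random.Random(seed)          # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    assignment = {g: [] for g in groups}
    for i, person in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(person)
    return assignment


# Hypothetical participant IDs P01 ... P20.
participants = [f"P{i:02d}" for i in range(1, 21)]
assignment = randomly_assign(participants, seed=42)

for group, members in assignment.items():
    print(group, members)
```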
✓ Probability sampling refers to the selection of a sample from a population, when
this selection is based on the principle of randomization, that is, random
selection or chance.
✓ In probability sampling, each member of the population has a known and
non-zero chance of being selected in the sample.
✓ Because every individual or item in the population has an equal or known
probability of being included, the sample is more likely to be representative
of the entire population.
✓ In simple random sampling, every member of the population has an
equal chance of being selected in the sample. This is typically achieved
using random number generators or drawing lots to select the sample.
Advantages
✓ Easiest method and commonly used
✓ Does not require any additional information on the frame (such as
gender, geographical area, etc.)
✓ Analysis of data is reasonably easy and has a sound mathematical basis
Disadvantages
✓ Makes no use of auxiliary information
✓ Can be expensive and unfeasible for large populations (whose members must be
identified and reached) or if personal interviews are required
✓ May not be representative of the whole population
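
Simple random sampling can be sketched with Python's standard `random.sample`, which draws without replacement so that every unit in the frame has the same chance of selection. The frame of 100 numbered units is a made-up assumption.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; each unit is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(frame), n)


# Hypothetical sampling frame: units numbered 1 to 100.
frame = list(range(1, 101))
srs = simple_random_sample(frame, 10, seed=1)
print(sorted(srs))
```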
Stratified Random Sampling involves dividing the population into mutually
exclusive subgroups (strata) based on certain characteristics (e.g., age, gender).
Then, a random sample is drawn from each stratum proportionate to its size in
the population. This method ensures representation of various subgroups in
the sample.
Advantages
✓ Ensures an adequate sample size for subgroups in the population of interest
✓ Almost certainly produces a gain in precision in the estimates for the whole
population, because a heterogeneous population is split into fairly
homogeneous strata
Disadvantages
✓ Problematic if strata are not clearly defined
✓ Analysis is quite complicated
✓ Requires more effort
✓ Needs a larger sample size
✓ If strata overlap, there is a chance of bias
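
A sketch of proportionate stratified sampling under stated assumptions: the frame, the stratum labels "F" and "M", and the 10% sampling fraction are all hypothetical. Units are grouped by stratum, and the same fraction is drawn at random from each group.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)   # group units by stratum
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)      # proportionate allocation
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical frame: 60 units in stratum "F" and 40 in stratum "M".
frame = [("F", i) for i in range(60)] + [("M", i) for i in range(40)]
strat = stratified_sample(frame, stratum_of=lambda u: u[0], fraction=0.1, seed=1)

print(sum(1 for u in strat if u[0] == "F"), "units from F")
print(sum(1 for u in strat if u[0] == "M"), "units from M")
```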
In Systematic Sampling, the researcher selects every nth member from a list
of the population. The first individual is randomly selected, and then
subsequent selections are made at regular intervals.
Advantages
✓ Easier to draw without mistakes
✓ More precise than simple random sampling (SRS), as the sample is more
evenly spread over the population
✓ Easy to use
Disadvantages
✓ If the list has a periodic arrangement, the sample collected may not
be an accurate representation of the entire population
✓ Greater risk of over-representing some groups
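
Systematic sampling can be sketched as follows, assuming a hypothetical frame of 100 units whose size is an exact multiple of the sample size. The interval k is the frame size divided by the sample size, the start is chosen at random within the first interval, and every k-th unit is taken after that.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every k-th unit from the list."""
    rng = random.Random(seed)
    k = len(frame) // n          # sampling interval
    start = rng.randrange(k)     # random start within the first interval
    return [frame[start + i * k] for i in range(n)]


# Hypothetical frame: units numbered 1 to 100, so k = 10 for n = 10.
frame = list(range(1, 101))
sys_sample = systematic_sample(frame, 10, seed=3)
print(sys_sample)
```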
Cluster Sampling involves dividing the population into clusters or groups
(e.g., schools, households, villages). A random sample of clusters is selected,
and then all members within the selected clusters are included in the
sample.
Advantages
✓ Reduced field costs
✓ Applicable where no complete list of units is available
✓ Easier to apply over a larger geographical area
✓ Saves travelling time
Disadvantages
✓ Clusters may not be representative of the whole population, and units
within a cluster may be too alike
✓ Analysis is more complicated than for SRS
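
A sketch of one-stage cluster sampling, assuming six hypothetical villages of five households each. Whole clusters are drawn at random, and every member of a chosen cluster enters the sample.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and include all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members


# Hypothetical population: villages A to F, five households in each.
clusters = {f"village_{c}": [f"{c}{i}" for i in range(5)] for c in "ABCDEF"}
chosen, members = cluster_sample(clusters, 2, seed=7)

print(chosen)
print(members)
```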
Multistage Sampling: Multistage sampling combines two or more sampling
methods. For example, a researcher might use stratified sampling to select
clusters and then use simple random sampling within each cluster.
Advantages of Probability Sampling
➢ Unbiased and representative samples that allow for generalization to the
entire population
➢ Statistical procedures can be used to estimate sampling errors and make
inferences
➢ Straightforward and transparent sampling methods

Disadvantages of Probability Sampling
➢ Probability sampling can be more complex and time-consuming than
non-probability sampling methods.
➢ It also requires an accurate sampling frame (list of all population
members) to ensure proper selection.
➢ Non-probability sampling is a type of sampling method in which not all
members of the population have a known, non-zero chance of being
selected in the sample.

➢ Unlike probability sampling, non-probability sampling methods do not
rely on random selection.

➢ Non-probability sampling is often used when it is challenging to obtain a
random sample or when the research objectives do not require strict
representativeness.
➢ Convenience Sampling: Convenience sampling involves selecting
participants who are readily available or easy to reach.
➢ This method is quick and convenient but may introduce biases, as it does
not ensure representation of the entire population.
➢ Purposive Sampling: Purposive sampling involves deliberately selecting
participants based on specific characteristics or criteria relevant to the
research question.
➢ This method is useful when the researcher wants to study a particular
subgroup or when expertise is required for the study.
➢ Snowball Sampling: Snowball sampling is commonly used for hard-to-
reach populations or when there is no sampling frame available.
➢ It involves identifying initial participants who then refer additional
participants, creating a "snowball" effect.
➢ Quota Sampling: Quota sampling involves dividing the population into
subgroups based on certain characteristics and then selecting participants
from each subgroup until a predetermined quota is reached. Quota
sampling attempts to ensure proportional representation of subgroups.
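
Quota sampling can be sketched as below. The respondent names, the grouping by gender, and the quota sizes are made-up assumptions. People are taken in the order they happen to be encountered, with no random selection, until each subgroup's quota is full.

```python
def quota_sample(stream, group_of, quotas):
    """Accept people in arrival order until every subgroup quota is filled."""
    remaining = dict(quotas)
    selected = []
    for person in stream:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            selected.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):   # all quotas filled
            break
    return selected


# Hypothetical stream of people in the order the researcher meets them.
stream = [
    ("Asha", "F"), ("Ravi", "M"), ("Kiran", "M"), ("Meena", "F"),
    ("Sunil", "M"), ("Divya", "F"), ("Anil", "M"),
]
quota = quota_sample(stream, group_of=lambda p: p[1], quotas={"F": 2, "M": 2})
print([name for name, _ in quota])
```

Because selection depends on who is encountered first, the result has the right subgroup proportions but is not a random sample.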
Advantages
➢ Possibility to reflect descriptive comments about the sample
➢ Cost-effectiveness and time-effectiveness
➢ Effective when it is unfeasible or impractical to conduct probability
sampling
Disadvantages
➢ Possible unknown proportion, i.e. lack of representation of the entire
population
➢ Lower level of generalization of research findings
➢ Difficulties in estimating sample variability and identifying bias
Problem 1: A coin is thrown 3 times. What is the probability that at least one head is
obtained?

Sol: Sample space = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}

Total number of outcomes = 2 × 2 × 2 = 8. Favourable cases = 7 (all except TTT)

P (A) = 7/8

Alternatively:
P (at least one head) = 1 – P (no head) = 1 – (1/8) = 7/8

Problem 2: Find the probability of getting a numbered card when a card is drawn from the
pack of 52 cards.

Sol: Total Cards = 52. Numbered Cards = (2, 3, 4, 5, 6, 7, 8, 9, 10) 9 from each suit 4 × 9 = 36
P (E) = 36/52 = 9/13

Problem 3: There are 5 green and 7 red balls. Two balls are selected one by one without
replacement. Find the probability that the first is green and the second is red.
Sol: P (G) × P (R | G) = (5/12) × (7/11) = 35/132
Problem 4: What is the probability of getting a sum of 7 when two dice are thrown?
Sol: Total number of outcomes = 6 × 6 = 36.

Favourable cases = (1, 6) (6, 1) (2, 5) (5, 2) (3, 4) (4, 3), i.e. 6 ways.

P (A) = 6/36 = 1/6


Problem 5: 1 card is drawn at random from the pack of 52 cards.

(i) Find the Probability that it is an honour card.
(ii) It is a face card.

Sol: (i) honour cards = (A, J, Q, K), 4 cards from each suit = 4 × 4 = 16

P (honour card) = 16/52 = 4/13
(ii) face cards = (J,Q,K) 3 cards from each suit = 3 × 4 = 12 Cards.
P (face Card) = 12/52 = 3/13
Problem 6: Two cards are drawn from the pack of 52 cards. Find the probability that
both are diamonds, or both are kings.
Sol: Total no. of ways = 52C2 = 1326
Case I: Both are diamonds = 13C2 = 78
Case II: Both are kings = 4C2 = 6
The two cases cannot occur together (there is only one king of diamonds), so the counts add:
P (both are diamonds, or both are kings) = (13C2 + 4C2) / 52C2 = (78 + 6)/1326 = 84/1326 = 14/221

Problem 7: Three dice are rolled together. What is the probability of getting at least
one '4'?

Sol: Total number of outcomes = 6 × 6 × 6 = 216.

Probability of getting the number '4' at least once
= 1 – (Probability of getting no 4) = 1 – (5/6) × (5/6) × (5/6) = 1 – 125/216 = 91/216
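
The answers to the problems above can be cross-checked by counting outcomes directly. The sketch below is only a verification aid (the variable names are illustrative); it uses exact fractions so that results such as 7/8 are compared without rounding.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Problem 1: at least one head in three coin tosses.
tosses = list(product("HT", repeat=3))
p1 = Fraction(sum("H" in t for t in tosses), len(tosses))

# Problem 2: a numbered card (2 to 10) from a 52-card pack.
p2 = Fraction(4 * 9, 52)

# Problem 3: green first, then red, without replacement (5 green, 7 red).
p3 = Fraction(5, 12) * Fraction(7, 11)

# Problem 4: sum of 7 with two dice.
rolls = list(product(range(1, 7), repeat=2))
p4 = Fraction(sum(a + b == 7 for a, b in rolls), len(rolls))

# Problem 6: both diamonds or both kings (mutually exclusive cases).
p6 = Fraction(comb(13, 2) + comb(4, 2), comb(52, 2))

# Problem 7: at least one 4 with three dice.
triples = list(product(range(1, 7), repeat=3))
p7 = Fraction(sum(4 in t for t in triples), len(triples))

for label, p in [("P1", p1), ("P2", p2), ("P3", p3), ("P4", p4), ("P6", p6), ("P7", p7)]:
    print(label, p)
```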
✓ Data collection begins only after a research problem has been defined and
the research design finalised.
✓ Primary Data: Data collected for the first time, hence original in character.
✓ Secondary Data: Data that have already been collected by someone else
and have already been passed through a statistical process.

✓ E.g., primary data collection methods:
1. Survey
1.1 Personal Interview
1.1.1 Direct Personal Interview
1.1.2 Indirect Oral Interview
1.2 Telephone
1.3 Mailed Questionnaire Method
1.4 Online
1.5 In-house self-administered
1.6 Focus Group Discussion
1.7 Schedules sent through enumerators
2. Observation
2.1 Participant Observation
2.2 Non-participant Observation
2.3 Mechanical Observation
3. Experiments
1. Structured Interview: Usually used in quantitative research. A standard set of
questions is asked of all respondents. The interviewer asks the questions
exactly as they appear. The choice of answers is often pre-determined
(closed-ended).

2. Unstructured Interview: Neither the questions nor the answers are
pre-determined. Questions can be changed or adapted to meet the respondent's
understanding. It does not offer a restricted, pre-set range of answers.

3. Semi-Structured Interview: Includes both open-ended and closed-ended
questions. Involves both giving and receiving information.
1. Participant Observation: The observer takes part in the situation he or she
observes. It mostly takes place in community settings.

2. Non-participant Observation: The observer does not participate in the
situation and collects data by observing behaviour without interacting with
the participants.

3. Mechanical Observation: People or situations are observed in a
closed setting through mechanical devices.
1. Internal Sources
1.1 In house publication
1.2 Letters, Records
1.3 Databases
2. External Sources
2.1 Ministries and agencies of government
2.2 Reports of international bodies and foreign governments
2.3 Websites, magazines, journals, newspapers, social media
2.4 Associations
2.5 Research Groups & Companies
2.6 University and Colleges
Secondary Data
Advantages:
1. Less expensive
2. Easily accessible
3. Immediately available
Disadvantages:
1. May not be applicable
2. Potentially unreliable
3. Frequently outdated
Primary Data
Advantages:
1. Applicable and usable
2. Accurate and reliable
3. Up to date
Disadvantages:
1. Expensive
2. Not as readily accessible
3. Not available immediately
✓ Each item contributes equally to the measure of that construct, implying all
items are of equal importance
✓ Can have several types of response categories

To enhance the validity and reliability of data collection procedures, researchers
should:
✓ Clearly define the research objectives and variables of interest.
✓ Use established and validated data collection instruments whenever possible.
✓ Conduct pilot testing to identify and address any issues with the instruments.
✓ Train data collectors to ensure consistency and accuracy in data collection.
✓ Consider blinding or masking techniques to minimize bias in data collection.
✓ Use randomization and appropriate sampling methods to minimize selection
bias.
✓ Implement quality control measures to check for data accuracy and
completeness.
✓ Questionnaire should be short and simple
✓ Size of the questionnaire should be kept to the minimum
✓ Questions should proceed in Logical sequence
✓ Personal questions should be left to the end
✓ Technical terms should be avoided
✓ Questions may be dichotomous (yes/no), multiple choice, or open-ended
✓ Questions that are difficult to interpret should be avoided
✓ Control questions should be placed in the questionnaire to facilitate
cross-checking of the reliability of information
✓ Questions affecting sentiments should be avoided
✓ Adequate space should be provided in the questionnaire
✓ Brief directions should be given at necessary places
✓ Ensure good-quality paper and colour to draw attention
✓ Be clear about the various aspects of the problem to be dealt with
✓ Keep in mind the nature of information sought, sample respondents and kind
of analysis intended
✓ A rough draft of the questionnaire should be prepared first, giving due
thought to the appropriate sequence of questions
✓ Re-examine and revise the rough draft of the questionnaire
✓ Technical defects must be minutely scrutinized and removed
✓ A pilot study should be undertaken for pre-testing the questionnaire
✓ The questionnaire should be edited as per the feedback from the pilot survey
✓ Provide straightforward directions so that the questions are clearly understood
by the respondents
✓ Considerations:
▪ Nature, scope and objectives of study
▪ Level of precision required
▪ Availability of funds and involvement of time
▪ Level of efforts and expertise
✓ Data Processing and Analysis :
• Processing implies editing, coding, classification and tabulation of
collected data to help further analysis
• Analysis refers to computation of certain measures along with searching
for patterns of relationship that exist among data groups
• Analysis involves organizing data for answering the research questions
✓ Data Classification:
▪ Arrangement of data in groups as per common characteristics
▪ Geographical (area-wise)
▪ Chronological (basis of time)
▪ Qualitative (Attributes)
▪ Quantitative (Magnitude)
Origin: Type of error
1. Researcher: Wrong question, inappropriate analysis, misinterpretation,
experimenter expectation
2. Sample: Wrong target, wrong method, wrong people
3. Interviewer: Interviewer bias, interpretation, carelessness
4. Instrument:
a) Scale: Rounding off, cutting off
b) Questionnaire: Positional, ambiguity, evoked set, construct-question
incongruence
5. Respondent: Consistency / inconsistency, ego / humility, fatigue,
lack of commitment, random
Thank You for Your Kind Attention

