21MST18 M2
While the degree of impact from faulty data collection may vary by discipline and the nature
of the investigation, there is the potential to cause disproportionate harm when these research
results are used to support public policy recommendations.
Data types
There are two main approaches to collecting data about a problem, person or phenomenon,
viz., primary data and secondary data.
1. Primary Data –
In this type, the information must be collected by the researcher in person. Sources used
in this approach are called primary sources.
In summary, primary sources provide first-hand data, whereas secondary sources provide
second-hand data.
Primary research can be done through various methods, but this type of research is often
based on the principles of the scientific method. This means that in the process of doing
primary research, researchers develop research questions or hypotheses, collect and analyze
measurable, empirical data, and draw evidence-based conclusions.
2. Secondary Data - Sometimes information required is already available and needs only to
be extracted. Information gathered using this approach is said to be collected from
secondary sources.
Researchers have plenty of options to explore when it comes to doing secondary research.
The following sources can assist researchers in doing secondary research:
The Internet makes secondary research significantly easier for researchers today. Many
government agencies and educational institutions, for instance, make their data available
online so researchers can easily download information for their use. There are even web
applications for creating word clouds to visualize the frequency of keywords for topics in
databases.
When using data from secondary sources you need to be careful as there may be certain
problems with the availability, format and quality of data. The extent of these problems
varies from source to source. While using such data some issues you should keep in mind are:
1. Validity and reliability - The validity and reliability of information may vary from
source to source. For example, data collected through a census may be more valid than
data collected through personal diaries.
2. Personal bias - The information obtained from personal diaries, magazines and
newspapers may suffer from personal bias, as these writers are likely to exhibit less
rigour and objectivity than one would expect in a research report.
3. Availability of data - It is common for beginning researchers to assume that the
required data will be available. But you cannot and should not make this assumption.
Therefore, it is important to make sure that the required data is available before you
proceed further with your study.
4. Format - Before deciding to use the data from secondary sources, it is equally
important to make sure that the data are available in the required format. For example,
you might need to analyse the age in the categories 23-33, 34-48 etc., but in your
sources age may be categorized differently, e.g., 21-24, 25-29 etc.
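The format problem above can be checked mechanically before committing to a source. The following is a minimal sketch, assuming age categories are represented as inclusive (low, high) pairs; the function name `bins_compatible` is hypothetical, and only the first two categories of each scheme are used because those are the only ones the text gives.

```python
def bins_compatible(source_bins, target_bins):
    """Return True if every source bin lies entirely inside one target bin.

    Bins are inclusive (low, high) pairs. If a source bin straddles two
    target bins, or falls outside them, its counts cannot be reassigned
    to the target categories without access to the raw ages.
    """
    for lo, hi in source_bins:
        if not any(t_lo <= lo and hi <= t_hi for t_lo, t_hi in target_bins):
            return False
    return True


# Categories taken from the example in the text.
target_bins = [(23, 33), (34, 48)]   # categories needed for the analysis
source_bins = [(21, 24), (25, 29)]   # categories used by the secondary source

print(bins_compatible(source_bins, target_bins))  # False: 21-24 straddles the lower edge of 23-33
```

Here the source category 21-24 overlaps the required category 23-33 only partly, so the published counts cannot be regrouped, which is the situation the text warns about.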
A schedule is a formalized set of questions, statements and spaces for answers, provided to
the enumerators who ask questions to the respondents and note down the answers.
The important points of difference between the questionnaire and schedule are as under:
Developing a questionnaire
The design of questionnaires and schedules is similar, and the flowchart below gives an
overall idea of formulating them.
1. Deciding on the information required - It should be noted that one does not start by
writing questions. The first step is to decide 'what are the things one needs to know
from the respondent in order to meet the survey's objectives?' These, as has been
indicated in the opening chapter of this textbook, should appear in the research brief
and the research proposal. One may already have an idea about the kind of
information to be collected, but additional help can be obtained from secondary data,
previous rapid rural appraisals and exploratory research. In respect of secondary data,
the researcher should be aware of what work has been done on the same or similar
problems in the past, what factors have not yet been examined, and how the present
survey questionnaire can build on what has already been discovered. Further, a small
number of preliminary informal interviews with target respondents will give a
glimpse of reality that may help clarify ideas about what information is required.
2. Define the target respondents - At the outset, the researcher must define the
population about which he/she wishes to generalise from the sample data to be
collected. For example, in marketing research, researchers often have to decide
whether they should cover only existing users of the generic product type or whether
to also include non-users. Secondly, researchers have to draw up a sampling frame.
Thirdly, in designing the questionnaire we must take into account factors such as the
age, education, etc. of the target respondents.
3. Choose the method(s) of reaching target respondents - It may seem strange to be
suggesting that the method of reaching the intended respondents should constitute part
of the questionnaire design process. However, a moment's reflection is sufficient to
conclude that the method of contact will influence not only the questions the
researcher is able to ask but the phrasing of those questions. The main methods
available in survey research are:
1. personal interviews
2. group or focus interviews
3. mailed questionnaires
4. telephone interviews.
Within this region, the first two mentioned are used much more extensively than the
second pair. However, each has its advantages and disadvantages. A general rule is
that the more sensitive or personal the information, the more personal the form of data
collection should be.
4. Decide on question content - Researchers must always be prepared to ask, "Is this
question really needed?" The temptation to include questions without critically
evaluating their contribution to the achievement of the research objectives, as they are
specified in the research proposal, is surprisingly strong. No question should be included
unless the data it gives rise to is directly of use in testing one or more of the hypotheses
established during the research design.
There are only two occasions when seemingly "redundant" questions might be included:
Opening questions that are easy to answer and which are not perceived as being
"threatening", and/or are perceived as being interesting, can greatly assist in
gaining the respondent's involvement in the survey and help to establish a rapport.
This, however, should not be an approach that should be overly used. It is almost
always the case that questions which are of use in testing hypotheses can also serve
the same functions.
"Dummy" questions can disguise the purpose of the survey and/or the sponsorship
of a study. For example, if a manufacturer wanted to find out whether its
distributors were giving the consumers or end-users of its products a reasonable
level of service, the researcher would want to disguise the fact that the distributors'
service level was being investigated. If he/she did not, then rumours would abound
that there was something wrong with the distributor.
5. Develop the question wording - Survey questions can be classified into three forms,
i.e. closed, open-ended and open response-option questions. So far only the first of
these, i.e. closed questions has been discussed.
Closed questioning has a number of important advantages:
o It provides the respondent with an easy method of indicating his answer - he does
not have to think about how to articulate his answer.
o It 'prompts' the respondent so that the respondent has to rely less on memory in
answering a question.
o Responses can be easily classified, making analysis very straightforward.
o It permits the respondent to specify the answer categories most suitable for their
purposes.
Even after the researcher has proceeded along the lines suggested, the draft questionnaire
is a product evolved by one or two minds only. Until it has actually been used in
interviews and with respondents, it is impossible to say whether it is going to achieve the
desired results. For this reason, it is necessary to pre-test the questionnaire before it is
used in a full-scale survey, to identify any mistakes that need correcting.
The pre-test should determine:
o whether the questions as they are worded will achieve the desired results
o whether the questions have been placed in the best order
o whether the questions are understood by all classes of respondent
o whether additional or specifying questions are needed or whether some
questions should be eliminated
o whether the instructions to interviewers are adequate.
Usually, a small number of respondents are selected for the pre-test. The respondents
selected for the pilot survey should be broadly representative of the type of respondents
to be interviewed in the main survey.
If the questionnaire has been subjected to a thorough pilot test, the questions and the
questionnaire will have evolved into their final form. All that remains to be
done is the mechanical process of laying out and setting up the questionnaire in its final
form. This will involve grouping and sequencing questions into an appropriate order,
numbering questions, and inserting interviewer instructions.
Sampling Methods
Types of Probability Sampling - There are several sampling methods that fall under
probability sampling. In each method, those who are within the sample frame have some
chance of being selected to participate in a study. Four of the common types of probability
sampling are:
a. Simple Random Sample: This is the most basic form of probability sampling. Each member
of a population is assigned an identifier such as a number, and those selected to be
within the sample are picked at random, often using an automated software program.
Advantages
Lack of Bias - The use of simple random sampling removes all hints of bias, or at
least it should. Because individuals who make up the subset of the larger group are
chosen at random, each individual in the large population set has the same
probability of being selected. In most cases, this creates a balanced subset that carries
the greatest potential for representing the larger group as a whole.
Simplicity - There are no special skills involved in using this method, which can
result in a fairly reliable outcome. This is in contrast to methods such as stratified
random sampling, which involves dividing larger groups into smaller subgroups called
strata, with members assigned to these groups based on attributes they share. In simple
random sampling, as mentioned, individuals in the subset are selected randomly and there
are no additional steps.
Limitations
Difficulty Accessing Lists of the Full Population - An accurate statistical measure of
a large population can only be obtained in simple random sampling when a full list of
the entire population to be studied is available. Think of a list of students at a
university or a group of employees at a specific company. The problem lies in the
accessibility of these lists. As such, getting access to the whole list can present
challenges. Some universities or colleges may not want to provide a complete list of
students or faculty for research. Similarly, specific companies may not be willing or
able to hand over information about employee groups due to privacy policies.
Time Consuming - When a full list of a larger population is not available, individuals
attempting to conduct simple random sampling must gather information from other
sources. If publicly available, smaller subset lists can be used to recreate a full list of
a larger population, but this strategy takes time to complete. Organizations that keep
data on students, employees, and individual consumers often impose lengthy
retrieval processes that can stall a researcher's ability to obtain the most accurate
information on the entire population set.
Costs - In addition to the time it takes to gather information from various sources, the
process may cost a company or individual a substantial amount of capital. Retrieving
a full list of a population or smaller subset lists from a third-party data provider may
require payment each time data is provided. If the sample is not large enough to
represent the views of the entire population during the first round of simple random
sampling, purchasing additional lists or databases to avoid a sampling error can be
prohibitive.
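The idea of recreating a full population list from smaller subset lists can be illustrated with a short sketch. It assumes each member carries a unique identifier that is consistent across lists; the function name `build_sampling_frame` and the student IDs are hypothetical and used only for illustration.

```python
def build_sampling_frame(*subset_lists):
    """Merge several partial lists into one frame, dropping duplicates.

    Members appearing in more than one list are kept once, in the order
    they are first seen, so that no one has a higher chance of selection.
    """
    frame = []
    seen = set()
    for subset in subset_lists:
        for member in subset:
            if member not in seen:
                seen.add(member)
                frame.append(member)
    return frame


# Hypothetical departmental lists with overlapping students.
engineering = ["S001", "S002", "S003"]
science = ["S003", "S004"]
arts = ["S005", "S001"]

frame = build_sampling_frame(engineering, science, arts)
print(frame)  # ['S001', 'S002', 'S003', 'S004', 'S005']
```

Removing duplicates matters because a person listed twice would otherwise be twice as likely to be drawn, which undermines the equal-probability property of simple random sampling.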
Let us take an example to understand this sampling technique. The population of the US
alone is 330 million. It is practically impossible to send a survey to every individual to gather
information. Use probability sampling to collect data, even if you collect it from a smaller
population.
For example, an organization has 500,000 employees sitting at different geographic locations.
The organization wishes to make certain amendments in its human resource policy, but
before they roll out the change, they want to know if the employees will be happy with the
change or not. However, it is a tedious task to reach out to all 500,000 employees. This is
where probability sampling comes in handy. A sample is chosen from the larger population,
i.e., from the 500,000 employees. This sample will represent the population. A survey is
then deployed to the sample.
From the responses received, management will now be able to know whether employees in
that organization are happy or not about the amendment.
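The employee example can be sketched as a simple random draw. This is a minimal illustration using Python's standard `random` module; it assumes employees are identified by the numbers 1 to 500,000, and the sample size of 1,000 and the seed are arbitrary choices for the sketch, not values from the text.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw a simple random sample of identifiers without replacement.

    Every identifier from 1 to population_size has the same probability
    of being selected. A seed makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), sample_size)


# 500,000 employees as in the example; 1,000 is an assumed sample size.
employee_ids = simple_random_sample(500_000, 1_000, seed=42)

print(len(employee_ids))          # 1000
print(len(set(employee_ids)))     # 1000, i.e. no employee is drawn twice
```

The survey would then be sent only to the employees whose identifiers appear in `employee_ids`.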
o Choose your population of interest carefully: Think carefully about whose opinions
should be collected, and define the population so that those people can be included in
the sample.
o Determine a suitable sample frame: To collect accurate data, your frame should consist
only of members of your population of interest and no one from outside it.
o Select your sample and start your survey: It can sometimes be challenging to find the
right sample and determine a suitable sample frame. Even if all factors are in your favor,
there still might be unforeseen issues like cost factor, quality of respondents, and
quickness to respond. Getting a sample to respond to a probability survey accurately might
be difficult but not impossible.
But, in most cases, drawing a probability sample will save time, money, and a lot of
frustration.
1. When you want to reduce the sampling bias: This sampling method is used when bias
has to be kept to a minimum. How researchers select their sample largely determines the
quality of their findings. Probability sampling leads to higher-quality findings because
it provides an unbiased representation of the population.
Non-probability sampling
Non-probability sampling is when a sample is created through a non-random process. This
could include a researcher sending a survey link to their friends or stopping people on the
street. This type of sampling would also include any targeted research that intentionally
samples from specific lists such as aid beneficiaries, or participants in a specific training
course. Non-probability samples are often used during the exploratory stage of a research
project, and in qualitative research, which is more subjective than quantitative research, but
are also used for research with specific target populations in mind, such as farmers that grow
maize.
a. Convenience Sample: As its name implies, this method uses people who are convenient
to access to complete a study. This could include friends, people walking down a street,
or those enrolled in a university course. Convenience sampling is quick and easy, but
will not yield results that can be applied to a broader population.
Advantages
1. Simplicity of sampling and the ease of research
2. Helpful for pilot studies and for hypothesis generation
3. Data collection can be completed in a short time
4. Cheaper to implement than alternative sampling methods
Disadvantages
1. Highly vulnerable to selection bias and influences beyond the control of the
researcher
2. High level of sampling error
3. Studies that use convenience sampling have little credibility for the reasons above
b. Snowball Sample: A snowball sample works by recruiting some sample members who
in turn recruit people they know to join a sample. This method works well for reaching
very specific populations who are likely to know others who meet the selection criteria.
Advantages
1. The ability to recruit hidden populations
2. The possibility to collect primary data in a cost-effective manner
3. Studies with snowball sampling can be completed in a short duration of time
4. Very little planning is required to start the primary data collection process
Disadvantages
1. Oversampling a particular network of peers can lead to bias
2. Respondents may be hesitant to provide names of peers and asking them to do so
may raise ethical concerns
3. There is no guarantee about the representativeness of samples. It is not possible to
determine the actual pattern of distribution of the population.
4. It is not possible to determine the sampling error and make statistical inferences
from the sample to the population due to the absence of a random selection of
samples.
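The recruitment process behind a snowball sample can be sketched as a walk over a referral network. This is a minimal illustration, assuming referrals are known in advance as a dictionary mapping each person to the peers they would name; in a real study referrals are discovered during fieldwork. The names `snowball_sample` and `referrals`, and the people A to H, are hypothetical.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Recruit seed members, then the peers they refer, wave by wave.

    referrals maps a person to the list of peers that person names.
    Recruitment stops when max_size is reached or referrals run out.
    """
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for peer in referrals.get(person, []):
            if peer not in seen:      # each person is recruited at most once
                seen.add(peer)
                queue.append(peer)
    return sample


referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "E": ["F"],
    "G": ["H"],  # G and H are not connected to the seed's network
}

sample = snowball_sample(referrals, seeds=["A"], max_size=10)
print(sample)  # ['A', 'B', 'C', 'D', 'E', 'F']
```

G and H are never reached because nobody in the seed's network names them, which illustrates the first and third disadvantages listed above: the sample reflects one network of peers and carries no guarantee of representativeness.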
In an organization, for studying the career goals of 500 employees, technically, the sample
selected should have proportionate numbers of males and females. This means there should
be 250 males and 250 females. Since this is unlikely, the researcher selects the groups or
strata using quota sampling.
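Quota sampling as described here can be sketched as filling fixed quotas from whoever is available, in the order they are encountered, with no random selection. The sketch below assumes quotas of 250 males and 250 females as in the example; the respondent stream and the name `quota_sample` are hypothetical.

```python
def quota_sample(respondents, quotas, key):
    """Accept respondents in arrival order until every quota is filled.

    quotas maps a group label to the number of people wanted from it.
    Selection is non-random: it depends only on who turns up first.
    """
    counts = {group: 0 for group in quotas}
    selected = []
    for person in respondents:
        group = person[key]
        if group in counts and counts[group] < quotas[group]:
            selected.append(person)
            counts[group] += 1
        if counts == quotas:          # all quotas met, stop recruiting
            break
    return selected


# Hypothetical stream of available employees: two males for every female.
respondents = [
    {"id": i, "gender": "female" if i % 3 == 0 else "male"}
    for i in range(1, 1001)
]

selected = quota_sample(respondents, {"male": 250, "female": 250}, key="gender")
males = sum(1 for p in selected if p["gender"] == "male")
females = sum(1 for p in selected if p["gender"] == "female")
print(len(selected), males, females)  # 500 250 250
```

Even though males outnumber females in the available stream, the quotas force equal group sizes in the sample; because selection within each group is not random, sampling error still cannot be estimated.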
Researchers also use snowball sampling to conduct research involving a particular illness
in patients or a rare disease. Researchers can ask subjects to refer other subjects
suffering from the same ailment, forming a subjective sample with which to carry out the study.