
SOURCES OF DATA: SAMPLING PROCEDURES

Objectives

1. To differentiate primary and secondary sources of data and give examples.
2. To explain the three methods of collecting data.
3. To illustrate the sampling techniques with examples.
4. To calculate the sample size from a given population.
5. To appreciate the role of sampling in research.

Primary Sources of Educational Data (Best & Kahn, 2003)

1. Official Records and Other Documentary Materials
These include records and reports of legislative bodies, city superintendents,
principals, presidents, deans, department heads, educational committees, licenses,
certificates, report cards, extra-curricular events, pictures, drawings, maps, letters,
diaries and recordings.

2. Oral Testimony
This category includes interviews with administrators, teachers and other school
employees, students and relatives, school patrons, or lay citizens, and members of
governing bodies.

3. Relics
This category includes buildings, furniture, teaching materials, equipment, murals,
decorative pictures, textbooks, examinations, and samples of student work.

Secondary Sources of Educational Data

1. Reports of a person who relates the testimony of an actual witness of, or participant in,
an event.
2. Accounts by a writer who was not on the scene of the event but merely
reported what a person who was there said or wrote.
3. Textbooks and encyclopedias.

Data are collected through the following methods:

1. Subjective Method. This can be done by asking people questions either directly or
indirectly. The direct or interview method is a method of asking questions orally to
respondents; hence, there is a face-to-face interaction between the
interviewer and the interviewee, the purpose of which is to get more in-depth information
about perceptions, insights, attitudes, experiences or beliefs. Although it is useful for
gaining insight and context on a topic, it is time-consuming, expensive and more
susceptible to interviewer bias than other data collection methods.
The indirect or questionnaire method is a method of asking questions with the use of
a questionnaire. A questionnaire is a set of questions for gathering information from
respondents, who are required to fill out the forms themselves. Questionnaires can be
hand-carried to the respondents, or sent by mail and collected later or returned in a stamped,
addressed envelope. They can also be sent and retrieved electronically (by email).

2. Objective Method. This can be done by actual observation or experimentation.
The observation method is a method of collecting data that involves listening and
watching very carefully in order to discover particular information about the characteristics
of a population. The experiment method makes use of a controlled study in which one
attempts to determine cause-and-effect relationships. The study is controlled in the
sense that the researcher controls how subjects are assigned to groups and which
treatments each group receives.

3. Use of Existing Records. This method makes use of data that were previously gathered
by another person and were compiled and made available by institutions or agencies.

Population and Sample

Population is the totality or aggregate of a group of persons, things, or phenomena that
we want to describe and draw conclusions about. Some populations are small while others are large.
When a population is small, complete enumeration may be used for a study or investigation, and any
numeric quantity derived from it that describes the population is called a parameter.

When the population is very large, collecting information from every unit may be impossible, or too
time-consuming and costly. A sample, which is any part or subset of a population, is
less expensive and faster to collect, and because there is less data to handle, it is easier to check and
improve its accuracy and quality.

Many statistical studies are conducted on samples, which are used as bases to describe or make inferences
about a population. Any characteristic that describes the sample is called a statistic. It is a quantity,
calculated from a sample of data, used to estimate a parameter. For example, the average of the data in a
sample is used to give information about the overall average in the population from which that sample was
drawn. Parameters are often estimated since their value is generally unknown, especially when the
population is so large that it is impossible or impractical to obtain measurements for all units.
Parameters are normally represented by Greek letters; the population mean and population
variance, for example, are represented by µ and σ^2, respectively. On the other hand, the sample mean
and variance, two of the most common statistics derived from samples, are denoted by the symbols x̄ and
s^2, respectively.

Sampling Techniques

Samples can be drawn from a population using either of the two general types of sampling: Probability
Sampling and Nonprobability Sampling.

Probability Sampling is one in which every unit in the population has a known chance (greater than zero) of
being selected in the sample. The chances need not be equal for all units; they are equal only in
some designs, such as simple random sampling. Probability sampling includes: simple random sampling, systematic sampling,
stratified sampling, multistage sampling and cluster sampling.

1. Simple Random Sampling is a method of sampling wherein all units in the population are given
equal chance to be included in the sample. It is most appropriate when the entire population from
which the sample is taken is homogeneous. This is accomplished by drawing lots or by using a
table of random numbers.

2. Systematic Sampling is a method of selecting sample members from a larger population
according to a random start drawn from the first k units, after which every kth unit in the
population is selected. The sampling interval k, sometimes called the "skip", is calculated as:

k = N / n

where n is the sample size, and N is the population size.

Here are the steps to be followed in order to achieve a systematic sample:

• number the units in the population from 1 to N
• decide on the n (sample size) that you want or need
• k = N/n = the interval size
• randomly select an integer from 1 to k
• then take every kth unit thereafter.

If your subjects are in a random order then systematic sampling is equivalent to random
sampling.
3. Stratified Sampling is a probability sampling method wherein the entire population is divided
into different subgroups or strata, then the final subjects are randomly selected proportionally and
independently from the different strata. Stratified sampling techniques are generally used when the
population is heterogeneous, or dissimilar, where certain homogeneous, or similar, sub-populations
can be isolated (strata). It is designed to organize the population into homogeneous subsets before
sampling. If sampling is done using simple random sampling, the method is called stratified
random sampling and if it is done using systematic sampling, it is called stratified systematic
sampling. The most common strata used in stratified random sampling are age, gender,
socioeconomic status, religion, nationality and educational attainment.

4. Cluster Sampling is a sampling method used when "natural" groupings are evident in a statistical
population. The entire population is divided into clusters, a random sample of clusters is selected,
and all units within the selected clusters are included in the sample.

5. Multistage Sampling is a sampling process involving several stages in which units at each
subsequent stage are subsampled from previously selected larger units. At the first stage, large
groups or clusters of population units are selected. These clusters are designed to contain more
units than are required for a final sample. At the second stage, units are sampled from the selected
clusters to derive the final sample. If more than two stages are used, the process of selecting "sub-
clusters" within clusters continues until the final sample is achieved. The same practical
considerations apply to multi-stage sampling as to the cluster sampling. Multi-stage sampling is
generally used when it is costly or impossible to form a list of all the units in the target population.
This is our most sophisticated sampling strategy and it is often used in large epidemiological
studies. To obtain a representative national sample, researchers may select zip codes at random
from each state. Within these zip codes, streets are randomly selected.

Example 1: Three-stage household survey.

Stage 1: Municipalities or Towns
Municipalities or towns are sampled from a province.

Stage 2: Barangays
Barangays are selected from within the selected municipalities or towns.

Stage 3: Houses
Houses are selected from within the selected barangays.

Example 2: Two-stage plant sampling of a certain species

Stage 1: Plots: Take a sample of plots

Stage 2: Plants : Take a sample of plants within each selected plot
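
The probability sampling techniques described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a prescribed procedure: the function names, the population of units numbered 1 to 100, the odd/even strata, and the plot and plant labels are all hypothetical, and the systematic sampler assumes that n is not larger than N.

```python
import random
from collections import defaultdict

def simple_random_sample(units, n, rng):
    """Every unit has the same chance of selection; drawn without replacement."""
    return rng.sample(units, n)

def systematic_sample(units, n, rng):
    """Random start in the first k units, then every k-th unit (k = N // n)."""
    k = len(units) // n
    start = rng.randint(1, k)                 # 1-based position, 1..k inclusive
    return [units[i - 1] for i in range(start, len(units) + 1, k)][:n]

def stratified_sample(units, stratum_of, fraction, rng):
    """Simple random sample of the same fraction within each stratum."""
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample

def two_stage_sample(clusters, n_clusters, n_units, rng):
    """Stage 1: sample clusters. Stage 2: sample units within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_units) for c in chosen}

rng = random.Random(2018)                     # fixed seed, for reproducibility only
population = list(range(1, 101))              # hypothetical units numbered 1..100

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strat = stratified_sample(population, lambda u: "odd" if u % 2 else "even", 0.2, rng)

# Hypothetical plots and plants, mirroring Example 2 (two-stage plant sampling).
plots = {f"plot{p}": [f"plot{p}-plant{i}" for i in range(1, 21)] for p in range(1, 11)}
two_stage = two_stage_sample(plots, n_clusters=3, n_units=5, rng=rng)

print(len(srs), sys_sample[1] - sys_sample[0], len(strat),
      sorted(len(v) for v in two_stage.values()))
# prints: 10 10 20 [5, 5, 5]
```

Which units are drawn depends on the seed, but the sample sizes and the systematic interval do not. Including every unit of each chosen cluster, instead of subsampling in stage 2, would turn the last function into one-stage cluster sampling.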

Nonprobability sampling is any sampling method where some elements of the population have
no chance to be included in the sample.

1. Quota sampling is a nonprobability method of sampling wherein the researcher deliberately
sets the proportions of levels or strata within the sample. This is generally done to ensure the
inclusion of a particular segment of the population. The proportions may or may not differ
dramatically from the actual proportion in the population. The researcher sets a quota,
independent of population characteristics.
Example: A researcher is interested in the attitudes of members of different religions
towards the death penalty. In Iowa a random sample might miss Muslims (because there are
not many in that state). To be sure of their inclusion, a researcher could set a quota of 3%
Muslim for the sample. However, the sample will no longer be representative of the actual
proportions in the population. This may limit generalizing to the state population. But the
quota will guarantee that the views of Muslims are represented in the survey.
2. Purposive sampling is a method of sampling wherein subjects are selected in a deliberate
and non-random fashion to achieve a certain goal or to serve a very specific need or purpose.
A researcher may have a specific group in mind, but it may not be possible to specify the
population: its members would not all be known, and access will be difficult. The researcher will
attempt to zero in on the target group, interviewing whoever is available. In purposive
sampling, we sample with a purpose in mind. The researcher chooses the sample based on
who they think would be appropriate for the study. For example, suppose you are interested in
studying the cognitive processing speed of young adults who have suffered closed head brain
injuries in automobile accidents. This would be a difficult population to find. Hence, purposive
sampling may be the only option.

3. Snowball sampling is a technique for developing a sample where existing study subjects
recruit future subjects from among their acquaintances. Thus the sample group appears to
grow like a rolling snowball. As the sample builds up, you gain enough data to use for your
research. This sampling technique is often used in hidden populations which are difficult
for researchers to access. An example of such a population is drug users. Because sample
members are not selected from a sampling frame, snowball samples are subject to numerous
biases. For example, people who have many friends are more likely to identify more
respondents than those who have very few friends.

4. Convenience sampling is a non-probability sampling technique where subjects are selected
because of their convenient accessibility and proximity to the researcher. A convenience
sample is simply one where the units that are selected for inclusion in the sample are the
easiest to access.

Sample Calculation:
(http://www.research-advisors.com/tools/SampleSize.htm)

There are various formulas for calculating the required sample size, depending on whether the
data collected are categorical or quantitative (e.g., to estimate a proportion or a
mean). These formulas require knowledge of the variance or proportion in the population and a
determination of the maximum desirable error, as well as the acceptable Type I error risk (e.g.,
confidence level).

It is possible to use one of them to construct a table that suggests the optimal sample size
given a population size, a specific margin of error, and a desired confidence level. This can help
researchers avoid the formulas altogether. The table in

http://www.research-advisors.com/tools/SampleSize.htm

shown on page 11, presents the results of one set of these calculations. It may be used to determine
the appropriate sample size for almost any study.

Many researchers (and research texts) suggest that the first column within the table should
suffice (Confidence Level = 95%, Margin of Error = 5%). To use these values, simply find the
size of the population in the leftmost column (use the next highest value if your exact population
size is not listed). The value in the next column is the sample size that is required to generate a
Margin of Error of ±5% for any population proportion.

However, the resulting 10% interval (±5%) may be considered unreasonably large. Should more precision be
required (i.e., a smaller, more useful Margin of Error) or greater confidence desired (99%, i.e., α = 0.01), the other
columns of the table should be employed.
Thus, if you have 5000 customers and you want to sample a sufficient number to generate a
95% confidence interval that predicts the proportion who would be repeat customers within plus or
minus 2.5%, you would need responses from a (random) sample of 1176 of all your customers.
As you can see, using the table is much simpler than employing a formula that requires
knowledge of the variance or proportion in the population and a determination of the maximum
desirable error, as well as the acceptable Type I error risk (e.g., confidence level).

The following formula can also be used to determine the sample size, given the population size N
and the margin of error e used above.

n = N / (1 + N * e^2)
  = 5000 / (1 + 5000 * 0.025^2) = 1,212.12, or about 1,212

This formula yields a sample size slightly greater than the 1176 shown in the table, so it errs
on the side of a larger sample.
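
As a quick check on the arithmetic, the simplified formula can be evaluated in Python. The function name is hypothetical; only the formula itself and the inputs N = 5000 and e = 0.025 come from the text.

```python
def simplified_sample_size(N, e):
    """Sample size from n = N / (1 + N * e^2), with e given as a proportion."""
    return N / (1 + N * e ** 2)

n = simplified_sample_size(5000, 0.025)
print(round(n, 2))   # prints: 1212.12
```

In practice the result is reported as a whole number of respondents.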

How to Choose Sample Size for a Simple Random Sample


https://stattrek.com/sample-size/simple-random-sample.asp

Consider the following problem. You are conducting a survey to estimate a population mean or
proportion. The sampling method is simple random sampling, without replacement. You want
your survey to provide a specified level of precision.

To choose the right sample size for a simple random sample, you need to define the following
inputs.

1. Specify the desired margin of error (ME). This is your measure of precision.
2. Specify alpha.
   • For a hypothesis test, alpha is the significance level.
   • For an estimation problem, alpha is: 1 - Confidence level.
3. Find the critical standard score z.
   a. For an estimation problem or for a two-tailed hypothesis test, the critical
      standard score (z) is the value for which the cumulative probability is 1 -
      alpha/2.
   b. For a one-tailed hypothesis test, the critical standard score (z) is the value for
      which the cumulative probability is 1 - alpha.
4. Unless the population size is very large relative to sample size (e.g., 20 times larger),
   you need to specify the size of the population (N).

You will also need to know the variance of the population, σ^2. Given these inputs, the following
formulas find the smallest sample size that provides the desired level of precision.

Sample statistic   Population size   Sample size
Mean               Known             n = { z^2 * σ^2 * [ N / (N - 1) ] } / { ME^2 + [ z^2 * σ^2 / (N - 1) ] }
Mean               Unknown           n = ( z^2 * σ^2 ) / ME^2
Proportion         Known             n = [ ( z^2 * p * q ) + ME^2 ] / [ ME^2 + z^2 * p * q / N ]
Proportion         Unknown           n = [ ( z^2 * p * q ) + ME^2 ] / ( ME^2 )

Here p is the population proportion and q = 1 - p.

This approach works when the sample size is relatively large (greater than or equal to 30). Use
the first or third formulas when the population size is known. When the population size is large
but unknown, use the second or fourth formulas.

For proportions, the sample size requirements vary, based on the value of the proportion. If you
are unsure of the right value to use, set p equal to 0.5. This will produce a conservative sample
size estimate; that is, the sample size will produce at least the precision called for and may
produce better precision.
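
The two formulas for a mean can be sketched as a single Python function. The function name and the example inputs (z = 1.96, σ = 10, ME = 2, N = 1000) are hypothetical and chosen only to show how the known-population formula compares with the unknown-population one.

```python
def sample_size_for_mean(z, sigma, me, N=None):
    """Smallest n for estimating a mean with margin of error `me`.

    Uses the 'population size unknown' formula when N is None,
    otherwise the 'population size known' formula.
    """
    var = sigma ** 2
    if N is None:
        return (z ** 2 * var) / me ** 2
    numerator = z ** 2 * var * (N / (N - 1))
    denominator = me ** 2 + z ** 2 * var / (N - 1)
    return numerator / denominator

n_unknown = sample_size_for_mean(1.96, sigma=10, me=2)
n_known = sample_size_for_mean(1.96, sigma=10, me=2, N=1000)
print(round(n_unknown, 2))       # prints: 96.04
print(n_known < n_unknown)       # prints: True
```

With these inputs the known-population formula asks for a smaller sample, and the two results converge as N grows.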

Sample Problem

At the end of every school year, a certain country administers a reading test to a simple
random sample drawn without replacement from a population of 100,000 third graders. Over
the last five years, students who took the test correctly answered 75% of the test questions.

What sample size should you use to achieve a margin of error equal to plus or minus 4%,
with a confidence level of 95%?

Solution: To solve this problem, we follow the steps outlined above.

• Specify the margin of error. This was given in the problem definition. The margin of
  error is plus or minus 4% or 0.04.
• Specify the confidence level. This was also given. The confidence level is 95% or 0.95.
• Compute alpha. Alpha is equal to one minus the confidence level. Thus, alpha = 1 -
  0.95 = 0.05.
• Determine the critical standard score (z). Since this is an estimation problem, the
  critical standard score is the value for which the cumulative probability is 1 - alpha/2 =
  1 - 0.05/2 = 0.975. To find that value, we use the Normal Calculator. Recall that the
  distribution of standard scores has a mean of 0 and a standard deviation of 1.
  Therefore, we enter a cumulative probability of 0.975, a mean of 0, and a standard
  deviation of 1 into the normal calculator. The calculator tells us that the value of the
  standard score is 1.96.
• And finally, we assume that the population proportion p is equal to its past value over
  the previous 5 years. That value is 0.75. Given these inputs, we can find the smallest
  sample size n that will provide the required margin of error.

n = [ ( z^2 * p * q ) + ME^2 ] / [ ME^2 + z^2 * p * q / N ]

n = [ (1.96)^2 * 0.75 * 0.25 + 0.0016 ] / [ 0.0016 + (1.96)^2 * 0.75 * 0.25 / 100,000 ]

n = ( 0.7203 + 0.0016 ) / ( 0.0016 + 0.0000072 )

n = 449.2

Therefore, to achieve a margin of error of plus or minus 4 percent, we will need to survey
450 students, using simple random sampling.
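
The same solution can be reproduced in Python as a rough cross-check. This sketch uses the standard library's `statistics.NormalDist` in place of the online Normal Calculator, which is a substitution and not part of the original solution; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(z, p, me, N):
    """Proportion formula for a known population size N, with q = 1 - p."""
    q = 1 - p
    return (z ** 2 * p * q + me ** 2) / (me ** 2 + z ** 2 * p * q / N)

alpha = 1 - 0.95                               # 95% confidence level
z = NormalDist().inv_cdf(1 - alpha / 2)        # critical standard score, about 1.96
n = sample_size_for_proportion(z, p=0.75, me=0.04, N=100_000)

print(round(z, 2), math.ceil(n))               # prints: 1.96 450
```

Rounding up with `math.ceil` keeps the sample from falling short of the required precision.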

Sample Size: Stratified Random Samples


https://stattrek.com/sample-size/stratified-sample.aspx.

The precision and cost of a stratified design are influenced by the way that sample elements are
allocated to strata.

How to Assign Sample to Strata

One approach is proportionate stratification. With proportionate stratification, the sample size of each
stratum is proportionate to the population size of the stratum. Strata sample sizes are determined by
the following equation :

n_h = ( N_h / N ) * n

where n_h is the sample size for stratum h, N_h is the population size for stratum h, N is total population
size, and n is total sample size.
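
A minimal sketch of proportionate stratification in Python follows. The stratum names and sizes are hypothetical; only the equation comes from the text.

```python
def proportionate_allocation(stratum_sizes, n):
    """Allocate n across strata in proportion to each stratum's population size."""
    N = sum(stratum_sizes.values())
    return {h: n * N_h / N for h, N_h in stratum_sizes.items()}

# Hypothetical population of 10,000 split into two strata.
sizes = {"urban": 6000, "rural": 4000}
allocation = proportionate_allocation(sizes, n=500)
print(allocation)   # prints: {'urban': 300.0, 'rural': 200.0}
```

When the results are not whole numbers, they need to be rounded so that the stratum sample sizes still add up to n.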

Another approach is disproportionate stratification, which can be a better choice (e.g., less cost, more
precision) if sample elements are assigned correctly to strata. To take advantage of disproportionate
stratification, researchers need to answer such questions as:

• Given a fixed budget, how should sample be allocated to get the most precision from a
  stratified sample?
• Given a fixed sample size, how should sample be allocated to get the most precision from a
  stratified sample?
• Given a fixed budget, what is the most precision that I can get from a stratified sample?
• Given a fixed sample size, what is the most precision that I can get from a stratified sample?
• What is the smallest sample size that will provide a given level of survey precision?
• What is the minimum cost to achieve a given level of survey precision?
• Given a particular sample allocation plan, what level of precision can I expect?
• And so on.

(To answer the questions, consider using the Sample Size Calculator.)

Sample Size Calculator


https://stattrek.com/sample-size/stratified-sample.aspx. Retrieved Sept. 13, 2018

Stat Trek's Sample Size Calculator can help you find the right sample allocation plan for your stratified
design. You specify your main goal (maximize precision, minimize cost, stay within budget, etc.).
Based on your goal, the calculator prompts you for the necessary inputs and handles all computations
automatically. It tells you the best sample size for each stratum. The calculator creates a summary
report that lists key findings, including the margin of error, and describes analytical techniques. The
calculator is free. You can find it in Stat Trek's main menu under the Stat Tools tab.

How to Maximize Precision, Given a Stratified Sample With a Fixed Budget

The ideal sample allocation plan would provide the most precision for the least cost. Optimal
allocation does just that. Based on optimal allocation, the best sample size for stratum h would be:

n_h = n * [ ( N_h * σ_h ) / sqrt( c_h ) ] / [ Σ ( N_i * σ_i ) / sqrt( c_i ) ]

where n_h is the sample size for stratum h, n is total sample size, N_h is the population size for
stratum h, σ_h is the standard deviation of stratum h, and c_h is the direct cost to sample an individual
element from stratum h. Note that c_h does not include indirect costs, such as overhead costs.

The effect of the above equation is to sample more heavily from a stratum when

• The cost to sample an element from the stratum is low.
• The population size of the stratum is large.
• The variability within the stratum is large.
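
The optimal allocation equation can be sketched in Python as follows. The two strata, their standard deviations, and their unit costs are hypothetical; they are set up so that the strata differ only in cost, which makes the effect of the cost term easy to see.

```python
import math

def optimal_allocation(n, strata):
    """Optimal allocation: n_h proportional to N_h * sigma_h / sqrt(c_h).

    `strata` maps a stratum name to (N_h, sigma_h, c_h).
    """
    weights = {h: N_h * sd / math.sqrt(c) for h, (N_h, sd, c) in strata.items()}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}

# Hypothetical strata: same size and variability, but B costs four times as much per unit.
strata = {"A": (1000, 10.0, 4.0), "B": (1000, 10.0, 16.0)}
allocation = optimal_allocation(100, strata)
print({h: round(v, 2) for h, v in allocation.items()})
# prints: {'A': 66.67, 'B': 33.33}
```

The cheaper stratum receives twice the sample of the costlier one, because the cost enters through its square root.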

How to Maximize Precision, Given a Stratified Sample With a Fixed Sample Size

Sometimes, researchers want to find the sample allocation plan that provides the most precision,
given a fixed sample size. The solution to this problem is a special case of optimal allocation,
called Neyman allocation.

The equation for Neyman allocation can be derived from the equation for optimal allocation by
assuming that the direct cost to sample an individual element is equal across strata. Based on
Neyman allocation, the best sample size for stratum h would be:

n_h = n * ( N_h * σ_h ) / [ Σ ( N_i * σ_i ) ]

where n_h is the sample size for stratum h, n is total sample size, N_h is the population size for
stratum h, and σ_h is the standard deviation of stratum h.

Test Your Understanding

This section presents a sample problem that illustrates how to maximize precision, given a fixed
sample size and a stratified sample. (In a subsequent lesson, we re-visit this problem and see how
stratified sampling compares to other sampling methods.)

Problem 1

At the end of every school year, a country administers a reading test to a sample of 36 third graders.
The school system has 20,000 third graders, half boys and half girls. The results from last year's test
are shown in the table below.

Stratum   Mean score   Standard deviation
Boys      70           10.27
Girls     80           6.66

This year, the researchers plan to use a stratified sample, with one stratum consisting of boys
and the other, girls. Use the results from last year to answer the following questions.

• To maximize precision, how many sampled students should be boys and how many should be
  girls?
• What is the mean reading achievement level in the population?
• Compute the confidence interval.
• Find the margin of error.

Assume a 95% confidence level.

Solution: The first step is to decide how to allocate sample in order to maximize precision. Based on
Neyman allocation, the best sample size for stratum h is:

n_h = n * ( N_h * σ_h ) / [ Σ ( N_i * σ_i ) ]

where n_h is the sample size for stratum h, n is total sample size, N_h is the population size for
stratum h, and σ_h is the standard deviation of stratum h. By this equation, the number of boys in the
sample is:

n_boys = 36 * ( 10,000 * 10.27 ) / [ ( 10,000 * 10.27 ) + ( 10,000 * 6.66 ) ]

n_boys = 21.84

Therefore, to maximize precision, the total sample of 36 students should consist of 22 boys and (36 -
22) = 14 girls.
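
The allocation can be checked with a short Python sketch. The function name is hypothetical; the stratum sizes and standard deviations are those in the table for last year's test.

```python
def neyman_allocation(n, strata):
    """Neyman allocation: n_h proportional to N_h * sigma_h.

    `strata` maps a stratum name to (N_h, sigma_h).
    """
    weights = {h: N_h * sd for h, (N_h, sd) in strata.items()}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}

allocation = neyman_allocation(36, {"boys": (10_000, 10.27), "girls": (10_000, 6.66)})
n_boys = round(allocation["boys"])
n_girls = 36 - n_boys
print(n_boys, n_girls)   # prints: 22 14
```

Because the two strata are the same size, the split depends only on the ratio of the standard deviations: the more variable stratum (boys) gets the larger share.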

The remaining questions can be answered during the process of computing the confidence interval.
Elsewhere on this website, we described how to compute a confidence interval. We employ that
process below.

• Identify a sample statistic. For this problem, we use the overall sample mean to estimate the
  population mean. To compute the overall sample mean, we use the following equation (which
  was introduced in a previous lesson):

  x̄ = Σ ( N_h / N ) * x̄_h

  x̄ = ( 10,000/20,000 ) * 70 + ( 10,000/20,000 ) * 80

  x̄ = 75

  Therefore, based on data from the sample strata, we estimate that the mean reading
  achievement level in the population is equal to 75.

• Select a confidence level. In this analysis, the confidence level is defined for us in the problem.
  We are working with a 95% confidence level.
• Find the margin of error. Elsewhere on this site, we show how to compute the margin of
  error when the sampling distribution is approximately normal. The key steps are shown below.

  • Find standard deviation or standard error. The equation to compute the standard error
    was introduced in a previous lesson. We use that equation here:

    SE = (1 / N) * sqrt { Σ [ N_h^2 * ( 1 - n_h/N_h ) * s_h^2 / n_h ] }

    SE = (1 / 20,000) * sqrt { [ 10,000^2 * ( 1 - 22/10,000 ) * (10.27)^2 / 22 ] + [ 10,000^2 * ( 1 - 14/10,000 ) * (6.66)^2 / 14 ] }

    SE = 1.41

    Thus, the standard deviation of the sampling distribution (i.e., the standard error) is
    1.41.

  • Find critical value. The critical value is a factor used to compute the margin of error.
    We express the critical value as a z-score. To find the critical value, we take these
    steps.

    o Compute alpha (α):

      α = 1 - (confidence level / 100)

      α = 1 - 95/100 = 0.05

    o Find the critical probability (p*):

      p* = 1 - α/2 = 1 - 0.05/2 = 0.975

    o The critical value is the z-score having a cumulative probability equal to 0.975.
      From the Normal Distribution Calculator, we find that the critical value is 1.96.

  • Compute margin of error (ME):

    ME = critical value * standard error

    ME = 1.96 * 1.41 = 2.76

• Specify the confidence interval. The range of the confidence interval is defined by the sample
  statistic ± margin of error, and the uncertainty is denoted by the confidence level. Thus, with
  this sample design, we are 95% confident that the mean reading achievement level in the
  population is 75 ± 2.76.

In summary, given a total sample size of 36 students, we can get the greatest precision from a
stratified sample if we sample 22 boys and 14 girls. This results in a 95% confidence interval of 72.24
to 77.76. The margin of error is 2.76.
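
The confidence interval steps can be traced in Python as a rough check. This sketch uses `statistics.NormalDist` from the standard library in place of the online Normal Distribution Calculator, which is a substitution; the variable names are hypothetical, and last year's standard deviations stand in for the sample standard deviations, as in the worked solution.

```python
import math
from statistics import NormalDist

N = 20_000
# Each stratum: (N_h, n_h, stratum mean, stratum standard deviation)
strata = {"boys": (10_000, 22, 70, 10.27), "girls": (10_000, 14, 80, 6.66)}

# Overall mean: sum of (N_h / N) * stratum mean
mean = sum(N_h / N * xbar for N_h, _, xbar, _ in strata.values())

# Standard error: (1 / N) * sqrt( sum of N_h^2 * (1 - n_h/N_h) * s_h^2 / n_h )
var_sum = sum(N_h ** 2 * (1 - n_h / N_h) * s ** 2 / n_h
              for N_h, n_h, _, s in strata.values())
se = math.sqrt(var_sum) / N

z = NormalDist().inv_cdf(1 - 0.05 / 2)     # critical value for 95% confidence
me = z * se
lower, upper = mean - me, mean + me

print(round(mean, 2), round(se, 2), round(me, 2))   # prints: 75.0 1.41 2.76
print(round(lower, 2), round(upper, 2))             # prints: 72.24 77.76
```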

Benefits of Sampling

Sampling is done in a wide variety of research settings. Some advantages of sampling
include reduced cost, accuracy of data, greater speed in collecting data and greater
scope. It is obviously less costly to obtain data for a selected subset of a population, rather
than the entire population. Data collected through a carefully selected sample are highly
accurate measures of the larger population. Researchers can usually draw accurate
inferences for the entire population. Observations are easier to collect and summarize with a
sample than with a complete count.
It is important to bear in mind that the very purpose of sampling is to create a small group from
a population whose characteristics are as similar as possible to those of the larger population. In short, we want a sample that can
represent the larger population from which it was drawn.

Activity/Assignment

With all the experiences that you have had from the past to the present, the literature that
you have read, and the social issues that you have heard about:
1. List down all possible researchable areas in your field or
specialization.
2. Identify the most pressing problem from the list you made in
number 1 and make a diagram, drawing, or anything that
shows the role of theory in the research process.
3. Write at least one research question.
4. Write a brief explanation about your diagram/drawing.
5. Identify the respondents and other sources of data of your study.
6. What sampling method will you use? (that is, if sampling is
necessary) Explain why.

References:

Best, J. W., & Kahn, J. V. (2003). Research in Education, 9th ed. Pearson
Education, Boston, USA.

Cochran, W. G. (1977). Sampling Techniques, 3rd ed. Wiley.
https://www.wiley.com/en-ph/Sampling+Techniques,+3rd+Edition-p-9780471162407

McNabb, D. E. (2015). Research Methods for Political Science: Quantitative
and Qualitative Approaches, 2nd ed. Routledge, Taylor and Francis Group, London and New
York.

Sample Calculation.
http://www.research-advisors.com/tools/SampleSize.htm

Sample Size: Simple Random Sampling.
https://stattrek.com/sample-size/simple-random-sample.aspx. Retrieved Sept. 13, 2018

Sample Size: Stratified Random Sampling.
https://stattrek.com/sample-size/stratified-sample.aspx. Retrieved Sept. 13, 2018
