
BUSI 1450 Statistics

Chapter 7
Dr. Khalil

7.1 Sampling
Reasons for Sampling
• The sample can save money
• The sample can save time
• For given resources, the sample can broaden the scope of
the study
• Because the research process is sometimes destructive, the
sample can save product
• If accessing the population is impossible, the sample is the
only option
Copyright ©2020 John Wiley & Sons, Inc. 2
Reasons for Taking a Census
• Eliminate the possibility that by chance a randomly selected sample may
not be representative of the population
• For the safety of the consumer
• To benchmark data for future studies
Frame
• List, map, directory, or some other source used in the sampling process
to represent the population
• Also called the working population
• Overregistered: contains units that are not in the target population
• Underregistered: does not contain some units that are in the target
population
FIGURE 7.1 The Frame and the Target Population

Random Versus Nonrandom Sampling
• Random sampling: every unit of the population has the same probability of being
selected into the sample
• Nonrandom sampling: not every unit of the population has the same probability of
being selected into the sample
o Generally NOT an appropriate technique for gathering data for statistical analysis
Simple Random Sampling
• Each unit in the frame is numbered from 1 to N (the size of the population)
• A random number table or generator is used to select n items into the sample

Example
• Two-digit numbers are read from the table of random numbers (Table 7.1),
and any number over 30 is discarded
• The first two digits are 91, which is unusable
• The next two pairs, 56 and 74, are also unusable
• The fourth pair is 25, which corresponds with Simcoe Canada Land
Development Inc.

TABLE 7.1 A Brief Table of Random Numbers

91567 42595 27958 30134 04024 86385 29880 99730
46503 18584 18845 49618 02304 51038 20655 58727
34914 63974 88720 82765 34476 17032 87589 40836
57491 16703 23167 49323 45021 33132 12544 41035
30405 83946 23792 14422 15059 45799 22716 19792
09983 74353 68668 30429 70735 25499 16631 35006
85900 07119 97336 71048 08178 77233 13916 47564

Example
• Continue reading across the row until six usable two-digit numbers are selected
• The sample will be:
o (25) Simcoe Canada Land Development Inc.
o (27) Sweetspot.ca Inc.
o (01) Acceleware Corp.
o (04) Audability Inc.
o (02) Apption Software (Apption Corp.)
o (29) Unity Telecom Corp
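
The selection above can be reproduced with a short script. This is a minimal sketch, not the textbook's procedure verbatim: the function name `select_from_table` is made up for illustration, and skipping repeated numbers is an assumption (sampling without replacement). It reads the first row of Table 7.1 two digits at a time and keeps numbers from 01 to 30.

```python
# First row of Table 7.1, copied from the slide.
ROW = "91567 42595 27958 30134 04024 86385 29880 99730"


def select_from_table(row, frame_size, sample_size):
    """Read two-digit numbers left to right and keep the usable ones."""
    digits = row.replace(" ", "")
    selected = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        # Discard 00, anything beyond the frame, and repeats
        # (skipping repeats is an assumption: sampling without replacement).
        if 1 <= number <= frame_size and number not in selected:
            selected.append(number)
        if len(selected) == sample_size:
            break
    return selected


sample = select_from_table(ROW, frame_size=30, sample_size=6)
print(sample)  # [25, 27, 1, 4, 2, 29]
```

The six numbers match the companies listed on the slide (25, 27, 01, 04, 02, 29).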

Stratified Random Sampling
• The population is divided into nonoverlapping subpopulations (strata)
• The analyst selects a random sample from each stratum
o Can reduce sampling error, because the sample will more closely match
the population
o More costly than a simple random sample
o Strata are usually chosen based on available information about the
population
• Within each stratum, there should be homogeneity
• Between strata, there should be heterogeneity
FIGURE 7.2 Stratified Random Sampling of Cable Television Viewers

• Proportionate stratified random sampling occurs when the
percentage of the sample taken from each stratum is proportionate to
the percentage that each stratum is within the whole population
• Disproportionate stratified random sampling occurs when the
percentage of each stratum in the sample is different from the
percentage that each stratum is within the whole population
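
A proportionate allocation might be sketched as follows. The strata names, the stratum sizes, and both function names are hypothetical and chosen only for illustration. Each stratum's share of the sample equals its share of the population, and a simple random sample is then drawn inside each stratum.

```python
import random


def proportionate_allocation(strata_sizes, n):
    """Split a total sample size n across strata in proportion to their size."""
    total = sum(strata_sizes.values())
    return {name: round(n * size / total) for name, size in strata_sizes.items()}


def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample inside every stratum."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = proportionate_allocation(sizes, n)
    return {name: rng.sample(units, allocation[name]) for name, units in strata.items()}


# Hypothetical frame: three age strata holding 500, 300, and 200 unit IDs.
strata = {
    "20-30": list(range(0, 500)),
    "30-40": list(range(500, 800)),
    "40-50": list(range(800, 1000)),
}
sample = stratified_sample(strata, n=100, seed=1)
counts = {name: len(units) for name, units in sample.items()}
print(counts)  # {'20-30': 50, '30-40': 30, '40-50': 20}
```

With stratum sizes that do not divide evenly, the rounded allocations may not add up to exactly n, so a real study would need a rule for the leftover units. A disproportionate design would replace `proportionate_allocation` with allocations chosen by the analyst.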
Systematic Sampling
• Every kth item is selected to produce a sample of size n from a
population of size N, where $k = \frac{N}{n}$
Example (Problem 7.8)
• If a company employs 3,500 people and if a random sample of 175 of these employees
has been taken by systematic sampling, what is the value of k? The analyst would start
the sample selection between which two values? Where could the analyst obtain a
frame for this study?
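
The arithmetic for this problem can be sketched in a few lines. The employee list below is a stand-in (the numbers 1 to 3,500), and the function name `systematic_sample` is made up for illustration. The interval is k = N/n = 3,500/175 = 20, so the starting point is chosen at random between 1 and 20.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick every kth unit after a random start within the first k units."""
    k = len(frame) // n          # sampling interval, k = N / n
    rng = random.Random(seed)
    start = rng.randint(1, k)    # start position between 1 and k, inclusive
    # Positions are 1-based, so subtract 1 when indexing the list.
    picks = [frame[pos - 1] for pos in range(start, len(frame) + 1, k)]
    return k, start, picks[:n]


employees = list(range(1, 3501))  # stand-in employee numbers 1..3500
k, start, sample = systematic_sample(employees, n=175, seed=7)
print(k, len(sample))  # 20 175
```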

Cluster (or Area) Sampling
• The population is divided into nonoverlapping areas (clusters)
• Ideally, clusters are internally heterogeneous
o Example: states, cities
• If clusters are too large, a second set of clusters can be taken from the initial
cluster (two-stage sampling)
• Advantages: convenience, cost
• Disadvantages: may be less efficient than simple random sampling if the
elements of the cluster are similar
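
One way to picture single-stage cluster sampling is the sketch below. The cities, the store labels, and the function name `cluster_sample` are all hypothetical. Whole clusters are chosen at random, and every unit inside a chosen cluster enters the sample.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units


# Hypothetical clusters: five cities with four stores each.
clusters = {
    city: [f"{city}-store{i}" for i in range(1, 5)]
    for city in ["Halifax", "Moncton", "Regina", "Calgary", "Victoria"]
}
chosen, units = cluster_sample(clusters, n_clusters=2, seed=3)
print(chosen, len(units))
```

For two-stage sampling, the last step would draw a random sample within each chosen cluster instead of keeping every unit.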

Example (Problem 7.9)
• For each of the following research projects, list at least one area or cluster that could be used in
obtaining the sample.

• a. A study of road conditions in the province of Nova Scotia

• b. A study of Canadian offshore oil wells

• c. A study of the environmental effects of petrochemical plants west of the St. Lawrence River

Nonrandom Sampling
• Any method that does not involve a random selection process
Convenience Sampling
• Elements are selected for the convenience of the analyst
Judgment Sampling
• Elements are chosen by the judgment of the analyst
o Because the probability of an element being selected cannot be determined, the
sampling error cannot be determined
o Can be biased due to systematic errors in judgment

Quota Sampling
• Population subclasses, such as age or gender, are used as strata
• Can be useful if no frame is available for the population
• Can be less costly
• Nonrandom, and thus probabilities cannot be calculated
Snowball Sampling
• Survey subjects are selected based on referral from other survey respondents

Sampling Error
• Occurs when the sample is not representative of the population
Non-sampling Error
• All errors other than sampling error
o Missing data
o Recording errors
o Measurement errors
o Input processing errors
o Analysis errors
o Response errors
o And many more!
7.2 Sampling Distribution of $\bar{x}$
Suppose that a small, finite population contains only N = 8 numbers:
54 55 59 63 64 68 69 70
• Distribution of the population data:

• Suppose that all possible samples of size n = 2 are taken from this
population
Population: 54 55 59 63 64 68 69 70
All possible samples of n = 2:
(54, 54) (55, 54) (59, 54) (63, 54) (64, 54) (68, 54) (69, 54) (70, 54)
(54, 55) (55, 55) (59, 55) (63, 55) (64, 55) (68, 55) (69, 55) (70, 55)
(54, 59) (55, 59) (59, 59) (63, 59) (64, 59) (68, 59) (69, 59) (70, 59)
(54, 63) (55, 63) (59, 63) (63, 63) (64, 63) (68, 63) (69, 63) (70, 63)
(54, 64) (55, 64) (59, 64) (63, 64) (64, 64) (68, 64) (69, 64) (70, 64)
(54, 68) (55, 68) (59, 68) (63, 68) (64, 68) (68, 68) (69, 68) (70, 68)
(54, 69) (55, 69) (59, 69) (63, 69) (64, 69) (68, 69) (69, 69) (70, 69)
(54, 70) (55, 70) (59, 70) (63, 70) (64, 70) (68, 70) (69, 70) (70, 70)
• Then take the means of all of the samples

Means of the samples:
54 54.5 56.5 58.5 59 61 61.5 62
54.5 55 57 59 59.5 61.5 62 62.5
56.5 57 59 61 61.5 63.5 64 64.5
58.5 59 61 63 63.5 65.5 66 66.5
59 59.5 61.5 63.5 64 66 66.5 67
61 61.5 63.5 65.5 66 68 68.5 69
61.5 62 64 66 66.5 68.5 69 69.5
62 62.5 64.5 66.5 67 69 69.5 70
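
The table of means can be regenerated and checked with a short script. This is a sketch that assumes ordered samples drawn with replacement, which is what the 8 × 8 listing on the slides implies. It also compares the mean and standard deviation of the 64 sample means with the population mean and with the population standard deviation divided by the square root of n.

```python
from itertools import product
from statistics import mean, pstdev

population = [54, 55, 59, 63, 64, 68, 69, 70]
n = 2

# All ordered samples of size 2, drawn with replacement: 8 * 8 = 64 samples.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

mu = mean(population)               # population mean
sigma = pstdev(population)          # population standard deviation
mean_of_means = mean(sample_means)
se_of_means = pstdev(sample_means)  # standard deviation of the 64 means

print(len(samples))                 # 64
print(mu, mean_of_means)            # 62.75 62.75
print(round(sigma / n ** 0.5, 4), round(se_of_means, 4))  # 4.1193 4.1193
```

The two pairs of numbers agree, which is the result the central limit theorem slides state in general form.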

Distribution of the means of the samples:

The distribution of the sample means looks different from
the distribution of the original population:

The Central Limit Theorem
• If random samples of size n are repeatedly drawn from a population
that has a mean of μ and a standard deviation of σ, the sample means,
$\bar{x}$, are approximately normally distributed for sufficiently large sample
sizes (n ≥ 30), regardless of the shape of the population distribution. If
the population is normally distributed, the sample means are normally
distributed for any size sample.
• It can be shown that the mean of the sample means is the population
mean: $\mu_{\bar{x}} = \mu$
• The standard deviation of the sample means (the standard error of
the mean) is: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
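
A small simulation can make the theorem concrete. This is only a sketch under assumed settings: the exponential population (mean 1, standard deviation 1), the seed, and the number of replications are all chosen for illustration and do not come from the slides. Even though the population is strongly right-skewed, the sample means center on μ with a spread close to σ divided by the square root of n.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable

# Assumed population for illustration: exponential with mean 1 and
# standard deviation 1, which is strongly right-skewed.
mu, sigma = 1.0, 1.0
n = 30              # sample size
replications = 2000

sample_means = [
    mean(rng.expovariate(1 / mu) for _ in range(n))
    for _ in range(replications)
]

print(round(mean(sample_means), 3))    # should be close to mu
print(round(pstdev(sample_means), 3))  # should be close to sigma / sqrt(n)
print(round(sigma / n ** 0.5, 3))      # 0.183
```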
Figure 7.6 Shapes of the Distributions of Sample Means for Three Sample Sizes
Drawn from Four Different Population Distributions
Example (Problem 7.13)
• A population has a mean of 50 and a standard deviation of 10. If a random sample of 64
is taken, what is the probability that the sample mean is each of the following?
• a. Greater than 52
• b. Less than 51
• c. Less than 47
• d. Between 48.5 and 52.4
• e. Between 50.6 and 51.3
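
One possible way to work this problem uses Python's standard-library `statistics.NormalDist` instead of a z table. This is a sketch, not the textbook solution. It relies on the central limit theorem (n = 64 ≥ 30) to treat the sample mean as normal with mean 50 and standard error 10/√64 = 1.25.

```python
from statistics import NormalDist

mu, sigma, n = 50, 10, 64
se = sigma / n ** 0.5                 # standard error of the mean = 1.25
xbar = NormalDist(mu=mu, sigma=se)    # approximate distribution of the sample mean

answers = {
    "a": 1 - xbar.cdf(52),
    "b": xbar.cdf(51),
    "c": xbar.cdf(47),
    "d": xbar.cdf(52.4) - xbar.cdf(48.5),
    "e": xbar.cdf(51.3) - xbar.cdf(50.6),
}
for part, prob in answers.items():
    print(part, round(prob, 4))
```

The results should be about .0548, .7881, .0082, .8575, and .1664, which is in line with what a z table gives for z = 1.60, 0.80, −2.40, (−1.20, 1.92), and (0.48, 1.04).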

Sampling from a Finite Population
• In cases of a finite population, a statistical adjustment must be made
to the z formula for sample means:
$z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}}$
• A rough rule of thumb is to use the finite population correction
factor when $\frac{n}{N} \geq .05$
• The correction factor reduces the size of the standard error of the
mean, because when the sample is large relative to the population,
the sample mean is less likely to vary from the population mean
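
The effect of the correction factor can be seen in a small sketch. All numbers below are hypothetical (N = 500, n = 50, μ = 50, σ = 10, sample mean 52), and `z_finite` is a made-up helper name. Because the factor is below 1, the corrected standard error is smaller and the z value is larger in magnitude.

```python
from math import sqrt
from statistics import NormalDist


def z_finite(xbar, mu, sigma, n, N):
    """z score for a sample mean, with the finite population correction."""
    fpc = sqrt((N - n) / (N - 1))          # correction factor, at most 1
    standard_error = sigma / sqrt(n) * fpc
    return (xbar - mu) / standard_error


# Hypothetical numbers: N = 500 units, n = 50 sampled (n/N = .10 >= .05).
z_corrected = z_finite(xbar=52, mu=50, sigma=10, n=50, N=500)
z_uncorrected = (52 - 50) / (10 / sqrt(50))
print(round(z_uncorrected, 2), round(z_corrected, 2))
print(round(1 - NormalDist().cdf(z_corrected), 4))  # P(sample mean > 52)
```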

Example (Problem 7.17)

7.3 Sampling Distribution of $\hat{p}$
The sample proportion is computed by dividing the frequency with which a given characteristic
occurs in the sample by the number of items in the sample:
$\hat{p} = \frac{x}{n}$
where x = the number of items in a sample that have the characteristic
n = the number of items in the sample
The central limit theorem applies to sample proportions in that the normal distribution
approximates the shape of the distribution of sample proportions if
$n \cdot p > 5$ and $n \cdot q > 5$
(p is the population proportion and q = 1 − p)
• The mean of sample proportions is p (the population proportion)

• The standard error of the proportion is $\sigma_{\hat{p}} = \sqrt{\frac{p \cdot q}{n}}$
• The z formula is $z = \frac{\hat{p} - p}{\sqrt{\frac{p \cdot q}{n}}}$
where
p = population proportion
q = 1 − p
n = sample size
$\hat{p}$ = sample proportion

Example: Suppose 60% of the electrical contractors in a region use
a particular brand of wire. What is the probability of taking a
random sample of size 120 from these electrical contractors and
finding that .50 or less use that brand of wire?
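$z = \frac{\hat{p} - p}{\sqrt{\frac{p \cdot q}{n}}} = \frac{.50 - .60}{\sqrt{\frac{(.60)(.40)}{120}}} = -2.24$
• For z = −2.24, the z distribution table gives an area of .4875 between the mean and z
• The tail area is .5 − .4875 = .0125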
There is only a 1.25% chance of finding that 50% or less of a sample of 120
contractors use a given brand of wire if the population proportion is .60
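
The wire example can be checked numerically. This sketch uses `statistics.NormalDist` from Python's standard library in place of the z table. Because it does not round z to two decimals before looking up the area, the probability comes out near .0127 and not exactly at the table-based .0125.

```python
from math import sqrt
from statistics import NormalDist

p, n, p_hat = 0.60, 120, 0.50
q = 1 - p

# Check the rule of thumb for using the normal approximation.
assert n * p > 5 and n * q > 5

se = sqrt(p * q / n)            # standard error of the proportion
z = (p_hat - p) / se
prob = NormalDist().cdf(z)      # P(sample proportion <= .50)

print(round(z, 2))     # -2.24
print(round(prob, 4))
```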

Example (Problem 7.23)
