Chapter 5 Stratified Random Sampling Completed
A stratified random sample is obtained by dividing the popln into groups (called strata) and
obtaining a simple random sample from each stratum.
We control (ie decrease) variation by selecting the strata so that there is little variation within
strata. For a given cost, we obtain more information than for simple random sampling.
Another example:
CPI (Consumer Price Index) = average change in price for a fixed collection of goods and
services. There are 85 strata chosen on the basis of geographic location, popln size, popln
change, % urban.
Notation
L = number of strata
\[ \bar y_{st} = \frac{N_1}{N}\bar y_1 + \frac{N_2}{N}\bar y_2 + \dots + \frac{N_L}{N}\bar y_L, \qquad N = N_1 + N_2 + \dots + N_L \]
This is a weighted average of the averages from all groups, with weight proportional to stratum
size.
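Estimated variance of \( \bar y_{st} \): Show that
\[ \hat V(\bar y_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1-\frac{n_i}{N_i}\right)\frac{s_i^2}{n_i} \]
\[ \hat V(\bar y_{st}) = \hat V\left(\frac{N_1}{N}\bar y_1 + \frac{N_2}{N}\bar y_2 + \dots + \frac{N_L}{N}\bar y_L\right) \]
\[ = \hat V\left(\frac{N_1}{N}\bar y_1\right) + \hat V\left(\frac{N_2}{N}\bar y_2\right) + \dots + \hat V\left(\frac{N_L}{N}\bar y_L\right) \]
Why can we do this? \( V(X+Y) = V(X) + V(Y) + 2\,\mathrm{Cov}(X,Y) \), and \( \mathrm{Cov}(X,Y) = 0 \) because the readings from different strata are independent of one another.
\[ = \frac{N_1^2}{N^2}\hat V(\bar y_1) + \frac{N_2^2}{N^2}\hat V(\bar y_2) + \dots + \frac{N_L^2}{N^2}\hat V(\bar y_L) \]
Why can we do this? \( V(aX) = a^2V(X) \).
\[ = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1-\frac{n_i}{N_i}\right)\frac{s_i^2}{n_i} \]
since \( \hat V(\bar y_i) = \left(1-\frac{n_i}{N_i}\right)\frac{s_i^2}{n_i} \) from Chap 4.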
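The two formulas above can be checked numerically. Below is a minimal Python sketch, assuming plain lists for the stratum sizes \( N_i \), sample sizes \( n_i \), stratum means \( \bar y_i \) and variances \( s_i^2 \); the function name `stratified_mean_var` is illustrative only. The usage lines plug in the Example 5 summary statistics (with \( s_1^2 = .0524 \), the value used in part (a)).

```python
from math import sqrt


def stratified_mean_var(N_sizes, n_sizes, means, variances):
    """Return (ybar_st, Vhat(ybar_st)) for a stratified random sample.

    ybar_st       = sum (N_i / N) * ybar_i
    Vhat(ybar_st) = (1 / N^2) * sum N_i^2 * (1 - n_i / N_i) * s_i^2 / n_i
    """
    N = sum(N_sizes)
    ybar_st = sum(Ni * yi for Ni, yi in zip(N_sizes, means)) / N
    var_st = sum(
        Ni**2 * (1 - ni / Ni) * s2 / ni
        for Ni, ni, s2 in zip(N_sizes, n_sizes, variances)
    ) / N**2
    return ybar_st, var_st


# Example 5 summary statistics, in the order
# floor workers, supervisors, janitorial staff, administrators
ybar_st, var_st = stratified_mean_var(
    N_sizes=[150, 15, 10, 12],
    n_sizes=[15, 6, 4, 4],
    means=[0.367, 1.92, 0.25, 1.125],
    variances=[0.0524, 0.142, 0.083, 0.0625],
)
B = 2 * sqrt(var_st)  # bound on the error of estimation
print(f"ybar_st = {ybar_st:.3f}, Vhat = {var_st:.6f}, B = {B:.3f}")
```

Keeping the variance as a sum over strata makes the role of each stratum visible: in Example 5 the floor workers contribute almost all of it.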
Example 5 A work-study investigator wishes to estimate how long employees spend per week in union
meetings. Employees are stratified by type of job, as follows: floor workers, supervisors, janitorial staff
and administrators. A s.r.s. is obtained from each stratum (the data and summary statistics are tabulated with part (b) below).
Make boxplots to check for outliers. (Why? The sample mean is affected by outliers when the sample is
small.)
(a) Estimate the average time spent by employees in union meetings per week, and place a bound on
the error of the estimate. Interpret the resulting interval.
Boxplots first:
[Figure: boxplots of the time-spent data; vertical axis "Data", 0.0 to 2.5 hours.]
\[ \bar y_{st} = \frac{N_1}{N}\bar y_1 + \frac{N_2}{N}\bar y_2 + \dots + \frac{N_L}{N}\bar y_L = \frac{150}{187}(0.367) + \frac{15}{187}(1.92) + \frac{10}{187}(0.25) + \frac{12}{187}(1.125) = .53 \]
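\[ \hat V(\bar y_{st}) = \frac{N_1^2}{N^2}\hat V(\bar y_1) + \frac{N_2^2}{N^2}\hat V(\bar y_2) + \dots + \frac{N_L^2}{N^2}\hat V(\bar y_L) \]
\[ = \frac{1}{187^2}\left[150^2\left(1-\frac{15}{150}\right)\frac{.0524}{15} + 15^2\left(1-\frac{6}{15}\right)\frac{.142}{6} + 10^2\left(1-\frac{4}{10}\right)\frac{.083}{4} + 12^2\left(1-\frac{4}{12}\right)\frac{.0625}{4}\right] \]
\[ = .002193 \]
\[ \therefore B = 2\sqrt{.002193} = 0.09 \]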
Interpret: We estimate that the average number of hours per week spent in union meetings by all
employees is somewhere between .44 hours and .62 hours.
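Unstratified: \( \bar y = 0.776 \) (compare with \( \bar y_{st} = .53 \)). Why are they so different? The floor workers, who spend little time in meetings, make up 150/187 (about 80%) of the population but only 15/29 (about 52%) of the sample. The unweighted mean \( \bar y \) under-weights their readings; the stratified estimate gives them their proper weight.
\[ \hat V(\bar y) = \left(1-\frac{29}{187}\right)\frac{.4926}{29} = .0144 \qquad \therefore B = 2\sqrt{\hat V(\bar y)} = .24 \]
(Compare with \( B_{st} = 0.09 \).)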
HW Suppose that the stratified sample on p 120 was a SRS. Calculate \( \bar y \) and \( B \) and compare with \( \bar y_{st} \) and \( B_{st} \).
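\( \bar y_{st} \) estimates \( \mu = \frac{\tau}{N} \)
\( \therefore \hat\tau_{st} = N\bar y_{st} \) estimates \( N\mu = \tau \)
\[ \hat\tau_{st} = N\bar y_{st} = N_1\bar y_1 + N_2\bar y_2 + \dots + N_L\bar y_L \]
\[ \hat V(\hat\tau_{st}) = \hat V(N\bar y_{st}) = N^2\hat V(\bar y_{st}) = \sum_{i=1}^{L} N_i^2\left(1-\frac{n_i}{N_i}\right)\frac{s_i^2}{n_i} \]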
Example 5 (continued) The data from the s.r.s. in each stratum are as follows:
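Time spent (hours) per week in union meetings, by stratum:
Floor workers: 0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0, 0, 0.5, 0, 0.5, 0.5
Supervisors: 2, 2.5, 1.5, 2, 1.5, 2
Janitorial staff: 0, 0.5, 0.5, 0
Administrators: 1, 1, 1, 1.5

Stratum | Floor workers | Supervisors | Janitorial staff | Administrators | All (pooled)
Average \( \bar y_i \) | 0.367 | 1.92 | 0.25 | 1.125 | 0.78
Variance \( s_i^2 \) | 0.052 | 0.142 | 0.083 | 0.0625 | 0.4926
Stratum size \( N_i \) | 150 | 15 | 10 | 12 | 187
Sample size \( n_i \) | 15 | 6 | 4 | 4 | 29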
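As a cross-check, a short Python sketch can recompute the summary rows of the table from the raw readings and repeat the comparison with an unstratified analysis from part (a). It assumes the readings are grouped by stratum as in the table above; `statistics.variance` uses the \( n-1 \) divisor, so it returns \( s_i^2 \). The variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance  # variance() divides by n - 1, giving s^2

# Example 5 raw data (hours per week), one list per stratum
data = {
    "floor workers": [0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0, 0, 0.5, 0, 0.5, 0.5],
    "supervisors": [2, 2.5, 1.5, 2, 1.5, 2],
    "janitorial": [0, 0.5, 0.5, 0],
    "administrators": [1, 1, 1, 1.5],
}
N_sizes = {"floor workers": 150, "supervisors": 15, "janitorial": 10, "administrators": 12}
N = sum(N_sizes.values())  # 187

# Per-stratum summaries (compare with the table)
for name, ys in data.items():
    print(f"{name:15s} n={len(ys):2d} mean={mean(ys):.3f} s2={variance(ys):.4f}")

# Stratified estimate and its bound
ybar_st = sum(N_sizes[k] * mean(ys) for k, ys in data.items()) / N
v_st = sum(
    N_sizes[k] ** 2 * (1 - len(ys) / N_sizes[k]) * variance(ys) / len(ys)
    for k, ys in data.items()
) / N**2
B_st = 2 * sqrt(v_st)

# Treating the 29 pooled readings as one SRS (ignoring the strata)
pooled = [y for ys in data.values() for y in ys]
n = len(pooled)
ybar = mean(pooled)
B_srs = 2 * sqrt((1 - n / N) * variance(pooled) / n)

print(f"stratified: {ybar_st:.3f} +/- {B_st:.3f}")
print(f"SRS:        {ybar:.3f} +/- {B_srs:.3f}")
```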
(b) Estimate the total number of hours lost per week on account of union meetings. Place a bound on
the error of the estimate. Interpret the resulting interval.
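Solution 1: Use \( \hat\tau_{st} = N\bar y_{st} \) and \( \hat V(\hat\tau_{st}) = \hat V(N\bar y_{st}) = N^2\hat V(\bar y_{st}) \), with \( \bar y_{st} = 0.53 \) and \( \hat V(\bar y_{st}) = 0.002193 \).
\[ \hat\tau_{st} = N\bar y_{st} = N_1\bar y_1 + \dots + N_L\bar y_L = 150(0.3667) + 15(1.9167) + 10(0.25) + 12(1.125) = 99.8 \text{ hours} \]
(using unrounded stratum means)
\[ \hat V(\hat\tau_{st}) = \hat V(N\bar y_{st}) = N^2\hat V(\bar y_{st}) = 187^2(0.002193) = 76.687 \]
Solution 2: Direct:
\[ \hat V(\hat\tau_{st}) = \sum_{i=1}^{L} N_i^2\left(1-\frac{n_i}{N_i}\right)\frac{s_i^2}{n_i} = 76.68 \]
(the bracketed sum in part (a)).
Either way, \( B = 2\sqrt{76.687} = 17.51 \) hours.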
Interpret: We estimate that the total number of hours spent by all employees in union meetings
per week is somewhere between 82.29 hours and 117.31 hours.
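A minimal Python sketch of Solution 2 for part (b), assuming the Example 5 summary statistics from the table; it computes \( \hat\tau_{st} = \sum N_i\bar y_i \), the direct variance, the bound and the interval. Variable names are illustrative. Because it uses the rounded mean 0.367 for floor workers, it gives 99.85 rather than 99.8, so the interval endpoints differ slightly in the second decimal.

```python
from math import sqrt

# Example 5 summary statistics (floor, supervisors, janitorial, administrators)
N_sizes = [150, 15, 10, 12]
n_sizes = [15, 6, 4, 4]
means = [0.367, 1.92, 0.25, 1.125]
variances = [0.0524, 0.142, 0.083, 0.0625]

# tau_hat_st = N * ybar_st = sum N_i * ybar_i
tau_hat = sum(Ni * yi for Ni, yi in zip(N_sizes, means))
# Solution 2 (direct): Vhat(tau_hat_st) = sum N_i^2 (1 - n_i/N_i) s_i^2 / n_i
v_tau = sum(Ni**2 * (1 - ni / Ni) * s2 / ni
            for Ni, ni, s2 in zip(N_sizes, n_sizes, variances))
B = 2 * sqrt(v_tau)
print(f"tau_hat = {tau_hat:.2f} hours, B = {B:.2f}, "
      f"interval ({tau_hat - B:.2f}, {tau_hat + B:.2f})")
```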
The allocation problem: Given a desired margin of error, how many individuals need to be randomly selected from each stratum?
Basic ideas:
1) If the stratum is large, it should have greater representation in the sample, ie we want \( n_i \propto N_i \).
2) If the readings in one stratum are more variable than the readings in another stratum, then we need more info from the stratum with the larger variability, ie \( n_i \propto \sigma_i \).
3) If it's expensive to get readings from a stratum, then we want fewer readings from that stratum, ie \( n_i \) should decrease as the cost \( c_i \) increases (optimal allocation below uses \( n_i \propto 1/\sqrt{c_i} \)).
Proportional allocation takes into account only the size of each stratum. The weight of each stratum in the sample is \( w_i = \frac{N_i}{N} \).
Neyman allocation takes into account the sizes of the strata and the variation within each stratum. The weight of each stratum in the sample is \( w_i = \frac{N_i\sigma_i}{\sum_j N_j\sigma_j} \).
Optimal allocation takes into account the sizes of the strata, the variation within each stratum and the cost of sampling from each stratum. The weight of each stratum in the sample is
\[ w_i = \frac{N_i\sigma_i/\sqrt{c_i}}{\sum_j N_j\sigma_j/\sqrt{c_j}} \]
In order to solve an allocation problem:
To find \( n \):
\[ n = \frac{\sum_i N_i^2\sigma_i^2/w_i}{\frac{N^2B^2}{4} + \sum_i N_i\sigma_i^2} \]
To find \( n_i \): \( n_i = w_in \)
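The three weighting schemes and the formula for \( n \) can be sketched in Python as follows. The function names `allocation_weights` and `sample_size_for_mean` are hypothetical. The rounding rules (round \( n \) up, then round each \( n_i = w_in \) to the nearest whole number) follow Example 6 and are one reasonable convention; rounding each \( n_i \) separately need not reproduce \( n \) exactly.

```python
from math import ceil, sqrt


def allocation_weights(N_sizes, sigmas, costs=None, method="optimal"):
    """Stratum weights w_i for proportional, Neyman or optimal allocation."""
    if method == "proportional":
        raw = list(N_sizes)                                  # w_i proportional to N_i
    elif method == "neyman":
        raw = [Ni * si for Ni, si in zip(N_sizes, sigmas)]   # ... to N_i sigma_i
    elif method == "optimal":
        raw = [Ni * si / sqrt(ci)                            # ... to N_i sigma_i / sqrt(c_i)
               for Ni, si, ci in zip(N_sizes, sigmas, costs)]
    else:
        raise ValueError(f"unknown method: {method!r}")
    total = sum(raw)
    return [r / total for r in raw]


def sample_size_for_mean(N_sizes, sigmas, weights, B):
    """Unrounded n = sum(N_i^2 sigma_i^2 / w_i) / (N^2 B^2 / 4 + sum N_i sigma_i^2)."""
    N = sum(N_sizes)
    num = sum(Ni**2 * si**2 / wi for Ni, si, wi in zip(N_sizes, sigmas, weights))
    den = N**2 * B**2 / 4 + sum(Ni * si**2 for Ni, si in zip(N_sizes, sigmas))
    return num / den


# Example 6: proportional allocation; strata are females, then males
N_sizes, sigmas = [90, 60], [800, 500]
w = allocation_weights(N_sizes, sigmas, method="proportional")
n_exact = sample_size_for_mean(N_sizes, sigmas, w, B=200)
n = ceil(n_exact)                   # round up so that the bound is met
n_i = [round(wi * n) for wi in w]   # n_i = w_i n, rounded to whole units
print(w, round(n_exact, 2), n, n_i)
```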
Example 6 Suppose we wish to estimate, to within $200, the average starting salary (p.a.) of teaching graduates from last year. There were 150 teaching graduates: 90 female and 60 male. \( \sigma_F = \$800 \); \( \sigma_M = \$500 \). We specify proportional samples. Find the required sample size (\( n \)) and the gender breakdown within the sample.
Solution: \( B = \$200 \), \( N = 150 \), \( N_F = 90 \), \( N_M = 60 \), \( \sigma_F = \$800 \), \( \sigma_M = \$500 \)
Step 1: Find the weights
\[ w_F = \frac{N_F}{N} = \frac{90}{150} = .6 \qquad w_M = \frac{N_M}{N} = \frac{60}{150} = .4 \]
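Step 2: Find n
\[ n = \frac{\sum_i N_i^2\sigma_i^2/w_i}{\frac{N^2B^2}{4} + \sum_i N_i\sigma_i^2} = \frac{\frac{90^2(800^2)}{.6} + \frac{60^2(500^2)}{.4}}{\frac{150^2(200^2)}{4} + (90 \cdot 800^2 + 60 \cdot 500^2)} = 36.59 \]
so we need n = 37.
\( n_F = w_Fn = .6(37) = 22.2 \approx 22 \)
\( n_M = w_Mn = .4(37) = 14.8 \approx 15 \)
Check: 22 + 15 = 37!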
Note: If there are more than two groups, it will be quicker to use a spreadsheet.
Example 7.1 Find allocation for given error bound Suppose we wish to estimate to within $200 the
average starting salary (p.a.) of the 150 teaching graduates from 2020. We find that the cost of surveying
graduates who have left town is greater than the cost of surveying those who have not left town: $5 for a
graduate who is in town and $10 for a graduate who has left town. Stratify on gender and on location (ie
out of town or not). Of the 90 female graduates, 35 have left town and 55 have not. Of the 60 male
graduates, 35 have left town and 25 have not. Assume that the approximate standard deviations of salaries
for women and men are respectively $800 and $500.
Optimal allocation, so
\[ w_i = \frac{N_i\sigma_i/\sqrt{c_i}}{\sum_j N_j\sigma_j/\sqrt{c_j}} \]
Set up spreadsheet:
Compute \( N_i\sigma_i/\sqrt{c_i} \), add them up (use Sum in Stat>Basic Stats>Display descriptive) and then get the weights. Be sure to write down the weights.
Get the sum in the numerator of the formula for n (Adding up these):
Get the sum in the denominator of the formula for n (adding up these):
\[ n = \frac{\sum_i N_i^2\sigma_i^2/w_i}{\frac{N^2B^2}{4} + \sum_i N_i\sigma_i^2} = \frac{10715884138}{\frac{150^2(200^2)}{4} + 72600000} = 36.0077 \]
Step 3: Then compute the sample size for each group as \( n_i = w_in \).
\( n_1 = 18,\ n_2 = 8,\ n_3 = 5,\ n_4 = 5 \)
Check: 18 + 8 + 5 + 5 = 36
(b) What is the cost of the sample in (a)? Show your work!
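The spreadsheet steps for Example 7.1 can also be sketched in Python. The stratum order (female in town, female out of town, male in town, male out of town) is taken from Example 8, and, as in the solution above, the unrounded \( n = 36.0077 \) is used when computing \( n_i = w_in \). The last computed quantity answers part (b). Variable names are illustrative.

```python
from math import sqrt

# Example 7.1 strata, in the order used in Example 8:
# female in town, female out of town, male in town, male out of town
N_sizes = [55, 35, 25, 35]
sigmas = [800, 800, 500, 500]
costs = [5, 10, 5, 10]          # $ per graduate surveyed
B = 200
N = sum(N_sizes)                # 150

# Step 1: optimal-allocation weights, w_i proportional to N_i sigma_i / sqrt(c_i)
raw = [Ni * si / sqrt(ci) for Ni, si, ci in zip(N_sizes, sigmas, costs)]
w = [r / sum(raw) for r in raw]

# Step 2: total sample size from the bound on the mean (left unrounded, as above)
num = sum(Ni**2 * si**2 / wi for Ni, si, wi in zip(N_sizes, sigmas, w))
den = N**2 * B**2 / 4 + sum(Ni * si**2 for Ni, si in zip(N_sizes, sigmas))
n = num / den

# Step 3: stratum sample sizes, then part (b): the cost of the sample
n_i = [round(wi * n) for wi in w]
cost = sum(ni * ci for ni, ci in zip(n_i, costs))
print([round(x, 4) for x in w], round(n, 4), n_i, cost)
```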
Next: what if there’s a limit on the cost of sampling?
Example 7.2 Find allocation for given cost of sampling. Suppose we wish to estimate the average
starting salary (p.a.) of the 150 teaching graduates from 2020. We have $150 to cover the cost of the
sample. We find that the cost of surveying graduates who have left town is greater than the cost of
surveying those who have not left town: $5 for a graduate who is in town and $10 for a graduate who has
left town. Stratify on gender and on location (ie out of town or not). Of the 90 female graduates, 35 have
left town and 55 have not. Of the 60 male graduates, 35 have left town and 25 have not. Assume that the
approximate standard deviations of salaries for women and men are respectively $800 and $500. This is
the same as before.
(a) Find the optimal \( n_1, n_2, n_3, n_4 \) and check that the total cost is at most $150.
(b) What is the error bound (B) for the allocation in (a)?
Solution: (a)
The weights are the same as before because the σ's, costs and stratum sizes are as before.
\[ 150 = \sum n_ic_i = \sum w_inc_i = n\sum w_ic_i \]
\[ \therefore 150 = n\sum w_ic_i = n(6.814) \]
Solve for n: \( n = \frac{150}{6.814} = 22.01 \). This is our new total sample size.
Now we have to figure out how many from each stratum using the weights (I called the new sample sizes nn_i):
\( n_i = w_in \): \( n_1 = 11,\ n_2 = 5,\ n_3 = 3,\ n_4 = 3 \)
Find the total cost of this sample: 11(5) + 5(10) + 3(5) + 3(10) = $150. Yes! This is the most we could spend.
(b) What is the error bound (B) for the allocation in (a)? Use spreadsheet!
First, calculate (with \( s_i \) taken as \( \sigma_i \))
\[ \hat V(\bar y_{st}) = \frac{1}{N^2}\sum_{i=1}^{L} N_i^2\left(1-\frac{n_i}{N_i}\right)\frac{s_i^2}{n_i} = \frac{414366667}{150^2} = 18416.29631 \]
\[ \therefore B = 2\sqrt{\hat V(\bar y_{st})} = 2\sqrt{18416.29631} = \$271 \]
Bigger than what it was before ($200).
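A Python sketch of Example 7.2, parts (a) and (b), under the same assumptions (stratum order from Example 8, and \( s_i \) taken equal to \( \sigma_i \) in part (b)). With a fixed budget, \( n \) comes from \( 150 = n\sum w_ic_i \) rather than from the bound; the weights are unchanged. Variable names are illustrative.

```python
from math import sqrt

# Same strata as Example 7.1 (order from Example 8)
N_sizes = [55, 35, 25, 35]
sigmas = [800, 800, 500, 500]
costs = [5, 10, 5, 10]
budget = 150
N = sum(N_sizes)

raw = [Ni * si / sqrt(ci) for Ni, si, ci in zip(N_sizes, sigmas, costs)]
w = [r / sum(raw) for r in raw]          # same optimal weights as before

# (a) budget = sum n_i c_i = n * sum w_i c_i, so n = budget / sum w_i c_i
n = budget / sum(wi * ci for wi, ci in zip(w, costs))
n_i = [round(wi * n) for wi in w]
cost = sum(ni * ci for ni, ci in zip(n_i, costs))

# (b) bound on the mean for this allocation, taking s_i = sigma_i
v = sum(Ni**2 * (1 - ni / Ni) * si**2 / ni
        for Ni, ni, si in zip(N_sizes, n_i, sigmas)) / N**2
B = 2 * sqrt(v)
print(round(n, 2), n_i, cost, round(B))
```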
Notice that, in Example 7.1, part (a), n is chosen so that the estimate of the average salary is within $200 (= B) of μ. Consequently n is chosen to satisfy the restriction on B:
\[ n = \frac{\sum_i N_i^2\sigma_i^2/w_i}{\frac{N^2B^2}{4} + \sum_i N_i\sigma_i^2} \]
The cost turns out to be $245. Compare this situation with the situation in Example 7.2. In Example 7.2, part (a), n is chosen so that the cost is at most $150. (The error bound turns out to be $271.) The weights are the same in both Example 7.1 and 7.2 (weights are calculated according to optimal allocation).
Example 8 Refer to Examples 7.1 and 7.2. Suppose that all the out-of-town graduates are surveyed at a reunion, so the cost of surveying out-of-town graduates is the same as the cost of surveying a graduate who did not leave town. We had \( N_1 = 55;\ N_2 = 35;\ N_3 = 25;\ N_4 = 35;\ \sigma_1 = \sigma_2 = \$800;\ \sigma_3 = \sigma_4 = \$500 \).
Suppose that we wish to estimate the average salary of the 150 graduates to within $200. Find \( n_1, n_2, n_3, n_4 \).
Solution: What sort of allocation are we using? (proportional, Neyman or optimal?) Neyman, because the costs are now the same for all strata.
Step 1: Calculate the weights: \( w_i = \frac{N_i\sigma_i}{\sum_j N_j\sigma_j} \)
First get the numerators \( N_i\sigma_i \); then
\( w_1 = .431373,\ w_2 = .274510,\ w_3 = .122549,\ w_4 = .171569 \)
Step 2: Calculate n:
\[ n = \frac{\sum_i N_i^2\sigma_i^2/w_i}{\frac{N^2B^2}{4} + \sum_i N_i\sigma_i^2} \]
Now get the summation in the denominator, \( \sum N_i\sigma_i^2 \): calculate the \( N_i\sigma_i^2 \) and add them up using Sum in Stat>Basic Stats>Display descriptive:
\[ n = \frac{\sum_i N_i^2\sigma_i^2/w_i}{\frac{N^2B^2}{4} + \sum_i N_i\sigma_i^2} = \frac{10404000000}{\frac{150^2(200^2)}{4} + 72600000} = 34.96 \]
So use n = 35.
\( n_i = w_in \): \( n_1 = 15,\ n_2 = 10,\ n_3 = 4,\ n_4 = 6 \)
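For Example 8, a Python sketch of the Neyman calculation (variable names illustrative). Under Neyman allocation the numerator \( \sum N_i^2\sigma_i^2/w_i \) simplifies to \( (\sum N_i\sigma_i)^2 = 102000^2 = 10404000000 \), which gives a quick check on the spreadsheet value.

```python
from math import ceil

# Example 8: equal costs, so Neyman allocation
N_sizes = [55, 35, 25, 35]
sigmas = [800, 800, 500, 500]
B = 200
N = sum(N_sizes)

raw = [Ni * si for Ni, si in zip(N_sizes, sigmas)]     # N_i sigma_i
w = [r / sum(raw) for r in raw]                        # Neyman weights

num = sum(Ni**2 * si**2 / wi for Ni, si, wi in zip(N_sizes, sigmas, w))
den = N**2 * B**2 / 4 + sum(Ni * si**2 for Ni, si in zip(N_sizes, sigmas))
n = ceil(num / den)                                    # 34.96 rounded up
n_i = [round(wi * n) for wi in w]
print([round(x, 6) for x in w], round(num), n, n_i)
```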
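Next: Where does the formula for n come from?
\[ n = \frac{\sum N_i^2\sigma_i^2/w_i}{\frac{N^2B^2}{4} + \sum N_i\sigma_i^2} \]
Derivation:
Now
\[ B = 2\sqrt{V(\bar y_{st})} = 2\sqrt{\sum\frac{N_i^2}{N^2}V(\bar y_i)} \approx 2\sqrt{\sum\frac{N_i^2}{N^2}\left(1-\frac{n_i}{N_i}\right)\frac{\sigma_i^2}{n_i}} \]
\[ \therefore \frac{B^2}{4} = \sum\frac{N_i^2}{N^2}\left(1-\frac{n_i}{N_i}\right)\frac{\sigma_i^2}{n_i} \]
We want to find \( n = \sum n_i \). Substitute \( n_i = w_in \):
\[ \frac{B^2}{4} = \frac{1}{N^2}\sum N_i^2\left(1-\frac{w_in}{N_i}\right)\frac{\sigma_i^2}{w_in} = \frac{1}{N^2}\sum\left(\frac{N_i^2\sigma_i^2}{w_in} - N_i\sigma_i^2\right) \]
\[ \therefore \frac{N^2B^2}{4} = \frac{1}{n}\sum\frac{N_i^2\sigma_i^2}{w_i} - \sum N_i\sigma_i^2 \]
\[ \therefore \frac{1}{n}\sum\frac{N_i^2\sigma_i^2}{w_i} = \frac{N^2B^2}{4} + \sum N_i\sigma_i^2 \]
\[ \therefore n = \frac{\sum N_i^2\sigma_i^2/w_i}{\frac{N^2B^2}{4} + \sum N_i\sigma_i^2} \]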
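The derivation can be checked numerically: computing \( n \) from the formula (without rounding) and substituting \( n_i = w_in \) back into \( B = 2\sqrt{V(\bar y_{st})} \) should return the target bound exactly, up to floating-point error. A minimal Python sketch using the Example 6 numbers (names illustrative):

```python
from math import sqrt

# Example 6 numbers: proportional allocation with a target bound of $200
N_sizes, sigmas, B = [90, 60], [800, 500], 200
N = sum(N_sizes)
w = [Ni / N for Ni in N_sizes]

num = sum(Ni**2 * si**2 / wi for Ni, si, wi in zip(N_sizes, sigmas, w))
den = N**2 * B**2 / 4 + sum(Ni * si**2 for Ni, si in zip(N_sizes, sigmas))
n = num / den                                  # unrounded sample size

# Substitute n_i = w_i n into V(ybar_st) and recover the bound
v = sum(Ni**2 * (1 - wi * n / Ni) * si**2 / (wi * n)
        for Ni, si, wi in zip(N_sizes, sigmas, w)) / N**2
B_check = 2 * sqrt(v)
print(round(n, 4), B_check)                    # B_check equals B up to rounding error
```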
General Guidelines
1) If the stratum means are widely different then stratified random sampling with
proportional allocation will yield an estimator with smaller variance than that for SRS.
2) If costs are nearly the same for all strata but the stratum variances are widely different
then optimal allocation will yield an estimator with smaller variance than that from
proportional allocation.
Note: The allocation problem for estimating a population total (τ): weights are as for means (proportional or Neyman or optimal). A bound of B on τ is the same as a bound of \( \frac{B}{N} \) on μ.
Modify the formula for n:
\[ n = \frac{\sum N_i^2\sigma_i^2/w_i}{\frac{N^2B^2}{4N^2} + \sum N_i\sigma_i^2} = \frac{\sum N_i^2\sigma_i^2/w_i}{\frac{B^2}{4} + \sum N_i\sigma_i^2} \]
where B is the bound on τ.
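A small Python sketch of the modified formula for τ (function names hypothetical). The $30,000 bound on total salary is an illustrative figure chosen so that \( B/N = 30000/150 = \$200 \); with the Example 6 strata, the formula for τ and the formula for μ then give the same n.

```python
def n_for_mean(N_sizes, sigmas, w, B):
    """n for a bound B on the mean: denominator term N^2 B^2 / 4."""
    N = sum(N_sizes)
    num = sum(Ni**2 * si**2 / wi for Ni, si, wi in zip(N_sizes, sigmas, w))
    return num / (N**2 * B**2 / 4 + sum(Ni * si**2 for Ni, si in zip(N_sizes, sigmas)))


def n_for_total(N_sizes, sigmas, w, B_tau):
    """n for a bound B_tau on the total: the N^2 B^2 / 4 term becomes B_tau^2 / 4."""
    num = sum(Ni**2 * si**2 / wi for Ni, si, wi in zip(N_sizes, sigmas, w))
    return num / (B_tau**2 / 4 + sum(Ni * si**2 for Ni, si in zip(N_sizes, sigmas)))


# Example 6 strata; a hypothetical $30,000 bound on total salary is $200 on the mean
N_sizes, sigmas, w = [90, 60], [800, 500], [0.6, 0.4]
n_tau = n_for_total(N_sizes, sigmas, w, B_tau=30_000)
n_mu = n_for_mean(N_sizes, sigmas, w, B=200)
print(round(n_tau, 2), round(n_mu, 2))
```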
Estimating p
\[ \hat p_{st} = \sum_{i=1}^{L}\frac{N_i}{N}\hat p_i \]
This is a weighted average of the proportions from each stratum. The weights are proportional to the sizes of the strata. (More weight is given to the large strata and smaller weights are given to the small strata.)
\[ \hat V(\hat p_{st}) = \hat V\left(\sum_{i=1}^{L}\frac{N_i}{N}\hat p_i\right) = \frac{1}{N^2}\sum_{i=1}^{L}N_i^2\,\hat V(\hat p_i) = \frac{1}{N^2}\sum_{i=1}^{L}N_i^2\left(1-\frac{n_i}{N_i}\right)\frac{\hat p_i\hat q_i}{n_i-1} \]
From Chapter 4: \( \hat V(\hat p) = \left(1-\frac{n}{N}\right)\frac{\hat p\hat q}{n-1} \)
Example. Refer to the situation in Exercise 5.6 p 154 of the text. A school desires to estimate the
average score that may be obtained on a reading comprehension exam for students in the sixth
grade. The school’s students are grouped into three tracks, with the fast learners in track I, the
slow learners in track III, and the rest in track II. The school decides to stratify on tracks because
this method should reduce the variability of test scores. The sixth grade contains 55 students in
track I, 80 in track II, and 65 in track III. A stratified random sample of 50 students is
proportionally allocated and yields simple random samples of n1 = 14, n2 = 20, and n3 = 16
from tracks I, II, and III. The test is administered to the sample of students.
Estimate the proportion of sixth-graders who scored 60 or more and place a bound on the error of
the estimate.
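Solution:
\( \hat p_I = \frac{14}{14} = 1,\quad \hat p_{II} = \frac{12}{20} = .6,\quad \hat p_{III} = \frac{2}{16} = .125 \)
Spreadsheet:
\[ \hat p_{st} = \sum_{i=1}^{L}\frac{N_i}{N}\hat p_i = \frac{55(1) + 80(.6) + 65(.125)}{200} = .55563 \]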
\[ \hat V(\hat p_{st}) = \frac{1}{N^2}\sum_{i=1}^{L}N_i^2\,\hat V(\hat p_i) = \frac{1}{N^2}\sum_{i=1}^{L}N_i^2\left(1-\frac{n_i}{N_i}\right)\frac{\hat p_i\hat q_i}{n_i-1} = \frac{1}{200^2}(83.85554) = .0020964 \]
\[ \therefore B = 2\sqrt{.0020964} = .092 \]
Interpret: We estimate that between 46% and 65% of Grade 6 scored 60 or more.
What if we had not stratified? Then \( \hat p = \frac{28}{50} = .56 \), with bound \( 2\sqrt{\left(1-\frac{50}{200}\right)\frac{(.56)(.44)}{49}} = .123 \).
Compare the margins of error for stratified and unstratified. Which is better? Stratified is better because the margin of error is smaller (.123 if not stratified, .092 if stratified).
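A Python sketch of the whole calculation for this example, assuming the numbers scoring 60 or more are 14, 12 and 2, as implied by \( \hat p_I, \hat p_{II}, \hat p_{III} \). It also reproduces the unstratified bound of .123 with the Chapter 4 formula. Variable names are illustrative.

```python
from math import sqrt

# Exercise 5.6 setup: tracks I, II, III
N_sizes = [55, 80, 65]
n_sizes = [14, 20, 16]
scored_60_plus = [14, 12, 2]       # counts implied by p_I = 14/14, p_II = 12/20, p_III = 2/16
N = sum(N_sizes)                   # 200

p_hat = [x / ni for x, ni in zip(scored_60_plus, n_sizes)]
p_st = sum(Ni * pi for Ni, pi in zip(N_sizes, p_hat)) / N
v_st = sum(Ni**2 * (1 - ni / Ni) * pi * (1 - pi) / (ni - 1)
           for Ni, ni, pi in zip(N_sizes, n_sizes, p_hat)) / N**2
B_st = 2 * sqrt(v_st)

# Ignoring the strata: treat the 50 students as one SRS (Chapter 4 formula)
n = sum(n_sizes)
p = sum(scored_60_plus) / n
B_srs = 2 * sqrt((1 - n / N) * p * (1 - p) / (n - 1))
print(f"stratified: {p_st:.4f} +/- {B_st:.3f};  SRS: {p:.2f} +/- {B_srs:.3f}")
```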
Example: Refer to Exercise 5.6 page 154. In terms of the percentage scoring over 60%, is the
difference between Tracks I and II significant?
Solution:
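\( \hat p_I - \hat p_{II} = 1 - .6 = .4 \)
\[ \hat V(\hat p_I - \hat p_{II}) = \hat V(\hat p_I) + \hat V(\hat p_{II}) = 0 + .0094737 \]
\[ \therefore B = 2\sqrt{.0094737} = .19 \]
The CI is \( .4 \pm .19 \), ie (.21, .59).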
What does this tell us about the significance of the difference in % scoring over 60%? Zero is not
in the CI so the difference (in terms of the % scoring over 60%) between tracks I and II is
“significant”.
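The same calculation in Python, as a minimal sketch; `var_p_hat` is an illustrative helper implementing the Chapter 4 formula within one stratum, and the variances simply add because the strata are sampled independently.

```python
from math import sqrt


def var_p_hat(N_i, n_i, p_i):
    """Chapter 4 formula within one stratum: (1 - n_i/N_i) p_i q_i / (n_i - 1)."""
    return (1 - n_i / N_i) * p_i * (1 - p_i) / (n_i - 1)


p_I, p_II = 14 / 14, 12 / 20
diff = p_I - p_II
# Samples from different strata are independent, so the variances add
v_diff = var_p_hat(55, 14, p_I) + var_p_hat(80, 20, p_II)
B = 2 * sqrt(v_diff)
print(f"difference = {diff:.2f}, B = {B:.2f}, CI = ({diff - B:.2f}, {diff + B:.2f})")
```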
Note: the procedures for differences are the same as for two independent SRS because
stratification means drawing a SRS independently from each stratum.
The allocation problem for estimating a proportion: same as for estimating means, except replace \( \sigma_i \) with \( \sqrt{p_iq_i} \).
Example 9: A survey is conducted amongst new residents of Abbotsford to discover whether their
housing needs have adequately been met. Stratification was by type of housing (apartment, free-standing
house or in between (duplex, townhouse, shared house)). Figures for summer 2020 were used as initial
estimates for summer 2021.
(a) Find \( n_1, n_2, n_3 \) and \( n \) so as to estimate to within 5% the percentage who are happy with their accommodation. Also find the cost of this sample.
(b) Find the optimal stratum sample sizes assuming that we have only $150 for sampling.
Solution:
(a) B = 0.05
Step 1: Find \( w_i \)
\[ w_i = \frac{N_i\sqrt{p_iq_i}/\sqrt{c_i}}{\sum_j N_j\sqrt{p_jq_j}/\sqrt{c_j}} \]
Spreadsheet:
Then get the weights (divide the last column above by its total, 64.7944):
\( w_1 = .3172,\ w_2 = .2899,\ w_3 = .3929 \)
Step 2: Find n:
\[ n = \frac{\sum N_i^2p_iq_i/w_i}{\frac{N^2B^2}{4} + \sum N_ip_iq_i} \]
Now calculate n:
\[ n = \frac{11492}{\frac{268^2(.05^2)}{4} + 42.54} = 131.4 \]
\( n_1 = 42;\ n_2 = 38;\ n_3 = 52 \)
Check: This allocation yields n = 132 (compare with n = 131.4 calculated above).
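A Python sketch of Step 2, using only the totals quoted above (the underlying spreadsheet columns are not reproduced here). The weight \( w_2 \) is taken as \( 1 - w_1 - w_3 = .2899 \), an assumption based on the weights having to sum to 1; with it, the rounded \( n_i \) come out as 42, 38 and 52.

```python
# Example 9 totals from the spreadsheet (columns not reproduced here)
N = 268
B = 0.05
num = 11492        # sum of N_i^2 p_i q_i / w_i
sum_Npq = 42.54    # sum of N_i p_i q_i
w = [0.3172, 1 - 0.3172 - 0.3929, 0.3929]   # w_2 taken so that the weights sum to 1

n_exact = num / (N**2 * B**2 / 4 + sum_Npq)
n_i = [round(wi * n_exact) for wi in w]
print(round(n_exact, 1), n_i, sum(n_i))
```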
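(b) Find the optimal stratum sample sizes assuming that we have only $150 for sampling.
The weights do not change, so the cost \( \sum n_ic_i = n\sum w_ic_i \) is proportional to n. The sample in (a) costs $362, so scale each \( n_i \) by 150/362 = .414:
New \( n_1 = .414(42) = 17 \)
New \( n_3 = .414(52) = 22 \)
Will the bound for this allocation be larger or smaller than 5%? Larger. How do you know? The sample size n is smaller, so the error bound will be larger.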
Ans: \( B = 2\sqrt{\hat V(\hat p_{st})} \), where
\[ \hat V(\hat p_{st}) = \frac{1}{N^2}\sum_{i=1}^{L}N_i^2\left(1-\frac{n_i}{N_i}\right)\frac{\hat p_i\hat q_i}{n_i-1} \]
Additional comments
1) Variation within each stratum should be small; otherwise stratification gains little, and V(estimator) can even be larger for stratified sampling than for SRS.
2) Many surveys ask more than one question. What if we need to estimate both popln mean
and popln %? We have to choose the stratum sample sizes in order to satisfy error bounds
on both mean and %.
Example 10 What if we have two targets? (eg an error bound for estimating μ and an error bound for estimating p. How do we find a suitable sample size and allocate the sample?) Do for HW.
Suppose that an investigator wishes to find out (a) what percentage of local teachers supported a recent
strike action and (b) their average length of service in BC. There are 250 teachers in the area. The
percentage is to be estimated to within 10 percentage points and the average is to be estimated to within
one year. The teachers are stratified by salary: 100 teachers who earn less than $x p.a. form stratum 1 and
the rest form stratum 2. From past records, experience in BC ranges from 0 to 5 years in stratum 1 and 5-
30 years in stratum 2. Find the sample size and allocation necessary to achieve both of these bounds.
Post-stratification
Divide up respondents into strata after sampling. We must know the proportional representation of each stratum in the popln, ie we must know \( W_i = \frac{N_i}{N} \).
eg Suppose we know that the popln is 52% female and 48% male. We do not know \( N_1 \) = number of males in popln and \( N_2 \) = number of females in popln; we know only \( W_1 = .48 \) and \( W_2 = .52 \).
\[ \hat V(\bar y_{st}) = \frac{1}{N^2}\sum_{i=1}^{L}N_i^2\left(1-\frac{n_i}{N_i}\right)\frac{s_i^2}{n_i} \]
But now, in post-stratified sampling, the \( n_i \) are random (not known in advance and may change in repeated sampling).
Formula **:
\[ \hat V_p(\bar y_{st}) = \frac{1}{n}\left(1-\frac{n}{N}\right)\sum_{i=1}^{L}w_is_i^2 + \frac{1}{n^2}\sum_{i=1}^{L}\left(1-\frac{N_i}{N}\right)s_i^2 \]
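To derive this, first note that, under proportional allocation, \( w_i = \frac{N_i}{N} \) and \( n_i = w_in = \frac{N_in}{N} \). Then
\[ \hat V(\bar y_{st}) = \frac{1}{N^2}\sum_{i=1}^{L}N_i^2\left(1-\frac{n_i}{N_i}\right)\frac{s_i^2}{n_i} = \sum_{i=1}^{L}w_i^2\left(1-\frac{n}{N}\right)\frac{s_i^2}{w_in} = \frac{1}{n}\left(1-\frac{n}{N}\right)\sum_{i=1}^{L}w_is_i^2 \]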
Now derive the result:
\[ \hat V_p(\bar y_{st}) = \frac{1}{n}\left(1-\frac{n}{N}\right)\sum_{i=1}^{L}w_is_i^2 + \frac{1}{n^2}\sum_{i=1}^{L}\left(1-\frac{N_i}{N}\right)s_i^2 \]
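\[ \hat V(\bar y_{st}) = \frac{1}{N^2}\sum_{i=1}^{L}N_i^2\left(1-\frac{n_i}{N_i}\right)\frac{s_i^2}{n_i} = \sum_{i=1}^{L}w_i^2\left(1-\frac{n_i}{N_i}\right)\frac{s_i^2}{n_i} \]
where \( w_i = \frac{N_i}{N} \). These population weights are appropriate because, in a random sample, the size of each group in the sample will theoretically be proportional to the size of that group in the population; that is,
\[ w_i = \frac{N_i}{N} \approx \frac{n_i}{n} \]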
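\[ \hat V(\bar y_{st}) = \sum_{i=1}^{L}w_i^2\left(1-\frac{n_i}{N_i}\right)\frac{s_i^2}{n_i} = \sum_{i=1}^{L}\frac{w_i^2s_i^2}{n_i} - \sum_{i=1}^{L}\frac{w_i^2s_i^2}{N_i} = \sum_{i=1}^{L}w_i^2s_i^2\,\frac{1}{n_i} - \frac{1}{N}\sum_{i=1}^{L}w_is_i^2 \]
since \( \frac{w_i^2}{N_i} = \frac{N_i}{N^2} = \frac{w_i}{N} \).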
In the case of post-stratified samples, ni is random so replace with an approximation of E( )
ni ni
1 1 N
E( ) is like an average value of . ni is hypergeometric so E ( ni ) = i n=wi n
ni ni N
Then E ( )
1
≈
1 1−w i
+
ni n wi n2 w2i
This works well only if n is large.
\[ \hat V_p(\bar y_{st}) = \sum w_i^2s_i^2\left(\frac{1}{nw_i} + \frac{1-w_i}{n^2w_i^2}\right) - \frac{1}{N}\sum w_is_i^2 \]
\[ = \frac{1}{n}\sum w_is_i^2 + \frac{1}{n^2}\sum s_i^2(1-w_i) - \frac{1}{N}\sum w_is_i^2 \]
\[ = \left(\frac{1}{n} - \frac{1}{N}\right)\sum w_is_i^2 + \frac{1}{n^2}\sum(1-w_i)s_i^2 \]
\[ = \frac{1}{n}\left(1-\frac{n}{N}\right)\sum w_is_i^2 + \frac{1}{n^2}\sum(1-w_i)s_i^2, \]
as required.
For proportions: replace \( s_i^2 \) with \( \frac{n_ip_iq_i}{n_i-1} \) in Formula **.
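Formula ** is easy to code. Below is a minimal Python sketch with illustrative function names; the input values are hypothetical, not survey data. With a single stratum (\( w_1 = 1 \)) the second term vanishes and the formula reduces to the SRS variance \( \frac{1}{n}\left(1-\frac{n}{N}\right)s^2 \), which is a convenient sanity check.

```python
def post_strat_var(n, N, w, s2):
    """Formula **: approximate variance of the post-stratified mean.

    (1/n)(1 - n/N) sum w_i s_i^2  +  (1/n^2) sum (1 - w_i) s_i^2
    """
    first = (1 / n) * (1 - n / N) * sum(wi * si2 for wi, si2 in zip(w, s2))
    second = sum((1 - wi) * si2 for wi, si2 in zip(w, s2)) / n**2
    return first + second


def s2_for_proportion(n_i, p_i):
    """For proportions, s_i^2 = n_i p_i q_i / (n_i - 1)."""
    return n_i * p_i * (1 - p_i) / (n_i - 1)


# Sanity check: one stratum (w = 1) gives the SRS variance (1/n)(1 - n/N) s^2
print(post_strat_var(50, 1000, [1.0], [4.0]), (1 / 50) * (1 - 50 / 1000) * 4.0)

# Hypothetical two-stratum survey with known population weights .3 and .7
v = post_strat_var(200, 10_000, [0.3, 0.7], [2.0, 5.0])
print(v)
```

In the hypothetical case the second term (about 0.00007) is tiny next to the first (about 0.0201), which is why post-stratification behaves almost like proportional allocation when n is large.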
Example 11 An opinion poll on EI payments was undertaken by phone (cell and landline): 1000 people
were asked whether EI payments were adequate for basic needs. Respondents were stratified according to
whether they were employed or not. (Respondents were asked during the survey about their employment
status.) The results were summarised as follows:
Stratum | Number of respondents | EI adequate (Yes) | \( \hat p_i \)
Unemployed | 120 | 48 | 0.4
Employed | 880 | 704 | 0.8
(a) The national unemployment figure was 6%. Post-stratify and find \( \hat p_{st} \), the estimate of the proportion of the population who think that EI is adequate for basic needs.
(b) Give a 95% confidence interval for the percentage of the population who think that EI is adequate for basic needs.
Solution: First use the usual estimator to estimate the proportion of the population who think that EI is adequate for basic needs. What's wrong with this estimator?
\( \hat p = \frac{\underline{\qquad}}{\underline{\qquad}} = \) ??
Is ?? a good estimate of the % of the popln who think that EI is adequate for basic needs? Why or why not?
Solution (a): In order to give each person in the popln the same weight, we need the unemployed to count 6% and the employed to count 94%.
ie \( w_U = \underline{\qquad} \), \( w_E = \underline{\qquad} \)
Then \( \hat p_{st} = \underline{\qquad} \)
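Solution (b)
\[ \hat V_p(\bar y_{st}) = \frac{1}{n}\left(1-\frac{n}{N}\right)\sum_{i=1}^{L}w_is_i^2 + \frac{1}{n^2}\sum_{i=1}^{L}\left(1-\frac{N_i}{N}\right)s_i^2 \]
For proportions, \( s_i^2 = \frac{n_ip_iq_i}{n_i-1} \).
So \( s_1^2 = \frac{n_1p_1q_1}{n_1-1} = \underline{\qquad} = \) ? and \( s_2^2 = \frac{n_2p_2q_2}{n_2-1} = \underline{\qquad} = \) ?
\[ \hat V_p(\bar y_{st}) = \frac{1}{n}\left(1-\frac{n}{N}\right)(\quad) + \frac{1}{\underline{\qquad}}\left[(1-w_U)(\quad) + (1-w_E)(\quad)\right] \]
\[ = \frac{1}{n}\left(1-\frac{n}{N}\right)(\quad) + \frac{1}{\underline{\qquad}}\left[(1-w_U)(\quad) + (1-w_E)(\quad)\right] \]
\( \therefore B = 2\sqrt{\underline{\qquad}} \)
The CI for p is
Interpret: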
Caveats:
1) \( \frac{N_i}{N} \) must be known accurately.
Eg Suppose that
This will be too high because short men tend not to answer, making \( \bar y_M \) too high.
The problem is that the non-response caused bias. This *can't* be fixed by adjusting for the proportion of men in the popln. The bias has affected \( \bar y_M \). We can't fix this because we don't know what % of men are tall, short, etc. Also, we can't assume that the proportions of tall/short men in the popln are the same as in the sample.
Notice that this problem is exactly the problem faced by pollsters trying to predict outcomes
of political elections when certain types of voters (eg Trump supporters) are less likely to
participate in polls than other voters. We can’t predict the outcome of the vote because the % of
Trump supporters in the sample is not the % of Trump supporters in the popln, and we can’t
assume that those who did respond will vote like those who did not respond.
HW. Exercises 5.1, 5.3, 5.5, 5.7, 5.9, 5.11, 5.13, 5.15, 5.19 (Assume that the stratum sizes are the
same), 5.27, 5.31, 5.33
Below are the formulae for the allocation problem, all on one page for your convenience.
Sample size calculation and allocation to strata: Calculate \( w_i \), then \( n \), then \( n_i = w_in \).
Proportional allocation: \( w_i = \frac{N_i}{N} \). Accounts for sizes of strata.
Optimal allocation: Accounts for sizes of strata, different variances and costs.
\[ w_i = \frac{N_i\sigma_i/\sqrt{c_i}}{\sum_j N_j\sigma_j/\sqrt{c_j}} \]
Neyman allocation: Assumes costs the same for all strata.
\[ w_i = \frac{N_i\sigma_i}{\sum_j N_j\sigma_j} \]
Formula (*):
\[ n = \frac{\sum N_i^2\sigma_i^2/w_i}{\frac{N^2B^2}{4} + \sum N_i\sigma_i^2} \]
Same for τ except that \( \frac{B^2}{4N^2} \) replaces \( \frac{B^2}{4} \). For p, replace \( \sigma_i \) with \( \sqrt{p_iq_i} \).