Chapter 3: Element Sampling Design (Part 2) : Jae-Kwang Kim


Chapter 3: Element sampling design (Part 2)

Jae-Kwang Kim
Iowa State University
Spring, 2013
Kim (ISU) Ch. 3: Element sampling design Spring, 2013 1 / 26
Systematic sampling
1. Systematic sampling
2. Stratified sampling
3. Domain estimation
Systematic sampling
Setup:
1. Have $N$ elements in a list.
2. Choose a positive integer $a$, called the sampling interval. Let $n = \lfloor N/a \rfloor$. That is, $N = na + c$, where $c$ is an integer with $0 \le c < a$.
3. Select a random start $r$ from $\{1, 2, \ldots, a\}$ with equal probability.
4. The final sample is
$$A = \begin{cases} \{r, r+a, r+2a, \ldots, r+(n-1)a\} & \text{if } c < r \le a \\ \{r, r+a, r+2a, \ldots, r+na\} & \text{if } 1 \le r \le c. \end{cases}$$
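The four setup steps can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function names `systematic_sample_from_start` and `systematic_sample` are hypothetical, and elements are indexed from 1 to match the notation above.

```python
import random

def systematic_sample_from_start(N, a, r):
    """Return the 1-based indices r, r + a, r + 2a, ... that do not exceed N."""
    if not (1 <= a <= N and 1 <= r <= a):
        raise ValueError("need 1 <= r <= a <= N")
    return list(range(r, N + 1, a))

def systematic_sample(N, a, rng=random):
    """Draw the random start uniformly from {1, ..., a}, then take every a-th element."""
    r = rng.randint(1, a)  # randint includes both endpoints
    return systematic_sample_from_start(N, a, r)

# N = 10 and a = 3 give n = 3 and c = 1, so only the start r = 1 yields n + 1 = 4 elements.
for r in (1, 2, 3):
    print(r, systematic_sample_from_start(10, 3, r))
```

Separating the random start from the index construction makes the two cases of the final sample easy to inspect for each possible $r$.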
Systematic sampling
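Sample size can be random:
$$n_A = \begin{cases} n & \text{if } c < r \le a \\ n + 1 & \text{if } r \le c \end{cases}$$
Inclusion probabilities:
$$\pi_k = \frac{1}{a}, \qquad \pi_{kl} = \begin{cases} 1/a & \text{if } k \text{ and } l \text{ belong to the same systematic sample} \\ 0 & \text{otherwise} \end{cases}$$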
Systematic sampling
Remark
This is very easy to do.
This is a probability sampling design.
This is not a measurable sampling design: there is no design-unbiased
estimator of the variance (because there is only one random draw).
Pick one set of elements (which always go together) & measure each
one: Later, we will call this cluster sampling.
Divide the population into non-overlapping groups & choose an element
in each group: closely related to stratification.
Systematic sampling
Estimation
Partition the population into $a$ groups:
$$U = U_1 \cup U_2 \cup \cdots \cup U_a,$$
where the $U_i$ are disjoint.
Population total:
$$Y = \sum_{i \in U} y_i = \sum_{r=1}^{a} \sum_{k \in U_r} y_k = \sum_{r=1}^{a} t_r,$$
where $t_r = \sum_{k \in U_r} y_k$.
Think of a finite population with $a$ elements with measurements $t_1, \ldots, t_a$.
Systematic sampling
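Estimation (Cont'd)
HT estimator:
$$\hat{Y}_{HT} = \frac{t_r}{1/a}, \quad \text{if } A = U_r.$$
Variance: Note that we are doing SRS (of size 1) from the population of $a$ elements $\{t_1, \ldots, t_a\}$.
$$\mathrm{Var}\left(\hat{Y}_{HT}\right) = \frac{a^2}{1}\left(1 - \frac{1}{a}\right) S_t^2,$$
where
$$S_t^2 = \frac{1}{a-1} \sum_{r=1}^{a} \left(t_r - \bar{t}\right)^2$$
and $\bar{t} = \sum_{r=1}^{a} t_r / a$.
When is the variance small?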
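Because only $a$ samples are possible, the design variance can be checked by direct enumeration. The sketch below is illustrative only: the helper names and the toy population `y` are assumptions, the list is taken to be already in sampling order, and $a \ge 2$ is required so that $S_t^2$ is defined.

```python
def systematic_totals(y, a):
    """Totals t_r of the a possible systematic samples (0-based start r stands for r + 1)."""
    return [sum(y[r::a]) for r in range(a)]

def sy_variance(y, a):
    """Design variance a^2 (1 - 1/a) S_t^2 of the HT estimator a * t_r."""
    t = systematic_totals(y, a)
    t_bar = sum(t) / a
    S2_t = sum((tr - t_bar) ** 2 for tr in t) / (a - 1)
    return a ** 2 * (1 - 1 / a) * S2_t

# Toy population (an assumption for illustration): y_k = k for k = 1, ..., 12.
y = list(range(1, 13))
a = 3
estimates = [a * t for t in systematic_totals(y, a)]   # one HT estimate per start

# Each start has probability 1/a, so the design mean and variance are plain averages.
design_mean = sum(estimates) / a
design_var = sum((e - design_mean) ** 2 for e in estimates) / a
print(design_mean, design_var, sy_variance(y, a))
```

The enumerated variance and the formula agree, and the design mean equals the population total, which is the unbiasedness of the HT estimator.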
Systematic sampling
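Estimation (Cont'd)
Now, assuming $N = na$,
$$V\left(\hat{Y}_{HT}\right) = a(a-1) S_t^2 = n^2 a \sum_{r=1}^{a} \left(\bar{y}_r - \bar{y}_u\right)^2,$$
where $\bar{y}_r = t_r / n$ and $\bar{y}_u = \bar{t} / n$.
ANOVA: $U = \cup_{r=1}^{a} U_r$
$$\begin{aligned}
SST &= \sum_{k \in U} \left(y_k - \bar{y}_u\right)^2 = \sum_{r=1}^{a} \sum_{k \in U_r} \left(y_k - \bar{y}_u\right)^2 \\
&= \sum_{r=1}^{a} \sum_{k \in U_r} \left(y_k - \bar{y}_r\right)^2 + n \sum_{r=1}^{a} \left(\bar{y}_r - \bar{y}_u\right)^2 \\
&= SSW + SSB.
\end{aligned}$$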
Systematic sampling
$$V\left(\hat{Y}_{HT}\right) = na \cdot SSB = N \cdot SSB = N\left(SST - SSW\right).$$
If SSB is small, then the $\bar{y}_r$ are more alike and $V(\hat{Y}_{HT})$ is small.
If SSW is small, then $V(\hat{Y}_{HT})$ is large.
The intraclass correlation coefficient $\rho$ measures the homogeneity of clusters:
$$\rho = 1 - \frac{n}{n-1} \frac{SSW}{SST}$$
More details about $\rho$ will be covered in cluster sampling (Chapter 4).
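The ANOVA decomposition and the intraclass correlation can be computed directly from a listed population. The following is a rough sketch under the assumption $N = na$ with $n \ge 2$; the function names and the toy population are hypothetical.

```python
def anova_systematic(y, a):
    """Return (SST, SSW, SSB) for the a systematic samples; requires len(y) = n * a."""
    N = len(y)
    if N % a != 0:
        raise ValueError("this sketch assumes N = n * a")
    n = N // a
    y_u = sum(y) / N
    groups = [y[r::a] for r in range(a)]            # U_1, ..., U_a
    means = [sum(g) / n for g in groups]            # sample means y_bar_r
    SST = sum((yk - y_u) ** 2 for yk in y)
    SSW = sum((yk - m) ** 2 for g, m in zip(groups, means) for yk in g)
    SSB = n * sum((m - y_u) ** 2 for m in means)
    return SST, SSW, SSB

def intraclass_correlation(y, a):
    """rho = 1 - n / (n - 1) * SSW / SST."""
    SST, SSW, _ = anova_systematic(y, a)
    n = len(y) // a
    return 1 - n / (n - 1) * SSW / SST

# Toy population (an assumption): a linear trend, y_k = k for k = 1, ..., 12.
y = list(range(1, 13))
a = 3
SST, SSW, SSB = anova_systematic(y, a)
variance = len(y) * SSB                             # V(Y_hat_HT) = N * SSB
rho = intraclass_correlation(y, a)
print(SST, SSW, SSB, variance, rho)
```

For this linearly ordered list almost all variation is within the systematic samples, so SSB is small, $\rho$ is negative, and the variance is small.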
Systematic sampling
Comparison between systematic sampling (SY) and SRS
How does SY compare to SRS when the population is sorted in the following ways?
1. Random ordering: Intuitively should be the same
2. Linear ordering: SY should be better than SRS
3. Periodic ordering: if period = $a$, SY can be terrible.
4. Autocorrelated order: Successive $y_k$'s tend to lie on the same side of $\bar{y}_u$. Thus, SY should be better than SRS.
Systematic sampling
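How to quantify?
$$V_{SRS}\left(\hat{Y}_{HT}\right) = \frac{N^2}{n}\left(1 - \frac{n}{N}\right) \frac{1}{N-1} \sum_{k=1}^{N} \left(y_k - \bar{Y}_N\right)^2$$
$$V_{SY}\left(\hat{Y}_{HT}\right) = n^2 a \sum_{r=1}^{a} \left(\bar{y}_r - \bar{y}_u\right)^2$$
Cochran (1946) introduced the superpopulation model to deal with this problem (treat $y_k$ as a random variable).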
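For a fixed, fully listed population both variances can be evaluated exactly, which makes the effect of the ordering concrete. The sketch below assumes $N = na$; the three toy populations (linear, periodic with period $a$, and one random shuffle) and the seed are assumptions chosen for illustration.

```python
import random

def v_srs(y, n):
    """Variance of the HT total estimator under SRS of size n."""
    N = len(y)
    y_bar = sum(y) / N
    S2 = sum((yk - y_bar) ** 2 for yk in y) / (N - 1)
    return N ** 2 / n * (1 - n / N) * S2

def v_sy(y, a):
    """Variance of the HT total estimator under systematic sampling; assumes N = n * a."""
    N = len(y)
    n = N // a
    y_u = sum(y) / N
    sample_means = [sum(y[r::a]) / n for r in range(a)]
    return n ** 2 * a * sum((m - y_u) ** 2 for m in sample_means)

N, a = 120, 10
n = N // a
linear = list(range(1, N + 1))            # linear trend in list order
periodic = [k % a for k in range(N)]      # period equal to the sampling interval
shuffled = linear[:]
random.Random(2013).shuffle(shuffled)     # one random ordering of the same values

for name, y in [("linear", linear), ("periodic", periodic), ("random", shuffled)]:
    print(name, round(v_sy(y, a)), round(v_srs(y, n)))
```

With the linear list SY has a much smaller variance than SRS, while with the periodic list every systematic sample contains a single repeated value and SY is far worse. The shuffled list is only one realization of a random ordering, so its two variances need not coincide.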
Systematic sampling
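Example: Superpopulation model for a population in random order.
Denote the model by $\zeta$: $\{y_k\} \sim \text{iid } (\mu, \sigma^2)$.
$$E_\zeta\left\{V_{SRS}\left(\hat{Y}_{HT}\right)\right\} = \frac{N^2}{n}\left(1 - \frac{n}{N}\right) \sigma^2$$
$$E_\zeta\left\{V_{SY}\left(\hat{Y}_{HT}\right)\right\} = \frac{N^2}{n}\left(1 - \frac{n}{N}\right) \sigma^2$$
Thus, the model expectations of the design variances are the same
under the IID model.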
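The equality of the two model expectations can be checked approximately by simulation. This is a Monte Carlo sketch, not a proof: it assumes $N = na$ and, purely for convenience, normal $y_k$ (the model only specifies the mean and variance); the function names, parameter values, and seed are all hypothetical.

```python
import random

def v_srs(y, n):
    """Design variance of the HT total estimator under SRS of size n."""
    N = len(y)
    y_bar = sum(y) / N
    S2 = sum((yk - y_bar) ** 2 for yk in y) / (N - 1)
    return N ** 2 / n * (1 - n / N) * S2

def v_sy(y, a):
    """Design variance of the HT total estimator under systematic sampling (N = n * a)."""
    N = len(y)
    n = N // a
    y_u = sum(y) / N
    sample_means = [sum(y[r::a]) / n for r in range(a)]
    return n ** 2 * a * sum((m - y_u) ** 2 for m in sample_means)

def model_expectations(N=50, a=5, mu=10.0, sigma=2.0, reps=2000, seed=1):
    """Average both design variances over populations generated as iid N(mu, sigma^2)."""
    rng = random.Random(seed)
    n = N // a
    sum_srs = sum_sy = 0.0
    for _ in range(reps):
        y = [rng.gauss(mu, sigma) for _ in range(N)]
        sum_srs += v_srs(y, n)
        sum_sy += v_sy(y, a)
    theory = N ** 2 / n * (1 - n / N) * sigma ** 2
    return sum_srs / reps, sum_sy / reps, theory

avg_srs, avg_sy, theory = model_expectations()
print(avg_srs, avg_sy, theory)
```

Both averages should land close to the common theoretical value, with the systematic-sampling average noisier because it depends on only $a - 1$ degrees of freedom per population.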
Stratified sampling
Stratified sampling:
1. The finite population is stratified into $H$ subpopulations:
$$U = U_1 \cup \cdots \cup U_H$$
2. A sample is drawn within each subpopulation (or stratum), independently across the strata:
$$\Pr\left(i \in A_h, j \in A_g\right) = \Pr\left(i \in A_h\right) \Pr\left(j \in A_g\right), \quad \text{for } h \neq g,$$
where $A_h$ is the index set of the sample in stratum $h$, $h = 1, 2, \ldots, H$.
Example: Stratified SRS
1. Stratify the population. Let $N_h$ be the population size of $U_h$.
2. Sample size allocation: Determine $n_h$.
3. Perform SRS independently (select $n_h$ sample elements from $N_h$) in each stratum.
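The stratified SRS steps above can be sketched as follows. The dictionary layout (stratum label mapped to a list of unit identifiers) and the function name `stratified_srs` are assumptions made for illustration.

```python
import random

def stratified_srs(strata, n_h, rng=random):
    """Draw an SRS without replacement of size n_h[h] within each stratum h.

    strata: dict mapping stratum label -> list of unit ids (the sets U_h)
    n_h:    dict mapping stratum label -> sample size in that stratum
    """
    # Each stratum gets its own call to sample(), so draws are independent across strata.
    return {h: rng.sample(units, n_h[h]) for h, units in strata.items()}

# Hypothetical population with two strata of sizes N_A = 10 and N_B = 20.
strata = {"A": list(range(10)), "B": list(range(10, 30))}
n_h = {"A": 3, "B": 5}
sample = stratified_srs(strata, n_h, random.Random(1))
print(sample)
```

Passing a seeded `random.Random` instance keeps the draw reproducible; the default uses the module-level generator.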
Stratified sampling
Why stratification?
1. Control for domains of study
2. Flexibility in design and estimation
3. Convenience
4. Efficiency
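Stratified sampling
Estimation
HT estimation for $t = \sum_{h=1}^{H} t_h$, where $t_h = \sum_{i \in U_h} y_i$.
1. HT estimator:
$$\hat{t}_{HT} = \sum_{h=1}^{H} \hat{t}_{h,HT},$$
where $\hat{t}_{h,HT}$ is unbiased for $t_h$.
2. Variance:
$$\mathrm{Var}\left(\hat{t}_{HT}\right) = \sum_{h=1}^{H} \mathrm{Var}\left(\hat{t}_{h,HT}\right)$$
by independence.
3. Variance estimation:
$$\hat{V}\left(\hat{t}_{HT}\right) = \sum_{h=1}^{H} \hat{V}_h\left(\hat{t}_{h,HT}\right),$$
where $\hat{V}_h(\hat{t}_{h,HT})$ is unbiased for $\mathrm{Var}(\hat{t}_{h,HT})$.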
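Stratified sampling
Example: Stratified SRS
1. HT estimator:
$$\hat{t}_{HT} = \sum_{h=1}^{H} N_h \bar{y}_h,$$
where $\bar{y}_h = n_h^{-1} \sum_{i \in A_h} y_i$.
2. Variance:
$$\mathrm{Var}\left(\hat{t}_{HT}\right) = \sum_{h=1}^{H} \frac{N_h^2}{n_h}\left(1 - \frac{n_h}{N_h}\right) S_h^2,$$
where $S_h^2 = (N_h - 1)^{-1} \sum_{i \in U_h} \left(y_i - \bar{Y}_h\right)^2$.
3. Variance estimation:
$$\hat{V}\left(\hat{t}_{HT}\right) = \sum_{h=1}^{H} \frac{N_h^2}{n_h}\left(1 - \frac{n_h}{N_h}\right) s_h^2,$$
where $s_h^2 = (n_h - 1)^{-1} \sum_{i \in A_h} \left(y_i - \bar{y}_h\right)^2$.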
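The point estimator and variance estimator for stratified SRS translate almost line by line into code. The sketch below is illustrative: the function name, the dictionary layout, and the toy numbers are assumptions, and every stratum must contain at least two sampled values so that $s_h^2$ is defined.

```python
from statistics import mean, variance

def stratified_total(samples, N_h):
    """Return (t_hat, v_hat) for stratified SRS.

    samples: dict mapping stratum label -> list of sampled y values (n_h >= 2)
    N_h:     dict mapping stratum label -> stratum population size
    """
    t_hat = 0.0
    v_hat = 0.0
    for h, y in samples.items():
        n_h = len(y)
        t_hat += N_h[h] * mean(y)                  # N_h * y_bar_h
        s2_h = variance(y)                         # sample variance, divisor n_h - 1
        v_hat += N_h[h] ** 2 / n_h * (1 - n_h / N_h[h]) * s2_h
    return t_hat, v_hat

# Hypothetical data: two strata with N_A = 30 and N_B = 20.
samples = {"A": [2, 4, 6], "B": [10, 14]}
N_h = {"A": 30, "B": 20}
t_hat, v_hat = stratified_total(samples, N_h)
print(t_hat, v_hat)
```

Here $\bar{y}_A = 4$, $s_A^2 = 4$, $\bar{y}_B = 12$, and $s_B^2 = 8$, so the estimate is $30 \cdot 4 + 20 \cdot 12 = 360$ with estimated variance $1080 + 1440 = 2520$.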
Stratified sampling
Sample allocation: Given $n = \sum_{h=1}^{H} n_h$, how to choose $n_h$?
1. Proportional allocation: choose $n_h \propto N_h$.
2. Optimal allocation: choose $n_h$ to
$$\text{minimize } \mathrm{Var}\left(\hat{t}_{HT}\right) \quad \text{subject to } \sum_{h=1}^{H} c_h n_h = C,$$
where $c_h$ is the cost of observing an element in stratum $h$ and $C$ is a given total cost. The solution (Neyman, 1934) is
$$n_h \propto N_h S_h / \sqrt{c_h}.$$
3. Properties
Under proportional allocation, the weights are all equal.
In general,
$$V_{opt}\left(\hat{t}_{HT}\right) \le V_{prop}\left(\hat{t}_{HT}\right) \le V_{SRS}\left(\hat{t}_{HT}\right),$$
where $V_{opt}(\hat{t}_{HT})$ is the variance of the stratified sampling estimator under optimal allocation, $V_{prop}(\hat{t}_{HT})$ is the variance of the stratified sampling estimator under proportional allocation, and $V_{SRS}(\hat{t}_{HT})$ is the variance of the SRS estimator.
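The two allocation rules can be compared numerically. The sketch below is a hedged illustration: the function names and the stratum values are assumptions, the proportionality constant for the optimal rule is chosen so that $\sum_h c_h n_h = C$, and the resulting $n_h$ are left as real numbers rather than rounded to integers.

```python
import math

def proportional_allocation(n, N_h):
    """n_h proportional to N_h, summing to n."""
    N = sum(N_h)
    return [n * Nh / N for Nh in N_h]

def optimal_allocation(C, N_h, S_h, c_h):
    """n_h proportional to N_h S_h / sqrt(c_h), scaled so that sum(c_h * n_h) = C."""
    scale = C / sum(Nh * Sh * math.sqrt(ch) for Nh, Sh, ch in zip(N_h, S_h, c_h))
    return [scale * Nh * Sh / math.sqrt(ch) for Nh, Sh, ch in zip(N_h, S_h, c_h)]

def stratified_variance(n_h, N_h, S_h):
    """Var(t_hat_HT) under stratified SRS for a given allocation."""
    return sum(Nh ** 2 / nh * (1 - nh / Nh) * Sh ** 2
               for nh, Nh, Sh in zip(n_h, N_h, S_h))

# Hypothetical strata: sizes, standard deviations S_h, and unit costs (so C = n).
N_h = [600, 300, 100]
S_h = [2.0, 5.0, 20.0]
c_h = [1.0, 1.0, 1.0]
n = 100

n_prop = proportional_allocation(n, N_h)
n_opt = optimal_allocation(n, N_h, S_h, c_h)
v_prop = stratified_variance(n_prop, N_h, S_h)
v_opt = stratified_variance(n_opt, N_h, S_h)
print(n_prop, v_prop)
print(n_opt, v_opt)
```

With equal costs the optimal rule shifts sample toward the small but highly variable third stratum, which is where the variance reduction relative to proportional allocation comes from in this example.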
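Stratified sampling
Method of collapsed strata
$n_h = 1$: One-per-stratum design
1. Most efficient
2. No unbiased estimator of $\mathrm{Var}(\hat{t}_{HT})$ under stratified sampling.
Form pairs of strata:
$$\hat{t}_1, \ldots, \hat{t}_H \;\Rightarrow\; \left(\hat{t}_{j1}, \hat{t}_{j2}\right), \quad j = 1, 2, \ldots, H/2,$$
where $H$ is even.
Variance estimator:
$$\hat{V}_{coll} = \sum_{j=1}^{H/2} \left(\hat{t}_{j1} - \hat{t}_{j2}\right)^2$$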
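Stratified sampling
Method of collapsed strata (Cont'd)
Property
$$\begin{aligned}
E\left(\hat{V}_{coll}\right) &= E\left[\sum_{j=1}^{H/2} \left\{\left(\hat{t}_{j1} - t_{j1}\right) - \left(\hat{t}_{j2} - t_{j2}\right) - \left(t_{j2} - t_{j1}\right)\right\}^2\right] \\
&= \sum_{j=1}^{H/2} \left\{\mathrm{Var}\left(\hat{t}_{j1}\right) + \mathrm{Var}\left(\hat{t}_{j2}\right) + \left(t_{j2} - t_{j1}\right)^2\right\} \\
&= \sum_{h=1}^{H} \mathrm{Var}\left(\hat{t}_h\right) + \sum_{j=1}^{H/2} \left(t_{j1} - t_{j2}\right)^2 \\
&\ge \mathrm{Var}\left(\hat{t}_{HT}\right)
\end{aligned}$$
Thus, it is a conservative variance estimator.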
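The conservativeness of the collapsed-strata estimator can be verified exactly for a tiny one-per-stratum design by enumerating every possible sample. The sketch below uses a single pair of hypothetical strata with $n_h = 1$, where $\hat{t}_h = N_h y_i$ for the selected element; all names and values are assumptions.

```python
from itertools import product

def collapsed_variance(t_hat_pairs):
    """V_coll: sum over pairs of the squared difference (t_hat_j1 - t_hat_j2)^2."""
    return sum((t1 - t2) ** 2 for t1, t2 in t_hat_pairs)

# One pair of hypothetical strata, one element drawn from each with equal probability.
stratum1 = [1, 2, 3]
stratum2 = [4, 6]
N1, N2 = len(stratum1), len(stratum2)
est1 = [N1 * y for y in stratum1]      # possible values of t_hat_1, each w.p. 1/N1
est2 = [N2 * y for y in stratum2]      # possible values of t_hat_2, each w.p. 1/N2
t1, t2 = sum(stratum1), sum(stratum2)  # true stratum totals

var1 = sum((e - t1) ** 2 for e in est1) / N1
var2 = sum((e - t2) ** 2 for e in est2) / N2

# The strata are sampled independently, so every (e1, e2) pair is equally likely.
expected_v_coll = sum(collapsed_variance([(e1, e2)])
                      for e1, e2 in product(est1, est2)) / (N1 * N2)
print(expected_v_coll, var1 + var2, (t1 - t2) ** 2)
```

The expectation exceeds the true variance by exactly $(t_{j1} - t_{j2})^2$, so pairing strata with similar totals keeps the overestimation small.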
Domain estimation
Basic setup
Estimation for domains (subpopulations): Usually we want to make
inference about subpopulations as well as the whole population.
Often, we don't plan for all subpopulations of interest => random
sample size within subpopulations.
Denote domain $d$ by $U_d \subset U$. Parameters are
$N_d = |U_d|$: number of elements in $U_d$
$P_d = N_d / N$: proportion of elements in $U_d$. Often, $N$ is known but $N_d$ is unknown.
$t_d = \sum_{i \in U_d} y_i$: domain total of $y$ in domain $d$
$\bar{Y}_d = t_d / N_d$: domain mean of $y$ in domain $d$
Domain estimation
Domain estimation
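For $k = 1, 2, \ldots, N$, define
$$z_{kd} = \begin{cases} 1 & \text{if } k \in U_d \\ 0 & \text{if } k \notin U_d \end{cases}$$
Note that $z_{kd}$ is not a random variable (i.e., it does not depend on
the sampling scheme).
Properties of $z_{kd}$
1. $\sum_{k \in U} z_{kd} = N_d$
2. $\bar{Z}_d = \sum_{k \in U} z_{kd} / N = N_d / N = P_d$
3. $S_{zd}^2 = \dfrac{1}{N-1}\left(\sum_{k \in U} z_{kd}^2 - N \bar{Z}_d^2\right) = \dfrac{N}{N-1} P_d \left(1 - P_d\right)$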
Domain estimation
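HT estimation of $N_d$:
$$\hat{N}_d = \sum_{k \in U} z_{kd} \frac{I_k}{\pi_k}$$
Under SRS,
$$\hat{N}_d = \sum_{k \in U} z_{kd} \frac{I_k}{n/N} = N n_d / n = N p_d$$
and
$$\mathrm{Var}\left(\hat{N}_d\right) = \frac{N^2}{n}\left(1 - \frac{n}{N}\right) S_{zd}^2 = \frac{N^2}{n}\left(1 - \frac{n-1}{N-1}\right) P_d \left(1 - P_d\right)$$
$$\hat{V}\left(\hat{N}_d\right) = \frac{N^2}{n}\left(1 - \frac{n}{N}\right) s_{zd}^2 = N^2 \left(1 - \frac{n}{N}\right) \frac{p_d \left(1 - p_d\right)}{n - 1}.$$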
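Under SRS the domain-size estimator and its variance estimator need only the sampled indicators and $N$. The following is a minimal sketch; the function name and the sample values are assumptions.

```python
def estimate_domain_size(z_sample, N):
    """Return (N_d_hat, v_hat) under SRS from the 0/1 domain indicators of the sample."""
    n = len(z_sample)
    n_d = sum(z_sample)                      # number of sampled elements in the domain
    p_d = n_d / n                            # sample proportion
    N_d_hat = N * p_d
    v_hat = N ** 2 * (1 - n / N) * p_d * (1 - p_d) / (n - 1)
    return N_d_hat, v_hat

# Hypothetical SRS of n = 50 from N = 1000 with n_d = 20 domain members.
z_sample = [1] * 20 + [0] * 30
N_d_hat, v_hat = estimate_domain_size(z_sample, N=1000)
print(N_d_hat, v_hat, v_hat ** 0.5)
```

Here $p_d = 0.4$, so $\hat{N}_d = 400$ and $\hat{V}(\hat{N}_d) = 10^6 \cdot 0.95 \cdot 0.24 / 49$.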
Domain estimation
HT estimation of $t_d = \sum_{k \in U_d} y_k = \sum_{k \in U} y_k z_{kd}$:
$$\hat{t}_d = \sum_{k \in U} y_k z_{kd} \frac{I_k}{\pi_k} = \sum_{k \in A} \frac{y_k z_{kd}}{\pi_k}.$$
It is unbiased for $t_d$.
HT estimator of $\bar{Y}_d = t_d / N_d$:
$$\bar{y}_d = \frac{\hat{t}_d}{\hat{N}_d}$$
It is generally not exactly unbiased, because it is a nonlinear function of unbiased
estimators.
Generally, we will make population parameters look like functions of
population totals and then do HT estimation on each total.
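The idea of estimating each total by HT and then plugging into the function of totals can be sketched for the domain mean. The function name and the sample values below are hypothetical, and the sketch assumes at least one sampled element falls in the domain so that the denominator is nonzero.

```python
def ht_domain_estimates(y, z, pi):
    """HT estimates from sample values y, domain indicators z, inclusion probabilities pi.

    Returns (t_d_hat, N_d_hat, y_bar_d), where y_bar_d = t_d_hat / N_d_hat.
    """
    t_d_hat = sum(yk * zk / pk for yk, zk, pk in zip(y, z, pi))
    N_d_hat = sum(zk / pk for zk, pk in zip(z, pi))
    if N_d_hat == 0:
        raise ValueError("no sampled element falls in the domain")
    return t_d_hat, N_d_hat, t_d_hat / N_d_hat

# Hypothetical sample of four elements with unequal inclusion probabilities.
y = [10, 20, 30, 40]
z = [1, 0, 1, 1]
pi = [0.1, 0.1, 0.2, 0.5]
t_d_hat, N_d_hat, y_bar_d = ht_domain_estimates(y, z, pi)
print(t_d_hat, N_d_hat, y_bar_d)
```

Both the numerator and the denominator are HT estimators of totals, and only their ratio is nonlinear.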
Domain estimation
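The statistical properties of $\bar{y}_d$ can be derived from the following
approximation:
$$\begin{aligned}
\bar{y}_d = \frac{\hat{t}_d}{\hat{N}_d} &= f\left(\hat{N}_d, \hat{t}_d\right) \\
&\doteq f\left(N_d, t_d\right) + \left[\frac{\partial}{\partial t_d} f\left(N_d, t_d\right)\right]\left(\hat{t}_d - t_d\right) + \left[\frac{\partial}{\partial N_d} f\left(N_d, t_d\right)\right]\left(\hat{N}_d - N_d\right) \\
&= \frac{t_d}{N_d} + \left(\frac{1}{N_d}\right)\left(\hat{t}_d - t_d\right) + \left(-\frac{t_d}{N_d^2}\right)\left(\hat{N}_d - N_d\right)
\end{aligned}$$
Thus,
$$\mathrm{Var}\left(\bar{y}_d\right) \doteq \mathrm{Var}\left\{\frac{1}{N_d}\left(\hat{t}_d - \bar{Y}_d \hat{N}_d\right)\right\}.$$
Under SRS,
$$\mathrm{Var}\left(\bar{y}_d\right) \doteq \left(\frac{1}{E(n_d)} - \frac{1}{N_d}\right) \frac{1}{N_d - 1} \sum_{i \in U_d} \left(y_i - \bar{Y}_d\right)^2.$$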
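The last approximation can be compared with the exact SRS variance of the linearized statistic $N_d^{-1}(\hat{t}_d - \bar{Y}_d \hat{N}_d)$, which is a scaled HT total of $e_k = z_{kd}(y_k - \bar{Y}_d)/N_d$. The sketch below uses $E(n_d) = n N_d / N$ under SRS; the function name and the toy population are assumptions made for illustration.

```python
def domain_mean_variances(y, z, n):
    """Return (linearized, approx) variances of the domain mean under SRS of size n.

    linearized: exact SRS variance of the HT total of e_k = z_k (y_k - Ybar_d) / N_d
    approx:     (1 / E(n_d) - 1 / N_d) * S_d^2, with E(n_d) = n * N_d / N
    """
    N = len(y)
    y_d = [yk for yk, zk in zip(y, z) if zk == 1]
    N_d = len(y_d)
    Ybar_d = sum(y_d) / N_d
    ss_d = sum((yk - Ybar_d) ** 2 for yk in y_d)
    # e_k has population mean zero, so its variance is sum(e_k^2) / (N - 1).
    S2_e = ss_d / N_d ** 2 / (N - 1)
    linearized = N ** 2 / n * (1 - n / N) * S2_e
    expected_n_d = n * N_d / N
    approx = (1 / expected_n_d - 1 / N_d) * ss_d / (N_d - 1)
    return linearized, approx

# Hypothetical population: N = 200 with a domain of N_d = 80 elements.
N, n = 200, 40
y = [float(k % 17) for k in range(N)]
z = [1 if k % 5 < 2 else 0 for k in range(N)]
linearized, approx = domain_mean_variances(y, z, n)
print(linearized, approx, linearized / approx)
```

The two expressions differ only by the factor $N(N_d - 1) / \{N_d (N - 1)\}$, which is close to one unless the domain is very small.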