Chapter 3: Element Sampling Design (Part 2) : Jae-Kwang Kim

Chapter 3: Element sampling design (Part 2)
Jae-Kwang Kim
Iowa State University
Spring, 2013
1. Systematic sampling
2. Stratified sampling
3. Domain estimation
Systematic sampling
Setup:
1. Have $N$ elements in a list.
2. Choose a positive integer $a$, called the sampling interval. Let $n = \lfloor N/a \rfloor$. That is, $N = na + c$, where $c$ is an integer with $0 \le c < a$.
3. Select a random start $r$ from $\{1, 2, \ldots, a\}$ with equal probability.
4. The final sample is
$$A = \begin{cases} \{r, r+a, r+2a, \ldots, r+(n-1)a\} & \text{if } c < r \le a \\ \{r, r+a, r+2a, \ldots, r+na\} & \text{if } 1 \le r \le c. \end{cases}$$
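The selection steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `systematic_sample` and the values $N = 23$, $a = 5$ are hypothetical, and list positions are taken to be 1-based as in the setup.

```python
import random

def systematic_sample(N, a, rng=None):
    """Return the 1-based list positions selected by systematic sampling.

    N is the list length and a the sampling interval; the random start r
    is drawn uniformly from {1, ..., a}.
    """
    rng = rng if rng is not None else random.Random()
    r = rng.randint(1, a)  # each start has probability 1/a
    # range() stops at N, so the sample has n or n + 1 elements automatically
    return list(range(r, N + 1, a))

# Hypothetical example: N = 23 and a = 5, so n = 4 and c = 3.
N, a = 23, 5
n, c = divmod(N, a)
# Enumerate the a possible samples, one per random start r.
all_samples = {r: list(range(r, N + 1, a)) for r in range(1, a + 1)}
sizes = {r: len(s) for r, s in all_samples.items()}
```

In this example the starts $r \le c$ give samples of size $n + 1$ and the remaining starts give size $n$, and the $a$ possible samples partition the list.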
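Systematic sampling
Sample size can be random:
$$n_A = \begin{cases} n & \text{if } c < r \le a \\ n+1 & \text{if } r \le c \end{cases}$$
Inclusion probabilities:
$$\pi_k = \frac{1}{a}, \qquad \pi_{kl} = \begin{cases} 1/a & \text{if } k \text{ and } l \text{ belong to the same systematic sample} \\ 0 & \text{otherwise.} \end{cases}$$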
Systematic sampling
Remark
This is very easy to do.
This is a probability sampling design.
This is not a measurable sampling design: there is no design-unbiased estimator of the variance (because there is only one random draw).
Pick one set of elements (which always go together) and measure each one: later, we will call this cluster sampling.
Divide the population into non-overlapping groups and choose an element in each group: closely related to stratification.
Systematic sampling
Estimation
Partition the population into $a$ groups
$$U = U_1 \cup U_2 \cup \cdots \cup U_a,$$
where the $U_i$ are disjoint.
Population total
$$Y = \sum_{i \in U} y_i = \sum_{r=1}^{a} \sum_{k \in U_r} y_k = \sum_{r=1}^{a} t_r, \quad \text{where } t_r = \sum_{k \in U_r} y_k.$$
Think of a finite population with $a$ elements with measurements $t_1, \ldots, t_a$.
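Systematic sampling
Estimation (Cont'd)
HT estimator:
$$\hat{Y}_{HT} = \frac{t_r}{1/a}, \quad \text{if } A = U_r.$$
Variance: Note that we are doing SRS from the population of $a$ elements $\{t_1, \ldots, t_a\}$.
$$\mathrm{Var}\left(\hat{Y}_{HT}\right) = \frac{a^2}{1}\left(1 - \frac{1}{a}\right) S_t^2,$$
where
$$S_t^2 = \frac{1}{a-1} \sum_{r=1}^{a} \left(t_r - \bar{t}\right)^2 \quad \text{and} \quad \bar{t} = \sum_{r=1}^{a} t_r / a.$$
When is the variance small?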
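Because there are only $a$ possible samples, the design variance can be checked by direct enumeration. The sketch below does this on a hypothetical population $y = 1, \ldots, 12$ with $a = 3$; the function names are illustrative assumptions, and positions are 0-based in the code.

```python
def systematic_totals(y, a):
    """Return the group totals t_1, ..., t_a for list y and interval a."""
    return [sum(y[r::a]) for r in range(a)]  # 0-based start r picks U_{r+1}

def ht_variance_systematic(y, a):
    """Design variance of the HT estimator via a^2 (1 - 1/a) S_t^2."""
    t = systematic_totals(y, a)
    t_bar = sum(t) / a
    s2_t = sum((tr - t_bar) ** 2 for tr in t) / (a - 1)
    return a ** 2 * (1 - 1 / a) * s2_t

y = [float(k) for k in range(1, 13)]  # hypothetical population, N = 12
a = 3
t = systematic_totals(y, a)
estimates = [a * tr for tr in t]      # the a possible values of t_r / (1/a)
mean_est = sum(estimates) / a         # design expectation, each value has prob. 1/a
var_direct = sum((e - mean_est) ** 2 for e in estimates) / a
var_formula = ht_variance_systematic(y, a)
```

Here `mean_est` equals the population total, which illustrates design unbiasedness, and `var_direct` agrees with `var_formula` up to rounding.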
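Systematic sampling
Estimation (Cont'd)
Now, assuming $N = na$,
$$V\left(\hat{Y}_{HT}\right) = a(a-1) S_t^2 = n^2 a \sum_{r=1}^{a} \left(\bar{y}_r - \bar{y}_u\right)^2,$$
where $\bar{y}_r = t_r/n$ and $\bar{y}_u = \bar{t}/n$.
ANOVA: $U = \bigcup_{r=1}^{a} U_r$
$$\begin{aligned} SST &= \sum_{k \in U} (y_k - \bar{y}_u)^2 = \sum_{r=1}^{a} \sum_{k \in U_r} (y_k - \bar{y}_u)^2 \\ &= \sum_{r=1}^{a} \sum_{k \in U_r} (y_k - \bar{y}_r)^2 + n \sum_{r=1}^{a} (\bar{y}_r - \bar{y}_u)^2 \\ &= SSW + SSB. \end{aligned}$$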
Systematic sampling
$$V\left(\hat{Y}_{HT}\right) = na \cdot SSB = N \cdot SSB = N\,(SST - SSW).$$
If SSB is small, then the $\bar{y}_r$ are more alike and $V(\hat{Y}_{HT})$ is small.
If SSW is small, then $V(\hat{Y}_{HT})$ is large.
The intraclass correlation coefficient $\rho$ measures the homogeneity of clusters:
$$\rho = 1 - \frac{n}{n-1} \frac{SSW}{SST}.$$
More details about $\rho$ will be covered in cluster sampling (Chapter 4).
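The ANOVA decomposition and the intraclass correlation can be computed directly. The following is a minimal sketch assuming $N = na$; the function name `anova_systematic` and the linear population $y = 1, \ldots, 12$ are hypothetical choices for illustration.

```python
def anova_systematic(y, a):
    """Return (SST, SSW, SSB, rho) for the a systematic samples of y."""
    N = len(y)
    assert N % a == 0, "this sketch assumes N = n * a"
    n = N // a
    y_bar_u = sum(y) / N
    groups = [y[r::a] for r in range(a)]      # U_1, ..., U_a
    means = [sum(g) / n for g in groups]      # group means y_bar_r
    sst = sum((yk - y_bar_u) ** 2 for yk in y)
    ssw = sum((yk - m) ** 2 for g, m in zip(groups, means) for yk in g)
    ssb = n * sum((m - y_bar_u) ** 2 for m in means)
    rho = 1 - (n / (n - 1)) * ssw / sst
    return sst, ssw, ssb, rho

y = [float(k) for k in range(1, 13)]  # hypothetical linear population
sst, ssw, ssb, rho = anova_systematic(y, a=3)
var_ht = len(y) * ssb                 # V(Y_hat_HT) = N * SSB
```

For this linearly ordered population most of the variation is within groups, so SSB is small and $\rho$ is negative, which is the situation in which systematic sampling does well.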
Systematic sampling
Comparison between systematic sampling (SY) and SRS
How does SY compare to SRS when the population is sorted in the following ways?
1. Random ordering: intuitively, the two should be the same.
2. Linear ordering: SY should be better than SRS.
3. Periodic ordering: if the period equals $a$, SY can be terrible.
4. Autocorrelated ordering: successive $y_k$'s tend to lie on the same side of $\bar{y}_u$. Thus, SY should be better than SRS.
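These qualitative claims can be checked numerically by computing both design variances exactly for a given list order. The sketch below assumes $N = na$ with hypothetical values $N = 120$, $a = 10$; the three test populations (a linear trend, a sine wave with period $a$, and a shuffled copy of the linear list) are illustrative choices, not data from the course.

```python
import math
import random

def var_srs(y, n):
    """Design variance of the HT total estimator under SRS of size n."""
    N = len(y)
    y_bar = sum(y) / N
    s2 = sum((yk - y_bar) ** 2 for yk in y) / (N - 1)
    return N ** 2 / n * (1 - n / N) * s2

def var_sy(y, a):
    """Design variance of the HT total estimator under systematic sampling."""
    t = [sum(y[r::a]) for r in range(a)]
    t_bar = sum(t) / a
    return a * sum((tr - t_bar) ** 2 for tr in t)  # equals a (a - 1) S_t^2

N, a = 120, 10
n = N // a
linear = [float(k) for k in range(N)]
periodic = [math.sin(2 * math.pi * k / a) for k in range(N)]  # period = a
shuffled = linear[:]
random.Random(1).shuffle(shuffled)                            # one random ordering

populations = {"linear": linear, "periodic": periodic, "random": shuffled}
results = {name: (var_sy(y, a), var_srs(y, n)) for name, y in populations.items()}
```

Each entry of `results` holds the pair (SY variance, SRS variance). For the linear list SY has the smaller variance, while for the periodic list every systematic sample consists of identical values and SY is far worse than SRS. The shuffled list is a single realization, so its two variances need not coincide exactly.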
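Systematic sampling
How to quantify?
$$V_{SRS}\left(\hat{Y}_{HT}\right) = \frac{N^2}{n}\left(1 - \frac{n}{N}\right) \frac{1}{N-1} \sum_{k=1}^{N} \left(y_k - \bar{Y}_N\right)^2$$
$$V_{SY}\left(\hat{Y}_{HT}\right) = n^2 a \sum_{r=1}^{a} \left(\bar{y}_r - \bar{y}_u\right)^2$$
Cochran (1946) introduced the superpopulation model to deal with this problem (treat $y_k$ as a random variable).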
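Systematic sampling
Example: Superpopulation model for a population in random order.
Denote the model by $\zeta$: $\{y_k\} \overset{iid}{\sim} (\mu, \sigma^2)$.
$$E_\zeta\left\{V_{SRS}\left(\hat{Y}_{HT}\right)\right\} = \frac{N^2}{n}\left(1 - \frac{n}{N}\right)\sigma^2$$
$$E_\zeta\left\{V_{SY}\left(\hat{Y}_{HT}\right)\right\} = \frac{N^2}{n}\left(1 - \frac{n}{N}\right)\sigma^2$$
Thus, the model expectations of the design variances are the same under the IID model.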
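As a sketch of why the two expectations agree, assume $N = na$ and write $E_\zeta$ for expectation under the IID model with mean $\mu$ and variance $\sigma^2$ (the symbol $\zeta$ is a notational assumption here). The group means $\bar{y}_r$ are then independent, each with variance $\sigma^2/n$, which gives:

```latex
\begin{align*}
E_\zeta\Big\{ \sum_{r=1}^{a} (\bar{y}_r - \bar{y}_u)^2 \Big\}
  &= (a-1)\,\frac{\sigma^2}{n}, \\
E_\zeta\big\{ V_{SY}(\hat{Y}_{HT}) \big\}
  &= n^2 a\,(a-1)\,\frac{\sigma^2}{n} = N(a-1)\,\sigma^2, \\
\frac{N^2}{n}\Big(1 - \frac{n}{N}\Big)\sigma^2
  &= N\Big(\frac{N}{n} - 1\Big)\sigma^2 = N(a-1)\,\sigma^2 .
\end{align*}
```

For SRS the same value follows because the finite population variance has model expectation $\sigma^2$.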
Stratified sampling
Stratified sampling:
1. The finite population is stratified into $H$ subpopulations:
$$U = U_1 \cup \cdots \cup U_H$$
2. Within each subpopulation (or stratum), samples are drawn independently across the strata:
$$\Pr(i \in A_h, j \in A_g) = \Pr(i \in A_h) \Pr(j \in A_g), \quad \text{for } h \ne g,$$
where $A_h$ is the index set of the sample in stratum $h$, $h = 1, 2, \ldots, H$.
Example: Stratified SRS
1. Stratify the population. Let $N_h$ be the population size of $U_h$.
2. Sample size allocation: Determine $n_h$.
3. Perform SRS independently (select $n_h$ sample elements from $N_h$) in each stratum.
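The three steps of stratified SRS can be sketched as follows. This is an illustrative example only: the function name `stratified_srs`, the stratum labels, and the sample sizes are hypothetical.

```python
import random

def stratified_srs(strata, n_h, seed=None):
    """Draw an SRS without replacement independently in each stratum.

    strata maps a stratum label to the list of its element labels (U_h);
    n_h maps the same labels to the stratum sample sizes.
    """
    rng = random.Random(seed)
    # one separate without-replacement draw per stratum
    return {h: rng.sample(units, n_h[h]) for h, units in strata.items()}

# Hypothetical population of N = 10 elements in H = 2 strata.
strata = {"A": [1, 2, 3, 4, 5, 6], "B": [7, 8, 9, 10]}
sample = stratified_srs(strata, {"A": 3, "B": 2}, seed=2013)
```

The returned dictionary holds the index sets $A_h$, one per stratum, each of the fixed size $n_h$.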
Stratified sampling
Why stratification?
1. Control for domains of study
2. Flexibility in design and estimation
3. Convenience
4. Efficiency
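Stratified sampling
Estimation
HT estimation for $t = \sum_{h=1}^{H} t_h$, where $t_h = \sum_{i \in U_h} y_i$.
1. HT estimator:
$$\hat{t}_{HT} = \sum_{h=1}^{H} \hat{t}_{h,HT},$$
where $\hat{t}_{h,HT}$ is unbiased for $t_h$.
2. Variance:
$$\mathrm{Var}\left(\hat{t}_{HT}\right) = \sum_{h=1}^{H} \mathrm{Var}\left(\hat{t}_{h,HT}\right) \quad \text{by independence.}$$
3. Variance estimation:
$$\hat{V}\left(\hat{t}_{HT}\right) = \sum_{h=1}^{H} \hat{V}_h\left(\hat{t}_{h,HT}\right),$$
where $\hat{V}_h(\hat{t}_{h,HT})$ is unbiased for $\mathrm{Var}(\hat{t}_{h,HT})$.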
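Stratified sampling
Example: Stratified SRS
1. HT estimator:
$$\hat{t}_{HT} = \sum_{h=1}^{H} N_h \bar{y}_h,$$
where $\bar{y}_h = n_h^{-1} \sum_{i \in A_h} y_i$.
2. Variance:
$$\mathrm{Var}\left(\hat{t}_{HT}\right) = \sum_{h=1}^{H} \frac{N_h^2}{n_h}\left(1 - \frac{n_h}{N_h}\right) S_h^2,$$
where $S_h^2 = (N_h - 1)^{-1} \sum_{i \in U_h} \left(y_i - \bar{Y}_h\right)^2$.
3. Variance estimation:
$$\hat{V}\left(\hat{t}_{HT}\right) = \sum_{h=1}^{H} \frac{N_h^2}{n_h}\left(1 - \frac{n_h}{N_h}\right) s_h^2,$$
where $s_h^2 = (n_h - 1)^{-1} \sum_{i \in A_h} \left(y_i - \bar{y}_h\right)^2$.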
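The point estimator and its variance estimator under stratified SRS can be sketched as below. The function name, the stratum labels, and the sample values are hypothetical; the sketch assumes $n_h \ge 2$ in every stratum so that $s_h^2$ is defined.

```python
from statistics import mean, variance

def stratified_srs_estimate(samples, N_h):
    """Return (t_hat, v_hat) for stratified SRS.

    samples maps a stratum label to the list of observed y values (n_h >= 2);
    N_h maps the same labels to the stratum population sizes.
    """
    t_hat = 0.0
    v_hat = 0.0
    for h, y in samples.items():
        n, N = len(y), N_h[h]
        t_hat += N * mean(y)
        # statistics.variance uses the (n - 1) divisor, matching s_h^2
        v_hat += N ** 2 / n * (1 - n / N) * variance(y)
    return t_hat, v_hat

samples = {"A": [2.0, 4.0, 6.0], "B": [10.0, 14.0]}  # hypothetical sample
N_h = {"A": 6, "B": 4}
t_hat, v_hat = stratified_srs_estimate(samples, N_h)
```

Each stratum contributes $N_h \bar{y}_h$ to the total and its own finite-population-corrected variance term, and the contributions are simply added because the strata are sampled independently.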
Stratified sampling
Sample allocation: Given $n = \sum_{h=1}^{H} n_h$, how to choose $n_h$?
1. Proportional allocation: choose $n_h \propto N_h$.
2. Optimal allocation: choose $n_h$ to
$$\text{minimize } \mathrm{Var}\left(\hat{t}_{HT}\right) \quad \text{subject to } \sum_{h=1}^{H} c_h n_h = C,$$
where $c_h$ is the cost of observing an element in stratum $h$ and $C$ is a given total cost. The solution (Neyman, 1934) is
$$n_h \propto N_h S_h / \sqrt{c_h}.$$
3. Properties
Under proportional allocation, the weights are all equal.
In general,
$$V_{opt}\left(\hat{t}_{HT}\right) \le V_{prop}\left(\hat{t}_{HT}\right) \le V_{SRS}\left(\hat{t}_{HT}\right),$$
where $V_{opt}(\hat{t}_{HT})$ is the variance of the stratified sampling estimator under optimal allocation, $V_{prop}(\hat{t}_{HT})$ is the variance of the stratified sampling estimator under proportional allocation, and $V_{SRS}(\hat{t}_{HT})$ is the variance of the SRS estimator.
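The two allocation rules can be compared numerically. The sketch below uses hypothetical stratum sizes, standard deviations, and equal unit costs (so the cost constraint reduces to a fixed total sample size), and it returns real-valued allocations that would be rounded in practice.

```python
import math

def proportional_allocation(n, N_h):
    """n_h proportional to N_h (real-valued; round in practice)."""
    N = sum(N_h)
    return [n * Nh / N for Nh in N_h]

def optimal_allocation(C, N_h, S_h, c_h):
    """n_h proportional to N_h S_h / sqrt(c_h), scaled so sum(c_h n_h) = C."""
    k = [Nh * Sh / math.sqrt(ch) for Nh, Sh, ch in zip(N_h, S_h, c_h)]
    scale = C / sum(ch * kh for ch, kh in zip(c_h, k))
    return [scale * kh for kh in k]

def strat_variance(n_h, N_h, S_h):
    """Variance of the stratified SRS total estimator for a given allocation."""
    return sum(Nh ** 2 / nh * (1 - nh / Nh) * Sh ** 2
               for nh, Nh, Sh in zip(n_h, N_h, S_h))

# Hypothetical strata; equal unit costs, so C is the total sample size.
N_h, S_h, c_h = [600, 300, 100], [2.0, 5.0, 20.0], [1.0, 1.0, 1.0]
n_prop = proportional_allocation(100, N_h)
n_opt = optimal_allocation(100.0, N_h, S_h, c_h)
v_prop = strat_variance(n_prop, N_h, S_h)
v_opt = strat_variance(n_opt, N_h, S_h)
```

In this example the small but highly variable third stratum receives a much larger share of the sample under the optimal rule than under proportional allocation, and `v_opt` is smaller than `v_prop`.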
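Stratified sampling
Method of collapsed strata
$n_h = 1$: One-per-stratum design
1. Most efficient
2. No unbiased estimator of $\mathrm{Var}(\hat{t}_{HT})$ under stratified sampling.
Form pairs of strata:
$$\hat{t}_1, \ldots, \hat{t}_H \;\Rightarrow\; \left(\hat{t}_{j1}, \hat{t}_{j2}\right), \quad j = 1, 2, \ldots, H/2,$$
where $H$ is even.
Variance estimator:
$$\hat{V}_{coll} = \sum_{j=1}^{H/2} \left(\hat{t}_{j1} - \hat{t}_{j2}\right)^2$$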
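Stratified sampling
Method of collapsed strata (Cont'd)
Property
$$\begin{aligned} E\left(\hat{V}_{coll}\right) &= E\left\{ \sum_{j=1}^{H/2} \left[ \left(\hat{t}_{j1} - t_{j1}\right) - \left(\hat{t}_{j2} - t_{j2}\right) - \left(t_{j2} - t_{j1}\right) \right]^2 \right\} \\ &= \sum_{j=1}^{H/2} \left\{ \mathrm{Var}\left(\hat{t}_{j1}\right) + \mathrm{Var}\left(\hat{t}_{j2}\right) + \left(t_{j2} - t_{j1}\right)^2 \right\} \\ &= \sum_{h=1}^{H} \mathrm{Var}\left(\hat{t}_h\right) + \sum_{j=1}^{H/2} \left(t_{j1} - t_{j2}\right)^2 \\ &\ge \mathrm{Var}\left(\hat{t}_{HT}\right) \end{aligned}$$
Thus, it is a conservative variance estimator.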
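The conservativeness can be verified exactly on a small example by enumerating every possible one-per-stratum sample. The population below (four strata with two or three elements each) is hypothetical, and strata are paired in list order, which is an arbitrary choice made for this sketch.

```python
from itertools import product

def collapsed_variance(t_hat):
    """Collapsed-strata estimator, pairing strata (1,2), (3,4), ... in order."""
    assert len(t_hat) % 2 == 0, "H must be even"
    return sum((t_hat[j] - t_hat[j + 1]) ** 2 for j in range(0, len(t_hat), 2))

# Hypothetical population: H = 4 strata, one element drawn per stratum.
strata = [[1.0, 3.0], [2.0, 6.0], [10.0, 12.0, 14.0], [11.0, 15.0, 13.0]]

# Every combination of one element per stratum is equally likely,
# so plain averages over the enumeration give exact design moments.
v_coll_values, totals = [], []
for draw in product(*strata):
    t_hat = [len(U) * y for U, y in zip(strata, draw)]  # t_hat_h = N_h * y_i
    v_coll_values.append(collapsed_variance(t_hat))
    totals.append(sum(t_hat))

mean_total = sum(totals) / len(totals)
true_var = sum((t - mean_total) ** 2 for t in totals) / len(totals)
expected_v_coll = sum(v_coll_values) / len(v_coll_values)
```

The gap `expected_v_coll - true_var` equals the sum of squared differences between the paired stratum totals, so the estimator is less conservative when strata with similar totals are paired.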
Domain estimation
Basic setup
Estimation for domains (subpopulations): We usually want to make inference about subpopulations as well as the whole population.
Often, we don't plan for all subpopulations of interest, which leads to a random sample size within subpopulations.
Denote domain $d$ by $U_d \subset U$. Parameters are
$N_d = |U_d|$: number of elements in $U_d$
$P_d = N_d/N$: proportion of elements in $U_d$. Often, $N$ is known but $N_d$ is unknown.
$t_d = \sum_{i \in U_d} y_i$: domain total of $y$ in domain $d$
$\bar{Y}_d = t_d/N_d$: domain mean of $y$ in domain $d$
Domain estimation
For $k = 1, 2, \ldots, N$, define
$$z_{kd} = \begin{cases} 1 & \text{if } k \in U_d \\ 0 & \text{if } k \notin U_d \end{cases}$$
Note that $z_{kd}$ is not a random variable (i.e., it does not depend on the sampling scheme).
Properties of $z_{kd}$
1. $\sum_{k \in U} z_{kd} = N_d$
2. $\bar{Z}_d = \sum_{k \in U} z_{kd}/N = N_d/N = P_d$
3. $S_{zd}^2 = \dfrac{1}{N-1}\left(\sum_{k \in U} z_{kd}^2 - N \bar{Z}_d^2\right) = \dfrac{N}{N-1} P_d (1 - P_d)$
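Domain estimation
HT estimation of $N_d$:
$$\hat{N}_d = \sum_{k \in U} z_{kd} \frac{I_k}{\pi_k}$$
Under SRS,
$$\hat{N}_d = \sum_{k \in U} z_{kd} \frac{I_k}{n/N} = N n_d / n = N p_d,$$
where $n_d$ is the number of sampled elements in $U_d$ and $p_d = n_d/n$, and
$$\mathrm{Var}\left(\hat{N}_d\right) = \frac{N^2}{n}\left(1 - \frac{n}{N}\right) S_{zd}^2 = \frac{N^2}{n}\left(1 - \frac{n-1}{N-1}\right) P_d (1 - P_d)$$
$$\hat{V}\left(\hat{N}_d\right) = \frac{N^2}{n}\left(1 - \frac{n}{N}\right) s_{zd}^2 = N^2 \left(1 - \frac{n}{N}\right) \frac{p_d (1 - p_d)}{n-1}.$$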
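Under SRS the estimator of the domain size and its variance estimator depend on the sample only through the domain indicators. A minimal sketch, with a hypothetical function name and hypothetical values $N = 200$, $n = 10$:

```python
def domain_size_estimate(z, N):
    """HT estimate of N_d and its variance estimate under SRS.

    z is the list of domain indicators z_kd (0 or 1) for the n sampled
    elements; N is the known population size.
    """
    n = len(z)
    p_d = sum(z) / n                  # sample proportion n_d / n
    N_d_hat = N * p_d
    v_hat = N ** 2 * (1 - n / N) * p_d * (1 - p_d) / (n - 1)
    return N_d_hat, v_hat

# Hypothetical SRS of n = 10 from N = 200, with 4 sampled elements in the domain.
z = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
N_d_hat, v_hat = domain_size_estimate(z, N=200)
```

The square root of `v_hat` gives an estimated standard error for $\hat{N}_d$, which can be large when $n$ is small relative to the rarity of the domain.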
Domain estimation
HT estimation of $t_d = \sum_{k \in U_d} y_k = \sum_{k \in U} y_k z_{kd}$:
$$\hat{t}_d = \sum_{k \in U} y_k z_{kd} \frac{I_k}{\pi_k} = \sum_{k \in A} \frac{y_k z_{kd}}{\pi_k}.$$
It is unbiased for $t_d$.
HT estimator of $\bar{Y}_d = t_d/N_d$:
$$\bar{y}_d = \frac{\hat{t}_d}{\hat{N}_d}$$
In general it is not unbiased, because it is a non-linear function of unbiased estimators.
Generally, we will express population parameters as functions of population totals and then apply HT estimation to each total.
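Domain estimation
The statistical properties of $\bar{y}_d$ can be derived from the following approximation:
$$\begin{aligned} \bar{y}_d = \frac{\hat{t}_d}{\hat{N}_d} &= f\left(\hat{N}_d, \hat{t}_d\right) \\ &\doteq f(N_d, t_d) + \left[\frac{\partial}{\partial t_d} f(N_d, t_d)\right]\left(\hat{t}_d - t_d\right) + \left[\frac{\partial}{\partial N_d} f(N_d, t_d)\right]\left(\hat{N}_d - N_d\right) \\ &= \frac{t_d}{N_d} + \left(\frac{1}{N_d}\right)\left(\hat{t}_d - t_d\right) + \left(-\frac{t_d}{N_d^2}\right)\left(\hat{N}_d - N_d\right) \end{aligned}$$
Thus,
$$\mathrm{Var}\left(\bar{y}_d\right) \doteq \mathrm{Var}\left\{\frac{1}{N_d}\left(\hat{t}_d - \bar{Y}_d \hat{N}_d\right)\right\}.$$
Under SRS,
$$\mathrm{Var}\left(\bar{y}_d\right) \doteq \left(\frac{1}{E(n_d)} - \frac{1}{N_d}\right) \frac{1}{N_d - 1} \sum_{i \in U_d} \left(y_i - \bar{Y}_d\right)^2.$$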
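Under SRS the domain mean estimator reduces to the sample mean of the sampled domain elements, and the approximate variance suggests a plug-in estimate. The sketch below is illustrative only. As an assumption not stated in the slides, it replaces $E(n_d)$ by the realized $n_d$, $N_d$ by $\hat{N}_d = N n_d / n$, and the domain variance by its sample analogue; the function name and data are hypothetical.

```python
from statistics import mean, variance

def domain_mean_estimate(y, z, N):
    """Ratio-type estimate of the domain mean under SRS, with a plug-in variance.

    y: sampled y values; z: matching domain indicators; N: population size.
    Assumption (not from the slides): unknown quantities in the linearized
    variance are replaced by n_d, N_d_hat = N * n_d / n, and the sample
    variance of y within the domain.
    """
    n = len(y)
    y_d = [yk for yk, zk in zip(y, z) if zk == 1]
    n_d = len(y_d)
    if n_d < 2:
        raise ValueError("need at least two sampled elements in the domain")
    N_d_hat = N * n_d / n
    y_bar_d = mean(y_d)  # equals t_d_hat / N_d_hat because the N/n factors cancel
    v_hat = (1 / n_d - 1 / N_d_hat) * variance(y_d)
    return y_bar_d, v_hat

# Hypothetical SRS of n = 10 from N = 200; four sampled elements fall in the domain.
y = [3.0, 8.0, 5.0, 7.0, 2.0, 9.0, 4.0, 6.0, 11.0, 1.0]
z = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
y_bar_d, v_hat = domain_mean_estimate(y, z, N=200)
```

The result has the same form as an SRS variance estimate computed as if $n_d$ elements had been drawn from a domain of size $\hat{N}_d$, which matches the structure of the approximate variance formula.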
