SST408 - Lesson One
INTRODUCTION
Welcome to this module. In this module, we will build on the concepts learnt in the earlier module
SST305: Design and Analysis of Sample Survey I. This is an interactive instructional module that uses
both action and collaborative learning styles to provide you with diverse online learning experiences
and effective learning processes. The key purpose of this module is to equip you with advanced
concepts and skills in sample surveys. The new sampling designs will enable you to estimate
population parameters from the selected samples more precisely.
The aim of the module is to provide students with more advanced sampling and estimation methods.
MODULE DESCRIPTION
Multistage designs; multiphase designs; regression estimators under double sampling;
repeated, successive, panel and rotation designs. Sources of error in survey sampling:
sampling and non-sampling errors, including response and non-response errors. Bias and
variance; methods of estimating the variance under non-response. Organization of
national surveys and the Kenya National Bureau of Statistics.
WEEK TOPIC
WEEK 0: Introduction
WEEK 1 & 2: Cluster sampling designs
WEEK 3 & 4: Multistage sampling designs.
WEEK 5 & 6: Multiphase sampling designs
CAT1
WEEK 7: Regression estimation using Multiphase sampling designs
WEEK 8: Ratio estimation using multiphase sampling designs.
MODULE OVERVIEW
This lesson is intended to help you acclimatize to blended learning and to create a community of learners who
will motivate each other during the course. You will be required to introduce yourself to your fellow students,
either physically during a face-to-face session or online, before other academic interactions start. You can also
share your knowledge of basic concepts of population growth.
Week 1 & 2: Cluster sampling designs
In this first lesson, we connect to what was learnt in the previous course, SST305, where the sampling unit
was taken to be the smallest indivisible unit in the population. In this course we use a group of units as the
sampling unit. This group of units is called a cluster, and a design that samples such groups is called a
cluster sampling design. We will derive estimators of the population parameters for this design and study their
properties.
Week 5 & 6: Multiphase sampling designs
In this lesson we introduce another type of design where we use phases instead of stages. Phases are subsets of
each other. If more than one phase is considered, the design is called a multiphase sampling design. We will
derive estimators of the population parameters for this design and study their properties.
Week 7: Regression estimation using multiphase sampling designs
In this lesson, we will assume that the relationship between the survey variable and the auxiliary variable is
a regression line with an intercept. On the basis of this assumption we will estimate the population
parameters using the multiphase sampling design. This type of estimation is called regression estimation with
multiphase sampling design.
Week 8: Ratio estimation using multiphase sampling designs
In this lesson, we will assume that the relationship between the survey variable and the auxiliary variable is
a regression line with zero intercept. On the basis of this assumption we will estimate the population
parameters using the multiphase sampling design. This type of estimation is called ratio estimation with
multiphase sampling design.
These two weeks bring the work you have been doing to a close. This course unit will be examined
and will partially contribute to the award of the BSc (Statistics)/B.A./B.Ed. degree that you are undertaking.
Kenyatta University examination regulations will apply.
COURSE REQUIREMENTS
This is a blended learning course that will utilize the flex model. This means that
learning materials and instructions will be given online and the lessons will be
self-guided, with the lecturer available briefly for face-to-face sessions and support
and also on-site (online) most of the time. Your lecturer will meet you face to
face to introduce a lesson and put it into perspective, and you will actively participate
in your search for knowledge by undertaking several online activities. Some of the
39 instructional hours of the course will therefore be delivered face to face,
while other lessons will be taught online through various learner and lecturer
activities. Note that one instructional hour is equivalent to two online hours.
Three instructional hours will be needed per week. Of these, one will be used for
face-to-face contact with your lecturer (also referred to as the e-moderator in the
online activities), while the other two instructional hours (translating to four online
hours) will be used for online activities, otherwise referred to as e-tivities in the
lessons. This adds up to the 5-hour requirement per lesson mentioned earlier.
You will be required to participate and interact online with your peers and the
e-moderator, who in this case is your lecturer. Guidelines for the online activities
(which we shall keep referring to as e-tivities) will be provided whenever there is an
e-tivity. Please note that since the e-tivities are part of the learning process,
they may be graded at the discretion of your e-moderator. Any such grading will
be communicated in the e-tivity guidelines, and feedback will be given as soon as
possible after the e-tivity. The e-tivities will include, but will not be limited to,
online assessment quizzes, assignments and discussions. There are also assessment
questions that you can attempt at the end of every lesson to test your understanding of
the lesson.
ASSESSMENT
The module has embedded formative assessment and feedback tools that will enable you to
gauge your own learning progress. The tools include online collaborative discussion
forums that focus on team learning and personal mastery, and will therefore provide you
with peer feedback, lecturer assessment and self-reflection. The project score, in
combination with scores for e-tivities (where graded), will account for 30% of your final
examination score, with the remaining 70% coming from a face-to-face, sit-in final
written examination.
TABLE OF CONTENTS
Introduction
Purpose of the Module
Module Description
Module Flowchart
Course Requirements
Assessment
LESSON 1
CLUSTER SAMPLING DESIGNS
1.1 Introduction
1.3 Assessment
1.4 References
LESSON 2
MULTI-STAGE CLUSTER SAMPLING DESIGNS
2.1 Introduction
2.2 Learning Outcomes
2.3 Assessment
2.4 References
LESSON 3
MULTI-PHASE CLUSTER SAMPLING DESIGNS WITH REGRESSION AND RATIO ESTIMATORS
3.1 Introduction
3.3 Assessment
3.4 References
LESSON 4
REPEATED SAMPLING DESIGNS
4.1 Introduction
4.3 Assessment
4.4 References
LESSON 5
SOURCES OF ERROR IN SAMPLE SURVEYS
5.1 Introduction
5.3 Assessment
5.4 References
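$Y_{ij}$ = the value of the characteristic under study for the $j$th element ($j = 1, 2, \dots, M$) in the $i$th
cluster ($i = 1, 2, \dots, N$).

$$\bar{Y}_{i.} = \frac{1}{M} \sum_{j=1}^{M} Y_{ij}$$
Mean per unit of the $i$th cluster.

$$\bar{Y} = \frac{1}{N} \sum_{i=1}^{N} \bar{Y}_{i.} = \frac{1}{MN} \sum_{i=1}^{N} \sum_{j=1}^{M} Y_{ij}$$
Mean of the cluster means in the population.

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.} = \frac{1}{Mn} \sum_{i=1}^{n} \sum_{j=1}^{M} y_{ij}$$
Mean of the cluster means in the sample.

$$S_i^2 = \frac{1}{M-1} \sum_{j=1}^{M} (Y_{ij} - \bar{Y}_{i.})^2$$
Mean square within the $i$th cluster.

$$\bar{S}_w^2 = \frac{1}{N} \sum_{i=1}^{N} S_i^2$$
Mean square within clusters.

$$S^2 = \frac{1}{NM-1} \sum_{i=1}^{N} \sum_{j=1}^{M} (Y_{ij} - \bar{Y})^2$$
Mean square between elements in the population.

NOTE: Upper case letters are used to represent population units and lower case letters for sample units.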
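To make the notation concrete, the following minimal Python sketch evaluates each of the quantities defined above. The population of N = 3 clusters with M = 2 elements each is hypothetical and chosen only so that the arithmetic is easy to follow by hand; the variable names are our own.

```python
from statistics import mean

# Hypothetical toy population: N = 3 clusters, each with M = 2 elements.
population = [[1.0, 3.0], [2.0, 6.0], [5.0, 7.0]]
N = len(population)
M = len(population[0])

# Mean per unit of each cluster.
cluster_means = [mean(cluster) for cluster in population]

# Mean of the cluster means (equals the mean per element when all clusters have size M).
pop_mean = mean(cluster_means)

# Mean square within each cluster, with divisor M - 1.
S_i2 = [
    sum((y - m) ** 2 for y in cluster) / (M - 1)
    for cluster, m in zip(population, cluster_means)
]

# Mean square within clusters: the average of the S_i^2.
S_w2 = mean(S_i2)

# Mean square between elements in the population, with divisor NM - 1.
S2 = sum((y - pop_mean) ** 2 for cluster in population for y in cluster) / (N * M - 1)

print(cluster_means, pop_mean, S_i2, S_w2, S2)
```

For this toy population the cluster means are 2, 4 and 6, so the population mean is 4.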
1.2.1.3 Properties of the estimators of the population mean.
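Theorem 1.1
Prove that the mean of the cluster means in the sample, $\bar{y}$, is an unbiased estimator of the
population mean $\bar{Y}$.
Proof.
$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.}$$
Define an indicator function
$$I_i = \begin{cases} 1 & \text{if the } i\text{th cluster is in the sample} \\ 0 & \text{otherwise} \end{cases}$$
Then the sample mean can be written as a sum over all $N$ clusters in the population:
$$\bar{y} = \frac{1}{n} \sum_{i=1}^{N} \bar{Y}_{i.} I_i$$
Taking expectation we get
$$E(\bar{y}) = \frac{1}{n} \sum_{i=1}^{N} \bar{Y}_{i.} E(I_i)$$
Now $E(I_i) = n/N$. Therefore
$$E(\bar{y}) = \frac{1}{n} \sum_{i=1}^{N} \bar{Y}_{i.} \cdot \frac{n}{N} = \frac{1}{N} \sum_{i=1}^{N} \bar{Y}_{i.} = \bar{Y}$$
Hence the result.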
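As a numerical check of Theorem 1.1, the sketch below enumerates every possible simple random sample (without replacement) of n clusters and averages the resulting sample means. The five cluster means are hypothetical values used purely for illustration; the check also confirms that a given cluster appears in a fraction n/N of all samples.

```python
from itertools import combinations
from statistics import mean

# Hypothetical cluster means for a population of N = 5 clusters.
cluster_means = [2.0, 4.0, 6.0, 9.0, 14.0]
N = len(cluster_means)
n = 2
pop_mean = mean(cluster_means)

# Under SRSWOR every subset of n clusters is equally likely.
samples = list(combinations(range(N), n))
sample_means = [mean(cluster_means[i] for i in s) for s in samples]

# The expectation of the estimator is the plain average over all samples.
expected_value = mean(sample_means)

# Proportion of samples containing the first cluster, to compare with n / N.
inclusion_prob = sum(1 for s in samples if 0 in s) / len(samples)

print(expected_value, pop_mean, inclusion_prob, n / N)
```

Both printed pairs agree, which is what the indicator-variable argument in the proof predicts.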
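Theorem 1.2
Prove that the variance of the mean of the cluster means in the sample is given by
$$\operatorname{Var}(\bar{y}) = \frac{N-n}{Nn} S_b^2 \quad \text{where} \quad S_b^2 = \frac{1}{N-1} \sum_{i=1}^{N} (\bar{Y}_{i.} - \bar{Y})^2$$
Proof.
By definition
$$\operatorname{Var}(\bar{y}) = E(\bar{y} - \bar{Y})^2 = E(\bar{y}^2) - \bar{Y}^2 \qquad (1.1)$$
Consider the first term of equation (1.1):
$$E(\bar{y}^2) = E\left[\left(\frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.}\right)^2\right] = \frac{1}{n^2} E\left[\sum_{i=1}^{n} \bar{y}_{i.}^2 + \sum_{i \neq k} \bar{y}_{i.} \bar{y}_{k.}\right] \qquad (1.2)$$
Under simple random sampling without replacement, each cluster is included in the sample with probability
$n/N$ and each pair of distinct clusters with probability $n(n-1)/[N(N-1)]$, so that
$$E(\bar{y}^2) = \frac{1}{n^2} \left[\frac{n}{N} \sum_{i=1}^{N} \bar{Y}_{i.}^2 + \frac{n(n-1)}{N(N-1)} \sum_{i \neq k} \bar{Y}_{i.} \bar{Y}_{k.}\right] \qquad (1.3)$$
Substituting
$$\sum_{i \neq k} \bar{Y}_{i.} \bar{Y}_{k.} = \left(\sum_{i=1}^{N} \bar{Y}_{i.}\right)^2 - \sum_{i=1}^{N} \bar{Y}_{i.}^2 = N^2 \bar{Y}^2 - \sum_{i=1}^{N} \bar{Y}_{i.}^2$$
in (1.3) and then simplifying we get
$$E(\bar{y}^2) = \frac{N-n}{Nn(N-1)} \sum_{i=1}^{N} \bar{Y}_{i.}^2 + \frac{n-1}{Nn(N-1)} N^2 \bar{Y}^2 \qquad (1.4)$$
Substituting (1.4) in (1.1) we get
$$\operatorname{Var}(\bar{y}) = \frac{N-n}{Nn(N-1)} \sum_{i=1}^{N} \bar{Y}_{i.}^2 + \frac{n-1}{Nn(N-1)} N^2 \bar{Y}^2 - \bar{Y}^2$$
$$= \frac{N-n}{Nn} \cdot \frac{1}{N-1} \sum_{i=1}^{N} \bar{Y}_{i.}^2 + \frac{(n-1)N^2 - Nn(N-1)}{Nn(N-1)} \bar{Y}^2$$
Since $(n-1)N^2 - Nn(N-1) = -N(N-n)$, this becomes
$$\operatorname{Var}(\bar{y}) = \frac{N-n}{Nn} \cdot \frac{1}{N-1} \left[\sum_{i=1}^{N} \bar{Y}_{i.}^2 - N \bar{Y}^2\right] = \frac{N-n}{Nn} S_b^2$$
Hence the result.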
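The variance formula of Theorem 1.2 can be checked in the same spirit. The sketch below, again using hypothetical cluster means, compares the exact variance of the sample mean over all possible samples with the closed form (N - n) S_b^2 / (Nn). It relies on `statistics.variance` (divisor N - 1) for S_b^2 and `statistics.pvariance` (divisor equal to the number of samples) for the exact sampling variance.

```python
from itertools import combinations
from statistics import mean, pvariance, variance

# Hypothetical cluster means for a population of N = 5 clusters.
cluster_means = [2.0, 4.0, 6.0, 9.0, 14.0]
N = len(cluster_means)
n = 2

# S_b^2: mean square between cluster means, with divisor N - 1.
S_b2 = variance(cluster_means)

# Closed-form variance from Theorem 1.2.
formula_var = (N - n) / (N * n) * S_b2

# Exact variance: spread of the sample mean over all equally likely samples.
sample_means = [mean(s) for s in combinations(cluster_means, n)]
exact_var = pvariance(sample_means)

print(S_b2, formula_var, exact_var)
```

Here S_b^2 = 22, and both the formula and the enumeration give a variance of 6.6 up to floating-point rounding.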
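Theorem 1.3
Prove that the estimate
$$s_b^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{y})^2$$
is an unbiased estimator of
$$S_b^2 = \frac{1}{N-1} \sum_{i=1}^{N} (\bar{Y}_{i.} - \bar{Y})^2$$
Proof.
Consider the estimate
$$s_b^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{y})^2 = \frac{1}{n-1} \left(\sum_{i=1}^{n} \bar{y}_{i.}^2 - n \bar{y}^2\right) \qquad (1.5)$$
Taking expectations of equation (1.5) we get
$$E(s_b^2) = \frac{1}{n-1} \left\{E\left(\sum_{i=1}^{n} \bar{y}_{i.}^2\right) - n E(\bar{y}^2)\right\} \qquad (1.6)$$
We know that
$$V(\bar{y}) = E(\bar{y}^2) - [E(\bar{y})]^2 \;\Rightarrow\; E(\bar{y}^2) = V(\bar{y}) + [E(\bar{y})]^2$$
$$E(\bar{y}^2) = \left(\frac{1}{n} - \frac{1}{N}\right) S_b^2 + \bar{Y}^2 \qquad (1.7)$$
Consider the first part of (1.6). Each cluster enters the sample with probability $n/N$, so
$$E\left(\sum_{i=1}^{n} \bar{y}_{i.}^2\right) = \frac{n}{N} \sum_{i=1}^{N} \bar{Y}_{i.}^2 \qquad (1.8)$$
Substituting (1.7) and (1.8) in (1.6):
$$E(s_b^2) = \frac{1}{n-1} \left\{\frac{n}{N} \sum_{i=1}^{N} \bar{Y}_{i.}^2 - n\left(\frac{1}{n} - \frac{1}{N}\right) S_b^2 - n \bar{Y}^2\right\}$$
$$= \frac{1}{n-1} \left\{\frac{n}{N} \left(\sum_{i=1}^{N} \bar{Y}_{i.}^2 - N \bar{Y}^2\right) - n\left(\frac{1}{n} - \frac{1}{N}\right) S_b^2\right\}$$
$$= \frac{1}{n-1} \left\{\frac{n}{N} (N-1) S_b^2 - n\left(\frac{1}{n} - \frac{1}{N}\right) S_b^2\right\} \quad \text{since } (N-1) S_b^2 = \sum_{i=1}^{N} \bar{Y}_{i.}^2 - N \bar{Y}^2$$
$$= \frac{1}{n-1} \left\{n - \frac{n}{N} - 1 + \frac{n}{N}\right\} S_b^2 = S_b^2$$
Hence the result.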
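Theorem 1.3 can also be verified by enumeration. In the sketch below the cluster means are hypothetical; we compute the sample mean square between cluster means for every possible sample of n = 3 clusters and average the results, which should reproduce the population value S_b^2.

```python
from itertools import combinations
from statistics import mean, variance

# Hypothetical cluster means for a population of N = 5 clusters.
cluster_means = [2.0, 4.0, 6.0, 9.0, 14.0]
n = 3

# Population mean square between cluster means (divisor N - 1).
S_b2 = variance(cluster_means)

# Sample mean square between cluster means (divisor n - 1) for every SRSWOR sample.
s_b2_values = [variance(s) for s in combinations(cluster_means, n)]

# Averaging over all equally likely samples gives E(s_b^2).
expected_s_b2 = mean(s_b2_values)

print(S_b2, expected_s_b2)
```

The two printed values coincide, illustrating that the divisor n - 1 is what makes the estimator unbiased.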
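An unbiased estimator of the variance of the mean of the cluster means is therefore given by
$$\widehat{\operatorname{Var}}(\bar{y}) = \frac{N-n}{Nn} s_b^2 \quad \text{where} \quad s_b^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\bar{y}_{i.} - \bar{y})^2$$
and the standard error of the mean of the cluster means is given by
$$\text{standard error}(\bar{y}) = \sqrt{\frac{N-n}{Nn} s_b^2}$$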
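In practice these three quantities are computed together from the sampled cluster means. The helper below is a minimal sketch: the function name `cluster_mean_estimate` and the data passed to it are hypothetical, and equal cluster sizes are assumed throughout, as in the rest of this lesson.

```python
from math import sqrt
from statistics import mean, variance


def cluster_mean_estimate(sample_cluster_means, N):
    """Return (estimate, estimated variance, standard error) of the population mean.

    sample_cluster_means: means of the n sampled clusters (equal cluster sizes assumed).
    N: total number of clusters in the population.
    """
    n = len(sample_cluster_means)
    y_bar = mean(sample_cluster_means)      # mean of the cluster means
    s_b2 = variance(sample_cluster_means)   # divisor n - 1
    var_hat = (N - n) / (N * n) * s_b2      # unbiased variance estimator
    return y_bar, var_hat, sqrt(var_hat)


# Hypothetical example: n = 4 cluster means sampled from N = 20 clusters.
y_bar, var_hat, se = cluster_mean_estimate([10.0, 12.0, 14.0, 20.0], N=20)
print(y_bar, var_hat, se)
```

Passing the cluster means rather than the raw elements keeps the function short; with equal cluster sizes nothing is lost, because the estimator depends on the data only through those means.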
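Theorem 1.4
Show that, for large $N$,
$$V(\bar{y}) \approx \frac{1-f}{nM} S^2 [1 + (M-1)\rho]$$
where $f = n/N$ and $\rho$ is the intraclass correlation coefficient between elements of the same cluster,
$$\rho = \frac{E(Y_{ij} - \bar{Y})(Y_{ik} - \bar{Y})}{E(Y_{ij} - \bar{Y})^2} = \frac{\sum_{i=1}^{N} \sum_{j \neq k} (Y_{ij} - \bar{Y})(Y_{ik} - \bar{Y})}{(M-1)(NM-1) S^2}$$
Proof.
$$\operatorname{Var}(\bar{y}) = \frac{N-n}{Nn} S_b^2 = \frac{1-f}{n} S_b^2 = \frac{1-f}{n} \cdot \frac{1}{N-1} \sum_{i=1}^{N} (\bar{Y}_{i.} - \bar{Y})^2 \qquad (1.9)$$
Now
$$\sum_{i=1}^{N} (\bar{Y}_{i.} - \bar{Y})^2 = \sum_{i=1}^{N} \left[\frac{(Y_{i1} - \bar{Y}) + (Y_{i2} - \bar{Y}) + \dots + (Y_{iM} - \bar{Y})}{M}\right]^2 = \frac{1}{M^2} \sum_{i=1}^{N} \left[\sum_{j=1}^{M} (Y_{ij} - \bar{Y})\right]^2$$
$$= \frac{1}{M^2} \sum_{i=1}^{N} \left[\sum_{j=1}^{M} (Y_{ij} - \bar{Y})^2 + \sum_{j \neq k} (Y_{ij} - \bar{Y})(Y_{ik} - \bar{Y})\right]$$
Dividing by $N-1$,
$$\frac{1}{N-1} \sum_{i=1}^{N} (\bar{Y}_{i.} - \bar{Y})^2 = \frac{1}{(N-1)M^2} \sum_{i=1}^{N} \sum_{j=1}^{M} (Y_{ij} - \bar{Y})^2 + \frac{1}{(N-1)M^2} \sum_{i=1}^{N} \sum_{j \neq k} (Y_{ij} - \bar{Y})(Y_{ik} - \bar{Y})$$
Using $\sum_i \sum_j (Y_{ij} - \bar{Y})^2 = (NM-1) S^2$ and the definition of $\rho$,
$$(N-1) M^2 S_b^2 = (NM-1) S^2 + \rho (NM-1)(M-1) S^2 = (NM-1) S^2 [1 + \rho(M-1)] \qquad (1.10)$$
For large $N$, $(NM-1)/(N-1) \approx M$, so $S_b^2 \approx \frac{S^2}{M} [1 + (M-1)\rho]$. Substituting in (1.9) gives
$$V(\bar{y}) \approx \frac{1-f}{nM} S^2 [1 + (M-1)\rho] \qquad (1.13)$$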
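The relative efficiency of cluster sampling as compared to simple random sampling is obtained by dividing (1.12),
the variance of the mean of a simple random sample of $nM$ elements, $\operatorname{Var}_{srs}(\bar{y}) = \frac{1-f}{nM} S^2$, by (1.13):
$$\text{Relative efficiency } (E) = \frac{\operatorname{Var}_{srs}(\bar{y})}{\operatorname{Var}_{cluster}(\bar{y})} = \frac{\frac{1-f}{nM} S^2}{\frac{1-f}{nM} S^2 [1 + (M-1)\rho]} = \frac{1}{1 + (M-1)\rho}$$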
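The role of the intraclass correlation can be illustrated numerically. The sketch below uses a hypothetical population of N = 3 clusters of M = 2 elements, computes rho from its definition, checks the exact identity (N - 1) M^2 S_b^2 = (NM - 1) S^2 [1 + (M - 1) rho], and then evaluates the large-N relative efficiency 1 / [1 + (M - 1) rho]. With so few clusters the large-N approximation is only indicative.

```python
from statistics import mean, variance

# Hypothetical toy population: N = 3 clusters, each with M = 2 elements.
population = [[1.0, 3.0], [2.0, 6.0], [5.0, 7.0]]
N = len(population)
M = len(population[0])

elements = [y for cluster in population for y in cluster]
Y_bar = mean(elements)

# Mean square between elements (divisor NM - 1) and between cluster means (divisor N - 1).
S2 = variance(elements)
S_b2 = variance([mean(cluster) for cluster in population])

# Sum of cross products of deviations for distinct elements of the same cluster.
cross = sum(
    (cluster[j] - Y_bar) * (cluster[k] - Y_bar)
    for cluster in population
    for j in range(M)
    for k in range(M)
    if j != k
)

# Intraclass correlation coefficient.
rho = cross / ((M - 1) * (N * M - 1) * S2)

# Both sides of the exact identity linking S_b^2, S^2 and rho.
lhs = (N - 1) * M**2 * S_b2
rhs = (N * M - 1) * S2 * (1 + (M - 1) * rho)

# Large-N relative efficiency of cluster sampling versus SRS.
rel_eff = 1 / (1 + (M - 1) * rho)

print(rho, lhs, rhs, rel_eff)
```

A positive rho, as here, gives a relative efficiency below 1: elements within a cluster resemble each other, so a cluster sample carries less information than a simple random sample of the same number of elements.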
1.2.1 E-tivity: Describe the cluster sampling design, derive the properties of its estimators of the
population parameters and apply it to real data.
https://www.youtube.com/watch?v=efToj06DJfg
https://www.youtube.com/watch?v=pV3FAVr086s
https://www.youtube.com/watch?v=-X5rxFSMXI8
Read these notes:
http://home.iitk.ac.in/~shalab/sampling/chapter9-sampling-cluster-sampling.pdf
Spark
Individual contribution
A survey on pepper was conducted to estimate the number of pepper standards and the production of
pepper in Kerala state in India. For this purpose, three clusters out of 95 were selected by simple
random sampling without replacement. The information on the number of pepper standards is recorded below.
Cluster number   Cluster size   Number of pepper standards
1                7              41, 16, 19, 144, 212, 57, 199
2                7              39, 70, 38, 161, 219, 128, 20
3                7              115, 59, 46, 37, 219, 120, 46
Estimate
i) The average number of pepper standards along with its standard error.
ii) The relative efficiency of cluster sampling as compared with simple random sampling.
Interaction begins
Post your answers on the discussion forum 1.2.1.
1.2.2 Apply this design to estimate the population parameters of a real population and
compare its relative efficiency with other sampling designs.
In Umerpur-Neerna village of Allahabad district in India there are a total of 412 bearing guava
trees. From 103 clusters of 4 trees each, 15 clusters were selected and the yield of each tree
was recorded in kilograms, as shown below.
Cluster 1st tree 2nd tree 3rd tree 4th tree
1 5.53 4.84 0.69 15.79
2 26.11 10.93 19.08 11.18
3 11.08 0.65 4.21 7.56
4 12.66 32.52 16.92 37.02
5 0.87 3.55 16.92 37.02
6 6.40 11.68 40.05 5.15
7 54.21 34.63 52.55 37.96
8 1.94 35.97 29.54 25.98
9 37.94 47.07 16.94 28.11
10 56.92 17.69 26.24 6.77
11 27.59 38.10 24.24 6.53
12 45.98 5.17 1.17 6.53
13 7.13 34.35 12.18 9.86
14 14.23 16.89 28.93 21.70
15 3.53 40.76 5.15 1.25
(i) Estimate the average yield per tree of guava along with its standard error.
(ii) Compare the relative efficiency of the cluster sampling design with the simple
random sampling design.
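Solution.
Here we have $M = 4$, $N = 103$ and $n = 15$.
i) The mean of the cluster means in the sample is
$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} \bar{y}_{i.} = \frac{1}{Mn} \sum_{i=1}^{n} \sum_{j=1}^{M} y_{ij} = \frac{292.04}{15} = 19.47$$
The estimate of the variance of the mean of the sample cluster means is given by
$$\operatorname{var}(\bar{y}) = \left(\frac{1}{n} - \frac{1}{N}\right) \frac{1}{n-1} \left(\sum_{i=1}^{n} \bar{y}_{i.}^2 - n \bar{y}^2\right) = \left(\frac{1}{15} - \frac{1}{103}\right) \frac{1}{14} (7202.4262 - 15 \times 379.0809) = 6.1686$$
so the standard error is $\sqrt{6.1686} \approx 2.48$.
ii) The mean square between elements in the sample, with divisor $nM - 1 = 59$, is
$$s^2 = \frac{1}{nM-1} \sum_{i=1}^{n} \sum_{j=1}^{M} (y_{ij} - \bar{y})^2 = 119.4$$
The estimated variance of the mean under simple random sampling of $nM = 60$ trees is
$$\operatorname{var}_{srs}(\bar{y}) = \frac{1 - \frac{15}{103}}{60} \times 119.4 = 1.70$$
$$\text{Relative efficiency } E = \frac{\operatorname{var}_{srs}(\bar{y})}{\operatorname{var}_{cluster}(\bar{y})} = \frac{1.70}{6.17} = 0.28$$
Since $E < 1$, cluster sampling is less efficient than simple random sampling for these data.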
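The arithmetic of this solution can be reproduced from the summary figures quoted in it. The sketch below takes the sum of the cluster means (292.04), the sum of their squares (7202.4262) and the element mean square (119.4) as given; the variable names are our own. Because it uses the unrounded mean rather than 19.47, the cluster variance differs from 6.1686 in the third decimal place but agrees to two decimals.

```python
from math import sqrt

N, n, M = 103, 15, 4

# Summary figures quoted in the worked solution.
sum_means = 292.04         # sum of the 15 sample cluster means
sum_sq_means = 7202.4262   # sum of squares of the 15 sample cluster means
s2_elements = 119.4        # mean square between the 60 sampled trees

# (i) Estimate of the mean yield per tree and its standard error.
y_bar = sum_means / n
s_b2 = (sum_sq_means - n * y_bar**2) / (n - 1)
var_cluster = (1 / n - 1 / N) * s_b2
se_cluster = sqrt(var_cluster)

# (ii) Variance under SRS of nM trees and the relative efficiency.
var_srs = (1 - n / N) / (n * M) * s2_elements
rel_eff = var_srs / var_cluster

print(round(y_bar, 2), round(var_cluster, 2), round(se_cluster, 2))
print(round(var_srs, 2), round(rel_eff, 2))
```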
1.3 Assessment
To estimate the total number of stationary sheep in a certain district with 100 divisions, each
containing eight villages, four divisions were selected using simple random sampling. The number
of sheep in each village of the selected divisions was counted. The results are given below.
Cluster       Number of sheep in each village
(division)      1     2     3     4     5     6     7     8
1             266   224   109   890    31    46   128   126
2             129   163   350   275   278   186   252   466
3             247   181   403   265   987   651   485    60
4             347   133   249   161   362   112   186   170
i) Estimate the total number of sheep in the district together with its standard error.
ii) Estimate the relative efficiency as compared with simple random sampling.
1.4 References
1. Lohr, S. L. (1999). Sampling: Design and Analysis. Pacific Grove: Duxbury Press. ISBN-13: 9780495105275
2. Cochran, W. G. (1977). Sampling Techniques, 3rd edition. New York: Wiley. ISBN: 047116240X