SAMPLING DISTRIBUTION
Outline
• Sampling
• Sampling without replacement
• Sampling with replacement
• Sampling distribution
• Mean & variance of sampling distribution
• Standard error
• Law of large numbers
• Central limit theorem
Types of inference
• Estimation: we can estimate the value of a population parameter.
• Testing: we can reach a decision about a population parameter.
• Regression: we can make predictions about the value of a statistical variable.
Why sampling distributions?
To evaluate the reliability of our inference, we need to know the probability distribution of the sample statistic.
Our interest is in the sampling distributions of two statistics: the sample mean and the sample variance.
Need for a sample
• Ideally, we would like to know the parameter itself. In most cases this is not feasible because the population is too large, so we estimate the parameter with a suitable statistic computed from a sample.
• For example, the mean of the data in a sample is used to give information about the overall mean of the population from which that sample was drawn.
• Our interest is in the population, but because our time, resources, and effort are limited, we take a sample to learn about it.
Need for a sample
(computing the parameter from the whole population would incur the following costs)
• Time of the researcher and of those being surveyed.
• Cost to the group or agency commissioning the survey.
• Confidentiality, anonymity, and other ethical issues.
• Interference with the population:
  - A large sample could alter the nature of the population, e.g. opinion surveys.
  - Measurement may destroy the elements measured, e.g. crash tests are run on only a small sample of automobiles.
• Cooperation of respondents: individuals, firms, administrative agencies.
• In some cases partial data is all that is available, e.g. fossils and historical records, climate change.
Choice of sample
Random sample: a sample designed in such a way as to ensure that
(1) every observation in the population has an equal chance of being chosen, and
(2) every combination of n observations has an equal chance of being chosen.
Statistic (sample statistic)
• A number that describes a sample
• Known after we take a sample
• Changes from sample to sample
• Used to estimate an unknown parameter
Selection of simple random sample
• A simple random sample is a sample of size n selected in such a manner that each possible sample of size n has the same probability of being selected.
where
• N is the size of the population, i.e. the number of elements in the population.
• n is the size of the sample, i.e. the number of elements in the sample.
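As a rough illustration of selecting a simple random sample, the sketch below uses Python's standard `random.sample`, which draws n distinct elements so that every subset of size n is equally likely. The helper name `simple_random_sample`, the labels A to D, and the seed are assumptions made for this example only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every subset of size n is equally likely."""
    rng = random.Random(seed)          # seed only makes the draw reproducible
    return rng.sample(population, n)   # sampling without replacement

population = ["A", "B", "C", "D"]      # N = 4, hypothetical labels
sample = simple_random_sample(population, 2, seed=1)
print(sample)                          # two distinct elements of the population
```

Changing or omitting the seed gives a different sample each time, which is exactly why a statistic computed from the sample varies from sample to sample.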
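Expectation & variance of sample mean
$E(\bar{x}) = \sum \bar{x}\, P(\bar{x})$
$\sigma_{\bar{x}}^2 = E(\bar{x}^2) - [E(\bar{x})]^2$
$\qquad = \sum \bar{x}^2 P(\bar{x}) - \left[\sum \bar{x}\, P(\bar{x})\right]^2$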
Sampling without replacement
Draw a simple random sample of size 2 without replacement from a population of 4 elements.
Population elements are A, B, C, D. N = 4, n = 2.
If the order of selection does not matter (i.e. we are interested only in which elements are selected), the equally likely random samples are

Samples of size 2: AB  AC  AD  BC  BD  CD
Samples of size 3: ABC  ABD  ACD  BCD

The number of samples of size 2 is the number of combinations
${}^{N}C_{n} = \frac{N!}{n!\,(N-n)!} = \frac{4!}{2!\,(4-2)!} = 6$
Sampling with replacement
After an element is randomly selected, replace it and randomly select another element. This can lead to the same element being selected more than once.
Draw a simple random sample of size 2 with replacement from a population of 4 elements.
Population elements are A, B, C, D. N = 4, n = 2.
The number of possible random samples is $N^n$: $4^2 = 16$ samples of size 2 and $4^3 = 64$ samples of size 3.

Samples of size 2
AA  BA  CA  DA
AB  BB  CB  DB
AC  BC  CC  DC
AD  BD  CD  DD

Samples of size 3
AAA  BAA  CAA  DAA
AAB  BAB  CAB  DAB
AAC  BAC  CAC  DAC
AAD  BAD  CAD  DAD
ABA  BBA  CBA  DBA
ABB  BBB  CBB  DBB
ABC  BBC  CBC  DBC
ABD  BBD  CBD  DBD
ACA  BCA  CCA  DCA
ACB  BCB  CCB  DCB
ACC  BCC  CCC  DCC
ACD  BCD  CCD  DCD
ADA  BDA  CDA  DDA
ADB  BDB  CDB  DDB
ADC  BDC  CDC  DDC
ADD  BDD  CDD  DDD
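A minimal sketch of the same listing, assuming Python's standard `itertools.product`, which generates ordered selections with repeats allowed. The counts should match N^n.

```python
from itertools import product

population = ["A", "B", "C", "D"]   # N = 4

# Ordered samples drawn with replacement: repeats such as "AA" are allowed
samples_2 = ["".join(s) for s in product(population, repeat=2)]
samples_3 = ["".join(s) for s in product(population, repeat=3)]

print(len(samples_2))   # 16
print(len(samples_3))   # 64
print(samples_2[:4])    # ['AA', 'AB', 'AC', 'AD']
```

Note that AB and BA are counted separately here, unlike in sampling without replacement where order was ignored.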
Sampling distribution serves as a bridge between the
sample and the population
[Figure: the sampling distribution as the link between the statistic (sample) and the parameter (population)]
Sampling distribution
• This is not the distribution of the sample.
• The sampling distribution is the distribution of a sample statistic.
• If we take many samples and compute the statistic for each of them, the probability distribution of all those statistics is the sampling distribution.
Sampling distribution
• A sampling distribution is the probability distribution of all possible values of the sample statistic.
• Each sample contains different elements, so the value of the sample statistic differs from sample to sample. These statistics provide different estimates of the parameter. The sampling distribution describes how these different values are distributed.
Sampling distribution of the sample mean x̄
• The probability distribution of the sample means that would be obtained from all possible samples of the same size.
• The expected value of the statistic x̄ is µ. This property makes the sample mean an unbiased estimator of µ:
$E(\bar{x}) = \mu$
Sampling distribution is approximately a normal distribution
• If a simple random sample is drawn from a normally distributed population, the sampling distribution of x̄ is normally distributed.
• For large samples, the sampling distribution approximates a normal curve even if the population you started with does NOT look normal (how? by the central limit theorem).
• The mean of the sampling distribution of x̄ is equal to the population mean µ:
$\mu_{\bar{x}} = \mu$
• If the sample size n is small relative to the population size N, then the standard deviation of the sampling distribution of x̄ is the population standard deviation σ divided by the square root of the sample size:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Standard error
• The standard deviation of the sampling distribution is called the standard error.
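For the sample mean, the standard error is σ divided by the square root of n, as stated on the preceding slide. The sketch below only evaluates that formula; the helper name `standard_error` and the value σ = 10 are hypothetical and chosen for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)

sigma = 10.0                    # hypothetical population standard deviation
for n in (4, 25, 100):
    print(n, standard_error(sigma, n))   # 5.0, then 2.0, then 1.0
```

Quadrupling the sample size halves the standard error, so precision improves only with the square root of n.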
Law of large numbers
Q: Each sample from the same population will have a different mean. Why is the sample mean a reasonable estimate of the population mean?
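One way to see the answer is a small simulation, sketched here under stated assumptions: the population is the die-like set 1 to 6 (mean 3.5), draws are made with replacement, and the seed is fixed only for reproducibility. The running mean should settle near 3.5 as the number of draws grows.

```python
import random

rng = random.Random(0)            # fixed seed, assumed for reproducibility
population = [1, 2, 3, 4, 5, 6]   # population mean is 3.5

total = 0
running_mean = {}
checkpoints = {10, 100, 1_000, 10_000, 100_000}
for i in range(1, 100_001):
    total += rng.choice(population)   # one draw with replacement
    if i in checkpoints:
        running_mean[i] = total / i   # mean of the first i draws

for n, m in sorted(running_mean.items()):
    print(n, round(m, 4))
```

The early running means can be far from 3.5, while the later ones typically differ from it only in the second or third decimal place.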
Example 1
• The population has 6 elements: 1, 2, 3, 4, 5, 6 (like the numbers on a die).
• We want to find the sampling distribution of the mean for n = 2.
• What happens if we sample with replacement?

1 + 2 + 3 + 4 + 5 + 6 = 21
$\mu = \frac{21}{6} = 3.5$
$1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 = 91$
$\sigma^2 = \frac{91}{6} - \left(\frac{21}{6}\right)^2 \approx 2.9167$
There is only 1 way to get a mean of 1, but there are 6 ways to get a mean of 3.5.
Samples with their means (M = sample mean)

1st  2nd  M      1st  2nd  M      1st  2nd  M
1    1    1      3    1    2      5    1    3
1    2    1.5    3    2    2.5    5    2    3.5
1    3    2      3    3    3      5    3    4
1    4    2.5    3    4    3.5    5    4    4.5
1    5    3      3    5    4      5    5    5
1    6    3.5    3    6    4.5    5    6    5.5
2    1    1.5    4    1    2.5    6    1    3.5
2    2    2      4    2    3      6    2    4
2    3    2.5    4    3    3.5    6    3    4.5
2    4    3      4    4    4      6    4    5
2    5    3.5    4    5    4.5    6    5    5.5
2    6    4      4    6    5      6    6    6
Sampling distribution of sample mean

x̄      f(x̄)   P(x̄)    x̄·P(x̄)    x̄²·P(x̄)
1      1      1/36    1/36      1/36
1.5    2      2/36    3/36      4.5/36
2      3      3/36    6/36      12/36
2.5    4      4/36    10/36     25/36
3      5      5/36    15/36     45/36
3.5    6      6/36    21/36     73.5/36
4      5      5/36    20/36     80/36
4.5    4      4/36    18/36     81/36
5      3      3/36    15/36     75/36
5.5    2      2/36    11/36     60.5/36
6      1      1/36    6/36      36/36
SUM    36     1       126/36    493.5/36

$\mu_{\bar{x}} = E(\bar{x}) = \sum \bar{x}\, P(\bar{x}) = \frac{126}{36} = 3.5$
$\sigma_{\bar{x}}^2 = \sum \bar{x}^2 P(\bar{x}) - \left[\sum \bar{x}\, P(\bar{x})\right]^2 = \frac{493.5}{36} - \left(\frac{126}{36}\right)^2 = 1.45833$
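The table can be reproduced by enumerating all 36 ordered samples. This is a minimal sketch using exact fractions so that no rounding enters; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]
n = 2

# All N^n = 36 equally likely ordered samples drawn with replacement
samples = list(product(population, repeat=n))
counts = Counter(Fraction(sum(s), n) for s in samples)   # f(x-bar)

# P(x-bar) = frequency / number of samples
dist = {xbar: Fraction(f, len(samples)) for xbar, f in sorted(counts.items())}

mean = sum(xbar * p for xbar, p in dist.items())
var = sum(xbar**2 * p for xbar, p in dist.items()) - mean**2

print(counts[Fraction(7, 2)])         # 6 ways to obtain a mean of 3.5
print(mean)                           # 7/2
print(var, round(float(var), 5))      # 35/24 1.45833
```

The exact variance 35/24 is half of the population variance 35/12, in line with dividing σ² by n = 2.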
With replacement
$\mu_{\bar{x}} = \mu = 3.5$
$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} = \frac{2.9167}{2} = 1.4583$
• The sampling distribution shows the relation between the value of a statistic and its probability, over all possible samples of size n drawn from a population.
[Figure: hypothetical distribution of sample means; horizontal axis: mean value, vertical axis: f(M)]
Example 2
The population has 5 elements: 0, 3, 6, 9, 12.
We want to find the sampling distribution of the mean for n = 3 if we sample without replacement.
$\mu = \frac{30}{5} = 6$
$\sigma^2 = 18$
Samples with their means

sample      mean    sample      mean    sample      mean
0, 3, 6     3       3, 6, 9     6       6, 9, 12    9
0, 3, 9     4       3, 6, 12    7
0, 3, 12    5       3, 9, 12    8
0, 6, 9     5
0, 6, 12    6
0, 9, 12    7
Sampling distribution of sample mean

x̄      f(x̄)   P(x̄)    x̄·P(x̄)    x̄²·P(x̄)
3      1      1/10    3/10      9/10
4      1      1/10    4/10      16/10
5      2      2/10    10/10     50/10
6      2      2/10    12/10     72/10
7      2      2/10    14/10     98/10
8      1      1/10    8/10      64/10
9      1      1/10    9/10      81/10
SUM    10     1       60/10     390/10

$\mu_{\bar{x}} = E(\bar{x}) = \sum \bar{x}\, P(\bar{x}) = \frac{60}{10} = 6$
$\sigma_{\bar{x}}^2 = \sum \bar{x}^2 P(\bar{x}) - \left[\sum \bar{x}\, P(\bar{x})\right]^2 = \frac{390}{10} - \left(\frac{60}{10}\right)^2 = 3$
$\mu_{\bar{x}} = \mu = 6$
$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} = \frac{18}{3} \cdot \frac{2}{4} = 3$
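Example 2 can be verified the same way. The sketch below enumerates all 10 samples drawn without replacement and compares the variance of their means with σ²/n multiplied by the finite population correction (N − n)/(N − 1). Exact fractions are used; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

population = [0, 3, 6, 9, 12]
N, n = len(population), 3

# Population mean and variance (divide by N, not N - 1)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

# All C(5, 3) = 10 equally likely samples drawn without replacement
means = [Fraction(sum(s), n) for s in combinations(population, n)]
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

# sigma^2 / n scaled by the finite population correction
fpc_var = sigma2 / n * Fraction(N - n, N - 1)

print(mu, sigma2)                    # 6 18
print(mean_of_means, var_of_means)   # 6 3
print(fpc_var)                       # 3
```

Without the correction factor, σ²/n would give 6, twice the true variance of the sample mean for this small population.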
Central limit theorem
Sampling error
• The sample cannot be fully representative of the population.
• As such, there is variability due to chance.
• We could have a thousand sample means and none of them exactly equal to the population mean.

The sampling error is the difference between the point estimate (the value of the estimator) and the value of the parameter. It is the error caused by observing only a subset of the population rather than all of its elements. Our interest lies in minimizing the sampling error, but every sample has some such error associated with it.
Central limit theorem
• For any population with finite variance, regardless of its form, the sampling distribution of the mean approaches a normal distribution as the sample size n gets larger.
  - This raises the question of what n is 'large enough'.
• Furthermore, the sampling distribution of the mean will have a mean equal to µ (the population mean) and a standard deviation equal to σ/√n:
$\bar{X} \sim \text{Normal}\left(\mu, \frac{\sigma^2}{n}\right) = \text{Normal}\left(\mu, \left(\frac{\sigma}{\sqrt{n}}\right)^2\right)$
Central limit theorem (CLT)
The sampling distribution of the sample mean is approximated by a normal distribution when the sample is a simple random sample and the sample size n is large.
In this case, the mean of the sampling distribution is the population mean, µ, and the standard deviation of the sampling distribution is the population standard deviation, σ, divided by the square root of the sample size.
As a rule of thumb, a sample size of 100 or more elements is generally considered sufficient to permit using the CLT.
If the population from which the sample is drawn is symmetrically distributed, n > 30 may be sufficient to use the CLT.
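The theorem can be illustrated with a simulation. This is a sketch under assumptions that are not part of the slides: the population is exponential with rate 1 (mean 1, standard deviation 1, strongly right-skewed), each sample has n = 40 elements, 5,000 samples are drawn, and the seed is fixed for reproducibility.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed, assumed for reproducibility
n = 40                    # size of each sample
reps = 5_000              # number of simulated samples

# Exponential population with rate 1: mean 1, standard deviation 1
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

center = statistics.fmean(sample_means)
spread = statistics.pstdev(sample_means)
within_1sd = sum(abs(m - center) <= spread for m in sample_means) / reps

print(round(center, 3))       # should be close to mu = 1
print(round(spread, 3))       # should be close to sigma / sqrt(n), about 0.158
print(round(within_1sd, 3))   # should be close to 0.68, as for a normal curve
```

Even though the population is far from normal, the sample means cluster around µ with a spread near σ/√n, and roughly 68% of them fall within one standard error of the center.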
