Summary (Session 11-15)

Sampling and Sampling Distributions

Random Sampling Methods
• Simple random sample (each sample of the same size has an equal chance of being
selected)
• Stratified sample (divide the population into groups called strata and then take a sample
from each stratum)
• Cluster sample (divide the population into groups called clusters and then randomly
select some of the clusters. All the members of the selected clusters are in the cluster sample.)
• Systematic sample (randomly select a starting point and take every n-th piece of data
from a listing of the population)
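
The four methods above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the population of 100 numbered units, the four equal groups, and all function names are assumptions made for the example.

```python
import random

def simple_random_sample(population, n, rng):
    # Every subset of size n is equally likely to be drawn.
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    # Take a simple random sample from EVERY group.
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Randomly pick SOME groups and keep all of their members.
    chosen = rng.sample(list(clusters), n_clusters)
    sample = []
    for label in chosen:
        sample.extend(clusters[label])
    return sample

def systematic_sample(population, step, rng):
    # Random starting point, then every step-th unit in the listing.
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(0)  # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical listing of 100 units
groups = {g: population[g * 25:(g + 1) * 25] for g in range(4)}  # 4 groups of 25

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(groups, 3, rng)
clus = cluster_sample(groups, 2, rng)
syst = systematic_sample(population, 10, rng)

print(len(srs), len(strat), len(clus), len(syst))
```

Note the contrast between the second and third functions: stratified sampling draws a few units from every group, while cluster sampling keeps every unit from a few groups.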

Sampling Distribution
• The sample mean is the most common estimator of the population mean, µ.
• The sample variance, s², is the most common estimator of the population variance, σ².
• The sample standard deviation, s, is the most common estimator of the population
standard deviation, σ.
• The sample proportion, p, is the most common estimator of the population proportion, 𝜋.
In practice, parameter values are not known. They are estimated using sample observations.
- Parameter values are fixed.
- The value of a statistic varies from sample to sample.

Unbiased Estimate
If E(statistic) = parameter, then the statistic is said to be an unbiased estimator of the parameter.
For example, the sample mean is an unbiased estimator of the population mean: E(X̄) = µ.
• Unknown parameters are estimated using sample observations.
• Parameter values are fixed.
• The value of a statistic varies from sample to sample.
• Each sample has some probability of being chosen.
• Each value of a statistic is therefore associated with a probability.
• Hence a statistic is a random variable.
• The distribution of a statistic is called its sampling distribution.
• The distribution of a statistic may not be the same as the distribution of the population.
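
The chain of ideas above (samples have probabilities, so a statistic has a distribution) can be checked by brute force on a tiny population. The sketch below assumes a hypothetical population of five values and simple random samples of size 2 drawn without replacement; it lists every possible sample, builds the sampling distribution of the sample mean, and compares its expected value with µ.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8, 10]  # hypothetical population
mu = Fraction(sum(population), len(population))  # population mean (the parameter)

# Every possible sample of size 2; under simple random sampling each is equally likely.
samples = list(combinations(population, 2))
sample_means = [Fraction(sum(s), len(s)) for s in samples]

# Sampling distribution: each distinct value of the sample mean with its probability.
counts = Counter(sample_means)
sampling_distribution = {m: Fraction(c, len(samples)) for m, c in counts.items()}

# Expected value of the sample mean under this distribution.
expected_mean = sum(m * p for m, p in sampling_distribution.items())

for m in sorted(sampling_distribution):
    print(f"mean = {float(m):4.1f}  probability = {sampling_distribution[m]}")
print("E(sample mean) =", expected_mean, " population mean =", mu)
```

Exact fractions are used so that the equality E(X̄) = µ can be checked without rounding error. The population itself is uniform over five values, while the sampling distribution of the mean is peaked in the middle, which illustrates the last bullet.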
Sampling Distribution of Sample Mean
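When sampling from a normal population with mean µ and standard deviation σ, the sample mean, X̄,
has a normal sampling distribution:

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

Common Notation:

$$\mu_{\bar{x}} = E(\bar{x}) = \mu, \qquad \sigma_{\bar{x}}^2 = \operatorname{Var}(\bar{x}) = \frac{\sigma^2}{n}$$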
Standard Error
• Different samples of the same size from the same population will yield different sample
means.
• A measure of the variability in different values of sample mean is given by the Standard
Error of the sample mean.
• Standard error of a statistic is the standard deviation of its distribution.

$$\text{standard error}(\bar{x}) = \sigma_{\bar{x}} = \sqrt{\operatorname{Var}(\bar{x})} = \frac{\sigma}{\sqrt{n}}$$
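
A short simulation can make the standard error concrete. The sketch below assumes a hypothetical normal population (µ = 50, σ = 10) and samples of size n = 25; it draws many samples, records each sample mean, and compares the standard deviation of those means with σ/√n.

```python
import math
import random
import statistics

# Hypothetical population parameters and sample size, chosen only for illustration.
MU, SIGMA, N = 50.0, 10.0, 25
REPLICATES = 4000

rng = random.Random(42)  # fixed seed so the run is repeatable

def sample_mean(n):
    # Mean of one random sample of size n from N(MU, SIGMA^2).
    return statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))

means = [sample_mean(N) for _ in range(REPLICATES)]

theoretical_se = SIGMA / math.sqrt(N)   # sigma / sqrt(n)
empirical_se = statistics.stdev(means)  # spread of the simulated sample means
mean_of_means = statistics.fmean(means)

print(f"theoretical SE = {theoretical_se:.3f}")
print(f"empirical SE   = {empirical_se:.3f}")
print(f"mean of sample means = {mean_of_means:.3f}")
```

The two standard errors should agree closely but not exactly, since the empirical value is itself estimated from a finite number of simulated samples.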

Central Limit Theorem
• Even when the population distribution is not normal, the sample mean has an approximately
normal distribution provided n is large. As a practical rule of thumb, this holds for n ≥ 30.
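
As a rough check of the theorem, the sketch below samples from a clearly non-normal population and looks at how often the standardized sample mean falls within ±1.96, which would happen about 95% of the time under an exact N(0,1) distribution. The exponential population (mean 1, standard deviation 1), the sample size 30, and the number of replicates are all assumptions made for the illustration.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable
N = 30                  # sample size at the rule-of-thumb threshold
REPLICATES = 5000

# Exponential population with rate 1: mean 1, standard deviation 1, right-skewed.
POP_MEAN, POP_SD = 1.0, 1.0

def standardized_mean(n):
    # Z = (sample mean - mu) / (sigma / sqrt(n)) for one sample of size n.
    xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    return (xbar - POP_MEAN) / (POP_SD / math.sqrt(n))

z_values = [standardized_mean(N) for _ in range(REPLICATES)]

# Share of standardized means inside the central 95% band of N(0,1).
share_within = sum(abs(z) <= 1.96 for z in z_values) / REPLICATES
print(f"share within +/-1.96: {share_within:.3f}")
```

The share is only close to 0.95, not equal to it: the normal distribution is an approximation here, and it improves as n grows.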
If the population standard deviation, σ, is unknown, replace σ with the sample standard
deviation, s. If the population is normal and n is small (< 30), the resulting statistic

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

has a t distribution with (n − 1) degrees of freedom.
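
The t statistic above can be computed directly from a small sample. In the sketch below the ten observations and the hypothesized mean of 12 are made-up values, and the critical value 2.262 is the standard t-table entry for 9 degrees of freedom at a two-tailed 5% level; the function name is likewise an assumption.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (t, degrees of freedom) for t = (xbar - mu0) / (s / sqrt(n))."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical small sample, assumed to come from a normal population.
data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
t_stat, df = one_sample_t(data, mu0=12.0)

T_CRITICAL = 2.262  # t-table value for df = 9, two-tailed, alpha = 0.05
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.4f}, df = {df}, reject H0: {reject_h0}")
```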
Sampling Distribution of Sample Proportion
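• When the sample size n is large enough,

$$Z = \frac{x - n\pi}{\sqrt{n\pi(1-\pi)}} \sim N(0,1)$$

or, equivalently,

$$Z = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}} \sim N(0,1)$$

where x is the number of successes in the sample and p = x/n is the sample proportion.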
• This is a particular case of the central limit theorem.
• Practically, this result is true for 𝑛 ≥ 30.
• Or, when 𝑛𝜋 ≥ 5 as well as 𝑛𝜋(1 – 𝜋 ) ≥ 5.
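
The two forms of Z are algebraically the same statistic, which a quick calculation can confirm. The sketch below uses made-up numbers (116 successes in 200 trials, π = 0.5) and hypothetical function names; the size check simply encodes the rule of thumb as stated in the last bullet.

```python
import math

def proportion_z(x, n, pi0):
    """Return Z computed from the count x and from the sample proportion p = x / n."""
    z_from_count = (x - n * pi0) / math.sqrt(n * pi0 * (1 - pi0))
    p = x / n
    z_from_proportion = (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
    return z_from_count, z_from_proportion

def large_sample_ok(n, pi0):
    # Rule of thumb as stated in the notes: n*pi >= 5 and n*pi*(1 - pi) >= 5.
    return n * pi0 >= 5 and n * pi0 * (1 - pi0) >= 5

# Hypothetical data: 116 successes out of 200, with pi = 0.5.
z_count, z_prop = proportion_z(x=116, n=200, pi0=0.5)

print(f"Z from count      = {z_count:.4f}")
print(f"Z from proportion = {z_prop:.4f}")
print("normal approximation reasonable:", large_sample_ok(200, 0.5))
```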

Interval Estimation

Hypothesis Testing [One sample test]

• Hypothesis testing for mean, proportion and variance
Research Hypothesis
A statement of what the researcher believes will be the outcome of an experiment or a study
Statistical Hypothesis
A formal structure used to statistically (based on a sample) test the research hypothesis
Null Hypothesis
• Denoted as H0
• Nothing new is happening
• The null condition exists
• It refers to the status quo (current or existing state of affairs)
• Similar to the notion of innocent until proven guilty
• Usually a hypothesis of no difference.
• Example: The average number of TV sets in U.S. homes is equal to three.
H0: μ = 3
• Begin with the assumption that the H0 is true
• It is tested for rejection or acceptance.
Alternate Hypothesis
• Denoted as H1 or Ha
• Something new is happening
• It is the opposite of the null hypothesis
▪ E.g., The average number of TV sets in U.S. homes is not equal to 3 (H1: μ ≠ 3)
• It challenges the status quo
• The Null and Alternative Hypotheses are mutually exclusive.
• Only one of them can be true.
Critical (Rejection) and Acceptance Regions
• The critical value divides the whole area under the probability curve into two regions:
Critical (Rejection) region
• When the statistical outcome falls into this region, H0 is rejected.
• Size of this region is α.
Acceptance Region
• When the statistical outcome falls into this region, H0 is accepted (more precisely, H0 is not rejected).
• Size of this region is (1-α).
• Steps:
▪ State H0 and H1.
▪ Compute the value of the test statistic, Zc.
▪ Obtain the critical value for the fixed α and according to H1 (right-, left-, or two-tailed).
▪ Compare the computed value of Zc with the critical value.
▪ Make the decision accordingly.
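
The steps above can be sketched as a small function for a Z test of a mean with known σ. The TV-set numbers below (sample mean 3.2, σ = 1, n = 100) are made up to continue the earlier example, and the function name and the `tail` argument are assumptions of this sketch; `statistics.NormalDist` from the Python standard library supplies the critical value in place of a printed normal table.

```python
import math
from statistics import NormalDist

def z_test_critical(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (Zc, critical value, reject H0?) for a one-sample Z test of the mean."""
    z_c = (xbar - mu0) / (sigma / math.sqrt(n))  # computed test statistic
    std_normal = NormalDist()                    # N(0, 1)
    if tail == "two":      # H1: mu != mu0, rejection region in both tails
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z_c) > critical
    elif tail == "right":  # H1: mu > mu0
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z_c > critical
    elif tail == "left":   # H1: mu < mu0
        critical = std_normal.inv_cdf(alpha)
        reject = z_c < critical
    else:
        raise ValueError("tail must be 'two', 'right', or 'left'")
    return z_c, critical, reject

# H0: mu = 3 versus H1: mu != 3, with hypothetical sample results.
z_c, critical, reject = z_test_critical(xbar=3.2, mu0=3.0, sigma=1.0, n=100)
print(f"Zc = {z_c:.2f}, critical value = {critical:.3f}, reject H0: {reject}")
```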
P-value approach
Let Zc be the computed value of the test statistic and let Z ~ N(0,1). The p-value is given by the
following probability:
• For two-tailed tests: 2P(Z > |Zc|)
• For right-tailed tests: P(Z > Zc)
• For left-tailed tests: P(Z < Zc)
• Decision: H0 is rejected in favor of H1 at the α × 100% level of significance if p-value < α.
• The p-value is the smallest level of significance at which H0 would be rejected.
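
The three p-value formulas translate directly into code. This is a minimal sketch: the function name and the `tail` argument are assumptions, the value Zc = 2.0 is made up, and the standard normal probabilities come from `statistics.NormalDist` in the Python standard library.

```python
from statistics import NormalDist

def p_value(z_c, tail="two"):
    """p-value of a Z test given the computed statistic z_c."""
    std_normal = NormalDist()  # Z ~ N(0, 1)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z_c)))  # 2 P(Z > |Zc|)
    if tail == "right":
        return 1 - std_normal.cdf(z_c)             # P(Z > Zc)
    if tail == "left":
        return std_normal.cdf(z_c)                 # P(Z < Zc)
    raise ValueError("tail must be 'two', 'right', or 'left'")

alpha = 0.05
p_two = p_value(2.0, tail="two")
reject_h0 = p_two < alpha  # decision rule: reject H0 if p-value < alpha

print(f"two-tailed p-value = {p_two:.4f}, reject H0 at alpha = {alpha}: {reject_h0}")
```

For the same data, the p-value rule and the critical-value rule lead to the same decision; the p-value additionally shows how far the result is from the rejection boundary.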
The test statistics in hypothesis testing for the population mean and proportion are
computed in the same way as in the sampling distributions above, with the parameter set to its
value under H0.
Hypothesis Testing of Population Variance σ²
• Assumptions:
▪ Population is normal.
▪ Sample size is small

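• Test Statistic:

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$

where σ0² is the value of the population variance specified under H0.
• The distribution of the above test statistic is chi-square with (n − 1) degrees of freedom.
• Critical values are obtained from the chi-square table for the given level of significance and
(n − 1) degrees of freedom.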
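
A minimal sketch of the variance test, assuming the usual statistic χ² = (n − 1)s²/σ0², where σ0² is the variance stated in H0. The ten observations, the hypothesized variance 0.04, and the function name are made up for illustration. The critical value 16.919 is the standard chi-square table entry for 9 degrees of freedom with an upper-tail area of 0.05, so the example is a right-tailed test (H1: σ² > σ0²).

```python
import statistics

def variance_chi_square(data, sigma0_sq):
    """Return (chi-square statistic, degrees of freedom) for H0: sigma^2 = sigma0_sq."""
    n = len(data)
    s_sq = statistics.variance(data)  # sample variance s^2 (divisor n - 1)
    chi_sq = (n - 1) * s_sq / sigma0_sq
    return chi_sq, n - 1

# Hypothetical small sample, assumed to come from a normal population.
data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
chi_sq, df = variance_chi_square(data, sigma0_sq=0.04)

CHI2_CRITICAL = 16.919  # chi-square table value: df = 9, upper-tail area 0.05
reject_h0 = chi_sq > CHI2_CRITICAL  # right-tailed test, H1: sigma^2 > 0.04

print(f"chi-square = {chi_sq:.3f}, df = {df}, reject H0: {reject_h0}")
```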

