PHPS30020 Week1 (2) - 29nov2023 (Combined Sampling Variation - SEM SEP CIs)

Public Health Medicine, 2023-24

Module Code: PHPS30020

Module Title: Biostatistics Section

Coordinator: Dr. Carla Perrotta

Week & Date: Week 1, 29 November 2023

Topics: Populations & Samples; Sampling Variation; Sampling Distributions of
the Mean and %; Confidence Intervals

Delivered by: Assoc. Prof. Mary Codd
Sampling & Sampling Distributions

This class addresses important Statistical Phenomena of:

• Populations & Samples
• Sampling Variation
• Sampling Distributions of Means or Proportions
• Point estimation and ‘confidence’ around point estimates
• Confidence Intervals and their interpretation

as an essential basis for understanding much of what

statistics is about and why you need to know about the
Populations & Samples

• Populations, Sampling
• Sampling Error
Why do we Sample?
• In statistics, a population is the entire group in which we are
usually interested
• In reality, it is rarely possible (feasible) to study an entire
population; doing so may also be excessively costly or labour-intensive
• We usually collect data on a sample of the population of interest
• The objective is that the sample should be representative of
the population of interest, so that we can infer from the
sample to the population / draw conclusions about the population
Issues with Sampling
• Thus, we make inferences about the population based on
findings in a sample. In truth, this happens all the time, without
enough attention to whether it is correct or incorrect
• We must recognise that, regardless of the method of sampling,
information from the sample may not fully represent the
population of interest. We have introduced Sampling Error
by studying only some members of the population
• Furthermore, it is well known that repeated samples from
the same population will likely give different estimates of the
population parameters* of interest (*mean, sd, %). This is
Sampling Variation
Student Group: Height in subgroups
Group    n    Mean     sd
1        9    164.2    7.3
2        6    166.5    9.6
3        6    163.7    7.0
4        7    171.0    6.3
5        9    168.0    7.7
6       10    163.6    9.8
7        8    163.6    9.2
8        7    168.3   10.7
ALL     63    165.8    8.0

Sampling Variation
• … refers to the phenomenon that, in repeated samples of
approx. similar size from a population, it is unlikely that the
estimates of the population parameters will be the same in
each sample, or the same as the true population values

• However, they will be ‘close’ to the true population values

• It is possible to quantify the variation in sample statistics

to ascertain how good they are at estimating population
parameters (precision of the sample estimate), i.e:
• … how good an estimate of the true population mean (𝝁)
and std dev (𝝈) is a single sample mean (x̅ ) and std dev (s)?
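Sampling variation can be seen in a small simulation. The sketch below is illustrative only: it assumes a hypothetical normally distributed population whose mean and sd are set to the whole-group values above (165.8 and 8.0), which is an assumption and not something the class data establish. The function name `draw_sample_stats` is made up for this example.

```python
import random
import statistics

# Assumption: a hypothetical normal population of heights whose mean and sd
# equal the whole-group values; the true population values are unknown.
POP_MEAN = 165.8
POP_SD = 8.0


def draw_sample_stats(n, rng):
    """Draw n heights from the assumed population; return (mean, sd)."""
    heights = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    return statistics.mean(heights), statistics.stdev(heights)


rng = random.Random(2023)  # fixed seed so the run is repeatable
sample_stats = [draw_sample_stats(8, rng) for _ in range(5)]

# Each sample gives a different mean and sd, none exactly 165.8 or 8.0
for i, (mean, sd) in enumerate(sample_stats, start=1):
    print(f"Sample {i}: mean = {mean:.1f}, sd = {sd:.1f}")
```

Each run of five samples plays the role of five student subgroups: the estimates differ from one another and from the population values, but stay reasonably close to them.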
Sampling Distribution of the Mean

• One can consider a whole series of sample means
distributed around a population mean

• They can even be ‘graphed’ to provide a picture of the
distribution of sample means
• much like the distribution of values of a single variable
around a mean
• The difference is that it is a theoretical distribution of
sample means around a population mean

• That distribution will itself have a mean and a
standard deviation (sd) ……
Sampling Distribution of the Mean

That distribution will itself have:

• a mean, a.k.a. the ‘mean of sample means’, which is equal to
the Population Mean (𝝁) and is the centre of the
Sampling Distribution of the Mean;

• a standard deviation, a.k.a. the ‘standard deviation of
the sample means’, which is the sampling-distribution
counterpart of the Population Standard Deviation
(it equals 𝝈/√n, not 𝝈) and is known as the
Standard Error of the Mean (SEM)
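The two properties above can be checked by simulation. This is a minimal sketch under stated assumptions: the population is taken to be normal with mean 165.8 and sd 8.0 (hypothetical values borrowed from the student group), and the sample size of 9 and the number of repeated samples are arbitrary choices for illustration.

```python
import math
import random
import statistics

POP_MEAN, POP_SD = 165.8, 8.0  # assumed (hypothetical) population values
N = 9                          # size of each sample
N_SAMPLES = 20_000             # number of repeated samples

rng = random.Random(1)  # fixed seed so the run is repeatable

# Build the (simulated) sampling distribution of the mean
sample_means = [
    statistics.mean([rng.gauss(POP_MEAN, POP_SD) for _ in range(N)])
    for _ in range(N_SAMPLES)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_sem = POP_SD / math.sqrt(N)

print(f"Mean of sample means: {mean_of_means:.2f} (population mean {POP_MEAN})")
print(f"SD of sample means:   {sd_of_means:.2f} (sigma / sqrt(n) = {theoretical_sem:.2f})")
```

The mean of the sample means lands close to the population mean, and their standard deviation lands close to 𝝈/√n, which is much smaller than 𝝈 itself.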
Sampling Distribution of the Mean

• The Standard deviation of the Sampling Distribution of
the Mean is the Standard Error of the Mean (SEM)

• We use the SEM to comment on the Population Mean,
not to describe the dispersion of values around a mean

• In reality, we usually only take one sample from the
population. However, we still make use of our knowledge
of the theoretical distribution of sample estimates to draw
inferences about the population parameter

• From this one sample we can calculate the SEM, where sd is
the sample standard deviation and n is the sample size:

SEM = sd / √n
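As a quick worked example of the formula, the sketch below computes the SEM from summary statistics. The function name `standard_error_of_mean` is hypothetical; the inputs are the Group 1 and whole-group values from the height table.

```python
import math


def standard_error_of_mean(sd, n):
    """SEM = sd / sqrt(n), from a sample's sd and its size n."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sd / math.sqrt(n)


sem_group1 = standard_error_of_mean(7.3, 9)   # Group 1: sd 7.3, n 9
sem_all = standard_error_of_mean(8.0, 63)     # ALL: sd 8.0, n 63

print(round(sem_group1, 2))  # 2.43
print(round(sem_all, 2))     # 1.01
```

The whole group has a similar sd to Group 1 but a much smaller SEM, because the larger n sits in the denominator.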
The SEM & Confidence Intervals
• The SEM is used to calculate Confidence Intervals.
These give some indication of how good the Sample
Mean is as an estimate of the Population Mean

• In the example given, we are estimating the
population mean (𝝁) from the sample mean (x̅), using:
• the sample mean (x̅),

• the SEM (sd/√n); and

• (depending on sample size) knowledge of the z-scores
above and below the mean, within which 95% of
observations in a normal distribution fall
Calculating the CI for Mean

• Confidence Interval: Mean ± z*(SEM)

For the purposes of this exercise use:

• 95% Confidence Interval: Mean ± 1.96*SEM

• 99% Confidence Interval: Mean ± 2.58*SEM

• 90% Confidence Interval: Mean ± 1.65*SEM
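The calculation can be sketched in a few lines. The multipliers are the ones given for this exercise (1.65, 1.96, 2.58); the function name `mean_confidence_interval` is hypothetical, and the inputs are the whole-group summary values (mean 165.8, sd 8.0, n 63).

```python
import math

# z multipliers as given for this exercise
Z_VALUES = {90: 1.65, 95: 1.96, 99: 2.58}


def mean_confidence_interval(mean, sd, n, level=95):
    """Return (lower, upper) limits: mean -/+ z * SEM, with SEM = sd / sqrt(n)."""
    sem = sd / math.sqrt(n)
    margin = Z_VALUES[level] * sem
    return mean - margin, mean + margin


for level in (90, 95, 99):
    lower, upper = mean_confidence_interval(165.8, 8.0, 63, level)
    print(f"{level}% CI: {lower:.1f} to {upper:.1f}")
# 90% CI: 164.1 to 167.5
# 95% CI: 163.8 to 167.8
# 99% CI: 163.2 to 168.4
```

Asking for more confidence widens the interval: the 99% limits sit outside the 95% limits, which sit outside the 90% limits.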

Results of Height by Group
Group    n    Mean     sd     SEM    Lower 95% limit     Upper 95% limit
                                     (Mean - 1.96*SEM)   (Mean + 1.96*SEM)
1        9    164.2    7.3    2.43   159.4               168.9
2        6    166.5    9.6    3.9    158.9               174.1
3        6    163.7    7.0    2.86   158.1               169.3
4        7    171.0    6.3    2.4    166.3               175.7
5        9    168.0    7.7    2.57   163.0               173.0
6       10    163.6    9.8    3.1    157.5               169.7
7        8    163.6    9.2    3.3    157.1               170.1
8        7    168.3   10.7    4.04   160.4               176.2
ALL     63    165.8    8.0

Sampling Distribution of %
• In the same way that we can establish our ‘confidence’
in a sample mean as an estimate of a population mean
using Confidence Intervals, we can also do this for a
proportion from a sample to estimate a population proportion.

• We first examine the Sampling Distribution of a
Proportion (as a representation of a series of sample
proportions around a population proportion)
Playing Cards

Take a sample of size n = 8
– Your interest is in the No. (%) of diamonds
– You expect 2 (25%) diamonds
– What did you get?
Take another sample of size n = 8
– What might you get?
– 1/8 (12.5%); 2/8 (25%); 3/8 (37.5%); 4/8 (50%); 5/8 (62.5%); etc.
Repeat for several samples and look at the distribution of the
proportion of diamonds you get
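The card exercise can be repeated many times in code. The sketch below assumes a standard 52-card deck (13 diamonds) and that each sample of 8 is dealt without replacement from a full deck; both are assumptions about how the exercise is run, and `diamond_proportion` is a made-up helper name.

```python
import random
from collections import Counter

# Assumption: a standard deck with 13 cards in each of 4 suits
DECK = [suit for suit in ("diamonds", "hearts", "clubs", "spades")
        for _ in range(13)]


def diamond_proportion(n, rng):
    """Deal n cards without replacement; return the proportion of diamonds."""
    hand = rng.sample(DECK, n)
    return hand.count("diamonds") / n


rng = random.Random(8)  # fixed seed so the run is repeatable
proportions = [diamond_proportion(8, rng) for _ in range(10_000)]

# Tabulate how often each sample proportion occurred
counts = Counter(proportions)
for p in sorted(counts):
    print(f"{p:.3f}: {counts[p] / len(proportions):.3f}")

mean_proportion = sum(proportions) / len(proportions)
print(f"Mean of sample proportions: {mean_proportion:.3f}")
```

The tabulated values are a simulated sampling distribution of the proportion: individual samples vary, but they cluster around the expected 25%.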
Sampling Distribution of a Proportion

• It is the distribution of all possible values of the proportion
that could be obtained in repeated samples of the same
size from the population of interest

• The sampling distribution of a proportion has

– A shape
– A mean
– A standard deviation
Study of smoking in PVD
– To estimate the proportion of smokers in patients
with peripheral vascular disease - PVD (the
Materials and Methods
– Sample of n=200 patients
– 130 of the 200 patients in the sample are smokers = 65%
– The proportion of smokers in the sample is
p̂ = 0.65
Sampling Variation

Many different samples of size 200 possible

Many different proportions of smokers (of which we only got
one in the sample we happened to take)

– Sample(1) 130/200 = 0.65

– Sample(2) 120/200 = 0.60
– Sample(3) 144/200 = 0.72

Sampling Distribution of a Proportion

The different sample proportions form a normal distribution
• As long as the sample size is big enough
The sample proportions are scattered around the unknown
population proportion: p
• The mean of the sampling distribution of a proportion is: p
The standard deviation of the sampling distribution of the
proportion is given by:

√( p(1 − p) / n )

This is the Standard Error of the Proportion (SEP)
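As a worked example, the SEP is the square root of p(1 − p)/n. The sketch below applies it to the PVD smoking sample (130 smokers out of 200), using the sample proportion in place of the unknown population proportion; the function name `standard_error_of_proportion` is hypothetical.

```python
import math


def standard_error_of_proportion(p, n):
    """SEP = sqrt(p * (1 - p) / n) for a proportion p and sample size n."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return math.sqrt(p * (1 - p) / n)


sep = standard_error_of_proportion(0.65, 200)
print(round(sep, 3))  # 0.034
```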

Confidence Interval for a Proportion

Since the population proportion, p, is unknown, the sample
proportion, p̂, and the sample Standard Error of the Proportion
are used to estimate the 95% Confidence Interval for the
population proportion

95% CI = p̂ ± 1.96 √( p̂(1 − p̂) / n )

Compute 95% CI for proportion

95% chance that the unknown population proportion p is within

p̂ ± 1.96 √( p(1 − p) / n ), approximated by p̂ ± 1.96 √( p̂(1 − p̂) / n )

0.65 ± 1.96 √( 0.65(1 − 0.65) / 200 ) = 0.65 ± 1.96 (0.034)

= 0.65 ± 0.067, which is between 0.583 and 0.717
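The same calculation can be sketched in code, starting from the counts (130 smokers out of 200). The function name `proportion_confidence_interval` is hypothetical. The code does not round the standard error before multiplying, so its limits differ in the third decimal place from the hand calculation, which rounds the standard error to 0.034 first.

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Return (lower, upper): sample proportion -/+ z * SEP."""
    p_hat = successes / n                        # sample proportion
    sep = math.sqrt(p_hat * (1 - p_hat) / n)     # standard error of proportion
    margin = z * sep
    return p_hat - margin, p_hat + margin


lower, upper = proportion_confidence_interval(130, 200)
print(f"{lower:.3f} to {upper:.3f}")  # 0.584 to 0.716
```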

Confidence Interval
The level of confidence is 95%

The confidence interval for the proportion of smokers ranges
from 0.583 to 0.717

The confidence limits for the proportion of smokers are
0.583 and 0.717

Precision of CI depends on:

Sample Size (more precision with larger sample sizes), and
The parameter (mean, proportion, excess risk, relative risk)
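The effect of sample size on precision can be illustrated with the smoking example. This sketch holds the sample proportion at 0.65 and varies n; the sample sizes other than 200 are hypothetical, chosen only to show the pattern, and `ci_half_width` is a made-up helper name.

```python
import math


def ci_half_width(p_hat, n, z=1.96):
    """Half-width of the CI for a proportion: z * sqrt(p_hat * (1 - p_hat) / n)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical sample sizes around the n = 200 used in the PVD study
for n in (50, 200, 800, 3200):
    print(f"n = {n:>4}: 0.65 ± {ci_half_width(0.65, n):.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the width of the interval.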

Uses of Confidence Intervals
Constructed for Effect Measures /Estimates
A way to represent how "good" an estimate is at:
• Representing a population parameter (mean, %)
• Establishing if an effect measure is important (e.g. clinically
significant) or not – does the CI include or exclude the value of ‘no
effect’?
• Establishing if Relative Risks or Odds Ratios derived from
cohort/case-control studies represent risks or benefits associated
with exposures
• Establishing if Risk Differences or Attributable Risks are indeed
greater than 0

A reminder of the limitations of Effect Measures /Estimates
