PHPS30020 Week1 (2) - 29nov2023 (Combined Sampling Variation - SEM SEP CIs)

Public Health Medicine, 2023-24
Module Code: PHPS30020
Module Title: Biostatistics Section
Coordinator: Dr. Carla Perrotta

Week & Date: Week 1, 29 November 2023
Topics: Populations & Samples; Sampling Variation; Sampling Distributions of the Mean and %; Confidence Intervals
Delivered by: Assoc. Prof. Mary Codd
Sampling & Sampling Distributions
This class addresses the important Statistical Phenomena of:

• Populations & Samples
• Sampling Variation
• Sampling Distributions of Means or Proportions
• Point estimation and ‘confidence’ around point estimates
• Confidence Intervals and their interpretation

as an essential basis for understanding much of what statistics is about, and why you need to know about them.
Populations & Samples
• Populations, Sampling
• Representativeness
• Sampling Error
Why do we Sample?
• In statistics, a population represents the entire group in whom we are usually interested
• In reality, it is rarely possible (feasible) to study an entire population. It may also be excessively costly or labour-intensive
• We usually collect data on a sample of the population of interest
• The objective is that the sample should be representative of the population of interest, so that we can infer from the sample to the population, i.e. draw conclusions about the population
Issues with Sampling
• Thus, we make inferences about the population based on findings in a sample. In truth, this happens all the time, often without enough attention to whether the inference is correct or incorrect
• We must recognise that, regardless of the method of sampling, information from the sample may not fully represent the population of interest. By studying only some members of the population, we have introduced Sampling Error
• Furthermore, it is well known that repeated samples from the same population will likely give different estimates of the population parameters* of interest (*mean, sd, %). This is Sampling Variation
Student Group: Height (cm) in subgroups
Group   n    Mean    sd
1       9    164.2   7.3
2       6    166.5   9.6
3       6    163.7   7.0
4       7    171.0   6.3
5       9    168.0   7.7
6       10   163.6   9.8
7       8    163.6   9.2
8       7    168.3   10.7
ALL     63   165.8   8.0

Sampling Variation
• Sampling variation refers to the phenomenon that, in repeated samples of approximately similar size from a population, the estimates of the population parameters are unlikely to be the same in each sample, or the same as the true population values
• However, they will tend to be ‘close’ to the true population values
• It is possible to quantify the variation in sample statistics to ascertain how good they are at estimating population parameters (the precision of the sample estimate), i.e.:
• … how good an estimate of the true population mean (μ) and standard deviation (σ) are a single sample mean (x̄) and standard deviation (s)?
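Sampling variation can be seen directly by simulation. The sketch below assumes a hypothetical, computer-generated population of 1,000 heights (not the class data), draws five samples of size 8 from it, and prints each sample mean next to the population mean: the means differ from one another and from μ, but stay close to it. The function name `sample_means`, the seeds and all numbers are illustrative choices.

```python
import random
import statistics

def sample_means(population, n, repeats, seed=1):
    """Draw `repeats` random samples of size n (without replacement) from
    `population` and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

# Hypothetical population of 1,000 heights in cm (NOT the class data),
# simulated as roughly normal with mean 166 and sd 8, for illustration only.
pop_rng = random.Random(42)
population = [pop_rng.gauss(166, 8) for _ in range(1000)]
mu = statistics.mean(population)  # the 'true' population mean

means = sample_means(population, n=8, repeats=5)
for i, m in enumerate(means, start=1):
    print(f"sample {i}: mean = {m:.1f} cm (population mean = {mu:.1f} cm)")
```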
Sampling Distribution of the Mean
• One can consider a whole series of sample means distributed around a population mean
• They can even be ‘graphed’ to provide a picture of the distribution of sample means, much like the distribution of values of a single variable around a mean
• The difference is that it is a theoretical distribution of sample means around a population mean
• That distribution will itself have a mean and a standard deviation (sd) …
That distribution will itself have:
• a mean, a.k.a. the ‘mean of the sample means’, which equals the Population Mean (μ); and
• a standard deviation, a.k.a. the ‘standard deviation of the sample means’, which is known as the Standard Error of the Mean (SEM). The SEM is not the Population Standard Deviation (σ): it equals σ/√n, so it is smaller than σ and shrinks as the sample size grows
• The standard deviation of the Sampling Distribution of the Mean is the Standard Error of the Mean (SEM)
• We use the SEM to comment on the Population Mean, not to describe the dispersion of values around a mean
• In reality, we usually take only one sample from the population. However, we still make use of our knowledge of the theoretical distribution of sample estimates to draw inferences about the population parameter
• From this one sample we can estimate the SEM as:

SEM = sd / √n
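A simulation can check the two properties of the sampling distribution of the mean described above. This sketch assumes a hypothetical normal population (μ = 166, σ = 8, chosen only for illustration), takes many samples of n = 9, and compares the standard deviation of the sample means with σ/√n. The helper `sem` (a name introduced here) applies SEM = sd/√n to a single sample, using Group 1 of the height table (sd 7.3, n 9).

```python
import math
import random
import statistics

def sem(sd, n):
    """Standard Error of the Mean estimated from one sample: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# Hypothetical population parameters and simulation settings (illustrative only)
MU, SIGMA, N, REPEATS = 166.0, 8.0, 9, 20_000

rng = random.Random(7)
# Repeatedly draw n observations from the population and record each sample mean
means = [statistics.mean([rng.gauss(MU, SIGMA) for _ in range(N)])
         for _ in range(REPEATS)]

print(f"mean of sample means: {statistics.mean(means):.2f} (mu = {MU})")
print(f"sd of sample means:   {statistics.stdev(means):.3f}")
print(f"sigma / sqrt(n):      {SIGMA / math.sqrt(N):.3f}")
print(f"SEM for Group 1 (sd 7.3, n 9): {sem(7.3, 9):.2f}")
```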
The SEM & Confidence Intervals
• The SEM is used to calculate Confidence Intervals. These indicate how good the Sample Mean is as an estimate of the Population Mean
• In the example given, we are estimating the population mean (μ) from the sample mean (x̄) by using:
• the sample mean (x̄);
• the SEM (sd/√n); and
• (depending on sample size) knowledge of the scores above and below the mean within which 95% of observations in a normal distribution fall (±1.96 standard deviations)
Calculating the CI for Mean
• Confidence Interval: Mean ± z*SEM

For the purposes of this exercise use:

• 90% Confidence Interval: Mean ± 1.65*SEM
• 95% Confidence Interval: Mean ± 1.96*SEM
• 99% Confidence Interval: Mean ± 2.58*SEM
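As a minimal sketch of the formula above (the function `ci_mean` and the dictionary `Z` are names introduced here), the code computes 90%, 95% and 99% intervals for Group 1 of the height table, using the multipliers 1.65, 1.96 and 2.58. A higher level of confidence gives a wider interval for the same data.

```python
import math

# z multipliers used on the slide; more precise values are 1.645 and 2.576
Z = {90: 1.65, 95: 1.96, 99: 2.58}

def ci_mean(mean, sd, n, level=95):
    """Return (lower, upper) = mean -/+ z * SEM, where SEM = sd / sqrt(n)."""
    half_width = Z[level] * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Group 1 from the height table: n = 9, mean = 164.2 cm, sd = 7.3 cm
for level in (90, 95, 99):
    lo, hi = ci_mean(164.2, 7.3, 9, level)
    print(f"{level}% CI: {lo:.1f} to {hi:.1f} cm (width {hi - lo:.1f} cm)")
```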

Results of Height by Group (heights in cm)
Group   n    Mean    sd     SEM    Lower 95% limit     Upper 95% limit
                                   (Mean - 1.96*SEM)   (Mean + 1.96*SEM)
1       9    164.2   7.3    2.43   159.4               169.0
2       6    166.5   9.6    3.9    158.9               174.1
3       6    163.7   7.0    2.86   158.1               169.3
4       7    171.0   6.3    2.4    166.3               175.7
5       9    168.0   7.7    2.57   163.0               173.0
6       10   163.6   9.8    3.1    157.5               169.7
7       8    163.6   9.2    3.3    157.1               170.1
8       7    168.3   10.7   4.04   160.4               176.2
ALL     63   165.8   8.0
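The table can be reproduced from n, mean and sd alone. In this sketch (the names `GROUPS` and `ci95` are illustrative), the SEM is kept unrounded before multiplying by 1.96, so some limits may differ in the last digit from limits computed from a rounded SEM; for example, Groups 2 and 7 come out as 158.8 to 174.2 and 157.2 to 170.0 this way.

```python
import math

# (group, n, mean, sd) as listed in the height table
GROUPS = [
    ("1", 9, 164.2, 7.3), ("2", 6, 166.5, 9.6), ("3", 6, 163.7, 7.0),
    ("4", 7, 171.0, 6.3), ("5", 9, 168.0, 7.7), ("6", 10, 163.6, 9.8),
    ("7", 8, 163.6, 9.2), ("8", 7, 168.3, 10.7),
]

def ci95(n, mean, sd):
    """Return (SEM, lower, upper) for a 95% CI: mean -/+ 1.96 * sd / sqrt(n)."""
    sem = sd / math.sqrt(n)
    return sem, mean - 1.96 * sem, mean + 1.96 * sem

rows = {}
for group, n, mean, sd in GROUPS:
    sem, lo, hi = ci95(n, mean, sd)
    rows[group] = (round(sem, 2), round(lo, 1), round(hi, 1))
    print(f"Group {group}: SEM {sem:.2f}  95% CI {lo:.1f} to {hi:.1f}")
```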

Sampling Distribution of %
• In the same way that we can establish our ‘confidence’ in a sample mean as an estimate of a population mean using Confidence Intervals, we can also do this for a proportion from a sample, as an estimate of a population proportion
• We first examine the Sampling Distribution of a Proportion (a representation of a series of sample proportions around a population proportion)
Playing Cards
Take a sample of size n = 8
– Your interest is in the number (%) of diamonds
– You expect 2 (25%) diamonds, since 13 of the 52 cards are diamonds
– What did you get?
Take another sample of size n = 8
– What might you get?
– 1/8 (12.5%); 2/8 (25%); 3/8 (37.5%); 4/8 (50%); 5/8 (62.5%); etc.
Repeat for several samples and look at the distribution of the proportion of diamonds you get
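The card exercise can be simulated. This sketch assumes a standard 52-card deck (13 diamonds), deals 1,000 independent hands of 8 cards (each from a full deck), and prints a rough text histogram of the proportion of diamonds per hand; the average proportion should be close to the expected 25%. The function name `diamond_proportions` and the seed are arbitrary.

```python
import random
from collections import Counter

# A standard 52-card deck: 13 ranks in each of 4 suits
DECK = [(rank, suit) for suit in ("diamonds", "hearts", "clubs", "spades")
        for rank in range(1, 14)]

def diamond_proportions(n=8, repeats=1000, seed=2023):
    """Deal `repeats` hands of n cards, each from a full deck, and return
    the proportion of diamonds in each hand."""
    rng = random.Random(seed)
    props = []
    for _ in range(repeats):
        hand = rng.sample(DECK, n)  # n cards without replacement
        props.append(sum(1 for _, suit in hand if suit == "diamonds") / n)
    return props

props = diamond_proportions()
# Tabulate how often each proportion (0/8, 1/8, ..., 8/8) occurred
for p, count in sorted(Counter(props).items()):
    print(f"{p:6.1%}: {'#' * (count // 10)}")
print(f"average proportion of diamonds: {sum(props) / len(props):.3f}")
```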
Sampling Distribution of a Proportion
• The sampling distribution of a proportion is the distribution of all possible values of the proportion that could be obtained in repeated samples of the same size from the population of interest
• The sampling distribution of a proportion has:
– a shape
– a mean
– a standard deviation
Study of smoking in PVD
Aim
– To estimate the proportion of smokers in the population of patients with peripheral vascular disease (PVD)
Materials and Methods
– Sample of n = 200 patients
Results
– 130 of the 200 patients in the sample are smokers (65%)
– The proportion of smokers in the sample is p̂ = 0.65
Sampling Variation
Many different samples of size 200 are possible
Many different proportions of smokers are therefore possible (of which we only got one, in the sample we happened to take)
– Sample (1): 130/200 = 0.65
– Sample (2): 120/200 = 0.60
– Sample (3): 144/200 = 0.72
The different sample proportions form an approximately normal distribution
• as long as the sample size is big enough (a common rule of thumb is that np and n(1 − p) are both at least 5)
• The sample proportions are scattered around the unknown population proportion, p
• The mean of the sampling distribution of a proportion is p
• The standard deviation of the sampling distribution of the proportion is given by:

$\mathrm{SEP} = \sqrt{\dfrac{p(1-p)}{n}}$

This is the Standard Error of the Proportion (SEP)
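These properties can be checked by simulation. Because the true population proportion is unknown, this sketch simply assumes p = 0.65 (borrowed from the PVD example) and an effectively infinite population, draws 5,000 samples of n = 200, and compares the mean and standard deviation of the sample proportions with p and with √(p(1 − p)/n). The function name `sample_proportions` and the seed are illustrative.

```python
import math
import random
import statistics

def sample_proportions(p, n, repeats, seed=11):
    """Simulate `repeats` samples of size n from a very large population in
    which a fraction p has the characteristic; return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(repeats)]

P, N = 0.65, 200  # assumed 'true' p, borrowed from the PVD example for illustration
props = sample_proportions(P, N, repeats=5000)

print(f"first three sample proportions: {props[:3]}")
print(f"mean of sample proportions: {statistics.mean(props):.3f} (p = {P})")
print(f"sd of sample proportions:   {statistics.stdev(props):.4f}")
print(f"SEP = sqrt(p(1-p)/n):       {math.sqrt(P * (1 - P) / N):.4f}")
```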

Confidence Interval for a Proportion
Since the population proportion, p, is unknown, the sample proportion, p̂, and the Standard Error of the Proportion estimated from the sample are used to estimate the 95% Confidence Interval for the population proportion:

$95\%\ \mathrm{CI} = \hat{p} \pm 1.96\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
Compute 95% CI for proportion
We can be 95% confident that the unknown population proportion p lies within:

$\hat{p} \pm 1.96\sqrt{\dfrac{p(1-p)}{n}}$, approximated by $\hat{p} \pm 1.96\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

$0.65 \pm 1.96\sqrt{\dfrac{0.65(1-0.65)}{200}} = 0.65 \pm 1.96\,(0.034)$

$= 0.65 \pm 0.067$, which is between 0.583 and 0.717
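The same calculation in code, as a sketch of the normal-approximation (often called Wald) interval; `ci_proportion` is a name introduced here. Because it keeps the SEP unrounded (about 0.0337), it gives limits of about 0.584 and 0.716; the 0.583 and 0.717 above come from rounding the SEP to 0.034 before multiplying by 1.96.

```python
import math

def ci_proportion(successes, n, z=1.96):
    """Normal-approximation CI: p_hat -/+ z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    sep = math.sqrt(p_hat * (1 - p_hat) / n)  # Standard Error of the Proportion
    return p_hat, sep, p_hat - z * sep, p_hat + z * sep

# PVD example: 130 smokers in a sample of 200 patients
p_hat, sep, lo, hi = ci_proportion(130, 200)
print(f"p_hat = {p_hat:.2f}, SEP = {sep:.4f}, 95% CI {lo:.3f} to {hi:.3f}")
```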
Confidence Interval
The level of confidence is 95%
The confidence interval for the proportion of smokers ranges from 0.583 to 0.717
The confidence limits for the proportion of smokers are 0.583 and 0.717
Precision of the CI depends on:
– the sample size (more precision with larger sample sizes); and
– the parameter being estimated (mean, proportion, excess risk, relative risk)
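The effect of sample size on precision can be seen by keeping the sample proportion at 0.65 and varying n (the values 50, 200 and 800 are hypothetical; `half_width` is a name introduced here). Because n sits under a square root, quadrupling the sample size halves the half-width of the interval.

```python
import math

def half_width(p_hat, n, z=1.96):
    """Half-width (margin of error) of a normal-approximation CI for a proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Same sample proportion as the PVD example; the sample sizes are hypothetical
for n in (50, 200, 800):
    print(f"n = {n:4d}: 0.65 +/- {half_width(0.65, n):.3f}")
```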
Uses of Confidence Intervals
Constructed for Effect Measures/Estimates
A way to represent how "good" an estimate is at:
• Representing a population parameter (mean, %)
• Establishing whether an effect measure is important (e.g. clinically significant) or not: does the CI include or exclude the value of ‘no effect’?
• Establishing whether Relative Risks or Odds Ratios derived from cohort/case-control studies represent risks or benefits associated with exposures (does the CI lie entirely above or entirely below 1, the value of no effect?)
• Establishing whether Risk Differences or Attributable Risks are indeed greater than 0 (does the CI exclude 0?)
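The ‘value of no effect’ check can be written as a small rule: the no-effect value is 1 for Relative Risks and Odds Ratios, and 0 for Risk Differences and Attributable Risks. The function `ci_verdict` and the intervals below are hypothetical, for illustration only. A relative risk whose whole interval lies above 1 suggests an exposure associated with increased risk; one wholly below 1 suggests a protective association.

```python
def ci_verdict(lower, upper, null_value):
    """Report whether a CI excludes the 'no effect' value
    (1 for RR/OR, 0 for risk differences/attributable risks)."""
    if lower > null_value:
        return "entire CI above the no-effect value"
    if upper < null_value:
        return "entire CI below the no-effect value"
    return "CI includes the no-effect value"

# Hypothetical intervals, for illustration only
print(ci_verdict(1.2, 2.5, null_value=1))     # a relative risk -> entire CI above the no-effect value
print(ci_verdict(0.6, 1.4, null_value=1))     # an odds ratio -> CI includes the no-effect value
print(ci_verdict(-0.02, 0.05, null_value=0))  # a risk difference -> CI includes the no-effect value
```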
A reminder of the limitations of Effect Measures/Estimates
