SAMPLING AND ESTIMATION
Research Methods: Lecture 6

Sarah Griffiths
sarah.griffiths@ucl.ac.uk
OVERVIEW
• Populations vs samples
• Different types of sampling
• Random vs convenience
• Sampling distributions
• Central limit theorem
• Standard error
• Confidence intervals
POPULATIONS AND SAMPLES
• When we conduct a study, we are not usually interested in the particular participants in
our study (our sample); we are interested in people in general (the population).
• Say our study was about the verbal skills of 5-6 year old children:
Population: all 5-6 year old children
Sample: the 5-6 year old children that happen to be in our study
RANDOM SAMPLING
• Random sampling: selecting members of the population at random.
e.g. select randomly from all children starting school in the county of Surrey in a
particular year
But not everybody will consent to taking part, and reasons for not taking part are
often non-random, e.g. level of education, socio-economic status.
Random sampling is the best way to ensure the sample approximates the
population, but it is almost impossible to get a truly random sample in psychology
studies.
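As a minimal sketch of simple random sampling in R, assume a hypothetical sampling frame of 10,000 child IDs. The frame size, the sample size of 100 and the variable names are illustrative assumptions, not values from the lecture. The sketch only covers the selection step; it does not model the non-random refusals described above.

```r
set.seed(1)                 # fix the random seed so the draw is reproducible

# Hypothetical sampling frame: one ID per child starting school (size assumed)
child_ids <- 1:10000

# Simple random sample: every child has the same chance of being selected,
# and no child is selected twice (sample() draws without replacement by default)
invited <- sample(child_ids, size = 100)

length(invited)             # number of children invited
anyDuplicated(invited)      # 0 means no child appears twice
```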
OTHER TYPES OF SAMPLING
• Stratified sampling: dividing the population into subgroups (strata) and sampling from each, which allows oversampling of particular subgroups of interest
e.g. children with language difficulties
• Snowball sampling: recruiting through participants' social networks to reach hidden, hard-to-recruit populations
e.g. autistic mothers
• Convenience sampling: recruiting sub-populations that are easy to access
e.g. undergraduate psychology students
This is often what is used in psychology studies!
BIASED SAMPLES
• Does this matter?

• Yes, if the non-randomness is associated with the outcome
e.g. sampling only children from high socio-economic areas in a study about
language development, or sampling autistic mothers only from particular
support networks in a study about attitudes towards autism
• No, if it is unlikely to be associated with the outcome
e.g. sampling from only one part of the country when looking at short-term
memory (no theoretical reason to think STM differs by geographical region)
SAMPLE STATISTICS VS POPULATION PARAMETERS
• How tall is a newly discovered species of alien?
• Population = 16,870
Average height of all 16,870 aliens is the population mean
Average height of a sample of 5 aliens is a sample mean
Heights of the 5 sampled aliens: 5.16 cm, 3.50 cm, 3.69 cm, 4.02 cm, 2.80 cm
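The sample mean can be computed directly in R. This sketch assumes that the five heights listed on the slide are the sample of 5 aliens; their mean matches the 3.83 cm sample mean quoted later in the lecture. The variable names are our own.

```r
# Heights (cm) of the five sampled aliens, as listed on the slide
heights <- c(5.16, 3.50, 3.69, 4.02, 2.80)

sample_mean <- mean(heights)   # sample statistic: mean of the 5 observed heights
sample_sd   <- sd(heights)     # sample standard deviation (n - 1 denominator)

round(sample_mean, 2)          # 3.83
```

The population mean, by contrast, would need the heights of all 16,870 aliens, which we do not have.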
SAMPLING ERROR
• How likely is it that the population mean is exactly the same as the sample
mean?
• Not very likely!
• The difference between the population parameter and the sample statistic is
called sampling error
sample mean = 3.83 cm, population mean = ???
SAMPLING ERROR
• How can I reduce sampling error?
Increase sample size!
• How close is my sample mean of 5 aliens likely to be to the population mean?
To answer this question we need to understand the “Sampling Distribution of the Mean”
POPULATION OF SAMPLE MEANS
• There are many possible samples of 5 aliens that I could have selected.
• All of these samples will have slightly different means.
Example sample means: 3.64 cm, 4.06 cm, 3.83 cm, 3.98 cm
SAMPLING DISTRIBUTION
• The distribution of all possible sample means is called the ‘sampling distribution of the mean’
• The sampling distribution of the mean is approximately normal
SAMPLING DISTRIBUTION OF THE MEAN
The sampling distribution of the mean follows two rules in relation to the population distribution (plotted for samples of size N=5):
1) The mean of sample means is equal to the population mean
2) The standard deviation of sample means is equal to the standard deviation of the population, divided by the square root of the sample size
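Both rules can be checked by simulation. The sketch below assumes a hypothetical population of 16,870 alien heights drawn from a normal distribution with mean 4 cm and SD 0.8 cm. These values are illustrative only, since the lecture does not give the real population.

```r
set.seed(42)

# Hypothetical population of 16,870 heights (cm); shape, mean and SD are assumed
population <- rnorm(16870, mean = 4, sd = 0.8)
pop_mean <- mean(population)
pop_sd   <- sqrt(mean((population - pop_mean)^2))   # population SD (divide by N)

n <- 5
# Draw 10,000 samples of size n and record the mean of each one
sample_means <- replicate(10000, mean(sample(population, n)))

# Rule 1: the mean of the sample means is close to the population mean
c(mean_of_means = mean(sample_means), population_mean = pop_mean)

# Rule 2: the SD of the sample means is close to population SD / sqrt(n)
c(sd_of_means = sd(sample_means), predicted = pop_sd / sqrt(n))
```

With a finite number of simulated samples the two pairs of values agree only approximately; the rules are exact for the full set of possible samples.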
EFFECT OF SAMPLE SIZE
The larger the sample size, the narrower the distribution of sample means.
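The narrowing follows from dividing the population SD by the square root of the sample size. A small sketch, assuming an illustrative population SD of 0.8 cm (not a value from the lecture) and the three sample sizes used later in the slides:

```r
# Assumed population SD in cm (illustrative value only)
pop_sd <- 0.8
n <- c(5, 15, 50)

# SD of the sampling distribution of the mean for each sample size
sd_of_means <- pop_sd / sqrt(n)
round(sd_of_means, 3)   # 0.358 0.207 0.113
```

Because of the square root, a ten-fold increase in sample size shrinks the spread by a factor of about 3.2, not 10.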
CENTRAL LIMIT THEOREM
• The sampling distribution of the mean is approximately normal when the sample size is large enough
• Even when the variable itself is not normally distributed
• http://onlinestatbook.com/stat_sim/sampling_dist/
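A quick simulation can illustrate the theorem. This sketch assumes an exponential variable (chosen only because it is clearly skewed) and uses a hand-written `skewness()` helper, since base R has no built-in skewness function. Skewness is 0 for a symmetric distribution such as the normal.

```r
set.seed(7)

# Moment-based skewness; the helper name and definition are our own
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Hypothetical skewed variable: exponential with mean 1
raw_values <- rexp(100000, rate = 1)

# Means of 5,000 independent samples of size 50 from the same distribution
sample_means <- replicate(5000, mean(rexp(50, rate = 1)))

skew_raw   <- skewness(raw_values)     # strongly right-skewed
skew_means <- skewness(sample_means)   # much closer to 0

c(raw = skew_raw, means = skew_means)
# hist(sample_means)   # uncomment to see the roughly bell-shaped histogram
```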
STANDARD ERROR
• The standard deviation of the sampling distribution of the mean is also called
the ‘standard error’
• Standard error can be estimated from the sample standard deviation and the
sample size using the following formula:
$\text{standard error} = \dfrac{s}{\sqrt{n}}$, where $s$ is the sample SD and $n$ is the sample size
SAMPLE SIZES
             Sample size   Sample mean   Sample SD
Scientist 1  5             3.83          0.866
Scientist 2  15            3.98          0.766
Scientist 3  50            3.70          0.622
CALCULATING STANDARD ERROR
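             Sample size   Sample mean   Sample SD
Scientist 1  5             3.83          0.866
Scientist 2  15            3.98          0.766
Scientist 3  50            3.70          0.622

Scientist 1: $\text{standard error} = \dfrac{0.866}{\sqrt{5}} = 0.387$
Scientist 2: $\text{standard error} = \dfrac{0.766}{\sqrt{15}} = 0.198$
Scientist 3: $\text{standard error} = \dfrac{0.622}{\sqrt{50}} = 0.088$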
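The same calculation can be sketched in R from the summary values in the table. The data frame and column names are our own. Note that with SD = 0.622 and n = 50 the formula gives about 0.088 for Scientist 3.

```r
# Summary values taken from the lecture's table
scientists <- data.frame(
  scientist = c("Scientist 1", "Scientist 2", "Scientist 3"),
  n         = c(5, 15, 50),
  mean      = c(3.83, 3.98, 3.70),
  sd        = c(0.866, 0.766, 0.622)
)

# Estimated standard error = sample SD / sqrt(sample size)
scientists$se <- scientists$sd / sqrt(scientists$n)
round(scientists$se, 3)   # 0.387 0.198 0.088
```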
PLOTTING STANDARD ERROR
• It is common to plot the interval within one standard error of the mean as an
error bar.
• There is no built-in standard error function in base R
• It must be calculated from the standard deviation and sample size, e.g. sd/sqrt(n)
• In R, error bars can be added using a geom_errorbar() layer (see notes from
week 4 practical)
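One way to work around the missing built-in is a small helper function. In this sketch the name `standard_error` is our own choice, and the raw data are the five alien heights from the earlier example.

```r
# Helper: standard error of the mean from a vector of raw observations
standard_error <- function(x) sd(x) / sqrt(length(x))

# Five alien heights in cm from the earlier example
heights <- c(5.16, 3.50, 3.69, 4.02, 2.80)

m  <- mean(heights)
se <- standard_error(heights)

# Ends of the error bar: mean minus and plus one standard error
c(lower = m - se, upper = m + se)
```

If the data contain missing values, `sd()` and `length()` would need handling for `NA`s, which this sketch does not attempt.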
PLOTTING STANDARD ERROR
             Sample size   Sample mean   Sample SD
Scientist 1  5             3.83          0.866
Scientist 2  15            3.98          0.766
Scientist 3  50            3.70          0.622

Scientist 3: $\text{standard error} = \dfrac{0.622}{\sqrt{50}} = 0.088$
Error bar for Scientist 3: from 3.70 − 0.088 = 3.61 to 3.70 + 0.088 = 3.79

+ geom_errorbar(aes(ymin=means-SEs,
ymax=means+SEs), position=position_dodge(.9))
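The `geom_errorbar()` layer above is a fragment of a longer plotting call. A minimal complete sketch follows, assuming the ggplot2 package is installed and that the plot is a bar chart of the three sample means. The bar chart is a guess based on `position_dodge(.9)`, and the data frame name `plot_data` and the axis label are our own; the column names `means` and `SEs` match the fragment.

```r
library(ggplot2)   # assumes the ggplot2 package is installed

# Summary table from the lecture, with standard errors computed as SD / sqrt(n)
plot_data <- data.frame(
  scientist = c("Scientist 1", "Scientist 2", "Scientist 3"),
  n         = c(5, 15, 50),
  means     = c(3.83, 3.98, 3.70),
  SDs       = c(0.866, 0.766, 0.622)
)
plot_data$SEs <- plot_data$SDs / sqrt(plot_data$n)

# One bar per sample mean, with an error bar from mean - SE to mean + SE
p <- ggplot(plot_data, aes(x = scientist, y = means)) +
  geom_col(position = position_dodge(.9)) +
  geom_errorbar(aes(ymin = means - SEs, ymax = means + SEs),
                width = 0.2, position = position_dodge(.9)) +
  labs(x = NULL, y = "Mean height (cm)")

print(p)
```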
INTERPRETING STANDARD ERROR
• We can be 68% confident that the population mean lies within one standard
error of the sample mean.
• This is because 68% of sample means taken from the sampling distribution
will lie within one standard deviation of the population mean.
• Note that this is not the same as saying there is a 68% probability that the
population mean lies within one standard error of the sample mean: the population
mean is a fixed value, and it is the interval that varies from sample to sample.
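The 68% figure can be read as a long-run proportion: if the study were repeated many times, about 68% of the resulting intervals would contain the population mean. The simulation below sketches this under assumed values (a normal population with mean 4 cm and SD 0.8 cm, and samples of 50), none of which come from the lecture.

```r
set.seed(123)

# Hypothetical population: normal with mean 4 cm and SD 0.8 cm (assumed values)
pop_mean <- 4
pop_sd   <- 0.8
n        <- 50

# For one new sample: does [mean - SE, mean + SE] contain the population mean?
covers <- function() {
  x  <- rnorm(n, mean = pop_mean, sd = pop_sd)
  se <- sd(x) / sqrt(n)
  abs(mean(x) - pop_mean) <= se
}

# Proportion of 10,000 repeated samples whose interval captures the true mean
coverage <- mean(replicate(10000, covers()))
coverage   # close to 0.68
```

Each individual interval either contains the population mean or does not; the 68% describes the procedure across repeated samples.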
95% CONFIDENCE INTERVALS
• If we wanted to calculate an interval that we could be 95% confident contained
the population mean, we would instead calculate the interval within 1.96
standard errors of the sample mean.
[Figure: sampling distribution of the mean with the 95% confidence interval (CI95) marked; horizontal axis in SDs]
95% CONFIDENCE INTERVALS
             Sample size   Sample mean   Sample SD
Scientist 1  5             3.83          0.866
Scientist 2  15            3.98          0.766
Scientist 3  50            3.70          0.622

CI95 for each scientist: sample mean ± 1.96 × standard error (in cm)
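The interval limits can be worked out in R from the table values. This is a sketch that applies the lecture's 1.96 multiplier to each scientist's estimated standard error; the variable names are our own, and the resulting limits are computed here rather than copied from the slide.

```r
# Summary values from the lecture's table
n     <- c(5, 15, 50)
means <- c(3.83, 3.98, 3.70)
sds   <- c(0.866, 0.766, 0.622)

se    <- sds / sqrt(n)        # estimated standard errors
lower <- means - 1.96 * se    # lower limit of each 95% confidence interval
upper <- means + 1.96 * se    # upper limit of each 95% confidence interval

ci95 <- data.frame(scientist = paste("Scientist", 1:3),
                   lower = round(lower, 2),
                   upper = round(upper, 2))
ci95
# Scientist 1: 3.07 to 4.59
# Scientist 2: 3.59 to 4.37
# Scientist 3: 3.53 to 3.87
```

The interval narrows as the sample size grows, mirroring the shrinking standard error.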
SUMMARY
• The choice of sampling method is an important consideration in study design and can
affect the validity of conclusions.
• Sample statistics will almost never exactly match the true population parameters we are
interested in, so it is important to present measures of confidence.
• Larger sample sizes increase the chance that our sample statistics will be an accurate
estimate of the true population parameters.
• Standard error is the standard deviation of the sampling distribution of the mean. We
can be 68% confident that the population mean lies within one standard error of the
sample mean.
• The 95% confidence interval contains the values that lie within 1.96 standard errors of
the sample mean. We can be 95% confident that the population mean lies within this
range.
READING
• Chapter 7: Sampling
• Statistics 101: Standard error of the mean https://www.youtube.com/watch?v=uIHFbMn8SBc
