
SAMPLING AND ESTIMATION

Research Methods: Lecture 6


Sarah Griffiths
sarah.griffiths@ucl.ac.uk
OVERVIEW

• Populations vs samples
• Different types of sampling
• Random vs convenience
• Sampling distributions
• Central limit theorem
• Standard error
• Confidence intervals
POPULATIONS AND SAMPLES

• When we conduct a study, we are not usually interested in the particular participants in
our study (our sample); we are interested in people in general (the population).
• Say our study was about the verbal skills of 5-6 year old children:
Sample: the 5-6 year old children that happen to be in our study
Population: all 5-6 year old children
RANDOM SAMPLING

• Random sampling: selecting members of the population at random.
e.g. select randomly from all children starting school in the county of Surrey in a
particular year
But not everybody will consent to taking part, and reasons for not taking part are
often non-random, e.g. level of education, socio-economic status.
Random sampling is the best way to ensure the sample approximates the
population, but it is almost impossible to get a truly random sample in psychology
studies.
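A minimal R sketch of simple random sampling, assuming a hypothetical sampling frame of 10,000 child IDs. The frame size, the sample size of 100, and the variable names are illustrative assumptions, not values from the lecture.

```r
# Hypothetical sampling frame: one ID per child starting school (assumed size)
set.seed(1)                      # fixed seed so the draw is reproducible
frame_ids <- 1:10000

# Simple random sample of 100 children: every child has an equal chance of
# selection, and nobody is selected twice (sampling without replacement)
sampled_ids <- sample(frame_ids, size = 100, replace = FALSE)

head(sampled_ids)
```

Non-consent is not modelled here; removing the children who decline to take part would make the achieved sample non-random again.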
OTHER TYPES OF SAMPLING

• Stratified sampling: dividing the population into subgroups (strata) and sampling from
each, often oversampling particular subgroups of interest
E.g. children with language difficulties
• Snowball sampling: recruiting through social networks for hidden, hard to
recruit populations
E.g. Autistic mothers
• Convenience sampling: recruiting sub-populations that are easy to access
E.g. Undergraduate psychology students
This is often what is used in psychology studies!
BIASED SAMPLES

• Does this matter?


• Yes, if the non-randomness is associated with the outcome
e.g. only sample children from high socio-economic areas in a study about
language development or only sample autistic mothers from particular
support networks in a study about attitudes towards autism
• No, if it is unlikely to be associated with the outcome
e.g. only sample from one part of the country when looking at short term
memory (no theoretical reason to think STM differs by geographical region)
SAMPLE STATISTICS VS POPULATION PARAMETERS

• How tall is a newly discovered species of alien?
• Population = 16,870
Average height of all 16,870 aliens is the population mean
Average height of a sample of 5 aliens is a sample mean
Sample heights: 5.16 cm, 3.50 cm, 3.69 cm, 4.02 cm, 2.80 cm
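As a small worked example, the sample statistics for the five alien heights listed above can be computed in R. The variable names are our own.

```r
# Heights (cm) of the five sampled aliens from the slide
heights <- c(5.16, 3.50, 3.69, 4.02, 2.80)

sample_mean <- mean(heights)   # sample statistic that estimates the population mean
sample_sd   <- sd(heights)     # sample standard deviation (n - 1 denominator)

round(sample_mean, 2)          # 3.83
round(sample_sd, 3)
```

The population mean would need all 16,870 heights, which is exactly what we do not have in a real study.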
SAMPLING ERROR

• How likely is it that the population mean is exactly the same as the sample
mean?
• Not very likely!
• The difference between the population parameter and the sample statistic is
called sampling error

sample mean = 3.83 cm, population mean = ???
SAMPLING ERROR

• How can I reduce sampling error?
Increase sample size!

• How close is my sample mean of 5 aliens likely to be to the population mean?
To answer this question we need to understand the “Sampling Distribution of the Mean”
POPULATION OF SAMPLE MEANS

• There are many possible samples of 5 aliens that I could have selected.
• All of these samples will have slightly different means.

sample mean = 3.64 cm
sample mean = 4.06 cm
sample mean = 3.83 cm
sample mean = 3.98 cm
SAMPLING DISTRIBUTION

• The distribution of all possible sample means is called the ‘sampling distribution of the mean’
• The sampling distribution of the mean is approximately normal when the sample size is large enough
SAMPLING DISTRIBUTION OF THE MEAN

The sampling distribution of the mean follows two rules in relation to the population distribution:

1) The mean of sample means is equal to the population mean

2) The standard deviation of sample means is equal to the standard deviation of the
population, divided by the square root of the sample size

[Figure: sampling distribution of the mean for samples of size N=5]
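The two rules can be checked by simulation. The sketch below assumes a hypothetical population of 16,870 heights drawn from a normal distribution with mean 4 cm and SD 0.8 cm; these values are made up for illustration, since the lecture leaves the population mean unknown.

```r
set.seed(42)
# Hypothetical population of 16,870 alien heights (cm); the normal shape and
# its mean/SD are assumptions made only for this illustration
population <- rnorm(16870, mean = 4, sd = 0.8)
pop_mean <- mean(population)
pop_sd   <- sqrt(mean((population - pop_mean)^2))  # population SD (divide by N)

n <- 5
# Draw 10,000 samples of size n and record each sample mean
sample_means <- replicate(10000, mean(sample(population, n)))

# Rule 1: mean of the sample means vs population mean
c(mean(sample_means), pop_mean)
# Rule 2: SD of the sample means vs population SD / sqrt(n)
c(sd(sample_means), pop_sd / sqrt(n))
```

Each pair of printed values should agree closely, with small differences because only 10,000 of the possible samples are drawn.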
EFFECT OF SAMPLE SIZE

The larger the sample size, the narrower the distribution of sample means.
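A rough simulation of this effect, again assuming a hypothetical normal population (mean 4 cm, SD 0.8 cm, both invented for the example). The helper name `spread_of_means` is our own.

```r
set.seed(7)
# Hypothetical population (assumed normal; mean and SD are illustrative only)
population <- rnorm(16870, mean = 4, sd = 0.8)

# SD of `reps` simulated sample means for a given sample size n
spread_of_means <- function(n, reps = 5000) {
  sd(replicate(reps, mean(sample(population, n))))
}

sizes   <- c(5, 15, 50)
spreads <- sapply(sizes, spread_of_means)
data.frame(n = sizes, sd_of_sample_means = round(spreads, 3))
```

The spread of the sample means shrinks as n grows, which is the narrowing described above.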
CENTRAL LIMIT THEOREM

• The sampling distribution of the mean approaches a normal distribution as the sample size increases
• Even when the variable itself is not normally distributed

• http://onlinestatbook.com/stat_sim/sampling_dist/
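The theorem can be illustrated with a clearly non-normal variable. The sketch below uses an exponential distribution and samples of size 30; both are assumptions chosen for the example, and `skewness` is a small helper defined here rather than a base R function.

```r
set.seed(123)
# Skewness: 0 for a symmetric distribution, positive for a long right tail
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# A clearly non-normal variable: exponential with rate 1 (an assumed example)
raw_values <- rexp(10000, rate = 1)

# Means of 10,000 samples of size 30 drawn from the same distribution
sample_means <- replicate(10000, mean(rexp(30, rate = 1)))

skewness(raw_values)     # strongly right-skewed
skewness(sample_means)   # much closer to 0, i.e. closer to normal
```

Calling `hist()` on each vector shows the same contrast visually.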
STANDARD ERROR

• The standard deviation of the sampling distribution of the mean is also called
the ‘standard error’
• Standard error can be estimated from the sample standard deviation and the
sample size using the following formula:

$\text{standard error} = \frac{\text{sample SD}}{\sqrt{\text{sample size}}}$
SAMPLE SIZES

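| Scientist   | Sample size | Sample mean (cm) | Sample SD (cm) |
|-------------|-------------|------------------|----------------|
| Scientist 1 | 5           | 3.83             | 0.866          |
| Scientist 2 | 15          | 3.98             | 0.766          |
| Scientist 3 | 50          | 3.70             | 0.622          |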
CALCULATING STANDARD ERROR

Scientist 1 (sample size 5, sample SD 0.866): $\text{standard error} = \frac{0.866}{\sqrt{5}} = 0.387$
Scientist 2 (sample size 15, sample SD 0.766): $\text{standard error} = \frac{0.766}{\sqrt{15}} = 0.198$
Scientist 3 (sample size 50, sample SD 0.622): $\text{standard error} = \frac{0.622}{\sqrt{50}} = 0.088$
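The same calculation can be done for all three scientists at once in R. This is a sketch using the sample sizes, means and SDs from the table; the data frame and column names are our own.

```r
# Summary values from the lecture table
scientists <- data.frame(
  scientist = c("Scientist 1", "Scientist 2", "Scientist 3"),
  n         = c(5, 15, 50),
  mean      = c(3.83, 3.98, 3.70),
  sd        = c(0.866, 0.766, 0.622)
)

# Standard error = sample SD / sqrt(sample size)
scientists$se <- scientists$sd / sqrt(scientists$n)

round(scientists$se, 3)
```

Because `sqrt()` and `/` work element by element, one line computes the standard error for every row.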
PLOTTING STANDARD ERROR

• It is common to plot the interval within one standard error of the mean as an
error bar.
• There is no built-in standard error function in base R
• It must be calculated from the standard deviation and sample size, e.g. sd(x)/sqrt(length(x))
• In R, error bars can be added using ggplot2's geom_errorbar() layer (see notes from
week 4 practical)
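Since the calculation is needed often, it can be wrapped in a small helper. The function name `se` is our own choice, not something R provides, and the data are the five alien heights from earlier in the lecture.

```r
# Helper: standard error of the mean (the name `se` is our own, not a built-in)
se <- function(x) sd(x) / sqrt(length(x))

heights <- c(5.16, 3.50, 3.69, 4.02, 2.80)  # five alien heights (cm)

se(heights)

# Interval within one standard error of the sample mean (the error bar limits)
c(lower = mean(heights) - se(heights),
  upper = mean(heights) + se(heights))
```

This simple version does not handle missing values; data containing `NA` would need them removed first.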
PLOTTING STANDARD ERROR

Scientist 3: sample size 50, sample mean 3.70 cm, sample SD 0.622 cm

$\text{standard error} = \frac{0.622}{\sqrt{50}} = 0.088$

Lower end of error bar: $3.70 - 0.088 = 3.61$
Upper end of error bar: $3.70 + 0.088 = 3.79$

+ geom_errorbar(aes(ymin=means-SEs,
ymax=means+SEs), position=position_dodge(.9))
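A fuller sketch around the layer above, assuming the summary values from the lecture table are stored in a data frame. The data frame name `plot_data`, the bar chart (`geom_col()`) and the error bar `width` are our assumptions; the column names `means` and `SEs` match the layer shown above.

```r
# Summary values from the lecture table; `means` and `SEs` match the layer above
plot_data <- data.frame(
  scientist = c("Scientist 1", "Scientist 2", "Scientist 3"),
  means     = c(3.83, 3.98, 3.70),
  SDs       = c(0.866, 0.766, 0.622),
  n         = c(5, 15, 50)
)
plot_data$SEs <- plot_data$SDs / sqrt(plot_data$n)

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(plot_data, aes(x = scientist, y = means)) +
    geom_col(position = position_dodge(.9)) +
    geom_errorbar(aes(ymin = means - SEs, ymax = means + SEs),
                  width = 0.2, position = position_dodge(.9))
  print(p)
}
```

The error bars get shorter from Scientist 1 to Scientist 3 because the standard error falls as the sample size rises.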
INTERPRETING STANDARD ERROR

• We can be 68% confident that the population mean lies within one standard
error of the sample mean.
• This is because approximately 68% of sample means taken from the sampling distribution
will lie within one standard deviation of the population mean.
• Note that this is not the same as saying there is a 68% probability that the
population mean lies within one standard error of the sample mean: the population
mean is a fixed value, and it is the interval that varies from sample to sample.
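One way to see what "68% confident" means is to repeat the study many times and count how often the one-standard-error interval contains the population mean. The sketch below assumes a normal population with mean 4 and SD 0.8, and samples of size 50; all of these are illustrative assumptions.

```r
set.seed(2024)
# Hypothetical normal population; the mean and SD are illustrative assumptions
pop_mean <- 4
pop_sd   <- 0.8
n        <- 50

# For one simulated sample: does mean +/- 1 SE contain the population mean?
covers <- function() {
  x  <- rnorm(n, mean = pop_mean, sd = pop_sd)
  se <- sd(x) / sqrt(n)
  abs(mean(x) - pop_mean) <= se
}

coverage <- mean(replicate(10000, covers()))
coverage   # proportion of intervals that contain the population mean
```

The proportion comes out close to 0.68. With much smaller samples it would be somewhat lower, because the standard error is itself estimated from the sample.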
95% CONFIDENCE INTERVALS

• If we wanted to calculate an interval that we could be 95% confident contained
the population mean, we would instead calculate the interval within 1.96
standard errors of the sample mean.

[Figure: sampling distribution of the mean with the 95% confidence interval (CI95) marked; x-axis in SDs]
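The multiplier 1.96 comes from the normal distribution, and can be recovered in R with the base functions `qnorm()` and `pnorm()`. The variable name `z_95` is our own.

```r
# z value that leaves 2.5% in each tail of the standard normal distribution
z_95 <- qnorm(0.975)
z_95                          # about 1.96

# Proportion of a normal distribution within 1 SD and within 1.96 SDs of the mean
within_1    <- pnorm(1) - pnorm(-1)        # about 0.68
within_1_96 <- pnorm(1.96) - pnorm(-1.96)  # about 0.95
c(within_1, within_1_96)
```

This is why one standard error gives a 68% interval and 1.96 standard errors give a 95% interval.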
95% CONFIDENCE INTERVALS

| Scientist   | Sample size | Sample mean (cm) | Sample SD (cm) |
|-------------|-------------|------------------|----------------|
| Scientist 1 | 5           | 3.83             | 0.866          |
| Scientist 2 | 15          | 3.98             | 0.766          |
| Scientist 3 | 50          | 3.70             | 0.622          |

$\text{CI95} = \text{sample mean} \pm 1.96 \times \text{standard error}$ (in cm)
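A sketch of the 95% confidence interval calculation for the three scientists, using the table values and the 1.96 multiplier from the lecture. The data frame and column names are our own.

```r
# Summary values from the lecture table
ci_table <- data.frame(
  scientist = c("Scientist 1", "Scientist 2", "Scientist 3"),
  n         = c(5, 15, 50),
  mean      = c(3.83, 3.98, 3.70),
  sd        = c(0.866, 0.766, 0.622)
)

# Standard error, then mean +/- 1.96 standard errors
ci_table$se    <- ci_table$sd / sqrt(ci_table$n)
ci_table$lower <- ci_table$mean - 1.96 * ci_table$se
ci_table$upper <- ci_table$mean + 1.96 * ci_table$se

print(ci_table[, c("scientist", "lower", "upper")], digits = 3)
```

The interval narrows as the sample size grows. For very small samples such as n = 5, a multiplier from the t distribution (`qt(0.975, df = n - 1)`) would give a wider interval than 1.96; the sketch follows the lecture's normal-based method.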
SUMMARY

• The choice of sampling method is an important consideration in study design and can
affect the validity of conclusions.
• Sample statistics will almost never exactly match the true population parameters we are
interested in, so it is important to present measures of confidence.
• Larger sample sizes increase the chance that our sample statistics will be an accurate
estimate of the true population parameters.
• Standard error is the standard deviation of the sampling distribution of the mean. We
can be 68% confident that the population mean lies within one standard error of the
sample mean.
• The 95% confidence interval contains the values that lie within 1.96 standard errors of
the sample mean. We can be 95% confident that the population mean lies within this
range.
READING

• Chapter 7: Sampling
• Statistics 101: Standard error of the mean
https://www.youtube.com/watch?v=uIHFbMn8SBc
