Sampling & Standard Errors
Contents
Standard Error of the Mean: Its Nature via Resampling
Standard Error of the Mean: Its Nature via Formula
How Large of a Sample Size is Enough?
Confidence Intervals
    Confidence Intervals: 68%
    Confidence Intervals: 95%
One-Sample t-Test
    One-Sample t-Test: 95% Confidence Intervals
    One-Sample t-Test: p-value (N = 10)
    One-Sample t-Test: p-value (N = 100)
    One-Sample t-Test: SPSS
p-Values
Types of One-Sample t-Tests
Assumptions
Standard Errors and More Statistics
    Standard Error of Skew
    Standard Error of Kurtosis
    Standard Error of the Median
Summary
Advanced Topics
    Effect Size
    Monte Carlo Simulation
    One-Sample t-Test: Bootstrapping
    p-Values and Point-Estimate Confidence Intervals: Revisited
    Central Limit Theorem
Practice Questions
References
CHAPTER 3: SAMPLING & STANDARD ERRORS
In the first chapter of this textbook, I stated that it was not necessary to have access to
a population in order to make inferences about the population. Instead, one could use a
random sample (or a convenience sample) of data from which an estimate could be calculated
to represent the population of interest. For example, in chapter 2, based on a sample of 11
observations (N = 11), I estimated the mean amount of sleep people have per day to be 7.36.

In the context of statistics, an estimate represents a value that approximates the population
parameter, within a certain margin of error. The estimated quantity is associated with a certain
amount of error, in the sense that it is highly unlikely that the estimate will equal the
population parameter exactly. The amount of error associated with various statistical
estimates can be calculated with a broad classification of statistics known as standard error.
Standard errors are essential to the application of statistics, as they form the basis of testing
hypotheses statistically when we only have access to samples. Virtually all statistics have a
corresponding standard error that has been discovered. Perhaps the most commonly
calculated standard error is the standard error of the mean.
Gignac, G. E. (2019). How2statsbook (Online Edition 1). Perth, Australia: Author.
parameter (i.e., with not much error), based on the following illustration. The illustration was
prepared for two principal reasons: first, to show you how consistently samples of only 1,000
cases produce point-estimates remarkably close to the population parameter; and second, to
illustrate the fundamental relationship between repeated sampling and standard error.
To illustrate the above two points, I have created a data set with a population of
100,000 cases. I could have created a data set (population) that was larger (say, 1 billion);
however, doing so would not have made much difference to the point of this illustration.
Based on the data file with 100,000 cases, the population mean (a.k.a., μ or mu)1 was
calculated at 7.31. With a computer program, I drew six random samples from the population
of 100,000 cases. Each sample had a different sample size (N): 5, 10, 50, 100, 1,000 and 5,000.
The sample means (M) estimated from the six samples are reported in Table C3.1. It
can be seen that none of the sample means were equal to the population mean of 7.31.
However, it will be noted that greater accuracy tended to be observed as the sample sizes
increased. In particular, the difference between the sample mean and the population mean
was relatively large for small sample sizes. For example, the sample size of N = 5 suggested
that the population mean was 6.60. That’s an “error” of -.71, given that the population mean
was 7.31. By contrast, there was very little difference between the sample means and the
population mean at sample sizes of 1,000 and 5,000. For example, with a sample size of 1,000,
the sample mean was estimated at 7.35, which was close to the population mean of 7.31.
Based on a sample size of 5,000, the sample mean of 7.29 was very close to the population
mean of 7.31. The results reported in Table C3.1 should correspond to your intuition: larger
sample sizes give more accurate results (estimates) (Watch Video 3.1: Understanding the
standard error of the mean via resampling).
Table C3.1. Random Sample Means With Various Sample Sizes Drawn From a Population

                       Sample Size (N)
                5      10      50     100   1,000   5,000
M            6.60    8.30    7.46    7.18    7.35    7.29
Deviation    -.71     .99     .15    -.13     .04    -.02

Note. N = sample size; M = sample mean; Deviation = difference from the population mean (7.31).
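The single-draw exercise in Table C3.1 can be reproduced in a few lines. The textbook's population data file is not available here, so this sketch simulates a comparable population (100,000 cases; the normal shape and the SD of 1.06 are assumptions based on figures quoted later in the chapter), and it samples with replacement for simplicity, which makes a negligible difference with a population this large.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stand-in for the textbook's 100,000-case population
# (normal shape and SD = 1.06 are assumptions; mu = 7.31 is from the text).
population = rng.normal(loc=7.31, scale=1.06, size=100_000)
mu = population.mean()

# Draw one random sample at each of the six sample sizes and record its mean.
sample_sizes = [5, 10, 50, 100, 1_000, 5_000]
sample_means = {n: rng.choice(population, size=n).mean() for n in sample_sizes}

for n in sample_sizes:
    dev = sample_means[n] - mu
    print(f"N = {n:>5}: M = {sample_means[n]:.2f}, deviation = {dev:+.2f}")
```

As in Table C3.1, the deviations tend to shrink as N grows, although any single small sample can get lucky.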
To extend the illustration further, I randomly sampled each of the six sample sizes
above from the population of 100,000 cases a total of 10 separate times. I did so to illustrate
that larger sample sizes consistently provide more stable estimates, whereas smaller sample
sizes are much more variable in their estimates.

1. The mean calculated from a population is commonly symbolized as μ or mu (pronounced
‘mew’). In this textbook, for the sake of simplicity, I tend to refer to a mean derived from a
sample as a ‘sample mean’ and the mean associated with a population as a ‘population mean’.

To repeat, standard error is fundamentally
related to the notion of repeated sampling. Sample estimates from relatively large sample
sizes should not change much from sample to sample.
As can be seen in Table C3.2, the sample means derived from the sample sizes of N = 5
cases yielded, again, the least accurate estimates of the population mean. Specifically, the
values associated with the N = 5 sample mean estimates ranged from as low as 6.6 to as high
as 8.0. By contrast, the amount of variability (range) in the N = 5,000 sample mean estimates
was very narrow: 7.29 to 7.34.
To represent the amount of variability in the sample mean estimates numerically, I
calculated the standard deviation (SD) associated with the 10 sample means for each of the six
sample sizes. As can be seen in the bottom row of Table C3.2, the standard deviation (SD)
associated with the 10 mean estimates obtained from the smallest sample size (N = 5) was SD
= .51. By contrast, the means estimated from the largest sample size of 5,000 yielded a
miniscule SD of .02. Again, not only did the sample size of 5,000 yield accurate estimates of the
population mean, the estimates were consistently good. Importantly, the standard deviation
associated with the repeatedly sampled means is an estimate of the standard error of the
mean (Watch Video 3.2: The impact of sample size on the standard error of the mean). That’s
right, standard error is a standard deviation. Let that sink in!
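The resampling exercise behind Table C3.2 can be automated to show that the SD of repeatedly sampled means is an estimate of the standard error. The population is again simulated (an assumption, as above), and 500 redraws are used instead of the text's 10, which steadies the estimate.

```python
import numpy as np

rng = np.random.default_rng(7)
population = rng.normal(7.31, 1.06, size=100_000)  # assumed stand-in population

def resampling_se(pop, n, draws=500):
    """SD of many independent sample means: a resampling estimate of the SE."""
    means = [rng.choice(pop, size=n).mean() for _ in range(draws)]
    return float(np.std(means, ddof=1))

se_n5 = resampling_se(population, 5)
se_n5000 = resampling_se(population, 5_000)
print(f"SE (N = 5):     {se_n5:.3f}")      # close to 1.06 / sqrt(5)    ~ .47
print(f"SE (N = 5,000): {se_n5000:.3f}")   # close to 1.06 / sqrt(5000) ~ .015
```

With only 10 redraws, as in Table C3.2, the SE estimate itself is noisy; many redraws make the agreement with the formula in the next section easier to see.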
To represent the results in Table C3.2 in graphical form, I created six histograms for
the 10 separate mean estimates across the six sample sizes. As can be seen in Figure C3.1, as
the sample size increased, the amount of variability in the estimated means decreased (Watch
Video 3.3: The standard error of the mean is a standard deviation? Yes). Stated alternatively,
as the sample size increased, the standard error decreased. In particular, it will be noted that
with a sample size of 5, the sample means were highly variable, in comparison to the sample
size of 5,000, which yielded an extremely narrow histogram. Specifically, the means obtained
from the random samples of 5,000 were very consistently around the population value of 7.31.
The results displayed in Table C3.2 and Figure C3.1 should go some way to convince you that
there is not typically much benefit to conducting a study with a sample size of more than
5,000 cases.2 Even a sample of 1,000 gives a high level of accuracy. I’ll note that it would not
matter how large the population is to which you want to infer your results. Also, convenience
samples would yield the same effect demonstrated in Table C3.2 and Figure C3.1. However,
with convenience samples, we simply would not know if the sample statistics would actually
hover around the population parameter associated with the population of interest.
The nature of standard error, as represented by the SD in the last row of Table
C3.2, was demonstrated by actual resampling, which is a laborious process.

2. Exceptions include studies where one is interested in studying a very rare phenomenon
(e.g., a rare disease), or where a large number of exploratory predictors with very weak
explanatory capacity are used to predict a phenomenon, for example.

Rather than
estimate several randomly sampled means across several samples to gauge the accuracy of a
mean estimate, it is possible to estimate the standard error of the mean efficiently with a
formula, which I describe next.
Table C3.2. Ten Random Sample Means With Various Sample Sizes Drawn From a Population
Sample Size (N)
5 10 50 100 1,000 5,000
M1 7.60 7.30 7.04 7.17 7.31 7.29
M2 7.00 7.30 7.34 7.27 7.27 7.34
M3 6.60 6.40 7.48 7.30 7.29 7.32
M4 7.00 7.00 7.18 7.38 7.31 7.32
M5 7.80 7.20 7.62 7.33 7.35 7.31
M6 6.60 7.60 7.30 7.33 7.30 7.32
M7 6.80 7.10 7.38 7.12 7.31 7.33
M8 7.40 7.40 7.36 7.27 7.32 7.30
M9 7.60 7.50 7.30 7.48 7.36 7.30
M10 8.00 7.30 7.32 7.06 7.34 7.32
GM 7.24 7.21 7.33 7.27 7.32 7.32
GM-Mu -.07 -.10 .02 -.04 .01 .01
SD .51 .33 .16 .13 .03 .02
Note. M = mean; GM = grand mean; GM-Mu = grand mean minus Mu; SD = standard deviation.
Figure C3.1. Histograms of Sample Means Across Sample Sizes Drawn From a Population (Mu = 7.31)
Recall that the standard error of the mean represents an estimate of the standard deviation of
repeatedly sampled means, drawn from a population, all with the same sample size. The
formula developed to estimate the standard error of the mean from a single sample is:
SE_X̄ = SD / √N                                                                  (1)
where SD equals the standard deviation associated with the data points used to estimate the
mean, and N equals the total number of data points in the sample. Thus, with formula (1), the
standard error associated with a sample mean can be estimated very efficiently. All you need
is the standard deviation and the sample size: Brilliant!
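Formula (1) is easy to check directly. A minimal sketch, using the sample SDs reported for the six first-draw samples (the values shown in Tables C3.3 and C3.4):

```python
import math

def standard_error_of_mean(sd, n):
    """Standard error of the mean via formula (1): SE = SD / sqrt(N)."""
    return sd / math.sqrt(n)

# Sample SDs for the six first-draw samples, as reported in Table C3.3.
sds = {5: 1.673, 10: 1.059, 50: 1.049, 100: 0.900, 1_000: 1.057, 5_000: 1.070}

for n, sd in sds.items():
    print(f"N = {n:>5}: SE = {standard_error_of_mean(sd, n):.3f}")
```

The printed values reproduce Table C3.4: .748, .335, .148, .090, .033 and .015.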
To demonstrate the use of the standard error of the mean formula, I have re-reported
in Table C3.3 the means estimated from the population of 100,000 cases that I obtained from
the first set of sampling (i.e., Table C3.2, first row). I have also reported new information in
Table C3.3: the corresponding standard deviations (SD) for each of the samples. For example,
the standard deviation associated with the mean of 7.60 (N = 5) was 1.673. Similarly, the
standard deviation associated with the mean of 7.29 (N = 5,000) was 1.070. With the standard
deviations and the sample sizes, I then calculated the standard errors for the six sample means
(for thoroughness, I have reported the calculations in Table C3.4). It can be seen in Table C3.3
that the N = 5 standard error of the mean was estimated at a whopping .75. By contrast, the N
= 5,000 standard error of the mean was estimated at a miniscule .02. These results imply that
we can have much more confidence in the accuracy of the mean estimate derived from the
sample size of 5,000, in comparison to the sample size of only N = 5. Such a conclusion is the
same as that reached from the repeated sampling approach to the estimation of the standard
error of the mean I described in the previous section. As can be seen in Table C3.3, the
repeated sampling standard error of the mean (SE_R) estimates corresponded rather closely to
the formula-based standard error of the mean estimates (SE_F) (Watch Video 3.4: Standard
error of the mean: Formula vs resampling). This is not a coincidence. They are two approaches
to the estimation of the same phenomenon of standard error. I hope you can now appreciate
the importance of repeated sampling in the context of the nature of standard error. Also, I
hope you can appreciate that standard error is the standard deviation of repeatedly
sampled estimates from a population. Finally, standard error is the fundamental basis of
inferential statistics, as I demonstrate later in this chapter with the introduction of the one-
sample t-test.
Table C3.3. Descriptive Statistics Associated with the First Samples Drawn From the Population

                       Sample Size (N)
                5      10      50     100   1,000   5,000
M1           7.60    7.30    7.04    7.17    7.31    7.29
SD1         1.673   1.059   1.049    .900   1.057   1.070
SE_R          .51     .33     .16     .13     .03     .02
SE_F          .75     .34     .15     .09     .03     .02

Note. M = mean; SD = standard deviation; SE_R = standard error of the mean from resampling;
SE_F = standard error of the mean from formula (1).
Table C3.4. Calculated Standard Errors of the Mean for the First Six Samples

      N       SD    SE_F = SD / √N
      5    1.673    1.673 / √5     = .748
     10    1.059    1.059 / √10    = .335
     50    1.049    1.049 / √50    = .148
    100     .900     .900 / √100   = .090
  1,000    1.057    1.057 / √1,000 = .033
  5,000    1.070    1.070 / √5,000 = .015
Figure C3.2. Plots of Standard Error of the Means Across Sample Sizes
Panel A: N = 10 to 5,010
Panel B: N = 10 to 1,010
Confidence Intervals
The results reported in Figure C3.2 should give you a sense of the influence of sample
size on the accuracy of the estimation of a statistic such as the mean. As you will discover
throughout this textbook, standard error forms the basis of a great many statistical analyses.
Furthermore, a fuller appreciation of the meaningfulness and utility of a standard error is
arguably incomplete, unless it is complemented with the calculation of confidence intervals.
The mean estimates reported in Table C3.2, for example, are point-estimates. A point-
estimate is calculated from the available data and represents the ‘best guess’ value of a
population parameter. By contrast, confidence intervals are a range of values, a lower-bound
and an upper-bound, which give some indication of the confidence we can place in
the precision of the point-estimate.
To repeat, all of the means (M1 to M10) reported in Table C3.3 are point-estimates.
Theoretically, each mean’s corresponding standard error could be added to and subtracted
from the mean, in order to obtain 68% confidence intervals around the point-estimate (Watch
Video 3.6: Understanding what confidence intervals are: 68% CIs). For example, the mean (M1)
point-estimate of 7.04 (N = 50) could have .15 added to and subtracted from itself to yield the
following values:
7.04 - .15 = 6.89
7.04 + .15 = 7.19
Theoretically, the values of 6.89 and 7.19 correspond to the sample mean’s 68%
confidence intervals. Thus, it may be suggested with 68% confidence that the population mean
is somewhere between 6.89 and 7.19. Where does 68% come from, you might ask? It comes
from the standard normal distribution (i.e., z-distribution). If you revisit Figure C2.6 in chapter
2, you will notice that the values between -1.0 and 1.0 within the standard normal distribution
represent 68.26% of the sample observations. The same phenomenon applies here, in theory.
Recall that the standard error is a standard deviation. It represents the standard deviation
associated with the repeatedly sampled estimates.
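The 68.26% figure need not be taken on faith: the area under the standard normal curve between z = -1 and z = +1 can be computed from the error function, since Φ(1) - Φ(-1) = erf(1/√2).

```python
import math

# Area of the standard normal distribution between z = -1 and z = +1:
# Phi(1) - Phi(-1) = erf(1 / sqrt(2))
coverage = math.erf(1 / math.sqrt(2))
print(f"{coverage:.4f}")  # 0.6827, i.e., the 68.26% quoted above
```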
I wrote ‘in theory’, because it is known that data obtained from samples do not follow
the z-distribution perfectly. The z-distribution only works perfectly with population data.
Consequently, statisticians have developed a slightly different distribution known as the t-
distribution to better represent data obtained from a sample with a specified sample size.
Sample size is a key consideration, here, because it has been discovered that the smaller the
sample size, the larger the discrepancy between the z-distribution and the t-distribution. With
a sample size of about 100 or greater, the z-distribution and the t-distribution become very
similar. However, with sample sizes less than, say, 50, the difference between the z-
distribution and the t-distribution is appreciable, especially at the tail ends of the distribution
(Watch Video 3.7: What’s the difference between the z- and t-distribution?).
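The convergence of the two distributions can be seen numerically. This sketch uses SciPy's t.ppf in place of the textbook's Excel function =T.INV.2T (the SciPy dependency is an assumption about your toolchain):

```python
from scipy.stats import norm, t

alpha = 0.32  # two-tailed alpha for a 68% interval

# Critical value from the z-distribution, and from the t-distribution
# at increasing degrees of freedom: t shrinks toward z as df grows.
z_crit = norm.ppf(1 - alpha / 2)
for df in (4, 9, 49, 99, 999):
    t_crit = t.ppf(1 - alpha / 2, df)
    print(f"df = {df:>3}: t = {t_crit:.3f}  (z = {z_crit:.3f})")
```

Even at df = 4 the t critical value (about 1.134) is not far from z, but the gap is much larger further out in the tails, which is why the distinction matters for 95% intervals with small samples.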
As an example, I calculated the accurate 68% confidence intervals for the N = 5 sample
data, as reported in Table C3.3. The sample mean was M = 7.60 and the standard deviation
was SD = 1.673 (step 1). As reported in Table C3.4, the standard error of the mean (SE_F) was
estimated at .748 (step 2). Next, the degrees of freedom were calculated at 5 – 1 = 4 (step 3).
Next, the relevant t-value from the t-distribution was identified with the following Excel
function: (Watch Video 3.8: Referencing the t-distribution in Excel):
=T.INV.2T(0.32, 4)
The value of 0.32 that was inputted into the Excel function corresponds to 1.00 - .68
(i.e., 100% - 68%). The value of .32, or 32%, represents the proportion of the t-distribution that
will not be covered by the 68% confidence intervals. In statistics, this value (i.e., .32, in this
case) is known as alpha (α). The value of 4 in the Excel function corresponds to N – 1 (i.e., 5 – 1
= 4, in this example). The Excel function produced a t-value of 1.134 (step 4). Thus, 68% of
observations lie somewhere between -1.134 and 1.134 within the t-distribution with 4
degrees of freedom. I’ll note that a t-value of |1.134| is fairly close to a z-value of |1.0|, so,
the t-distribution and the z-distribution do have some resemblance to each other, even with df
as low as 4 (N = 5). Again, though, the t-distribution is more accurate with samples, especially
small samples, in comparison to the z-distribution.
Next, the identified t-value and the standard error were multiplied together which
yielded the following product: 1.134 * .747 = .847 (step 5). Finally, the product obtained from
step 5 was added to and subtracted from the sample mean of 7.60 (step 6):
68%CI Lower-Bound: 7.60 - .847 = 6.753
68%CI Upper-Bound: 7.60 + .847 = 8.447
Thus, the lower-bound and upper-bound 68% confidence intervals corresponded to 6.753 and
8.447, respectively (Watch Video 3.9: Calculating 68% confidence intervals).
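The six steps can be condensed into a short script (again substituting SciPy's t.ppf for Excel's =T.INV.2T, an assumption about the toolchain):

```python
import math
from scipy.stats import t

# Replicating the six steps for the N = 5 sample (M = 7.60, SD = 1.673).
m, sd, n = 7.60, 1.673, 5                 # step 1
se = sd / math.sqrt(n)                    # step 2: ~.748
df = n - 1                                # step 3: df = 4
t_crit = t.ppf(1 - 0.32 / 2, df)          # step 4: ~1.134, as =T.INV.2T(0.32, 4)
margin = t_crit * se                      # step 5: ~.848
lower, upper = m - margin, m + margin     # step 6
print(f"68% CI: [{lower:.3f}, {upper:.3f}]")  # close to the 6.753 and 8.447 above
```

Tiny discrepancies in the third decimal place come from the rounding of the SE in the hand calculation.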
One way to interpret the 6.753 and 8.447 confidence intervals is to suggest that the
chance that the mean in the population is somewhere between 6.75 and 8.45 is equal to 68%
(or 68.26% to be ultra-precise). Stated alternatively, there is a 68% chance that the mean in
the population is somewhere between the lower-bound and upper-bound 68% confidence
intervals, i.e., 6.75 and 8.45. I suspect most researchers and students accept intuitively such an
interpretation of confidence intervals. However, it would be fair to suggest that there is a
substantial amount of conflicting information in the literature on the “appropriate”
interpretation of confidence intervals (Hoekstra, Morey, Rouder, & Wagenmakers, 2014). At
the risk of aggravating some people, I suggest you avoid the mess in this area of theoretical
statistics, as the experts themselves cannot agree on the precise, technical interpretation of a
confidence interval. Instead, I recommend that you interpret confidence intervals as I did
above, as I believe it is a fairly accurate representation of the nature of a confidence interval.
To help support the intuitive interpretation of a confidence interval, consider that, in
this example, we know that the population mean for sleep per day was equal to 7.31 hours (based
on a population of 100,000 cases). That’s an indisputable fact. It is also a fact that the 68%
confidence intervals reported above for the N = 5 sample (6.753 and 8.447) captured the
population mean of 7.31 (i.e., 7.31 is somewhere in between 6.753 and 8.447). Finally, based
on my own simulation, samples of N = 5 redrawn 1,000 times from the 100,000 population
(with μ = 7.31) yielded 68% confidence intervals that captured the population mean of 7.31
across 659 of the resamples, i.e., 66% of the time (659 / 1000 = .659). True, 66% is not equal to
68%, however, it is rather close. Also, it should be kept in mind that the standard error of the
mean formula (1) is itself an estimate with its own standard error. Consequently, we can speak
only in approximate terms, here. In summary, I believe normal theory confidence intervals do
a good job at representing the chances with which a population parameter resides within the
lower- and upper bound estimates. As I demonstrate later in this chapter, confidence intervals
can also be shown to be directly relevant to conventional hypothesis testing, which is the
cornerstone of inferential statistics.
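The 1,000-redraw check described above can be sketched as follows. The population is simulated (an assumption, as before), and sampling is done with replacement for simplicity:

```python
import math
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(11)
population = rng.normal(7.31, 1.06, size=100_000)  # assumed stand-in population
mu = population.mean()

# Redraw N = 5 samples 1,000 times; count how often each sample's
# 68% confidence interval captures the population mean.
t_crit = t.ppf(1 - 0.32 / 2, df=4)
hits = 0
for _ in range(1_000):
    sample = rng.choice(population, size=5)
    m, se = sample.mean(), sample.std(ddof=1) / math.sqrt(5)
    if m - t_crit * se <= mu <= m + t_crit * se:
        hits += 1

coverage = hits / 1_000
print(coverage)  # hovers around .68; the text's own run found .659
```

Any single batch of 1,000 redraws will wobble a few percentage points around the nominal 68%, which is consistent with the .659 reported above.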
estimated. When 68% confidence intervals are estimated, the researcher accepts that there is a
32% chance (i.e., alpha) that the population mean resides outside the confidence intervals. To
repeat, the chances that the population mean is somewhere outside the confidence intervals
is the margin of error a researcher has specified as maximally acceptable. Again, the margin of
error is known as alpha (α) in statistics. In practice, researchers almost always specify α = .05.
Clearly, the 95% confidence intervals of 5.53 and 9.67 represent a wide range. It
suggests that, on average, people report sleeping somewhere between 5.53 and 9.67 hours
per night (in the population). For the purposes of illustration, I calculated the accurate 68%
and 95% confidence intervals for all six of the first samples drawn from the population (i.e., N
= 5 to N = 5,000). As can be seen in Table C3.5, the sample mean of 7.29, based on N = 5,000,
yielded very narrow 95% confidence intervals of 7.25 and 7.33. Again, the confidence intervals
captured the population mean of 7.31 (there was a 95% chance that they would). Thus,
confidence intervals derived from a collection of samples with N = 5,000 observations yield a
narrow range of values relevant to where the population value might be. Finally, I’ll note that
the reduction in the range of the confidence intervals from a sample size of 1,000 to 5,000
should be considered rather negligible (Watch Video 3.11: Compare 68% and 95% CIs).
One-Sample t-Test
Prior to writing this textbook, I was under the impression that adults slept, on average,
8 hours a night. There were two reasons why I thought that, on average, adults slept 8 hours
a night. First, everyone in my immediate family slept around 8 hours per night (at least). Also,
anyone else I lived with as an adult slept a solid 8 hours a night. Thus, at least based on my
experience, it was reasonable to hypothesize that adults slept, on average, 8 hours every 24
hours. Over the years, I told a lot of people that the average amount of sleep adults get is
about 8 hours a night.
In preparation for this textbook, I came across Kripke et al.’s (2002) study (see chapter
2), which is when I began to have serious doubts about the accuracy associated with my long-
held impression. Based on a community sample of adults, Kripke et al. (2002) reported the
average amount of hours adults slept in a 24 hour period was 7.31 (SD = 1.06). Ultimately, my
impression of 8.00 was just that – an impression based on anecdotal evidence. As a scientist, I
should put my impressions to the test, statistically, when possible.
Of course, given what was learned about sampling error in this chapter, Kripke et al.’s
(2002) sample mean estimate of M = 7.31 is unlikely to be a perfectly accurate representation
of the mean in the population. In fact, Kripke et al.’s estimate of 7.31 might not be statistically
significantly different to my long-held impression of 8.00 hours. If that were the case, then
there would not be any convincing statistical evidence to abandon my impression of 8.00
hours. When a sample mean (say, 7.31) is tested statistically against a non-sample mean (say,
8.00), one is said to have conducted a ‘one-sample t-test’. In this section of the chapter, I will
demonstrate how to conduct a one-sample t-test. Along the way, you will learn if my long-held
impression of 8.00 hours is untenable statistically. I will not go down without a fight!
Hypotheses
Formally, most statistical analyses are associated with underlying hypotheses (see
chapter 1). In this example, I have specified two hypotheses: a null hypothesis and an
alternative hypothesis. With respect to the one-sample t-test, the null hypothesis specifies
that there is no difference between the sample mean and the non-sample mean (other than
sampling fluctuations). By contrast, the alternative hypothesis states that the sample mean
and the non-sample mean are unequal in the population. Thus, the generic one-sample t-test
null and alternative hypotheses are:
Null Hypothesis (H0): The sample mean and the non-sample mean are equal.
Alternative Hypothesis (H1): The sample mean and the non-sample mean are unequal.
Of course, it is unlikely that the sample mean and the non-sample mean will be exactly
equal for any particular sample. However, a sample mean may be observed to be numerically
different to the non-sample mean only because of sampling fluctuations (i.e., chance). A
statistical analysis that can test the hypotheses above is the one-sample t-test. The one-
sample t-test was introduced as a test of the difference between a sample mean and another
mean value that was not obtained from a sample: that’s why it is called a one-sample t-test:
there’s only one sample. It is called a t-test because it is reliant upon the t-
distribution, just like the confidence intervals estimated in the previous section were based on
the t-distribution, rather than the z-distribution. In the following sections, I describe how to
perform a one-sample t-test with two similar approaches. The two approaches give exactly the
same concluding results. However, both approaches give interesting and useful information,
which makes it worthwhile to understand both approaches. I’ll admit that it will take some
effort to understand the next couple of sections. The effort will be rewarded, though, as an
understanding of the following procedures and principles will stand you in good stead for
understanding the remaining chapters of this textbook.
For the purposes of this illustration, let’s pretend that Kripke et al.’s (2002) study was
based on a sample of N = 10 people. As described in chapter 2, the Kripke et al. (2002) hours of
sleep per day data were associated with a mean of 7.31 and a standard deviation of 1.06 (step
1). The difference between the sample mean and the non-sample mean was equal to -.69 (i.e.,
7.31 – 8.00; step 2).3

3. Hypothetically, if there were absolutely no numerical difference between the sample mean
and the non-sample mean, the analysis would be terminated.

Next, the standard error of the sample mean was estimated at SE_X̄ =
.335 (1.06 / √10 ≈ .335; step 3). The degrees of freedom corresponded to 9 (N – 1; step 4). I’m
halfway done with the analysis, already!
Next, in order to calculate the 95% confidence intervals, the standard error of the
mean needs to be multiplied by a value (i.e., step 5). I could multiply the standard error of the
mean by z = 1.96, in order to get an approximate value. However, to be more accurate, I
should multiply the standard error of the mean by the t-value which corresponds to N-1 (step
5). I used the following function in Excel to identify the t-value:
=T.INV.2T(0.05, 9)
The Excel function above yielded a t-value of 2.262 (step 5). Therefore, the confidence interval
multiplier equaled: 2.262 * .335 = .758 (step 6). Finally, the lower-bound and upper-bound
95% confidence intervals corresponded to -1.448 and .068 (step 7):
95%CI Lower-Bound: -.69 - .758 = -1.448
95%CI Upper-Bound: -.69 + .758 = .068
In simple terms, the numerical difference of -.69 was associated with 95% confidence
intervals equal to -1.448 and .068, based on a sample of 10 cases. Thus, it may be suggested
with 95% confidence that the difference between the sample mean estimate of how many
hours people sleep per night on average and my long-held impression of 8.00 hours is
somewhere between -1.45 and .07 (rounded) in the population. Because the lower-bound and
upper-bound 95% confidence intervals did intersect with zero, it cannot be suggested with
95% confidence that the -.69 numerical difference is different from zero (step 8). Stated
alternatively, the null hypothesis of no difference between the sample mean and my long-held
impression of 8.00 hours cannot be rejected. At least, not based on this sample of N = 10.
(Watch Video 3.12: One Sample t-Test via 95%CI (Non-Significant Example)).
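To make the eight steps concrete, here is a small sketch in Python (my choice of tool, not the book's — the chapter itself works in Excel and SPSS). The critical t-value of 2.262 is the one obtained above via Excel's =T.INV.2T(0.05, 9):

```python
from math import sqrt

mean, non_sample_mean, sd, n = 7.31, 8.00, 1.06, 10   # step 1: sample statistics
diff = mean - non_sample_mean                          # step 2: -.69
se = sd / sqrt(n)                                      # step 3: standard error of the mean
df = n - 1                                             # step 4: 9 degrees of freedom
t_crit = 2.262                                         # step 5: from =T.INV.2T(0.05, 9)
moe = t_crit * se                                      # step 6: confidence interval multiplier
lower, upper = diff - moe, diff + moe                  # step 7: 95% CI bounds
contains_zero = lower <= 0 <= upper                    # step 8: retain the null if True

print(round(se, 3), round(lower, 3), round(upper, 3), contains_zero)
```

Because the interval spans zero, the decision matches the one reached above: the null hypothesis is retained.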
As I mentioned in my description of the results reported in Figure C3.2, a study
conducted with a sample size of 10 is probably a waste of time. Kripke et al. (2002) used a
sample larger than 10, but I don’t want to mention what it was, yet. A sample size of 100 offers
much greater chances of finding a statistically significant effect, in comparison to N = 10, all
other things equal. Consequently, it would be more useful to test the null hypothesis of no
difference between a sample mean and the non-sample mean of 8.00 with an N = 100 sample.
For the second illustration, let's pretend Kripke et al. (2002) collected data from a sample of
100 community participants. Furthermore, let’s pretend that the sample mean and standard
deviation were the same as the N = 10 example (i.e., M = 7.31; SD = 1.06; step 1).
Correspondingly, the numerical difference between the sample mean and the non-sample
mean was the same at -.69 (i.e., 7.31 – 8.00; step 2).
By contrast, the standard error of the mean needed to be recalculated in this follow-
up example, because the sample size was different (i.e., N = 100). Thus, SE_X̄ = 1.06 / √100 =
1.06 / 10 = .106 (step 3). The degrees of freedom were df = 99 (100 - 1; step 4). Recall that, in
order to calculate the 95% confidence intervals associated with the numerical difference of -.69, the
standard error of the mean needs to be multiplied by a value from the t-distribution. Again,
the z-distribution value of 1.96 could be used, but greater accuracy will be obtained from using
the t-distribution. I used the Excel function to obtain the relevant t-value with N – 1 degrees of
freedom and alpha = .05:
=T.INV.2T(0.05, 99)
The t-value was equal to |1.984| (step 5). Thus, the standard error of the mean (i.e., .106) was
multiplied by |1.984|, which yielded .210 (step 6). Next, the lower-bound and upper-bound
95% confidence intervals corresponded to:
95%CI Lower-Bound: -.69 - .210 = -.900
95%CI Upper-Bound: -.69 + .210 = -.480
In simple terms, the numerical difference of -.69 was associated with 95% confidence intervals
equal to -.900 and -.480. Because the lower-bound and upper-bound 95% confidence intervals
did not intersect with zero, the null hypothesis of no difference between the sample mean and
non-sample mean of 8.00 can be rejected (p < .05) (Watch Video 3.13: One Sample t-Test via
95%CI (Significant Example)). Thus, there likely is a difference between the number of hours
people sleep per night and my long-held impression of 8 hours, based on N = 100. My
impression was likely an overestimate all this time. How embarrassing. I misled all of those
people! The truth is that the Kripke et al.’s (2002) study was based on a sample of 1,116,936
participants! I knew my impression of 8.00 hours was wrong, as soon as I saw the sample size
reported in their paper.⁴
⁴ Technically, Kripke et al. (2002) only asked people to report how much sleep they get per
night. People may not have given accurate information. So, who knows, 8.00 hours may be a
better reflection of what people actually sleep. Like I said, I’m not going down without a fight.
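The N = 100 confidence-interval steps above can be sketched in Python as well (Python is my choice here, not the book's; 1.984 is the =T.INV.2T(0.05, 99) result reported above):

```python
from math import sqrt

diff, sd, n = -0.69, 1.06, 100       # steps 1-2: the -.69 mean difference
se = sd / sqrt(n)                    # step 3: .106 -- smaller than the N = 10 value
t_crit = 1.984                       # step 5: from =T.INV.2T(0.05, 99)
moe = t_crit * se                    # step 6: .210
lower, upper = diff - moe, diff + moe  # step 7: 95% CI bounds

# step 8: the interval no longer spans zero, so the null is rejected
print(round(lower, 3), round(upper, 3), lower <= 0 <= upper)
```

Note that the only quantity that changed relative to the N = 10 analysis is the sample size, yet the narrower interval reverses the decision.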
The formula for the one-sample t-test (i.e., step 2) consists of two parts: (1) the
difference between the sample mean and the non-sample mean; and (2) the standard error of
the sample mean. As can be seen in formula (6), the difference between the sample mean (X̄)
and the non-sample mean is placed in the numerator. In this formula, the non-sample mean is
symbolized with μ (pronounced ‘mew’). The standard deviation and sample size are placed in
the denominator. Recall that the denominator portion of the formula corresponds exactly to
the standard error of the mean. The ratio is known as a calculated one-sample t-test t-value:
One-sample t = (X̄ – μ) / (SD / √N)   (6)
As per the 95% confidence interval approach, I conducted the analysis twice: once
with N = 10 and once with N = 100. In this example, the non-sample mean (μ) was equal to my
long-held impression of 8.00 hours. Additionally, I used a sample mean of 7.31, based on a
sample size of N = 10 cases. Based on SD = 1.06 and N = 10, the standard error of the mean
was equal to .335. Thus, the calculated t-value corresponded to -2.060:
t = (7.31 – 8.00) / (1.06 / √10) = -.69 / .335 = -2.060
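As a quick check of formula (6), the same ratio can be computed in a couple of lines (a Python sketch of my own). Keeping the standard error unrounded gives -2.058, the value SPSS reports later in this chapter, whereas rounding the denominator to .335 gives the -2.060 above; the difference is purely rounding:

```python
from math import sqrt

sample_mean, mu, sd, n = 7.31, 8.00, 1.06, 10
t = (sample_mean - mu) / (sd / sqrt(n))   # formula (6), unrounded denominator
print(round(t, 3))
```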
In rough terms, you should consider a t-value of approximately |1.96| or larger (either positive
or negative) as so large as to be ‘beyond chance’ (i.e., p < .05), or at least very nearly so, in
most cases (i.e., depending on sample size). Recall from the z-distribution that a value of 1.96
or -1.96 was considered relatively large, in the sense that it was very appreciably distant
from the mean of 0. In fact, only approximately 5% of all z-values fall beyond 1.96 or -1.96.
To repeat, when one obtains a calculated t-value of approximately |1.96| or larger, it
is the starting point of the possibility of having obtained a statistically significant effect. One
then has to find out precisely the probability of having obtained the sample data (i.e.,
calculated t-value), or even more extreme sample data (i.e., larger calculated t-value), under
the expectation that the null hypothesis is true. Whether a t-value of |1.96| or larger is
‘statistically significant’ will, ultimately, depend upon sample size. A larger sample size will
suggest a greater chance of statistical significance, all other things equal. In this illustrative
example, the sample size was only 10 cases. As depicted in Figure C3.2, a sample size of 10
cases does not give us much statistical confidence. Consequently, a t-value of -2.060, based on
a sample size of 10 cases, could have arisen simply by chance. In order to determine precisely
the probability with which a t-value of -2.060 could have arisen simply by chance, the
percentile associated with a t-value of -2.060 and N = 10 can be determined by placing it
within the context of the theoretical t-distribution. To do so, I used the following Excel
function:
=TDIST(t, N-1, 2)
where t = |t-value| (positive values only)⁵, N – 1 equals the sample size minus 1, and 2 specifies
both tails of the t-distribution. Thus, for this example (Watch Video 3.14: Obtaining a p-value from Excel
(for t-value)):
=TDIST(2.060, 9, 2)
When I applied the above Excel function, I obtained the following result: .069. The value of
.069 is a probability, or p-value. So, what does this p-value of .069 mean? In the next section, I
describe how to interpret p-values in detail. Briefly, I mention here that a p = .069 implies that
I cannot reject the null hypothesis of equality between the sample mean and the non-sample
mean. The reason I cannot reject the null hypothesis is because the p-value was not less than
.05. I’ll note that it was not a coincidence that the confidence intervals associated with the
95% confidence interval approach to the one-sample t-test intersected with zero (described in
the previous section; 95%CI: -1.448/.068) and the p-value from the one-sample t-test reported
here was greater than .05 (for N = 10). They are fundamentally the same analysis; they’re just
slightly different approaches to the same test. If you understand this, you’re doing really well,
as it is the fundamental basis of inferential statistics.
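For readers without Excel, the =TDIST(2.060, 9, 2) result can be checked numerically. The sketch below is my own (using only Python's standard library): it integrates the t-distribution's density over both tails. In practice one would simply call something like scipy.stats.t.sf(2.060, 9) * 2 instead:

```python
from math import exp, lgamma, pi

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / (df * pi) ** 0.5
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, upper=100.0, steps=200_000):
    """Two-tailed p-value: twice the tail area beyond |t| (trapezoidal rule)."""
    t = abs(t)
    h = (upper - t) / steps
    area = 0.5 * (t_pdf(t, df) + t_pdf(upper, df)) * h
    area += sum(t_pdf(t + i * h, df) for i in range(1, steps)) * h
    return 2 * area

p = two_tailed_p(2.060, 9)
print(p)   # agrees with Excel's .069 to three decimals
```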
A t-value of -6.509 is much larger than |1.96|, so you should be thinking that the difference
between the sample mean of 7.31 and 8.00 is ‘beyond chance’ (i.e., p < .05). To calculate the
probability exactly, the following Excel function was used:
=TDIST(6.509, 99, 2)
The Excel function above yielded 3.13945E-09. This value corresponds to p = .0000000031394.
Because the magnitude of the p-value was less than .05, we can reject the null hypothesis of
equality between the sample mean (7.31) and my long-standing hunch of 8.00 hours sleep per
night. Such a result is consistent with the 95% confidence intervals reported earlier: they did
not intersect with zero when based on N = 100. As mentioned in the previous section on
⁵ As the implications associated with a negative and positive t-value are the same, the Excel
function only works with positive t-values.
confidence intervals, Kripke et al.'s (2002) study was based on more than 1 million people.
Again, I knew my impression of 8.00 hours was wrong, as soon as I saw the sample size
reported in their paper.
Next, SPSS reported the one-sample t-test results in the SPSS table entitled ‘One-Sample Test’.
It can be seen that the calculated t-value was reported at -2.058, which is very similar to the -
2.060 I calculated in the preceding section (difference due to rounding). With 9 degrees of
freedom, the p-value was reported at .070. Because the p-value was not less than .05, the null
hypothesis of equal sample and non-sample means cannot be rejected. The ‘One-Sample Test’
table also includes the 95% confidence intervals. Based on SPSS’s calculations, the difference
between the sample mean and the non-sample mean of 8.00 is somewhere between -1.4483
and .0683 with 95% confidence. Again, these results are the same as those I calculated by hand
above.
I next conducted the same one-sample t-test analysis, but based on a sample of 100 cases. I
obtained the following results.
It can be seen that the mean and standard deviation were exactly the same as the N = 10
analyses, as expected. However, the standard error of the mean was lower at .106.
Furthermore, the calculated t-value was reported at -6.509. With 99 degrees of freedom, the
null hypothesis was rejected, p < .001. Finally, the 95% confidence intervals corresponded to -
.9003 and -.4797. All of the results are the same as what I obtained ‘by hand’ (above).
p-Values
There is a lot of misunderstanding in relation to the correct interpretation of a p-value
(see Oakes, 1986, for detailed discussion). In the context of a one-sample t-test, most
researchers would state that a p-value of .069 (obtained from the N = 10 example) suggests
that there is a 6.9% chance that there is not a difference between the sample mean and the
non-sample mean (7.31 versus 8.00) in the population. Stated alternatively, many researchers
may think that there is a 93.1% (100% – 6.9%) chance that there is an actual difference
between the sample mean (7.31) and the non-sample mean (8.00) in the population. However,
such an interpretation, as appealing as it may be, is not quite correct.
I believe much of the misunderstanding is due to the fact that p-values do not (quite)
represent what researchers want them to represent. A p-value should not be interpreted as
directly or exclusively relevant to a single event, such as a single study. Instead, in the context
of inferential statistics, probabilities are long run frequencies. Stated alternatively,
probabilities are relevant to a collection of events, not a single event (e.g., a single study).
Understandably, researchers would like to make a clear probabilistic statement about their
own study, but they cannot do so with a p-value, at least not justifiably.
In the context of a one-sample t-test, a p-value is a statement about the chances of
concluding erroneously that there is a difference between the sample mean and the non-
sample mean, when, in fact, no actual difference would be observed, if the same study were
re-conducted (with different random samples) a large number of times with the same sample
size. Thus, the p-value of .069 reported in the one-sample t-test above (N = 10) implies that if
the study were re-conducted many times over again with N = 10, we would expect to see a t-
value of |2.060|, or an even greater t-value, 6.9% of the time, even if there really were no
difference between the non-sample mean (8.00) and the mean in the population (which I
estimated at 7.31 based on N = 10). As stated previously, a p-value of less than .05 needs to be
observed in order to be confident enough to reject the null hypothesis. The value of .05
represents alpha (α), the maximum margin of error researchers tend to accept. Because the p-
value of .069 was not less than .05, I could not reject the null hypothesis of no difference
between 8.00 and the sample mean of 7.31, based on N = 10. It was only in the one-sample t-
test case based on N = 100, which yielded a p-value much less than .05 (i.e., .0000000031),
that the null hypothesis could be rejected justifiably with sufficient confidence.
You may wonder how we know the above to be true, given that it seems completely
unrealistic that a researcher has ever re-conducted the same study, with the same sample size
(but different random sample) over, and over, and over again, to verify the accuracy of a p-
value estimated from a single sample. There are two reasons why we know the above to be
true: (1) statistical theory; and (2) Monte Carlo simulations with computers. If you are
interested, I provide an example of a Monte Carlo simulation in the Advanced Topics section of
this chapter.
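The long-run idea can be made tangible with a few lines of simulation. This sketch is my own (not the Advanced Topics example itself): it draws many N = 10 samples from a population in which the null hypothesis is true, and counts how often the t-test rejects. The 2.262 critical value is the =T.INV.2T(0.05, 9) result from earlier:

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)                        # fixed seed so the sketch is reproducible
MU, SD, N, T_CRIT = 8.00, 1.06, 10, 2.262

trials, rejections = 20_000, 0
for _ in range(trials):
    sample = [random.gauss(MU, SD) for _ in range(N)]
    t = (mean(sample) - MU) / (stdev(sample) / sqrt(N))
    if abs(t) > T_CRIT:               # a rejection here is a type I error: the null is true
        rejections += 1

print(rejections / trials)            # hovers around .05 in the long run
```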
In summary, researchers use 95% confidence as the minimum requirement to reject a
null hypothesis. Correspondingly, researchers view p < .05 as the demarcation criterion for
‘statistical significance’. Based on the N = 100 one-sample t-test, the null hypothesis of equality
between the sample mean and the non-sample mean was rejected, p < .05, in the preceding
section. Again, my long-held impression of 8.00 hours per night is probably wrong. Of course, a
replication study would be useful.
value I generated and wanted to test against the sample mean. That’s why I referred to the
value of 8.00 as a non-sample mean, rather than a population mean.
Another useful example of a one-sample t-test is where a researcher wishes to
compare a sample mean associated with a rating scale against the mid-point of the rating
scale. For example, a researcher may administer a questionnaire which includes an item such
as: “I feel this country is heading in the right direction.” Suppose the rating scale was 1 =
strongly disagree, 2 = disagree, 3 = neither agree/nor disagree, 4 = agree, and 5 = strongly
agree. Suppose further that the sample mean came in at 2.88. A researcher may wish to
compare the sample mean of 2.88 against the neutral point, i.e., 3. If a statistically significant
difference between the sample mean of 2.88 and 3.00 were observed with a one-sample t-test
(p < .05), then one could conclude that, on average, people do not believe the country is
heading in the right direction. However, if the one-sample t-test were not statistically
significant (p > .05), then such a conclusion could not be made. I believe such a use of the one-
sample t-test may realistically arise, from time to time, whereas using the one-sample t-test to
make a comparison against a population mean will not. To help consolidate the information
provided in the previous sections, check out this video where I discuss the similarities
associated with three ways I conducted the same one-sample t-test.
Assumptions
All statistics have assumptions that need to be met, in order for statistics such as
confidence intervals and p-values to be perfectly accurate. I discuss assumptions in detail in
several chapters throughout this textbook. I prefer not to treat the topic in detail here. For the
sake of simplicity, I will only mention that the one-sample t-test (and the estimation of 95%
confidence, as described above), assumes the data are normally distributed. What ‘normally
distributed’ implies, in this context, is treated in chapter 6.
a sample, there is an amount of error associated with the point-estimate, from the perspective
of repeated sampling. Skew is no exception. In contrast to the standard error of the mean, the
standard error of skew is based exclusively upon sample size:
SE_skew = √[(6 * N * (N – 1)) / ((N – 2) * (N + 1) * (N + 3))]   (2)
In the annual earnings example (see chapter 2), the sample size was N = 5,225.
Therefore, the standard error of skew was estimated at .034.
SE_skew = √[(6 * 5,225 * (5,225 – 1)) / ((5,225 – 2) * (5,225 + 1) * (5,225 + 3))] = .034
Thus, if a sample of 5,225 US participants were to be re-sampled from the US population
many, many times over, the standard deviation of sample skew estimates would be estimated
at .034. Given that the skew estimate was 2.26, an SD of .034 is really small, which is a good
thing from a confidence perspective.
The 95% confidence intervals associated with a sample skew estimate can be
calculated using the same steps as that used for the sample mean. In particular, the standard
error needs to be multiplied by a value derived from the t-distribution (step 4).⁶ As per the
preceding section of this chapter, an Excel function can be used to obtain the t-distribution
value for a sample size of 5,225 (df = N – 1). Again, because 95% confidence is sought, α was
specified at .05:
=T.INV.2T(0.05, 5224)
The result associated with the above function was t = 1.960. In order to obtain the 95%
confidence intervals associated with the skew 2.26 point-estimate, the value of .067 (i.e.,
1.960*.034) was added and subtracted from the skew point-estimate of 2.26. Thus, based on
the annual earnings example, the lower-bound and upper-bound 95% confidence intervals
corresponded to:
95%CI Lower-Bound: 2.26 - .067 = 2.193
95%CI Upper-Bound: 2.26 + .067 = 2.327
Thus, it may be suggested with 95% confidence that the skew associated with annual earnings
in the population is somewhere between 2.193 and 2.327. That's a respectably narrow
range around the point-estimate of 2.26. Again, the sample size of N = 5,225 is very large, from
a statistical perspective.
As mentioned previously, when the 95% confidence intervals do not intersect with
zero, it implies that the point-estimate is statistically significantly different from zero (with 95%
confidence; or p < .05). In this example, the lower- and upper-bound intervals were both
positive (i.e., did not intersect with zero); therefore, it may be concluded that the skew point-
estimate of 2.26 was statistically significantly different from zero with 95% confidence. I
⁶ Again, we could use the well-known z-distribution here, but it would not be exactly accurate.
mention, here, briefly that another way to communicate ‘with 95% or greater confidence’ is ‘p
< .05’.
Additionally, instead of calculating 95% confidence intervals, one can divide the point-
estimate by the standard error. In this case, the ratio of 2.26 to .034 was equal to 66.47. You
can view the value of 66.47 as if it were a t-value that follows the t-distribution.
skew_t = skew / SE_skew   (3)
Recall that t-values greater than approximately |1.96| would be expected to be
observed by chance less than 5% of the time, when the sample size is approximately 50 or
greater, and the null hypothesis is true. What I’m getting at here is that the positive skewness
associated with the annual earnings data is very unlikely to have occurred by chance. Instead,
the distribution of annual earnings in the population is very likely skewed positively (Watch
Video 3.16.: How to calculate standard error of skew).
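The same arithmetic can be scripted in a few lines (a Python sketch of my own; the chapter itself works in Excel). The 1.960 multiplier is the =T.INV.2T(0.05, 5224) value obtained above:

```python
from math import sqrt

def se_skew(n):
    """Standard error of skew -- formula (2), a function of sample size only."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

n, skew, t_crit = 5225, 2.26, 1.960
se = se_skew(n)
moe = t_crit * se                    # confidence interval multiplier
print(round(se, 3), round(skew - moe, 3), round(skew + moe, 3))
```

The unrounded standard error gives bounds of roughly 2.194 and 2.326, matching the hand-rounded 2.193 and 2.327 above to rounding error.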
As per the description of skew above, the standard error of kurtosis can be multiplied by 1.960
(via the Excel function =T.INV.2T(0.05, 5224)), in order to obtain the 95% confidence intervals (.068
* 1.960 = .133):
95%CI Lower-Bound: 7.78 - .133 = 7.647
95%CI Upper-Bound: 7.78 + .133 = 7.913
Thus, it may be suggested with 95% confidence that the kurtosis in the population is
somewhere between 7.647 and 7.913. Finally, the calculated kurtosis t-value was obtained with
the following formula.
kurtosis_t = kurtosis / SE_kurtosis   (#)
Based on the annual earnings data, the kurtosis calculated t-value corresponded to t = 114.41
(i.e., 7.78 / .068) which is clearly greater than the value of |1.96|, within the context of the t-
distribution. Consequently, as was the case with skew, the distribution of annual earnings in
the US is very likely kurtotic, p < .05 (Watch Video 3.17: How to calculate standard error of
kurtosis).
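The kurtosis figures can be verified the same way (again a small Python sketch of my own, using the values reported above):

```python
kurtosis, se_kurtosis, t_crit = 7.78, 0.068, 1.960

moe = t_crit * se_kurtosis                       # .133
lower, upper = kurtosis - moe, kurtosis + moe    # 95% CI bounds
t = kurtosis / se_kurtosis                       # calculated kurtosis t-value

print(round(lower, 3), round(upper, 3), round(t, 2))
```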
In the annual earnings example, the standard error of the mean corresponded to 420.51 (i.e.,
$30,396/√5225). Thus, the standard error of the median was estimated at 420.51 * 1.253 =
$526.90. Unlike the standard error of the mean, the accuracy of the standard error of the median
is dependent upon the presence of normally distributed data. As the distribution becomes
progressively less normal, the standard error of the median formula will yield progressively
less accurate estimates. Consequently, I would not place much confidence in the estimated
standard error of the median associated with the annual earnings data, in this case.
The fact that the standard error of the median is always approximately 25% larger
than the standard error of the mean implies that estimates based on the median will be less
precise. I suspect that this is one of the reasons that so many statistical analyses are based on
the mean, rather than the median.
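The 1.253 multiplier is √(π/2) ≈ 1.2533, which makes the relationship easy to script (a sketch of my own, using the annual-earnings figures above). Note that the exact multiplier returns a value a shade above the hand-rounded $526.90 in the text:

```python
from math import pi, sqrt

sd, n = 30396, 5225                      # annual earnings example
se_mean = sd / sqrt(n)                   # ≈ 420.51
se_median = sqrt(pi / 2) * se_mean       # ≈ 25% larger than the SE of the mean

print(round(se_mean, 2), round(se_median, 2))
```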
Summary
This was a relatively tough chapter, as several important concepts were introduced. In
particular, standard error, the theoretical normal distribution, and confidence intervals.
Additionally, I demonstrated how to conduct an inferential statistical analysis: the one-sample
t-test. You will likely find that the jump from Chapter 2 to Chapter 3 is one of the largest in
this textbook. Consequently, it is expected to take some time to understand thoroughly.
Advanced Topics
Effect Size
The Foundations section of this chapter focused upon the calculation of standard
errors, confidence intervals, and p-values associated with point-estimates. Such results are
very important in the context of inferential statistics. However, most inferential statistics are
complemented usefully with the inclusion of an effect size estimate. The concept and
application of effect size is so important that it arguably should have been introduced in the
Foundations section of this chapter. However, I discuss effect size so commonly throughout
this textbook that I considered it more prudent to restrict the focus upon the key inferential
concepts introduced in the Foundations section of this chapter.
An effect size represents the magnitude of an effect. Calculating and interpreting
effect sizes is useful because a statistical result can be found to be statistically significant;
however, that does not necessarily imply that the result is important or large.
In the case of the one sample t-test, researchers almost always calculate the effect size
with a formula known as Cohen's d (Watch Video 3.18: Cohen's d Explained).⁷ In the context of
a one-sample t-test, Cohen’s d may be formulated as:
Cohen's d = (X̄ – μ) / SD   (7)
where X̄ corresponds to the sample mean, μ corresponds to the non-sample mean, and SD
corresponds to the sample standard deviation. With respect to the one-sample t-test I
reported in the Foundations section of this chapter, the Cohen’s d was equal to:
Cohen's d = (7.31 – 8.00) / 1.06 = -.69 / 1.06 = -.65
Thus, in this example, the magnitude of the difference between the sample mean (7.31) and
my long-held impression of 8.00 hours per night, expressed as a standardized effect size, was
-.65.
The value of Cohen’s d can be interpreted as a mean difference in standard deviation
units. Consequently, it may be said that the average amount of sleep people get per day is .65
of a standard deviation less than my long-held belief of 8.00 hours. It should be said that
whether Cohen’s d is negative or positive is arbitrary, as it depends on which value you specify
first or second in the numerator of formula (7). Had I subtracted 7.31 from 8.00 (8.00 – 7.31),
Cohen’s d would have been estimated at .65. In such a context, my long-held impression would
be considered .65 of a standard deviation larger than the sample mean.
Finally, Cohen (1988; 1992) provided the following guidelines for interpreting d values:
small = |.20|, medium = |.50| and large = |.80|. Thus, in this one-sample t-test example, the
⁷ Some people refer to this effect size indicator as Hedges' g.
Cohen’s d value of -.65 is somewhere between a medium and large effect. So, my impression
of 8.00 hours a night was fairly off the mark, but not terribly so.
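Formula (7) and Cohen's guidelines can be bundled into a few lines (a Python sketch of my own; the labels implement the |.20|/|.50|/|.80| cut-offs just described):

```python
def cohens_d(sample_mean, non_sample_mean, sd):
    """Cohen's d for a one-sample design -- formula (7)."""
    return (sample_mean - non_sample_mean) / sd

def label(d):
    """Cohen's (1988; 1992) verbal guidelines for |d|."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"

d = cohens_d(7.31, 8.00, 1.06)
print(round(d, 2), label(d))   # -0.65 and 'medium' under the cut-offs
```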
⁸ Keep in mind that each time I draw 10 cases from the population, I put them back into the
population, so that each random sample is drawn from a population of 100,000. Such a
sampling procedure is known as random sampling with replacement.
the specification of alpha = .05 implicitly acknowledges that we will make mistakes, but not in
more than 5% of the samples (give or take). In the context of this simulation, a one-sample
t-test that yielded a p < .05 should be considered a type I error. A type I error occurs when a
decision to reject the null hypothesis has been made, but the null hypothesis is true. I discuss
type I errors in greater detail in chapter 7 (Watch Video 3.19: Monte Carlo One-Sample t-Test).
Another way to think about the one-sample t-test is to estimate the 95% confidence intervals associated with
the sample mean, rather than estimate the 95% confidence intervals associated with the
difference between the sample mean and the non-sample mean (as performed in the
Foundations section of this chapter). When the lower-bound and upper-bound 95% confidence
intervals associated with the sample mean (e.g., 7.31) do not intersect with the non-sample
mean (e.g., 8.00), it necessarily implies that the conventional one-sample t-test will yield a p-
value that is less than .05.
To illustrate my point, I estimated the standard error of the mean for the sample mean
of 7.31, based on SD = 1.06 and N = 100. The standard error of the mean corresponded to .106
(i.e., 1.06 / √100). The 95% confidence interval multiplier was calculated at .210 (i.e., .106 *
1.984). Thus, the 95% confidence intervals associated with the sample mean of 7.31 (N = 100)
corresponded to:
95%CI Lower-Bound: 7.31 - .210 = 7.100
95%CI Upper-Bound: 7.31 + .210 = 7.520
As the upper-bound 95% confidence interval did not intersect with the non-sample
mean (i.e., 8.00), it necessarily implies that there was a statistically significant difference
between the sample mean and the non-sample mean, p < .05 (via one-sample t-test). Had the
upper-bound 95% confidence interval been estimated at 7.99 (rather than 7.520), the corresponding
one-sample t-test p-value would have been .049. So, just barely under .05. By contrast, had the
upper-bound 95% confidence interval been estimated at 8.01 (rather than 7.520), the corresponding
one-sample t-test p-value would have been .051. So, just over .05.
In summary, there are three equivalent methods to test the difference between a
sample mean and a non-sample mean with specified alpha = .05: (1) calculate the 95%
confidence intervals associated with the numerical difference between the sample mean and
the non-sample mean; (2) calculate the t-value ratio: the difference between the sample mean
and the non-sample mean divided by the standard error of the mean; and (3) calculate the
95% confidence intervals associated with the sample mean (Watch Video 3.21: Three
approaches to the one-sample t-test).
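The equivalence of the three methods is easy to demonstrate in a few lines (my sketch, not the book's; 1.984 is the =T.INV.2T(0.05, 99) critical value used throughout the N = 100 example):

```python
from math import sqrt

sample_mean, mu, sd, n, t_crit = 7.31, 8.00, 1.06, 100, 1.984

se = sd / sqrt(n)
moe = t_crit * se
diff = sample_mean - mu

# (1) 95% CI around the mean difference excludes zero?
method1 = not (diff - moe <= 0 <= diff + moe)
# (2) calculated t-value exceeds the critical t-value?
method2 = abs(diff / se) > t_crit
# (3) 95% CI around the sample mean excludes the non-sample mean?
method3 = not (sample_mean - moe <= mu <= sample_mean + moe)

print(method1, method2, method3)   # all three reach the same decision
```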
Practice Questions
2. The correspondence between standard error estimation from re-sampling and formula
In the Foundations section of this chapter, I reported the standard deviations
associated with 10 re-sampled means for sample sizes from N = 5 to N = 5,000 (see Table
C3.2). I specified only 10 re-samples for the sake of simplicity. Of course, the estimation of a
standard error from just 10 re-samples is not great. For practice, estimate 100 sample means
with N = 5 from the population data file. Compare the standard deviation associated with your
100 sample means against the standard deviation I obtained and reported in Table C3.2 (Data
File: sleep_hours_100_thousand) (Watch: 100 Resampled Means N = 5).
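As a starting point for this exercise, the re-sampling loop might look like the sketch below. The population here is simulated (normal, M = 7.31, SD = 1.06) purely as a stand-in for the sleep_hours_100_thousand data file, so the exact numbers are illustrative assumptions:

```python
import random
from statistics import mean, stdev

random.seed(42)
# hypothetical population standing in for the data file
population = [random.gauss(7.31, 1.06) for _ in range(100_000)]

# 100 re-sampled means, each based on N = 5
sample_means = [mean(random.sample(population, 5)) for _ in range(100)]

# the SD of the 100 re-sampled means approximates the standard error;
# the formula value would be 1.06 / sqrt(5), roughly .47
print(round(stdev(sample_means), 3))
```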
than the real photo (check out page 174 of the study). Penton-Voak et al. (2007) then had each
person specify which of the seven photos was the most accurate likeness of their partner. In this
case, a score of zero represented a correct identification, whereas scores of +1, +2, and +3
represented progressively more attractive images of the partner, and scores of -1, -2, and -3
represented progressively less attractive images of the partner. If people are not biased in
their perception of their partners, the mean should not be statistically significantly different
from 0. By contrast, a positive bias would imply a statistically significant positive mean. I
simulated some data to correspond closely to the results reported by Penton-Voak et al. (2007)
(Data File: bias_attraction). Answer the following questions (Watch: Attraction Bias):
Whether people, on average, develop positively biased perceptions of their partners over
time, or whether the bias is present when they first meet is anybody’s guess. What do you
think?
1. What was the mean and standard deviation associated with the student ratings of
their humor ability?
2. Was the mean statistically significantly different from the 50th percentile?
References
Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E. J. (2014). Robust
misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21(5), 1157-
1164.
Kripke, D. F., Garfinkel, L., Wingard, D. L., Klauber, M. R., & Marler, M. R. (2002). Mortality
associated with sleep duration and insomnia. Archives of General Psychiatry, 59(2),
131-136.
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: how difficulties in recognizing
one's own incompetence lead to inflated self-assessments. Journal of Personality and
Social Psychology, 77(6), 1121-1134.
Oakes, M. (1986). Statistical inference: A commentary for the social and behavioural sciences.
New York: Wiley.
Penton-Voak, I. S., Rowe, A. C., & Williams, J. (2007). Through rose-tinted glasses: Relationship
satisfaction and representations of partners’ facial attractiveness. Journal of
Evolutionary Psychology, 5(1), 169-181.
Riza, S. D., Ganzach, Y., & Liu, Y. (2016). Time and job satisfaction: A longitudinal study of
the differential roles of age and tenure. Journal of Management, 0149206315624962.