Sampling & Standard Errors
Contents
Standard Error of the Mean: Its Nature via Resampling
Standard Error of the Mean: Its Nature via Formula
How Large of a Sample Size is Enough?
Confidence Intervals
    Confidence Intervals: 68%
    Confidence Intervals: 95%
One-Sample t-Test
    One-Sample t-Test: 95% Confidence Intervals
    One-Sample t-Test: p-value (N = 10)
    One-Sample t-Test: p-value (N = 100)
    One-Sample t-Test: SPSS
p-Values
Types of One-Sample t-Tests
Assumptions
Standard Errors and More Statistics
    Standard Error of Skew
    Standard Error of Kurtosis
    Standard Error of the Median
Summary
Advanced Topics
    Effect Size
    Monte Carlo Simulation
    One-Sample t-Test: Bootstrapping
    p-Values and Point-Estimate Confidence Intervals: Revisited
    Central Limit Theorem
Practice Questions
References
CHAPTER 3: SAMPLING & STANDARD ERRORS
In the first chapter of this textbook, I stated that it was not necessary to have access to
a population in order to make inferences about the population. Instead, one could use a
random sample (or a convenience sample) of data from which an estimate could be calculated
to represent the population of interest. For example, in chapter 2, based on a sample of 11
observations (N = 11), I estimated the mean amount of sleep people have per day to be 7.36.

In the context of statistics, an estimate represents a value that approximates the population
parameter, within a certain margin of error. The estimated quantity is associated with a certain
amount of error, in the sense that it is highly unlikely that the estimate will equal the
population parameter exactly. The amount of error associated with various statistical
estimates can be calculated with a broad classification of statistics known as standard error.
Standard errors are essential to the application of statistics, as they form the basis of testing
hypotheses statistically when we only have access to samples. Virtually all statistics have a
corresponding standard error that has been discovered. Perhaps the most commonly
calculated standard error is the standard error of the mean.
Gignac, G. E. (2019). How2statsbook (Online Edition 1). Perth, Australia: Author.
parameter (i.e., with not much error), based on the following illustration. The illustration was
prepared for two principal reasons: first, to show you how consistently samples of only 1,000
cases produce point-estimates remarkably close to the population parameter; and second, to
illustrate the fundamental relationship between repeated sampling and standard error.
To illustrate the above two points, I have created a data set with a population of
100,000 cases. I could have created a data set (population) that was larger (say, 1 billion);
however, doing so would not have made much difference to the point of this illustration.
Based on the data file with 100,000 cases, the population mean (a.k.a., μ or mu)1 was
calculated at 7.31. With a computer program, I drew six random samples from the population
of 100,000 cases. Each sample had a different sample size (N): 5, 10, 50, 100, 1,000 and 5,000.
The sample means (M) estimated from the six samples are reported in Table C3.1. It
can be seen that none of the sample means were equal to the population mean of 7.31.
However, it will be noted that greater accuracy tended to be observed as the sample sizes
increased. In particular, the difference between the sample mean and the population mean
was relatively large for small sample sizes. For example, the sample size of N = 5 suggested
that the population mean was 6.60. That’s an “error” of -.71, given that the population mean
was 7.31. By contrast, there was very little difference between the sample means and the
population mean at sample sizes of 1,000 and 5,000. For example, with a sample size of 1,000,
the sample mean was estimated at 7.35, which was close to the population mean of 7.31.
Based on a sample size of 5,000, the sample mean of 7.29 was very close to the population
mean of 7.31. The results reported in Table C3.1 should correspond to your intuition: larger
sample sizes give more accurate results (estimates) (Watch Video 3.1: Understanding the
standard error of the mean via resampling).
Table C3.1. Random Sample Means With Various Sample Sizes Drawn From a Population

                       Sample Size (N)
                5      10      50     100   1,000   5,000
M            6.60    8.30    7.46    7.18    7.35    7.29
Deviation    -.71     .99     .15    -.13     .04    -.02

Note. N = sample size; M = sample mean; Deviation = difference from the population mean (7.31).
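The single-draw exercise in Table C3.1 can be reproduced in a few lines. The textbook's population data file is not available here, so this sketch simulates a comparable population (100,000 cases; the normal shape and the SD of 1.06 are assumptions based on figures quoted later in the chapter), and it samples with replacement for simplicity, which makes a negligible difference with a population this large.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stand-in for the textbook's 100,000-case population
# (normal shape and SD = 1.06 are assumptions; mu = 7.31 is from the text).
population = rng.normal(loc=7.31, scale=1.06, size=100_000)
mu = population.mean()

# Draw one random sample at each of the six sample sizes and record its mean.
sample_sizes = [5, 10, 50, 100, 1_000, 5_000]
sample_means = {n: rng.choice(population, size=n).mean() for n in sample_sizes}

for n in sample_sizes:
    dev = sample_means[n] - mu
    print(f"N = {n:>5}: M = {sample_means[n]:.2f}, deviation = {dev:+.2f}")
```

As in Table C3.1, the deviations tend to shrink as N grows, although any single small sample can get lucky.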
To extend the illustration further, I randomly sampled each of the six sample sizes
above from the population of 100,000 cases a total of 10 separate times. I did so to illustrate
that larger sample sizes consistently provide more stable estimates, whereas smaller sample
sizes are much more variable in their estimates.

1. The mean calculated from a population is commonly symbolized as μ or mu (pronounced
‘mew’). In this textbook, for the sake of simplicity, I tend to refer to a mean derived from a
sample as a ‘sample mean’ and the mean associated with a population as a ‘population mean’.

To repeat, standard error is fundamentally
related to the notion of repeated sampling. Sample estimates from relatively large sample
sizes should not change much from sample to sample.
As can be seen in Table C3.2, the sample means derived from the sample sizes of N = 5
cases yielded, again, the least accurate estimates of the population mean. Specifically, the
values associated with the N = 5 sample mean estimates ranged from as low as 6.6 to as high
as 8.0. By contrast, the amount of variability (range) in the N = 5,000 sample mean estimates
was very narrow: 7.29 to 7.34.
To represent the amount of variability in the sample mean estimates numerically, I
calculated the standard deviation (SD) associated with the 10 sample means for each of the six
sample sizes. As can be seen in the bottom row of Table C3.2, the standard deviation (SD)
associated with the 10 mean estimates obtained from the smallest sample size (N = 5) was SD
= .51. By contrast, the means estimated from the largest sample size of 5,000 yielded a
miniscule SD of .02. Again, not only did the sample size of 5,000 yield accurate estimates of the
population mean, the estimates were consistently good. Importantly, the standard deviation
associated with the repeatedly sampled means is an estimate of the standard error of the
mean (Watch Video 3.2: The impact of sample size on the standard error of the mean). That’s
right, standard error is a standard deviation. Let that sink in!
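The resampling exercise behind Table C3.2 can be automated to show that the SD of repeatedly sampled means is an estimate of the standard error. The population is again simulated (an assumption, as above), and 500 redraws are used instead of the text's 10, which steadies the estimate.

```python
import numpy as np

rng = np.random.default_rng(7)
population = rng.normal(7.31, 1.06, size=100_000)  # assumed stand-in population

def resampling_se(pop, n, draws=500):
    """SD of many independent sample means: a resampling estimate of the SE."""
    means = [rng.choice(pop, size=n).mean() for _ in range(draws)]
    return float(np.std(means, ddof=1))

se_n5 = resampling_se(population, 5)
se_n5000 = resampling_se(population, 5_000)
print(f"SE (N = 5):     {se_n5:.3f}")      # close to 1.06 / sqrt(5)    ~ .47
print(f"SE (N = 5,000): {se_n5000:.3f}")   # close to 1.06 / sqrt(5000) ~ .015
```

With only 10 redraws, as in Table C3.2, the SE estimate itself is noisy; many redraws make the agreement with the formula in the next section easier to see.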
To represent the results in Table C3.2 in graphical form, I created six histograms for
the 10 separate mean estimates across the six sample sizes. As can be seen in Figure C3.1, as
the sample size increased, the amount of variability in the estimated means decreased (Watch
Video 3.3: The standard error of the mean is a standard deviation? Yes). Stated alternatively,
as the sample size increased, the standard error decreased. In particular, it will be noted that
with a sample size of 5, the sample means were highly variable, in comparison to the sample
size of 5,000, which yielded an extremely narrow histogram. Specifically, the means obtained
from the random samples of 5,000 were very consistently around the population value of 7.31.
The results displayed in Table C3.2 and Figure C3.1 should go some way to convince you that
there is not typically much benefit to conducting a study with a sample size of more than
5,000 cases.2 Even a sample of 1,000 gives a high level of accuracy. I’ll note that it would not
matter how large the population is to which you want to infer your results. Also, convenience
samples would yield the same effect demonstrated in Table C3.2 and Figure C3.1. However,
with convenience samples, we simply would not know if the sample statistics would actually
hover around the population parameter associated with the population of interest.
The nature of standard error, as represented by the SD in the last row of Table
C3.2, was demonstrated by actual resampling, which is a laborious process.

2. Exceptions include studies where one is interested in studying a very rare phenomenon
(e.g., a rare disease), or where a large number of exploratory predictors with very weak
explanatory capacity are used to predict a phenomenon, for example.

Rather than
estimate several randomly sampled means across several samples to gauge the accuracy of a
mean estimate, it is possible to estimate the standard error of the mean efficiently with a
formula, which I describe next.
Table C3.2. Ten Random Sample Means With Various Sample Sizes Drawn From a Population
Sample Size (N)
5 10 50 100 1,000 5,000
M1 7.60 7.30 7.04 7.17 7.31 7.29
M2 7.00 7.30 7.34 7.27 7.27 7.34
M3 6.60 6.40 7.48 7.30 7.29 7.32
M4 7.00 7.00 7.18 7.38 7.31 7.32
M5 7.80 7.20 7.62 7.33 7.35 7.31
M6 6.60 7.60 7.30 7.33 7.30 7.32
M7 6.80 7.10 7.38 7.12 7.31 7.33
M8 7.40 7.40 7.36 7.27 7.32 7.30
M9 7.60 7.50 7.30 7.48 7.36 7.30
M10 8.00 7.30 7.32 7.06 7.34 7.32
GM 7.24 7.21 7.33 7.27 7.32 7.32
GM-Mu -.07 -.10 .02 -.04 .01 .01
SD .51 .33 .16 .13 .03 .02
Note. M = mean; GM = grand mean; GM-Mu = grand mean minus Mu; SD = standard deviation.
Figure C3.1. Histograms of Sample Means Across Sample Sizes Drawn From a Population (Mu = 7.31)
Recall that the standard error of the mean represents an estimate of the standard deviation of
repeatedly sampled means, drawn from a population, all with the same sample size. The
formula developed to estimate the standard error of the mean from a single sample is:
SE_X̄ = SD / √N                                                                  (1)
where SD equals the standard deviation associated with the data points used to estimate the
mean, and N equals the total number of data points in the sample. Thus, with formula (1), the
standard error associated with a sample mean can be estimated very efficiently. All you need
is the standard deviation and the sample size: Brilliant!
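Formula (1) is easy to check directly. A minimal sketch, using the sample SDs reported for the six first-draw samples (the values shown in Tables C3.3 and C3.4):

```python
import math

def standard_error_of_mean(sd, n):
    """Standard error of the mean via formula (1): SE = SD / sqrt(N)."""
    return sd / math.sqrt(n)

# Sample SDs for the six first-draw samples, as reported in Table C3.3.
sds = {5: 1.673, 10: 1.059, 50: 1.049, 100: 0.900, 1_000: 1.057, 5_000: 1.070}

for n, sd in sds.items():
    print(f"N = {n:>5}: SE = {standard_error_of_mean(sd, n):.3f}")
```

The printed values reproduce Table C3.4: .748, .335, .148, .090, .033 and .015.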
To demonstrate the use of the standard error of the mean formula, I have re-reported
in Table C3.3 the means estimated from the population of 100,000 cases that I obtained from
the first set of sampling (i.e., Table C3.2, first row). I have also reported new information in
Table C3.3: the corresponding standard deviations (SD) for each of the samples. For example,
the standard deviation associated with the mean of 7.60 (N = 5) was 1.673. Similarly, the
standard deviation associated with the mean of 7.29 (N = 5,000) was 1.070. With the standard
deviations and the sample sizes, I then calculated the standard errors for the six sample means
(for thoroughness, I have reported the calculations in Table C3.4). It can be seen in Table C3.3
that the N = 5 standard error of the mean was estimated at a whopping .75. By contrast, the N
= 5,000 standard error of the mean was estimated at a miniscule .02. These results imply that
we can have much more confidence in the accuracy of the mean estimate derived from the
sample size of 5,000, in comparison to the sample size of only N = 5. Such a conclusion is the
same as that reached from the repeated sampling approach to the estimation of the standard
error of the mean I described in the previous section. As can be seen in Table C3.3, the
repeated sampling standard error of the mean (SE_R) estimates corresponded rather closely to
the formula-based standard error of the mean estimates (SE_F) (Watch Video 3.4: Standard
error of the mean: Formula vs resampling). This is not a coincidence. They are two approaches
to the estimation of the same phenomenon of standard error. I hope you can now appreciate
the importance of repeated sampling in the context of the nature of standard error. Also, I
hope you can appreciate that standard error is the standard deviation of repeatedly
sampled estimates from a population. Finally, standard error is the fundamental basis of
inferential statistics, as I demonstrate later in this chapter with the introduction of the one-
sample t-test.
Table C3.3. Descriptive Statistics Associated with the First Samples Drawn From the Population

                       Sample Size (N)
                5      10      50     100   1,000   5,000
M1           7.60    7.30    7.04    7.17    7.31    7.29
SD1         1.673   1.059   1.049    .900   1.057   1.070
SE_R          .51     .33     .16     .13     .03     .02
SE_F          .75     .34     .15     .09     .03     .02

Note. M = mean; SD = standard deviation; SE_R = standard error of the mean from resampling;
SE_F = standard error of the mean from formula (1).
Table C3.4. Calculated Standard Errors of the Mean for the First Six Samples

      N       SD    SE_F = SD / √N
      5    1.673    1.673 / √5     = .748
     10    1.059    1.059 / √10    = .335
     50    1.049    1.049 / √50    = .148
    100     .900     .900 / √100   = .090
  1,000    1.057    1.057 / √1,000 = .033
  5,000    1.070    1.070 / √5,000 = .015
Figure C3.2. Plots of Standard Error of the Means Across Sample Sizes
Panel A: N = 10 to 5,010
Panel B: N = 10 to 1,010
Confidence Intervals
The results reported in Figure C3.2 should give you a sense of the influence of sample
size on the accuracy of the estimation of a statistic such as the mean. As you will discover
throughout this textbook, standard error forms the basis of a great many statistical analyses.
Furthermore, a fuller appreciation of the meaningfulness and utility of a standard error is
arguably incomplete, unless it is complemented with the calculation of confidence intervals.
The mean estimates reported in Table C3.2, for example, are point-estimates. A point-
estimate is calculated from the available data and represents the ‘best guess’ value of a
population parameter. By contrast, confidence intervals are a range of values, a lower-bound
and an upper-bound, which give some indication of the confidence we can place in
the precision of the point-estimate.
To repeat, all of the means (M1 to M10) reported in Table C3.3 are point-estimates.
Theoretically, each mean’s corresponding standard error could be added to and subtracted
from the mean, in order to obtain 68% confidence intervals around the point-estimate (Watch
Video 3.6: Understanding what confidence intervals are: 68% CIs). For example, the mean (M1)
point-estimate of 7.04 (N = 50) could have .15 added to and subtracted from itself to yield the
following values:
7.04 - .15 = 6.89
7.04 + .15 = 7.19
Theoretically, the values of 6.89 and 7.19 correspond to the sample mean’s 68%
confidence intervals. Thus, it may be suggested with 68% confidence that the population mean
is somewhere between 6.89 and 7.19. Where does 68% come from, you might ask? It comes
from the standard normal distribution (i.e., z-distribution). If you revisit Figure C2.6 in chapter
2, you will notice that the values between -1.0 and 1.0 within the standard normal distribution
represent 68.26% of the sample observations. The same phenomenon applies here, in theory.
Recall that the standard error is a standard deviation. It represents the standard deviation
associated with the repeatedly sampled estimates.
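The 68.26% figure need not be taken on faith: the area under the standard normal curve between z = -1 and z = +1 can be computed from the error function, since Φ(1) - Φ(-1) = erf(1/√2).

```python
import math

# Area of the standard normal distribution between z = -1 and z = +1:
# Phi(1) - Phi(-1) = erf(1 / sqrt(2))
coverage = math.erf(1 / math.sqrt(2))
print(f"{coverage:.4f}")  # 0.6827, i.e., the 68.26% quoted above
```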
I wrote ‘in theory’, because it is known that data obtained from samples do not follow
the z-distribution perfectly. The z-distribution only works perfectly with population data.
Consequently, statisticians have developed a slightly different distribution known as the t-
distribution to better represent data obtained from a sample with a specified sample size.
Sample size is a key consideration, here, because it has been discovered that the smaller the
sample size, the larger the discrepancy between the z-distribution and the t-distribution. With
a sample size of about 100 or greater, the z-distribution and the t-distribution become very
similar. However, with sample sizes less than, say, 50, the difference between the z-
distribution and the t-distribution is appreciable, especially at the tail ends of the distribution
(Watch Video 3.7: What’s the difference between the z- and t-distribution?).
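The convergence of the two distributions can be seen numerically. This sketch uses SciPy's t.ppf in place of the textbook's Excel function =T.INV.2T (the SciPy dependency is an assumption about your toolchain):

```python
from scipy.stats import norm, t

alpha = 0.32  # two-tailed alpha for a 68% interval

# Critical value from the z-distribution, and from the t-distribution
# at increasing degrees of freedom: t shrinks toward z as df grows.
z_crit = norm.ppf(1 - alpha / 2)
for df in (4, 9, 49, 99, 999):
    t_crit = t.ppf(1 - alpha / 2, df)
    print(f"df = {df:>3}: t = {t_crit:.3f}  (z = {z_crit:.3f})")
```

Even at df = 4 the t critical value (about 1.134) is not far from z, but the gap is much larger further out in the tails, which is why the distinction matters for 95% intervals with small samples.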
As an example, I calculated the accurate 68% confidence intervals for the N = 5 sample
data, as reported in Table C3.3. The sample mean was M = 7.60 and the standard deviation
was SD = 1.673 (step 1). As reported in Table C3.4, the standard error of the mean (SE_F) was
estimated at .748 (step 2). Next, the degrees of freedom were calculated at 5 – 1 = 4 (step 3).
Next, the relevant t-value from the t-distribution was identified with the following Excel
function: (Watch Video 3.8: Referencing the t-distribution in Excel):
=T.INV.2T(0.32, 4)
The value of 0.32 that was inputted into the Excel function corresponds to 1.00 - .68
(i.e., 100% - 68%). The value of .32, or 32%, represents the proportion of the t-distribution that
will not be covered by the 68% confidence intervals. In statistics, this value (i.e., .32, in this
case) is known as alpha (α). The value of 4 in the Excel function corresponds to N – 1 (i.e., 5 – 1
= 4, in this example). The Excel function produced a t-value of 1.134 (step 4). Thus, 68% of
observations lie somewhere between -1.134 and 1.134 within the t-distribution with 4
degrees of freedom. I’ll note that a t-value of |1.134| is fairly close to a z-value of |1.0|, so,
the t-distribution and the z-distribution do have some resemblance to each other, even with df
as low as 4 (N = 5). Again, though, the t-distribution is more accurate with samples, especially
small samples, in comparison to the z-distribution.
Next, the identified t-value and the standard error were multiplied together which
yielded the following product: 1.134 * .747 = .847 (step 5). Finally, the product obtained from
step 5 was added to and subtracted from the sample mean of 7.60 (step 6):
68%CI Lower-Bound: 7.60 - .847 = 6.753
68%CI Upper-Bound: 7.60 + .847 = 8.447
Thus, the lower-bound and upper-bound 68% confidence intervals corresponded to 6.753 and
8.447, respectively (Watch Video 3.9: Calculating 68% confidence intervals).
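The six steps can be condensed into a short script (again substituting SciPy's t.ppf for Excel's =T.INV.2T, an assumption about the toolchain):

```python
import math
from scipy.stats import t

# Replicating the six steps for the N = 5 sample (M = 7.60, SD = 1.673).
m, sd, n = 7.60, 1.673, 5                 # step 1
se = sd / math.sqrt(n)                    # step 2: ~.748
df = n - 1                                # step 3: df = 4
t_crit = t.ppf(1 - 0.32 / 2, df)          # step 4: ~1.134, as =T.INV.2T(0.32, 4)
margin = t_crit * se                      # step 5: ~.848
lower, upper = m - margin, m + margin     # step 6
print(f"68% CI: [{lower:.3f}, {upper:.3f}]")  # close to the 6.753 and 8.447 above
```

Tiny discrepancies in the third decimal place come from the rounding of the SE in the hand calculation.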
One way to interpret the 6.753 and 8.447 confidence intervals is to suggest that the
chance that the mean in the population is somewhere between 6.75 and 8.45 is equal to 68%
(or 68.26% to be ultra-precise). Stated alternatively, there is a 68% chance that the mean in
the population is somewhere between the lower-bound and upper-bound 68% confidence
intervals, i.e., 6.75 and 8.45. I suspect most researchers and students accept intuitively such an
interpretation of confidence intervals. However, it would be fair to suggest that there is a
substantial amount of conflicting information in the literature on the “appropriate”
interpretation of confidence intervals (Hoekstra, Morey, Rouder, & Wagenmakers, 2014). At
the risk of aggravating some people, I suggest you avoid the mess in this area of theoretical
statistics, as the experts themselves cannot agree on the precise, technical interpretation of a
confidence interval. Instead, I recommend that you interpret confidence intervals as I did
above, as I believe it is a fairly accurate representation of the nature of a confidence interval.
To help support the intuitive interpretation of a confidence interval, consider that, in
this example, we know that the population mean for sleep per day was equal to 7.31 hours (based
on a population of 100,000 cases). That’s an indisputable fact. It is also a fact that the 68%
confidence intervals reported above for the N = 5 sample (6.753 and 8.447) captured the
population mean of 7.31 (i.e., 7.31 is somewhere in between 6.753 and 8.447). Finally, based
on my own simulation, samples of N = 5 redrawn 1,000 times from the 100,000 population
(with μ = 7.31) yielded 68% confidence intervals that captured the population mean of 7.31
across 659 of the resamples, i.e., 66% of the time (659 / 1000 = .659). True, 66% is not equal to
68%, however, it is rather close. Also, it should be kept in mind that the standard error of the
mean formula (1) is itself an estimate with its own standard error. Consequently, we can speak
only in approximate terms, here. In summary, I believe normal theory confidence intervals do
a good job at representing the chances with which a population parameter resides within the
lower- and upper bound estimates. As I demonstrate later in this chapter, confidence intervals
can also be shown to be directly relevant to conventional hypothesis testing, which is the
cornerstone of inferential statistics.
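The 1,000-redraw check described above can be sketched as follows. The population is simulated (an assumption, as before), and sampling is done with replacement for simplicity:

```python
import math
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(11)
population = rng.normal(7.31, 1.06, size=100_000)  # assumed stand-in population
mu = population.mean()

# Redraw N = 5 samples 1,000 times; count how often each sample's
# 68% confidence interval captures the population mean.
t_crit = t.ppf(1 - 0.32 / 2, df=4)
hits = 0
for _ in range(1_000):
    sample = rng.choice(population, size=5)
    m, se = sample.mean(), sample.std(ddof=1) / math.sqrt(5)
    if m - t_crit * se <= mu <= m + t_crit * se:
        hits += 1

coverage = hits / 1_000
print(coverage)  # hovers around .68; the text's own run found .659
```

Any single batch of 1,000 redraws will wobble a few percentage points around the nominal 68%, which is consistent with the .659 reported above.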
estimated. When 68% confidence intervals are estimated, the researcher accepts that there is a
32% chance (i.e., alpha) that the population mean resides outside the confidence intervals. To
repeat, the chances that the population mean is somewhere outside the confidence intervals
is the margin of error a researcher has specified as maximally acceptable. Again, the margin of
error is known as alpha (α) in statistics. In practice, researchers almost always specify α = .05.
Clearly, the 95% confidence intervals of 5.53 and 9.67 represent a wide range. It
suggests that, on average, people report sleeping somewhere between 5.53 and 9.67 hours
per night (in the population). For the purposes of illustration, I calculated the accurate 68%
and 95% confidence intervals for all six of the first samples drawn from the population (i.e., N
= 5 to N = 5,000). As can be seen in Table C3.5, the sample mean of 7.29, based on N = 5,000,
yielded very narrow 95% confidence intervals of 7.25 and 7.33. Again, the confidence intervals
captured the population mean of 7.31 (there was a 95% chance that they would). Thus,
confidence intervals derived from a collection of samples with N = 5,000 observations yield a
narrow range of values relevant to where the population value might be. Finally, I’ll note that
the reduction in the range of the confidence intervals from a sample size of 1,000 to 5,000
should be considered rather negligible (Watch Video 3.11: Compare 68% and 95% CIs).
One-Sample t-Test
Prior to writing this textbook, I was under the impression that adults slept, on average,
8 hours a night. There were two reasons why I thought that, on average, adults slept 8 hours
a night. First, everyone in my immediate family slept around 8 hours per night (at least). Also,
anyone else I lived with as an adult slept a solid 8 hours a night. Thus, at least based on my
experience, it was reasonable to hypothesize that adults slept, on average, 8 hours every 24
hours. Over the years, I told a lot of people that the average amount of sleep adults get is
about 8 hours a night.
In preparation for this textbook, I came across Kripke et al.’s (2002) study (see chapter
2), which is when I began to have serious doubts about the accuracy associated with my long-
held impression. Based on a community sample of adults, Kripke et al. (2002) reported the
average amount of hours adults slept in a 24 hour period was 7.31 (SD = 1.06). Ultimately, my
impression of 8.00 was just that – an impression based on anecdotal evidence. As a scientist, I
should put my impressions to the test, statistically, when possible.
Of course, given what was learned about sampling error in this chapter, Kripke et al.’s
(2002) sample mean estimate of M = 7.31 is unlikely to be a perfectly accurate representation
of the mean in the population. In fact, Kripke et al.’s estimate of 7.31 might not be statistically
significantly different to my long-held impression of 8.00 hours. If that were the case, then
there would not be any convincing statistical evidence to abandon my impression of 8.00
hours. When a sample mean (say, 7.31) is tested statistically against a non-sample mean (say,
8.00), one is said to have conducted a ‘one-sample t-test’. In this section of the chapter, I will
demonstrate how to conduct a one-sample t-test. Along the way, you will learn if my long-held
impression of 8.00 hours is untenable statistically. I will not go down without a fight!
Hypotheses
Formally, most statistical analyses are associated with underlying hypotheses (see
chapter 1). In this example, I have specified two hypotheses: a null hypothesis and an
alternative hypothesis. With respect to the one-sample t-test, the null hypothesis specifies
that there is no difference between the sample mean and the non-sample mean (other than
sampling fluctuations). By contrast, the alternative hypothesis states that the sample mean
and the non-sample mean are unequal in the population. Thus, the generic one-sample t-test
null and alternative hypotheses are:
Null Hypothesis (H0): The sample mean and the non-sample mean are equal.
Alternative Hypothesis (H1): The sample mean and the non-sample mean are unequal.
Of course, it is unlikely that the sample mean and the non-sample mean will be exactly
equal for any particular sample. However, a sample mean may be observed to be numerically
different to the non-sample mean only because of sampling fluctuations (i.e., chance). A
statistical analysis that can test the hypotheses above is the one-sample t-test. The one-
sample t-test was introduced as a test of the difference between a sample mean and another
mean value that was not obtained from a sample: that’s why it is called a one-sample t-test:
there’s only one sample. It is called a t-test because it is reliant upon the t-
distribution, just like the confidence intervals estimated in the previous section were based on
the t-distribution, rather than the z-distribution. In the following sections, I describe how to
perform a one-sample t-test with two similar approaches. The two approaches give exactly the
same concluding results. However, both approaches give interesting and useful information,
which makes it worthwhile to understand both approaches. I’ll admit that it will take some
effort to understand the next couple of sections. The effort will be rewarded, though, as an
understanding of the following procedures and principles will stand you in good stead for
understanding the remaining chapters of this textbook.
For the purposes of this illustration, let’s pretend that Kripke et al.’s (2002) study was
based on a sample of N = 10 people. As described in chapter 2, the Kripke et al. (2002) hours of
sleep per day data were associated with a mean of 7.31 and a standard deviation of 1.06 (step
1). The difference between the sample mean and the non-sample mean was equal to -.69 (i.e.,
7.31 – 8.00; step 2).3

3. Hypothetically, if there were absolutely no numerical difference between the sample mean
and the non-sample mean, the analysis would be terminated.

Next, the standard error of the sample mean was estimated at SE_X̄ =
.335 (1.06 / √10 ≈ .335; step 3). The degrees of freedom corresponded to 9 (N – 1; step 4). I’m
halfway done with the analysis, already!
Next, in order to calculate the 95% confidence intervals, the standard error of the
mean needs to be multiplied by a value (i.e., step 5). I could multiply the standard error of the
mean by z = 1.96, in order to get an approximate value. However, to be more accurate, I
should multiply the standard error of the mean by the t-value which corresponds to N-1 (step
5). I used the following function in Excel to identify the t-value:
=T.INV.2T(0.05, 9)
The Excel function above yielded a t-value of 2.262 (step 5). Therefore, the confidence interval
multiplier equaled: 2.262 * .335 = .758 (step 6). Finally, the lower-bound and upper-bound
95% confidence intervals corresponded to -1.448 and .068 (step 7):
95%CI Lower-Bound: -.69 - .758 = -1.448
95%CI Upper-Bound: -.69 + .758 = .068
In simple terms, the numerical difference of -.69 was associated with 95% confidence
intervals equal to -1.448 and .068, based on a sample of 10 cases. Thus, it may be suggested
with 95% confidence that the difference between the sample mean estimate of how many
hours people sleep per night on average and my long-held impression of 8.00 hours is
somewhere between -1.45 and .07 (rounded) in the population. Because the lower-bound and
upper-bound 95% confidence intervals did intersect with zero, it cannot be suggested with
95% confidence that the -.69 numerical difference is different from zero (step 8). Stated
alternatively, the null hypothesis of no difference between the sample mean and my long-held
impression of 8.00 hours cannot be rejected. At least, not based on this sample of N = 10.
(Watch Video 3.12: One Sample t-Test via 95%CI (Non-Significant Example)).
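To make the eight steps concrete, here is a small sketch in Python (my choice of tool, not the book's — the chapter itself works in Excel and SPSS). The critical t-value of 2.262 is the one obtained above via Excel's =T.INV.2T(0.05, 9):

```python
from math import sqrt

mean, non_sample_mean, sd, n = 7.31, 8.00, 1.06, 10   # step 1: sample statistics
diff = mean - non_sample_mean                          # step 2: -.69
se = sd / sqrt(n)                                      # step 3: standard error of the mean
df = n - 1                                             # step 4: 9 degrees of freedom
t_crit = 2.262                                         # step 5: from =T.INV.2T(0.05, 9)
moe = t_crit * se                                      # step 6: confidence interval multiplier
lower, upper = diff - moe, diff + moe                  # step 7: 95% CI bounds
contains_zero = lower <= 0 <= upper                    # step 8: retain the null if True

print(round(se, 3), round(lower, 3), round(upper, 3), contains_zero)
```

Because the interval spans zero, the decision matches the one reached above: the null hypothesis is retained.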
As I mentioned in my description of the results reported in Figure C3.2, a study
conducted with a sample size of 10 is probably a waste of time. Kripke et al. (2002) used a
sample larger than 10, but I don’t want to mention what it was, yet. A sample size of 100 offers
much greater chances of finding a statistically significant effect, in comparison to N = 10, all
other things equal. Consequently, it would be more useful to test the null hypothesis of no
difference between a sample mean and the non-sample mean of 8.00 with an N = 100 sample.
For the second illustration, let's pretend Kripke et al. (2002) collected data from a sample of
100 community participants. Furthermore, let’s pretend that the sample mean and standard
deviation were the same as the N = 10 example (i.e., M = 7.31; SD = 1.06; step 1).
Correspondingly, the numerical difference between the sample mean and the non-sample
mean was the same at -.69 (i.e., 7.31 – 8.00; step 2).
By contrast, the standard error of the mean needed to be recalculated in this follow-
up example, because the sample size was different (i.e., N = 100). Thus, SE_X̄ = 1.06 / √100 =
1.06 / 10 = .106 (step 3). The degrees of freedom were df = 99 (100 - 1; step 4). Recall that, in
order to calculate the 95% confidence intervals associated with the numerical difference of -.69, the
standard error of the mean needs to be multiplied by a value from the t-distribution. Again,
the z-distribution value of 1.96 could be used, but greater accuracy will be obtained from using
the t-distribution. I used the Excel function to obtain the relevant t-value with N – 1 degrees of
freedom and alpha = .05:
=T.INV.2T(0.05, 99)
The t-value was equal to |1.984| (step 5). Thus, the standard error of the mean (i.e., .106) was
multiplied by |1.984|, which yielded .210 (step 6). Next, the lower-bound and upper-bound
95% confidence intervals corresponded to:
95%CI Lower-Bound: -.69 - .210 = -.900
95%CI Upper-Bound: -.69 + .210 = -.480
In simple terms, the numerical difference of -.69 was associated with 95% confidence intervals
equal to -.900 and -.480. Because the lower-bound and upper-bound 95% confidence intervals
did not intersect with zero, the null hypothesis of no difference between the sample mean and
non-sample mean of 8.00 can be rejected (p < .05) (Watch Video 3.13: One Sample t-Test via
95%CI (Significant Example)). Thus, there likely is a difference between the number of hours
people sleep per night and my long-held impression of 8 hours, based on N = 100. My
impression was likely an overestimate all this time. How embarrassing. I misled all of those
people! The truth is that the Kripke et al.’s (2002) study was based on a sample of 1,116,936
participants! I knew my impression of 8.00 hours was wrong, as soon as I saw the sample size
reported in their paper.⁴
⁴ Technically, Kripke et al. (2002) only asked people to report how much sleep they get per
night. People may not have given accurate information. So, who knows, 8.00 hours may be a
better reflection of what people actually sleep. Like I said, I’m not going down without a fight.
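The N = 100 confidence-interval steps above can be sketched in Python as well (Python is my choice here, not the book's; 1.984 is the =T.INV.2T(0.05, 99) result reported above):

```python
from math import sqrt

diff, sd, n = -0.69, 1.06, 100       # steps 1-2: the -.69 mean difference
se = sd / sqrt(n)                    # step 3: .106 -- smaller than the N = 10 value
t_crit = 1.984                       # step 5: from =T.INV.2T(0.05, 99)
moe = t_crit * se                    # step 6: .210
lower, upper = diff - moe, diff + moe  # step 7: 95% CI bounds

# step 8: the interval no longer spans zero, so the null is rejected
print(round(lower, 3), round(upper, 3), lower <= 0 <= upper)
```

Note that the only quantity that changed relative to the N = 10 analysis is the sample size, yet the narrower interval reverses the decision.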
The formula for the one-sample t-test (i.e., step 2) consists of two parts: (1) the
difference between the sample mean and the non-sample mean; and (2) the standard error of
the sample mean. As can be seen in formula (6), the difference between the sample mean (X̄)
and the non-sample mean is placed in the numerator. In this formula, the non-sample mean is
symbolized with μ (pronounced ‘mew’). The standard deviation and sample size are placed in
the denominator. Recall that the denominator portion of the formula corresponds exactly to
the standard error of the mean. The ratio is known as a calculated one-sample t-test t-value:
One-sample t = (X̄ – μ) / (SD / √N)   (6)
As per the 95% confidence interval approach, I conducted the analysis twice: once
with N = 10 and once with N = 100. In this example, the non-sample mean (μ) was equal to my
long-held impression of 8.00 hours. Additionally, I used a sample mean of 7.31, based on a
sample size of N = 10 cases. Based on SD = 1.06 and N = 10, the standard error of the mean
was equal to .335. Thus, the calculated t-value corresponded to -2.060:
t = (7.31 – 8.00) / (1.06 / √10) = -.69 / .335 = -2.060
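As a quick check of formula (6), the same ratio can be computed in a couple of lines (a Python sketch of my own). Keeping the standard error unrounded gives -2.058, the value SPSS reports later in this chapter, whereas rounding the denominator to .335 gives the -2.060 above; the difference is purely rounding:

```python
from math import sqrt

sample_mean, mu, sd, n = 7.31, 8.00, 1.06, 10
t = (sample_mean - mu) / (sd / sqrt(n))   # formula (6), unrounded denominator
print(round(t, 3))
```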
In rough terms, you should consider a t-value of approximately |1.96| or larger (either positive
or negative) as so large as to be ‘beyond chance’ (i.e., p < .05), or at least very nearly so, in
most cases (i.e., depending on sample size). Recall from the z-distribution that a value of 1.96
or -1.96 was considered relatively large, in the sense that it was very appreciably distant
from the mean of 0. In fact, only approximately 5% of all z-values fall beyond 1.96 or -1.96.
To repeat, when one obtains a calculated t-value of approximately |1.96| or larger, it
is the starting point of the possibility of having obtained a statistically significant effect. One
then has to find out precisely the probability of having obtained the sample data (i.e.,
calculated t-value), or even more extreme sample data (i.e., larger calculated t-value), under
the expectation that the null hypothesis is true. Whether a t-value of |1.96| or larger is
‘statistically significant’ will, ultimately, depend upon sample size. A larger sample size will
suggest a greater chance of statistical significance, all other things equal. In this illustrative
example, the sample size was only 10 cases. As depicted in Figure C3.2, a sample size of 10
cases does not give us much statistical confidence. Consequently, a t-value of -2.060, based on
a sample size of 10 cases, could have arisen simply by chance. In order to determine precisely
the probability with which a t-value of -2.060 could have arisen simply by chance, the
percentile associated with a t-value of -2.060 and N = 10 can be determined by placing it
within the context of the theoretical t-distribution. To do so, I used the following Excel
function:
=TDIST(t, N-1, 2)
where t = |t-value| (positive values only)⁵, N – 1 equals the sample size minus 1, and 2 specifies
both tails of the t-distribution. Thus, for this example (Watch Video 3.14: Obtaining a p-value from Excel
(for t-value)):
=TDIST(2.060, 9, 2)
When I applied the above Excel function, I obtained the following result: .069. The value of
.069 is a probability, or p-value. So, what does this p-value of .069 mean? In the next section, I
describe how to interpret p-values in detail. Briefly, I mention here that a p = .069 implies that
I cannot reject the null hypothesis of equality between the sample mean and the non-sample
mean. The reason I cannot reject the null hypothesis is because the p-value was not less than
.05. I’ll note that it was not a coincidence that the confidence intervals associated with the
95% confidence interval approach to the one-sample t-test intersected with zero (described in
the previous section; 95%CI: -1.448/.068) and the p-value from the one-sample t-test reported
here was greater than .05 (for N = 10). They are fundamentally the same analysis; they’re just
slightly different approaches to the same test. If you understand this, you’re doing really well,
as it is the fundamental basis of inferential statistics.
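For readers without Excel, the =TDIST(2.060, 9, 2) result can be checked numerically. The sketch below is my own (using only Python's standard library): it integrates the t-distribution's density over both tails. In practice one would simply call something like scipy.stats.t.sf(2.060, 9) * 2 instead:

```python
from math import exp, lgamma, pi

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / (df * pi) ** 0.5
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, upper=100.0, steps=200_000):
    """Two-tailed p-value: twice the tail area beyond |t| (trapezoidal rule)."""
    t = abs(t)
    h = (upper - t) / steps
    area = 0.5 * (t_pdf(t, df) + t_pdf(upper, df)) * h
    area += sum(t_pdf(t + i * h, df) for i in range(1, steps)) * h
    return 2 * area

p = two_tailed_p(2.060, 9)
print(p)   # agrees with Excel's .069 to three decimals
```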
A t-value of -6.509 is much larger than |1.96|, so you should be thinking that the difference
between the sample mean of 7.31 and 8.00 is ‘beyond chance’ (i.e., p < .05). To calculate the
probability exactly, the following Excel function was used:
=TDIST(6.509, 99, 2)
The Excel function above yielded 3.13945E-09. This value corresponds to p = .0000000031394.
Because the magnitude of the p-value was less than .05, we can reject the null hypothesis of
equality between the sample mean (7.31) and my long-standing hunch of 8.00 hours sleep per
night. Such a result is consistent with the 95% confidence intervals reported earlier: they did
not intersect with zero when based on N = 100. As mentioned in the previous section on
⁵ As the implications associated with a negative and positive t-value are the same, the Excel
function only works with positive t-values.
confidence intervals, Kripke et al.'s (2002) study was based on more than 1 million people.
Again, I knew my impression of 8.00 hours was wrong, as soon as I saw the sample size
reported in their paper.
Next, SPSS reported the one-sample t-test results in the SPSS table entitled ‘One-Sample Test’.
It can be seen that the calculated t-value was reported at -2.058, which is very similar to the -
2.060 I calculated in the preceding section (difference due to rounding). With 9 degrees of
freedom, the p-value was reported at .070. Because the p-value was not less than .05, the null
hypothesis of equal sample and non-sample means cannot be rejected. The ‘One-Sample Test’
table also includes the 95% confidence intervals. Based on SPSS’s calculations, the difference
between the sample mean and the non-sample mean of 8.00 is somewhere between -1.4483
and .0683 with 95% confidence. Again, these results are the same as those I calculated by hand
above.
I next conducted the same one-sample t-test analysis, but based on a sample of 100 cases. I
obtained the following results.
It can be seen that the mean and standard deviation were exactly the same as the N = 10
analyses, as expected. However, the standard error of the mean was lower at .106.
Furthermore, the calculated t-value was reported at -6.509. With 99 degrees of freedom, the
null hypothesis was rejected, p < .001. Finally, the 95% confidence intervals corresponded to -
.9003 and -.4797. All of the results are the same as what I obtained ‘by hand’ (above).
p-Values
There is a lot of misunderstanding in relation to the correct interpretation of a p-value
(see Oakes, 1986, for detailed discussion). In the context of a one-sample t-test, most
researchers would state that a p-value of .069 (obtained from the N = 10 example) suggests
that there is a 6.9% chance that there is not a difference between the sample mean and the
non-sample mean (7.31 versus 8.00) in the population. Stated alternatively, many researchers
may think that there is a 93.1% (100% – 6.9%) chance that there is an actual difference
between the sample mean (7.31) and the non-sample mean (8.00) in the population. However,
such an interpretation, as appealing as it may be, is not quite correct.
I believe much of the misunderstanding is due to the fact that p-values do not (quite)
represent what researchers want them to represent. A p-value should not be interpreted as
directly or exclusively relevant to a single event, such as a single study. Instead, in the context
of inferential statistics, probabilities are long run frequencies. Stated alternatively,
probabilities are relevant to a collection of events, not a single event (e.g., a single study).
Understandably, researchers would like to make a clear probabilistic statement about their
own study, but they cannot do so with a p-value, at least not justifiably.
In the context of a one-sample t-test, a p-value is a statement about the chances of
concluding erroneously that there is a difference between the sample mean and the non-
sample mean, when, in fact, no actual difference would be observed, if the same study were
re-conducted (with different random samples) a large number of times with the same sample
size. Thus, the p-value of .069 reported in the one-sample t-test above (N = 10) implies that if
the study were re-conducted many times over again with N = 10, we would expect to see a t-
value of |2.060|, or an even greater t-value, 6.9% of the time, even if there really were no
difference between the non-sample mean (8.00) and the mean in the population (which I
estimated at 7.31 based on N = 10). As stated previously, a p-value of less than .05 needs to be
observed in order to be confident enough to reject the null hypothesis. The value of .05
represents alpha (α), the maximum margin of error researchers tend to accept. Because the p-
value of .069 was not less than .05, I could not reject the null hypothesis of no difference
between 8.00 and the sample mean of 7.31, based on N = 10. It was only in the one-sample t-
test case based on N = 100, which yielded a p-value much less than .05 (i.e., .0000000031),
that the null hypothesis could be rejected justifiably with sufficient confidence.
You may wonder how we know the above to be true, given that it seems completely
unrealistic that a researcher has ever re-conducted the same study, with the same sample size
(but different random sample) over, and over, and over again, to verify the accuracy of a p-
value estimated from a single sample. There are two reasons why we know the above to be
true: (1) statistical theory; and (2) Monte Carlo simulations with computers. If you are
interested, I provide an example of a Monte Carlo simulation in the Advanced Topics section of
this chapter.
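The long-run idea can be made tangible with a few lines of simulation. This sketch is my own (not the Advanced Topics example itself): it draws many N = 10 samples from a population in which the null hypothesis is true, and counts how often the t-test rejects. The 2.262 critical value is the =T.INV.2T(0.05, 9) result from earlier:

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)                        # fixed seed so the sketch is reproducible
MU, SD, N, T_CRIT = 8.00, 1.06, 10, 2.262

trials, rejections = 20_000, 0
for _ in range(trials):
    sample = [random.gauss(MU, SD) for _ in range(N)]
    t = (mean(sample) - MU) / (stdev(sample) / sqrt(N))
    if abs(t) > T_CRIT:               # a rejection here is a type I error: the null is true
        rejections += 1

print(rejections / trials)            # hovers around .05 in the long run
```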
In summary, researchers use 95% confidence as the minimum requirement to reject a
null hypothesis. Correspondingly, researchers view p < .05 as the demarcation criterion for
‘statistical significance’. Based on the N = 100 one-sample t-test, the null hypothesis of equality
between the sample mean and the non-sample mean was rejected, p < .05, in the preceding
section. Again, my long-held impression of 8.00 hours per night is probably wrong. Of course, a
replication study would be useful.
value I generated and wanted to test against the sample mean. That’s why I referred to the
value of 8.00 as a non-sample mean, rather than a population mean.
Another useful example of a one-sample t-test is where a researcher wishes to
compare a sample mean associated with a rating scale against the mid-point of the rating
scale. For example, a researcher may administer a questionnaire which includes an item such
as: “I feel this country is heading in the right direction.” Suppose the rating scale was 1 =
strongly disagree, 2 = disagree, 3 = neither agree/nor disagree, 4 = agree, and 5 = strongly
agree. Suppose further that the sample mean came in at 2.88. A researcher may wish to
compare the sample mean of 2.88 against the neutral point, i.e., 3. If a statistically significant
difference between the sample mean of 2.88 and 3.00 were observed with a one-sample t-test
(p < .05), then one could conclude that, on average, people do not believe the country is
heading in the right direction. However, if the one-sample t-test were not statistically
significant (p > .05), then such a conclusion could not be made. I believe such a use of the one-
sample t-test may realistically arise, from time to time, whereas using the one-sample t-test to
make a comparison against a population mean will not. To help consolidate the information
provided in the previous sections, check out this video where I discuss the similarities
associated with three ways I conducted the same one-sample t-test.
Assumptions
All statistics have assumptions that need to be met, in order for statistics such as
confidence intervals and p-values to be perfectly accurate. I discuss assumptions in detail in
several chapters throughout this textbook. I prefer not to treat the topic in detail here. For the
sake of simplicity, I will only mention that the one-sample t-test (and the estimation of 95%
confidence, as described above), assumes the data are normally distributed. What ‘normally
distributed’ implies, in this context, is treated in chapter 6.
a sample, there is an amount of error associated with the point-estimate, from the perspective
of repeated sampling. Skew is no exception. In contrast to the standard error of the mean, the
standard error of skew is based exclusively upon sample size:
SE_skew = √[(6 * N * (N – 1)) / ((N – 2) * (N + 1) * (N + 3))]   (2)
In the annual earnings example (see chapter 2), the sample size was N = 5,225.
Therefore, the standard error of skew was estimated at .034.
SE_skew = √[(6 * 5,225 * (5,225 – 1)) / ((5,225 – 2) * (5,225 + 1) * (5,225 + 3))] = .034
Thus, if a sample of 5,225 US participants were to be re-sampled from the US population
many, many times over, the standard deviation of sample skew estimates would be estimated
at .034. Given that the skew estimate was 2.26, an SD of .034 is really small, which is a good
thing from a confidence perspective.
The 95% confidence intervals associated with a sample skew estimate can be
calculated using the same steps as that used for the sample mean. In particular, the standard
error needs to be multiplied by a value derived from the t-distribution (step 4).⁶ As per the
preceding section of this chapter, an Excel function can be used to obtain the t-distribution
value for a sample size of 5,225 (df = N – 1). Again, because 95% confidence is sought, α was
specified at .05:
=T.INV.2T(0.05, 5224)
The result associated with the above function was t = 1.960. In order to obtain the 95%
confidence intervals associated with the skew 2.26 point-estimate, the value of .067 (i.e.,
1.960*.034) was added and subtracted from the skew point-estimate of 2.26. Thus, based on
the annual earnings example, the lower-bound and upper-bound 95% confidence intervals
corresponded to:
95%CI Lower-Bound: 2.26 - .067 = 2.193
95%CI Upper-Bound: 2.26 + .067 = 2.327
Thus, it may be suggested with 95% confidence that the skew associated with annual earnings
in the population is somewhere between 2.193 and 2.327. That's a respectably narrow
range around the point-estimate of 2.26. Again, the sample size of N = 5,225 is very large, from
a statistical perspective.
As mentioned previously, when the 95% confidence intervals do not intersect with
zero, it implies that the point-estimate is statistically significantly different from zero (with 95%
confidence; or p < .05). In this example, the lower- and upper-bound intervals were both
positive (i.e., did not intersect with zero); therefore, it may be concluded that the skew point-
estimate of 2.26 was statistically significantly different from zero with 95% confidence. I
⁶ Again, we could use the well-known z-distribution here, but it would not be exactly accurate.
mention, here, briefly that another way to communicate ‘with 95% or greater confidence’ is ‘p
< .05’.
Additionally, instead of calculating 95% confidence intervals, one can divide the point-
estimate by the standard error. In this case, the ratio of 2.26 to .034 was equal to 66.47. You
can view the value of 66.47 as if it were a t-value that follows the t-distribution.
skew_t = skew / SE_skew   (3)
Recall that t-values greater than approximately |1.96| would be expected to be
observed by chance less than 5% of the time, when the sample size is approximately 50 or
greater, and the null hypothesis is true. What I’m getting at here is that the positive skewness
associated with the annual earnings data is very unlikely to have occurred by chance. Instead,
the distribution of annual earnings in the population is very likely skewed positively (Watch
Video 3.16.: How to calculate standard error of skew).
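The same arithmetic can be scripted in a few lines (a Python sketch of my own; the chapter itself works in Excel). The 1.960 multiplier is the =T.INV.2T(0.05, 5224) value obtained above:

```python
from math import sqrt

def se_skew(n):
    """Standard error of skew -- formula (2), a function of sample size only."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

n, skew, t_crit = 5225, 2.26, 1.960
se = se_skew(n)
moe = t_crit * se                    # confidence interval multiplier
print(round(se, 3), round(skew - moe, 3), round(skew + moe, 3))
```

The unrounded standard error gives bounds of roughly 2.194 and 2.326, matching the hand-rounded 2.193 and 2.327 above to rounding error.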
As per the description of skew above, the standard error of kurtosis can be multiplied by 1.960
(via the Excel function =T.INV.2T(0.05, 5224)), in order to obtain the 95% confidence intervals (.068
* 1.960 = .133):
95%CI Lower-Bound: 7.78 - .133 = 7.647
95%CI Upper-Bound: 7.78 + .133 = 7.913
Thus, it may be suggested with 95% confidence that the kurtosis in the population is
somewhere between 7.647 and 7.913. Finally, the calculated kurtosis t-value was obtained with
the following formula.
kurtosis_t = kurtosis / SE_kurtosis   (#)
Based on the annual earnings data, the kurtosis calculated t-value corresponded to t = 114.41
(i.e., 7.78 / .068) which is clearly greater than the value of |1.96|, within the context of the t-
distribution. Consequently, as was the case with skew, the distribution of annual earnings in
the US is very likely kurtotic, p < .05 (Watch Video 3.17: How to calculate standard error of
kurtosis).
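The kurtosis figures can be verified the same way (again a small Python sketch of my own, using the values reported above):

```python
kurtosis, se_kurtosis, t_crit = 7.78, 0.068, 1.960

moe = t_crit * se_kurtosis                       # .133
lower, upper = kurtosis - moe, kurtosis + moe    # 95% CI bounds
t = kurtosis / se_kurtosis                       # calculated kurtosis t-value

print(round(lower, 3), round(upper, 3), round(t, 2))
```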
In the annual earnings example, the standard error of the mean corresponded to 420.51 (i.e.,
$30,396/√5225). Thus, the standard error of the median was estimated at 420.51 * 1.253 =
$526.90. Unlike the standard error of the mean, the accuracy of the standard error of the median
is dependent upon the presence of normally distributed data. As the distribution becomes
progressively less normal, the standard error of the median formula will yield progressively
less accurate estimates. Consequently, I would not place much confidence in the estimated
standard error of the median associated with the annual earnings data, in this case.
The fact that the standard error of the median is always approximately 25% larger
than the standard error of the mean implies that estimates based on the median will be less
precise. I suspect that this is one of the reasons that so many statistical analyses are based on
the mean, rather than the median.
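The 1.253 multiplier is √(π/2) ≈ 1.2533, which makes the relationship easy to script (a sketch of my own, using the annual-earnings figures above). Note that the exact multiplier returns a value a shade above the hand-rounded $526.90 in the text:

```python
from math import pi, sqrt

sd, n = 30396, 5225                      # annual earnings example
se_mean = sd / sqrt(n)                   # ≈ 420.51
se_median = sqrt(pi / 2) * se_mean       # ≈ 25% larger than the SE of the mean

print(round(se_mean, 2), round(se_median, 2))
```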
Summary
This was a relatively tough chapter, as several important concepts were introduced. In
particular, standard error, the theoretical normal distribution, and confidence intervals.
Additionally, I demonstrated how to conduct an inferential statistical analysis: the one-sample
t-test. You will likely find that the jump from Chapter 2 to Chapter 3 is one of the largest in
this textbook. Consequently, it is expected to take some time to understand thoroughly.
Advanced Topics
Effect Size
The Foundations section of this chapter focused upon the calculation of standard
errors, confidence intervals, and p-values associated with point-estimates. Such results are
very important in the context of inferential statistics. However, most inferential statistics are
complemented usefully with the inclusion of an effect size estimate. The concept and
application of effect size is so important that it arguably should have been introduced in the
Foundations section of this chapter. However, I discuss effect size so commonly throughout
this textbook that I considered it more prudent to restrict the focus upon the key inferential
concepts introduced in the Foundations section of this chapter.
An effect size represents the magnitude of an effect. Calculating and interpreting
effect sizes is useful because a statistical result can be found to be statistically significant;
however, that does not necessarily imply that the result is important or large.
In the case of the one sample t-test, researchers almost always calculate the effect size
with a formula known as Cohen's d (Watch Video 3.18: Cohen's d Explained).⁷ In the context of
a one-sample t-test, Cohen’s d may be formulated as:
Cohen's d = (X̄ – μ) / SD   (7)
where X̄ corresponds to the sample mean, μ corresponds to the non-sample mean, and SD
corresponds to the sample standard deviation. With respect to the one-sample t-test I
reported in the Foundations section of this chapter, the Cohen’s d was equal to:
Cohen's d = (7.31 – 8.00) / 1.06 = -.69 / 1.06 = -.65
Thus, in this example, the magnitude of the difference between the sample mean (7.31) and
my long-held impression of 8.00 hours per night, expressed as a standardized effect size, was
-.65.
The value of Cohen’s d can be interpreted as a mean difference in standard deviation
units. Consequently, it may be said that the average amount of sleep people get per day is .65
of a standard deviation less than my long-held belief of 8.00 hours. It should be said that
whether Cohen’s d is negative or positive is arbitrary, as it depends on which value you specify
first or second in the numerator of formula (7). Had I subtracted 7.31 from 8.00 (8.00 – 7.31),
Cohen’s d would have been estimated at .65. In such a context, my long-held impression would
be considered .65 of a standard deviation larger than the sample mean.
Finally, Cohen (1988; 1992) provided the following guidelines for interpreting d values:
small = |.20|, medium = |.50| and large = |.80|. Thus, in this one-sample t-test example, the
⁷ Some people refer to this effect size indicator as Hedges' g.
Cohen’s d value of -.65 is somewhere between a medium and large effect. So, my impression
of 8.00 hours a night was fairly off the mark, but not terribly so.
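Formula (7) and Cohen's guidelines can be bundled into a few lines (a Python sketch of my own; the labels implement the |.20|/|.50|/|.80| cut-offs just described):

```python
def cohens_d(sample_mean, non_sample_mean, sd):
    """Cohen's d for a one-sample design -- formula (7)."""
    return (sample_mean - non_sample_mean) / sd

def label(d):
    """Cohen's (1988; 1992) verbal guidelines for |d|."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"

d = cohens_d(7.31, 8.00, 1.06)
print(round(d, 2), label(d))   # -0.65 and 'medium' under the cut-offs
```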
⁸ Keep in mind that each time I draw 10 cases from the population, I put them back into the
population, so that each random sample is drawn from a population of 100,000. Such a
sampling procedure is known as random sampling with replacement.
the specification of alpha = .05 implicitly acknowledges that we will make mistakes, but not in
more than 5% of the samples (give or take). In the context of this simulation, a one-sample
t-test that yielded a p < .05 should be considered a type I error. A type I error occurs when a
decision to reject the null hypothesis has been made, but the null hypothesis is true. I discuss
type I errors in greater detail in chapter 7 (Watch Video 3.19: Monte Carlo One-Sample t-Test).
Another way to think about the one-sample t-test is to estimate the 95% confidence intervals associated with
the sample mean, rather than estimate the 95% confidence intervals associated with the
difference between the sample mean and the non-sample mean (as performed in the
Foundations section of this chapter). When the lower-bound and upper-bound 95% confidence
intervals associated with the sample mean (e.g., 7.31) do not intersect with the non-sample
mean (e.g., 8.00), it necessarily implies that the conventional one-sample t-test will yield a p-
value that is less than .05.
To illustrate my point, I estimated the standard error of the mean for the sample mean
of 7.31, based on SD = 1.06 and N = 100. The standard error of the mean corresponded to .106
(i.e., 1.06 / √100). The 95% confidence interval multiplier was calculated at .210 (i.e., .106 *
1.984). Thus, the 95% confidence intervals associated with the sample mean of 7.31 (N = 100)
corresponded to:
95%CI Lower-Bound: 7.31 - .210 = 7.100
95%CI Upper-Bound: 7.31 + .210 = 7.520
As the upper-bound 95% confidence interval did not intersect with the non-sample
mean (i.e., 8.00), it necessarily implies that there was a statistically significant difference
between the sample mean and the non-sample mean, p < .05 (via one-sample t-test). Had the
upper-bound 95% confidence interval been estimated at 7.99 (rather than 7.520), the corresponding
one-sample t-test p-value would have been .049. So, just barely under .05. By contrast, had the
upper-bound 95% confidence interval been estimated at 8.01 (rather than 7.520), the corresponding
one-sample t-test p-value would have been .051. So, just over .05.
In summary, there are three equivalent methods to test the difference between a
sample mean and a non-sample mean with specified alpha = .05: (1) calculate the 95%
confidence intervals associated with the numerical difference between the sample mean and
the non-sample mean; (2) calculate the t-value ratio: the difference between the sample mean
and the non-sample mean divided by the standard error of the mean; and (3) calculate the
95% confidence intervals associated with the sample mean (Watch Video 3.21: Three
approaches to the one-sample t-test).
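The equivalence of the three methods is easy to demonstrate in a few lines (my sketch, not the book's; 1.984 is the =T.INV.2T(0.05, 99) critical value used throughout the N = 100 example):

```python
from math import sqrt

sample_mean, mu, sd, n, t_crit = 7.31, 8.00, 1.06, 100, 1.984

se = sd / sqrt(n)
moe = t_crit * se
diff = sample_mean - mu

# (1) 95% CI around the mean difference excludes zero?
method1 = not (diff - moe <= 0 <= diff + moe)
# (2) calculated t-value exceeds the critical t-value?
method2 = abs(diff / se) > t_crit
# (3) 95% CI around the sample mean excludes the non-sample mean?
method3 = not (sample_mean - moe <= mu <= sample_mean + moe)

print(method1, method2, method3)   # all three reach the same decision
```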
Practice Questions
2. The correspondence between standard error estimation from re-sampling and formula
In the Foundations section of this chapter, I reported the standard deviations
associated with 10 re-sampled means for sample sizes from N = 5 to N = 5,000 (see Table
C3.2). I specified only 10 re-samples for the sake of simplicity. Of course, the estimation of a
standard error from just 10 re-samples is not great. For practice, estimate 100 sample means
with N = 5 from the population data file. Compare the standard deviation associated with your
100 sample means against the standard deviation I obtained and reported in Table C3.2 (Data
File: sleep_hours_100_thousand) (Watch: 100 Resampled Means N = 5).
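As a starting point for this exercise, the re-sampling loop might look like the sketch below. The population here is simulated (normal, M = 7.31, SD = 1.06) purely as a stand-in for the sleep_hours_100_thousand data file, so the exact numbers are illustrative assumptions:

```python
import random
from statistics import mean, stdev

random.seed(42)
# hypothetical population standing in for the data file
population = [random.gauss(7.31, 1.06) for _ in range(100_000)]

# 100 re-sampled means, each based on N = 5
sample_means = [mean(random.sample(population, 5)) for _ in range(100)]

# the SD of the 100 re-sampled means approximates the standard error;
# the formula value would be 1.06 / sqrt(5), roughly .47
print(round(stdev(sample_means), 3))
```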
than the real photo (check out page 174 of the study). Penton-Voak et al. (2007) then had each
person specify which of the seven photos was the most accurate likeness of their partner. In this
case, a score of zero represented a correct identification, whereas scores of +1, +2, and +3
represented progressively more attractive images of the partner, and scores of -1, -2, and -3
represented progressively less attractive images of the partner. If people are not biased in
their perception of their partners, the mean should not be statistically significantly different
from 0. By contrast, a positive bias would imply a statistically significant positive mean. I
simulated some data to correspond closely to the results reported by Penton-Voak et al. (2007)
(Data File: bias_attraction). Answer the following questions (Watch: Attraction Bias):
Whether people, on average, develop positively biased perceptions of their partners over
time, or whether the bias is present when they first meet is anybody’s guess. What do you
think?
1. What was the mean and standard deviation associated with the student ratings of
their humor ability?
2. Was the mean statistically significantly different from the 50th percentile?
References
Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E. J. (2014). Robust
misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21(5), 1157-
1164.
Kripke, D. F., Garfinkel, L., Wingard, D. L., Klauber, M. R., & Marler, M. R. (2002). Mortality
associated with sleep duration and insomnia. Archives of General Psychiatry, 59(2),
131-136.
Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: how difficulties in recognizing
one's own incompetence lead to inflated self-assessments. Journal of Personality and
Social Psychology, 77(6), 1121-1134.
Oakes, M. (1986). Statistical inference: A commentary for the social and behavioural sciences.
New York: Wiley.
Penton-Voak, I. S., Rowe, A. C., & Williams, J. (2007). Through rose-tinted glasses: Relationship
satisfaction and representations of partners’ facial attractiveness. Journal of
Evolutionary Psychology, 5(1), 169-181.
Riza, S. D., Ganzach, Y., & Liu, Y. (2016). Time and job satisfaction: A longitudinal study of
the differential roles of age and tenure. Journal of Management, 0149206315624962.