The t (Student) Distribution

Applied Statistics and Probability


Ira M. Anjasmara

Program Studi Magister Geomatika

Teknik Geomatika - ITS
Small Samples

Recall the central limit theorem:

the sampling distribution of the mean is approximately normal when the
sample size is large (as a rule of thumb, n ≥ 30).
Often, our sample size will be less than 30. This means:
we can no longer rely on the central limit theorem;
we can’t use normal probability tables;
we can’t approximate σ by s, because σ ≈ s holds for large n, but not
for small n.
For small samples we have to consider other methods of estimation and
hypothesis testing.

The t Distribution
The t distribution was developed in the early 1900s by William Gosset, who
worked for Guinness Breweries. He published under the pseudonym “Student”.

The t distribution is concerned with small samples (n < 30) drawn from a
population that has a normal distribution.

Importantly, we don’t need to know the value of σ.

Unlike the normal distribution, there is a unique t distribution for each
sample size. Each t distribution:
has a mean of zero;
is symmetrical about the mean;
has a variance greater than 1 (cf. the standard normal distribution,
where σ² = 1);
is less peaked at the mean, and has thicker tails than the normal
distribution.

As the sample size increases, the t distribution approximates to the normal
distribution, until at n = 30, the two are almost visually indistinguishable.

As with all probability density functions, the total area beneath the t-curve
is 1.

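The probability density function for the t distribution is:

$$ f(x, \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\,(\pi\nu)^{1/2}} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}} $$

where Γ is the “gamma function” and ν is the number of degrees of freedom.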
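The density above can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function names (`t_pdf`, `normal_pdf`, `simpson`) and the integration limits are assumptions made for illustration, not part of the lecture. It checks numerically that the area under the curve is 1 and that the t curve is lower at the mean than the standard normal curve.

```python
import math

def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom at x."""
    # lgamma is used instead of gamma to avoid overflow for large nu
    log_coeff = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coeff = math.exp(log_coeff) / math.sqrt(math.pi * nu)
    return coeff * (1 + x * x / nu) ** (-(nu + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def simpson(f, a, b, n=6000):
    """Composite Simpson's rule with n (even) sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Area under the curve for nu = 10. The limits +/-60 are an assumption:
# the tails beyond them contribute a negligible amount for this nu.
area = simpson(lambda x: t_pdf(x, 10), -60.0, 60.0)

# Peak heights at the mean: t curve (nu = 5) versus standard normal curve
peak_t5 = t_pdf(0.0, 5)
peak_normal = normal_pdf(0.0)

print(area, peak_t5, peak_normal)
```

Increasing `nu` in `t_pdf(0.0, nu)` moves the peak height towards that of the normal curve, which is the convergence described above.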

Degrees of Freedom

When using the t distribution, we need to know the degrees of freedom
(ν) of the sample.

Technically, degrees of freedom are the difference between the number of
observations (sample size) and the number of independent parameters
estimated from that sample.
From a sample of size n we estimate the standard deviation, s, and
the mean, x̄.
So, the number of independent parameters estimated = 1 (because
the standard deviation is dependent on the mean).
ν = n − 1
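One way to see why a degree of freedom is lost: once the mean has been estimated, the deviations from it must sum to zero, so only n − 1 of them are free to vary. The sketch below illustrates this with a small sample whose values are invented purely for illustration.

```python
import math
import statistics

# Hypothetical sample, invented purely for illustration
sample = [12.1, 11.8, 12.4, 12.0, 11.7]
n = len(sample)

mean = sum(sample) / n
deviations = [x - mean for x in sample]

# The deviations sum to zero, so the last one is fixed by the other n - 1
last_from_others = -sum(deviations[:-1])

nu = n - 1  # degrees of freedom
# Sample standard deviation, dividing by nu rather than n
s = math.sqrt(sum(d * d for d in deviations) / nu)

# statistics.stdev also divides by n - 1, so the two values should agree
print(nu, s, statistics.stdev(sample))
```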

As mentioned, the number of degrees of freedom dictates the shape of the
t-curve. Here are some curves of different ν:

Probability Estimation

The procedure for estimating probabilities from the t distribution is similar
to normal distribution estimation, but we use a different table.

The t distribution table shows the t-score corresponding to a particular
area (from a choice of 5) in the upper tail:
these scores are shown for different degrees of freedom;
they are usually denoted $t_{\nu,\alpha}$, showing their dependence on ν and α.

Note: as mentioned, as the number of degrees of freedom increases, the t
distribution approximates to a normal distribution:
i.e., for ν = ∞, the t-score = z-score.

The t Table
The tables for the t distribution look something like this:

The numbers in the first column give the degrees of freedom; the numbers
in the first row represent the area of the upper tail.

The numbers in the main body of the table give the t-score corresponding
to those particular values of ν and α, i.e., $t_{\nu,\alpha}$.

The highlighted value in the table gives the t-score for 14 degrees of
freedom, and an area in the upper tail of 0.05 (5%), i.e. $t_{14,0.05} = 1.761$.
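Where a printed table is not to hand, a tabulated value can be reproduced numerically. The sketch below is one possible approach under stated assumptions, not the method used to build the tables: it integrates the t density with Simpson's rule and then searches by bisection for the t-score that leaves an area α in the upper tail. The function names are hypothetical.

```python
import math

def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom at x."""
    log_coeff = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coeff = math.exp(log_coeff) / math.sqrt(math.pi * nu)
    return coeff * (1 + x * x / nu) ** (-(nu + 1) / 2)

def upper_tail_area(t, nu):
    """P(T > t), from Simpson's rule on [0, t] and the symmetry of the curve."""
    if t == 0:
        return 0.5
    if t < 0:
        return 1.0 - upper_tail_area(-t, nu)
    n = 2 * max(100, math.ceil(50 * t))  # even count, step of at most 0.01
    h = t / n
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

def t_critical(nu, alpha):
    """t-score leaving an area alpha (0 < alpha < 0.5) in the upper tail."""
    if not 0 < alpha < 0.5:
        raise ValueError("alpha must lie strictly between 0 and 0.5")
    lo, hi = 0.0, 1.0
    while upper_tail_area(hi, nu) > alpha:  # widen until the root is bracketed
        hi *= 2
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2
        if upper_tail_area(mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Reproduce the highlighted table entry: 14 degrees of freedom, area 0.05
print(round(t_critical(14, 0.05), 3))
```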

Hypothesis Testing

The small sample procedure for hypothesis testing about a population
mean follows the same 8-step procedure as for a normal distribution, except:
we use $t_{\nu,\alpha}$ in place of $z_{\alpha}$ as the critical value;
we use the following test statistic:
$$ t = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{\bar{x} - \mu}{s/\sqrt{n}} $$

A manufacturer of calculator batteries claims his batteries have an average
life of 500 hours. From a test of 25 batteries, the mean life was 518 hours
with a standard deviation of 40 hours. At the 0.05 level of significance, is
the average battery life at least 500 hours?

Take 500 hrs as the population mean.

We therefore want to test whether the sample mean from the new data
(518) indicates that this value is too low.
We have: µ = 500, x̄ = 518, s = 40, n = 25, α = 0.05.

Step 1

Formulate alternative hypothesis: Ha : µ > 500

i.e., test whether the true population mean is actually more than the
established value.
Formulate null hypothesis: H0 : µ ≤ 500
i.e., assume the given population mean is correct, and the sample
data are misleading.

Step 2

Determine number of tails.

This is a 1-tailed test, because the alternative hypothesis is directional
(µ > 500), so the null hypothesis is an inequality.

Step 3

Determine level of significance and degrees of freedom:

We are told that the significance level is α = 0.05.
From n = 25, we get ν = 25 − 1 = 24.

Step 4

Determine the critical value of t:

We have a 1-tailed test, so we need to find $t_{\nu,\alpha} = t_{24,0.05}$.
From the t distribution table, we have:
$t_{24,0.05} = 1.711$

Step 5

Determine the rejection region:

The null hypothesis will be rejected if the data support µ > 500.

Since we are testing µ > 500, we are in the RHS of the t curve, therefore
the rejection region is t > 1.711.

Step 6
Determine the test statistic (t-score) from the sample data:
$$ t = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{518 - 500}{40/\sqrt{25}} = 2.25 \tag{1} $$

Step 7
Compare the test statistic against its critical value: 2.25 > 1.711, therefore
t, and hence x̄, the sample mean, do lie in the rejection region.
Hence, we reject H0 at the 0.05 significance level.

Step 8
Our sample measurement is incompatible with the null hypothesis (µ ≤ 500)
at the 0.05 significance level (95% confidence). It therefore follows that
the mean battery life is greater than, and hence at least, 500 hours at this
level.
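The arithmetic of Steps 6 and 7 can be sketched in a few lines. The function name is hypothetical, and the critical value is simply copied from the table lookup in Step 4 rather than computed.

```python
import math

def one_sample_t_statistic(sample_mean, mu0, s, n):
    """t-score of a sample mean against a hypothesised population mean mu0."""
    standard_error = s / math.sqrt(n)  # s_xbar = s / sqrt(n)
    return (sample_mean - mu0) / standard_error

# Battery example: mu = 500, x-bar = 518, s = 40, n = 25
t_stat = one_sample_t_statistic(518, 500, 40, 25)

t_crit = 1.711  # t_{24,0.05}, taken from the t table (Step 4)

# Upper-tail test: reject H0 when the t-score falls in the region t > t_crit
reject_h0 = t_stat > t_crit

print(t_stat, reject_h0)  # 2.25 True
```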

Confidence Intervals

As with the normal distribution, we can estimate a confidence interval for
the population mean from a small sample, without knowing the population
standard deviation σ.

For a significance level α:

$$ \mathrm{CI} = \bar{x} \pm t_{\nu,\alpha/2}\, s_{\bar{x}} $$

Now, because we are using small samples, we can no longer approximate
the population variance by the sample variance, and must use
$s_{\bar{x}} = s/\sqrt{n}$ in place of $\sigma_{\bar{x}}$.

$t_{\nu,\alpha/2}$ is the critical t value, providing an area of α/2 in the
upper tail of a t distribution with ν = n − 1 degrees of freedom.

An angle is measured 20 times with a mean of 30° 0′ 12.5″ and a standard
deviation of 4.1″. Develop a 95% confidence interval estimate for the true
value of the mean (µ).

At 95% confidence, α = 0.05. Since n = 20, we have ν = 19. From the
tables, $t_{\nu,\alpha/2} = t_{19,0.025} = 2.093$. Therefore,

$$ \mathrm{CI} = 30^\circ\, 0'\, 12.5'' \pm 2.093 \times \frac{4.1''}{\sqrt{20}} = 30^\circ\, 0'\, 12.5'' \pm 1.92'' $$

So we are 95% confident that the true angle lies between 30° 0′ 10.58″ and
30° 0′ 14.42″.
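The interval for the angle example can be checked with a short sketch. Only the seconds part of the angle is carried through the calculation, since the degrees and minutes are unaffected; the function name is hypothetical and the critical value is the tabulated one.

```python
import math

def t_confidence_interval(mean, s, n, t_crit):
    """Return (lower, upper, half_width) of mean +/- t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width, half_width

# Angle example: seconds part 12.5", s = 4.1", n = 20
t_crit = 2.093  # t_{19,0.025}, taken from the t table
lower, upper, half_width = t_confidence_interval(12.5, 4.1, 20, t_crit)

# Limits in arc-seconds, to be read as 30 deg 0' plus these values
print(round(half_width, 2), round(lower, 2), round(upper, 2))
```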

P-Values

In contrast to the normal distribution, determining P-values for the
t distribution is hard. This is because of the contrasting arrangement of
the normal and t tables.

Remember that in order to determine a P-value with the normal
distribution, you need to first find the z-score from the data, and then use
the table to find a corresponding probability (upper- or lower-tail area).
That is, the normal tables are arranged so that the probability is given as a
function of z-score.

In contrast, the t tables are arranged with t-score given as a function of
probability (i.e., upper-tail area). Importantly, only a few values of
probability (upper-tail area) are given. So once you have worked out your
t-score from the data, it is almost impossible to work back through the t
table to find the corresponding upper- or lower-tail area.


In order to work out P-values from the t distribution, you need to use a
computer program. Fortunately, a search of the Web will provide many
websites to do this, e.g.: tables/statistics tables.html

Alternatively, Microsoft Excel has the function TDIST to work out
P-values for the t distribution, where:

$$ p(t > t_0) = \mathrm{TDIST}(t_0, \nu, n_t) $$

for some numerical value $t_0$, and $n_t$ tails (1 or 2). With $n_t = 2$ the
function returns the two-tailed probability $p(|t| > t_0)$.

A certain quantity has an accepted value of 645. A new experiment of 20
observations finds that the sample mean is 655 with a sample standard
deviation of 20. What is the P-value for these data?

$$ t = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{655 - 645}{20/\sqrt{20}} = 2.236 $$

Using Excel (or the website shown above), we find:

$$ P = p(\bar{x} \ge 655) = p(t \ge 2.236) = \mathrm{TDIST}(2.236, 19, 1) = 0.0188 $$
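If neither Excel nor a website is available, the upper-tail area can also be approximated by integrating the t density numerically. The following is a minimal sketch under that assumption, using Simpson's rule and only the Python standard library; it is not how TDIST is implemented, and the function names are hypothetical.

```python
import math

def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom at x."""
    log_coeff = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coeff = math.exp(log_coeff) / math.sqrt(math.pi * nu)
    return coeff * (1 + x * x / nu) ** (-(nu + 1) / 2)

def upper_tail_p(t, nu):
    """One-tailed P-value p(T >= t), by Simpson's rule on [0, |t|]."""
    if t == 0:
        return 0.5
    if t < 0:
        return 1.0 - upper_tail_p(-t, nu)  # symmetry about zero
    n = 2 * max(100, math.ceil(50 * t))  # even count, step of at most 0.01
    h = t / n
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

# Example: accepted value 645, x-bar = 655, s = 20, n = 20
n_obs = 20
t_stat = (655 - 645) / (20 / math.sqrt(n_obs))
p_one_tailed = upper_tail_p(t_stat, n_obs - 1)
p_two_tailed = 2 * p_one_tailed  # corresponds to using 2 tails in TDIST

print(round(t_stat, 3), p_one_tailed, p_two_tailed)
```

The one-tailed result should be close to the value quoted from Excel; small differences in the last digit can arise from rounding the t-score to 2.236 before the lookup.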

Summary of Means Testing

Although the crucial question you have to ask is “is my sample size less
than 30?”, it can be seen that sometimes this requirement is not enough
for use of the t distribution.

This is because the t distribution requires that the population from which
the small sample was drawn be normally distributed:
if it is, then you can use the t distribution;
if it isn’t, you can’t, and must increase the sample size to ≥ 30 and
then use the normal distribution.

