Download as pdf or txt
Download as pdf or txt
You are on page 1of 6

The Minitab Blog

Search: Search for a blog post:

What Are Degrees of Freedom in Statistics?


Minitab Blog Editor . 08 April, 2016
Tweet Share Like 22 Share

About a year ago, a reader asked if I could try to explain degrees of freedom in
statistics. Since then, I’ve been circling around that request very cautiously, like
it’s some kind of wild beast that I’m not sure I can safely wrestle to the ground.

Degrees of freedom aren’t easy to explain. They come up in many different


contexts in statistics—some advanced and complicated. In mathematics, they're
technically defined as the dimension of the domain of a random vector.

But we won't get into that. Because degrees of freedom are generally not
something you need to understand to perform a statistical analysis—unless you’re
a research statistician, or someone studying statistical theory.

And yet, enquiring minds want to know. So for the adventurous and the curious,
here are some examples that provide a basic gist of their meaning in statistics.

The Freedom to Vary


First, forget about statistics. Imagine you’re a fun-loving person who loves to wear hats. You couldn't care less what a degree
of freedom is. You believe that variety is the spice of life.

Unfortunately, you have constraints. You have only 7 hats. Yet you want to wear a different hat every day of the week.

On the first day, you can wear any of the 7 hats. On the second day, you can choose from the 6 remaining hats, on day 3 you
can choose from 5 hats, and so on.

When day 6 rolls around, you still have a choice between 2 hats that you haven’t worn yet that week. But after you choose
your hat for day 6, you have no choice for the hat that you wear on Day 7. You must wear the one remaining hat. You had 7-1
= 6 days of “hat” freedom—in which the hat you wore could vary!

That’s kind of the idea behind degrees of freedom in statistics. Degrees of freedom are often broadly defined as the number
of "observations" (pieces of information) in the data that are free to vary when estimating statistical parameters.

Degrees of Freedom: 1-Sample t test


Now imagine you're not into hats. You're into data analysis.
Al utilizar este sitio, usted acepta el uso de cookies para efectos de análisis y contenido
You have a data set with 10 values. If you’re not estimating anything, each value can take on any number,Aceptar
right? Each value
personalizado. Leer nuestra política
is completely free to vary. /
But suppose you want to test the population mean with a sample of 10 values, using a 1-sample t test. You now have a
The Minitab
constraint—the estimation of the mean. What is that constraint, BlogBy definition of the mean, the following relationship
exactly?
must hold: The sum of all values in the data must equal n x mean, where n is the number of values in the data set.
Search:
So if a data set has 10 values, the sum of the 10 values must equal the mean x 10. If the mean of the 10Search
valuesfor a blog
is 3.5 post:
(you
could pick any number), this constraint requires that the sum of the 10 values must equal 10 x 3.5 = 35.

With that constraint, the first value in the data set is free to vary. Whatever value it is, it’s still possible for the sum of all 10
numbers to have a value of 35. The second value is also free to vary, because whatever value you choose, it still allows for
the possibility that the sum of all the values is 35.

In fact, the first 9 values could be anything, including these two examples:

34, -8.3, -37, -92, -1, 0, 1, -22, 99


0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9

But to have all 10 values sum to 35, and have a mean of 3.5, the 10th value cannot vary. It must be a specific number:

34, -8.3, -37, -92, -1, 0, 1, -22, 99 -----> 10TH value must be 61.3
0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 ----> 10TH value must be 30.5

Therefore, you have 10 - 1 = 9 degrees of freedom. It doesn’t matter what sample size you use, or what mean value you use
—the last value in the sample is not free to vary. You end up with n - 1 degrees of freedom, where n is the sample size.

Another way to say this is that the number of degrees of freedom equals the number of "observations" minus the number of
required relations among the observations (e.g., the number of parameter estimates). For a 1-sample t-test, one degree of
freedom is spent estimating the mean, and the remaining n - 1 degrees of freedom estimate variability.

The degrees for freedom then define the specific t-distribution that’s used to calculate the p-values and t-values for the t-test.

Notice that for small sample sizes (n), which correspond with smaller degrees of freedom (n - 1 for the 1-sample t test), the t-
distribution has fatter tails. This is because the t distribution was specially designed to provide more conservative test results
Al utilizar este sitio, usted acepta el uso de cookies para efectos de análisis y contenido
when analyzing small samples Aceptarof degrees of
(such as in the brewing industry). As the sample size (n) increases, the number
personalizado. Leer nuestra política
freedom increases, and the t-distribution approaches a normal distribution. /
Degrees of Freedom: Chi-Square Test of Independence
The Minitab Blog
Let's look at another context. A chi-square test of independence is used to determine whether two categorical variables are
dependent. For this test, the degreesSearch:
of freedom are the number of cells in the two-way table of the categorical
Search variables
for a blog that
post:
can vary, given the constraints of the row and column marginal totals.So each "observation" in this case is a frequency in a
cell.

Consider the simplest example: a 2 x 2 table, with two categories and two levels for each category:

Category A Total

? 6
Category B
15

Total 10 11 21

It doesn't matter what values you use for the row and column marginal totals. Once those values are set, there's only one cell
value that can vary (here, shown with the question mark—but it could be any one of the four cells). Once you enter a number
for one cell, the numbers for all the other cells are predetermined by the row and column totals. They're not free to vary. So
the chi-square test for independence has only 1 degree of freedom for a 2 x 2 table.

Similarly, a 3 x 2 table has 2 degrees of freedom, because only two of the cells can vary for a given set of marginal totals.

Category A Total

? ? 15
Category B
15

Total 10 11 9 30

If you experimented with different sized tables, eventually you’d find a general pattern. For a table with r rows and c columns,
the number of cells that can vary is (r-1)(c-1). And that’s the formula for the degrees for freedom for the chi-square test of
independence!

The degrees of freedom then define the chi-square distribution used to evaluate independence for the test.

Al utilizar este sitio, usted acepta el uso de cookies para efectos de análisis y contenido
Aceptar
personalizado. Leer nuestra política
/
The Minitab Blog

Search: Search for a blog post:

The chi-square distribution is positively skewed. As the degrees of freedom increases, it approaches the normal curve.

Degrees of Freedom: Regression


Degrees of freedom is more involved in the context of regression. Rather than risk losing the one remaining reader still
reading this post (hi, Mom!), I'll cut to the chase.

Recall that degrees of freedom generally equals the number of observations (or pieces of information) minus the number of
parameters estimated. When you perform regression, a parameter is estimated for every term in the model, and and each
one consumes a degree of freedom. Therefore, including excessive terms in a multiple regression model reduces the
degrees of freedom available to estimate the parameters' variability. In fact, if the amount of data isn't sufficient for the
number of terms in your model, there may not even be enough degrees of freedom (DF) for the error term and no p-value or
F-values can be calculated at all. You'll get output something like this:

If this happens, you either need to collect more data (to increase the degrees of freedom) or drop terms from your model (to
reduce the number of degrees of freedom required). So degrees of freedom does have real, tangible effects on your data
analysis, despite existing in the netherworld of the domain of a random vector.

Follow-up
Al utilizar este sitio, usted acepta el uso de cookies para efectos de análisis y contenido
Aceptar
personalizado. Leer nuestra política
/
This post provides a basic, informal introduction to degrees of freedom in statistics. If you want to further your conceptual
The Minitab
understanding of degrees of freedom, check out this classic paper inBlog
the Journal of Educational Psychology by Dr. Helen
Walker, an associate professor of education at Columbia who was the first female president of the American Statistical
Association. Another good general reference
Search: is by Pandy, S., and Bright, C. L., Social Work Research Vol 32, number 2, June
Search for a blog post:
2008, available here.

You Might Also Like:


To Insights and Beyond: The 5 Key Challenges & Learnings You Need to Know from 2019 Minitab Insights Conference
5 Quick Minitab Tricks Even Some of the Experts Don't Know
Want to Learn How to Create Good Wine? Let's Pour Over the Details
Fewer X-Ray Errors Reduce Cancer Risk, Wait Time and Costs

Posted In
Statistics
Data Analysis
Fun Statistics
Quality Improvement
Statistics Help
see all

Subscribe
First name

Last name

Work / Academic Email*

Country/Territory/Region
Al utilizar este sitio, usted acepta el uso de cookies para efectos de análisis y contenido
Please Select Aceptar
personalizado. Leer nuestra política
/
Which best describes you?
The Minitab Blog
Please Select

Search: events, webinars, newsletters, products or services. You can


We will contact you about Minitab resources, Search for a blog
unsubscribe post:
and
manage your communication preferences at any time, as outlined in our privacy policy.

Subscribe

Who We Are
Minitab is the leading provider of software and services for quality improvement and statistics education. More than 90% of
Fortune 100 companies use Minitab Statistical Software, our flagship product, and more students worldwide have used
Minitab to learn statistics than any other package.

Minitab LLC. is a privately owned company headquartered in State College, Pennsylvania, with subsidiaries in Chicago, San
Diego, United Kingdom, France, Germany, Australia and Hong Kong. Our global network of representatives serves more than
40 countries around the world.

Visit Us at Minitab.com
Legal | Privacy Policy | Terms of Use | Trademarks

© 2019 Minitab, LLC. All rights reserved.

Authors
Agnes Ogee and Mark Ellis
Bruno Scibilia
Bruno Scibilia and Mark Ellis
Cheryl Pammer
Cody Steele

Al utilizar este sitio, usted acepta el uso de cookies para efectos de análisis y contenido
Aceptar
personalizado. Leer nuestra política
/

You might also like