Hypothesis Testing (T-Test and ANOVA)


1/27/21

ADVANCED DATA ANALYSIS IN PSYCHOLOGY
LECTURE #3: HYPOTHESIS TESTING: T-TESTS & ANOVA

ANNOUNCEMENTS
The Psychological Health Society is organizing a Research Outreach Workshop on Thursday, January 28th. The workshop will highlight critical email-writing skills to help students successfully secure research opportunities. All attendees will also receive a free booklet with a directory of all psychology professors, their emails, and lab websites.

Link: https://www.eventbrite.ca/e/research-outreach-workshop-tickets-136754571605

ANNOUNCEMENTS
Unrelated to the course: Are you interested in paid research?
• The aim of this research is to understand human cognitive processing and decision making, for my undergraduate thesis study at the University of Toronto.
• Time: ~1.5 hours
• Compensation: base pay of $10/hr + a possible extra payment of up to $20 based on decision making
• Eligibility: between 18 and 35 years of age
• Contact: aa.thesis.study@outlook.com

COURSE ANNOUNCEMENTS
• The written Midterm Exam will take place on March 6th from 1–4 p.m. Oral Exams will take place March 2nd to 5th; sign-ups for the oral exams will be open February 8–22.
• The midterm exam procedures document will be posted to Quercus soon. The exam procedures quiz will be due on Feb 24th.
• Assignment 1 will be posted today.
  • Submission #1 is due on February 11th.
  • TAs will grade and provide detailed feedback by Feb 21st.
  • Submission #2 is due on March 8th.
• To help keep you organized, I have added calendars for this class. They can be found on the front page of our course.

NULL HYPOTHESIS SIGNIFICANCE TESTING (NHST)
Null hypothesis (H0) testing provides researchers with a framework to evaluate their research. Formulating H0 is your first step, regardless of the type of research question, the type of data collected, or the statistical test used.

Is there sufficient evidence to support (indirectly*) the researcher's actual hypothesis (H1)?

*"Indirectly" because the support is based on testing and rejecting H0, not H1, technically.

HYPOTHESIS TESTING: P < .05
H0: There is no difference (μTreatment = μ).
P(results | H0)?
[Graphic: sampling distribution of the mean under H0 — there is a high probability of finding means that fall in the central area, and a low probability of finding means that fall in the tail areas. Reproduced from Gravetter & Wallnau (2007).]

HYPOTHESIS TESTING: P < .05
The default "cut-off" is p = 0.05. This is the alpha criterion, or the level of significance.

If the probability of a difference of the observed magnitude is less than 0.05 (5%), we reject the null hypothesis.
• There is evidence supporting the alternative hypothesis.
• Note: This is not the end, because one experiment never "proves" anything. The alternative hypothesis is never proven true.

If the probability of a difference of the observed magnitude is greater than 0.05 (5%), we fail to reject the null hypothesis. I.e., there is insufficient evidence to reject the null hypothesis.
• There is not enough evidence supporting the alternative hypothesis.

T-TESTS
What is a MAJOR limitation of using z-scores?
Not practical: we often do not know the population parameters.
The fix: t-tests!

Today, we will learn about the differences between a z-test and a t-test, as well as the three types of t-tests that exist:
1. One-sample t-test
2. Independent-samples t-test
3. Paired-samples t-test

When you come back to this slide to study for your exams, ask yourself: Is it better to use a t-test or a z-test, if given the option to choose either? Explain.

ONE-SAMPLE T-TEST
Use a one-sample t-test when you know the true population mean, but are missing the true population standard deviation.

t = (X̄ − μ) / (s / √N)

Look familiar? It's basically the z-test formula (except sigma is replaced by the sample standard deviation):

z = (X̄ − μ) / (σ / √N)

REVIEW: POPULATION VS. SAMPLE VARIANCE & SD
The variance and standard deviation of a population:

σ² = Σ(Xᵢ − μ)² / N        σ = √[Σ(Xᵢ − μ)² / N]

The variance and standard deviation of a sample:

s² = Σ(Xᵢ − X̄)² / (N − 1)        s = √[Σ(Xᵢ − X̄)² / (N − 1)]

[FYI] Why we divide by N − 1 for the unbiased sample variance: https://www.khanacademy.org/math/ap-statistics/summarizing-quantitative-data-ap/more-standard-deviation/v/review-and-intuition-why-we-divide-by-n-1-for-the-unbiased-sample-variance
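The divisor is the only difference between the two formulas. A minimal Python/NumPy sketch (not from the slides; the sample values are made up for illustration) showing the population formula (divide by N) versus the sample formula (divide by N − 1):

import numpy as np

x = np.array([15.4, 12.0, 21.2, 24.3, 24.1])    # hypothetical scores

pop_var    = np.var(x, ddof=0)   # divide by N      (population formula)
sample_var = np.var(x, ddof=1)   # divide by N - 1  (unbiased sample estimate)

print(pop_var, sample_var)                        # sample_var is slightly larger
print(np.std(x, ddof=1), np.sqrt(sample_var))     # SD is the square root of the variance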


ONE-SAMPLE T-TEST
3. Check assumptions. (Finally!)
✓ The dependent variable (dyslexia score) is continuous; the independent variable (musical training) is categorical.
✓ Assume that the observations within the sample are independent.
✓ Assume that the dependent variable (dyslexia score) is approximately normally distributed.
✓ No outliers beyond ± 4 standard deviations.
When you come back to this slide to study for your exams, ask yourself: How might small Ns influence assumptions? What about outliers?

What if you don't have the population mean or population standard deviation?

INDEPENDENT-SAMPLES T-TEST
Use it when you are comparing two sample means, and you are missing the population SD and the population mean.

t = (X̄₁ − X̄₂) / √(s₁²/n₁ + s₂²/n₂)

Why do we use both samples to calculate SD? Estimating the SD is better when you use two samples instead of one.

Degrees of Freedom = N − 2*
*Note: N is the number of scores across both samples. We subtract 2 because there are two samples, and in each sample there is one score that is not free to vary.

INDEPENDENT-SAMPLES T-TEST: UNEQUAL N
What if you have unequal n in your independent-samples t-test? Calculate the pooled variance instead of each sample's variance:

sp² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2)

t = (X̄₁ − X̄₂) / √(sp²/n₁ + sp²/n₂)
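A minimal Python sketch of these formulas (the two hypothetical groups below are mine, not the lecture's data): it computes each group's variance, pools them weighted by their degrees of freedom, and forms the t statistic with df = n₁ + n₂ − 2.

import numpy as np

g1 = np.array([12.1, 15.3, 14.8, 16.0, 13.5, 15.9])   # hypothetical group 1
g2 = np.array([17.2, 16.8, 18.1, 15.5, 19.0])         # hypothetical group 2 (unequal n)

n1, n2 = len(g1), len(g2)
v1, v2 = g1.var(ddof=1), g2.var(ddof=1)               # sample variances (divide by n - 1)

# Pooled variance: each group's variance weighted by its degrees of freedom
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)

t = (g1.mean() - g2.mean()) / np.sqrt(sp2 / n1 + sp2 / n2)
df = n1 + n2 - 2
print(t, df)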


INDEPENDENT-SAMPLES T-TEST
Current theories suggest that a key factor in the reading difficulties experienced by dyslexic children is deficient rhythmic processing (Reifinger, 2019). It is thought, however, that these deficiencies can be improved via musical training. To test this, 20 children with dyslexia were recruited. Ten of these children were enrolled in music lessons, specifically created to develop their motor timing skills, whereas the other ten were used as a control. After completing the musical lessons, the participants were tested using the Lucid Rapid Dyslexia Screening test, which scores reading abilities on a scale of 1–99 (lower scores indicate poorer reading abilities).

Is there evidence that musical lessons influence dyslexia scores?

Scores:
No Training   Musical Training
15.4          17.5
12.0          12.9
21.2          23.4
24.3          26.8
24.1          27.4
14.5          14.1
17.5          19.2
18.1          18.8
20.4          24.0
21.9          23.9

EXAMPLE: HYPOTHESIS TESTING
H0: There is no difference. (μNo Music = μMusic)
μNo Music = 18.94, μMusic = 20.8
[Graphic: distribution of scores in a population without the treatment vs. distribution of scores in a population with the treatment (TREATMENT: Music Lessons).]

HYPOTHESIS TESTING
How far removed does the treatment mean need to be from the original population to be considered significantly different?
You will learn today that it depends. However, we use an alpha criterion of 0.05 (or 5%) as the probability value that is used to define unlikely sample outcomes.

INDEPENDENT-SAMPLES T-TEST
1. Formulate the null and alternative hypotheses:
H0: There is no significant difference in dyslexia scores between children who did versus did not receive musical training. (μNo Music = μMusic)
HA: There is a significant difference in dyslexia scores between children who did versus did not receive musical training. (μNo Music ≠ μMusic)

INDEPENDENT-SAMPLES T-TEST
2. Calculate the descriptive stats:
No Training: X̄ = 18.94;  s² = Σ(Xᵢ − X̄)² / (n − 1) = 17.17;  s = √17.17 = 4.14
Training:    X̄ = 20.8;   s² = Σ(Xᵢ − X̄)² / (n − 1) = 25.66;  s = √25.66 = 5.07

INDEPENDENT-SAMPLES T-TEST
3. Check assumptions.
✓ The dependent variable (dyslexia score) is continuous; the independent variable (musical training) is categorical.
✓ Assume that the observations within and between the samples are independent.
✓ Assume that the dependent variable (dyslexia score) is approximately normally distributed in both samples.
✓ No outliers beyond ± 4 standard deviations in either sample.
+ Assume that the variances across both samples are homogeneous.
When you come back to this slide to study for your exams, ask yourself: How might small Ns influence assumptions? What about outliers?
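If you want to check these numbers, a short NumPy sketch with the scores from the slide reproduces the means, variances, and SDs (18.94, 17.17, 4.14 and 20.8, 25.66, 5.07, up to rounding):

import numpy as np

no_training = np.array([15.4, 12.0, 21.2, 24.3, 24.1, 14.5, 17.5, 18.1, 20.4, 21.9])
training    = np.array([17.5, 12.9, 23.4, 26.8, 27.4, 14.1, 19.2, 18.8, 24.0, 23.9])

for name, g in [("No Training", no_training), ("Training", training)]:
    # mean, sample variance (n - 1), and sample SD for each group
    print(name, g.mean(), g.var(ddof=1), g.std(ddof=1))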

INDEPENDENT-SAMPLES T-TEST
4. Conduct the test to calculate the observed t statistic:

t = (X̄₁ − X̄₂) / √(s₁²/n₁ + s₂²/n₂) = (18.94 − 20.8) / √(17.17/10 + 25.66/10) = −0.90

5. Compare to the critical value (1.96...?); reject or fail-to-reject H0.
HOLD UP.

REVIEW: THE SAMPLING DISTRIBUTION
Just like scores, you can calculate the variability/spread of sample means by sampling, and re-sampling, and re-sampling, and re-sampling...
[Graphic: many samples drawn repeatedly from the same population.]

REVIEW: THE SAMPLING DISTRIBUTION
You can create a histogram/distribution using the sample means:
[Graphic: histogram of the sample means (frequency vs. sample means).]
What could influence this spread? Sample size.
Interactive Demo: http://shocf.atwebpages.com/bors/textbook/chapter5.html

REVIEW: STANDARD ERROR OF THE MEAN
A smaller spread means that your sample's estimate of the true population mean is more efficient.

σX̄ = σ / √N

Interactive Demo: http://shocf.atwebpages.com/bors/textbook/chapter5.html

REVIEW: STANDARD ERROR OF THE MEAN
The larger the sample, the smaller the variability/SEM.
The math checks out: the standard error changes as a function of N. If you keep the standard deviation constant, increasing N will decrease your standard error:

s / √N = 5.07 / √10 = 1.60
s / √N = 5.07 / √25 = 1.01   ← Reduced!

The larger the sample size, the less likely the sample means are to vary.

THE Z DISTRIBUTION
So far, we have been using the standardized z distribution. This distribution is static...
... therefore, the area under the curve is also static...
... therefore, the probabilities associated with the area under the curve are also static...
... therefore, the critical values for a z distribution (±1.64 or ±1.96) are also static.
One size fits all.
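A small simulation (my own illustration, not from the slides, reusing the slide's SD of 5.07) makes the same point: the SD of simulated sample means closely tracks σ/√N and shrinks as N grows.

import numpy as np

rng = np.random.default_rng(0)
sigma = 5.07                      # population SD (value borrowed from the slide)

for n in (10, 25, 100):
    # draw 10,000 samples of size n and record each sample's mean
    means = rng.normal(loc=20.8, scale=sigma, size=(10_000, n)).mean(axis=1)
    print(n, means.std(), sigma / np.sqrt(n))   # empirical SEM vs. sigma / sqrt(N)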


THE T DISTRIBUTION
When we are using a sample to estimate the population standard deviation, we are using the t distribution (also known as "Student's t").
[Photo: William Sealy Gosset, a.k.a. "Student".]
The t distribution will change depending on your sample size.
Remember: The larger the sample size, the smaller the variability will be within the sampling distribution.

CRITICAL VALUES FOR T
The t distribution approximates the normal distribution – the smaller the N, the "fatter" and "flatter" the distribution becomes:
[Graphic: standard normal distribution (z) with critical values at −1.96 and 1.96.]

CRITICAL VALUES FOR T
The t distribution approximates the normal distribution – the smaller the N, the "fatter" and "flatter" the distribution becomes:
[Graphic: normal distribution (z) overlaid with a t distribution (df = 2).]

CRITICAL VALUES FOR T
Compare the area under the curve for an alpha criterion of 0.05: what's the problem?
[Graphic: normal distribution (z) overlaid with a t distribution (df = 2).]
When you come back to this slide to study for your exams, ask yourself: What are the similarities and differences between the normal and t distributions?

CRITICAL VALUES FOR T
The problem is that the area under the curve beyond ±1.96 is not 5% for the t distribution – it's more than 5%.
[Graphic: normal distribution (z) and t distribution (df = 2), both with ±1.96 marked.]
When you come back to this slide to study for your exams, ask yourself: If you didn't adjust the critical values when using a t-test, how would this affect statistical errors?

CRITICAL VALUES FOR T
How can we fix this? Move the critical value/cut-off over, so that the area under the curve becomes 5% for the t distribution.
[Graphic: normal distribution (z) with cut-offs at ±1.96; t distribution (df = 2) with cut-offs moved out to ±4.303.]
When you come back to this slide to study for your exams, ask yourself: Why is normality a necessary assumption for testing a hypothesis using a t-test?

CRITICAL VALUES FOR T
Remember: the distribution changes when N changes. Therefore, the critical value changes as well.
[Graphic: normal distribution (z) with ±1.96; t distribution (df = 2) with critical values ±4.303; t distribution (df = 5) with critical values ±2.571.]

CRITICAL VALUES FOR T
Remember: the distribution changes when N changes. Therefore, the critical value changes as well.
[Graphic: normal distribution (z) with ±1.96; t distribution (df = 10) with critical values ±2.228; t distribution (df = 15) with critical values ±2.131.]
When you come back to this slide to study for your exams, ask yourself: Why are the z and t distributions not the same? Are they ever the same?

THE T DISTRIBUTION
If the t distribution changes with the sample size, this distribution is not static...
... therefore, the area under the curve is also not static...
... therefore, the probabilities associated with the area under the curve are also not static...
... therefore, the critical values for a t distribution are also not static.
One size DOES NOT fit all. The critical value changes with sample size.

CRITICAL VALUES FOR T
Fortunately, the critical values for t-tests have been calculated for you, and summarized into a convenient t table:
[Graphic: t table (Appendix D, Page 620 in the textbook).]
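In practice you can also compute these critical values directly rather than reading them from the table. A brief SciPy sketch (two-tailed, α = .05) that reproduces the cut-offs shown on the previous slides and shows how they approach the z cut-off of ±1.96 as df grows:

from scipy import stats

alpha = 0.05
for df in (2, 5, 10, 15, 18, 1_000_000):
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # upper cut-off; the lower one is its negative
    print(df, round(t_crit, 3))
# df = 2 -> 4.303, df = 5 -> 2.571, df = 10 -> 2.228, df = 15 -> 2.131, df = 18 -> 2.101,
# and with a very large df the value is essentially the z cut-off, 1.96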


CRITICAL VALUES FOR T
Unfortunately, you need to find a different cut-off value for every sample. It's not a static cut-off anymore (like ±1.64 or ±1.96).

INDEPENDENT-SAMPLES T-TEST
5. Find the critical value (df = N − 2 = 18):
1. Decide if you want to use a one- or two-tailed test.
2. Find your alpha criterion.
3. Find your degrees of freedom. For an independent-samples t-test, df = N − 2.

HYPOTHESIS TESTING: P = ?
H0: μNo Training = μTraining
P(−0.90 | H0 is true)?
[Graphic: t distribution centered on μ from H0, with rejection regions beyond t* = −2.101 and t* = 2.101; the observed t = −0.90 falls in the fail-to-reject region.]
P(−0.90 | H0 is true) > .05
Graphic adapted from Gravetter & Wallnau (2007).

EXAMPLE: INDEPENDENT-SAMPLES T-TEST
6. Reject or fail-to-reject the null hypothesis:
|tobserved| < tcritical: |−0.90| < 2.101
Fail to reject the null hypothesis, p > .05.
7. Conclusion:
Reporting in APA format: Children who underwent musical training did not have significantly different dyslexia scores than the children who did not receive musical training, t(18) = −0.90, p > .05.
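For reference, the whole test can be run in one call. A SciPy sketch with the slide's data (using the pooled-variance version, equal_var=True, which matches the hand calculation) reproduces t(18) ≈ −0.90 with a p-value well above .05:

import numpy as np
from scipy import stats

no_training = np.array([15.4, 12.0, 21.2, 24.3, 24.1, 14.5, 17.5, 18.1, 20.4, 21.9])
training    = np.array([17.5, 12.9, 23.4, 26.8, 27.4, 14.1, 19.2, 18.8, 24.0, 23.9])

t, p = stats.ttest_ind(no_training, training, equal_var=True)   # pooled-variance t-test
print(round(t, 2), round(p, 3))      # about -0.90 and p > .05, so fail to reject H0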


Questions? Let’s take a break!


INDEPENDENT-SAMPLES T-TEST: CONCLUSIONS
Children who underwent musical training did not have significantly different dyslexia scores than the children who did not receive musical training, t(18) = −0.90, p > .05.
In this case, we have evidence that the deviation between the sample means was no different from what would be expected due to chance or random error.
... But are you sure of this finding?
... Can you ever be 100% sure of anything in statistics...?

STATISTICAL ERRORS
If we had rejected the null hypothesis, one of two things could actually have happened:
REJECT H0
• There is an effect of music training on dyslexia in reality, and you have correctly rejected the null.
• There is no effect of music training on dyslexia in reality, and you have committed a Type I error.

STATISTICAL ERRORS – TYPE I
A Type I Error is when you incorrectly reject the null hypothesis. This is when you find a significant difference but you are not supposed to. A false positive.
[Graphic: distributions centered on μ0 and μ1, with a critical cut-off marked.]
In this case, would you reject the null hypothesis? No! μ1 does not fall into the critical region.

STATISTICAL ERRORS – TYPE I
A Type I Error is when you incorrectly reject the null hypothesis. This is when you find a significant difference but you are not supposed to. A false positive.
[Graphic: the same distributions, but now the sample mean X̄ falls beyond the critical cut-off.]
What about now? Would you reject the null hypothesis? Yes! X̄ falls into the critical region.

STATISTICAL ERRORS – TYPE I
A Type I Error is when you incorrectly reject the null hypothesis. This is when you find a significant difference but you are not supposed to. A false positive.
[Graphic: distributions centered on μ0 and μ1, with a critical cut-off; the sample mean X̄ falls beyond the cut-off.]
Using this sample mean, you would reject the null hypothesis, even though you shouldn't have rejected it. TYPE I ERROR.

STATISTICAL ERRORS
Likewise, if you were to fail-to-reject the null hypothesis, one of two things could have actually happened:
FAIL-TO-REJECT H0
• There is no effect of music training on dyslexia in reality, and you have correctly failed-to-reject the null.
• There is an effect of music training on dyslexia in reality, and you have committed a Type II error.

STATISTICAL ERRORS – TYPE II
A Type II Error is when you incorrectly fail-to-reject the null hypothesis. This is when you do not find a significant difference when you were supposed to. A false negative.
[Graphic: distributions centered on μ0 and μ1, with a critical cut-off marked.]
In this case, would you reject the null hypothesis? Yes! μ1 falls into the critical region.

STATISTICAL ERRORS – TYPE II
A Type II Error is when you incorrectly fail-to-reject the null hypothesis. This is when you do not find a significant difference when you were supposed to. A false negative.
[Graphic: the same distributions, but now the sample mean X̄ falls below the critical cut-off.]
What about now? Would you reject the null hypothesis? No! X̄ does not fall into the critical region.

STATISTICAL ERRORS – TYPE II
A Type II Error is when you incorrectly fail-to-reject the null hypothesis. This is when you do not find a significant difference when you were supposed to. A false negative.
[Graphic: distributions centered on μ0 and μ1, with a critical cut-off; the sample mean X̄ falls below the cut-off.]
Using this sample mean, you would fail-to-reject the null hypothesis, even though you should have rejected it. TYPE II ERROR.
When you come back to this slide to study for your exams, ask yourself: How do the assumptions (or violations to the assumptions) affect Type I and Type II error rates?

STATISTICAL ERRORS
Can we figure out the likelihood of making an error?
TYPE I – yes. The Type I error rate for an experiment is equal to the alpha criterion (default = 0.05).
TYPE II – yes, but only indirectly. You can figure out your Type II error rate by calculating your statistical power:
Type II error = 1 − Power
Power: the probability of rejecting the null hypothesis when the alternative hypothesis is true.
Increasing power reduces your Type II error rate.
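One way to see that the Type I error rate equals alpha is to simulate many experiments in which H0 is true and count how often a t-test (wrongly) rejects. A rough sketch of that idea (my own illustration, not from the lecture):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n, n_experiments = 0.05, 10, 10_000
false_positives = 0

for _ in range(n_experiments):
    # Both groups come from the SAME population, so H0 is true by construction
    a = rng.normal(loc=20, scale=5, size=n)
    b = rng.normal(loc=20, scale=5, size=n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1

print(false_positives / n_experiments)   # close to 0.05 = alpha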


FOUR OUTCOMES OF HYPOTHESIS TESTS

                               Actual state of the null hypothesis
                               (i.e., what is actually happening in the population)
What your experiment finds    True                                False
Reject H0                     Type I Error (α)                    Correct Rejection (1 − β)
Fail to Reject H0             Correct Failure to Reject (1 − α)   Type II Error (β)

Alpha (α) is 0.05 (i.e., the default cut-off we use). You have a 5% chance of making a Type I error in an experiment.

If you have a 5% chance of making a Type I error, then you are 95% sure your results are not an error (100% minus 5%). This is your confidence level.

FOUR OUTCOMES OF HYPOTHESIS TESTS
[Same table of four outcomes as above.]

Unfortunately, there is no direct way of calculating the probability of making a Type II error… You can minimize it by increasing the probability of correctly rejecting a false H0 (i.e., power).

This is power (1 − β). This is the chance of finding an existing effect. Researchers try to aim for a minimum power of 0.8 (80%).

STATISTICAL POWER
[Graphic: overlapping H0 and H1 distributions with a critical value marked. Under H0, the area below the cut-off is 1 − α (confidence) and the area beyond it is α (one-tailed) or α/2 (two-tailed). Under H1, the area below the cut-off is β and the area beyond it is 1 − β (power).]
To help you visualize the relationships, move the critical value to the left and to the right.
When you come back to this slide to study for your exams, ask yourself: How do alpha, beta, confidence and power change? How do the assumptions (or violations to the assumptions) affect power?

POWER
Power can be increased in three ways:
1. By increasing the size of the difference between the expected and the observed (i.e., effect size).
• The larger the difference between your experimental groups, the more likely it is that you will find significant results (duh).
[Graphic: distributions centered on μ0 and μ1 with a critical cut-off.]

POWER
Power can be increased in three ways:
1. By increasing the size of the difference between the expected and the observed (i.e., effect size).
• The larger the difference between your experimental groups, the more likely it is that you will find significant results (duh).
• Not feasible.
[Graphic: distributions centered on μ0 and μ1 with a critical cut-off.]

POWER
Power can be increased in three ways:
2. By increasing the number of observations, or sample size.
• An increase in sample size will result in a smaller SEM, resulting in a "skinnier" distribution. Critical values are also less strict.
[Graphic: distributions centered on μ0 and μ1 with a critical cut-off.]
When you come back to this slide to study for your exams, ask yourself: How do the assumptions (or violations to the assumptions) affect power?

POWER
Power can be increased in three ways:
2. By increasing the number of observations, or sample size.
• An increase in sample size will result in a smaller SEM, resulting in a "skinnier" distribution. Critical values are also less strict.
• Feasible – easiest/quickest resolution! Power analysis (next week!)
[Graphic: distributions centered on μ0 and μ1 with a critical cut-off.]

POWER
Power can be increased in three ways:
3. By reducing the error variance in the dependent variable.
• The smaller the variance/noise, the greater the power.
[Graphic: distributions centered on μ0 and μ1 with a critical cut-off.]
When you come back to this slide to study for your exams, ask yourself: How do the assumptions (or violations to the assumptions) affect power?
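These three levers show up in a quick simulation (again my own illustration, not the lecture's): power is just the fraction of simulated experiments that reject H0 when a real difference exists, and it climbs as the effect size or sample size grows, or as the noise shrinks.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def simulated_power(diff, n, sd, reps=5_000, alpha=0.05):
    """Fraction of experiments that reject H0 when the true group difference is `diff`."""
    rejections = 0
    for _ in range(reps):
        a = rng.normal(0.0,  sd, n)
        b = rng.normal(diff, sd, n)
        if stats.ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    return rejections / reps

print(simulated_power(diff=2, n=10, sd=5))   # small effect, small n  -> low power
print(simulated_power(diff=5, n=10, sd=5))   # bigger effect          -> higher power
print(simulated_power(diff=2, n=80, sd=5))   # bigger sample          -> higher power
print(simulated_power(diff=2, n=10, sd=2))   # less noise             -> higher power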

POWER
Power can be increased in three ways:
3. By reducing the error variance in the dependent variable.
• The smaller the variance/noise, the greater the power.
• Feasible – use a paired-samples t-test.
[Graphic: distributions centered on μ0 and μ1 with a critical cut-off.]
When you come back to this slide to study for your exams, ask yourself: How do the assumptions (or violations to the assumptions) affect power?

T-TESTS
Today, we will learn about the differences between a z-test and a t-test, as well as the three types of t-tests that exist:
1. One-sample t-test
2. Independent-samples t-test
3. Paired-samples t-test
When you come back to this slide to study for your exams, ask yourself: Is it better to use a t-test or a z-test, if given the option to choose either? Explain.

PAIRED-SAMPLES T-TEST
Use it when: you are comparing two sample means, but each participant has participated in BOTH groups (i.e., you are analyzing a within-subjects independent variable).
Note: for this test, your first step is to find the differences (D) between participants' own scores.

t = (D̄ − 0) / sD̄        where sD̄ = sD / √nD

You may ask: why include the zero? Isn't it redundant?! In this test, you are comparing D̄ to zero, to see if the average difference between the two groups is significantly different than zero.
H0: There is no difference between the means of the two samples.
HA: There is a difference between the means of the two samples.
Degrees of Freedom = ND (number of difference scores) − 1

HYPOTHESIS TESTING: PAIRED-SAMPLES T-TEST
Let's pretend that we have before/after scores for the children in our study:
[...] Here is a table containing children's dyslexia scores before and after the musical training. Is there evidence that musical lessons influence the dyslexia scores?

Before   After
15.4     17.5
12.0     12.9
21.2     23.4
24.3     26.8
24.1     27.4
14.5     14.1
17.5     19.2
18.1     18.8
20.4     24.0
21.9     23.9

HYPOTHESIS TESTING: PAIRED-SAMPLES T-TEST
1. Formulate the null and alternative hypotheses:
H0: There is no significant difference between children's dyslexia scores before and after the musical training (μBefore = μAfter).
HA: There is a significant difference between children's dyslexia scores before and after the musical training (μBefore ≠ μAfter).

PAIRED-SAMPLES T-TEST
2. Calculate the difference scores, THEN calculate the descriptive stats:

Before   After   Difference
15.4     17.5     2.1
12.0     12.9     0.9
21.2     23.4     2.2
24.3     26.8     2.5
24.1     27.4     3.3
14.5     14.1    -0.4
17.5     19.2     1.7
18.1     18.8     0.7
20.4     24.0     3.6
21.9     23.9     2.0

D̄ = 1.86
sD² = Σ(Dᵢ − D̄)² / (n − 1) = 1.46
sD = √1.46 = 1.21

Note: It doesn't matter if you subtract the After scores from the Before scores (or vice versa), as long as it's consistent!

PAIRED-SAMPLES T-TEST
3. Check assumptions.
✓ The dependent variable (dyslexia score) is continuous; the independent variable (musical training) is categorical.
✓ Assume that the observations within the sample are independent.
✓ Assume that the difference scores for the dependent variable (dyslexia score) are approximately normally distributed.
✓ No outliers beyond ± 4 standard deviations.
  Lower limit: 1.86 − 4(1.21) = −2.98
  Upper limit: 1.86 + 4(1.21) = 6.70
  There are no difference scores in our dataset that were below −2.98 or above 6.70. Therefore, there are no outliers in our dataset.
When you come back to this slide to study for your exams, ask yourself: How might small Ns influence assumptions? What about outliers?

PAIRED-SAMPLES T-TEST
4. Conduct the test to calculate the observed t statistic for a sample mean:

t = (D̄ − 0) / sD̄ = (D̄ − 0) / (sD / √n) = (1.86 − 0) / (1.21 / √10) = 4.86

5. Find the critical value (df = ND − 1 = 9).
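The same paired test can be run in one SciPy call using the before/after scores from the slide; note that a paired-samples t-test is just a one-sample t-test on the difference scores.

import numpy as np
from scipy import stats

before = np.array([15.4, 12.0, 21.2, 24.3, 24.1, 14.5, 17.5, 18.1, 20.4, 21.9])
after  = np.array([17.5, 12.9, 23.4, 26.8, 27.4, 14.1, 19.2, 18.8, 24.0, 23.9])

print(stats.ttest_rel(after, before))           # paired t-test: t ~ 4.87 (the slide's 4.86 comes from rounding sD to 1.21)
print(stats.ttest_1samp(after - before, 0.0))   # identical result on the difference scores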

HYPOTHESIS TESTING: P = ?
H0: μBefore = μAfter
P(1.86 | H0 is true)?
[Graphic: t distribution centered on μ from H0, with rejection regions beyond t* = −2.262 and t* = 2.262; the observed t = 4.86 falls in the rejection region.]
P(1.86 | H0 is true) < .05
Graphic adapted from Gravetter & Wallnau (2007).

EXAMPLE: PAIRED-SAMPLES T-TEST
6. Reject or fail-to-reject the null hypothesis:
tobserved > tcritical: 4.86 > 2.262
Reject the null hypothesis, p < .05.
7. Conclusion:
Reporting in APA format: Children who underwent musical training had significantly higher dyslexia scores (increased by 1.86 points, on average) after the musical training than before the musical training, t(9) = 4.86, p < .05.

Questions? Let’s take a break!


T-TEST LIMITATIONS
What are the limitations of using t-tests?
1. We can only compare two group means at a time.
2. We can only analyze one independent variable at a time.
3. You increase the Type I error rate when you do too many tests (more on this later!)
The fix: the Analysis of Variance (ANOVA).

ANALYSIS OF VARIANCE (ANOVA)
ANOVAs allow us to:
✓ Compare two or more groups at a time.
✓ Analyze one or more independent variables simultaneously.
✓ Test for interactions between independent variables.
✓ Control the Type I error rate.
We will begin with the one-way between-subjects ANOVA to analyze differences between two sample means.
Note: "One-Way" refers to the fact that there is one independent variable.

ANALYSIS OF VARIANCE (ANOVA)
We will begin with the one-way between-subjects ANOVA to analyze differences between two sample means.
But wait. Shouldn't we be using an independent-samples t-test?!
You will see by the end of this lecture that the independent-samples t-test and an ANOVA yield the same results.
The F statistic is used in ANOVA. F = t²

ANOVA ASSUMPTIONS
Because they are equivalent tests, the assumptions for a between-subjects ANOVA are the same as those used for an independent-samples t-test:
✓ The dependent variable is continuous; the independent variable is categorical (ANOVA can handle 2+ groups/categories though!).
✓ The data/observations are independent.
✓ Assume that the DV is normally distributed for all groups.
✓ Assume that the variances are homogeneous across all groups.
✓ No outliers beyond ± 4 standard deviations in all groups.

LOGIC OF ANOVA
Remember: The purpose of collecting samples is to estimate population parameters from sample statistics.
There are two ways of estimating population variance:
1. Error estimate (σ²e)
2. Treatment estimate (σ²t)
When the null is true, these two estimates will be the same. But when the null is false… let's see!

LOGIC OF ANOVA – ERROR ESTIMATE
You already know how to calculate the error variance. It is the variance that is found within samples (i.e., the spread):
[Graphic: frequency distributions of Non-Musicians' (μNM) and Musicians' (μM) scores.]

s² = Σ(Xᵢ − X̄)² / (n − 1)

LOGIC OF ANOVA – ERROR ESTIMATE
If the null hypothesis was true, what would the two distributions look like, relative to each other?
[Graphic: Non-Musician and Musician distributions overlapping, with μNM and μM in the same place.]

LOGIC OF ANOVA – ERROR ESTIMATE
If the alternative hypothesis was true, what would the distributions look like, relative to each other?
[Graphic: Non-Musician and Musician distributions with μNM and μM shifted apart.]

LOGIC OF ANOVA – ERROR ESTIMATE
Compare the two scenarios: Does the amount of variance within (i.e., the spread of error in) the non-musicians' and musicians' scores change? NO!

LOGIC OF ANOVA – ERROR ESTIMATE
The error estimate of variance does not change regardless of whether the null is true or not.
One of the assumptions of an ANOVA is that you require roughly equal variances (i.e., "homogeneity of variance"). Due to this assumption, we can simply estimate the population variance by finding the average of the sample/group variances:

MSError = Σ sₖ² / K

This error estimate is also called the Mean Squared Error (MSError) or MSWithin.

EXAMPLE – ANOVA
Going back to today's independent-samples t-test example…
Descriptive Statistics:
           No Training   Musical Training
Mean       18.94         20.8
Variance   17.17         25.66
SD         4.14          5.07
SEM        1.31          1.60
N          10            10

MSError = Σ sₖ² / K = (17.17 + 25.66) / 2 = 21.415

This is MSError.

LOGIC OF ANOVA
The purpose of collecting samples is to estimate population parameters from sample statistics.
There are two ways of estimating population variance:
✓ Error estimate (σ²e)
2. Treatment estimate (σ²t)
What the heck is the treatment estimate?!
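A two-line check of that calculation (since the groups are the same size here, MSError is just the average of the two sample variances):

import numpy as np

group_variances = np.array([17.17, 25.66])   # s^2 for No Training and Musical Training
ms_error = group_variances.mean()            # average within-group variance
print(ms_error)                              # 21.415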

LOGIC OF ANOVA – TREATMENT ESTIMATE
Another way you can estimate the population variance is to use the variance between sample means (this is a new concept!).
You need to quantify how much the sample means deviate from the overall mean, also known as the grand mean.
This treatment estimate of the variance is called MSTreatment or MSBetween.

LOGIC OF ANOVA – TREATMENT ESTIMATE
To calculate the treatment estimate of population variance, you need to look at how much the group means deviate from the grand mean (multiplied by N for each group):
[Graphic: Non-Musician and Musician distributions with μNM, μGRAND, and μM marked.]

MSTreatment = Σ nₖ(X̄ₖ − X̄Grand)² / (K − 1)

LOGIC OF ANOVA – TREATMENT ESTIMATE
If the null hypothesis was true, what would my distributions look like, relative to each other?
[Graphic: Non-Musician and Musician distributions overlapping, with μNM, μM, and μGRAND all in the same place.]

LOGIC OF ANOVA – TREATMENT ESTIMATE
Compare the two scenarios: Does the amount of variance between the non-musician and musician groups (i.e., the spread) change at all? YES!
When the null is false, the treatment estimate will be biased: it will be TOO BIG.

LOGIC OF ANOVA – IF THE NULL IS TRUE
H0: μ1 = μ2 = μ3 = ... = μk = μ (when the independent variable has "K" levels)
MStreatment/between and MSerror are two ways of estimating the same thing, the population variance.
Therefore, σ²t = σ²e.

LOGIC OF ANOVA – IF THE NULL IS FALSE (REJECTED)
H1: At least ONE group is different.
MSerror still unbiasedly estimates the population variance, BUT MStreatment will be biased by being TOO BIG.

LOGIC OF ANOVA
The "Error" term (denominator) doesn't change depending on whether we reject or fail-to-reject H0, because it looks at within-group variation (which is assumed to be equal for all groups).
Therefore, only the "Treatment" term (numerator) changes depending on whether we reject or fail-to-reject H0.

THE F STATISTIC

Fratio = σ²t / σ²e = MSTreatment / MSError = MSBetween / MSWithin

If H0 is not true… F > 1
If H0 is true… F ≤ 1
How much greater than 1 does F have to be, though? You need to find a critical value using the F-table!

THE F STATISTIC

Fratio = σ²t / σ²e = MSTreatment / MSError = MSBetween / MSWithin

The F statistic tests whether the variation between groups is greater than the variation within groups.
Numerator: a measure of effect (or treatment), assessed by examining the variance (or difference) between the groups.
Denominator: a measure of random variation (or error), assessed by examining the variance within the groups.
This is the same logic used for t-tests!

EXAMPLE – ANOVA
Going back to today's independent-samples t-test example…
Descriptive Statistics: (same table as above)

MSError = Σ sₖ² / K = (17.17 + 25.66) / 2 = 21.415

This is MSError.

EXAMPLE – ANOVA
Now let's calculate MSTreatment.
Descriptive Statistics: (same table as above)

What is the grand mean?
GM = Σ X̄ₖ / K = (18.94 + 20.8) / 2 = 19.87

How much does 18.94 deviate from the grand mean? (18.94 − 19.87) = −0.93
How much does 20.8 deviate from the grand mean? (20.8 − 19.87) = 0.93

Square the deviations:
(18.94 − 19.87)² = 0.8649        (20.8 − 19.87)² = 0.8649
Multiply each by its group n:
0.8649 × 10 = 8.649              0.8649 × 10 = 8.649

Sum the squared deviations (SS): 8.649 + 8.649 = 17.298
Find the average squared deviation (MS): 17.298 / (K − 1) = 17.298 / 1 = 17.298. This is MSTreatment.

Note: What we are calculating here is really just the formula for variance, except with means instead of scores.
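The same arithmetic as a short sketch (equal n per group and K = 2 groups, as in this example):

import numpy as np

group_means = np.array([18.94, 20.8])
n_per_group = 10
k = len(group_means)

grand_mean = group_means.mean()                                        # 19.87
ss_treatment = (n_per_group * (group_means - grand_mean) ** 2).sum()   # 17.298
ms_treatment = ss_treatment / (k - 1)                                  # 17.298
print(grand_mean, ss_treatment, ms_treatment)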


EXAMPLE – ANOVA
Going back to our original example…

F = MSTreatment / MSError = 17.298 / 21.415 = 0.81

F = t²:

t = (X̄₁ − X̄₂) / √(s₁²/n₁ + s₂²/n₂) = (18.94 − 20.8) / √(17.17/10 + 25.66/10) = −0.90
(−0.90)² = 0.81

In this case, we don't even need a critical value because F < 1. Therefore, we fail to reject the null.

Next week: Continuing with the ANOVA!
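Putting the pieces together in SciPy (with the slide's raw scores) confirms both the F value and the F = t² relationship: scipy.stats.f_oneway gives the same one-way between-subjects ANOVA result as squaring the pooled-variance independent-samples t.

import numpy as np
from scipy import stats

no_training = np.array([15.4, 12.0, 21.2, 24.3, 24.1, 14.5, 17.5, 18.1, 20.4, 21.9])
training    = np.array([17.5, 12.9, 23.4, 26.8, 27.4, 14.1, 19.2, 18.8, 24.0, 23.9])

f, p_anova = stats.f_oneway(no_training, training)    # one-way between-subjects ANOVA
t, p_ttest = stats.ttest_ind(no_training, training)   # pooled-variance t-test

print(round(f, 2), round(t ** 2, 2))            # F ~ 0.81 either way, so F = t^2
print(round(p_anova, 3), round(p_ttest, 3))     # identical p-values; fail to reject H0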
