Inferential Statistics For Psychology
Type 1 error: Rejecting the null hypothesis when the null hypothesis is true.
Type 2 error: Failing to reject the null hypothesis when the null hypothesis is false.
(there is a real effect)
                    State of Reality
Decision          H0 is true           H0 is false
Retain H0         Correct decision     Type 2 error = β
Reject H0         Type 1 error = α     Correct decision = power
Sum of columns    1                    1
Alpha level (α): The level to which the scientist wishes to limit the probability of making
a Type 1 error. Set at the beginning of the experiment.
The more stringent the alpha level, the lower the probability of making a Type 1
error. BUT
The more stringent the alpha level, the higher the probability of making a Type 2
error.
Probability of making a Type 1 error decreases greatly with independent
replication.
Sign Test
Step 1: Calculate the pluses and the minuses
Step 2: Evaluate the probability of getting an outcome as extreme as or more extreme
than what was observed, assuming the null hypothesis is true.
Use the binomial distribution.
Since we assume 𝐻0 is true, P = 0.5
Step 3: Compare this probability against 𝛼, if it is smaller than 𝛼, then we reject the null
hypothesis.
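The three steps can be sketched with Python's standard library (the function name and the 9-pluses-out-of-10 data are hypothetical):

```python
from math import comb

def sign_test_p(pluses, n, two_tailed=True):
    """p-value for the sign test under H0: P = 0.5, using the binomial distribution."""
    k = max(pluses, n - pluses)  # count on the more extreme side
    # probability of an outcome as extreme as or more extreme than observed, one tail
    one_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * one_tail) if two_tailed else one_tail

# Example: 9 pluses out of 10 subjects, alpha = 0.05, two-tailed
p = sign_test_p(9, 10)
print(round(p, 4))  # 0.0215 -> smaller than alpha, so reject H0
```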
Size of effect
We must not confuse statistically significant with ‘important’! A real effect may exist but
be too small to be important (e.g. a drug that genuinely helps you lose weight, but only by
10 calories a day).
Power is:
A measure of the sensitivity of the experiment to detect a real effect of the IV.
The probability that the results of an experiment will allow rejection of the null
hypothesis if the IV has a real effect.
Goes from 0 to 1 (since it is a probability)
Useful to determine when:
o Initially designing the experiment
o Interpreting the results of experiments that fail to detect any real effect
of the IV
𝑃𝑛𝑢𝑙𝑙 is the probability of getting a plus with any subject in the sample of the experiment
when the IV has no effect.
- 𝑃𝑛𝑢𝑙𝑙 is always 0.5
𝑃𝑟𝑒𝑎𝑙 is the probability of getting a plus with any subject in the sample of the experiment
when the IV has a real effect.
- in reality, you don’t know 𝑃𝑟𝑒𝑎𝑙 , but we estimate it using pilot work or other
research. Then we design an experiment with power high enough to detect that
size of effect.
- It is also the proportion of pluses in the population if the experiment were done
on the entire population and the independent variable has a real effect.
Calculation of power
Step 1: Assume the null hypothesis is true. Using 𝑃𝑛𝑢𝑙𝑙 = 0.5, determine the possible
sample outcomes in the experiment that allow 𝐻0 to be rejected.
o If the test is two-tailed, start with the extreme outcomes at both ends of the
distribution (0 and N pluses) and work inward.
o If the test is one-tailed, start with the extreme end in the predicted direction.
Step 2: For the level of 𝑃𝑟𝑒𝑎𝑙 under consideration, determine the probability of getting
the sample outcomes in Step 1.
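The two steps can be sketched in Python: Step 1 finds the rejection region under P_null = 0.5, and Step 2 sums the binomial probabilities of those outcomes under the assumed P_real (the values N = 10 and P_real = 0.9 are illustrative, not from the text):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def rejection_region(n, alpha=0.05):
    """Step 1: sample outcomes extreme enough to reject H0 (two-tailed), with P_null = 0.5."""
    region = []
    for k in range(n + 1):
        extreme = max(k, n - k)
        one_tail = sum(binom_pmf(i, n, 0.5) for i in range(extreme, n + 1))
        if 2 * one_tail <= alpha:
            region.append(k)
    return region

def power(n, p_real, alpha=0.05):
    """Step 2: probability of landing in the rejection region when P_real is true."""
    return sum(binom_pmf(k, n, p_real) for k in rejection_region(n, alpha))

print(rejection_region(10))      # [0, 1, 9, 10]
print(round(power(10, 0.9), 4))  # 0.7361
```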
Chapter 12: Sampling Distributions, SDM and the Normal (Z) Test
1. Calculate the test statistic:
z_obt = (X̄_obt − μ) / (σ/√N)
2. Find z_crit given α, which can be either 1-tailed or 2-tailed.
3. Evaluate the statistic
If |𝑧𝑜𝑏𝑡 | ≥|𝑧𝑐𝑟𝑖𝑡 |, reject 𝐻0 .
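These three steps can be sketched with Python's standard library; the sample values below are hypothetical, and z_crit = 1.96 is the standard two-tailed value for α = 0.05:

```python
from math import sqrt, erf

def z_test(xbar, mu, sigma, n, z_crit=1.96):
    """z test for a single sample mean when the population sigma is known."""
    z_obt = (xbar - mu) / (sigma / sqrt(n))
    # two-tailed p-value from the standard normal CDF
    p = 2 * (1 - 0.5 * (1 + erf(abs(z_obt) / sqrt(2))))
    return z_obt, p, abs(z_obt) >= z_crit

# Hypothetical sample: mean 108 vs population mean 100, sigma 16, N = 25
z_obt, p, reject = z_test(xbar=108, mu=100, sigma=16, n=25)
print(round(z_obt, 2), reject)  # 2.5 True
```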
Chapter 13: Single sample T-test
Main difference: Sometimes σ is unknown. In that case, we estimate it with s, the sample
standard deviation.
t_obt = (X̄_obt − μ) / s_X̄ = (X̄_obt − μ) / (s/√N)
Degrees of Freedom
Number of scores that are free to vary in calculating that statistic
For sample s.d, since the sum of deviations about the mean must equal zero, only
N-1 of the deviation scores are free to take on any value.
For the mean, there are N degrees of freedom, since even after we know N − 1 of the
scores, the Nth score can still take on any value.
The t dist varies uniquely with degrees of freedom.
T vs Z distributions
As df increases, the t distribution approximates the normal curve.
Hence, when df → ∞, the t dist is identical to the z dist.
At any df other than ∞, the t distribution has more extreme t values than the z
dist.
o Because the tails of the t distribution are elevated relative to the z
distribution.
The t test is less powerful than the z test. For any alpha level, t_crit is larger in
magnitude than z_crit, so a more extreme obtained value is needed to reject the null
hypothesis.
1. Calculate the test statistic:
t_obt = (X̄_obt − μ) / (s/√N) = (X̄_obt − μ) / √(SS/(N(N − 1)))
2. Find t_crit given α, which can be either 1-tailed or 2-tailed.
If α is two-tailed, you need to supply the ± sign for t_crit.
3. Find the degrees of freedom. df = N − 1
4. Evaluate the statistic
If |t_obt| ≥ |t_crit|, reject H0.
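The steps can be sketched as follows; the scores are hypothetical, and t_crit = 2.365 is the two-tailed table value for α = 0.05 at df = 7:

```python
from math import sqrt
from statistics import mean, stdev

def single_sample_t(scores, mu):
    """t_obt for a single-sample t test; compare |t_obt| against t_crit at df = N - 1."""
    n = len(scores)
    s = stdev(scores)  # sample SD, computed with N - 1 in the denominator
    t_obt = (mean(scores) - mu) / (s / sqrt(n))
    return t_obt, n - 1

scores = [10, 12, 9, 11, 13, 12, 11, 10]
t_obt, df = single_sample_t(scores, mu=10)
print(round(t_obt, 3), df, abs(t_obt) >= 2.365)  # 2.16 7 False -> retain H0
```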
b. The null-hypothesis population is normally distributed.
d = |mean difference| / population standard deviation
d̂ = |X̄_obt − μ| / s
Value of 𝑑̂ Interpretation of 𝑑̂
0-0.2 Small effect
0.21-0.79 Medium effect
≥ 0.8 Large effect
Confidence Interval
Range of values that probably contains the population value.
A 95% CI is an interval such that the probability is 0.95 that the interval contains
the population value.
WRONG: saying the probability is 0.95 that the population mean lies within this
particular interval. The population mean is a fixed value; it is the interval, not the
parameter, that varies from sample to sample.
The larger the interval, the more confidence we have that it contains the
population mean. That's why 99% CI is wider than 95% CI.
95% CI:
μ_lower = X̄_obt − s_X̄ t_0.025
μ_upper = X̄_obt + s_X̄ t_0.025
FYI: If the test is one-tailed, then the 95% CI is (−∞, X̄_obt + s_X̄ t_0.05] or
[X̄_obt − s_X̄ t_0.05, ∞).
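A sketch of the 95% CI computation; the scores are hypothetical, and t_0.025 = 2.365 is the table value for df = 7:

```python
from math import sqrt
from statistics import mean, stdev

def confidence_interval(scores, t_crit):
    """mu_lower and mu_upper for a two-tailed CI; t_crit is looked up at df = N - 1."""
    s_xbar = stdev(scores) / sqrt(len(scores))  # estimated standard error of the mean
    m = mean(scores)
    return m - t_crit * s_xbar, m + t_crit * s_xbar

scores = [10, 12, 9, 11, 13, 12, 11, 10]
lo, hi = confidence_interval(scores, t_crit=2.365)
print(round(lo, 2), round(hi, 2))  # 9.91 12.09
```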
Pagano Chapter 14: Correlated groups T-test & Independent groups T-test.
𝐻0 : 𝜇𝐷 = 0
𝐻1 : 𝜇𝐷 ≠ 0
Use df = N − 1 with the relevant α, usually α = 0.05, 2-tailed.
Null Hypothesis:
𝐻0 : 𝜇1 = 𝜇2
Alternative Hypothesis:
𝐻1 : 𝜇1 ≠ 𝜇2
σ²_X̄₁ = variance of the sampling distribution of the mean for samples of size n₁
taken from the first population.
σ²_X̄₂ = variance of the sampling distribution of the mean for samples of size n₂
taken from the second population.
σ_(X̄₁−X̄₂) = √(σ₁²/n₁ + σ₂²/n₂)
Assuming homogeneity of variance (σ₁² = σ₂² = σ²), this becomes
σ_(X̄₁−X̄₂) = √(σ²(1/n₁ + 1/n₂))
We can estimate 𝜎 2 𝑤𝑖𝑡ℎ 𝑠𝑤 2 , a weighted average of the sample variances 𝑠1 2 and 𝑠2 2 .
Weighting is done using degrees of freedom as the weights.
s_w² = (SS₁ + SS₂) / (n₁ + n₂ − 2)
Evaluating the test statistic:
t_obt = (X̄₁ − X̄₂) / √(s_w²(1/n₁ + 1/n₂))
     = (X̄₁ − X̄₂) / √((SS₁ + SS₂)/(n₁ + n₂ − 2) × (1/n₁ + 1/n₂))
Recall that SS₁ = ∑X₁² − (∑X₁)²/n₁
Note: The t distribution varies both with N and degrees of freedom, but it varies
uniquely only with degrees of freedom. Hence, the t distribution corresponding to 13 df
is the same whether it is derived from the single sample situation with N = 14 or in the
two-sample situation with N = 15.
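The pooled formula can be sketched as follows (the two groups of scores are hypothetical):

```python
from math import sqrt

def independent_t(g1, g2):
    """t_obt for the independent groups t test, pooling variance as s_w^2."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    ss1 = sum((x - m1) ** 2 for x in g1)  # SS_1: sum of squared deviations
    ss2 = sum((x - m2) ** 2 for x in g2)  # SS_2
    sw2 = (ss1 + ss2) / (n1 + n2 - 2)     # weighted (pooled) variance estimate
    t_obt = (m1 - m2) / sqrt(sw2 * (1 / n1 + 1 / n2))
    return t_obt, n1 + n2 - 2

t_obt, df = independent_t([5, 7, 6, 8, 9], [3, 4, 5, 4, 6])
print(round(t_obt, 3), df)  # 2.982 8
```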
The t-test is a robust test, meaning that it is relatively insensitive to violations of its
underlying mathematical assumptions.
The t-test is relatively insensitive to normality and homogeneity of variance
assumption.
Robustness depends on sample size and the type and magnitude of the violation.
If 𝑛1 = 𝑛2 and the size of each sample is ≥ 30 , the t test for independent groups
may be used without appreciable error despite moderate violation of the normality
and/or the homogeneity of variance assumptions.
Value of d̂     Interpretation of d̂
0-0.2          Small effect
0.21-0.79      Medium effect
≥ 0.8          Large effect
Comparing the power of the correlated groups t test and the independent groups t test:
Since 𝑆𝑆𝐷 measures the variability of the difference scores, it is much smaller than 𝑆𝑆1 +
𝑆𝑆2 which are measures of the variability of the raw scores. Hence, generally, the
correlated groups t-test is more powerful.
However, the independent groups design is more efficient from a df perspective (i.e. you
get more degrees of freedom for the same number of scores). As df increases, t_crit
decreases, making it easier to reject H0.
Also, there are situations in which you can’t use correlated groups design. For example,
if your IV is men vs women, or if the effect of the first condition persists for too long, or
in learning experiments (where learning is irreversible). Experimenter also doesn’t
know which variables are important for matching so as to produce a higher correlation.
Confidence Interval
95% CI:
μ_lower = (X̄₁ − X̄₂) − s_(X̄₁−X̄₂) t_0.025
μ_upper = (X̄₁ − X̄₂) + s_(X̄₁−X̄₂) t_0.025
where s_(X̄₁−X̄₂) = √((SS₁ + SS₂)/(n₁ + n₂ − 2) × (1/n₁ + 1/n₂))
Interpretation: We are 95% confident that the interval from μ_lower to μ_upper contains
the real effect of the IV. If so, then the real effect of the IV is to cause between μ_lower
and μ_upper more matings than the placebo.
If the CI contains 0, then the effect is not statistically significant at the corresponding α.
With t-test and z-test, we have been using the mean as the basic statistic for evaluating
the null hypothesis.
However, it is also possible to use the variance of the data for hypothesis testing. For
this, we use the F-test.
Hence, the sampling distribution of F gives all the possible F values along with the p(F) for
each value, assuming sampling is random from the population.
Generally,
F_obt = (variance estimate 1 of σ²) / (variance estimate 2 of σ²)
and has one value of df for numerator and denominator each. 𝑑𝑓1 = 𝑛1 − 1, 𝑑𝑓2 = 𝑛2 − 1
Null Hypothesis: The different conditions are all equally effective; The scores in
each group are random samples from populations with the same mean value.
𝐻0 : 𝜇1 = 𝜇2 = 𝜇3 … = 𝜇𝑘
Alternative hypothesis: At least one situation affects the DV differently from other
situations. Samples are random samples from populations where not all population
means are equal. (non-directional)
Note: the ANOVA assumes that variance of the populations from which the samples
are taken are equal. 𝜎1 2 = 𝜎2 2 = ⋯ = 𝜎𝑘 2
𝑀𝑆𝑏𝑒𝑡𝑤𝑒𝑒𝑛 increases with the magnitude of the IV’s effect, whereas the 𝑀𝑆𝑤𝑖𝑡ℎ𝑖𝑛 is
unaffected. Thus, the larger the F-ratio, the more unreasonable the null hypothesis
becomes.
Recall s_w² = (SS₁ + SS₂)/(n₁ + n₂ − 2); MS_within generalizes this weighted estimate to
k groups: MS_within = (SS₁ + SS₂ + … + SS_k)/(N − k).
Looking at MS_between:
MS_between = estimate of σ² = n σ²_X̄
which can be estimated as
MS_between = n s²_X̄ = n[∑(X̄ − X̄_G)²/(k − 1)] = SS_between/df_between
SS_between = [(∑X₁)²/n₁ + (∑X₂)²/n₂ + (∑X₃)²/n₃ + … + (∑X_k)²/n_k] − (∑_all scores X)²/N
Degrees of freedom:
The critical value is obtained using a set of two dfs, one in the numerator and one in the
denominator.
df_numerator = df_between = k − 1
df_denominator = df_within = N − k
df_total = N − 1
6. Calculate MS_within = SS_within/df_within
7. Calculate MS_between = SS_between/df_between
8. Calculate F_obt = MS_between/MS_within
9. Evaluate 𝐹𝑜𝑏𝑡 . Find 𝐹𝑐𝑟𝑖𝑡 using 𝛼 = 0.05, 𝑑𝑓𝑏𝑒𝑡𝑤𝑒𝑒𝑛 𝑎𝑛𝑑 𝑑𝑓𝑤𝑖𝑡ℎ𝑖𝑛 .
If 𝐹𝑜𝑏𝑡 ≥ 𝐹𝑐𝑟𝑖𝑡 , reject 𝐻0 . Otherwise, retain 𝐻0 .
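Steps 1–9 can be sketched end-to-end with the computational formulas (the three groups of scores are hypothetical):

```python
def one_way_anova(groups):
    """One-way, independent groups ANOVA; returns F_obt, df_between, df_within."""
    all_scores = [x for g in groups for x in g]
    N, k = len(all_scores), len(groups)
    correction = sum(all_scores) ** 2 / N          # (sum of all scores)^2 / N
    ss_total = sum(x * x for x in all_scores) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (N - k)
    return ms_between / ms_within, k - 1, N - k

F, df_between, df_within = one_way_anova([[2, 3, 4], [4, 5, 6], [7, 8, 9]])
print(round(F, 2), df_between, df_within)  # 19.0 2 6
```

With F_crit(2, 6) = 5.14 at α = 0.05 (a table lookup), F_obt ≥ F_crit here, so H0 would be rejected.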
2. The samples are drawn from populations of equal variances. (Homogeneity of
variance assumption.)
Note: Like the t test, the ANOVA is a robust test. It is minimally affected by violations of
population normality. It is also relatively insensitive to violations of homogeneity of
variance, provided the samples are of equal size.
Size of effect:
We can use eta-squared (𝜂 2 ) to determine the size of effect in a one-way, independent
groups design. It is conceptually very similar to 𝑅 2, as it also provides an estimate of the
proportion of the total variability of Y that is accounted for by X.
Disadvantages of 𝜂 2 :
More biased estimate as compared to omega squared.
Biased estimate is larger than true size of the effect.
How to calculate 𝜂 2 :
η² = SS_between/SS_total
Interpretation: e.g. if 𝜂 2 is 0.79, the conditions (the different levels of the IV) account
for 79% of the variance in Y.
Cohen’s criteria:
𝜼𝟐 (proportion of variance accounted for) Interpretation
0.01-0.05 Small effect
0.06-0.13 Medium effect
≥0.14 Large effect
Multiple Comparisons
A significant F value tells us that at least one condition differs from at least one of the
others. In addition, we are also interested in determining which of the conditions differ
from each other. Hence, we need to make multiple comparisons between pairs of group
means.
We employ the t test for independent groups. So when comparing conditions 1 and 2,
we use the following equation:
t_obt = (X̄₁ − X̄₂) / √(MS_within(1/n₁ + 1/n₂))
We use 𝑀𝑆𝑤𝑖𝑡ℎ𝑖𝑛 instead of 𝑠𝑤 2 as it is a better estimate with 3 or more groups instead
of just 2.
Tukey HSD:
One of the methods used for post hoc comparisons.
Maintains the Type 1 error rate at 𝛼 while controlling for all possible comparisons
between pairs of means (vs the Scheffé test, which controls for all possible
comparisons, not just pairwise ones).
Uses the Q-statistic: Studentized range distribution.
Q_obt = (X̄ᵢ − X̄ⱼ) / √(MS_within/n)
where X̄ᵢ is the larger of the two means being compared and X̄ⱼ is the smaller.
Since the smaller mean is always subtracted from the larger mean, Q_obt is always
positive.
To get 𝑄𝑐𝑟𝑖𝑡, we must know the df, k and the alpha level. The degrees of freedom are
associated with 𝑀𝑆𝑤𝑖𝑡ℎ𝑖𝑛 .
Decision rule:
If 𝑄𝑜𝑏𝑡 ≥ 𝑄𝑐𝑟𝑖𝑡 , reject 𝐻0 . Otherwise, retain 𝐻0 .
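A sketch of the Q computation for one pairwise comparison (the means, MS_within and n are hypothetical; Q_crit still comes from a studentized range table):

```python
from math import sqrt

def q_obt(mean_i, mean_j, ms_within, n):
    """Tukey HSD Q statistic for two group means (equal n per group)."""
    larger, smaller = max(mean_i, mean_j), min(mean_i, mean_j)
    return (larger - smaller) / sqrt(ms_within / n)  # always positive

q = q_obt(8.0, 3.0, ms_within=1.0, n=3)
print(round(q, 2))  # 8.66
```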
Factorial experiment:
Main effect: The effect of factor A and the effect of factor B are called main effects.
Interaction effect: The effect of one factor is not the same at all levels of the other factor.
Variable B1 (light) Variable B2 (heavy)
Variable A1 (morning) A1B1 A1B2
Variable A2 (evening) A2B1 A2B2
Recall that in the one-way ANOVA, we can partition the total sum of squares into
SS_total = SS_between + SS_within. By the same logic, for the 2-way ANOVA we can
partition SS_total into SS_rows + SS_columns + SS_interaction + SS_within-cells.
MS_within-cells = SS_within-cells/df_within-cells
Conceptually,
𝑆𝑆𝑤𝑖𝑡ℎ𝑖𝑛−𝑐𝑒𝑙𝑙𝑠 = 𝑆𝑆11 + 𝑆𝑆12 + ⋯ 𝑆𝑆𝑟𝑐
Computationally,
SS_within-cells = ∑_all scores X² − [((∑_cell 11 X)² + (∑_cell 12 X)² + … + (∑_cell rc X)²)/n_cell]
𝑑𝑓𝑤𝑖𝑡ℎ𝑖𝑛−𝑐𝑒𝑙𝑙𝑠 = 𝑟𝑐(𝑛 − 1)
MS_rows = SS_rows/df_rows, where df_rows = r − 1
Computational equation for the row sum of squares:
SS_rows = [((∑_row 1 X)² + (∑_row 2 X)² + … + (∑_row r X)²)/n_row] − (∑_all scores X)²/N
MS_columns = SS_columns/df_columns, where df_columns = c − 1
SS_columns = [((∑_column 1 X)² + (∑_column 2 X)² + … + (∑_column c X)²)/n_column] − (∑_all scores X)²/N
An interaction exists when the effect of the combined action of the variables is
different from that which would be predicted by the individual effects of the
variables.
MS_interaction is an estimate of 𝜎 2 plus the interaction of A and B.
If there is no interaction after any main effects are removed, then the population cell
means are equal and differences among cell means must be due to random sampling
from identical populations. In this case, 𝑀𝑆𝑖𝑛𝑡𝑒𝑟𝑎𝑐𝑡𝑖𝑜𝑛 would be an estimate of 𝜎 2
alone.
MS_interaction = SS_interaction/df_interaction
SS_interaction = [((∑_cell 11 X)² + (∑_cell 12 X)² + … + (∑_cell rc X)²)/n_cell] − (∑_all scores X)²/N − SS_rows − SS_columns
df_interaction = (r − 1)(c − 1)
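The full partition for an r × c design with equal cell sizes can be sketched as follows (the 2 × 2 data and the helper name are hypothetical):

```python
def two_way_anova(cells, r, c):
    """Two-way ANOVA with equal cell sizes; `cells` is an r x c grid of score lists.
    Returns F_obt for rows, columns, and the interaction."""
    n = len(cells[0][0])                  # scores per cell (assumed equal)
    N = r * c * n
    all_scores = [x for row in cells for cell in row for x in cell]
    correction = sum(all_scores) ** 2 / N
    sum_sq = sum(x * x for x in all_scores)
    cell_term = sum(sum(cell) ** 2 / n for row in cells for cell in row)
    ss_within = sum_sq - cell_term
    ss_rows = sum(sum(x for cell in row for x in cell) ** 2 / (c * n)
                  for row in cells) - correction
    ss_cols = sum(sum(x for i in range(r) for x in cells[i][j]) ** 2 / (r * n)
                  for j in range(c)) - correction
    ss_inter = cell_term - correction - ss_rows - ss_cols
    ms_within = ss_within / (r * c * (n - 1))  # df_within-cells = rc(n - 1)
    return (ss_rows / (r - 1) / ms_within,
            ss_cols / (c - 1) / ms_within,
            ss_inter / ((r - 1) * (c - 1)) / ms_within)

cells = [[[3, 4, 5], [5, 6, 7]],
         [[4, 5, 6], [9, 10, 11]]]
f_rows, f_cols, f_inter = two_way_anova(cells, r=2, c=2)
print(f_rows, f_cols, f_inter)  # 18.75 36.75 6.75
```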
Main effect: The different levels of exercise intensity have the same effect on nighttime
sleep. The population column means, averaged over the different times of day, are
equal. (𝜇𝑏1 = 𝜇𝑏2 = 𝜇𝑏3 )
Interaction effect: There is no interaction between time of day and exercise intensity.
With any main effects removed, the population cell means are equal.
(𝜇𝑎1𝑏1 = 𝜇𝑎1𝑏2 = 𝜇𝑎1𝑏3 = 𝜇𝑎2𝑏1 = 𝜇𝑎2𝑏2 = 𝜇𝑎2𝑏3 )
9. Calculate the F ratio for row effect, column effect and interaction effect.
10. Evaluate the 𝐹𝑜𝑏𝑡 values. If 𝐹𝑜𝑏𝑡 > 𝐹𝑐𝑟𝑖𝑡 , then we reject null hypothesis.
Correlation: Finding out whether a relationship exists and determining its magnitude
and direction.
Uses of correlation:
For prediction purposes
First step towards establishing that two variables are causally related.
For assessing reliability (e.g. “test-retest reliability”)
Correlation coefficient:
Also known as ‘r’, expresses quantitatively the magnitude and direction of the
relationship
How to tell if a linear relationship has high r? Look at the scatterplot. The closer
the points are to the regression line, the higher the magnitude of the correlation
coefficient and the more accurate the prediction.
The X and Y values can be transformed into standardised z scores. If all paired
scores have the same z value, they occupy the same relative position within
their own distributions, and the correlation is perfect (r = 1).
Pearson r is a measure of the extent to which paired scores occupy the same or
opposite positions within their own distributions.
Only calculate r where the data are of interval or ratio scaling
s_Y|X = √(∑(Y − Y′)²/(N − 2))
We divide by N − 2 because calculating the standard error involves fitting the data to a
straight line. Doing so requires estimating two parameters, slope and intercept, leaving
the deviations about the line with N − 2 degrees of freedom.
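A sketch of the computation (the Y and predicted Y′ values are hypothetical):

```python
from math import sqrt

def std_error_of_estimate(y, y_pred):
    """s_Y|X: spread of the Y scores about the regression line, with N - 2 df."""
    ss_resid = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))
    return sqrt(ss_resid / (len(y) - 2))

y      = [2.0, 3.0, 5.0, 4.0, 6.0]
y_pred = [2.5, 3.0, 4.5, 4.5, 5.5]   # Y' values from an assumed fitted line
print(round(std_error_of_estimate(y, y_pred), 3))  # 0.577
```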
Considerations:
Basic computation group must be representative of the prediction group. (Can
be achieved through random sampling of the population)
The linear regression equation is properly used only for the range of the variable
over which the relationship was computed. E.g. if the regression of GPA on IQ was
computed for IQ scores of 110-138, it cannot be used to predict the GPA of someone
with an IQ of 140.
Parametric tests:
- Depend considerably on population characteristics, or parameters, for their use.
E.g. the z test requires us to specify the mean and SD of the null-hypothesis
population, and requires that population scores be normally distributed when N
is small.
- More powerful than non-parametric tests. E.g. the sign test has less power than
the t-test for correlated groups.
- Always use parametric tests if the data meet the assumptions of the test.
- Good for ordinal, interval or ratio data.
- Can test 3, 4 or more variables and their interactions.
Non-parametric tests:
- Do not depend on knowing population distributions (“distribution-free” tests).
E.g. chi-squared and sign tests.
- Good for nominal (categorical) data.
- No comparable technique exists for testing multiple variables and their
interactions.
Note: Since the differences are squared, the direction of each difference doesn’t matter;
the chi-squared test is therefore a non-directional test.
The chi-squared test can also be conducted to determine whether two categorical
variables are independent or related.
Null hypothesis: Variable A and Variable B are independent.
Step 1: Calculated expected frequency for each cell (𝑓𝑒 ). If we do not know the
population proportions, we can estimate them from the sample. You can do this by
multiplying the marginals for that cell (row total and column total) and dividing by N.
Step 2: Calculate χ²_obt = ∑ (f_o − f_e)²/f_e
Step 3: Obtain 𝜒 2 𝑐𝑟𝑖𝑡 where df = (r-1)(c-1) for contingency tables
Step 4: Decision rule: If 𝜒 2 𝑜𝑏𝑡 ≥ 𝜒 2 𝑐𝑟𝑖𝑡 , 𝑟𝑒𝑗𝑒𝑐𝑡 𝐻0 .
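Steps 1–4 can be sketched as follows (the observed counts are hypothetical; χ²_crit still comes from a table):

```python
def chi_square_independence(table):
    """Chi-squared test of independence on an r x c table of observed frequencies."""
    r, c = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(c)]
    N = sum(row_totals)
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            fe = row_totals[i] * col_totals[j] / N  # expected frequency from marginals
            chi2 += (table[i][j] - fe) ** 2 / fe
    return chi2, (r - 1) * (c - 1)

chi2, df = chi_square_independence([[30, 10],
                                    [20, 40]])
print(round(chi2, 2), df)  # 16.67 1
```

With χ²_crit(1) = 3.841 at α = 0.05 (a table value), χ²_obt ≥ χ²_crit here, so H0 (independence) would be rejected.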
Assumptions underlying 𝜒 2 :
1. There is independence between each observation recorded in the contingency
table. (i.e. each subject can have only one entry in the table)
2. The expected frequency in each cell is at least 5 where r or c is greater than 2,
and at least 10 where both r and c are 2 or less. (If not, use Fisher’s exact
probability test.)
Note: Chi-squared can be used with ordinal, interval and ratio data, but the data must
first be reduced to mutually exclusive categories.