T, F & Regression


t-tests, ANOVA and regression

Methods for Dummies, 1st February 2006
Jon Roiser and Predrag Petrovic


Overview
 Simple hypothesis testing
 Z-tests & t-tests
 F-tests and ANOVA
 Correlation/regression
Starting point
 Central aim of statistical tests: determining the likelihood of a value in a sample, given that the Null Hypothesis is true: P(value|H0)
 H0: no statistically significant difference between sample & population (or between samples)
 H1: statistically significant difference between sample & population (or between samples)
Types of error

[Figure: total population vs. the population studied]

                          True state of the world
Decision                  H0 true                          H0 false
Accept H0                 Correct acceptance (p = 1 - α)   Type II error (β-error): false negative
Reject H0                 Type I error (α-error)           Correct rejection
Distribution & probability

If we know something about the distribution of events in a population, we know something about the probability of these events

Population mean:                 $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$

Population standard deviation:   $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}$
Standardised normal distribution

Standardised: $\bar{z} = 0$, $s_z = 1$

1 point compared to population:   $z_i = \frac{x_i - \mu}{\sigma}$
Group compared to population:     $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

 The z-score represents a value on the x-axis for which we know the p-value
 2-tailed: z = 1.96 is equivalent to p = 0.05 (rule of thumb: ~2 SD)
 1-tailed: z = 1.65 is equivalent to p = 0.05 (the area between 1.65 and infinity = 5%, in one tail)
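As a quick check (not from the original slides), these z-to-p rules of thumb can be verified with scipy; the numbers below are simply standard normal tail areas:

```python
# Sketch: checking the 1.96 (two-tailed) and 1.65 (one-tailed) rules of thumb
from scipy import stats

p_two_tailed = 2 * (1 - stats.norm.cdf(1.96))  # P(|Z| > 1.96)
p_one_tailed = 1 - stats.norm.cdf(1.65)        # P(Z > 1.65)
print(p_two_tailed, p_one_tailed)              # ~0.050 and ~0.049
```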
Assumptions of
parametric tests
 Variables are:
 Normally distributed
 N>10 (or 12, or 15…)
 On an interval or ideally ratio
scale (e.g. 2 metres=2x1 metre)

 …but parametric tests are


(fairly) robust to violations
of these assumptions
Z- versus t-statistic?
 Z is used when we know the variance in the general population (e.g. IQ). This is not normally true!
 t is used when we do not know the variance of the underlying population for sure, and depends on N
 The t distribution is similar to the Z (but flatter)
 For N > 30, t ≈ Z

[Figure: t distributions for small and large N compared with the Z distribution]
Two-sample t-test

Difference between the means divided by the standard error of the difference between the means:

$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$,   where   $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

[Figure: distributions of Group 1 and Group 2]
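A minimal sketch (not from the slides, with made-up data) of this calculation; note that the standard error above uses the two groups' separate variances, which is also what scipy's Welch t-test (equal_var=False) does:

```python
# Sketch: two-sample t statistic computed from the formula above
import numpy as np
from scipy import stats

g1 = np.array([3.1, 2.8, 4.0, 3.5, 2.9])   # hypothetical group 1 scores
g2 = np.array([2.0, 1.7, 2.4, 1.9, 2.2])   # hypothetical group 2 scores

se = np.sqrt(g1.var(ddof=1) / len(g1) + g2.var(ddof=1) / len(g2))
t_manual = (g1.mean() - g2.mean()) / se

t_scipy, p = stats.ttest_ind(g1, g2, equal_var=False)
print(t_manual, t_scipy)   # the two t values agree
```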
Different types of t-test
 One-sample
 Tests whether the mean of a sample is different from a given value (e.g. chance performance = 50% in a 2-alternative forced-choice task)
 Paired t-test (within subjects)
 Tests whether a group of individuals tested under condition A differs from the same group tested under condition B
 Must have 2 values for each subject
 Basically the same as a one-sample t-test on the difference scores, comparing the difference scores to 0 (see the sketch below)
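A minimal sketch (not from the slides, with made-up paired data) of that equivalence:

```python
# Sketch: paired t-test == one-sample t-test on the difference scores
import numpy as np
from scipy import stats

cond_a = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 5.7])   # hypothetical condition A
cond_b = np.array([4.6, 4.9, 5.2, 5.0, 4.4, 5.1])   # hypothetical condition B

t_paired, p_paired = stats.ttest_rel(cond_a, cond_b)
t_diff, p_diff = stats.ttest_1samp(cond_a - cond_b, 0)
print(t_paired, t_diff)   # identical statistics
```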
Another approach to group differences
 Instead of thinking about the group means, we can instead think about variances

Recall the sample variance:   $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$

 F = Variance 1 / Variance 2
 ANOVA = ANalysis Of VAriance
 Total variance = model variance + error variance
Partitioning the variance

[Figure: Group 1 and Group 2 data shown three times, partitioned into components]

Total = Model + Error (within groups)
ANOVA
 At its simplest, one-way ANOVA is the same as the two-sample t-test
 Recall t = difference between means / spread around means (standard error of the difference)

Model (difference between means) = between groups
Error (spread around means) = within groups

$F = \frac{s^2_{Model}}{s^2_{Error}}$

[Figure: Group 1 and Group 2 data, showing the model and error components]
A quick proof from SPSS

t-test

Group Statistics
group2    N    Mean     Std. Deviation   Std. Error Mean
Ecstasy   15   3.1352   1.45306          .37518
Control   11   1.7157   1.03059          .31073

Independent Samples Test (Depression)
                              Levene's Test          t-test for Equality of Means
                              F       Sig.    t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       2.105   .160    2.764   24       .011              1.41951           .51363                  .35943         2.47958
Equal variances not assumed                   2.914   23.991   .008              1.41951           .48715                  .41406         2.42496

ANOVA

ANOVA (Depression)
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   12.788           1    12.788        7.638   .011
Within Groups    40.181           24   1.674
Total            52.968           25

 In fact, t = SQRT(F): 2.764 = SQRT(7.638) (for 1 degree of freedom)
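The same relationship can be checked with scipy (a sketch, not from the slides, using made-up data): for two groups, the one-way ANOVA F statistic is the square of the equal-variance two-sample t statistic.

```python
# Sketch: t^2 == F for a two-group comparison
import numpy as np
from scipy import stats

g1 = np.array([3.1, 2.8, 4.0, 3.5, 2.9, 3.3])   # hypothetical group 1
g2 = np.array([2.0, 1.7, 2.4, 1.9, 2.2, 2.1])   # hypothetical group 2

t, _ = stats.ttest_ind(g1, g2, equal_var=True)
F, _ = stats.f_oneway(g1, g2)
print(t**2, F)   # equal up to floating-point error
```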
ANOVA is useful for more complex designs
 More than 1 group: e.g. Groups 1, 2 and 3
 More than 1 effect (interaction): e.g. Drug vs. Placebo × Male vs. Female
 …but we need to use post-hoc tests (t-tests corrected for multiple comparisons)
Differences between t-tests and F-tests (especially in SPM)
 t-tests can only be used to compare 2 groups/effects, while ANOVA can handle more sophisticated designs (several groups/several effects/interactions)
 In SPM, t-tests are one-tailed (i.e. for contrast X-Y, significant voxels are only reported where X>Y)
 In SPM, F-tests are two-tailed (i.e. for contrast X-Y, significant voxels are reported for both X>Y and Y>X)
Correlation and
Regression
 Is there a relationship between x and y?
 What is the strength of this relationship
 Pearson’s r
 Can we describe this relationship and use it to predict y
from x?
 Regression
 Fitting a line using the Least Squares solution
 Is the relationship we have described statistically
significant?
 Significance tests
 Relevance to SPM
 GLM
Correlation and Regression
 Correlation: the degree of predictability in the relationship between two variables
 Covariance: a measurement of this predictability
 Regression: a description of the relationship between two variables, where one is dependent and the other is independent
 No causality in any of these models
Covariance

$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

 When X and Y increase together: cov(x,y) is positive
 When X increases as Y decreases (or vice versa): cov(x,y) is negative
 When there is no consistent relationship: cov(x,y) = 0
 Dependent on the size of the data's standard deviations (!)
  We need to standardise the data (Pearson's r)
Correlation and
Regression
 Is there a relationship between x and y?
 What is the strength of this relationship
 Pearson’s r
 Can we describe this relationship and use it to predict y
from x?
 Regression
 Fitting a line using the Least Squares solution
 Is the relationship we have described statistically
significant?
 Significance tests
 Relevance to SPM
 GLM
Pearson’s r
 Covariance on its own is hard to interpret, because it depends on the scale of the data
 Solution: standardise this measure
 Pearson’s r standardises the covariance value
 Divides the covariance by the product of the standard deviations of X and Y:

$r_{xy} = \frac{\mathrm{cov}(x, y)}{s_x s_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}$
Correlation and
Regression
 Is there a relationship between x and y?
 What is the strength of this relationship
 Pearson’s r
 Can we describe this relationship and use it to predict y
from x?
 Regression
 Fitting a line using the Least Squares solution
 Is the relationship we have described statistically
significant?
 Significance tests
 Relevance to SPM
 GLM
Best-fit line
 The aim of linear regression is to fit a straight line, ŷ = ax + b, to the data that gives the best prediction of y for any value of x
 This will be the line that minimises the distance between the data and the fitted line, i.e. the residuals

ŷ = ax + b   (a = slope, b = intercept)

ŷ = predicted value
y_i = true value
ε = residual error
Least Squares Regression
 To find the best line we must minimise the sum of the squares of the residuals (the vertical distances from the data points to our line)

Model line: ŷ = ax + b   (a = slope, b = intercept)
Residual: ε = y - ŷ
Sum of squares (SS) of residuals: Σ(y - ŷ)²

 We must find the values of a and b that minimise Σ(y - ŷ)²
Finding b
 First we find the value of b that gives the least sum of squares

[Figure: the same scatter plot with candidate lines at three different values of b]

 Trying different values of b is equivalent to shifting the line up and down the scatter plot
Finding a
 Now we find the value of a that gives the least sum of squares

[Figure: the same scatter plot with candidate lines at three different values of a]

 Trying out different values of a is equivalent to changing the slope of the line, while b stays constant
Minimising the sum of squares
 We need to minimise Σ(y - ŷ)²
 ŷ = ax + b
 So we need to minimise: Σ(y - ax - b)²
 If we plot the sum of squares (SS) for all different values of a and b we get a parabola, because it is a squared term
 So the minimum sum of squares is at the bottom of the curve, where the gradient is zero

[Figure: SS plotted against values of a and b; the minimum is where the gradient = 0]
The solution
 Doing this gives the following equation for a:

$a = \frac{r\, s_y}{s_x}$   (r = correlation coefficient of x and y; s_y = standard deviation of y; s_x = standard deviation of x)

 From this we can see that:
 A low correlation coefficient gives a flatter slope (low value of a)
 A large spread of y, i.e. a high standard deviation, results in a steeper slope (high value of a)
 A large spread of x, i.e. a high standard deviation, results in a flatter slope (low value of a)
The solution continued
 Our model equation is ŷ = ax + b
 This line must pass through the mean point (x̄, ȳ), so:

$\bar{y} = a\bar{x} + b \;\Rightarrow\; b = \bar{y} - a\bar{x}$

 We can put our equation for a into this, giving:

$b = \bar{y} - \frac{r\, s_y}{s_x}\bar{x}$   (r = correlation coefficient of x and y; s_y = standard deviation of y; s_x = standard deviation of x)

 The smaller the correlation, the closer the intercept is to the mean of y
Back to the model

$\hat{y} = ax + b = \frac{r\, s_y}{s_x}x + \bar{y} - \frac{r\, s_y}{s_x}\bar{x}$

Rearranges to:   $\hat{y} = \frac{r\, s_y}{s_x}(x - \bar{x}) + \bar{y}$

 If the correlation is zero, we will simply predict the mean of y for every value of x, and our regression line is just a flat horizontal line at height ȳ
 But this isn't very useful
 We can calculate the regression line for any data (see the sketch below), but the important question is how well this line fits the data, or how good it is at predicting y from x
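A minimal sketch (not from the slides, with made-up data) of the solution above, checked against numpy's own least-squares line fit:

```python
# Sketch: slope a = r*sy/sx and intercept b = ybar - a*xbar, vs. np.polyfit
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical x values
y = np.array([2.3, 2.8, 3.9, 4.1, 5.2, 5.8])   # hypothetical y values

r = np.corrcoef(x, y)[0, 1]
a = r * y.std(ddof=1) / x.std(ddof=1)   # slope
b = y.mean() - a * x.mean()             # intercept: the line passes through the means

a_np, b_np = np.polyfit(x, y, 1)        # numpy's least-squares fit
print((a, b), (a_np, b_np))             # the two solutions match
```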
Correlation and
Regression
 Is there a relationship between x and y?
 What is the strength of this relationship
 Pearson’s r
 Can we describe this relationship and use it to predict y
from x?
 Regression
 Fitting a line using the Least Squares solution
 Is the relationship we have described statistically
significant?
 Significance tests
 Relevance to SPM
 GLM
How can we determine the significance of the model?
 We've determined the form of the relationship (ŷ = ax + b)
  Does a prediction based on this model do a better job than just predicting the mean?
We can solve this using ANOVA
 In general: total variance = predicted (or model) variance + error variance
 In a one-way ANOVA, we split the total sum of squares into model and error parts, and each mean square (MS = SS/df) is a variance, cf. $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$

$F_{(df_{model},\ df_{error})} = \frac{MS_{Model}}{MS_{Error}}$
Partitioning the variance for linear regression (using ANOVA)

[Figure: scatter plots showing Total = Model (between) + Error (within)]

 So linear regression and ANOVA are doing the same thing statistically, and are the same
Another quick proof from SPSS

Correlation (Pearson's r)

Correlations
                                          Depression   Ecstasy_frequency
Depression          Pearson Correlation   1            .606*
                    Sig. (2-tailed)                    .017
                    N                     15           15
Ecstasy_frequency   Pearson Correlation   .606*        1
                    Sig. (2-tailed)       .017
                    N                     15           15
*. Correlation is significant at the 0.05 level (2-tailed).

Regression

ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F       Sig.
Regression   10.843           1    10.843        7.531   .017(a)
Residual     18.717           13   1.440
Total        29.560           14
a. Predictors: (Constant), Ecstasy_frequency
b. Dependent Variable: Depression
Relating the F and t statistics

$F_{(df_{model},\ df_{error})} = \frac{MS_{Model}}{MS_{Error}} = \frac{\hat{r}^2 (N - 2)}{1 - \hat{r}^2}$

 Alternatively (as F is the square of t):

$t_{(N-2)} = \frac{\hat{r}\sqrt{N - 2}}{\sqrt{1 - \hat{r}^2}}$   So all we need to know is N and r!!
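A minimal sketch (not from the slides, with made-up data) of that formula, checked against the p-value scipy reports for Pearson's r:

```python
# Sketch: the t statistic (and p-value) for r, computed from N and r alone
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])   # hypothetical x
y = np.array([2.1, 1.9, 3.4, 3.0, 4.8, 4.2, 5.9, 6.1])   # hypothetical y
N = len(x)

r, p_scipy = stats.pearsonr(x, y)
t = r * np.sqrt(N - 2) / np.sqrt(1 - r**2)
p_manual = 2 * stats.t.sf(abs(t), df=N - 2)   # two-tailed p from the t distribution

print(p_manual, p_scipy)   # the two p-values agree
```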
Basic assumptions
 Variables: ratio or interval, with > 10 (or 12, or 15…) different pairs of values
 Variables normally distributed in the population
 Linear relationship
 Residuals (errors) should be normally distributed
 Independent sampling
Regression health warning!
Warning 1: Outliers

Regression health warning!
Warning 2: More than 1 different population or cont… (aka the “Ecological Fallacy”)
[Cited example: Science (1997) 277:968-71 (519 citations)]
Correlation and
Regression
 Is there a relationship between x and y?
 What is the strength of this relationship
 Pearson’s r
 Can we describe this relationship and use it to predict y
from x?
 Regression
 Fitting a line using the Least Squares solution
 Is the relationship we have described statistically
significant?
 Significance tests
 Relevance to SPM
 GLM
Multiple regression
 Multiple regression is used to determine the effect of a number of independent variables, x1, x2, x3 etc., on a single dependent variable, y
 The different x variables are combined in a linear way and each has its own regression coefficient:

 y = a1x1 + a2x2 + … + anxn + b + ε

 The a parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y
 i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for
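A minimal sketch (not from the slides, with simulated data) of such a model fitted by ordinary least squares using numpy:

```python
# Sketch: multiple regression y = a1*x1 + a2*x2 + b fitted with np.linalg.lstsq
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 1.5 * x1 - 0.7 * x2 + 2.0 + rng.normal(scale=0.5, size=50)   # simulated data

# Design matrix: one column per regressor plus a column of ones for the intercept
X = np.column_stack([x1, x2, np.ones_like(x1)])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
a1, a2, b = coeffs
print(a1, a2, b)   # close to the simulated values 1.5, -0.7, 2.0
```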
SPM
 Linear regression is a GLM that models the effect of one independent variable, x, on ONE dependent variable, y
 Multiple regression models the effect of several independent variables, x1, x2 etc., on ONE dependent variable, y
 Both are types of General Linear Model
 GLM can also allow you to analyse the effects of several independent x variables on several dependent variables, y1, y2, y3 etc., in a linear combination
 This is what SPM does!
Acknowledgements
 Previous year's slides
 David Howell's excellent book Statistical Methods for Psychology (2002)
 And David Howell's website: http://www.uvm.edu/~dhowell/StatPages/StatHomePage.html

The lecturers declare that they do not own stocks, shares or capital investments in David Howell's book, they are not employed by the Duxbury group and do not consult for them, nor are they associated with David Howell, or his friends, or his family, or his cat
