T, F & Regression
Type I and Type II errors

                         True state of the world
                     H0 true               H0 false
Decision
  Accept H0          Correct acceptance    Type II (β) error
                                           (false negative)
  Reject H0          Type I (α) error      Correct rejection
                     (false positive)      (power = 1 − β)
Distribution &
probability
If we know something about the distribution of events in
a population, we know something about the probability
of these events
Population mean:

\mu = \frac{\sum_{i=1}^{n} x_i}{n}

Population standard deviation:

\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}
Standardised normal distribution

Standardised scores have \bar{z} = 0 and s_z = 1.

1 point compared to population: z_i = \frac{x_i - \mu}{\sigma}

Group compared to population: z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
The z-score represents a value on
the x-axis for which we know the
p-value
2-tailed: z = 1.96 is equivalent
to p=0.05 (rule of thumb ~2SD)
1-tailed: z = 1.65 is equivalent
to p=0.05 (area between infinity
and 1.65=5% on one of the tails)
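Both z formulas can be checked with a quick numeric sketch; the population parameters (IQ-style μ = 100, σ = 15) and the scores are made-up illustration values:

```python
import math

# Made-up population parameters (IQ-style): mu = 100, sigma = 15
mu, sigma = 100.0, 15.0

# One point compared to the population: z = (x - mu) / sigma
x = 130.0
z_point = (x - mu) / sigma                      # (130 - 100) / 15 = 2.0

# A group mean compared to the population: z = (xbar - mu) / (sigma / sqrt(n))
xbar, n = 106.0, 25
z_group = (xbar - mu) / (sigma / math.sqrt(n))  # 6 / 3 = 2.0

# Both exceed 1.96, so both are significant at p < .05 (2-tailed)
print(z_point, z_group)
```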
Assumptions of
parametric tests
Variables are:
Normally distributed
N>10 (or 12, or 15…)
On an interval or ideally ratio
scale (e.g. 2 metres=2x1 metre)
t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}, \qquad
s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}

(Group 1 vs. Group 2)
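The two-group t statistic divides the difference in group means by its standard error. A minimal sketch with made-up data, using only the Python standard library:

```python
import math
from statistics import mean, variance  # variance() is the sample variance (n - 1 denominator)

def two_sample_t(g1, g2):
    # t = (xbar1 - xbar2) / sqrt(s1^2/n1 + s2^2/n2)
    se = math.sqrt(variance(g1) / len(g1) + variance(g2) / len(g2))
    return (mean(g1) - mean(g2)) / se

# Made-up scores for two independent groups
a = [5.1, 4.8, 6.0, 5.5, 5.2]
b = [4.0, 3.9, 4.5, 4.2, 4.1]
t = two_sample_t(a, b)
```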
Different types of t-test
One-sample
Tests whether the mean of a sample is different from a given value (e.g. whether performance differs from chance = 50% in a 2-alternative forced-choice task)
Paired t-test (within subjects)
Tests whether a group of individuals tested under condition A differs from the same individuals tested under condition B
Must have 2 values for each subject
Equivalent to a one-sample t-test on the difference scores, comparing the difference scores to 0
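The equivalence between the paired t-test and a one-sample t-test on the difference scores is easy to demonstrate; the condition scores here are invented:

```python
import math
from statistics import mean, stdev

def one_sample_t(xs, mu0=0.0):
    # t = (xbar - mu0) / (s / sqrt(n))
    return (mean(xs) - mu0) / (stdev(xs) / math.sqrt(len(xs)))

# Made-up within-subject scores: each subject tested under both conditions
cond_a = [10.0, 12.0, 9.0, 11.0, 13.0]
cond_b = [8.0, 11.0, 8.0, 9.0, 11.0]

diffs = [x - y for x, y in zip(cond_a, cond_b)]
t_paired = one_sample_t(diffs)  # paired t-test = one-sample t on differences vs 0
```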
Another approach to
group differences
Instead of thinking about the group
means, we can instead think about
variances
Recall sample variance:

s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
F=Variance 1/Variance 2
ANOVA = ANALYSIS OF VARIANCE
Total variance=model variance + error
variance
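Partitioning the total variance into model (between-group) and error (within-group) parts, and taking their ratio as F, can be sketched as a one-way ANOVA on made-up groups:

```python
from statistics import mean

def one_way_anova_f(groups):
    # F = MS_between / MS_within
    all_vals = [v for g in groups for v in g]
    grand = mean(all_vals)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)  # model SS
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)     # error SS
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Made-up data for two groups
f = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
```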
Partitioning the
variance
t-test (group statistics for Depression)

group2    N    Mean     Std. Deviation   Std. Error Mean
Ecstasy   15   3.1352   1.45306          .37518
Control   11   1.7157   1.03059          .31073
ANOVA (Depression)

                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   12.788           1    12.788        7.638   .011
Within Groups    40.181           24   1.674
Total            52.968           25
[Figure legend: ● Drug, ● Placebo]
\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}
When x and y increase together: cov(x,y) = positive.
When x increases as y decreases: cov(x,y) = negative.
When no consistent relationship: cov(x,y) = 0.
Dependent on size of the data’s standard deviations (!)
We need to standardize the data (Pearson’s r)
Correlation and
Regression
Is there a relationship between x and y?
What is the strength of this relationship
Pearson’s r
Can we describe this relationship and use it to predict y
from x?
Regression
Fitting a line using the Least Squares solution
Is the relationship we have described statistically
significant?
Significance tests
Relevance to SPM
GLM
Pearson’s r

Covariance alone does not really tell us anything, because it depends on the scale of the data

r_{xy} = \frac{\mathrm{cov}(x, y)}{s_x s_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}
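Computing the covariance and then standardising it by the two SDs gives r; a small sketch with invented data:

```python
from statistics import mean, stdev

def covariance(x, y):
    # cov(x, y) = sum((xi - xbar)(yi - ybar)) / (n - 1)
    xbar, ybar = mean(x), mean(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    # r = cov(x, y) / (sx * sy): standardising removes the dependence on scale
    return covariance(x, y) / (stdev(x) * stdev(y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 10.0]  # roughly y = 2x, so r should be close to +1
r = pearson_r(x, y)
```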
Best-fit line
Aim of linear regression is to fit a
straight line, ŷ = ax + b to data that gives
best prediction of y for any value of x
ŷ = predicted value
y_i = true value
ε = residual error
Least Squares
Regression
To find the best line we must minimise
the sum of the squares of the residuals
(the vertical distances from the data
points to our line)
Model line: ŷ = ax + b a = slope, b = intercept
Residual (ε) = y - ŷ
Sum of squares (SS) of residuals = Σ(y − ŷ)²
Since y = ax + b, the intercept is b = \bar{y} - a\bar{x}.
We can put our equation for a into this, giving:

b = \bar{y} - r\,\frac{s_y}{s_x}\,\bar{x}

where r = correlation coefficient of x and y, s_y = standard deviation of y, s_x = standard deviation of x.
We can calculate the regression line for any data, but the important
question is how well does this line fit the data, or how good is it at
predicting y from x
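The least-squares slope and intercept follow directly from r and the two SDs (a = r·sy/sx, b = ȳ − a·x̄); a sketch on invented, exactly linear data:

```python
from statistics import mean, stdev

def fit_line(x, y):
    # Least squares: a = r * sy / sx, b = ybar - a * xbar
    xbar, ybar = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    r = sum((xi - xbar) * (yi - ybar)
            for xi, yi in zip(x, y)) / ((len(x) - 1) * sx * sy)
    a = r * sy / sx
    b = ybar - a * xbar
    return a, b

x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
a, b = fit_line(x, y)
```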
How can we determine the
significance of the
model?
We’ve determined the form of the
relationship
(y = ax + b)
We partition the total variation:

SS_{Total} = SS_{Model} + SS_{Error}, \qquad MS = SS / df

F_{(df_{model},\, df_{error})} = \frac{MS_{Model}}{MS_{Error}}
Partitioning the variance for linear regression (using ANOVA)

SS_{Total} = SS_{Model} (between) + SS_{Error} (within)
Correlations (Pearson’s r)

                                         Depression   Ecstasy_frequency
Depression          Pearson Correlation  1            .606*
                    Sig. (2-tailed)                   .017
                    N                    15           15
Ecstasy_frequency   Pearson Correlation  .606*        1
                    Sig. (2-tailed)      .017
                    N                    15           15
*. Correlation is significant at the 0.05 level (2-tailed).
ANOVA(b)

Model 1      Sum of Squares   df   Mean Square   F       Sig.
Regression   10.843           1    10.843        7.531   .017(a)
Residual     18.717           13   1.440
Total        29.560           14
a. Predictors: (Constant), Ecstasy_frequency
b. Dependent Variable: Depression
Relating the F and t statistics

F_{(df_{model},\, df_{error})} = \frac{MS_{Model}}{MS_{Error}} = \frac{\hat{r}^2 (N - 2)}{1 - \hat{r}^2}

t_{(N-2)} = \frac{\hat{r}\sqrt{N - 2}}{\sqrt{1 - \hat{r}^2}}

So all we need to know is N and r!
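Since F(1, N−2) = t(N−2)², both statistics really do follow from just N and r; a quick check using the r = .606, N = 15 values from the example above:

```python
import math

r, N = 0.606, 15  # values from the Ecstasy/Depression example

F = (r ** 2 * (N - 2)) / (1 - r ** 2)               # F(1, N - 2)
t = (r * math.sqrt(N - 2)) / math.sqrt(1 - r ** 2)  # t(N - 2)

# F equals t squared, and both match the SPSS ANOVA output (F about 7.5)
```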
Basic assumptions
Variables: ratio or interval with >
10 (or 12, or 15…) different pairs of
values
Variables normally distributed in the
population
Linear relationship
Residuals (errors) should be normally
distributed
Independent sampling
Regression health
warning!
Warning 1: Outliers
Regression health
warning!
Warning 2: More than one different population or context
(aka the “Ecological Fallacy”)
Example: Science (1997) 277:968-71 (519 citations)
Multiple regression
Multiple regression is used to determine the
effect of a number of independent variables,
x1, x2, x3 etc, on a single dependent
variable, y
The different x variables are combined in a linear way, and each has its own regression coefficient:

\hat{y} = a_1 x_1 + a_2 x_2 + \dots + a_n x_n + b
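A minimal sketch of fitting such a model by solving the normal equations (X'X)b = X'y directly; the data are generated from known coefficients so the recovered values can be checked:

```python
def solve(A, b):
    # Tiny Gauss-Jordan elimination (no pivoting; fine for these normal equations)
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = M[i][i]
        M[i] = [v / p for v in M[i]]
        for j in range(n):
            if j != i:
                f = M[j][i]
                M[j] = [vj - f * vi for vj, vi in zip(M[j], M[i])]
    return [row[-1] for row in M]

def multiple_regression(X, y):
    # Least-squares coefficients [intercept, a1, a2, ...] via (X'X) coef = X'y
    Xd = [[1.0] + list(row) for row in X]  # prepend an intercept column
    k = len(Xd[0])
    XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(k)]
    return solve(XtX, Xty)

# Made-up data generated from y = 1 + 2*x1 + 3*x2 (no noise)
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
y = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in X]
coef = multiple_regression(X, y)  # recovers the intercept and both slopes
```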
The lecturers declare that they do not own stocks, shares or capital
investments in David Howell’s book, they are not employed by the
Duxbury group and do not consult for them, nor are they associated
with David Howell, or his friends, or his family, or his cat