CH 10
Related?
• So you say to yourself “What could
regression have to do with statistical
significance? These things can’t possibly
be related, right?”
Review
• Recall the basics of regression
• We have two quantitative variables
• We can plot them and draw the best fitting line
through the data
ŷ = 0.125x − 41.4
Slope: b1 = 0.125   Y-intercept: b0 = −41.4
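As a quick illustration, the fitted line can be used to compute predictions. The slide does not say what x and y measure, so the input value below is arbitrary. Here is a minimal Python sketch; `predict` is a hypothetical helper name:

```python
# Fitted line from the slide: y-hat = b0 + b1 * x
B0 = -41.4   # y-intercept (b0)
B1 = 0.125   # slope (b1)

def predict(x: float) -> float:
    """Return the predicted value y-hat for a given x (hypothetical helper)."""
    return B0 + B1 * x

y_hat = predict(1000)                    # 0.125 * 1000 - 41.4
step = predict(1001) - predict(1000)     # change in y-hat per 1-unit increase in x
print(y_hat, step)
```

The difference `step` equals the slope b1: each 1-unit increase in x raises the prediction by 0.125.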
Notation
• Recall that the values calculated from the SAMPLE are denoted ŷ = b0 + b1x, where each sample statistic estimates a population parameter:
– b0 estimates β0 (beta-zero), the y-intercept in the population
– b1 estimates β1 (beta-one), the slope in the population
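The sample statistics b0 and b1 come from least squares. The slides do not show the formulas in this excerpt, so the sketch below uses the standard textbook ones, b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄, on made-up data:

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (b0, b1): the sample intercept and slope that minimize
    the sum of squared residuals (standard least-squares formulas)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # sample slope, estimates beta1
    b0 = y_bar - b1 * x_bar   # sample intercept, estimates beta0
    return b0, b1

# Hypothetical sample data (not from the slides)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(xs, ys)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")
```

Running the same function on a different sample from the same population would give different b0 and b1, while β0 and β1 stay fixed.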
Terminology: Repeating
• Realize that μy|x (the mean of y for a given x), β0 (pronounced "beta-naught" or "beta-zero") and β1 (pronounced "beta-one") are unknown numbers that exist in the population
– That's why they are Greek letters
– They are called parameters
Hypothetical Regression
• Let’s suppose I have a (ridiculous)
hypothesis: The more French DNA you
have, the less intelligent you are
So β1=?
So just because we find a negative slope in one sample, it does not necessarily mean that the slope in the population is negative.
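A small simulation makes this concrete. The sketch below assumes a hypothetical population in which β1 = 0 (y does not depend on x; β0 = 50 and normal noise with SD 5 are arbitrary choices) and counts how often a random sample of 20 points still produces a negative slope:

```python
import random
from statistics import mean

def sample_slope(n: int, rng: random.Random) -> float:
    """Draw n points from a hypothetical population with beta1 = 0
    and return the least-squares sample slope b1."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [50 + rng.gauss(0, 5) for _ in range(n)]  # beta0 = 50, beta1 = 0
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(1)  # fixed seed so the run is repeatable
slopes = [sample_slope(20, rng) for _ in range(2000)]
share_negative = sum(b < 0 for b in slopes) / len(slopes)
print(f"Share of samples with a negative slope: {share_negative:.2f}")
```

Even though the population slope is exactly zero, roughly half of the samples show a negative slope, which is why a single negative b1 is not enough on its own.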
We obtain a sample and it has a negative slope. How do we decide if the slope in the population is negative or not?
Hypothesis
• We wish to know whether we believe that
something is true in the population
• We only have a sample of data from that
population
• Sound familiar?
• Ho: β1 = 0
• Ha: β1 < 0
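To test Ho: β1 = 0, we measure how far the sample slope b1 is from 0 in standard-error units. This excerpt does not show the formula, so the sketch assumes the standard textbook version: t = b1 / SE(b1), with SE(b1) = s / √Σ(x − x̄)² and n − 2 degrees of freedom. The data are made up:

```python
from math import sqrt
from statistics import mean

def slope_t_statistic(xs, ys):
    """Return (b1, se_b1, t, df) for testing Ho: beta1 = 0."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))   # regression standard error
    se_b1 = s / sqrt(sxx)     # standard error of the slope
    t = (b1 - 0) / se_b1      # distance of b1 from 0 in SE units
    return b1, se_b1, t, n - 2

# Hypothetical data where y tends to fall as x rises
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [9.1, 8.7, 8.9, 7.6, 7.9, 6.8, 7.1, 6.2]
b1, se_b1, t, df = slope_t_statistic(xs, ys)
print(f"b1 = {b1:.3f}, SE = {se_b1:.3f}, t = {t:.2f}, df = {df}")
```

A large negative t supports Ha: β1 < 0; the p-value comes from the t distribution with df = n − 2.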
How Do We Decide?
• If the p-value is less than the alpha value...
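The standard decision rule can be written as a tiny function. This is a sketch: `decide` is a hypothetical name, and the default α = 0.05 is an assumption, not a value given on the slide:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Standard decision rule: reject Ho when the p-value is below alpha.
    alpha = 0.05 is an assumed default, not taken from the slides."""
    if p_value < alpha:
        return "Reject Ho: the data give evidence for Ha about beta1"
    return "Fail to reject Ho: not enough evidence about beta1"

print(decide(0.01))
print(decide(0.20))
```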
Regression– No Relationship
• If there is truly no relationship in the data, then the regression line will have a slope of zero
• So, we can do hypothesis testing on how different the slope of the line in the sample is from zero
[Figure: Example of Ineffective Regression. Y plotted against X (used to try to predict Y); the best-fitting line is flat, the same as the average of y (ȳ).]
• Recall: residual = observed value of y minus predicted value of y
• If slope = 0, then each residual is the distance from the individual data point to the overall average of the data (ȳ), so the typical size of the residuals is the same as the standard deviation of the y variable (covered in Chapter 1)
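A short check of that last point, on hypothetical y values: when the line is flat at ȳ, the residuals are just the deviations y − ȳ, and their root-mean-square size (with the n − 1 divisor from Chapter 1) is exactly the standard deviation of y:

```python
from math import sqrt
from statistics import mean, stdev

ys = [4, 7, 10, 6, 8, 5, 9, 7]  # hypothetical y values
y_bar = mean(ys)

# With slope 0, the best-fitting line is y-hat = y_bar for every x,
# so each residual is just the deviation y - y_bar.
residuals = [y - y_bar for y in ys]

# Root-mean-square residual using the n - 1 divisor from Chapter 1
typical_residual = sqrt(sum(r ** 2 for r in residuals) / (len(ys) - 1))
print(typical_residual, stdev(ys))  # the two numbers agree
```

The regression standard error s uses n − 2 instead of n − 1, so for a fitted line the match with the standard deviation is only approximate.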
Regression– Relationship
• If x and y are related (lower-left plot), then the slope of the best-fitting line will not be zero
• And the residuals will tend to be smaller than the standard deviation of y
[Figure: scatterplots of y against x, with a residual labeled.]
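To see the contrast, the sketch below fits a line to made-up data with a clear upward trend and compares the typical residual size s with the standard deviation of y:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data with a clear upward trend
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.0, 2.9, 4.2, 4.8, 6.1, 7.0, 7.8, 9.1, 10.2, 10.9]

# Least-squares line
x_bar, y_bar = mean(xs), mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar

# Residuals from the fitted line and their typical size s
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
s = sqrt(sum(r ** 2 for r in residuals) / (len(xs) - 2))

print(f"slope b1 = {b1:.3f}")
print(f"typical residual s = {s:.3f}  vs  SD of y = {stdev(ys):.3f}")
```

Because the line captures most of the variation in y, s comes out far smaller than the standard deviation of y.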
[Figure: μy|x, the mean of Wage (y) at each level of Experience (x).]
$$ s = \sqrt{\frac{\sum \text{residual}^2}{n-2}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}} $$
* Recall that yᵢ is the actual data value from the sample and ŷᵢ is the estimate from the regression line
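The formula for s translates directly into code. In this sketch, `regression_standard_error` is a hypothetical helper, and the predictions are made-up numbers standing in for values from some fitted line:

```python
from math import sqrt

def regression_standard_error(ys, y_hats):
    """s = sqrt( sum of (y_i - y_hat_i)^2 / (n - 2) )."""
    n = len(ys)
    if n < 3:
        raise ValueError("need at least 3 points (n - 2 degrees of freedom)")
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))
    return sqrt(sse / (n - 2))

# Hypothetical observed values and made-up predictions from a fitted line
ys = [3.0, 5.0, 4.0, 6.0]
y_hats = [3.5, 4.5, 4.5, 5.5]
s = regression_standard_error(ys, y_hats)
print(s)
```

The divisor is n − 2 because two parameters (b0 and b1) were estimated to get the predictions.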
Reading a residual plot:
• Random scatter → indicates that the data fit a linear model, and has normally distributed …
• Curved pattern → the relationship is not linear.
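A text-only version of a residual plot shows what "curved pattern" means. The sketch fits a straight line to hypothetical data that actually follow y = x² and prints the sign of each residual in order of x:

```python
from statistics import mean

# Hypothetical curved data: y = x^2
xs = list(range(1, 10))
ys = [x ** 2 for x in xs]

# Fit a straight line by least squares anyway
x_bar, y_bar = mean(xs), mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
# Signs in order of x: positive at both ends, negative in the middle
pattern = "".join("+" if r > 0 else "-" for r in residuals)
print(pattern)
```

The runs of + then - then + are a systematic curve rather than random scatter, signalling that a straight line is the wrong model.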
Direction of Relationship
• Note that, depending on the specifics of the research, the test for β1 can take different forms
• If we expect wages to increase as experience increases, then the form is
Ho: β1 = 0 vs.
Ha: β1 > 0
• If we expect life expectancy to decrease as individuals smoke more, then the form is
Ho: β1 = 0 vs.
Ha: β1 < 0
• If we suspect a relationship exists but don't know its direction, then the test is
Ho: β1 = 0 vs.
Ha: β1 ≠ 0
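The form of Ha changes which tail of the t distribution gives the p-value. The sketch below is a stdlib-only illustration: instead of a statistics package, it integrates the Student t density numerically with Simpson's rule (a swapped-in technique), and the names `t_cdf` and `slope_p_value` are hypothetical:

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2)
    return exp(log_c) / sqrt(df * pi) * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t): integrate the density over [0, |t|] with Simpson's rule
    (steps must be even), then use the symmetry of the t distribution."""
    a = abs(t)
    h = a / steps
    area = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def slope_p_value(t: float, df: int, alternative: str) -> float:
    """p-value for Ho: beta1 = 0 against the chosen Ha."""
    if alternative == ">":    # Ha: beta1 > 0 (e.g., wages vs. experience)
        return 1 - t_cdf(t, df)
    if alternative == "<":    # Ha: beta1 < 0 (e.g., life expectancy vs. smoking)
        return t_cdf(t, df)
    if alternative == "!=":   # Ha: beta1 != 0 (direction unknown)
        return 2 * (1 - t_cdf(abs(t), df))
    raise ValueError("alternative must be '>', '<', or '!='")

for alt in (">", "<", "!="):
    print(alt, round(slope_p_value(2.0, 10, alt), 4))
```

For the same t statistic, the two-sided p-value is twice the one-sided p-value in the direction of the data, so a one-sided test is easier to reject only when the slope goes the predicted way.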
• The SPSS output gives a confidence interval for the slope relating T-Bill returns and inflation: plausible values of β1 range from .428 to .826
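A confidence interval for the slope has the form b1 ± margin of error, so the endpoints from the SPSS output let us recover the point estimate and check whether 0 is plausible. This sketch assumes that symmetric form; the slide does not state the confidence level:

```python
# Interval endpoints reported by the SPSS output on the slide
lower, upper = 0.428, 0.826

# For an interval of the form b1 +/- margin, b1 is the midpoint
b1 = (lower + upper) / 2
margin = (upper - lower) / 2
contains_zero = lower <= 0 <= upper

print(f"b1 = {b1:.3f}, margin of error = {margin:.3f}")
print("0 is a plausible value of beta1" if contains_zero
      else "0 is outside the interval, so beta1 = 0 is not plausible at this confidence level")
```

Because the whole interval lies above 0, these data are inconsistent with Ho: β1 = 0 at the interval's confidence level, and the estimated slope is positive.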