Session 5 Marked B PDF
• Correlation
• Prediction
• Linear regression
• R-Square
Variables
EXAMPLE:
[Figure: four scatter plots: Performance (Y) vs. Test Score (X); Error Rate (Y) vs. Practice Time (X); Sales (Y) vs. Advertising (X); Unit Cost (Y) vs. Production Rate (X)]
[Figure: scatter plots illustrating r = 0.6, r = 0.8, r = 0.95, and r = 0.99]
Interpretation of r
Always: $-1 \le r \le 1$
r has no units of measurement
EXAMPLE:
The following data were collected on our company's 5 salespeople. We've recorded the number of years of sales experience for each one, along with the amount of sales generated last month.
X (years of experience)   Y (sales last month)
16                        15
6                         3
12                        10
1                         0
10                        7
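$$S_X = \sqrt{\frac{\sum (X - \bar{X})^2}{n-1}} \quad\text{and}\quad S_Y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n-1}}$$

$$r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y} \qquad (2)$$

$$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left(\sum X^2 - n\bar{X}^2\right)\left(\sum Y^2 - n\bar{Y}^2\right)}} \qquad (3)$$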
(a) It doesn't matter which variable we call X and which Y.
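As a quick check of formula (2) on the salespeople data above, here is a minimal Python sketch; the helper names (`mean`, `sample_sd`, `sample_cov`, `correlation`) are our own, not from any library. It also illustrates remark (a) and the fact that r has no units: swapping X and Y, or rescaling X from years to months, leaves r unchanged.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor (S_X or S_Y)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def sample_cov(xs, ys):
    """Sample covariance cov(X, Y) with the n - 1 divisor."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Formula (2): r = cov(X, Y) / (S_X * S_Y)."""
    return sample_cov(xs, ys) / (sample_sd(xs) * sample_sd(ys))


years = [16, 6, 12, 1, 10]  # X: years of sales experience
sales = [15, 3, 10, 0, 7]   # Y: sales generated last month

r_xy = correlation(years, sales)
r_yx = correlation(sales, years)                        # roles of X and Y swapped
r_months = correlation([12 * x for x in years], sales)  # X rescaled to months

print(round(r_xy, 4), round(r_yx, 4), round(r_months, 4))  # 0.9854 0.9854 0.9854
```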
Example:
For the Sales problem, find r.
X Y X² Y² XY
16 15
6 3
12 10
1 0
10 7
Sum
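One way to fill in the columns of this table and apply formula (3) is sketched below in Python; the variable names are our own. The printed totals are the entries for the Sum row.

```python
import math

years = [16, 6, 12, 1, 10]  # X
sales = [15, 3, 10, 0, 7]   # Y
n = len(years)

# Column totals for the computation table
sum_x = sum(years)
sum_y = sum(sales)
sum_x2 = sum(x * x for x in years)
sum_y2 = sum(y * y for y in sales)
sum_xy = sum(x * y for x, y in zip(years, sales))

x_bar, y_bar = sum_x / n, sum_y / n

# Formula (3)
numerator = sum_xy - n * x_bar * y_bar
denominator = math.sqrt((sum_x2 - n * x_bar ** 2) * (sum_y2 - n * y_bar ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 45 35 537 383 448
print(round(r, 4))                           # 0.9854
```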
Prediction
[Figure: scatter plot of Course Grade (Y) vs. Exam 3 score (X)]
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
Regression Analysis
[Figure: plot of SS(Residuals)]
$\sum (Y - \hat{Y})^2$ is the sum of squared errors (SSE), also called SS(Residuals).
The value of $\sum (Y - \hat{Y})^2$ depends on the units of Y and on the number of data points, so it doesn't mean much by itself. Instead, we can compute the Root Mean Square Error (RMSE), or Standard Error:
$$\text{Standard Error} = s = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n-2}}$$

$$\text{RMSE} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{e_1^2 + e_2^2 + \cdots + e_n^2}{n-2}}$$
The Standard Error (or RMSE) $s$ is an estimate of $\sigma$, the standard deviation of the data around the regression line.
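A minimal Python sketch of the standard error for the salespeople data, assuming the usual least-squares slope and intercept; the variable names are our own. It forms the residuals $e_i = Y_i - \hat{Y}_i$, sums their squares, and divides by $n - 2$ before taking the square root.

```python
import math

years = [16, 6, 12, 1, 10]  # X
sales = [15, 3, 10, 0, 7]   # Y
n = len(years)
x_bar = sum(years) / n
y_bar = sum(sales) / n

# Least-squares slope and intercept
b1 = (sum(x * y for x, y in zip(years, sales)) - n * x_bar * y_bar) / (
    sum(x * x for x in years) - n * x_bar ** 2
)
b0 = y_bar - b1 * x_bar

# Residuals e_i = Y_i - Y-hat_i, then the standard error with the n - 2 divisor
residuals = [y - (b0 + b1 * x) for x, y in zip(years, sales)]
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (n - 2))

print(round(sse, 4), round(s, 4))  # 3.9924 1.1536
```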
$$\hat{Y} = b_0 + b_1 X$$

$$b_1 = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} \quad\text{and}\quad b_0 = \bar{Y} - b_1\bar{X}$$
Example:
For the Sales problem, find the regression equation.
(Use the table of computations we used for r.)
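A minimal Python sketch of the regression-equation computation for the Sales problem, using the $b_1$ and $b_0$ formulas with the same table sums. The 5-year prediction at the end is a hypothetical input chosen only for illustration.

```python
years = [16, 6, 12, 1, 10]  # X: years of experience
sales = [15, 3, 10, 0, 7]   # Y: sales last month
n = len(years)

x_bar = sum(years) / n
y_bar = sum(sales) / n
sum_xy = sum(x * y for x, y in zip(years, sales))
sum_x2 = sum(x * x for x in years)

b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
b0 = y_bar - b1 * x_bar
print(f"Y-hat = {b0:.4f} + {b1:.4f} X")  # Y-hat = -2.0682 + 1.0076 X

# Hypothetical salesperson with 5 years of experience
predicted = b0 + b1 * 5
print(round(predicted, 4))  # 2.9697
```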
Software Notes
Regression Printout
Regression Statistics
R Square 0.9711
Adjusted R Square 0.9614
Standard Error 1.1536
Observations 5
ANOVA
df SS MS F Significance F
Regression 1 134.0076 134.0076 100.6964 0.0021
Residual 3 3.9924 1.3308
Total 4 138
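The entries in this printout can be reproduced from the five data points. The Python sketch below is our own, not the software's method. It computes the sums of squares, mean squares, and F ratio, plus Adjusted R Square using the usual definition $1 - (1 - R^2)(n-1)/(n-k-1)$ with $k = 1$ predictor. The printed values should agree with the printout up to rounding.

```python
import math

years = [16, 6, 12, 1, 10]  # X
sales = [15, 3, 10, 0, 7]   # Y
n = len(years)
x_bar = sum(years) / n
y_bar = sum(sales) / n

sxx = sum((x - x_bar) ** 2 for x in years)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in years]

# Sums of squares for the ANOVA table
ss_reg = sum((yh - y_bar) ** 2 for yh in fitted)
ss_resid = sum((y - yh) ** 2 for y, yh in zip(sales, fitted))
ss_total = sum((y - y_bar) ** 2 for y in sales)

df_reg, df_resid = 1, n - 2  # one predictor
ms_reg = ss_reg / df_reg
ms_resid = ss_resid / df_resid
f_stat = ms_reg / ms_resid

r_square = ss_reg / ss_total
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_resid
std_error = math.sqrt(ms_resid)

for label, value in [
    ("R Square", r_square),
    ("Adjusted R Square", adj_r_square),
    ("Standard Error", std_error),
    ("SS Regression", ss_reg),
    ("SS Residual", ss_resid),
    ("MS Residual", ms_resid),
    ("F", f_stat),
]:
    print(f"{label}: {value:.4f}")
```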
Note:
The slope $b_1$ and the correlation $r$ will always have the same sign (+, −, or 0) because of the formula
$$b_1 = r\,\frac{s_Y}{s_X},$$
which we'll use later in the chapter.
R-Square
Example:
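In our data set, $r^2 = 0.9711$. The sums of squares involved are:

$$SSR = SS(\text{Regression}) = \text{“Explained SS”} = \sum (\hat{Y} - \bar{Y})^2$$

$$SST = SS(\text{Total}) = \text{Total SS} = \sum (Y - \bar{Y})^2$$

SST does not depend on the X variable or on how well X predicts Y.

$$r^2 = \frac{SS(\text{Regression})}{SS(\text{Total})} \quad\text{or}\quad r^2 = 1 - \frac{SS(\text{Residual})}{SS(\text{Total})}$$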
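A short Python check (our own sketch) that both forms of $r^2$ agree with the square of the correlation for the salespeople data:

```python
import math

years = [16, 6, 12, 1, 10]  # X
sales = [15, 3, 10, 0, 7]   # Y
n = len(years)
x_bar, y_bar = sum(years) / n, sum(sales) / n

sxx = sum((x - x_bar) ** 2 for x in years)
syy = sum((y - y_bar) ** 2 for y in sales)  # this is SS(Total)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in years]

ss_reg = sum((yh - y_bar) ** 2 for yh in fitted)
ss_resid = sum((y - yh) ** 2 for y, yh in zip(sales, fitted))
ss_total = syy
r = sxy / math.sqrt(sxx * syy)

r2_explained = ss_reg / ss_total       # SS(Regression) / SS(Total)
r2_residual = 1 - ss_resid / ss_total  # 1 - SS(Residual) / SS(Total)
r2_from_r = r ** 2

print(round(r2_explained, 4), round(r2_residual, 4), round(r2_from_r, 4))  # 0.9711 0.9711 0.9711
```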
Computing $b_0$ and $b_1$ from $\bar{X}$, $\bar{Y}$, $s_X$, $s_Y$, and $r$

$$b_1 = r\,\frac{s_Y}{s_X} \qquad b_0 = \bar{Y} - b_1\bar{X}$$

where
$r$ = Correlation between X and Y
$s_Y$ = Standard deviation of Y values
$s_X$ = Standard deviation of X values
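A minimal Python sketch showing that these summary-statistic formulas reproduce the slope and intercept for the salespeople data. It uses the standard library's `statistics.mean` and `statistics.stdev` (sample standard deviation, $n - 1$ divisor). The correlation is computed by hand so the code does not depend on a recent Python version.

```python
import statistics

years = [16, 6, 12, 1, 10]  # X
sales = [15, 3, 10, 0, 7]   # Y
n = len(years)

x_bar = statistics.mean(years)
y_bar = statistics.mean(sales)
s_x = statistics.stdev(years)  # sample SD, n - 1 divisor
s_y = statistics.stdev(sales)

# Correlation from the sample covariance and the two standard deviations
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales)) / ((n - 1) * s_x * s_y)

b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar
print(round(b1, 4), round(b0, 4))  # 1.0076 -2.0682
```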
Example:
A student scores 55% on test #1 in Finance. Predict
his overall grade using regression based on last
semester's data.
Solution:
$$\frac{b_0 - \beta_0}{SE(b_0)} \sim T_{n-2} \qquad \frac{b_1 - \beta_1}{SE(b_1)} \sim T_{n-2}$$
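Using these t ratios requires $SE(b_0)$ and $SE(b_1)$. The Python sketch below is our own. It assumes the usual textbook formulas $SE(b_1) = s/\sqrt{\sum (X - \bar{X})^2}$ and $SE(b_0) = s\sqrt{1/n + \bar{X}^2/\sum (X - \bar{X})^2}$, which are not stated on this slide, and tests $\beta_1 = 0$ and $\beta_0 = 0$ for the salespeople data. Only the standard library is used, so no p-values are computed. Instead, compare $|t|$ with a t-table value for $n - 2 = 3$ degrees of freedom (3.182 at the 5% level, two-sided).

```python
import math

years = [16, 6, 12, 1, 10]  # X
sales = [15, 3, 10, 0, 7]   # Y
n = len(years)
x_bar = sum(years) / n
y_bar = sum(sales) / n

sxx = sum((x - x_bar) ** 2 for x in years)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(years, sales))
s = math.sqrt(sse / (n - 2))  # standard error of the regression

# Standard errors of the estimates (usual textbook formulas)
se_b1 = s / math.sqrt(sxx)
se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)

# t statistics under the hypotheses beta_1 = 0 and beta_0 = 0
t_b1 = (b1 - 0) / se_b1
t_b0 = (b0 - 0) / se_b0
print(round(t_b1, 2), round(t_b0, 2))  # 10.03 -1.99
```

Here $|t| = 10.03 > 3.182$ for the slope, and $t^2 \approx 100.70$ matches the F value in the regression printout, as expected for a model with one predictor.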
Question:
How do we know if the regression line is useful?
Example:
In finance, the risk of a stock is measured by its “beta
value.” This is the slope in a linear regression model¹
$$E(Y) = \beta_0 + \beta_1 X$$
Terminology:
Beta is also described as measuring a stock’s:
-- correlated relative volatility
-- non-diversifiable risk
-- systematic risk
-- market risk
1 This is stated more precisely using the current risk-free interest rate: subtract this interest rate from the data
(weekly returns) for both X and Y. If that rate is roughly constant over the time period we are analyzing, then the
regression model as given above is essentially the same. Also, the intercept in this model should be 0 if the market is
efficient.
2 Return = (dividends received + change in price) / (price at the start of the day, week, or month). This is the % change.
3 The risk specific to a company is not of major importance, since it can be reduced through diversification. The risk measured by β is more serious, since it combines with similar risk of the other stocks in our portfolio.
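Footnote 2's return formula can be written as a small Python function. The function name and the prices in the example are hypothetical, chosen only to illustrate the arithmetic.

```python
def period_return(start_price, end_price, dividends=0.0):
    """Return = (dividends received + change in price) / price at the start of the period."""
    return (dividends + (end_price - start_price)) / start_price


# Hypothetical prices, for illustration only
weekly = period_return(200.0, 210.0)       # (0 + 10) / 200 = 0.05, a 5% weekly change
with_div = period_return(50.0, 49.0, 2.0)  # (2 + (-1)) / 50 = 0.02
print(weekly, with_div)
```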
Example:
No dividends were paid during this time period; otherwise we would have had to adjust the prices, although finance.yahoo.com does that for us automatically.
[Figure: Apple % weekly change (Y) vs. S&P500 % weekly change (X), Jan-June 2010]
Regression Statistics
Multiple R 0.8238
R Square 0.6787
Adjusted R Square 0.6647
Standard Error 0.0281
Observations 25
ANOVA
df SS MS F Significance F
Regression 1 0.03848 0.03848 48.58683 4.18706E-07
Residual 23 0.01822 0.00079
Total 24 0.05670
A stock with a slope of 2 would tend to increase twice as fast as the S&P 500 and
decrease twice as fast. A stock with a slope of only 0.50 would tend to increase or
decrease only 1/2 as fast as the S&P 500.
$$b_1 = r\,\frac{s_{\text{Apple}}}{s_{\text{S\&P 500}}}$$
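Plugging the printout's Multiple R (r = 0.8238) and the weekly standard deviations quoted in footnote 4 (0.049 for Apple, 0.027 for the S&P 500) into this formula gives a rough value of Apple's beta. Because those standard deviations are rounded, treat the result as approximate.

```python
r = 0.8238       # Multiple R from the Apple vs. S&P 500 printout
s_apple = 0.049  # weekly standard deviation of Apple returns (footnote 4, rounded)
s_sp500 = 0.027  # weekly standard deviation of S&P 500 returns (footnote 4, rounded)

beta = r * s_apple / s_sp500
print(f"beta is roughly {beta:.2f}")  # beta is roughly 1.50
```

A beta near 1.5 suggests that, over Jan-June 2010, Apple's weekly returns tended to move about 1.5 times as much as the S&P 500's.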
4 A weekly (or daily) s is usually converted to yearly data and then stated as a %. The annual volatility:
Apple: $0.049\sqrt{52} = 35.3\%$
S&P500: $0.027\sqrt{52} = 19.5\%$
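The conversion in footnote 4 multiplies the weekly s by $\sqrt{52}$, since there are 52 weeks in a year. This square-root-of-time scaling assumes weekly returns are roughly independent with constant variance. The sketch below is a minimal Python version; the function name `annualize_weekly_sd` is our own.

```python
import math


def annualize_weekly_sd(weekly_sd, periods_per_year=52):
    """Scale a weekly standard deviation to an annual volatility by sqrt(periods per year)."""
    return weekly_sd * math.sqrt(periods_per_year)


apple = annualize_weekly_sd(0.049)
sp500 = annualize_weekly_sd(0.027)
print(f"Apple: {apple:.1%}, S&P 500: {sp500:.1%}")  # Apple: 35.3%, S&P 500: 19.5%
```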
Example:
Here is a data set consisting of the number of people
who play golf in this country and the number of
missed days at work in the entire U.S. due to reported
injury or illness.
Solution:
A fitted regression line gives...
Example:
X Y
1 5
2 2
3 1
4 2
5 5
(b) Plot the 5 data points. Explain the result of (a) now
that you have seen the plot.
[Figure: blank axes for plotting Y against X]
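Part (a) is not reproduced in this excerpt. If it asked for the correlation, a quick Python check (our own sketch) shows that r = 0 for these five points, even though Y clearly depends on X. The points trace a U shape, and r measures only linear association.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [5, 2, 1, 2, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / math.sqrt(sxx * syy)
print(r)  # 0.0
```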