Session2-Simple Linear Model PDF
REGRESSION MODEL
Dr. Indra, S.Si, M.Si
OUTLINE
The Model
[Figure: the model illustrated with house size on the horizontal axis]
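$$y = \beta_0 + \beta_1 x + \varepsilon$$
y = dependent variable
x = independent variable
$\beta_0$ = y-intercept
$\beta_1$ = slope of the line (Rise/Run)
$\varepsilon$ = error variable
$\beta_0$ and $\beta_1$ are unknown population parameters, therefore are estimated from the data.
[Figure: a line in the x-y plane with intercept $\beta_0$ and slope $\beta_1$ = Rise/Run]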
Let us compare two lines
Data points: (1,2), (2,4), (3,1.5), (4,3.2).
First line: sum of squared differences = $(2-1)^2 + (4-2)^2 + (1.5-3)^2 + (3.2-4)^2 = 7.89$
Second line (horizontal, at y = 2.5): sum of squared differences = $(2-2.5)^2 + (4-2.5)^2 + (1.5-2.5)^2 + (3.2-2.5)^2 = 3.99$
The smaller the sum of squared differences, the better the fit of the line to the data.
[Figure: the four data points with the two candidate lines]
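The comparison of the two lines can be checked numerically. The sketch below assumes the first line is y = x (inferred from the values 1, 2, 3, 4 being subtracted in the first sum) and the second is the horizontal line y = 2.5; the function name is illustrative only. Evaluating the four squared terms of the first sum gives 1 + 4 + 2.25 + 0.64 = 7.89.

```python
# Data points read off the slide's figure.
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]


def sum_squared_differences(points, line):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - line(x)) ** 2 for x, y in points)


def first_line(x):
    # Assumption: the first line is y = x.
    return x


def second_line(x):
    # The second line is horizontal at y = 2.5.
    return 2.5


ssd_first = sum_squared_differences(points, first_line)
ssd_second = sum_squared_differences(points, second_line)
print(round(ssd_first, 2), round(ssd_second, 2))
```

The horizontal line has the smaller sum, so by this criterion it fits these four points better than y = x.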
$$S_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n-1}$$
$$S_{xx} = \frac{\sum (x_i - \bar{x})^2}{n-1}$$
Example:
• Solution
– Solving by hand: Calculate a number of statistics
$\bar{x} = 36{,}009.45$; $S_{xx} = 43{,}528{,}690$
$\bar{y} = 14{,}822.823$; $S_{xy} = -2{,}712{,}511$
where n = 100.
$$b_1 = \frac{S_{xy}}{S_{xx}} = \frac{-2{,}712{,}511}{43{,}528{,}690} = -.06232$$
$$b_0 = \bar{y} - b_1 \bar{x} = 14{,}822.82 - (-.06232)(36{,}009.45) = 17{,}067$$
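The hand calculation can be reproduced in a few lines. This is a minimal sketch that takes the summary statistics quoted in the example as given; the function and variable names are illustrative and not part of the lecture.

```python
# Summary statistics quoted in the example (n = 100).
x_bar = 36009.45      # mean odometer reading
y_bar = 14822.823     # mean price
s_xx = 43528690.0     # sample variance of x
s_xy = -2712511.0     # sample covariance of x and y


def least_squares_coefficients(x_bar, y_bar, s_xx, s_xy):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares_coefficients(x_bar, y_bar, s_xx, s_xy)
print(f"b1 = {b1:.5f}, b0 = {b0:.0f}")
```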
• Solution – continued
– Using the computer
1. Scatterplot
2. Trend function
3. Tools > Data Analysis > Regression
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.8063
R Square             0.6501
Adjusted R Square    0.6466
Standard Error       303.1
Observations         100

ANOVA
              df    SS          MS          F        Significance F
Regression    1     16734111    16734111    182.11   0.0000
Residual      98    9005450     91892
Total         99    25739561

ŷ = 17,067 − .0623x
[Figure: scatterplot of Price against Odometer with the fitted line ŷ = 17,067 − .0623x; there are no data near Odometer = 0]
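The columns of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the two mean squares, and the standard error is the square root of the residual mean square. The sketch below recomputes those entries from the SS and df values in the output; the variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, df_regression = 16734111, 1
ss_residual, df_residual = 9005450, 98
ss_total = 25739561

# Mean squares: SS divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F statistic and the standard error of estimate.
f_statistic = ms_regression / ms_residual
standard_error = math.sqrt(ms_residual)

print(f"MS residual = {ms_residual:.0f}")
print(f"F = {f_statistic:.2f}")
print(f"Standard Error = {standard_error:.1f}")
```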
The Normality of ε
The standard deviation remains constant, but the mean value changes with x:
$E(y|x_1) = \beta_0 + \beta_1 x_1 = \mu_1$
$E(y|x_2) = \beta_0 + \beta_1 x_2 = \mu_2$
$E(y|x_3) = \beta_0 + \beta_1 x_3 = \mu_3$
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
– A shortcut formula
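One commonly used shortcut, assumed here to be the one intended, is SSE = (n − 1)(S_yy − S_xy² / S_xx), where S_yy is the sample variance of y. The sketch below applies it to the car example. S_yy is not quoted directly in the example, so it is derived from the Total sum of squares in the regression output as an assumption.

```python
import math

n = 100
s_xx = 43528690.0            # sample variance of x
s_xy = -2712511.0            # sample covariance of x and y
# Assumption: S_yy is recovered as Total SS / (n - 1) from the ANOVA table.
s_yy = 25739561 / (n - 1)

# Shortcut: no need to compute each residual y_i - y-hat_i.
sse = (n - 1) * (s_yy - s_xy ** 2 / s_xx)

# Standard error of estimate: s_eps = sqrt(SSE / (n - 2)).
s_eps = math.sqrt(sse / (n - 2))

print(f"SSE = {sse:.0f}, s_eps = {s_eps:.1f}")
```

Up to rounding in the inputs, this agrees with the Residual SS and the Standard Error in the regression output.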
$$t = \frac{b_1 - \beta_1}{s_{b_1}} \quad\text{where}\quad s_{b_1} = \frac{s_\varepsilon}{\sqrt{(n-1)S_{xx}}}$$
Example
– Test to determine whether there is enough evidence to infer that there is a linear relationship between the car auction price and the odometer reading for all three-year-old Tauruses in the previous example.
Use α = 5%.
Testing the Slope,
Example
Solving by hand
– To compute “t” we need the values of $b_1$ and $s_{b_1}$.
$$b_1 = -.0623$$
$$s_{b_1} = \frac{s_\varepsilon}{\sqrt{(n-1)S_{xx}}} = \frac{303.1}{\sqrt{(99)(43{,}528{,}690)}} = .00462$$
$$t = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{-.0623 - 0}{.00462} = -13.49$$
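The test statistic can be recomputed as a check. In this sketch the critical value 1.984 for a two-tail test at α = 5% with 98 degrees of freedom is an assumption taken from a t table; it is not stated on the slide. Variable names are illustrative.

```python
import math

n = 100
b1 = -0.0623          # estimated slope
s_eps = 303.1         # standard error of estimate
s_xx = 43528690.0     # sample variance of x
beta1_null = 0.0      # value of the slope under the null hypothesis

# Standard error of the slope estimate.
s_b1 = s_eps / math.sqrt((n - 1) * s_xx)

# Test statistic.
t_stat = (b1 - beta1_null) / s_b1

# Assumption: t_.025,98 is approximately 1.984 (from a t table).
t_critical = 1.984
reject_null = abs(t_stat) > t_critical

print(f"s_b1 = {s_b1:.5f}, t = {t_stat:.2f}, reject H0: {reject_null}")
```

Since |t| is far larger than the critical value, the data support a linear relationship between price and odometer reading.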
Coefficient of determination
– To measure the strength of the linear relationship
we use the coefficient of determination.
$$R^2 = \frac{S_{xy}^2}{S_{xx}\, S_{yy}} \quad\text{or}\quad R^2 = 1 - \frac{SSE}{\sum (y_i - \bar{y})^2}$$
Two data points $(x_1, y_1)$ and $(x_2, y_2)$ of a certain sample are shown.
[Figure: two data points and the regression line, with the error marked]
Total variation in y = Variation explained by the regression line + Unexplained variation (error)
$$(y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 = (\hat{y}_1 - \bar{y})^2 + (\hat{y}_2 - \bar{y})^2 + (y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2$$
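The decomposition holds exactly when the sums run over every point in the sample and ŷ comes from the least squares line. As a small check, the sketch below fits the least squares line to the four points used earlier for comparing two lines, (1,2), (2,4), (3,1.5), (4,3.2), and confirms that total variation equals explained plus unexplained variation. Reusing those points here is an illustrative choice, not something the slide does.

```python
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]
n = len(points)

x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n

# Least squares slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in points)
      / sum((x - x_bar) ** 2 for x, _ in points))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x, _ in points]

# Total, explained, and unexplained variation.
sst = sum((y - y_bar) ** 2 for _, y in points)
ssr = sum((f - y_bar) ** 2 for f in fitted)
sse = sum((y - f) ** 2 for (_, y), f in zip(points, fitted))

print(f"SST = {sst:.4f}, SSR + SSE = {ssr + sse:.4f}")
```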
Coefficient of determination
$$R^2 = 1 - \frac{SSE}{\sum (y_i - \bar{y})^2} = \frac{\sum (y_i - \bar{y})^2 - SSE}{\sum (y_i - \bar{y})^2} = \frac{SSR}{\sum (y_i - \bar{y})^2}$$
Coefficient of determination
Example
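– Find the coefficient of determination for the used car price–odometer example. What does this statistic tell you about the model?
Solution
– Solving by hand:
$R^2 = .6501$
About 65% of the variation in the auction price is explained by the variation in the odometer reading; the rest is unexplained by this model.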
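Both expressions for the coefficient of determination can be evaluated for this example. The sketch below assumes S_yy is obtained as Total SS / (n − 1) from the regression output, since S_yy is not quoted separately; variable names are illustrative.

```python
n = 100
s_xx = 43528690.0     # sample variance of x
s_xy = -2712511.0     # sample covariance of x and y
ss_total = 25739561   # Total SS from the ANOVA table
sse = 9005450         # Residual SS from the ANOVA table

# Assumption: S_yy recovered from the Total sum of squares.
s_yy = ss_total / (n - 1)

# R^2 from the variance/covariance form.
r2_from_covariance = s_xy ** 2 / (s_xx * s_yy)

# R^2 from the SSE form.
r2_from_sse = 1 - sse / ss_total

print(f"{r2_from_covariance:.4f} {r2_from_sse:.4f}")
```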
Coefficient of determination
– Using the computer
From the regression output (SUMMARY OUTPUT) we have R Square = 0.6501.
Point Prediction
• Example
– Predict the selling price of a three-year-old Taurus with 40,000 miles on the odometer.
A point prediction
$$\hat{y} = 17{,}067 - .0623x = 17{,}067 - .0623(40{,}000) = 14{,}575$$
Interval Estimates
Example - continued
– Provide an interval estimate for the bidding
price on a Ford Taurus with 40,000 miles on
the odometer.
– Two types of predictions are required:
• A prediction for a specific car
• An estimate for the average price per car
Interval Estimates
• Solution
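– A prediction interval provides the price estimate for a single car:
$$\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
Here $t_{\alpha/2} = t_{.025,98}$, read approximately from the t table.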
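The prediction interval for a single car with 40,000 miles can be evaluated from quantities given earlier in the example. This is a sketch under two stated assumptions: the critical value t_.025,98 is taken as approximately 1.984 from a t table, and Σ(x_i − x̄)² is computed as (n − 1) times the sample variance 43,528,690. The function name is illustrative.

```python
import math

n = 100
b0, b1 = 17067.0, -0.0623     # fitted line from the example
s_eps = 303.1                 # standard error of estimate
x_bar = 36009.45              # mean odometer reading
sum_sq_x = (n - 1) * 43528690.0   # sum of (x_i - x_bar)^2
t_crit = 1.984                # assumption: t_.025,98 from a t table


def prediction_interval(x_g):
    """95% prediction interval for one car with odometer reading x_g."""
    y_hat = b0 + b1 * x_g
    half_width = t_crit * s_eps * math.sqrt(
        1 + 1 / n + (x_g - x_bar) ** 2 / sum_sq_x
    )
    return y_hat - half_width, y_hat + half_width


lower, upper = prediction_interval(40000)
print(f"{lower:.0f} to {upper:.0f}")
```

The interval is centred on the point prediction 14,575, with a half-width of roughly 605.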
Interval Estimate
Solution – continued
– A confidence interval provides the estimate
of the mean price per car for a Ford Taurus
with 40,000 miles reading on the odometer.
• The confidence interval (95%) =
$$\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
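The confidence interval for the mean price differs from the prediction interval only by dropping the leading 1 under the square root, which makes it much narrower. The sketch below evaluates it at 40,000 miles, again assuming t_.025,98 ≈ 1.984 (from a t table) and Σ(x_i − x̄)² = (n − 1)(43,528,690); the function name is illustrative.

```python
import math

n = 100
b0, b1 = 17067.0, -0.0623     # fitted line from the example
s_eps = 303.1                 # standard error of estimate
x_bar = 36009.45              # mean odometer reading
sum_sq_x = (n - 1) * 43528690.0   # sum of (x_i - x_bar)^2
t_crit = 1.984                # assumption: t_.025,98 from a t table


def mean_confidence_interval(x_g):
    """95% confidence interval for the mean price at odometer reading x_g."""
    y_hat = b0 + b1 * x_g
    half_width = t_crit * s_eps * math.sqrt(
        1 / n + (x_g - x_bar) ** 2 / sum_sq_x
    )
    return y_hat - half_width, y_hat + half_width


ci_lower, ci_upper = mean_confidence_interval(40000)
print(f"{ci_lower:.0f} to {ci_upper:.0f}")
```

The half-width here is roughly 70, because the mean price is estimated more precisely than the price of any single car.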
The effect of the given $x_g$ on the length of the interval
– As $x_g$ moves away from $\bar{x}$ the interval becomes longer. That is, the shortest interval is found at $\bar{x}$.
With $\hat{y} = b_0 + b_1 x_g$, the interval is
$$\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)s_x^2}}$$
where $(n-1)s_x^2 = \sum (x_i - \bar{x})^2$.
At $x_g = \bar{x} \pm 1$ the distance is $(\bar{x} \pm 1) - \bar{x} = \pm 1$, and the interval is
$$\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{1^2}{(n-1)s_x^2}}$$
At $x_g = \bar{x} \pm 2$ the distance is $(\bar{x} \pm 2) - \bar{x} = \pm 2$, and the interval is
$$\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{2^2}{(n-1)s_x^2}}$$
[Figure: the regression line with interval bands that widen as $x_g$ moves from $\bar{x}$ to $\bar{x} \pm 1$ and $\bar{x} \pm 2$]
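The widening of the interval can be seen numerically by evaluating the half-width at increasing distances from x̄. The sketch below uses the car example's values; the distances of 0, 5,000 and 10,000 miles are arbitrary illustrative choices, and t_.025,98 ≈ 1.984 is an assumption taken from a t table.

```python
import math

n = 100
s_eps = 303.1                     # standard error of estimate
sum_sq_x = (n - 1) * 43528690.0   # (n - 1) * s_x^2
t_crit = 1.984                    # assumption: t_.025,98 from a t table


def half_width(distance):
    """Half-width of the confidence interval when |x_g - x_bar| = distance."""
    return t_crit * s_eps * math.sqrt(1 / n + distance ** 2 / sum_sq_x)


# Hypothetical distances from the mean odometer reading, in miles.
distances = [0, 5000, 10000]
widths = [half_width(d) for d in distances]

for d, w in zip(distances, widths):
    print(f"distance {d:>6}: half-width {w:.1f}")
```

Because the distance enters squared, moving the same amount above or below x̄ gives the same width, and the minimum is at x_g = x̄.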
Regression Diagnostics - I
Residual Analysis
[Figure: histogram of standardized residuals]
Heteroscedasticity
When the requirement of a constant variance is violated we
have a condition of heteroscedasticity.
Diagnose heteroscedasticity by plotting the residual against
the predicted y.
[Figure: plot of Residual against ŷ]
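A plot is the diagnostic described above. As a rough numerical stand-in, one can compare the spread of the residuals for small and large predicted values: a clearly larger spread in one half hints at heteroscedasticity. The sketch below is only a crude heuristic, not a formal test, and the data in it are made up for illustration; they are not from the lecture.

```python
import statistics


def residual_spread_by_half(predicted, residuals):
    """Standard deviation of residuals for the lower and upper half of y-hat."""
    pairs = sorted(zip(predicted, residuals))
    middle = len(pairs) // 2
    low = [r for _, r in pairs[:middle]]
    high = [r for _, r in pairs[middle:]]
    return statistics.pstdev(low), statistics.pstdev(high)


# Made-up values in which the residuals fan out as y-hat grows.
predicted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.2, -0.2, 0.8, -0.9, 1.1, -1.2]

low_spread, high_spread = residual_spread_by_half(predicted, residuals)
print(f"low half: {low_spread:.2f}, high half: {high_spread:.2f}")
```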
Homoscedasticity
When the requirement of a constant variance is not violated we have a condition of homoscedasticity.
Example - continued
[Figure: Residuals plotted against Predicted Price]
Non Independence of
Error Variables
– A time series is constituted if data were collected over time.
– Examining the residuals over time, no pattern should be observed if the errors are independent.
– When a pattern is detected, the errors are said to be autocorrelated.
– Autocorrelation can be detected by graphing the residuals against time.
[Figure: two plots of residuals against Time. Left: note the runs of positive residuals, replaced by runs of negative residuals. Right: note the oscillating behavior of the residuals around zero.]
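The two patterns described above, long runs of one sign and oscillation around zero, can also be summarised with numbers. The sketch below counts sign runs and computes a lag-1 autocorrelation of the residuals; both helper functions and both residual series are made up for illustration and are not part of the lecture. Few runs with a positive lag-1 value matches the first pattern, while many runs with a negative value matches the second.

```python
def count_sign_runs(residuals):
    """Number of consecutive same-sign runs (zeros are ignored)."""
    signs = [r > 0 for r in residuals if r != 0]
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


def lag1_autocorrelation(residuals):
    """Sample correlation between each residual and the next one in time."""
    mean = sum(residuals) / len(residuals)
    deviations = [r - mean for r in residuals]
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator


# Made-up residual series in time order.
long_runs = [1.0, 1.5, 0.8, 1.2, -0.9, -1.4, -1.1, -0.7]
oscillating = [1.0, -1.2, 0.9, -1.1, 1.3, -0.8, 1.1, -1.0]

for name, series in [("long runs", long_runs), ("oscillating", oscillating)]:
    print(name, count_sign_runs(series), round(lag1_autocorrelation(series), 2))
```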
Outliers
[Figure: scatterplot with outlying points]
… but, some outliers may be very influential
Assignment No.2
(f) Conduct a significance test for b2.
(g) Compute $R^2$.