01 SLR Final
Population Regression model
$Y = \beta_0 + \beta_1 X + \varepsilon$
• $Y$: dependent variable
• $\beta_0$: population y-intercept
• $\beta_1$: population slope coefficient
• $X$: independent variable
• $\varepsilon$: random error term, or residual
Linear Regression Assumptions
• Error values (ε) are statistically independent
• Error values are normally distributed for any given value of x
• The probability distribution of the errors has constant variance
• The underlying relationship between the x variable and the y
variable is linear
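As a rough illustration of the assumptions listed above, the sketch below simulates data from the population model Y = β0 + β1X + ε. The function name `simulate_population`, the parameter values and the x range are hypothetical, and the zero-mean error is an assumption of the sketch rather than something stated on the slide.

```python
import random

def simulate_population(n, beta0, beta1, sigma, seed=0):
    """Draw n (x, y) pairs from Y = beta0 + beta1*X + eps, eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0, 60)        # x range chosen only for illustration
        eps = rng.gauss(0.0, sigma)   # independent, normal, constant variance
        data.append((x, beta0 + beta1 * x + eps))  # linear relationship
    return data

# Hypothetical parameter values, not taken from the slides.
sample = simulate_population(n=50, beta0=10.0, beta1=0.8, sigma=5.0)
```

Setting `sigma=0.0` removes the error term, so every simulated point falls exactly on the population line.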
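Population Linear Regression
(continued)
[Figure: the regression line $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$ plotted against x, with intercept = $\beta_0$ and slope = $\beta_1$. For a given $x_i$ the figure marks the observed value of y, the predicted value of y, and the random error $\varepsilon_i$ for this x value, which is the gap between the two.]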
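Estimated Regression Model
$\hat{Y} = b_0 + b_1 X$
• $\hat{Y}$: estimated (predicted) value of Y
• $b_0$: estimate of the intercept $\beta_0$
• $b_1$: estimate of the slope $\beta_1$
• $X$: independent variable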
Scatter plot
• Plot of All (Xi, Yi) Pairs
• Suggests How Well Model Will Fit
[Figure: scatter plot of Y against X; both axes run from 0 to 60.]
Thinking Challenge
• How would you draw a line through the points? How do you determine
which line ‘fits best’?
[Figure: the same scatter plot of Y against X; both axes run from 0 to 60.]
[Figure: the same scatter plot with a candidate line; slope changed, intercept unchanged.]
[Figure: the same scatter plot with another candidate line; slope changed, intercept changed.]
Least Squares
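The least squares line minimizes the sum of the squared errors:
$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$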
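One way to make "which line fits best" concrete is to score each candidate line by its sum of squared errors and keep the smallest. The sketch below does this for three candidate lines; the data points and the candidate lines are hypothetical and chosen only for illustration.

```python
def sse(points, b0, b1):
    """Sum of squared errors of the line y = b0 + b1*x over the points."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)

# Hypothetical points, for illustration only.
points = [(10, 12), (20, 25), (30, 27), (40, 41), (50, 48)]

# Candidate lines as (intercept, slope): the first two share an intercept
# and differ in slope; the third changes both.
candidates = [(5.0, 0.5), (5.0, 0.9), (2.0, 1.0)]

# The candidate with the smallest sum of squared errors fits best.
best = min(candidates, key=lambda line: sse(points, *line))
```

This only compares a handful of lines. The least squares estimators give the line with the smallest sum of squared errors among all possible lines.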
Least square estimators
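$b_1 = \frac{\sum xy}{\sum x^2}$
$b_0 = \bar{Y} - b_1 \bar{X}$
where
$\bar{Y} = \frac{\sum Y}{n}$, $\bar{X} = \frac{\sum X}{n}$
$\sum x^2 = \sum (X - \bar{X})^2$
$\sum y^2 = \sum (Y - \bar{Y})^2$
$\sum xy = \sum (X - \bar{X})(Y - \bar{Y})$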
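The estimators above translate directly into a few lines of code. The following is a minimal sketch; the function name `least_squares` and the small data set are hypothetical.

```python
def least_squares(X, Y):
    """Return (b0, b1) using deviations from the means."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    # Sums of products and squares of the deviations x = X - x_bar, y = Y - y_bar.
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
    sum_x2 = sum((xi - x_bar) ** 2 for xi in X)
    b1 = sum_xy / sum_x2
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data lying exactly on the line Y = 1 + 2X.
b0, b1 = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
```

Because the illustrative points lie exactly on a line, the function recovers that line's intercept and slope.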
Example
A retail merchant conducted a study to determine the relation between
weekly advertising expenditures (X) and sales (Y). Estimate the regression line to
predict weekly sales from advertising expenditures and interpret it. Predict the
sales for a weekly expenditure of 29. Also test the significance of the model, and
find $R^2$ and interpret it.
           Y      X
         385     20
         400     15
         395     25
         440     40
Total   1620    100
$n = 4$
$\bar{Y} = \frac{\sum Y}{n} = 405$
$\bar{X} = \frac{\sum X}{n} = 25$
Example - Computations
• Computation of Simple Linear Regression equation
           Y      X      x      y     xy     x²     y²
         385     20     -5    -20    100     25    400
         400     15    -10     -5     50    100     25
         395     25      0    -10      0      0    100
         440     40     15     35    525    225   1225
Total   1620    100      0      0    675    350   1750
Estimated Linear regression between Sales and
Advertising Expenditures
$b_1 = \frac{\sum xy}{\sum x^2} = \frac{675}{350} = 1.93$
$b_0 = \bar{Y} - b_1 \bar{X} = 405 - 1.93(25) = 356.75$
$\hat{Y} = 356.75 + 1.93 X$
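The fitted line can be reproduced from the totals in the computation table, and then used for the prediction at a weekly expenditure of 29 that the example asks for. This is a minimal sketch; the helper name `predict` is hypothetical.

```python
# Totals taken from the computation table of the example.
n = 4
sum_x, sum_y = 100, 1620
sum_xy, sum_x2 = 675, 350

x_bar = sum_x / n          # 25
y_bar = sum_y / n          # 405
b1 = sum_xy / sum_x2       # slope, kept unrounded here
b0 = y_bar - b1 * x_bar    # intercept

def predict(x):
    """Predicted weekly sales for advertising expenditure x."""
    return b0 + b1 * x

sales_at_29 = predict(29)
```

Because `b1` is not rounded here, `b0` comes out slightly above the 356.75 on the slide, which rounds the slope to 1.93 before computing the intercept.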
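Step-1:- Construction of Hypothesis
• $H_0: \beta_1 = 0$ vs $H_1: \beta_1 \neq 0$
• $H_0: \beta_1 \le 0$ vs $H_1: \beta_1 > 0$
• $H_0: \beta_1 \ge 0$ vs $H_1: \beta_1 < 0$
Step-2:- Level of Significance
$\alpha = 0.05$
Step-3:- Test Statistic
$t = \frac{b_1 - \beta_1}{SE(b_1)}$, where $SE(b_1) = \sqrt{\frac{S_e^2}{\sum x^2}}$
Step-4:- Calculations
Step-5:- Decision Rule, Reject H0 if
• $H_1: \beta_1 \neq 0$: $t \le -t_{\alpha/2,\,df}$ or $t \ge t_{\alpha/2,\,df}$
• $H_1: \beta_1 > 0$: $t \ge t_{\alpha,\,df}$
• $H_1: \beta_1 < 0$: $t \le -t_{\alpha,\,df}$
Step-6:- Results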
Calculation of Residual Mean Square $S^2_{Y.X}$ or $S^2_e$
           Y      X   $\hat{Y}$   $Y - \hat{Y}$   $(Y - \hat{Y})^2$
         385     20     395.35       -10.35          107.12
         400     15     385.70        14.30          204.49
         395     25     405.00       -10.00          100.00
         440     40     433.95         6.05           36.60
Total   1620    100                    0             448.22
$S^2_e = \frac{\sum y^2 - \frac{(\sum xy)^2}{\sum x^2}}{n-2} = 224.11$
$S^2_e = \frac{\sum (Y - \hat{Y})^2}{n-2} = \frac{448.22}{2} = 224.11$
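Both routes to the residual mean square can be checked numerically. The sketch below computes it once from the residuals and once from the shortcut formula, using the example data. The variable names are illustrative, and the coefficients are left unrounded, so individual fitted values differ slightly from the rounded ones in the table.

```python
X = [20, 15, 25, 40]
Y = [385, 400, 395, 440]
n = len(X)

x_bar, y_bar = sum(X) / n, sum(Y) / n
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sum_x2 = sum((x - x_bar) ** 2 for x in X)
sum_y2 = sum((y - y_bar) ** 2 for y in Y)

b1 = sum_xy / sum_x2
b0 = y_bar - b1 * x_bar

# Route 1: square and sum the residuals Y - Y_hat, then divide by n - 2.
residuals = [y - (b0 + b1 * x) for x, y in zip(X, Y)]
s2_from_residuals = sum(e ** 2 for e in residuals) / (n - 2)

# Route 2: shortcut formula based on the deviation sums.
s2_shortcut = (sum_y2 - sum_xy ** 2 / sum_x2) / (n - 2)
```

The two results agree up to floating-point error, and the residuals sum to zero, as in the total row of the table.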
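Test of Hypothesis for 𝛽1
Test statistic
$t = \frac{b_1 - \beta_1}{SE(b_1)} = \frac{1.93 - 0}{\sqrt{\frac{224.11}{350}}} = 2.41$
Table value
$t_{\alpha/2,\,(n-2)} = t_{0.025,\,2} = 4.303$
Since $|t| = 2.41 < 4.303$, $H_0: \beta_1 = 0$ is not rejected at $\alpha = 0.05$.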
Alpha (upper-tail area)
d.f.    0.250    0.100    0.050    0.025    0.010    0.005
 1      1.000    3.078    6.314   12.706   31.821   63.657
 2      0.816    1.886    2.920    4.303    6.965    9.925
 3      0.765    1.638    2.353    3.182    4.541    5.841
 4      0.741    1.533    2.132    2.776    3.747    4.604
 5      0.727    1.476    2.015    2.571    3.365    4.032
 6      0.718    1.440    1.943    2.447    3.143    3.707
 7      0.711    1.415    1.895    2.365    2.998    3.499
 8      0.706    1.397    1.860    2.306    2.896    3.355
 9      0.703    1.383    1.833    2.262    2.821    3.250
10      0.700    1.372    1.812    2.228    2.764    3.169
11      0.697    1.363    1.796    2.201    2.718    3.106
12      0.695    1.356    1.782    2.179    2.681    3.055
13      0.694    1.350    1.771    2.160    2.650    3.012
14      0.692    1.345    1.761    2.145    2.624    2.977
15      0.691    1.341    1.753    2.131    2.602    2.947
16      0.690    1.337    1.746    2.120    2.583    2.921
17      0.689    1.333    1.740    2.110    2.567    2.898
18      0.688    1.330    1.734    2.101    2.552    2.878
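The two-sided test for the slope can be sketched as follows, using the rounded values from the slides and the table value for 2 degrees of freedom. The variable names are illustrative only.

```python
import math

# Values as reported on the slides.
b1 = 1.93          # estimated slope
s2_e = 224.11      # residual mean square
sum_x2 = 350       # sum of squared deviations of X
t_table = 4.303    # t table, d.f. = n - 2 = 2, upper-tail area 0.025

se_b1 = math.sqrt(s2_e / sum_x2)   # standard error of b1
t_stat = (b1 - 0) / se_b1          # test statistic under H0: beta1 = 0

# Two-sided decision rule: reject H0 if |t| >= table value.
reject_h0 = abs(t_stat) >= t_table
```

Here `reject_h0` is `False`: the computed statistic does not reach the table value.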
Confidence Interval for 𝜷𝟏
$b_1 \pm t_{\alpha/2,\,(n-2)} \, SE(b_1)$
• b1 is the estimate of 𝜷𝟏
• SE(b1), already computed
• t is the table value
• There will be two limits of the confidence interval,
the lower limit and the upper limit
$1.93 \pm 4.303 \sqrt{\frac{224.11}{350}}$
$(-1.51,\ 5.37)$
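The interval above can be reproduced with a small helper. This is a sketch only; the function name `confidence_interval` is hypothetical, and the inputs are the rounded values from the slides.

```python
import math

def confidence_interval(estimate, se, t_value):
    """Return (lower, upper) limits: estimate -/+ t_value * se."""
    margin = t_value * se
    return estimate - margin, estimate + margin

se_b1 = math.sqrt(224.11 / 350)    # standard error of b1
lower, upper = confidence_interval(1.93, se_b1, 4.303)
```

The interval contains 0, which agrees with a two-sided test at the same level not rejecting $\beta_1 = 0$.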
Test of Hypothesis for 𝜷𝟎
Step-1:- Construction of Hypothesis
Step-5:- Decision Rule, Reject H0 if
Confidence Interval for 𝜷𝟎
$b_0 \pm t_{\alpha/2,\,(n-2)} \, SE(b_0)$
• b0 is the estimate of 𝜷𝟎
• SE(b0), already computed
• t is the table value
• There will be two limits of the confidence interval,
the lower limit and the upper limit
ANOVA
• ANOVA partitions the total variation in the response variable Y into two components:
explained variation (due to regression) and unexplained variation (residual). The
explained variation is the variation in Y accounted for by X; the unexplained
variation is due to uncontrolled factors other than X.
$TSS = \sum y^2 = 1750$
$\text{Regression SS} = b_1 \times \sum xy = 1301.79$
$R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} \times 100 = \frac{1301.79}{1750} \times 100 = 74.39\%$
• Advertising expenditure (X) explains 74.39% of the variation in sales (Y);
the rest is due to other, unmeasured factors.
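The partition of variation and the coefficient of determination can be checked from the deviation sums of the advertising example. This is a minimal sketch with illustrative variable names.

```python
# Deviation sums from the computation table of the advertising example.
sum_y2 = 1750   # total variation in Y
sum_xy = 675
sum_x2 = 350

b1 = sum_xy / sum_x2

tss = sum_y2                       # total sum of squares
regression_ss = b1 * sum_xy        # explained variation
residual_ss = tss - regression_ss  # unexplained variation

r_squared_pct = regression_ss / tss * 100
```

The residual sum of squares obtained by subtraction matches the total of the squared residuals used for the residual mean square.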
Example
The following data show the revenue of six firms (in thousands of dollars) along with
their expenditure on research and development (in thousands of dollars).
Revenue, Y:              31   40   30   34   25   20
Expenditure on R&D, X:    5   11    4    5    3    2
• Draw a scatter plot and assess the relationship between Y and X.
• Fit simple linear regression equation and interpret the parameters.
• Test the hypothesis that there is no linear relation between Y and X i.e., β1=0. Also
compute 95% Confidence Interval for β1.
• Test the hypothesis that β0 > 15. Compute 95% Confidence Interval for β0.
• Perform analysis of variance (ANOVA) and test the significance of the regression
model. Calculate coefficient of determination and interpret it.
• Test the hypothesis that the mean revenue for a firm with X = 9 is greater than 30, i.e.,
$\mu_{Y|X=9} > 30$. Also construct a 95% Confidence Interval.
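As a starting point for the second part of this exercise, the line can be fitted with the same deviation formulas used earlier. The sketch below is one possible working, not a full solution. The variable names are illustrative.

```python
X = [5, 11, 4, 5, 3, 2]        # expenditure on R&D
Y = [31, 40, 30, 34, 25, 20]   # revenue
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n

# Deviation sums, as in the computation table of the first example.
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sum_x2 = sum((x - x_bar) ** 2 for x in X)
sum_y2 = sum((y - y_bar) ** 2 for y in Y)

b1 = sum_xy / sum_x2       # estimated slope
b0 = y_bar - b1 * x_bar    # estimated intercept
```

With these data the means and deviations are whole numbers, so the fit is easy to verify by hand.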
ANOVA
$TSS = \sum y^2 = 242$
$\text{Regression SS} = b_1 \times \sum xy = 200$
$R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} \times 100 = \frac{200}{242} \times 100 = 82.64\%$
• R&D expenditure (X) explains 82.64% of the variation in revenue (Y); the
rest is due to other, unmeasured factors.
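Test of hypothesis for $\mu_{Y|X}$
$t = \frac{\hat{Y}_x - \mu_{Y|x}}{SE(\hat{Y}_x)}$, where $SE(\hat{Y}_x) = \sqrt{S^2_{Y.X} \left( \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x^2} \right)}$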
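Applied to the R&D example, the test statistic above can be sketched as follows for X0 = 9 and a hypothesised mean of 30. The table values are read from the t table for 4 degrees of freedom. The variable names are illustrative, and the block is one possible working rather than a definitive solution.

```python
import math

X = [5, 11, 4, 5, 3, 2]        # expenditure on R&D
Y = [31, 40, 30, 34, 25, 20]   # revenue
n = len(X)

x_bar, y_bar = sum(X) / n, sum(Y) / n
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sum_x2 = sum((x - x_bar) ** 2 for x in X)
sum_y2 = sum((y - y_bar) ** 2 for y in Y)

b1 = sum_xy / sum_x2
b0 = y_bar - b1 * x_bar
s2 = (sum_y2 - b1 * sum_xy) / (n - 2)   # residual mean square

x0, mu0 = 9, 30
y_hat = b0 + b1 * x0                     # estimated mean response at x0
se_y_hat = math.sqrt(s2 * (1 / n + (x0 - x_bar) ** 2 / sum_x2))
t_stat = (y_hat - mu0) / se_y_hat

# One-sided test (H1: mean > 30): table value for d.f. = 4, alpha = 0.050.
reject_h0 = t_stat >= 2.132

# 95% confidence interval: table value for d.f. = 4, upper-tail area 0.025.
ci = (y_hat - 2.776 * se_y_hat, y_hat + 2.776 * se_y_hat)
```

The standard error grows as X0 moves away from the mean of X, so intervals for the mean response are narrowest near the centre of the data.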
Thanks