Simple Linear Regression
At the end of this module, it is expected that the students will be able to:
▪ Find the best relationship between y and x, quantify the strength of that relationship,
and use methods that allow prediction of the response y for given values of the regressor x.
Simple Linear Regression
▪ Has only one dependent or response variable (y) and one independent,
regressor or predictor variable (x)
$$y = a + bx + \varepsilon$$
Where:
y – dependent variable
x – independent variable
a – regression coefficient; the y-intercept (constant)
b – regression coefficient; the slope of the regression line
ε – random error, which we seek to minimize
▪ Estimates of a and b should result in a line that is the “best fit” to the given
data.
Method of Least Squares
$$SS_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \qquad SS_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$

$$\hat{b} = \frac{SS_{xy}}{SS_{xx}} \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}$$

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

$$\hat{y}_i = \hat{a} + \hat{b}x_i \qquad y_i = \hat{a} + \hat{b}x_i + e_i \qquad e_i = y_i - \hat{y}_i$$
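The least-squares formulas above translate directly to code. This is a minimal sketch; the (x, y) data here is made up purely for illustration:

```python
# Least-squares fit of y-hat = a-hat + b-hat * x from raw (x, y) data,
# following the SS_xx / SS_xy formulas above. Sample data is illustrative only.
x = [1.0, 1.1, 1.2, 1.3, 1.4]
y = [89.0, 91.5, 92.0, 93.8, 95.1]
n = len(x)

ss_xx = sum(xi**2 for xi in x) - sum(x)**2 / n
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

b_hat = ss_xy / ss_xx                    # slope
a_hat = sum(y)/n - b_hat * sum(x)/n      # intercept: a-hat = y-bar - b-hat * x-bar

# residuals e_i = y_i - y-hat_i; for a least-squares fit they sum to zero
residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]

print(f"y-hat = {a_hat:.4f} + {b_hat:.4f} x")
```

A quick sanity check on any least-squares fit with an intercept is that the residuals sum to (numerically) zero.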
Example
With $n = 20$, $\sum x_i = 23.92$, $\sum x_i^2 = 29.2892$, $\sum y_i = 1843.21$, and $\sum x_i y_i = 2214.6566$:

$$SS_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 29.2892 - \frac{23.92^2}{20} = 0.68088$$

$$SS_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 2214.6566 - \frac{(23.92)(1843.21)}{20} = 10.17744$$
Example
$$SS_{xx} = 0.68088 \qquad SS_{xy} = 10.17744$$

$$\hat{b} = \frac{SS_{xy}}{SS_{xx}} = \frac{10.17744}{0.68088} = 14.94748$$

$$\hat{a} = \bar{y} - \hat{b}\,\bar{x} = 92.1605 - (14.94748)(1.1960) = 74.28331$$

$$\hat{y} = 74.28 + 14.95x$$
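The worked example can be reproduced in a few lines; the summary sums below are taken directly from the slide:

```python
# Reproducing the example from its summary statistics (n = 20).
n = 20
sum_x, sum_x2 = 23.92, 29.2892
sum_y, sum_xy = 1843.21, 2214.6566

ss_xx = sum_x2 - sum_x**2 / n          # 0.68088
ss_xy = sum_xy - sum_x * sum_y / n     # 10.17744

b_hat = ss_xy / ss_xx                  # slope, about 14.94748
a_hat = sum_y/n - b_hat * sum_x/n      # intercept, about 74.28331

print(f"y-hat = {a_hat:.5f} + {b_hat:.5f} x")
```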
Example
$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = 21.2498$$

$$\hat{\sigma}^2 = \frac{SSE}{n-2} = \frac{21.2498}{18} = 1.1805$$
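The SSE value can also be reproduced without listing the residuals, using the standard shortcut $SSE = SS_{yy} - \hat{b}\,SS_{xy}$ (this shortcut formula is not shown on the slide; $SS_{yy} = 173.3769$ comes from the correlation example):

```python
# SSE via the shortcut SSE = SS_yy - b_hat * SS_xy (formula assumed,
# not from the slide); values taken from the worked example.
n = 20
ss_yy = 173.3769
ss_xy = 10.17744
b_hat = 14.94748

sse = ss_yy - b_hat * ss_xy            # about 21.2498
sigma2_hat = sse / (n - 2)             # about 1.1805

print(f"SSE = {sse:.4f}, sigma^2-hat = {sigma2_hat:.4f}")
```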
Correlation: Estimating the Strength of
Linear Relation
Correlation
$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$$

$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} \qquad SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} \qquad SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$$
Example
$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 0.68088$$

$$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 173.3769$$

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 10.17744$$

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{10.17744}{\sqrt{(0.68088)(173.3769)}} = 0.9367$$
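The correlation coefficient follows directly from the three sums of squares above:

```python
import math

# Correlation from the sums of squares in the example.
ss_xx, ss_yy, ss_xy = 0.68088, 173.3769, 10.17744

r = ss_xy / math.sqrt(ss_xx * ss_yy)   # about 0.9367

print(f"r = {r:.4f}")
```

A value this close to 1 indicates a strong positive linear relation between x and y.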
Hypothesis Tests in Simple Linear
Regression
Use of t-tests
• Hypotheses

For the slope: $H_0: b = 0$ versus $H_1: b \neq 0$
For the intercept: $H_0: a = 0$ versus $H_1: a \neq 0$
▪ If the null hypothesis $H_0: b = 0$ is rejected, we conclude that the straight-line
model is adequate, i.e., that the independent variable has a linear effect on the response.
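The slope test can be carried out with the example's numbers. This sketch assumes the standard test statistic $t = \hat{b}/\sqrt{\hat{\sigma}^2/SS_{xx}}$ with $n-2$ degrees of freedom (the slides state the hypotheses but not this formula):

```python
import math

# t statistic for H0: b = 0, using values from the worked example.
# Formula t = b_hat / sqrt(sigma2_hat / SS_xx) is the standard one,
# assumed here rather than taken from the slide.
b_hat = 14.94748
sigma2_hat = 21.2498 / 18      # unrounded error-variance estimate
ss_xx = 0.68088

t = b_hat / math.sqrt(sigma2_hat / ss_xx)   # about 11.35

# Critical value t_{0.025, 18} is about 2.101, so H0 is rejected.
print(f"t = {t:.2f}")
```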
Sums of Squares: Measures of Variation

$$SST = SSR + SSE$$

where $SST = \sum (y_i - \bar{y})^2$ is the total sum of squares, $SSR = \sum (\hat{y}_i - \bar{y})^2$ the regression sum of squares, and $SSE = \sum (y_i - \hat{y}_i)^2$ the error sum of squares.
Coefficient of Determination
▪ $R^2 = SSR/SST$, the proportion of the variation in y explained by the regression
▪ It does not measure the magnitude of the slope of the regression line; a large
value of R² does not imply a steep slope
▪ It does not measure the appropriateness of the model, because it can be
artificially inflated by adding higher-order polynomial terms to the model
▪ A large R² does not necessarily imply that the regression model will provide
accurate predictions of future observations
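Using the example's sums of squares, R² works out as follows (SSR = SST − SSE, with SST = SS_yy = 173.3769 and SSE = 21.2498 from earlier slides). Note that for simple linear regression R² equals r² from the correlation example:

```python
# R^2 from the example's sums of squares.
sst = 173.3769                 # total sum of squares (= SS_yy)
sse = 21.2498                  # error sum of squares
ssr = sst - sse                # regression sum of squares, 152.1271

r_squared = ssr / sst          # about 0.8774: roughly 87.7% of the
                               # variation in y is explained by the model

print(f"R^2 = {r_squared:.4f}")
```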
Analysis of Variance Approach to Test Significance of Regression

$$f = \frac{SSR/1}{SSE/(n-2)} = \frac{SSR}{s^2}$$

Reject $H_0$ if $f > f_\alpha(1, n-2)$.
Steps
1. Parameter of interest: the slope b
2. $H_0: b = 0$ (no linear relationship)
3. $H_1: b \neq 0$
4. $\alpha = 0.05$
5. Test statistic: $F = SSR/s^2$
6. Rejection region: $F > F_{0.05,1,18}$
7. Computation: with $SSR = SST - SSE = 173.3769 - 21.2498 = 152.1271$ and $s^2 = SSE/18 = 1.1805$,
$$F = \frac{152.1271}{1.1805} = 128.8617$$
8. Conclusion: since $128.8617 > F_{0.05,1,18} = 4.41$, reject $H_0$ and conclude that x has a significant linear effect on y.
Analysis of variance for the example:

Source      | SS       | df | MS       | F
Regression  | 152.1271 |  1 | 152.1271 | 128.8617
Error       | 21.2498  | 18 |   1.1805 |
Total       | 173.3769 | 19 |          |
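The F computation above can be reproduced directly from the sums of squares:

```python
# F test for significance of regression, from the example's ANOVA quantities.
n = 20
sst, sse = 173.3769, 21.2498
ssr = sst - sse                # 152.1271, with 1 degree of freedom

s2 = sse / (n - 2)             # mean square error, 18 degrees of freedom
f = (ssr / 1) / s2             # about 128.86

# Critical value F_{0.05}(1, 18) = 4.41, so H0: b = 0 is rejected.
print(f"F = {f:.4f}")
```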
Adequacy of the Regression Model
▪ It is frequently helpful to check the assumption that the errors are approximately
normally distributed with constant variance, and to determine whether
additional terms in the model would be useful.