
Intended Learning Outcomes

At the end of this module, it is expected that the students will be able to:

1. Construct empirical models using simple linear regression.
2. Estimate the parameters in a linear regression model using the least-squares approach.
3. Test hypotheses on simple linear regression.
4. Predict future observations using the regression model.
5. Determine the adequacy of the regression model using residual analysis and the coefficient of determination.
6. Apply the correlation model.
Empirical Models

▪ Analysis of the relationship between variables.

▪ Examples of variables that are related to each other:
  • pressure and temperature of a gas in a container
  • velocity and the area of the channel
  • displacement and velocity

▪ Examples of variables that are related, but whose relationships are not deterministic:
  • fuel usage of a car (y) and its weight (x)
  • electrical energy consumption of a house (y) and the size of the house in sq ft (x)

Regression Analysis
▪ Collection of statistical tools that are used to model and explore relationships between variables that
have a nondeterministic relationship.

▪ Deals with finding the best relationship between y and x, quantifying the strength of that relationship,
and using methods that allow prediction of the response y for given values of the regressor x.

Here y is the dependent (response) variable and x is the independent (regressor) variable.
Simple Linear Regression

▪ Has only one dependent or response variable (y) and one independent,
regressor or predictor variable (x):

$$y = a + bx + \varepsilon$$

Where:
y – dependent variable
x – independent variable
a – regression coefficient: y intercept (constant)
b – regression coefficient: slope of the regression line
$\varepsilon$ – error, which we are trying to minimize

▪ Estimates of a and b should result in a line that is the "best fit" to the given
data.
Method of Least Squares

$$SS_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}
\qquad
SS_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$

$$\hat{b} = \frac{SS_{xy}}{SS_{xx}} \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}$$

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

$$\hat{y} = \hat{a} + \hat{b}x \qquad y_i = \hat{a} + \hat{b}x_i + e_i \qquad e_i = y_i - \hat{y}_i$$
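Below is a minimal Python sketch of these least-squares formulas; the function name and the use of NumPy are illustrative choices, not part of the module.

```python
import numpy as np

def least_squares_fit(x, y):
    """Fit y = a + b*x by the method of least squares using the SSxx/SSxy formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    # Sums of squares, exactly as in the formulas above
    ss_xx = np.sum(x**2) - np.sum(x)**2 / n
    ss_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n

    b_hat = ss_xy / ss_xx                 # slope
    a_hat = y.mean() - b_hat * x.mean()   # intercept

    residuals = y - (a_hat + b_hat * x)   # e_i = y_i - y_hat_i
    return a_hat, b_hat, residuals
```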
Example

$$SS_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}
= 29.2892 - \frac{23.92^2}{20} = 0.68088$$

$$SS_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}
= 2214.6566 - \frac{23.92 \times 1843.21}{20} = 10.17744$$
Example

$$SS_{xx} = 0.68088 \qquad SS_{xy} = 10.17744$$

$$\hat{b} = \frac{SS_{xy}}{SS_{xx}} = \frac{10.17744}{0.68088} = 14.94748$$

$$\hat{a} = \bar{y} - \hat{b}\,\bar{x} = 92.1605 - 14.94748 \times 1.1960 = 74.28331$$

Fitted regression line:

$$\hat{y} = \hat{a} + \hat{b}x = 74.28 + 14.95x$$
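As a quick check, the same estimates can be reproduced from the summary statistics quoted above (n = 20, Σx = 23.92, Σy = 1843.21, Σx² = 29.2892, Σxy = 2214.6566); this is a sketch using only those sums, not the original data set.

```python
n = 20
sum_x, sum_y = 23.92, 1843.21
sum_x2, sum_xy = 29.2892, 2214.6566

ss_xx = sum_x2 - sum_x**2 / n            # 0.68088
ss_xy = sum_xy - sum_x * sum_y / n       # 10.17744

b_hat = ss_xy / ss_xx                    # ~14.94748
a_hat = sum_y / n - b_hat * (sum_x / n)  # ~74.28331

print(f"y_hat = {a_hat:.2f} + {b_hat:.2f} x")   # y_hat = 74.28 + 14.95 x
```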
Example

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 = 21.2498$$

$$\hat{\sigma}^2 = \frac{SSE}{n-2} = \frac{21.2498}{18} = 1.1805$$
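A quick numeric check of the variance estimate, assuming the SSE and n values stated above:

```python
sse, n = 21.2498, 20
sigma2_hat = sse / (n - 2)
print(round(sigma2_hat, 4))   # 1.1805
```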
Correlation: Estimating the Strength of Linear Relation

Correlation

▪ Statistical method used to determine whether there is a relationship between
variables and the strength of that relationship.

▪ Correlation coefficient – measures how closely the points in a scatter
diagram are spread around a line.

▪ r – symbol for the sample correlation coefficient

▪ ρ – symbol for the population correlation coefficient
Correlation

▪ High positive correlation: the correlation coefficient will be +1 when all the
points lie on a line and the line has a positive slope. As x increases, y
increases – a direct relationship between x and y.

▪ High negative correlation: the points lie on a line, but the line is going down,
so the correlation coefficient will be equal to -1. As x increases, y decreases –
an inverse relationship.

▪ Low positive correlation: the correlation coefficient approaches zero.

▪ No correlation: r can be very close to zero.
Correlation

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$$

$$SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}
\qquad
SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n}
\qquad
SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}$$
Example

$$SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 0.68088$$

$$SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 173.3769$$

$$SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 10.17744$$

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{10.17744}{\sqrt{0.68088 \times 173.3769}} = 0.9367$$
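The correlation coefficient follows directly from these sums of squares; a minimal check in Python:

```python
import math

ss_xx, ss_yy, ss_xy = 0.68088, 173.3769, 10.17744

r = ss_xy / math.sqrt(ss_xx * ss_yy)
print(round(r, 4))   # 0.9367 -> a strong positive linear relationship
```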
Hypothesis Tests in Simple Linear Regression

Use of t-tests

• Hypotheses

  Slope: H₀: b = 0, H₁: b ≠ 0        Intercept: H₀: a = 0, H₁: a ≠ 0

• Testing approaches: critical value and p-value, with df = n − 2

$$t_0 = \frac{\hat{b} - b}{\sqrt{\hat{\sigma}^2 / SS_{xx}}}
\qquad
t_0 = \frac{\hat{a} - a}{\sqrt{\hat{\sigma}^2\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{SS_{xx}}\right)}}$$

Where:
$\hat{b}$ – sample regression slope coefficient
b – hypothesized slope (usually b = 0)
$\hat{a}$ – sample regression y-intercept coefficient
a – hypothesized y intercept
$\hat{\sigma}^2$ – estimated variance
$SS_{xx}$ – sum of squares of x
Use of t-tests
Using the sample data, given a level of significance of 0.05:

H₀: a = 0   No linear relationship; not significant
H₁: a ≠ 0   There is a linear relationship; significant

Degrees of freedom = n − 2 = 18
Critical t = ±2.101

$$t_0 = \frac{\hat{a} - a_0}{\sqrt{\hat{\sigma}^2\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{SS_{xx}}\right)}}
= \frac{74.28 - 0}{\sqrt{1.18\left(\dfrac{1}{20} + \dfrac{1.196^2}{0.68}\right)}} = 46.62$$

Conclusion: Since 46.62 is greater than the critical value 2.101, reject H₀ and
conclude that the population y-intercept coefficient is significant; there is a
linear relationship.
Use of t-tests
Using the sample data, given a level of significance of 0.05:

H₀: b = 0   No linear relationship; not significant
H₁: b ≠ 0   There is a linear relationship; significant

Degrees of freedom = n − 2 = 18
Critical t = ±2.101

$$t_0 = \frac{\hat{b} - b}{\sqrt{\hat{\sigma}^2 / SS_{xx}}}
= \frac{14.94748 - 0}{\sqrt{1.18/0.68}} = 11.35$$

Conclusion: Since 11.35 is greater than the critical value 2.101, reject H₀ and
conclude that the population slope coefficient is significant; there is a linear
relationship.
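Both t statistics and the critical value can be reproduced as follows; the use of scipy.stats for the critical value is an illustrative choice, and the inputs are the rounded values shown on the slides.

```python
import math
from scipy import stats

n = 20
a_hat, b_hat = 74.28, 14.94748
sigma2_hat = 1.18            # estimated variance, rounded as on the slide
ss_xx, x_bar = 0.68, 1.196

t_slope = (b_hat - 0) / math.sqrt(sigma2_hat / ss_xx)                         # ~11.35
t_intercept = (a_hat - 0) / math.sqrt(sigma2_hat * (1/n + x_bar**2 / ss_xx))  # ~46.6

t_crit = stats.t.ppf(1 - 0.05/2, df=n - 2)     # ~2.101 for a two-sided test at alpha = 0.05
print(t_slope > t_crit, t_intercept > t_crit)  # True True -> reject H0 in both tests
```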
Significance of Regression

▪ Failure to reject the null hypothesis is equivalent to concluding that there is
no linear relationship between the dependent and the independent variables,
or that the true relationship between the two variables is not linear.

▪ If the null hypothesis is rejected, it could mean that the straight-line model is
adequate, or that there is a linear effect of the independent variable.
Sums of Squares: Measures of Variation

The total variation in y (SST) is partitioned into the variation explained by the
regression (SSR) and the unexplained error variation (SSE): SST = SSR + SSE.
Coefficient of Determination

▪ R²

▪ Often used to judge the adequacy of a regression model.

▪ Square of the correlation coefficient between jointly distributed random
variables X and Y, and has a value of 0 ≤ R² ≤ 1.

▪ Often referred to as the amount of variability in the data explained or
accounted for by the regression model.

▪ It does not measure the magnitude of the slope of the regression line;
a large value of R² does not imply a steep slope.

▪ Does not measure the appropriateness of the model, because it can be
artificially inflated by adding higher-order polynomial terms to the model.

▪ A large R² does not necessarily imply that the regression model will provide
accurate predictions of future observations.
Coefficient of Determination

$$R^2 = \frac{SSR}{SST} = \frac{152.127}{173.377} = 0.877$$

Approximately 87.7% of the variation in purity of oxygen can be explained by
the percentage of hydrocarbon.
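A small sketch of the same calculation, using SSR = SST − SSE with the sums of squares computed earlier:

```python
sst = 173.3769          # total sum of squares (SS_yy)
sse = 21.2498           # error sum of squares
ssr = sst - sse         # regression sum of squares, ~152.127

r_squared = ssr / sst
print(round(r_squared, 3))   # 0.877 -> ~87.7% of the variation explained
```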
Correlation

▪ Correlation analysis attempts to measure the strength of such relationships
between two variables by means of a single number called a correlation
coefficient.

▪ The correlation coefficient measures how closely the points in a scatter diagram
are spread around a line.

▪ +1 implies a perfect linear relationship with a positive slope.
▪ -1 implies a perfect linear relationship with a negative slope.
▪ 0 indicates no correlation.
Analysis of Variance Approach to Test Significance of Regression

▪ The analysis-of-variance (ANOVA) approach is used in analyzing the quality of the
estimated regression line.

▪ A procedure where the total variation in the dependent variable is subdivided
into meaningful components that are then observed and treated
systematically.

$$f = \frac{SSR/1}{SSE/(n-2)} = \frac{SSR}{s^2}$$

Reject H₀ when $f > f_{\alpha}(1, n-2)$.
Analysis of Variance Approach to Test Significance of Regression

Steps
1. Parameter of interest is b, the slope.
2. H₀: b = 0 (no linear relationship)
3. H₁: b ≠ 0
4. α = 0.05
5. Test statistic: F = SSR/s²
6. Rejection region: F > F₀.₀₅,₁,₁₈
7. Computation: F = 152.1271 / 1.1805 = 128.8617
8. Conclusion
Sums of Squares: Measures of Variation

$$F = \frac{152.1271}{1.1805} = 128.8617$$

Since F = 128.86 > F₀.₀₅,₁,₁₈ = 4.414, reject H₀.

Since the p-value = 0.000 < α = 0.05, reject the null hypothesis and conclude
that the slope coefficient is different from zero; the regression is significant.
Sums of Squares: Measures of Variation

Source of Variation   Sum of Squares   df   Mean Square       F
Regression                  152.1271    1      152.1271    128.8617
Error                        21.2498   18        1.1805
Total                       173.3769   19

$$F = \frac{152.1271}{1.1805} = 128.8617$$
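The F test can be reproduced as follows; scipy.stats is used here only to look up the critical value and p-value, and is not part of the module.

```python
from scipy import stats

ssr, sse = 152.1271, 21.2498
df_reg, df_err = 1, 18

f_stat = (ssr / df_reg) / (sse / df_err)      # ~128.86
f_crit = stats.f.ppf(0.95, df_reg, df_err)    # ~4.414 at alpha = 0.05
p_value = stats.f.sf(f_stat, df_reg, df_err)  # ~0.000

print(f_stat > f_crit, p_value < 0.05)        # True True -> reject H0
```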
Adequacy of the Regression Model

Assumptions required to fit a regression model


• Errors are uncorrelated random variables with mean zero and constant
variance.
• Errors should be normally distributed.
• Order of the model is correct.
Residual Analysis

▪ Frequently helpful in checking the assumption that the errors are approximately
normally distributed with constant variance, and in determining whether
additional terms in the model would be useful.

▪ A frequency histogram of the residuals or a normal probability plot of the
residuals can be constructed and used to approximately check normality.
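A sketch of such a residual check in Python; matplotlib and scipy are illustrative choices, and the residuals listed are placeholders rather than the data set used in the slides.

```python
import matplotlib.pyplot as plt
from scipy import stats

# Residuals e_i = y_i - y_hat_i from a fitted model; placeholder values only
residuals = [0.4, -1.1, 0.7, -0.3, 1.2, -0.8, 0.1, -0.5, 0.9, -0.6]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

ax1.hist(residuals, bins=5)           # frequency histogram of the residuals
ax1.set_title("Histogram of residuals")

stats.probplot(residuals, plot=ax2)   # normal probability plot
ax2.set_title("Normal probability plot")

plt.tight_layout()
plt.show()
```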