
Chapter 14

Simple Linear Regression


Regression vs. ANOVA
 ANOVA:
◼ Independent variables are discrete
◼ Dependent variables are continuous
 Regression:
◼ Independent variables are continuous
◼ Dependent variables are continuous
Example
 Armand's Pizza Parlors are mostly located near college
campuses. In the following data, quarterly sales appear to
be higher at campuses with larger student populations.

Branch                          1    2    3    4    5    6    7    8    9   10
Student Population (1000s)      2    6    8    8   12   16   20   20   22   26
Quarterly Sales ($1000s)       58  105   88  118  117  137  157  169  149  202
Scatter plot
[Figure: scatter plot of quarterly sales ($1000s) versus student population (1000s)]
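The plot itself is easy to reproduce. A minimal sketch, assuming matplotlib is available:

```python
import matplotlib.pyplot as plt

# Armand's Pizza Parlor data from the table above
population = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]        # student population (1000s)
sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]  # quarterly sales ($1000s)

plt.scatter(population, sales)
plt.xlabel("Student population (1000s)")
plt.ylabel("Quarterly sales ($1000s)")
plt.title("Armand's Pizza Parlors")
plt.show()
```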
First-Order Linear Model
 The first-order linear model is also called simple linear
regression.
 There is only one independent variable, and we assume it
is linearly related to the dependent variable.
 The model is

Y = β₀ + β₁x + ε

where β₀ is the intercept, β₁ is the slope, and ε is the random error term.
Since E(ε) = 0, taking expectations gives the mean model:

E(Y|x) = β₀ + β₁x
 Note that a "linear model" is linear in the parameters β₀
and β₁.
 If the relationship between x and y appears to be
quadratic or to follow some other pattern, we can still build
a linear model, such as

Y = β₀ + β₁x² + ε   or   Y = β₀ + β₁eˣ + ε, etc.
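Fitting such a model only requires transforming the predictor first, since the model is still linear in β₀ and β₁. A minimal sketch with made-up illustrative data, assuming numpy is available:

```python
import numpy as np

# Hypothetical data for illustration: y is roughly quadratic in x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.8, 10.2, 17.5, 27.1, 38.0])

# Transform the predictor, then fit an ordinary linear model in the new column.
z = x**2                      # the model Y = b0 + b1*x^2 is linear in b0, b1
b1, b0 = np.polyfit(z, y, 1)  # degree-1 fit of y against z
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```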
 Again, we would like to estimate the population
mean β₀ + β₁x from a sample.
 We use least squares as the criterion and
calculate the values b₀, b₁ that minimize
the following function:

F(b₀, b₁) = Σᵢ₌₁ⁿ [yᵢ − (b₀ + b₁xᵢ)]²

β₀ and β₁ are the parameters of the model;
b₀ and b₁ are their estimators.
Estimating the coefficients
 Least squares estimate: ŷᵢ = b₀ + b₁xᵢ
 Least squares line coefficients:

b₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)²
   = (Σᵢ₌₁ⁿ xᵢyᵢ − n x̄ ȳ) / (Σᵢ₌₁ⁿ xᵢ² − n x̄²)
   = s_xy (n−1) / (s_x² (n−1)) = s_xy / s_x²

b₀ = ȳ − b₁ x̄
Yᵢ = β₀ + β₁xᵢ + εᵢ
ŷᵢ = b₀ + b₁xᵢ,   eᵢ = yᵢ − ŷᵢ

 εᵢ is the difference between Yᵢ and the population
mean E(Y|xᵢ).
 eᵢ is the difference between yᵢ and the estimated
mean ŷᵢ; the eᵢ are called residuals.
Branch                            1     2     3     4     5     6     7     8     9    10    Sum
Student Population (1000s) (xᵢ)   2     6     8     8    12    16    20    20    22    26    140
Quarterly Sales ($1000s) (yᵢ)    58   105    88   118   117   137   157   169   149   202   1300
xᵢyᵢ                            116   630   704   944  1404  2192  3140  3380  3278  5252  21040
xᵢ²                               4    36    64    64   144   256   400   400   484   676   2528

x̄ = 14,  ȳ = 130,  Σxᵢyᵢ = 21040,  Σxᵢ² = 2528

⇒ (n−1)s_xy = Σxᵢyᵢ − n x̄ ȳ = 2840
⇒ (n−1)s_x² = Σxᵢ² − n x̄² = 568

b₁ = 2840/568 = 5,   b₀ = 130 − 5×14 = 60
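A quick numerical check of this hand calculation, assuming numpy is available:

```python
import numpy as np

x = np.array([2, 6, 8, 8, 12, 16, 20, 20, 22, 26], dtype=float)
y = np.array([58, 105, 88, 118, 117, 137, 157, 169, 149, 202], dtype=float)
n = len(x)

# Shortcut sums used on the slide
sxy = np.sum(x * y) - n * x.mean() * y.mean()   # = (n-1)*s_xy = 2840
sxx = np.sum(x**2) - n * x.mean()**2            # = (n-1)*s_x^2 = 568

b1 = sxy / sxx                 # 5.0
b0 = y.mean() - b1 * x.mean()  # 60.0
print(b1, b0)
```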
Estimated values
 b₀ = 60, b₁ = 5

Student Population (1000s) (xᵢ)   2    6    8    8   12   16   20   20   22   26
Quarterly Sales ($1000s) (yᵢ)    58  105   88  118  117  137  157  169  149  202
ŷᵢ                               70   90  100  100  120  140  160  160  170  190
eᵢ = yᵢ − ŷᵢ                    −12   15  −12   18   −3   −3   −3    9  −21   12

 On average, each additional unit (1,000 students) of population brings
in an average of 5 units ($5,000) of quarterly sales.
 The base level of $60,000 (the predicted sales at a student population
of zero) is not meaningful in this case.
Sum of Squares Due to Error (SSE)
 The variation due to random error is estimated with the
sum of squares for error (SSE), defined as

SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

 SSE is very important: we will base hypothesis
tests on it.
Sum of Squares
 The total sum of squares is defined as

SST = Σᵢ₌₁ⁿ (yᵢ − ȳ)²

 The sum of squares due to regression is defined as

SSR = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)²

 SST = SSR + SSE
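The decomposition can be verified numerically for the pizza data. A minimal sketch, assuming numpy is available:

```python
import numpy as np

x = np.array([2, 6, 8, 8, 12, 16, 20, 20, 22, 26], dtype=float)
y = np.array([58, 105, 88, 118, 117, 137, 157, 169, 149, 202], dtype=float)

b0, b1 = 60.0, 5.0                   # least squares coefficients from earlier
y_hat = b0 + b1 * x

SSE = np.sum((y - y_hat)**2)         # 1530.0
SSR = np.sum((y_hat - y.mean())**2)  # 14200.0
SST = np.sum((y - y.mean())**2)      # 15730.0
print(SSE, SSR, SST, np.isclose(SST, SSR + SSE))  # decomposition holds
```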
Coefficient of Determination
 We often use the coefficient of determination to
assess how well the model fits the data:

R² = SSR/SST = 1 − SSE/SST = s_xy²/(s_x² s_y²) = r²
   = (variation of y explained by the linear relation in x) / (total variation of y)

 If we know that b₁ is negative and R² is 0.81, what
is the correlation coefficient between x and y?
(Since r takes the sign of b₁, r = −√0.81 = −0.9.)
Distribution of Y given x

E(Y|x) = β₀ + β₁x  and  ε ~ N(0, σ²)

⇒ Y|x ~ N(β₀ + β₁x, σ²)
Required conditions for ε
 It is normally distributed.
 Its mean is zero.
 Its variance is constant.
 The ε's associated with different observations
are independent of each other.
Estimation of σ²

SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² = (n−1)(s_y² − s_xy²/s_x²)

where

s_x² = Σ(xᵢ − x̄)²/(n−1),   s_y² = Σ(yᵢ − ȳ)²/(n−1),
s_xy = Σ(xᵢ − x̄)(yᵢ − ȳ)/(n−1)

The estimate of σ² is s_ε² = SSE/(n−2).
Branch                            1     2     3     4     5     6     7     8     9    10     Sum
Student Population (1000s) (xᵢ)   2     6     8     8    12    16    20    20    22    26     140
Quarterly Sales ($1000s) (yᵢ)    58   105    88   118   117   137   157   169   149   202    1300
xᵢyᵢ                            116   630   704   944  1404  2192  3140  3380  3278  5252   21040
xᵢ²                               4    36    64    64   144   256   400   400   484   676    2528
yᵢ²                            3364 11025  7744 13924 13689 18769 24649 28561 22201 40804  184730

x̄ = 14,  ȳ = 130,  Σxᵢyᵢ = 21040,  Σxᵢ² = 2528

s_xy = (Σxᵢyᵢ − n x̄ ȳ)/(n−1) = 315.56
s_x² = (Σxᵢ² − n x̄²)/(n−1) = 63.11
s_y² = (Σyᵢ² − n ȳ²)/(n−1) = 1747.78

SSE = (n−1)(s_y² − s_xy²/s_x²) = 1530
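A sketch verifying the shortcut formula for SSE, assuming numpy is available:

```python
import numpy as np

x = np.array([2, 6, 8, 8, 12, 16, 20, 20, 22, 26], dtype=float)
y = np.array([58, 105, 88, 118, 117, 137, 157, 169, 149, 202], dtype=float)
n = len(x)

s_xy = np.cov(x, y, ddof=1)[0, 1]   # sample covariance, 315.56
s_x2 = np.var(x, ddof=1)            # 63.11
s_y2 = np.var(y, ddof=1)            # 1747.78

SSE = (n - 1) * (s_y2 - s_xy**2 / s_x2)  # 1530.0
s_eps2 = SSE / (n - 2)                   # 191.25, the estimate of sigma^2
print(SSE, s_eps2)
```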
Hypothesis Testing for β₁
 H₀: β₁ = 0;  H₁: β₁ ≠ 0
Under H₀ the model is Y = β₀ + ε;
under H₁ the model is Y = β₀ + β₁x + ε.
 In ANOVA, we contrast the variation of the yᵢ across
groups with the variation within groups.
 In regression, we likewise contrast the variation across
different xᵢ values with the variation within xᵢ values.

[Figure] Between-group variation: the variation of the ŷᵢ's,
computed as the summation of (ŷᵢ − ȳ)².
Note: the average of the ŷᵢ is ȳ.

[Figure] Within-group variation: the summation of (yᵢ − ŷᵢ)².
Test with ANOVA Table

SSR = Σ(ŷᵢ − ȳ)²,   SST = Σ(yᵢ − ȳ)²

Test statistic:  F = (SSR/1) / (SSE/(n−2)) ~ F(1, n−2) under H₀

If F > F₁,ₙ₋₂,α, we reject the null hypothesis.
Distribution of b₁

 b₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(Yᵢ − Ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)²  ~  N(β₁, σ² / Σᵢ₌₁ⁿ (xᵢ − x̄)²)

 T = (b₁ − β₁) / √(s_ε²/((n−1)s_x²))  ~  t(n−2)
Test with t-statistic

Under H₀: β₁ = 0,   T = b₁ / √(s_ε²/((n−1)s_x²)) ~ t(n−2)

We reject the null hypothesis when |T| > tₙ₋₂,α/2 for H₁: β₁ ≠ 0
(this is equivalent to the ANOVA table test).
We reject the null hypothesis when T > tₙ₋₂,α for H₁: β₁ > 0.
We reject the null hypothesis when T < −tₙ₋₂,α for H₁: β₁ < 0.

We can also test H₀: β₁ = c with T = (b₁ − c) / √(s_ε²/((n−1)s_x²)) ~ t(n−2).
Hypothesis Testing for β₁

Branch                            1     2     3     4     5     6     7     8     9    10    Sum
Student Population (1000s) (xᵢ)   2     6     8     8    12    16    20    20    22    26    140
Quarterly Sales ($1000s) (yᵢ)    58   105    88   118   117   137   157   169   149   202   1300
ŷᵢ                               70    90   100   100   120   140   160   160   170   190   1300
(ŷᵢ − ȳ)²                      3600  1600   900   900   100   100   900   900  1600  3600  14200

SSR = 14200
s_ε² = SSE/(n−2) = 191.25
 Test H₀: β₁ = 0; H₁: β₁ ≠ 0 with ANOVA:

Source       df        SS       MS        F     P-value
Regression    1      14200    14200    74.25     .000
Error      10−2=8     1530    191.25
Total      10−1=9    15730

F > F₁,₈,₀.₀₅ ⇒ reject H₀.
The linear relationship between x and y is significant.
R² = 0.903
 Test H₀: β₁ = 0; H₁: β₁ ≠ 0 with the t-statistic:

T = b₁ / √(s_ε²/((n−1)s_x²)) = 5 / √(191.25/(9×63.11)) = 8.62

|T| > t₈,₀.₀₂₅ = 2.306, ∴ reject H₀.

Note that T² = F.
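Both tests can be checked numerically. A minimal sketch, assuming numpy and scipy are available:

```python
import numpy as np
from scipy import stats

x = np.array([2, 6, 8, 8, 12, 16, 20, 20, 22, 26], dtype=float)
y = np.array([58, 105, 88, 118, 117, 137, 157, 169, 149, 202], dtype=float)
n = len(x)

b0, b1 = 60.0, 5.0
y_hat = b0 + b1 * x
SSE = np.sum((y - y_hat)**2)          # 1530
SSR = np.sum((y_hat - y.mean())**2)   # 14200
s_eps2 = SSE / (n - 2)                # 191.25

# ANOVA F test
F = (SSR / 1) / s_eps2                # 74.25
p_F = stats.f.sf(F, 1, n - 2)         # p-value, ~0.000
print(F, p_F, F > stats.f.ppf(0.95, 1, n - 2))

# Equivalent t test
s_x2 = np.var(x, ddof=1)
T = b1 / np.sqrt(s_eps2 / ((n - 1) * s_x2))   # 8.62
p_T = 2 * stats.t.sf(abs(T), n - 2)
print(T, p_T, np.isclose(T**2, F))            # T^2 equals F
```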
Observational Data
 Observational data can only support conclusions about
association.
◼ The student population is attached to the restaurant chosen;
it is not assigned by design.
◼ The sales might be a direct result of the locations. Schools in
big cities tend to have larger student populations; if a
small school is also located in a big city, the nearby restaurant
might still get large sales.
 In the example, the restaurants assessed are randomly
chosen, and we do not know how large the student
population will be before we choose the restaurants.
Hence x is a random variable, as is y.
Experimental Data
 Experimental data can support causal inference.
◼ If we could assign student populations to the
campuses (perhaps at the same location, though this is
very difficult...) and put restaurants there to
study their sales, the data collected would be designed,
since the population is not attached to the schools
chosen.
◼ The conclusions could then support a causal effect.
Coefficient of Correlation
 For a pair of random variables (X, Y), we have defined
the covariance and correlation as follows:

Cov(X, Y) = Σᵢ₌₁ᴺ (xᵢ − μₓ)(yᵢ − μᵧ) / N
          = Σ_all x Σ_all y (x − μₓ)(y − μᵧ) P(x, y)

ρ_XY = Cov(X, Y) / (σ_X σ_Y)

The population correlation ρ_XY is estimated with the
sample correlation r_XY = s_XY / (s_X s_Y).
R² vs. Sample Coefficient of Correlation

R² = SSR/SST = 1 − SSE/SST = s_xy²/(s_x² s_y²) = r²,

where r is used to estimate the coefficient of correlation ρ = σ_xy/(σ_x σ_y).

When the (Xᵢ, Yᵢ) are assumed to independently follow a
bivariate normal distribution, we can also test ρ with
the same T-statistic.
Confidence interval for 𝛽1
 b₁ ± tₙ₋₂,α/2 √(s_ε²/((n−1)s_x²))
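For the pizza data, this interval can be computed as follows (a sketch assuming numpy and scipy are available; the resulting interval is roughly 5 ± 1.34):

```python
import numpy as np
from scipy import stats

n = 10
b1 = 5.0
s_eps2 = 191.25          # s_eps^2 = SSE/(n-2) from earlier
sxx = 568.0              # (n-1)*s_x^2

half_width = stats.t.ppf(0.975, n - 2) * np.sqrt(s_eps2 / sxx)
print(f"95% CI for beta1: {b1 - half_width:.3f} to {b1 + half_width:.3f}")
```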
Prediction with the regression line
 Point estimation/prediction:
 Ŷ = b₀ + b₁x_g
 Intervals
◼ Estimate the expected value of Y at a given x_g:
 Ŷ ± tₙ₋₂,α/2 · s_ε · √(1/n + (x_g − x̄)²/((n−1)s_x²))   (narrower)
◼ Predict a particular value of Y at a given x_g:
 Ŷ ± tₙ₋₂,α/2 · s_ε · √(1 + 1/n + (x_g − x̄)²/((n−1)s_x²))
Prediction of bonus
 The point estimate of the expected mean sales of a
restaurant close to a campus with a student population of
10,000 is 60 + 5×10 = 110 (unit: $1000).
 The 95% confidence interval for the expected mean
sales of such a restaurant is 110 ± 11.424.
 The 95% prediction interval for the sales of such a
restaurant is 110 ± 33.875.
 The further the given value x_g is from the
center x̄, the larger the estimation error.
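These intervals can be verified numerically (a sketch assuming numpy and scipy; the results match the slide's values up to rounding):

```python
import numpy as np
from scipy import stats

x = np.array([2, 6, 8, 8, 12, 16, 20, 20, 22, 26], dtype=float)
n = len(x)
b0, b1 = 60.0, 5.0
s_eps = np.sqrt(191.25)        # residual standard error
s_x2 = np.var(x, ddof=1)

xg = 10.0                      # student population of 10,000
y_hat = b0 + b1 * xg           # 110
t = stats.t.ppf(0.975, n - 2)  # 2.306

core = 1 / n + (xg - x.mean())**2 / ((n - 1) * s_x2)
ci = t * s_eps * np.sqrt(core)       # ~11.42, half-width of the mean CI
pi = t * s_eps * np.sqrt(1 + core)   # ~33.87, half-width of the prediction interval
print(y_hat, ci, pi)
```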

Residual
 Residual:
◼ eᵢ = yᵢ − ŷᵢ
 Standardized residual:
◼ eᵢ / s_ε, or
◼ eᵢ / s_eᵢ, where s_eᵢ = s_ε √(1 − 1/n − (xᵢ − x̄)²/((n−1)s_x²))
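A sketch computing the standardized residuals for the pizza data, assuming numpy is available:

```python
import numpy as np

x = np.array([2, 6, 8, 8, 12, 16, 20, 20, 22, 26], dtype=float)
y = np.array([58, 105, 88, 118, 117, 137, 157, 169, 149, 202], dtype=float)
n = len(x)
b0, b1 = 60.0, 5.0

e = y - (b0 + b1 * x)                    # raw residuals
s_eps = np.sqrt(np.sum(e**2) / (n - 2))  # residual standard error
s_x2 = np.var(x, ddof=1)

# Standard error of each residual, then the standardized residuals
s_e = s_eps * np.sqrt(1 - 1/n - (x - x.mean())**2 / ((n - 1) * s_x2))
std_resid = e / s_e
print(np.round(std_resid, 2))            # values beyond about ±2 deserve a look
```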
Regression diagnostics based on residuals
 Normality
 Homoscedasticity: equal variance assumption
 Independence
 Outliers
 Influential points
Normality
 Check with a normal probability plot (Q-Q plot).
Heteroscedasticity
[Figure: residual plots illustrating non-constant variance around 0]
Independence
 There are many forms of correlation among
your data. Some can be detected from the
residuals while some cannot.
 Time series data is the most commonly
discussed type of correlated data; its autocorrelation can be
detected through a plot of residuals vs. time.

[Figure: plot of residuals versus time indicating autocorrelation (alternating)]
[Figure: plot of residuals versus time indicating autocorrelation (increasing)]
Outlier
[Figure: scatter plot with an observation lying far from the fitted line ŷ]
Influential Point
[Figure: scatter plot with an observation that pulls the fitted line toward it]
Leverage of Observations

Leverage of observation i:  hᵢ = 1/n + (xᵢ − x̄)² / Σₖ(xₖ − x̄)²
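A sketch computing the leverages for the pizza data, assuming numpy is available; the 2(k+1)/n threshold used below is a common rule of thumb, not something defined on the slide:

```python
import numpy as np

x = np.array([2, 6, 8, 8, 12, 16, 20, 20, 22, 26], dtype=float)
n = len(x)

# Leverage h_i = 1/n + (x_i - xbar)^2 / sum_k (x_k - xbar)^2
h = 1/n + (x - x.mean())**2 / np.sum((x - x.mean())**2)
print(np.round(h, 3))

# Flag high-leverage points: h_i > 2*(k+1)/n, with k = 1 predictor here
print(h > 2 * 2 / n)
```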
Outliers and Influential Points
 An outlier is an observation that is unusually small or
unusually large.
 ... but some outliers may be very influential: an
influential observation causes a shift in the regression line.

[Figures: scatter plots contrasting an ordinary outlier with an
influential observation; the influential point shifts the fitted
regression line]
Procedure for Regression Diagnostics…
1. Develop a model that has a theoretical basis.
2. Gather data for the two variables in the model. Experimental data is
preferred if possible.
3. Draw the scatter plot to determine whether a linear model appears
to be appropriate. Identify possible outliers.
4. Determine the regression equation.
5. Calculate the residuals and check the required conditions.
6. Assess the model’s fit.
7. If the model fits the data, use the regression equation to predict a
particular value of the dependent variable and/or estimate its mean.
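As a wrap-up, steps 4–7 can be carried out in one pass with standard software. A minimal sketch, assuming statsmodels is available; it reproduces the coefficients, tests, and intervals computed by hand in this chapter:

```python
import numpy as np
import statsmodels.api as sm

x = np.array([2, 6, 8, 8, 12, 16, 20, 20, 22, 26], dtype=float)
y = np.array([58, 105, 88, 118, 117, 137, 157, 169, 149, 202], dtype=float)

X = sm.add_constant(x)               # adds the intercept column
model = sm.OLS(y, X).fit()           # least squares fit
print(model.params)                  # b0 = 60, b1 = 5
print(model.fvalue, model.rsquared)  # F = 74.25, R^2 = 0.903
print(model.summary())               # coefficients, t tests, CIs in one table

# Interval estimates at x_g = 10 (row is [intercept, x_g])
pred = model.get_prediction([[1.0, 10.0]])
print(pred.summary_frame(alpha=0.05))  # mean CI and prediction interval
```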
