
CUHK

Chapter 8
Residual Analysis

HU, Qinlu
Email: qinlu.hu@link.cuhk.edu.hk
Date: 2024.03.18

DSME 2021
CV

Name: Qinlu Hu
PhD Candidate, Department of Decisions, Operations and Technology, CUHK
Business School, qinlu.hu@link.cuhk.edu.hk

Major: Management Information Systems

Research interests:
two-sided online platforms
CONTENTS

01 Introduction
02 Regression Residuals
03 Detecting Lack of Fit
04 Detecting Unequal Variance
05 Check the Normality Assumption
06 Detecting Outliers and Identifying Influential Observations
07 Detecting Residual Correlations: The Durbin-Watson Test
8.1 Introduction

The validity of many of the inferences associated with a regression analysis depends on the error term ε satisfying certain assumptions.

There are four assumptions:

• ε is normally distributed;
• ε has a mean of 0;
• the variance σ² of ε is constant;
• all pairs of error terms are uncorrelated.

Based on these assumptions, least squares regression analysis produces reliable statistical tests and confidence intervals.

8.1 Introduction

What if these assumptions are not satisfied?

What will happen?

• If ε is not normally distributed -- this is usually fine for large samples.
  • The Central Limit Theorem supports this assumption for large sample sizes: the sampling distributions of the least squares estimators are approximately normal regardless of the shape of the distribution of the error term in the population.
• If ε does not have a mean of 0:
  • A nonzero mean of the error terms would indicate that the model is biased. (Estimation problem)
• If the variance σ² is not constant:
  • Homoscedasticity (equal variance) is crucial for the reliability of standard errors, confidence intervals, and hypothesis tests. (Inference problem)
• If pairs of error terms are correlated:
  • Correlated errors affect inference: standard errors, confidence intervals, and hypothesis tests.

Based on these assumptions, least squares regression analysis produces reliable statistical tests and confidence intervals. Violations of these assumptions may lead to inefficiency of the OLS estimators and incorrect inferences.

8.1 Introduction

How can we know whether these assumptions are satisfied? How do we detect violations?

• ε is normally distributed --- check the normality assumption
• ε has a mean of 0 --- detect lack of fit
• the variance σ² is constant --- detect unequal variance
• all pairs of error terms are uncorrelated --- detect residual correlation

This chapter provides both graphical tools and statistical tests that will aid in identifying significant departures from the assumptions.

8.2 Regression Residuals

• The definition of Regression Residuals:

• Consider the model

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$

• Use the data to obtain the least squares estimates $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$.

• The regression residual is the observed value of the dependent variable minus the predicted value:

$\hat{\varepsilon} = y - \hat{y} = y - (\hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k)$
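A minimal sketch of computing these residuals in Python with statsmodels, using simulated data (the data, variable names, and model below are illustrative, not taken from the chapter's Colab notebooks):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Illustrative data: y depends linearly on two predictors plus random error
rng = np.random.default_rng(8)
n = 100
df = pd.DataFrame({"x1": rng.uniform(0, 10, n), "x2": rng.uniform(0, 5, n)})
y = 2 + 1.5 * df["x1"] - 0.8 * df["x2"] + rng.normal(0, 1, n)

# Least squares fit of y = b0 + b1*x1 + b2*x2
X = sm.add_constant(df)
fit = sm.OLS(y, X).fit()

# Residual = observed y minus predicted y (also available directly as fit.resid)
residuals = y - fit.fittedvalues
print(residuals.head())
```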

8.2 Regression Residuals

• Properties of Regression Residuals:

(1) The mean of the residuals is equal to 0. This property follows from the fact that the sum of the differences between the observed y-values and their least squares predicted values ŷ is equal to 0.

$\sum_{i=1}^{n} \hat{\varepsilon}_i = \sum_{i=1}^{n} (y_i - \hat{y}_i) = 0$

(2) The standard deviation of the residuals is equal to the standard deviation of the fitted regression model, s.

$\sum_{i=1}^{n} (\hat{\varepsilon}_i - 0)^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = SSE, \qquad s = \sqrt{\frac{SSE}{n - (k+1)}}$
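A quick numerical check of both properties with simulated data and statsmodels (illustrative; n = 100 observations and k = 2 predictors):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n, k = 100, 2
X = sm.add_constant(rng.uniform(0, 10, size=(n, k)))
y = X @ np.array([2.0, 1.5, -0.8]) + rng.normal(0, 1, n)

fit = sm.OLS(y, X).fit()
resid = fit.resid

# Property (1): the residuals sum to 0 (up to floating-point error)
print(resid.sum())

# Property (2): s computed from SSE matches the model's estimated standard deviation
sse = np.sum(resid**2)
s = np.sqrt(sse / (n - (k + 1)))
print(s, np.sqrt(fit.mse_resid))   # the two values agree
```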
8.2 Regression Residuals

• Examples:
• Google Colab: https://colab.research.google.com
• Python tutorial: https://colab.research.google.com/drive/1LBD-pZPYm_GopWQDv3THOp5waicPq_-S?usp=sharing
• Examples for the chapter: https://colab.research.google.com/drive/12IEGdJbkKcCM3AZGpizcjqgMOyeDzKUx?usp=sharing

8.3 Detecting Lack of Fit
• Definition of Lack of Fit:
• True model:

$y = E(y) + \varepsilon, \qquad E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$

so that

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon, \qquad E(\varepsilon) = 0$

• Mis-specified model (e.g., some variables are dropped):

$E_m(y) \neq E(y), \qquad E(\varepsilon_m) \neq 0$
8.3 Detecting Lack of Fit
• Detecting Model Lack of Fit with Residuals:

1. Residual plot: residuals (y-axis) versus an independent variable (x-axis).

2. Residual plot: residuals (y-axis) versus the predicted values (x-axis).

In each plot, look for a trend, dramatic changes in variability, and/or more than 5% of residuals that lie outside 2s of 0. Any of these patterns indicates a problem with model fit. (A plotting sketch follows.)
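A plotting sketch with matplotlib and statsmodels (illustrative data; a straight line is deliberately fit to a curved relationship so the residual plots show a trend). The dashed lines mark ±2s around 0, where s is the model's estimated standard deviation:

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 100
x = rng.uniform(0, 10, n)
y = 2 + 1.5 * x + 0.4 * x**2 + rng.normal(0, 2, n)   # true relation is curved

# Deliberately mis-specified straight-line fit to illustrate lack of fit
fit = sm.OLS(y, sm.add_constant(x)).fit()
resid = fit.resid
s = np.sqrt(fit.mse_resid)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Plot 1: residuals versus the independent variable
axes[0].scatter(x, resid)
axes[0].set_xlabel("x"); axes[0].set_ylabel("residual")

# Plot 2: residuals versus the predicted values
axes[1].scatter(fit.fittedvalues, resid)
axes[1].set_xlabel("predicted y"); axes[1].set_ylabel("residual")

# Reference lines at 0 and +/- 2s; look for trends or many points outside the band
for ax in axes:
    ax.axhline(0, color="black")
    ax.axhline(2 * s, linestyle="--")
    ax.axhline(-2 * s, linestyle="--")
plt.show()
```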

8.3 Detecting Lack of Fit
• Detecting Model Lack of Fit with Residuals:

Partial regression residual plot: partial residuals (y-axis) versus $x_j$ (x-axis).

1. The partial residual plot is used for models with more than one independent variable. The partial residual for $x_j$ adds the estimated contribution of $x_j$ back to the ordinary residual, $\hat{\varepsilon}^{*} = \hat{\varepsilon} + \hat{\beta}_j x_j$, so the plot reveals the trend between y and $x_j$ after adjusting for the other predictors.

As in ordinary residual plots, look for a trend, dramatic changes in variability, and/or more than 5% of residuals that lie outside 2s of 0. Any of these patterns indicates a problem with model fit.

We can use the partial residual plot to find the trend between y and x1, for example (see the sketch below).
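A sketch of a partial residual plot for one predictor, computed directly from a statsmodels fit (illustrative data and names; statsmodels also offers built-in component-plus-residual plotting):

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 100
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 2 + 0.3 * x1**2 - 0.8 * x2 + rng.normal(0, 1, n)  # y is curved in x1

# Fit the (linear) model y = b0 + b1*x1 + b2*x2
X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

# Partial residual for x1: ordinary residual plus the fitted contribution of x1
b1 = fit.params[1]
partial_resid_x1 = fit.resid + b1 * x1

# Plot partial residuals against x1; a curved pattern suggests y needs a nonlinear term in x1
plt.scatter(x1, partial_resid_x1)
plt.xlabel("x1")
plt.ylabel("partial residual")
plt.show()
```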

8.3 Detecting Lack of Fit

• Examples:
• Google Colab:
• Python tutorial: https://colab.research.google.com/drive/1LBD-pZPYm_GopWQDv3THOp5waicPq_-S?usp=sharing
• Examples for the chapter: https://colab.research.google.com/drive/12IEGdJbkKcCM3AZGpizcjqgMOyeDzKUx?usp=sharing

8.4 Detecting Unequal Variances
• Definition of unequal variance:

• One of the assumptions necessary for the validity of regression inferences is that the error term ε have constant variance σ² for all levels of the independent variable(s).
• Variances that satisfy this property are called homoscedastic.
• Unequal variances for different settings of the independent variable(s) are
said to be heteroscedastic.

8.4 Detecting Unequal Variances
• When data fail to be homoscedastic, the reason is often that the variance of
the response y is a function of its mean E(y).
• Examples:
1. If the response y is a count that has a Poisson distribution, the variance
will be equal to the mean E(y).

8.4 Detecting Unequal Variances
• When data fail to be homoscedastic, the reason is often that the variance of
the response y is a function of its mean E(y).
• Examples:
2. If the response y is a binomial proportion (the proportion of successes based on $n_i$ trials), the variance will be equal to:

$Var(y_i) = \frac{p_i(1-p_i)}{n_i} = \frac{E(y_i)[1-E(y_i)]}{n_i}$

8.4 Detecting Unequal Variances
• When data fail to be homoscedastic, the reason is often that the variance of
the response y is a function of its mean E(y).
• Examples:
3. If the response y follows a multiplicative model, the variance will be equal to:

$Var(y) = [E(y)]^2 \sigma^2$

Poisson Distribution Formula

$P(X = x \mid \lambda) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}$

where:
x = number of events in an area of opportunity
λ = expected number of events
e = base of the natural logarithm (2.71828...)

Poisson Distribution Formula

• Mean: $\mu = \lambda$
• Variance and Standard Deviation: $\sigma^2 = \lambda$, $\sigma = \sqrt{\lambda}$

where λ = expected number of events
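A quick check with scipy (illustrative) that the Poisson mean and variance both equal λ:

```python
from scipy import stats

lam = 4.0
dist = stats.poisson(mu=lam)
print(dist.mean(), dist.var())   # both equal lambda = 4.0
print(dist.pmf(2))               # P(X = 2) = e^{-4} * 4**2 / 2!
```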

Binomial Distribution Formula

$P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x}$

where:
n = the number of experiments (trials)
x = the number of successes: 0, 1, 2, …
p = probability of success in a single experiment

Binomial Distribution Formula

• Mean: $\mu = np$
• Variance and Standard Deviation: $\sigma^2 = np(1-p)$, $\sigma = \sqrt{np(1-p)}$

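A similar scipy check (illustrative) of the binomial mean and variance formulas:

```python
from scipy import stats

n, p = 20, 0.3
dist = stats.binom(n=n, p=p)
print(dist.mean(), n * p)            # mean = np
print(dist.var(), n * p * (1 - p))   # variance = np(1 - p)
```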
Multiplicative Model Formula
• The random error component has been assumed to be additive in all the models considered so far:

$y = E(y) + \varepsilon$

• Another useful type of model is the multiplicative model. In this model, the response is written as the product of its mean and the random error component:

$y = [E(y)]\,\varepsilon$

• The variance of this response grows proportionally to the square of the mean:

$Var(y) = [E(y)]^2 \sigma^2$
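A small simulation (illustrative; the error is taken as 1 plus mean-zero noise, so E(ε) = 1 and Var(ε) = σ²) showing that the variance of y grows with the square of its mean under the multiplicative model:

```python
import numpy as np

rng = np.random.default_rng(8)
sigma = 0.3

# Multiplicative model: y = E(y) * eps, with E(eps) = 1 and Var(eps) = sigma^2
for mean_y in (10, 50, 250):
    eps = 1 + rng.normal(0, sigma, 100_000)
    y = mean_y * eps
    # Simulated variance is close to the theoretical [E(y)]^2 * sigma^2
    print(mean_y, round(y.var(), 1), (mean_y**2) * sigma**2)
```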

8.4 Detecting Unequal Variances
• Solution:
• Variance-stabilizing transformations
• When the variance of y is a function of its mean, we can often satisfy the
least squares assumption of homoscedasticity by transforming the
response to some new response that has a constant variance.
• For example, if the response y is a count that follows a Poisson distribution, the square-root transform can be shown to have approximately constant variance. Consequently, if the response is a Poisson random variable, we would let

$y^{*} = \sqrt{y}$

and fit the model

$y^{*} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$

• This model will approximately satisfy the least squares assumption of homoscedasticity. (A simulation sketch follows.)
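A simulation sketch of the square-root transformation for a Poisson-type response, using statsmodels (illustrative data): comparing the residual spread for small versus large x, the untransformed fit is clearly heteroscedastic while the transformed fit is roughly constant:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 500
x = rng.uniform(0, 10, n)
y = rng.poisson(lam=5 + 4 * x)          # count response: Var(y) = E(y), which grows with x

X = sm.add_constant(x)
fit_raw = sm.OLS(y, X).fit()            # model the raw counts
fit_sqrt = sm.OLS(np.sqrt(y), X).fit()  # model y* = sqrt(y)

# Residual standard deviation for small x versus large x under each model
low, high = x < 5, x >= 5
print("raw  :", fit_raw.resid[low].std(), fit_raw.resid[high].std())    # spread grows with x
print("sqrt :", fit_sqrt.resid[low].std(), fit_sqrt.resid[high].std())  # roughly constant
```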

8.4 Detecting Unequal Variances

• Examples:
• Google Colab:
• Python tutorial: https://colab.research.google.com/drive/1LBD-pZPYm_GopWQDv3THOp5waicPq_-S?usp=sharing
• Examples for the chapter: https://colab.research.google.com/drive/12IEGdJbkKcCM3AZGpizcjqgMOyeDzKUx?usp=sharing

