
Introduction to Linear Regression

Overview of the Week:


● Monday - Linear Regression Basics
○ What is the main idea
○ Errors and Residuals
● Wednesday - Least Squares
○ How is the typical regression line calculated
● Friday - Conditions/Assumptions
○ When is linear regression not appropriate
○ What happens when assumptions are violated

Topics:
● The main idea

● Two types of regression models

● Interpreting model coefficients

Models

● Models don’t need to perfectly reflect reality to be useful
● Think of Models as Maps
● Nearly everything we do in statistics relates to models

The main idea of a regression model
Example: Factory production*
Y = RunTime (the time of a production run in minutes)

X = RunSize (the size of a production run)

Mathematical equation of the blue line

X                         Y
• Predictor variable      • Response variable
• Explanatory variable    • Dependent variable
• Independent variable

A regression model allows us to:


• Predict Y based on X, as X is often easier to obtain than Y.
• Describe how Y relates to X

When the model has only one predictor variable, it is Simple Linear Regression.

*Based on the example in A Modern Approach to Regression in R (2009) by Dr Simon Sheather.
The two types of regression models:
Population model vs. Fitted model
● By assuming that X and Y have a linear relationship across the entire
population, we can write the population (regression) model.

● We can estimate this true population model with a fitted (regression) model
(shown in blue).
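
In symbols, the two models can be sketched as follows. The notation is a common convention assumed here rather than taken from the slides: β0 and β1 (written \beta_0, \beta_1) stand for the unknown population intercept and slope, the hatted versions stand for their estimates from the sample, and e is the error term.

```latex
\begin{align*}
\text{Population model:} \quad & Y = \beta_0 + \beta_1 x + e,
    \qquad E(Y \mid X = x) = \beta_0 + \beta_1 x \\
\text{Fitted model:} \quad & \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x
\end{align*}
```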
Most of the time, we don’t have access to the entire population, but want to draw
conclusions about the population.

One major goal of Statistics: Making Inference!


● Given the sample information, draw conclusions about the trends
in the broader population
Interpret the model coefficients in the fitted model

Intercept
Basic: When x = 0, the value of y will be ________ on average.
In context: The average setup time for the production is 149.75 min.
The intercept doesn’t always have a meaningful interpretation in context.

Slope
Basic: When x increases by 1 unit, the value of y (increases/decreases) by 0.26
on average.
In context: When we produce one more product, the production time
increases by 0.26 min on average.
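
As a minimal sketch of how these two coefficients are used, the intercept (149.75) and slope (0.26) quoted above can be plugged into a small prediction function. The function name `predict_run_time` and the run sizes tried below are illustrative assumptions, not part of the original example.

```python
# Fitted coefficients quoted in the interpretation above.
INTERCEPT = 149.75  # minutes: fitted average RunTime when RunSize = 0
SLOPE = 0.26        # minutes added per extra unit of RunSize, on average


def predict_run_time(run_size: float) -> float:
    """Return the fitted average RunTime (minutes) for a given RunSize."""
    return INTERCEPT + SLOPE * run_size


# Illustrative run sizes (assumed values, not data from the example).
for size in (0, 100, 101):
    print(f"RunSize = {size:>3}: fitted RunTime = {predict_run_time(size):.2f} min")

# Increasing RunSize by one unit changes the fitted value by exactly the slope.
one_unit_change = predict_run_time(101) - predict_run_time(100)
print(f"Change for one extra unit: {one_unit_change:.2f} min")
```

The value at RunSize = 0 reproduces the intercept, and the difference between two consecutive run sizes reproduces the slope, which is the "basic" reading of each coefficient.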
Error vs. Residual
Topics:
● The concept of error

● The concept of residual

● Error vs. Residual

Error is the difference between the observed Y and the expected Y.

          x     y     E(Y|X=x) = 5 - 0.5x    e
Point A   4.5   4.2   2.75                   1.45
Point B   0.8   4     4.6                    -0.6

• Error accounts for the variation in Y that cannot be explained by the population model.

• Error does not depend on X.

• Error is a random variable, even though it is denoted by a lowercase letter.

• When we do not know the population model, error cannot be measured.
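
The error column for Points A and B can be reproduced with a short sketch. It assumes the population model is known, using the E(Y|X=x) = 5 - 0.5x given for this example; the names `expected_y`, `points`, and `errors` are illustrative.

```python
def expected_y(x: float) -> float:
    """Population mean of Y given X = x, using the example's model 5 - 0.5x."""
    return 5 - 0.5 * x


# (x, observed y) for the two labelled points in the example.
points = {"A": (4.5, 4.2), "B": (0.8, 4.0)}

# Error = observed Y minus expected Y under the population model.
errors = {name: y - expected_y(x) for name, (x, y) in points.items()}

for name, e in errors.items():
    x, y = points[name]
    print(f"Point {name}: E(Y|X={x}) = {expected_y(x):.2f}, error = {e:.2f}")
```

This calculation is only possible because the population model is stated here; with real data it is unknown, which is why the error cannot be measured in practice.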
Residual is the difference between the observed Y and the fitted Y.

          x     y     Fitted Y   Residual   Error
Point A   4.5   4.2   3.22       0.98       1.45
Point B   0.8   4     4.85       -0.85      -0.6

• Residual measures the “goodness of the fit”.

• Residual is a random variable.
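
A minimal sketch of the residual calculation, using the fitted values tabulated for Points A and B. The fitted line's equation is not given in the example, so the fitted values are entered directly; the variable names are illustrative.

```python
# (observed y, fitted y, error) for each point, copied from the example's table.
table = {
    "A": (4.2, 3.22, 1.45),
    "B": (4.0, 4.85, -0.6),
}

# Residual = observed Y minus fitted Y.
residuals = {name: y - fitted for name, (y, fitted, _error) in table.items()}

for name, (y, fitted, error) in table.items():
    print(
        f"Point {name}: residual = {residuals[name]:.2f}, "
        f"error = {error:.2f}"
    )
```

The residual and the error differ for the same point because the fitted line is only an estimate of the population line.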

Error vs. Residual
Error Residual
Notation

Definition

Random?

Distribution
