
Linear Regression

Sir Francis Galton introduced the concept of regression.

 A statistical technique used to model and investigate the relationship between one dependent variable and one or more independent variables, in order to predict the output of one variable from the input of another.
 A statistical measure that attempts to determine the strength of the relationship between one dependent variable (denoted by Y) and a series of other changing variables (known as independent variables, X).
 Prediction is the estimation of whether an event will happen; forecasting is prediction along the time dimension.

Simple Linear Regression
 A statistical technique used to model and investigate the
relationship between one continuous dependent variable and
one continuous independent variable.
 Regression analysis can be used to (see the sketch after this list):
 Test hypotheses about the relationship of potential explanatory variables to the response
 Predict the value interval of the response variable for specific values of the explanatory variables
 Predict, at a stated confidence level, the range of values within which the response is expected to lie, given specific values of the explanatory variables
 Estimate the direction and degree of association between the response variable and an explanatory variable
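These uses map directly onto a fitted lm object in R. A minimal sketch, assuming nothing from the deck: the data vectors x and y and the value 3.5 below are hypothetical, chosen only for illustration.

#Hypothesis test and prediction interval from a fitted model (illustrative data)
x <- c(1, 2, 3, 4, 5)                       # hypothetical explanatory values
y <- c(2.0, 4.1, 5.9, 8.2, 9.8)             # hypothetical responses
model <- lm(y ~ x)
summary(model)                              # t-test on the slope: is x associated with y?
predict(model, newdata = data.frame(x = 3.5),
        interval = "prediction", level = 0.95)   # 95% prediction interval at x = 3.5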
Simple Linear Regression Model
• Only one independent variable, X
• Relationship between X and Y is described by a linear function
• Changes in Y are assumed to be caused by changes in X
• The first-order linear model:

Yi = β0 + β1Xi + εi

Y = dependent variable
X = independent variable
β0 = Y-intercept
β1 = slope of the line
ε = random error variable
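As a concrete illustration, data can be simulated from this model in R. A minimal sketch; the parameter values β0 = 3, β1 = 1.5, and the error standard deviation 2 are assumptions made up for the example.

#Simulate data from Yi = b0 + b1*Xi + ei
set.seed(1)                         # for reproducibility
X <- runif(100, 0, 10)              # independent variable
e <- rnorm(100, mean = 0, sd = 2)   # error term: zero mean, constant variance
Y <- 3 + 1.5 * X + e                # first-order linear model with assumed b0 and b1
plot(X, Y)                          # points scatter around the line 3 + 1.5*X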
Basics of Regression

[Figure]
Regression Assumptions
Assumption #1 >> Linearity
Assumption #2 >> Independence of errors
Assumption #3 >> Homoscedasticity
Assumption #4 >> Normality of the error distribution
The major assumptions of the regression model are:
• The relationship between the response Y and the regressors is linear, at least approximately.
• The error term ε has zero mean.
• The error term ε has constant variance σ².
• The errors are uncorrelated.
• The errors are normally distributed.
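These assumptions are commonly checked with R's standard diagnostic plots for a fitted model. A sketch, reusing the simulated X and Y from the sketch above:

#Diagnostic plots for checking the regression assumptions
model <- lm(Y ~ X)
par(mfrow = c(2, 2))   # show the four diagnostic plots in one 2x2 grid
plot(model)            # residuals vs fitted (linearity), normal Q-Q (normality),
                       # scale-location (constant variance), residuals vs leverage
par(mfrow = c(1, 1))   # restore the default layout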
Simple Linear Regression Model

Yi = β0 + β1Xi + εi

where Yi is the dependent variable, Xi is the independent variable, β0 is the population Y-intercept, β1 is the population slope coefficient, and εi is the random error term. β0 + β1Xi is the linear component; εi is the random error component.
Linear Regression with an Example
• In the scenario below, X is the experience, Y is the salary, and β0 is the base value.

[Figure: salary (Y) vs. experience (X)]
Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line:

Ŷi = b0 + b1Xi

where Ŷi is the estimated (or predicted) Y value for observation i, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and Xi is the value of X for observation i.
Simple Linear Regression Model

[Figure: the line Yi = β0 + β1Xi + εi plotted against X, with intercept β0 and slope β1; for a given Xi, the random error εi is the vertical distance between the observed value of Y and the predicted value of Y.]
Ordinary Least Squares
 To decide which line to fit so that it predicts well for unknown data, we need an error measure. There are various error measures, and the most commonly used is the least squares method.
 A residual is simply the deviation of an observed Y from its fitted value.
 The least squares method states that the optimal fit of a model to data occurs when the sum of the squares of the residuals is a minimum.
SSE = Σi=1..n (yi - ŷi)² = Σi=1..n (yi - (b0 + b1xi))²,  where ŷi = b0 + b1xi
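Written out in R, the SSE of any candidate line can be computed directly. A sketch; the data and the candidate b0 and b1 below are hypothetical.

#Sum of squared residuals for a candidate line
x   <- c(1, 2, 3, 4, 5)              # hypothetical data
y   <- c(2.0, 4.1, 5.9, 8.2, 9.8)
b0  <- 0.2
b1  <- 1.9                           # a candidate intercept and slope
sse <- sum((y - (b0 + b1 * x))^2)    # SSE = sum of (observed - fitted)^2
sse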
Ordinary Least Squares
• β0 = mean response when x = 0 (the y-intercept)
• β1 = change in mean response when x increases by 1 unit (the slope)
• β0 and β1 are unknown parameters
• β0 + β1x = mean response when the explanatory variable takes on the value x
• Objective: choose values (estimates) that minimize the sum of squared errors (SSE) of the observed values about the straight line
• Ordinary least squares minimizes the sum of squares of the errors.
Ordinary Least Squares
The least squares estimate of the slope coefficient β1 of the true regression line is

b1 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²

The least squares estimate of the intercept β0 of the true regression line is

b0 = Ȳ - b1X̄

where X̄ and Ȳ are the sample means. These estimates minimize the sum of squared errors:

min Σ(Yi - Ŷi)² = min Σ(Yi - (b0 + b1Xi))²
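These closed-form formulas are easy to verify in R. A sketch, reusing the hypothetical x and y from the SSE sketch above; the manual estimates should agree with lm's.

#Closed-form OLS estimates of slope and intercept
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)   # slope estimate
b0 <- mean(y) - b1 * mean(x)                                      # intercept estimate
c(b0, b1)
coef(lm(y ~ x))   # lm produces the same estimates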
Ordinary Least Squares Example – the line with the smallest sum of squared residuals is the best-fitting line

[Figure]
R-Squared
• R² is a measure of how well the line fits the observed data.
• R² tells us how much of the variation in Y is explained by the independent variables.
• It compares the fitted line with the average (baseline) line. R² = 1 indicates a perfect fit, which rarely happens in practice; the closer R² is to 1, the better the model, and the farther from 1, the worse.
• If SSres goes to zero, the line passes through all the data points.
• R² is a measure of goodness of fit (the greater, the better).
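R² = 1 - SSres/SStot can be computed from first principles and checked against lm. A sketch, continuing with the hypothetical x and y from the earlier sketches:

#R-squared from its definition
model  <- lm(y ~ x)
ss_res <- sum(residuals(model)^2)   # residual sum of squares
ss_tot <- sum((y - mean(y))^2)      # total sum of squares about the mean
1 - ss_res / ss_tot                 # matches summary(model)$r.squared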
Adjusted R²
Adjusted R² attempts to account for the phenomenon of R² automatically and spuriously increasing when extra independent variables are added to the model.
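The usual formula is adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. A sketch, continuing from the model fitted above:

#Adjusted R-squared from R-squared
r2 <- summary(model)$r.squared
n  <- length(residuals(model))        # number of observations
p  <- length(coef(model)) - 1         # number of predictors, excluding the intercept
1 - (1 - r2) * (n - 1) / (n - p - 1)  # matches summary(model)$adj.r.squared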
Example for linear regression
Suppose we have data on the heights and weights of school children. A data scientist might ask whether there is a relationship between the weights and heights of these children: formally, can the weight of a child be predicted from his or her height?
To fit a linear regression, the first step is to understand the data and see whether there is a correlation between the two variables (weight and height). Since in this case we are dealing with just two dimensions, visualizing the data using a scatter plot will help us understand it quickly. This will also enable us to determine whether the variables have a linear relationship.
Example for linear regression
#Height and weight vectors for 19 children
height <- c(69.1, 56.4, 65.3, 62.8, 63, 57.3, 59.8, 62.5, 62.5, 59.0,
            51.3, 64, 56.4, 66.5, 72.2, 65.0, 67.0, 57.6, 66.6)
weight <- c(113, 84, 99, 103, 102, 83, 85, 113, 84, 99,
            51, 90, 77, 112, 150, 128, 133, 85, 112)
plot(height, weight)
cor(height, weight)
Output: [1] 0.8848454
Example for linear regression
The preceding scatter plot confirms our intuition that weight and height have a linear relationship. This is further supported by the correlation function, which gives us a value of 0.88.
One can use the built-in lm (linear model) utility to find the coefficients b0 and b1.
#Fitting the linear model
model <- lm(weight ~ height)   # weight = slope*height + intercept
#get the intercept (b0) and the slope (b1) values
model
The output is:
[Output of lm: the fitted intercept (b0) and slope (b1)]
Example for linear regression
#check all attributes calculated by lm
attributes(model)
#getting only the intercept
model$coefficients[1]   #or model$coefficients[[1]]
#getting only the slope
model$coefficients[2]   #or model$coefficients[[2]]
#checking the residuals
residuals(model)
Example for linear regression
#predicting the weight for a given height, say 60 inches
model$coefficients[[2]]*60 + model$coefficients[[1]]
#detailed information about the model
summary(model)
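Equivalently, R's predict function returns the same fitted value without the manual arithmetic (a usage sketch for the model fitted above):

#predicting with predict() instead of manual coefficients
predict(model, newdata = data.frame(height = 60))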
To visualize the regression line on our scatter plot:
#plot data points
plot(height, weight)
#draw the regression line
abline(model)
[Figure: scatter plot of height vs. weight with the fitted regression line]