
Linear Regression

Sir Francis Galton introduced the concept of regression.

 A statistical technique used to model and investigate the relationship between one dependent variable and one or more independent variables, in order to predict the output of one variable from the input of another.
 A statistical measure that attempts to determine the strength of the relationship between one dependent variable (denoted by Y) and a series of other changing variables (known as independent variables, X).
 Prediction is the estimation of whether an event will happen; forecasting is prediction along the time dimension.

Simple Linear Regression
 A statistical technique used to model and investigate the
relationship between one continuous dependent variable and
one continuous independent variable.
 Regression analysis can be used to (see the sketch after this list):
 Test hypotheses about the relationship of potential explanatory variables to the response
 Predict the value interval of the response variable for specific values of the explanatory variables
 Predict, at a stated confidence level, the range of values within which the response is expected to lie, given specific values of the explanatory variables
 Estimate the direction and degree of association between the response variable and an explanatory variable
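These uses map directly onto a fitted lm object in R. A minimal sketch, assuming nothing from the deck: the data vectors x and y and the value 3.5 below are hypothetical, chosen only for illustration.

#Hypothesis test and prediction interval from a fitted model (illustrative data)
x <- c(1, 2, 3, 4, 5)                       # hypothetical explanatory values
y <- c(2.0, 4.1, 5.9, 8.2, 9.8)             # hypothetical responses
model <- lm(y ~ x)
summary(model)                              # t-test on the slope: is x associated with y?
predict(model, newdata = data.frame(x = 3.5),
        interval = "prediction", level = 0.95)   # 95% prediction interval at x = 3.5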
Simple Linear Regression Model
• Only one independent variable, X
• Relationship between X and Y is described by a linear function
• Changes in Y are assumed to be caused by changes in X
• The first-order linear model:

Yi = β0 + β1Xi + εi

Y = dependent variable
X = independent variable
β0 = Y-intercept
β1 = slope of the line
ε = random error variable
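As a concrete illustration, data can be simulated from this model in R. A minimal sketch; the parameter values β0 = 3, β1 = 1.5, and the error standard deviation 2 are assumptions made up for the example.

#Simulate data from Yi = b0 + b1*Xi + ei
set.seed(1)                         # for reproducibility
X <- runif(100, 0, 10)              # independent variable
e <- rnorm(100, mean = 0, sd = 2)   # error term: zero mean, constant variance
Y <- 3 + 1.5 * X + e                # first-order linear model with assumed b0 and b1
plot(X, Y)                          # points scatter around the line 3 + 1.5*X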
Basics of Regression

[Figure]
Regression Assumptions
Assumption #1 >> Linearity
Assumption #2 >> Independence of errors
Assumption #3 >> Homoscedasticity
Assumption #4 >> Normality of the error distribution
The major assumptions of the regression model are:
• The relationship between the response Y and the regressors is linear, at least approximately.
• The error term ε has zero mean.
• The error term ε has constant variance σ².
• The errors are uncorrelated.
• The errors are normally distributed.
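These assumptions are commonly checked with R's standard diagnostic plots for a fitted model. A sketch, reusing the simulated X and Y from the sketch above:

#Diagnostic plots for checking the regression assumptions
model <- lm(Y ~ X)
par(mfrow = c(2, 2))   # show the four diagnostic plots in one 2x2 grid
plot(model)            # residuals vs fitted (linearity), normal Q-Q (normality),
                       # scale-location (constant variance), residuals vs leverage
par(mfrow = c(1, 1))   # restore the default layout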
Simple Linear Regression Model

Yi = β0 + β1Xi + εi

where Yi is the dependent variable, Xi is the independent variable, β0 is the population Y-intercept, β1 is the population slope coefficient, and εi is the random error term. β0 + β1Xi is the linear component; εi is the random error component.
Linear Regression with an Example
• In the scenario below, X is the experience, Y is the salary, and β0 is the base value.

[Figure: salary (Y) vs. experience (X)]
Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line:

Ŷi = b0 + b1Xi

where Ŷi is the estimated (or predicted) Y value for observation i, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and Xi is the value of X for observation i.
Simple Linear Regression Model

[Figure: the line Yi = β0 + β1Xi + εi plotted against X, with intercept β0 and slope β1; for a given Xi, the random error εi is the vertical distance between the observed value of Y and the predicted value of Y.]
Ordinary Least Squares
 To decide which line to fit so that it predicts well for unknown data, we need an error measure. There are various error measures, and the most commonly used is the least squares method.
 A residual is simply the deviation of an observed Y from its fitted value.
 The least squares method states that the optimal fit of a model to data occurs when the sum of the squares of the residuals is a minimum.
SSE = Σi=1..n (yi - ŷi)² = Σi=1..n (yi - (b0 + b1xi))²,  where ŷi = b0 + b1xi
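Written out in R, the SSE of any candidate line can be computed directly. A sketch; the data and the candidate b0 and b1 below are hypothetical.

#Sum of squared residuals for a candidate line
x   <- c(1, 2, 3, 4, 5)              # hypothetical data
y   <- c(2.0, 4.1, 5.9, 8.2, 9.8)
b0  <- 0.2
b1  <- 1.9                           # a candidate intercept and slope
sse <- sum((y - (b0 + b1 * x))^2)    # SSE = sum of (observed - fitted)^2
sse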
Ordinary Least Squares
• β0 = mean response when x = 0 (the y-intercept)
• β1 = change in mean response when x increases by 1 unit (the slope)
• β0 and β1 are unknown parameters
• β0 + β1x = mean response when the explanatory variable takes on the value x
• Objective: choose values (estimates) that minimize the sum of squared errors (SSE) of the observed values about the straight line
• Ordinary least squares minimizes the sum of squares of the errors.
Ordinary Least Squares
The least squares estimate of the slope coefficient β1 of the true regression line is

b1 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²

The least squares estimate of the intercept β0 of the true regression line is

b0 = Ȳ - b1X̄

where X̄ and Ȳ are the sample means. These estimates minimize the sum of squared errors:

min Σ(Yi - Ŷi)² = min Σ(Yi - (b0 + b1Xi))²
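These closed-form formulas are easy to verify in R. A sketch, reusing the hypothetical x and y from the SSE sketch above; the manual estimates should agree with lm's.

#Closed-form OLS estimates of slope and intercept
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)   # slope estimate
b0 <- mean(y) - b1 * mean(x)                                      # intercept estimate
c(b0, b1)
coef(lm(y ~ x))   # lm produces the same estimates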
Ordinary Least Squares Example – the line with the smallest sum of squared residuals is the best-fitting line

[Figure]
R-Squared
• R² is a measure of how well the line fits the observed data.
• R² tells us how much of the variation in Y is explained by the independent variables.
• It compares the fitted line with the average (baseline) line. R² = 1 indicates a perfect fit, which rarely happens in practice; the closer R² is to 1, the better the model, and the farther from 1, the worse.
• If SSres goes to zero, the line passes through all the data points.
• R² is a measure of goodness of fit (the greater, the better).
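R² = 1 - SSres/SStot can be computed from first principles and checked against lm. A sketch, continuing with the hypothetical x and y from the earlier sketches:

#R-squared from its definition
model  <- lm(y ~ x)
ss_res <- sum(residuals(model)^2)   # residual sum of squares
ss_tot <- sum((y - mean(y))^2)      # total sum of squares about the mean
1 - ss_res / ss_tot                 # matches summary(model)$r.squared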
Adjusted R²
Adjusted R² attempts to account for the phenomenon of R² automatically and spuriously increasing when extra independent variables are added to the model.
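The usual formula is adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. A sketch, continuing from the model fitted above:

#Adjusted R-squared from R-squared
r2 <- summary(model)$r.squared
n  <- length(residuals(model))        # number of observations
p  <- length(coef(model)) - 1         # number of predictors, excluding the intercept
1 - (1 - r2) * (n - 1) / (n - p - 1)  # matches summary(model)$adj.r.squared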
Example for linear regression
Suppose we have data on the heights and weights of school children. A data scientist might ask whether there is a relationship between the weights and heights of these children: formally, can the weight of a child be predicted from his or her height?
To fit a linear regression, the first step is to understand the data and see whether there is a correlation between the two variables (weight and height). Since in this case we are dealing with just two dimensions, visualizing the data using a scatter plot will help us understand it quickly. This will also enable us to determine whether the variables have a linear relationship.
Example for linear regression
#Height and weight vectors for 19 children
height <- c(69.1, 56.4, 65.3, 62.8, 63, 57.3, 59.8, 62.5, 62.5, 59.0,
            51.3, 64, 56.4, 66.5, 72.2, 65.0, 67.0, 57.6, 66.6)
weight <- c(113, 84, 99, 103, 102, 83, 85, 113, 84, 99,
            51, 90, 77, 112, 150, 128, 133, 85, 112)
plot(height, weight)
cor(height, weight)
Output: [1] 0.8848454
Example for linear regression
The preceding scatter plot confirms our intuition that weight and height have a linear relationship. This is further supported by the correlation function, which gives us a value of 0.88.
One can use the built-in lm (linear model) utility to find the coefficients b0 and b1.
#Fitting the linear model
model <- lm(weight ~ height)   # weight = slope*height + intercept
#get the intercept (b0) and the slope (b1) values
model
The output is:
[Output of lm: the fitted intercept (b0) and slope (b1)]
Example for linear regression
#check all attributes calculated by lm
attributes(model)
#getting only the intercept
model$coefficients[1]   #or model$coefficients[[1]]
#getting only the slope
model$coefficients[2]   #or model$coefficients[[2]]
#checking the residuals
residuals(model)
Example for linear regression
#predicting the weight for a given height, say 60 inches
model$coefficients[[2]]*60 + model$coefficients[[1]]
#detailed information about the model
summary(model)
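Equivalently, R's predict function returns the same fitted value without the manual arithmetic (a usage sketch for the model fitted above):

#predicting with predict() instead of manual coefficients
predict(model, newdata = data.frame(height = 60))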
To visualize the regression line on our scatter plot:
#plot data points
plot(height, weight)
#draw the regression line
abline(model)
[Figure: scatter plot of height vs. weight with the fitted regression line]