
Simple Linear Regression

Lecture 14

Centre for Data Science, ITER


Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India.

Contents

1 Introduction

2 Simple Linear Regression model

3 Gradient Descent

4 Maximum Likelihood Estimation

Introduction

Correlation:
Correlation measures the strength and direction of the linear
relationship between two variables.
The value of the correlation between two variables lies between -1
and 1.
If the two variables move in the same direction, they are said to
have a positive correlation. If they move in opposite directions,
they have a negative correlation.
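The two extremes of this range can be checked with a minimal NumPy sketch (the data values here are illustrative, not from the slides):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pos = 2 * x + 1      # moves in the same direction as x
y_neg = -3 * x + 10    # moves in the opposite direction

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is the correlation between the two variables
r_pos = np.corrcoef(x, y_pos)[0, 1]
r_neg = np.corrcoef(x, y_neg)[0, 1]
```

Here r_pos is +1 (perfect positive correlation) and r_neg is -1 (perfect negative correlation).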

What is the need for linear regression?

Simple Linear Regression model

The model for linear regression with one predictor variable can be
stated as follows:

yi = β xi + α + ϵi , where

yi is the value of the response variable in the ith trial.
α and β are parameters.
xi is known; it is the value of the predictor variable in the ith trial.
ϵi is a random error term, normally distributed with mean E(ϵi) = 0
and variance V(ϵi) = σ².
Cov(ϵi, ϵj) = 0 if i ≠ j, for i = 1, ..., n and j = 1, ..., n.
The above model is called simple because it has a single predictor,
and linear because it is linear in the parameters.
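The model above can be simulated to see how the error term scatters points around the line; a minimal sketch, assuming illustrative values α = 4, β = 3, σ = 2:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 4.0, 3.0, 2.0

x = np.arange(1.0, 51.0)                   # predictor values x_i
eps = rng.normal(0.0, sigma, size=x.size)  # errors: E(eps_i) = 0, V(eps_i) = sigma^2
y = beta * x + alpha + eps                 # responses y_i = beta*x_i + alpha + eps_i

# For a reasonably large sample, the mean of the errors is close to 0
mean_eps = eps.mean()
```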

Simple Linear Regression (Contd.)

Examples of Simple Linear Regression model:

Figure 1: first figure
Figure 2: second figure

Simple Linear Regression (Contd.)

Program to plot the regression line and scatter plot:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 9]
y = [5, 2, 9, 7, 3]

plt.figure()
sns.regplot(x=x, y=y, fit_reg=True)
plt.scatter(np.mean(x), np.mean(y), color='green')

Simple Linear Regression (Contd.)

Least squares estimation of α and β:

β̂ = Correlation(x, y) · σy / σx
α̂ = ȳ − β̂ x̄ , where

ȳ = mean of y
x̄ = mean of x
σy = standard deviation of y
σx = standard deviation of x
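The two formulas above can be checked with a self-contained NumPy sketch, using the same six-point data set as the programs on the following slides:

```python
import numpy as np

x = np.array([5, 15, 25, 35, 45, 55], dtype=float)
y = np.array([5, 20, 14, 32, 22, 38], dtype=float)

# beta_hat = Correlation(x, y) * sigma_y / sigma_x
r = np.corrcoef(x, y)[0, 1]
beta_hat = r * y.std() / x.std()

# alpha_hat = mean(y) - beta_hat * mean(x)
alpha_hat = y.mean() - beta_hat * x.mean()
```

For this data, beta_hat works out to 0.54 and alpha_hat to about 5.633, which the least-squares programs should reproduce.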

Simple Linear Regression (Cont.)
Program to find the least squares estimates of α and β:
1. By formula:

import numpy as np
x = np.array([5, 15, 25, 35, 45, 55])
y = np.array([5, 20, 14, 32, 22, 38])

from typing import Tuple
from scratch.linear_algebra import Vector
from scratch.statistics import correlation, standard_deviation, mean

def least_squares_fit(x: Vector, y: Vector) -> Tuple[float, float]:
    """Given two vectors x and y,
    find the least-squares values of alpha and beta"""
    beta = correlation(x, y) * standard_deviation(y) / standard_deviation(x)
    alpha = mean(y) - beta * mean(x)
    return alpha, beta
Simple Linear Regression (Cont.)
Program to predict for any value of x and to find the error:

def predict(alpha: float, beta: float, x_i: float) -> float:
    return beta * x_i + alpha

def error(alpha: float, beta: float, x_i: float, y_i: float) -> float:
    """The error from predicting beta * x_i + alpha
    when the actual value is y_i"""
    return predict(alpha, beta, x_i) - y_i

from scratch.linear_algebra import Vector

def sum_of_sqerrors(alpha: float, beta: float, x: Vector, y: Vector) -> float:
    return sum(error(alpha, beta, x_i, y_i) ** 2
               for x_i, y_i in zip(x, y))

m1 = least_squares_fit(x, y)
predict(m1[0], m1[1], 20)
E = [error(m1[0], m1[1], x_i, y_i) for x_i, y_i in zip(x, y)]  # error at each point

Simple Linear Regression (Cont.)

Program to find the least squares estimates of α and β:

2. By an inbuilt module of Python:

import numpy as np
import statsmodels.api as s

x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
y = np.array([5, 20, 14, 32, 22, 38])

x = s.add_constant(x)   # adds the intercept column

model1 = s.OLS(y, x)
result1 = model1.fit()
print(result1.summary())

Simple Linear Regression (Cont.)

Test the least squares module:

x = [i for i in range(-100, 110, 10)]
y = [3 * i - 5 for i in x]
# Should find that y = 3x - 5
p = least_squares_fit(x, y)
assert least_squares_fit(x, y) == (-5, 3)

Use of assert statements in the least squares method:

from scratch.statistics import num_friends_good, daily_minutes_good

alpha, beta = least_squares_fit(num_friends_good, daily_minutes_good)
assert 22.9 < alpha < 23.0
assert 0.9 < beta < 0.905

Simple Linear Regression (Cont.)
Calculation of R-square:

from scratch.statistics import de_mean

def total_sum_of_squares(y: Vector) -> float:
    """the total squared variation of y_i's from their mean"""
    return sum(v ** 2 for v in de_mean(y))

def r_squared(alpha: float, beta: float, x: Vector, y: Vector) -> float:
    """the fraction of variation in y captured by the model, which equals
    1 - the fraction of variation in y not captured by the model"""
    return 1.0 - (sum_of_sqerrors(alpha, beta, x, y) /
                  total_sum_of_squares(y))

rsq = r_squared(alpha, beta, num_friends_good, daily_minutes_good)
assert 0.328 < rsq < 0.330

Simple Linear Regression (Cont.)

R-square:
The R-square value lies between 0 and 1.
An R-square value close to 1 indicates that most of the variance in
the response is explained by the model.
An R-square value close to 0 indicates that the variance of the
response cannot be explained using x.

Simple Linear Regression (Cont.)

R-square comparison between two plots:

Figure 3: R² close to 0
Figure 4: R² close to 1

Gradient Descent:

Program to estimate the parameters using gradient descent:


import random
import tqdm
from scratch.gradient_descent import gradient_step

num_epochs = 10000
random.seed(0)
guess = [random.random(), random.random()]  # choose random value to start
learning_rate = 0.00001

with tqdm.trange(num_epochs) as t:
    for _ in t:
        alpha, beta = guess
        # Partial derivative of loss with respect to alpha
        grad_a = sum(2 * error(alpha, beta, x_i, y_i)
                     for x_i, y_i in zip(num_friends_good,
                                         daily_minutes_good))

Gradient Descent (Cont.):

Program to estimate the parameters using gradient descent (continued):


        # Partial derivative of loss with respect to beta
        grad_b = sum(2 * error(alpha, beta, x_i, y_i) * x_i
                     for x_i, y_i in zip(num_friends_good,
                                         daily_minutes_good))

        # Compute loss to stick in the tqdm description
        loss = sum_of_sqerrors(alpha, beta,
                               num_friends_good, daily_minutes_good)
        t.set_description(f"loss: {loss:.3f}")

        # Finally, update the guess
        guess = gradient_step(guess, [grad_a, grad_b], -learning_rate)

# We should get pretty much the same results:
alpha, beta = guess
assert 22.9 < alpha < 23.0
assert 0.9 < beta < 0.905
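The same loop can also be run without the scratch library; a self-contained sketch on the six-point data set from the earlier slides (the learning rate and epoch count here are chosen for this data, not taken from the slides):

```python
# Gradient descent for simple linear regression, written in plain Python
x = [5, 15, 25, 35, 45, 55]
y = [5, 20, 14, 32, 22, 38]

alpha, beta = 0.0, 0.0
learning_rate = 0.0001

for _ in range(100_000):
    # Partial derivatives of the sum of squared errors
    grad_a = sum(2 * (beta * x_i + alpha - y_i) for x_i, y_i in zip(x, y))
    grad_b = sum(2 * (beta * x_i + alpha - y_i) * x_i for x_i, y_i in zip(x, y))
    alpha -= learning_rate * grad_a
    beta -= learning_rate * grad_b

# Converges to the least-squares estimates: alpha ≈ 5.633, beta ≈ 0.54
```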

Maximum Likelihood Estimation :

Imagine that we have a sample of data v1, v2, ..., vn that comes
from a distribution that depends on some unknown parameter θ.
If θ is unknown, then we can consider the likelihood of θ given the
sample, that is, L(θ | v1, ..., vn).
Under this approach, the most likely θ is the value that maximizes
this likelihood function.
As per the assumptions of the linear regression model, the errors
are normally distributed with mean 0 and standard deviation σ. So
the likelihood of α and β based on an observed pair (xi, yi) is:

L(α, β | xi, yi, σ) = (1 / √(2πσ²)) exp(−(yi − α − βxi)² / (2σ²))
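Maximizing this likelihood over α and β is equivalent to minimizing the sum of squared errors, because the log-likelihood is −(yi − α − βxi)²/(2σ²) plus a term that does not involve α or β. A minimal numerical check (the data and σ are illustrative; alpha ≈ 5.633, beta = 0.54 are the least-squares estimates for this data):

```python
import math

x = [5, 15, 25, 35, 45, 55]
y = [5, 20, 14, 32, 22, 38]
sigma = 1.0

def log_likelihood(alpha: float, beta: float) -> float:
    # Sum over the sample of log N(y_i | alpha + beta*x_i, sigma^2)
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (y_i - alpha - beta * x_i) ** 2 / (2 * sigma ** 2)
               for x_i, y_i in zip(x, y))

# The least-squares estimates give a higher likelihood than nearby values
ll_best = log_likelihood(5.6333, 0.54)
```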

Thank You
Any Questions?

