
Fundamentals of Machine Learning (DSE 2222)

by
Shavantrevva S. B. & Padmashree G.
Dept. of Data Science and Computer Applications
Manipal Institute of Technology, Manipal

February 5, 2024



Overview

1 Regression Models


Regression Analysis
Regression: finds the correlation between variables and enables us to predict a continuous output variable based on one or more predictor variables.

Applications: prediction, forecasting, time-series modeling, and determining cause-and-effect relationships between variables.

Dependent Variable: the main factor that we want to predict or understand in regression analysis; also called the target variable.

Independent Variable: the factors that affect the dependent variable, or that are used to predict its values; also called predictors.

Regression fits a line or curve through the data points on the target-predictor graph such that the vertical distance between the data points and the regression line is minimized.
Why do we use Regression Analysis?

Regression estimates the relationship between the target and the independent variables.

It is used to find trends in data.

It helps to predict real/continuous values.

By performing regression, we can determine the most important factor, the least important factor, and how each factor affects the others.


Types of Regression models

Figure 1.1: Types of Regression models
Simple Linear Regression

There must be a linear relationship between the independent and dependent variables.

Relationship as a best-fit line: y = β0 + β1 x1 + ϵ

We assume there is some noise in the data, which causes the random error ϵ.

The error is normally distributed with mean 0 and standard deviation σ.

To obtain the best-fit line, use the Least Squares Method.

Linear regression is very sensitive to outliers.
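A minimal sketch of fitting a simple linear regression with scikit-learn; the data below is synthetic and purely illustrative, not from the slides.

# Fit y = beta0 + beta1*x on synthetic data (illustrative only)
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(50, 1))              # single predictor
y = 3.0 + 2.0 * x[:, 0] + rng.normal(0, 1, 50)    # linear signal plus noise

model = LinearRegression().fit(x, y)
print("beta0 (intercept):", model.intercept_)
print("beta1 (slope):", model.coef_[0])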


Simple Linear Regression

Given a random sample of observations, the regression line is expressed as:

y = β0 + β1 x1

where β0 is the intercept (a constant), β1 is the regression coefficient, x1 is the independent variable, and y is the predicted value of the dependent variable.

The regression coefficients are obtained from:

β1 = [ Σ_{i=1}^{n} (Xi − X̄)(Yi − Ȳ) ] / [ Σ_{i=1}^{n} (Xi − X̄)² ]

β0 = Ȳ − β1 X̄

where
X̄ is the mean of the X values.
Ȳ is the mean of the Y values.
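A small sketch of computing β1 and β0 directly from the formulas above; the data values are illustrative.

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 6.2, 8.1, 9.8])

x_bar, y_bar = X.mean(), Y.mean()
beta1 = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)  # slope
beta0 = y_bar - beta1 * x_bar                                         # intercept
print(beta0, beta1)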


Linear Regression: Problem



Linear Regression: Applications

Analyzing trends and sales estimates


Salary forecasting
Real estate prediction
Arriving at ETAs in traffic.



Linear Regression: Cost Function

The cost is the difference between the estimated values (the hypothesis) and the real values.

By minimizing the cost, we can determine the optimal values for the model's parameters and improve its performance.

Figure 1.2: Cost Function
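A minimal sketch of one common cost function, the Mean Squared Error (MSE), for a simple linear hypothesis ŷ = β0 + β1 x; the data values are illustrative.

import numpy as np

def mse_cost(beta0, beta1, x, y):
    """Mean squared error of the line beta0 + beta1*x against targets y."""
    y_hat = beta0 + beta1 * x
    return np.mean((y - y_hat) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
print(mse_cost(1.0, 2.0, x, y))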


Gradient Descent
Gradient Descent is an iterative optimization algorithm that tries to find the optimum value (minimum/maximum) of an objective function.

It is used to update the parameters of a model in order to minimize a cost function.

The goal of gradient descent is to find the model parameters that fit the training data well and generalize to the test data.

1 Initialize theta with random values.
2 Measure how good our theta is using the error function; the best theta is the one for which the error is least.
3 Calculate the gradient and use it to update the theta values; repeat until convergence (a sketch follows below).
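A minimal sketch of batch gradient descent for simple linear regression, minimizing the MSE cost; the learning rate, iteration count, and data are illustrative choices.

import numpy as np

def gradient_descent(x, y, lr=0.01, n_iters=1000):
    theta0, theta1 = 0.0, 0.0                 # step 1: initialize parameters
    n = len(x)
    for _ in range(n_iters):
        y_hat = theta0 + theta1 * x           # current predictions
        error = y_hat - y                     # step 2: measure how far off we are
        grad0 = (2.0 / n) * error.sum()       # step 3: gradients of the MSE cost
        grad1 = (2.0 / n) * (error * x).sum()
        theta0 -= lr * grad0                  # update parameters against the gradient
        theta1 -= lr * grad1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.1, 6.9, 9.2, 11.1])
print(gradient_descent(x, y))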


Linear Regression: Matrix method

Reference: https://www.youtube.com/watch?v=JUQXyQ2U1b8
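Assuming the matrix method here refers to the usual normal-equation solution β = (XᵀX)⁻¹ Xᵀ y, a minimal sketch with illustrative data:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.1, 6.9, 9.2, 11.1])

X = np.column_stack([np.ones_like(x), x])   # design matrix with an intercept column
beta = np.linalg.solve(X.T @ X, X.T @ y)    # solves (X^T X) beta = X^T y
print(beta)                                 # [beta0, beta1]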
Multiple Linear Regression
In Simple Linear Regression, a single independent/predictor variable (X) is used to model the response variable (Y).

What if the response variable is affected by more than one predictor variable? Then the Multiple Linear Regression algorithm is used.

Multiple Linear Regression models the linear relationship between a single continuous dependent variable and more than one independent variable.

Key points about MLR:
For MLR, the dependent or target variable (Y) must be continuous/real, but the predictor or independent variables may be continuous or categorical.
Each feature variable must have a linear relationship with the dependent variable.
MLR tries to fit a regression line (a hyperplane) through a multidimensional space of data points.
Assumptions for Multiple Linear Regression

A linear relationship should exist between the target and predictor variables.

The regression residuals must be normally distributed.

MLR assumes little or no multicollinearity (correlation between the independent variables) in the data.


Multiple Linear Regression: Equation
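The general MLR equation takes the form Y = β0 + β1 X1 + β2 X2 + ... + βn Xn + ϵ, with one coefficient per predictor. A minimal sketch of fitting such a model with scikit-learn; the two-predictor data is synthetic and purely illustrative.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 2))                          # two predictors X1, X2
y = 4.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 1, 100)

model = LinearRegression().fit(X, y)
print("beta0:", model.intercept_)
print("beta1, beta2:", model.coef_)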


Polynomial Regression

The relationship between the independent variable (X) and the dependent variable (Y) is modeled as an nth-degree polynomial.

While polynomial regression can include linear relationships (n = 1), it extends to higher degrees, allowing for more complex and curved relationships.

Equation: Y = β0 + β1 X + β2 X² + β3 X³ + ... + βn X^n + ϵ

Here:
Y is the dependent variable.
X is the independent variable.
β0, β1, β2, ..., βn are the coefficients representing the intercept and the weights for each degree of the polynomial.
ϵ represents the error term.

The degree of the polynomial (n) determines the complexity of the model. A polynomial of degree 1 corresponds to linear regression, while higher degrees introduce curvature into the relationship.


Need for Polynomial Regression
Consider a scenario where the relationship between the input and output is not linear.

Assume it is a quadratic relationship: Y = β0 + β1 X + β2 X²

If we try to fit a Simple Linear Regression model to this data, it won't be able to represent the quadratic nature of the relationship.

The linear regression line fails to capture the quadratic pattern.

The loss function will be high, and the accuracy of predictions will be low for this non-linear dataset.

To handle non-linear relationships, more complex models such as polynomial regression, decision trees, or neural networks may be more suitable.
Need for Polynomial Regression

Where data points are arranged in a non-linear fashion, we need the Polynomial Regression model.

Polynomial Regression is also called Polynomial Linear Regression because the linearity does not refer to the input variables but to the coefficients, which enter the model in a linear fashion.

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Expand the feature array x into polynomial terms [1, x, x^2]
polyregs = PolynomialFeatures(degree=2)
xpoly = polyregs.fit_transform(x)

# Fit an ordinary linear regression on the expanded features
linreg2 = LinearRegression()
linreg2.fit(xpoly, y)
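As a usage note (x_new here is a hypothetical array of new inputs with the same shape as x), predictions are made by expanding the new inputs with the same transformer:

y_new = linreg2.predict(polyregs.transform(x_new))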


Figure 1.3: Linear Regression
Figure 1.4: Polynomial regression: degree 2
Figure 1.5: Polynomial regression: degree 3
Figure 1.6: Polynomial regression: degree 4
Polynomial Regression: Polynomial Linear Regression

The model is still linear with respect to the coefficients.

It becomes non-linear in terms of the input variable x due to the presence of higher-degree terms.

Even though the equation contains terms with x raised to various powers, it is still considered a form of linear regression because the linearity refers to the coefficients, not the input variables.


Classification Algorithm
Machine Learning algorithms can be broadly classified into Regression and Classification algorithms.

Regression algorithms predict continuous output values.

To predict categorical values, we need Classification algorithms.

Binary Classifier: if the classification problem has only two possible outcomes, it is called a binary classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.

Multi-class Classifier: if a classification problem has more than two outcomes, it is called a multi-class classifier.
Examples: classification of types of crops, classification of types of music.
Types of ML Classification Algorithms

Classification algorithms can be further divided into two main categories:

Linear Models
  Logistic Regression
  Support Vector Machines

Non-linear Models
  K-Nearest Neighbours
  Kernel SVM
  Naïve Bayes
  Decision Tree Classification
  Random Forest Classification


Logistic Regression
Logistic regression is a statistical method used for building machine learning models where the dependent variable is binary.

The independent variables can be nominal, ordinal, or of interval type.

The name "logistic regression" is derived from the logistic function that it uses.

The logistic function is also known as the sigmoid function; its output lies between zero and one.


Logistic Function (Sigmoid Function):
The sigmoid function is a mathematical function used to map predicted values to probabilities.

It maps any real value into a value within the range of 0 to 1.

The output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, so it forms an "S"-shaped curve. This S-shaped curve is called the sigmoid function or the logistic function.

In logistic regression, we use a threshold value to decide between class 0 and class 1: values above the threshold tend to 1, and values below the threshold tend to 0.

The dependent variable must be categorical in nature.

The independent variables should not have multicollinearity.
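A minimal sketch of the sigmoid function together with a 0.5 threshold rule; the input values are illustrative.

import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, 0.0, 2.5])
p = sigmoid(z)
print(p)                        # probabilities in (0, 1)
print((p >= 0.5).astype(int))   # apply a 0.5 threshold to get class labels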
Logistic Regression
EXAMPLE:

Equation:

P(x) = 1 / (1 + e^−(β0 + β1 x))    (1)

Reference: https://www.youtube.com/watch?v=2C8IqOLO1os&t=0s
Logistic Regression Equation
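The same equation can be rewritten in terms of the log-odds (logit), which is linear in the parameters; a brief sketch of the standard algebra:

P(x) = 1 / (1 + e^−(β0 + β1 x))
⇒ P(x) / (1 − P(x)) = e^(β0 + β1 x)          (odds)
⇒ log( P(x) / (1 − P(x)) ) = β0 + β1 x        (log-odds / logit)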


Logistic Regression

EXAMPLE:

Reference: https://www.youtube.com/watch?v=2C8IqOLO1os&t=0s
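A minimal sketch of fitting a logistic regression classifier with scikit-learn; the data is synthetic and purely illustrative (not the worked example from the referenced video).

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary data: hours studied vs. pass (1) / fail (0)
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.intercept_, clf.coef_)     # beta0 and beta1
print(clf.predict_proba([[2.2]]))    # [P(class 0), P(class 1)] for a new input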
Logistic Regression Solved Example

Linear Vs. Logistic Regression



Regularization in Machine Learning

Regularization is a technique used to reduce errors by fitting the function appropriately on the given training set and avoiding overfitting.

The commonly used regularization techniques are:
1 Lasso Regularization – L1 Regularization
2 Ridge Regularization – L2 Regularization
3 Elastic Net Regularization – L1 and L2 Regularization

Lasso Regression is employed to address the issue of overfitting in linear regression models.

Overfitting occurs when the model fits the training data too closely, leading to poor generalization to new, unseen data.

Lasso regression also helps us achieve feature selection by shrinking the weights of features that do not serve any purpose in the model to approximately zero.


Regularization: Lasso Regression
Penalty Term in Cost Function:

LASSO stands for Least Absolute Shrinkage and Selection Operator.

A penalty term, also known as the regularization term, is added to the linear regression cost function.

Lasso Regression adds the "absolute value of magnitude" of the coefficients as a penalty term to the loss function (L):

cost = (1/n) Σ_{i=1}^{n} (yi − ŷi)² + λ Σ_{i=1}^{m} |wi|    (2)

where,
m – number of features
n – number of examples
yi – actual target value
ŷi – predicted target value
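A minimal sketch of Lasso regression with scikit-learn; the alpha parameter plays the role of λ, and the synthetic data is illustrative.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))                                   # five candidate features
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 100)     # only two features matter

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)   # coefficients of irrelevant features are shrunk toward zero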
Regularization: Lasso Regression

Coefficient Shrinkage:

The presence of the penalty term encourages the optimization algorithm to minimize not only the error term but also the magnitude of the coefficients.

As a result, the coefficients of the linear regression model are "shrunk" toward zero.

Ridge regression

Ridge regression adds the "squared magnitude" of the coefficients as a penalty term to the loss function (L):

cost = (1/n) Σ_{i=1}^{n} (yi − ŷi)² + λ Σ_{i=1}^{m} wi²    (3)
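A corresponding sketch with Ridge regression; again alpha corresponds to λ and the data is illustrative.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 100)

ridge = Ridge(alpha=1.0).fit(X, y)
print(ridge.coef_)   # coefficients are shrunk but typically not exactly zero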


Elastic Net Regularization – L1 and L2 Regularization

This model is a combination of L1 as well as L2 regularization: we add both the absolute norm of the weights and the squared magnitude of the weights as penalty terms.

In a regularization technique, we reduce the magnitude of the coefficients while keeping the same number of features.

This technique discourages learning a more complex or flexible model, so as to avoid the risk of overfitting.

λ is the tuning parameter that decides how much we want to penalize the flexibility of our model.
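A minimal sketch with scikit-learn's ElasticNet, which mixes the L1 and L2 penalties; the alpha and l1_ratio values and the data are illustrative.

import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 100)

# l1_ratio controls the mix: 1.0 is pure L1 (Lasso), 0.0 is pure L2 (Ridge)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_)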


Thank You
