
Fundamentals of Machine Learning (DSE 2222)

by
Shavantrevva S. B. & Padmashree G.
Dept. of Data Science and Computer Applications
Manipal Institute of Technology, Manipal

February 5, 2024



Overview

1 Regression Models


Regression Analysis
Regression: finds the correlation between variables and enables us to predict a continuous output variable based on one or more predictor variables.

Applications: prediction, forecasting, time-series modeling, and determining cause-and-effect relationships between variables.

Dependent Variable: the main factor that we want to predict or understand in regression analysis; also called the target variable.

Independent Variable: the factors that affect the dependent variable, or that are used to predict its values; also called predictors.

Regression fits a line or curve through the data points on the target-predictor graph such that the vertical distance between the data points and the regression line is minimized.
Why do we use Regression Analysis?

Regression estimates the relationship between the target and the independent variables.

It is used to find trends in data.

It helps to predict real/continuous values.

By performing regression, we can determine the most important factor, the least important factor, and how each factor affects the others.


Types of Regression models

Figure 1.1: Types of Regression models
Simple Linear Regression

There must be a linear relationship between the independent and dependent variables.

Relationship as a best-fit line: y = β0 + β1 x1 + ϵ

We assume there is some noise in the data, which causes the random error ϵ.

The error is normally distributed with mean 0 and standard deviation σ.

To obtain the best-fit line, use the Least Squares Method.

Linear regression is very sensitive to outliers.
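A minimal sketch of fitting a simple linear regression with scikit-learn; the data below is synthetic and purely illustrative, not from the slides.

# Fit y = beta0 + beta1*x on synthetic data (illustrative only)
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(50, 1))              # single predictor
y = 3.0 + 2.0 * x[:, 0] + rng.normal(0, 1, 50)    # linear signal plus noise

model = LinearRegression().fit(x, y)
print("beta0 (intercept):", model.intercept_)
print("beta1 (slope):", model.coef_[0])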


Simple Linear Regression

Given a random sample of observations, the regression line is expressed as:

y = β0 + β1 x1

where β0 is the intercept (a constant), β1 is the regression coefficient, x1 is the independent variable, and y is the predicted value of the dependent variable.

The regression coefficients are obtained from:

β1 = [ Σ_{i=1}^{n} (Xi − X̄)(Yi − Ȳ) ] / [ Σ_{i=1}^{n} (Xi − X̄)² ]

β0 = Ȳ − β1 X̄

where
X̄ is the mean of the X values.
Ȳ is the mean of the Y values.
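A small sketch of computing β1 and β0 directly from the formulas above; the data values are illustrative.

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 6.2, 8.1, 9.8])

x_bar, y_bar = X.mean(), Y.mean()
beta1 = np.sum((X - x_bar) * (Y - y_bar)) / np.sum((X - x_bar) ** 2)  # slope
beta0 = y_bar - beta1 * x_bar                                         # intercept
print(beta0, beta1)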


Linear Regression: Problem



Linear Regression: Applications

Analyzing trends and sales estimates


Salary forecasting
Real estate prediction
Arriving at ETAs in traffic.



Linear Regression: Cost Function

The cost is the difference between the estimated values (the hypothesis) and the real values.

By minimizing the cost, we can determine the optimal values for the model's parameters and improve its performance.

Figure 1.2: Cost Function
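A minimal sketch of one common cost function, the Mean Squared Error (MSE), for a simple linear hypothesis ŷ = β0 + β1 x; the data values are illustrative.

import numpy as np

def mse_cost(beta0, beta1, x, y):
    """Mean squared error of the line beta0 + beta1*x against targets y."""
    y_hat = beta0 + beta1 * x
    return np.mean((y - y_hat) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
print(mse_cost(1.0, 2.0, x, y))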


Gradient Descent
Gradient Descent is an iterative optimization algorithm that tries to find the optimum value (minimum/maximum) of an objective function.

It is used to update the parameters of a model in order to minimize a cost function.

The goal of gradient descent is to find the model parameters that fit the training data well and generalize to the test data.

1 Initialize theta with random values.
2 Measure how good our theta is using the error function; the best theta is the one for which the error is least.
3 Calculate the gradient and use it to update the theta values; repeat until convergence (a sketch follows below).
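A minimal sketch of batch gradient descent for simple linear regression, minimizing the MSE cost; the learning rate, iteration count, and data are illustrative choices.

import numpy as np

def gradient_descent(x, y, lr=0.01, n_iters=1000):
    theta0, theta1 = 0.0, 0.0                 # step 1: initialize parameters
    n = len(x)
    for _ in range(n_iters):
        y_hat = theta0 + theta1 * x           # current predictions
        error = y_hat - y                     # step 2: measure how far off we are
        grad0 = (2.0 / n) * error.sum()       # step 3: gradients of the MSE cost
        grad1 = (2.0 / n) * (error * x).sum()
        theta0 -= lr * grad0                  # update parameters against the gradient
        theta1 -= lr * grad1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.1, 6.9, 9.2, 11.1])
print(gradient_descent(x, y))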


Linear Regression: Matrix method

Reference: https://www.youtube.com/watch?v=JUQXyQ2U1b8
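Assuming the matrix method here refers to the usual normal-equation solution β = (XᵀX)⁻¹ Xᵀ y, a minimal sketch with illustrative data:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.1, 6.9, 9.2, 11.1])

X = np.column_stack([np.ones_like(x), x])   # design matrix with an intercept column
beta = np.linalg.solve(X.T @ X, X.T @ y)    # solves (X^T X) beta = X^T y
print(beta)                                 # [beta0, beta1]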
Multiple Linear Regression
In Simple Linear Regression, a single independent/predictor variable (X) is used to model the response variable (Y).

What if the response variable is affected by more than one predictor variable? Then the Multiple Linear Regression algorithm is used.

Multiple Linear Regression models the linear relationship between a single continuous dependent variable and more than one independent variable.

Key points about MLR:
For MLR, the dependent or target variable (Y) must be continuous/real, but the predictor or independent variables may be continuous or categorical.
Each feature variable must have a linear relationship with the dependent variable.
MLR tries to fit a regression line (a hyperplane) through a multidimensional space of data points.
Assumptions for Multiple Linear Regression

A linear relationship should exist between the target and predictor variables.

The regression residuals must be normally distributed.

MLR assumes little or no multicollinearity (correlation between the independent variables) in the data.


Multiple Linear Regression: Equation
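The general MLR equation takes the form Y = β0 + β1 X1 + β2 X2 + ... + βn Xn + ϵ, with one coefficient per predictor. A minimal sketch of fitting such a model with scikit-learn; the two-predictor data is synthetic and purely illustrative.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 2))                          # two predictors X1, X2
y = 4.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 1, 100)

model = LinearRegression().fit(X, y)
print("beta0:", model.intercept_)
print("beta1, beta2:", model.coef_)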


Polynomial Regression

The relationship between the independent variable (X) and the dependent variable (Y) is modeled as an nth-degree polynomial.

While polynomial regression can include linear relationships (n = 1), it extends to higher degrees, allowing for more complex and curved relationships.

Equation: Y = β0 + β1 X + β2 X² + β3 X³ + ... + βn X^n + ϵ

Here:
Y is the dependent variable.
X is the independent variable.
β0, β1, β2, ..., βn are the coefficients representing the intercept and the weights for each degree of the polynomial.
ϵ represents the error term.

The degree of the polynomial (n) determines the complexity of the model. A polynomial of degree 1 corresponds to linear regression, while higher degrees introduce curvature into the relationship.


Need for Polynomial Regression
Consider a scenario where the relationship between the input and output is not linear.

Assume it is a quadratic relationship: Y = β0 + β1 X + β2 X²

If we try to fit a Simple Linear Regression model to this data, it won't be able to represent the quadratic nature of the relationship.

The linear regression line fails to capture the quadratic pattern.

The loss function will be high, and the accuracy of predictions will be low for this non-linear dataset.

To handle non-linear relationships, more complex models such as polynomial regression, decision trees, or neural networks may be more suitable.
Need for Polynomial Regression

Where data points are arranged in a non-linear fashion, we need the Polynomial Regression model.

Polynomial Regression is also called Polynomial Linear Regression because the linearity does not refer to the input variables but to the coefficients, which enter the model in a linear fashion.

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Expand the feature array x into polynomial terms [1, x, x^2]
polyregs = PolynomialFeatures(degree=2)
xpoly = polyregs.fit_transform(x)

# Fit an ordinary linear regression on the expanded features
linreg2 = LinearRegression()
linreg2.fit(xpoly, y)
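As a usage note (x_new here is a hypothetical array of new inputs with the same shape as x), predictions are made by expanding the new inputs with the same transformer:

y_new = linreg2.predict(polyregs.transform(x_new))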


Figure 1.3: Linear Regression
Figure 1.4: Polynomial regression: degree 2
Figure 1.5: Polynomial regression: degree 3
Figure 1.6: Polynomial regression: degree 4
Polynomial Regression: Polynomial Linear Regression

The model is still linear with respect to the coefficients.

It becomes non-linear in terms of the input variable x due to the presence of higher-degree terms.

Even though the equation contains terms with x raised to various powers, it is still considered a form of linear regression because the linearity refers to the coefficients, not the input variables.


Classification Algorithm
Machine Learning algorithms can be broadly classified into Regression and Classification algorithms.

Regression algorithms predict continuous output values.

To predict categorical values, we need Classification algorithms.

Binary Classifier: if the classification problem has only two possible outcomes, it is called a binary classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.

Multi-class Classifier: if a classification problem has more than two outcomes, it is called a multi-class classifier.
Examples: classification of types of crops, classification of types of music.
Types of ML Classification Algorithms

Classification algorithms can be further divided into two main categories:

Linear Models
  Logistic Regression
  Support Vector Machines

Non-linear Models
  K-Nearest Neighbours
  Kernel SVM
  Naïve Bayes
  Decision Tree Classification
  Random Forest Classification


Logistic Regression
Logistic regression is a statistical method used for building machine learning models where the dependent variable is binary.

The independent variables can be nominal, ordinal, or of interval type.

The name "logistic regression" is derived from the logistic function that it uses.

The logistic function is also known as the sigmoid function; its output lies between zero and one.


Logistic Function (Sigmoid Function):
The sigmoid function is a mathematical function used to map predicted values to probabilities.

It maps any real value into a value within the range of 0 to 1.

The output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, so it forms an "S"-shaped curve. This S-shaped curve is called the sigmoid function or the logistic function.

In logistic regression, we use a threshold value to decide between class 0 and class 1: values above the threshold tend to 1, and values below the threshold tend to 0.

The dependent variable must be categorical in nature.

The independent variables should not have multicollinearity.
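A minimal sketch of the sigmoid function together with a 0.5 threshold rule; the input values are illustrative.

import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, 0.0, 2.5])
p = sigmoid(z)
print(p)                        # probabilities in (0, 1)
print((p >= 0.5).astype(int))   # apply a 0.5 threshold to get class labels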
Logistic Regression
EXAMPLE:

Equation:

P(x) = 1 / (1 + e^−(β0 + β1 x))    (1)

Reference: https://www.youtube.com/watch?v=2C8IqOLO1os&t=0s
Logistic Regression Equation
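The same equation can be rewritten in terms of the log-odds (logit), which is linear in the parameters; a brief sketch of the standard algebra:

P(x) = 1 / (1 + e^−(β0 + β1 x))
⇒ P(x) / (1 − P(x)) = e^(β0 + β1 x)          (odds)
⇒ log( P(x) / (1 − P(x)) ) = β0 + β1 x        (log-odds / logit)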


Logistic Regression

EXAMPLE:

Reference: https://www.youtube.com/watch?v=2C8IqOLO1os&t=0s
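A minimal sketch of fitting a logistic regression classifier with scikit-learn; the data is synthetic and purely illustrative (not the worked example from the referenced video).

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary data: hours studied vs. pass (1) / fail (0)
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.intercept_, clf.coef_)     # beta0 and beta1
print(clf.predict_proba([[2.2]]))    # [P(class 0), P(class 1)] for a new input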
Logistic Regression Solved Example

Linear Vs. Logistic Regression



Regularization in Machine Learning

Regularization is a technique used to reduce errors by fitting the function appropriately on the given training set and avoiding overfitting.

The commonly used regularization techniques are:
1 Lasso Regularization – L1 Regularization
2 Ridge Regularization – L2 Regularization
3 Elastic Net Regularization – L1 and L2 Regularization

Lasso Regression is employed to address the issue of overfitting in linear regression models.

Overfitting occurs when the model fits the training data too closely, leading to poor generalization to new, unseen data.

Lasso regression also helps us achieve feature selection by shrinking the weights of features that do not serve any purpose in the model to approximately zero.


Regularization: Lasso Regression
Penalty Term in Cost Function:

LASSO stands for Least Absolute Shrinkage and Selection Operator.

A penalty term, also known as the regularization term, is added to the linear regression cost function.

Lasso Regression adds the "absolute value of magnitude" of the coefficients as a penalty term to the loss function (L):

cost = (1/n) Σ_{i=1}^{n} (yi − ŷi)² + λ Σ_{i=1}^{m} |wi|    (2)

where,
m – number of features
n – number of examples
yi – actual target value
ŷi – predicted target value
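A minimal sketch of Lasso regression with scikit-learn; the alpha parameter plays the role of λ, and the synthetic data is illustrative.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))                                   # five candidate features
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 100)     # only two features matter

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)   # coefficients of irrelevant features are shrunk toward zero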
Regularization: Lasso Regression

Coefficient Shrinkage:

The presence of the penalty term encourages the optimization algorithm to minimize not only the error term but also the magnitude of the coefficients.

As a result, the coefficients of the linear regression model are "shrunk" toward zero.

Ridge regression

Ridge regression adds the "squared magnitude" of the coefficients as a penalty term to the loss function (L):

cost = (1/n) Σ_{i=1}^{n} (yi − ŷi)² + λ Σ_{i=1}^{m} wi²    (3)
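A corresponding sketch with Ridge regression; again alpha corresponds to λ and the data is illustrative.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 100)

ridge = Ridge(alpha=1.0).fit(X, y)
print(ridge.coef_)   # coefficients are shrunk but typically not exactly zero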


Elastic Net Regularization – L1 and L2 Regularization

This model is a combination of L1 as well as L2 regularization: we add both the absolute norm of the weights and the squared magnitude of the weights as penalty terms.

In a regularization technique, we reduce the magnitude of the coefficients while keeping the same number of features.

This technique discourages learning a more complex or flexible model, so as to avoid the risk of overfitting.

λ is the tuning parameter that decides how much we want to penalize the flexibility of our model.
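A minimal sketch with scikit-learn's ElasticNet, which mixes the L1 and L2 penalties; the alpha and l1_ratio values and the data are illustrative.

import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 100)

# l1_ratio controls the mix: 1.0 is pure L1 (Lasso), 0.0 is pure L2 (Ridge)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(enet.coef_)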


Thank You
