
Linear Regression

• Linear Regression is a machine learning algorithm based on supervised learning.
• It performs a regression task.
• Regression models a target prediction value based on independent variables.
• It is mostly used for finding the relationship between variables and for forecasting.
• Regression models differ in the kind of relationship they assume between the dependent and independent variables, and in the number of independent variables they use.
Linear Regression
• Linear regression performs the task of predicting a dependent variable value (y) based on a given independent variable (x).
• So, this regression technique finds a linear relationship between x (input) and y (output). Hence the name, Linear Regression.
• In the figure above, X (input) is the work experience and Y (output) is the salary of a person.
• The regression line is the best-fit line for our model.
Hypothesis function for Linear Regression:

y = θ1 + θ2·x

While training the model we are given:
x: input training data (univariate – one input variable (parameter))
y: labels to the data (supervised learning)
Linear Regression
• When training the model, it fits the best line to predict the value of y for a given value of x. The model gets the best regression-fit line by finding the best θ1 and θ2 values.
• θ1: intercept
• θ2: coefficient of x

• Once we find the best θ1 and θ2 values, we get the best-fit line. So when we finally use our model for prediction, it will predict the value of y for an input value of x.
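As a minimal sketch of this fitting step (assuming NumPy and some illustrative, made-up data), np.polyfit can recover θ1 and θ2 by least squares:

import numpy as np

# Illustrative data (hypothetical): years of experience vs. salary
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([30.0, 35.0, 41.0, 44.0, 50.0])

# polyfit with deg=1 returns [slope, intercept], i.e. [theta2, theta1]
theta2, theta1 = np.polyfit(x, y, deg=1)

# Predict y for a new x using the fitted line
x_new = 6.0
print(theta1 + theta2 * x_new)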
Linear Regression
How to update θ1 and θ2 values to get the best-fit line?
• Cost Function (J):
• By achieving the best-fit regression line, the model aims to predict y values such that the error difference between the predicted value and the true value is minimum. So, it is very important to update the θ1 and θ2 values to reach the values that minimize the error between the predicted y value (pred) and the true y value (y).
Linear Regression

• The cost function (J) of Linear Regression is the Root Mean Squared Error (RMSE) between the predicted y value (pred) and the true y value (y):

J = √( (1/n) · Σᵢ (predᵢ − yᵢ)² )
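A minimal sketch of this cost, assuming NumPy arrays of predictions and true labels:

import numpy as np

def rmse(pred, y):
    """Root Mean Squared Error between predictions and true values."""
    return np.sqrt(np.mean((pred - y) ** 2))

# Example with illustrative values
print(rmse(np.array([2.5, 3.0, 4.5]), np.array([3.0, 3.0, 4.0])))  # ~0.408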
Linear Regression
• Gradient Descent:
• To update the θ1 and θ2 values in order to reduce the cost function (minimizing the RMSE value) and achieve the best-fit line, the model uses Gradient Descent.
• The idea is to start with random θ1 and θ2 values and then iteratively update them, reaching the minimum cost. A sketch of this loop follows.
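A minimal sketch of gradient descent for this univariate model, assuming NumPy and illustrative data. It descends on the mean squared error (whose gradients are simpler than RMSE's and which has the same minimizer):

import numpy as np

def gradient_descent(x, y, lr=0.05, epochs=5000):
    """Fit y ≈ theta1 + theta2 * x by following the MSE gradient."""
    theta1, theta2 = 0.0, 0.0          # arbitrary starting values
    n = len(x)
    for _ in range(epochs):
        pred = theta1 + theta2 * x     # current predictions
        error = pred - y
        # Partial derivatives of MSE w.r.t. theta1 and theta2
        grad1 = (2.0 / n) * error.sum()
        grad2 = (2.0 / n) * (error * x).sum()
        theta1 -= lr * grad1           # step against the gradient
        theta2 -= lr * grad2
    return theta1, theta2

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])   # roughly y = 1 + 2x
print(gradient_descent(x, y))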
Multiple Linear Regression:
• If more than one independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Multiple Linear Regression, e.g. y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ.
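A minimal sketch, assuming scikit-learn is available and using illustrative data with two independent variables (generated from y = 1 + 2x₁ + 3x₂):

import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: two independent variables per row
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([9.0, 8.0, 19.0, 18.0])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)           # ≈ 1.0, [2.0, 3.0]
print(model.predict(np.array([[5.0, 5.0]])))   # ≈ 26.0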
Model Performance:
• The goodness of fit determines how well the regression line fits the set of observations. The process of finding the best model out of various models is called optimization.
• R-squared method:
• R-squared is a statistical method that determines the goodness of fit.
• It measures the strength of the relationship between the dependent and independent variables on a scale of 0–100%.
• A high value of R-squared indicates a small difference between the predicted values and the actual values, and hence represents a good model.
• It is also called the coefficient of determination, or the coefficient of multiple determination for multiple regression.
Linear regression:
• It can be calculated from the below formula:

R² = SSregression / SStotal

• SSregression is the sum of squares due to regression (explained sum of squares)
• SStotal is the total sum of squares
Linear regression:

Consider the following two variables x and y; you are required to calculate the R-squared of the regression.
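Since the worked data from the slide is not reproduced here, the sketch below uses illustrative values; it computes R² from the definitions above, assuming NumPy:

import numpy as np

def r_squared(x, y):
    """R² of a simple linear fit, via SSregression / SStotal."""
    theta2, theta1 = np.polyfit(x, y, deg=1)
    pred = theta1 + theta2 * x
    ss_regression = np.sum((pred - y.mean()) ** 2)  # explained sum of squares
    ss_total = np.sum((y - y.mean()) ** 2)          # total sum of squares
    return ss_regression / ss_total

# Illustrative values (not the slide's data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
print(r_squared(x, y))   # close to 1 for a nearly linear relationship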
Linear regression:
Problems with R-squared statistic
• The R-squared statistic isn’t perfect. In fact, it suffers from a major flaw.
Its value never decreases no matter the number of variables we add to our
regression model.
• That is, even if we are adding redundant variables to the data, the value of
R-squared does not decrease.
• It either remains the same or increases with the addition of new
independent variables.
• This clearly does not make sense because some of the independent
variables might not be useful in determining the target variable.
• Adjusted R-squared deals with this issue.
Adjusted R-squared statistic
• The Adjusted R-squared takes into account the number of independent
variables used for predicting the target variable.
• In doing so, we can determine whether adding new variables to the model
actually increases the model fit.
• Let’s have a look at the formula for Adjusted R-squared to better understand how it works:

Adjusted R² = 1 − [ (1 − R²)(n − 1) / (n − k − 1) ]

• Here,
• n represents the number of data points in our dataset,
• k represents the number of independent variables, and
• R² represents the R-squared value determined by the model.
• So, if R-squared does not increase significantly on the addition of a new
independent variable, then the value of Adjusted R-squared will actually
decrease.
• On the other hand, if on adding the new independent variable we see a
significant increase in R-squared value, then the Adjusted R-squared
value will also increase.
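A minimal sketch of this formula, assuming n data points, k independent variables, and an R² value already computed:

def adjusted_r_squared(r2, n, k):
    """Adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding a redundant variable (k: 2 -> 3) with no gain in R² lowers the score
print(adjusted_r_squared(0.90, n=50, k=2))  # ≈ 0.8957
print(adjusted_r_squared(0.90, n=50, k=3))  # ≈ 0.8935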
Logistic Regression in Machine Learning
• Logistic regression is one of the most popular Machine Learning algorithms, and it comes under the Supervised Learning technique. It is used for predicting a categorical dependent variable using a given set of independent variables.
• Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome must be a categorical or discrete value. It can be Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.
• Logistic Regression is similar to Linear Regression except in how it is used. Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.
Logistic Regression in Machine Learning
• In Logistic regression, instead of fitting a straight regression line, we fit an "S"-shaped logistic function, which predicts two maximum values (0 or 1).
• The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, etc.
• Logistic Regression is a significant machine learning algorithm because it has the ability to provide probabilities and to classify new data using continuous and discrete datasets.
Logistic Regression in Machine Learning
• Binary logistic regression - when we have two possible outcomes, for example whether a person is likely to be infected with COVID-19 or not.
• Multinomial logistic regression - when we have multiple outcomes, say if we build out our original example to predict whether someone may have the flu, an allergy, a cold, or COVID-19.
Logistic Regression in Machine Learning
• The hypothesis of logistic regression requires its output to be limited between 0 and 1. Linear functions fail to represent it, as they can take values greater than 1 or less than 0, which is not possible under the hypothesis of logistic regression.

• In order to map predicted values to probabilities, we use the Sigmoid function. The function maps any real value to another value between 0 and 1:

σ(z) = 1 / (1 + e^(−z))
Logistic Regression in Machine Learning
• For linear regression we used the hypothesis:

• hΘ(x) = β₀ + β₁X

• For logistic regression we modify it a little:

• hΘ(x) = σ(Z) = σ(β₀ + β₁X)


Logistic Regression in Machine Learning
• We expect our hypothesis to give values between 0 and 1.
• Z = β₀ + β₁X
• hΘ(x) = sigmoid(Z)
• i.e. hΘ(x) = 1 / (1 + e^(−(β₀ + β₁X)))
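A minimal sketch of this hypothesis in code, assuming NumPy and illustrative values for β₀ and β₁:

import numpy as np

def sigmoid(z):
    """Map any real value to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def h(x, beta0, beta1):
    """Logistic hypothesis: probability that y = 1 given x."""
    return sigmoid(beta0 + beta1 * x)

# Illustrative parameters
print(h(np.array([-2.0, 0.0, 2.0]), beta0=0.0, beta1=1.5))  # ≈ [0.047, 0.5, 0.953]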
Decision Boundary
• We expect our classifier to give us a set of outputs or classes based on probability: we pass the inputs through a prediction function and it returns a probability score between 0 and 1.

• For example, suppose we have 2 classes, cats and dogs (1 - dog, 0 - cat). We decide on a threshold value above which we classify values into Class 1, and if the value goes below the threshold then we classify them into Class 2.
Logistic Regression in Machine Learning
• As shown in the above graph, we have chosen the threshold as 0.5: if the prediction function returned a value of 0.7, we would classify this observation as Class 1 (DOG).
• If our prediction returned a value of 0.2, we would classify the observation as Class 2 (CAT).
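A minimal sketch of this thresholding step, assuming NumPy and probabilities from a hypothesis like the one sketched earlier:

import numpy as np

def classify(prob, threshold=0.5):
    """Return 1 (DOG) when the probability clears the threshold, else 0 (CAT)."""
    return (np.asarray(prob) >= threshold).astype(int)

print(classify([0.7, 0.2, 0.5]))  # [1, 0, 1]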
Cost Function
• For logistic regression, the cost for a single example is defined as:
• Cost(hθ(x), y) = −log(hθ(x)) if y = 1
• Cost(hθ(x), y) = −log(1 − hθ(x)) if y = 0
Logistic Regression in Machine Learning
• The above two functions can be compressed into a single function, i.e.

J(θ) = −(1/m) · Σᵢ [ yᵢ·log(hθ(xᵢ)) + (1 − yᵢ)·log(1 − hθ(xᵢ)) ]

where m is the number of training examples.

• Now, to minimize our cost function, we need to run the gradient descent update on each parameter, i.e.

θⱼ := θⱼ − α · ∂J(θ)/∂θⱼ
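A minimal sketch of this training loop, assuming NumPy, a single feature, and illustrative data. The gradient of the compressed cost works out to the same form as in linear regression, with the sigmoid applied to the predictions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, lr=0.1, epochs=5000):
    """Fit beta0, beta1 by gradient descent on the log-loss J(θ)."""
    beta0, beta1 = 0.0, 0.0
    m = len(x)
    for _ in range(epochs):
        pred = sigmoid(beta0 + beta1 * x)   # hθ(x)
        error = pred - y                    # gradient of J w.r.t. Z
        beta0 -= lr * error.sum() / m
        beta1 -= lr * (error * x).sum() / m
    return beta0, beta1

# Illustrative data: larger x tends to mean class 1
x = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(fit_logistic(x, y))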
Logistic Regression in Machine Learning
• The Logistic Regression model can be generalized to support multiple classes directly, without having to train and combine multiple binary classifiers. This is called Softmax Regression, or Multinomial Logistic Regression.
• The idea is simple: when given an instance x, the Softmax Regression model first computes a score sk(x) for each class k, then estimates the probability of each class by applying the softmax function (also called the normalized exponential) to the scores:

p̂k = exp(sk(x)) / Σⱼ exp(sⱼ(x))
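A minimal sketch of the softmax function, assuming NumPy and illustrative per-class scores:

import numpy as np

def softmax(scores):
    """Normalized exponential: turn raw class scores into probabilities."""
    exp = np.exp(scores - np.max(scores))  # shift for numerical stability
    return exp / exp.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # ≈ [0.659, 0.242, 0.099]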
K-Nearest Neighbors Algorithm
• K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
• The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can be easily classified into a well-suited category by using the K-NN algorithm.
• The K-NN algorithm can be used for Regression as well as for Classification, but it is mostly used for Classification problems.
K-Nearest Neighbors Algorithm
• K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at the time of classification, performs an action on the dataset.
• The KNN algorithm at the training phase just stores the dataset, and when it gets new data, it classifies that data into the category that is most similar to the new data.
K-Nearest Neighbors Algorithm
• Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog. For this identification we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will find the features of the new image that are similar to the cat and dog images, and based on the most similar features it will put the image in either the cat or the dog category.
K-Nearest Neighbors Algorithm
• The working of K-NN can be explained with the following steps:
• Step-1: Select the number K of neighbors.
• Step-2: Calculate the Euclidean distance from the new data point to each data point.
• Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
• Step-4: Among these K neighbors, count the number of data points in each category.
• Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
• Step-6: Our model is ready.
K-Nearest Neighbors Algorithm
• Firstly, we will choose the number of neighbors; here we choose K = 5.
• Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. It can be calculated as:

d = √((x₂ − x₁)² + (y₂ − y₁)²)
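A minimal from-scratch sketch of these steps, assuming NumPy, 2-D points, and illustrative category labels:

import numpy as np
from collections import Counter

def knn_classify(train_points, train_labels, new_point, k=5):
    """Classify new_point by majority vote among its k nearest neighbors."""
    # Euclidean distance from the new point to every training point
    distances = np.sqrt(((train_points - new_point) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]            # indices of the k closest
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]              # majority category

# Illustrative data: category A clusters low, category B clusters high
points = np.array([[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]])
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(points, labels, np.array([2, 2]), k=3))  # "A"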
K-Nearest Neighbors Algorithm
• By calculating the Euclidean distance we get the nearest neighbors: three nearest neighbors in category A and two nearest neighbors in category B. Consider the below image:
K-Nearest Neighbors Algorithm
• As we can see, the majority (3 of the 5) nearest neighbors are from category A, hence this new data point must belong to category A.
• How to select the value of K in the K-NN Algorithm?
• Below are some points to remember while selecting the value of K in the K-NN algorithm:
• There is no particular way to determine the best value for K, so we need to try some values to find the best among them. The most preferred value for K is 5.
• A very low value for K, such as K = 1 or K = 2, can be noisy and expose the model to the effects of outliers.
• Large values for K are good, but they may lead to more computation.
K-Nearest Neighbors Algorithm
• Advantages of the KNN Algorithm:
• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective if the training data is large.
• Disadvantages of the KNN Algorithm:
• We always need to determine the value of K, which can be complex at times.
• The computation cost is high because of calculating the distance between the new data point and all the training samples.
