Download as pdf or txt
Download as pdf or txt
You are on page 1of 5

LOGISTIC REGRESSION AND K-NN ALGORITHM

Logistic Regression is a popular Machine Learning algorithm that is part of the Supervised Learning
Technique. It is used to predict the categorical dependent variable from a set of
independent variables.

source:https://bit.ly/3Qgu6h0

A categorical dependent variable's output is predicted using logistic regression. As a result, the result
must be a categorical or discrete value. It can be Yes or No, 0 or 1, True or False, and so on, but
instead of giving the exact values as 0 and 1, it gives the probabilistic values that fall between 0 and
1.

source:https://bit.ly/3MMLEhW

Except for how they are used, Logistic Regression and Linear Regression are very similar. Logistic
Regression is used to solve classification problems, whereas Linear Regression is used to solve
regression problems.

Instead of fitting a regression line, we fit a "S" shaped logistic function that predicts two maximum
values in logistic regression (0 or 1).

The logistic function curve indicates the likelihood of something such as whether the cells are
cancerous or not, whether a mouse is obese or not based on its weight, and so on.

© 2024 Athena Global Education. All Rights Reserved


Logistic Regression is an important machine learning algorithm because it can provide
probabilities and classify new data using both continuous and discrete datasets.

Logistic Regression can be used to classify observations using various types of data and can
quickly determine the most effective variables for classification.

Logistic Function

The Sigmoid Function is a mathematical function that is used to convert predicted values into
probabilities. It converts any real value between 0 and 1 into another value. The logistic
regression value must be between 0 and 1, and it cannot exceed this limit, resulting in a curve similar
to the "S" form. The Sigmoid function or logistic function is another name for the S-form curve.

The concept of the threshold value is used in logistic regression to define the probability of either 0 or
1. For example, values above the threshold value tend to be 1, while values below the threshold
value tend to be 0.

Assumptions of Logistic Regression

The target variables in binary logistic regression must always be binary, and the desired
outcome is represented by the factor level 1.
The model should not have any multicollinearity, which means that the independent
variables must be independent of one another.
In order for our model to be meaningful, we must include meaningful variables.
For logistic regression, we should use a large sample size.

Regression Models

Binary Logistic Regression model

The most basic type of logistic regression is binary or binomial logistic regression, in which the target
or dependent variable can only be of two types. either 0 or 1.

Multinomial Logistic Regression model

Multinomial logistic regression is another useful type of logistic regression in which the target or
dependent variable can have three or more possible unordered types, i.e., types with no
quantitative significance.

© 2024 Athena Global Education. All Rights Reserved


Types of Logistic Regression

Binomial or Binary

In this type of classification, a dependent variable will only have two possible values: 1 or 0. These
variables could, for example, represent success or failure, yes or no, win or lose, and so on.

Multinomial

The dependent variable in such a classification can have three or more possible unordered types
or types with no quantitative significance. These variables could, for example, represent "Type A,"
"Type B," or "Type C."

Ordinal

In this type of classification, the dependent variable can have three or more possible ordered
types or types with quantitative significance. For example, these variables could represent "poor" or
"good," "very good," or "Excellent," and each category could have a score of 0, 1, 2, or 3.

KNN-Algorithm

K-Nearest Neighbor is a simple Machine Learning algorithm that uses the Supervised
Learning technique.
The K-NN algorithm assumes similarity between the new case/data and existing cases
and places the new case in the category that is most similar to the existing categories.
The K-NN algorithm can be used for both regression and classification, but it is most
commonly used for classification problems.
K-NN is a non-parametric algorithm, which means it makes no assumptions about the
underlying data.
It is also known as a lazy learner algorithm because it does not immediately learn from the
training set; instead, it stores the dataset and then performs an action on it during
classification.
During the training phase, the KNN algorithm simply stores the dataset, and when new data
is received, it classifies it into a category that is very similar to the new data.
Example: Suppose, we have an image of a creature that looks similar to cat and dog, but we
want to know whether it is a cat or dog. So for this identification, we can use the KNN algorithm,
as it works on a similarity measure. Our KNN model will find the similar features of the new data
set to the cats and dogs images and based on the most similar features it will put it in either cat
or dog category.

© 2024 Athena Global Education. All Rights Reserved


source:https://bit.ly/3zUT0xd

KNN Working

How to Select the Value of K

source:https://bit.ly/3OkGc7b

© 2024 Athena Global Education. All Rights Reserved


There is no specific way to determine the best value for "K," so we must experiment with various
values to find the best one. The most preferred K value is 5.

A very low value for K, such as K=1 or K=2, can be noisy and cause outlier effects in the model.

Large values for K are preferable, but they may cause complications.

© 2024 Athena Global Education. All Rights Reserved

You might also like