Logistic Regression is a popular Machine Learning algorithm that falls under the Supervised Learning
technique. It is used to predict a categorical dependent variable from a set of
independent variables.
source:https://bit.ly/3Qgu6h0
Logistic regression predicts the output of a categorical dependent variable, so the result
must be a categorical or discrete value. It can be Yes or No, 0 or 1, True or False, and so on, but
instead of giving the exact values 0 and 1, it gives the probabilistic values that fall between 0 and
1.
source:https://bit.ly/3MMLEhW
Except for how they are used, Logistic Regression and Linear Regression are very similar. Logistic
Regression is used to solve classification problems, whereas Linear Regression is used to solve
regression problems.
In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function
that predicts two maximum values (0 or 1).
The logistic function curve indicates the likelihood of something such as whether the cells are
cancerous or not, whether a mouse is obese or not based on its weight, and so on.
Logistic Regression can be used to classify observations using various types of data and can
quickly determine the most effective variables for classification.
Logistic Function
The Sigmoid Function is a mathematical function that is used to convert predicted values into
probabilities. It maps any real value into a value between 0 and 1. The logistic
regression output must lie between 0 and 1 and cannot exceed this limit, resulting in a curve shaped
like the letter "S". The Sigmoid function or logistic function is another name for the S-form curve.
The concept of the threshold value is used in logistic regression to define the probability of either 0 or
1. For example, values above the threshold value tend to be 1, while values below the threshold
value tend to be 0.
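Applying the threshold is a simple comparison. A sketch with a hypothetical set of predicted probabilities and the common default threshold of 0.5:

```python
import numpy as np

# Hypothetical probabilities produced by a logistic regression model.
probs = np.array([0.12, 0.55, 0.91, 0.40])

threshold = 0.5
# Probabilities at or above the threshold map to class 1, the rest to class 0.
labels = (probs >= threshold).astype(int)
print(labels)  # [0 1 1 0]
```

The threshold can be moved away from 0.5 when one kind of mistake (e.g., missing a cancerous cell) is more costly than the other.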
Assumptions for Logistic Regression
The target variable in binary logistic regression must always be binary, and the desired
outcome is represented by the factor level 1.
The model should not have any multicollinearity, which means that the independent
variables must be independent of one another.
Only meaningful variables should be included in the model.
Logistic regression requires a large sample size.
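Putting the pieces together, a binary logistic regression can be fitted by minimizing the log-loss with gradient descent. This is a minimal from-scratch sketch on a made-up one-feature dataset, not a production implementation (a library such as scikit-learn would normally be used):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: one feature, classes separable around x = 0 (illustrative only).
X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Add an intercept column and fit the weights by gradient descent on the log-loss.
Xb = np.hstack([np.ones((len(X), 1)), X])
w = np.zeros(Xb.shape[1])
lr = 0.5
for _ in range(2000):
    p = sigmoid(Xb @ w)            # current probability estimates
    w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the log-loss

probs = sigmoid(Xb @ w)            # probabilities between 0 and 1
preds = (probs >= 0.5).astype(int) # apply the 0.5 threshold
print(preds)
```

On this separable toy set the predictions recover the original labels; the probabilities, not the hard labels, are what logistic regression actually models.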
Regression Models
The most basic type of logistic regression is binary or binomial logistic regression, in which the target
or dependent variable can only be of two types, either 0 or 1.
Multinomial logistic regression is another useful type of logistic regression in which the target or
dependent variable can have three or more possible unordered types, i.e., types with no
quantitative significance.
Binomial or Binary
In this type of classification, a dependent variable will only have two possible values: 1 or 0. These
variables could, for example, represent success or failure, yes or no, win or lose, and so on.
Multinomial
The dependent variable in such a classification can have three or more possible unordered types
or types with no quantitative significance. These variables could, for example, represent "Type A,"
"Type B," or "Type C."
Ordinal
In this type of classification, the dependent variable can have three or more possible ordered
types or types with quantitative significance. For example, these variables could represent "poor" or
"good," "very good," or "Excellent," and each category could have a score of 0, 1, 2, or 3.
KNN-Algorithm
K-Nearest Neighbor is a simple Machine Learning algorithm that uses the Supervised
Learning technique.
The K-NN algorithm assumes similarity between the new case/data and existing cases
and places the new case in the category that is most similar to the existing categories.
The K-NN algorithm can be used for both regression and classification, but it is most
commonly used for classification problems.
K-NN is a non-parametric algorithm, which means it makes no assumptions about the
underlying data.
It is also known as a lazy learner algorithm because it does not immediately learn from the
training set; instead, it stores the dataset and then performs an action on it during
classification.
During the training phase, the KNN algorithm simply stores the dataset, and when new data
is received, it classifies it into a category that is very similar to the new data.
Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we
want to know whether it is a cat or a dog. For this identification, we can use the KNN algorithm,
as it works on a similarity measure. Our KNN model will compare the features of the new image
with those of the cat and dog images and, based on the most similar features, place it in either the cat
or the dog category.
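The "store the data, then vote among the nearest neighbors" idea above can be sketched in a few lines. The 2-D feature values and cluster positions here are made up for illustration (think of them as two measurements extracted from the images):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distance to each stored point
    nearest = np.argsort(dists)[:k]                  # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D features: one cluster per class.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # "cat" cluster
                    [4.0, 4.0], [4.2, 3.9], [3.8, 4.1]])  # "dog" cluster
y_train = np.array(["cat", "cat", "cat", "dog", "dog", "dog"])

print(knn_predict(X_train, y_train, np.array([1.1, 0.9])))  # cat
print(knn_predict(X_train, y_train, np.array([4.1, 4.0])))  # dog
```

Note that all the work happens at prediction time; "training" is nothing more than storing the dataset, which is why KNN is called a lazy learner.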
KNN Working
source:https://bit.ly/3OkGc7b
A very low value for K, such as K=1 or K=2, can be noisy and makes the model sensitive to outliers.
Larger values of K smooth out this noise, but if K is too large the neighborhood starts to include
points from other classes and predictions become more expensive to compute.
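The sensitivity of small K can be demonstrated directly. In this sketch (with made-up points), a single mislabeled outlier sits inside the class-0 cluster: with K=1 the outlier alone decides the prediction, while K=5 outvotes it:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k):
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# Class 0 cluster near the origin, class 1 cluster far away,
# plus a single mislabeled class-1 outlier inside the class-0 cluster.
X_train = np.array([[0.0, 0.0], [0.3, 0.0], [0.0, 0.3], [0.4, 0.4],
                    [0.15, 0.15],             # outlier labeled 1
                    [5.0, 5.0], [5.2, 5.1]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1])

query = np.array([0.1, 0.1])
k1_pred = knn_predict(X_train, y_train, query, k=1)  # nearest point is the outlier -> 1
k5_pred = knn_predict(X_train, y_train, query, k=5)  # majority of 5 neighbors -> 0
print(k1_pred, k5_pred)
```

In practice, K is usually chosen by trying several values and keeping the one that performs best on held-out data.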