
Logistic Regression
• Logistic regression is one of the most common machine learning
algorithms. It can be used to predict the probability of an event
occurring, such as whether an incoming email is spam or not, or
whether a tumor is malignant or not, based on a given labeled data
set.
• The model has the word “logistic” in its name because it uses
the logistic function (sigmoid) to convert a linear combination of the
input features into probabilities.
• Recall that in supervised ML problems, we are given a training set
of n labeled samples: D = {(x₁, y₁), (x₂, y₂), …, (xₙ, yₙ)}, where xᵢ is an
m-dimensional vector that contains the features of sample i,
and yᵢ represents the label of that sample. Our goal is to build a
model whose predictions are as close as possible to the true labels.
• In classification problems, the label yᵢ can take one of k values,
representing the k classes to which the samples belong. More
specifically, in binary classification problems, the label yᵢ can assume
only two values:
• 0 (representing the negative class)
• and 1 (representing the positive class).
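
For illustration, here is a minimal sketch of such a labeled training set in Python; the feature values are made up for the example:

```python
import numpy as np

# Toy binary-classification training set: each row of X holds the m = 2
# features of one sample, and y holds the corresponding labels.
X = np.array([[1.2, 0.7],
              [0.3, 2.1],
              [2.5, 1.9],
              [0.1, 0.4]])
y = np.array([1, 0, 1, 0])   # 1 = positive class, 0 = negative class
```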
Logistic Regression Model
• Logistic regression is a probabilistic classifier that handles binary
classification problems. Given a sample (x, y), it outputs the
probability p that the sample belongs to the positive class:

p = P(y = 1|x)

• If this probability is higher than some threshold value (typically
chosen as 0.5), the sample is classified as 1; otherwise, it is
classified as 0.
• How does the model estimate the probability p?
Logistic Regression Model
• The basic assumption in logistic regression is that the log-odds of the
event that the sample belongs to the positive class is a linear
combination of its features.

• Log-odds (also called the logit) is the logarithm of the odds, the
ratio between the probability of the event and the probability of its
complement:

logit(p) = log(p / (1 − p))

For example, if p = 0.8, the odds are 0.8/0.2 = 4 and the log-odds
are log 4 ≈ 1.39.
Logistic Regression Model
• In logistic regression, we assume that the log-odds is a linear
combination of the features:

log(p / (1 − p)) = w₀ + w₁x₁ + … + wₘxₘ = wᵗx
• where w = (w₀, w₁, …, wₘ) are the parameters (or weights) of the
model. The parameter w₀ is often called the intercept (or bias). In the
compact form wᵗx, the feature vector x is assumed to be extended with
a constant component x₀ = 1.
• We can express p directly in terms of the parameters w by
exponentiating both sides of the log-odds equation and solving for p:

p / (1 − p) = e^(wᵗx)  ⟹  p = 1 / (1 + e^(−wᵗx)) = σ(wᵗx)

• where σ is the sigmoid function (also known as the logistic function):

σ(z) = 1 / (1 + e^(−z))
In summary, the computational process of logistic regression runs from
the inputs to the final prediction as follows: the inputs x are combined
linearly with the weights w into z = wᵗx, the sigmoid maps z to a
probability p = σ(z), and p is thresholded to produce the predicted class.
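
A minimal sketch of this computation in Python; the weight and feature values below are arbitrary, chosen only for the example:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, x):
    """Probability that sample x belongs to the positive class.
    w[0] is the intercept w0; w[1:] are the feature weights."""
    z = w[0] + np.dot(w[1:], x)   # linear combination z = w^t x
    return sigmoid(z)

def predict(w, x, threshold=0.5):
    """Classify x as 1 if the predicted probability reaches the threshold."""
    return 1 if predict_proba(w, x) >= threshold else 0

w = np.array([-1.0, 0.8, 1.5])   # [w0, w1, w2], arbitrary example weights
x = np.array([1.2, 0.7])
print(predict_proba(w, x))       # ≈ 0.73
print(predict(w, x))             # 1
```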
Log loss
• Our goal is to find the parameters w that will make the model’s
predictions p = σ(wᵗx) as close as possible to the true labels y.
• We need to define a loss function that will measure how far our
model’s predictions are from the true labels. This function needs to
be differentiable, so that it can be optimized using techniques such as
gradient descent.
• The loss function used by logistic regression is called the log
loss (or logistic loss). It is defined as follows:

log loss(y, p) = −(y log p + (1 − y) log(1 − p))

• The log loss equals 0 only in the case of a perfect prediction (p = 1
and y = 1, or p = 0 and y = 0), and approaches infinity as the
prediction gets worse (i.e., when y = 1 and p → 0, or y = 0 and p → 1).
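
A minimal sketch of the log loss in Python; the clipping with a small eps is a numerical-stability detail added here, not part of the formula above:

```python
import numpy as np

def log_loss(y, p, eps=1e-15):
    """Log loss for one sample: true label y (0 or 1) and predicted
    positive-class probability p. p is clipped away from 0 and 1 so
    that log() never receives 0."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(log_loss(1, 0.9))   # good prediction, small loss (≈ 0.105)
print(log_loss(1, 0.1))   # bad prediction, large loss (≈ 2.303)
```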
• The cost function is the average log loss over the n training samples:

J(w) = −(1/n) Σᵢ [yᵢ log pᵢ + (1 − yᵢ) log(1 − pᵢ)], where pᵢ = σ(wᵗxᵢ)
• The gradient of the cost has the simple form
∇J(w) = (1/n) Σᵢ (pᵢ − yᵢ)xᵢ, which gives the gradient update rule:

w ← w − α ∇J(w)

• where α is a learning rate that controls the step size (0 < α < 1).
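
Putting the pieces together, here is a sketch of training by batch gradient descent; the function name, hyperparameter values, and data are illustrative assumptions, not taken from any particular library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, alpha=0.1, n_iters=1000):
    """Fit the weights w = (w0, ..., wm) by batch gradient descent
    on the average log loss. X has shape (n, m), y has shape (n,)."""
    n, m = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])   # prepend x0 = 1 for the intercept
    w = np.zeros(m + 1)
    for _ in range(n_iters):
        p = sigmoid(Xb @ w)                # current predicted probabilities
        grad = Xb.T @ (p - y) / n          # gradient of the cost J(w)
        w -= alpha * grad                  # update rule: w <- w - alpha * grad
    return w

# Usage on the toy data set from earlier
X = np.array([[1.2, 0.7], [0.3, 2.1], [2.5, 1.9], [0.1, 0.4]])
y = np.array([1, 0, 1, 0])
w = fit_logistic_regression(X, y)
print(w)   # learned weights [w0, w1, w2]
```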
