
Machine Learning Basics


AI vs ML vs DL

• Deep Learning is a subset of Machine Learning, which in turn is a subset of Artificial Intelligence.


A Simple Task: Image Classification


Image Classification

• Task: assign a class label to an input image.
• Challenges: occlusions, background clutter, pose, illumination, appearance.
• Representation: how the input image is represented for the classifier.


A Simple Classifier



Nearest Neighbor

• NN classifier: assign the label of the nearest training sample according to a distance function (e.g., NN classifier = dog).
• k-NN classifier: take a majority vote over the k nearest training samples (e.g., k-NN classifier = cat).
• Comparison on a toy dataset: the data, NN classifier, 5-NN classifier.

Questions:
• How does the NN classifier perform on training data?
• Which classifier is more likely to perform best on test data?
• What are we actually learning?

Source: https://commons.wikimedia.org/wiki/File:Data3classes.png
Nearest Neighbor

• Hyperparameters:
  – Distance metric: L1 distance $|x - c|$ or L2 distance $\|x - c\|_2$
  – Number of neighbors: $k$
• These parameters are problem dependent.
• How do we choose these hyperparameters? (A small sketch follows below.)
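To make the k-NN idea concrete, here is a minimal NumPy sketch of a k-NN classifier with an L2 distance on made-up 2D points; it is only an illustration, not the course's reference implementation:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Classify each test point by a majority vote over its k nearest training points (L2 distance)."""
    predictions = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)    # L2 distance to every training sample
        nearest = np.argsort(dists)[:k]                # indices of the k closest samples
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        predictions.append(labels[np.argmax(counts)])  # majority vote
    return np.array(predictions)

# toy usage: 0 = cat, 1 = dog (hypothetical feature vectors)
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([[0.2, 0.1]]), k=3))   # -> [0]
```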


Machine Learning for Classification


Machine Learning

• How can we learn to perform image classification?
  – Task: image classification
  – Experience: data


Machine Learning

• Model: $M_\theta(I) \in \{\text{DOG}, \text{CAT}\}$, mapping an input image $I$ to a class label using the model parameters $\theta$.

• Given training images $I_i$ with labels $Y_i \in \{\text{DOG}, \text{CAT}\}$, find

  $\theta^* = \arg\min_\theta \sum_i D\!\left(M_\theta(I_i) - Y_i\right)$

  where $D$ is a "distance" function.
Basic Recipe for Machine Learning

• Split your data, e.g., 60% / 20% / 20% into train / validation / test.
  – Train set: find the model parameters $\theta$.
  – Validation set: find your hyperparameters.
  – Test set: only used once!
• Other splits are also possible (e.g., 80%/10%/10%).
• A minimal sketch of such a split follows below.
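As an illustration of this recipe, a minimal NumPy sketch (a hypothetical helper, not part of the lecture code) that shuffles a dataset once and splits it 60/20/20:

```python
import numpy as np

def split_data(X, y, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle once, then split into train / validation / test according to `ratios`."""
    assert abs(sum(ratios) - 1.0) < 1e-8
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))                        # random shuffle of sample indices
    n_train = int(ratios[0] * len(X))
    n_val = int(ratios[1] * len(X))
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

# toy usage with made-up data: 100 samples, 4 features each
X, y = np.random.rand(100, 4), np.random.randint(0, 2, size=100)
train_set, val_set, test_set = split_data(X, y)
print(len(train_set[0]), len(val_set[0]), len(test_set[0]))   # 60 20 20
```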


Machine Learning

• How can we learn to perform image classification?
  – Task: image classification
  – Experience: data
  – Performance measure: accuracy


Machine Learning

• Supervised learning: labels or target classes are given (e.g., each training image is labeled CAT or DOG).

• Unsupervised learning:
  – No label or target class.
  – Find out properties of the structure of the data.
  – Clustering (k-means, PCA, etc.)


Machine Learning

• Reinforcement learning: agents interact with an environment and receive a reward.


Linear Decision Boundaries

• Let's start with a simple linear model!
• What are the pros and cons of using linear decision boundaries?


Linear Regression



Linear Regression

• Supervised learning.
• Find a linear model that explains a target $y$ given inputs $x$.

Training: $\{x_{1:n}, y_{1:n}\} \rightarrow$ Learner $\rightarrow \theta$
• Data points: inputs (e.g., image, measurement) and labels (e.g., cat/dog).
• $\theta$: model parameters (these can also be the parameters of a neural network).

Testing: $(x_{n+1}, \theta) \rightarrow$ Predictor $\rightarrow \hat{y}_{n+1}$ (the estimation)
Linear Prediction

• A linear model is expressed in the form

  $\hat{y}_i = \sum_{j=1}^{d} x_{ij}\,\theta_j$

  where $d$ is the input dimension, $x_{ij}$ is the input data (features), and $\theta_j$ are the weights (i.e., the model parameters).

• Adding a bias $\theta_0$:

  $\hat{y}_i = \theta_0 + \sum_{j=1}^{d} x_{ij}\,\theta_j = \theta_0 + x_{i1}\theta_1 + x_{i2}\theta_2 + \dots + x_{id}\theta_d$
Linear Prediction

• Example: predict the temperature of a building from
  – $x_1$: outside temperature (weight $\theta_1$)
  – $x_2$: level of humidity (weight $\theta_2$)
  – $x_3$: number of people (weight $\theta_3$)
  – $x_4$: sun exposure (weight $\theta_4$)


Linear Prediction

$\begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{bmatrix} = \theta_0 + \begin{bmatrix} x_{11} & \cdots & x_{1d} \\ x_{21} & \cdots & x_{2d} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nd} \end{bmatrix} \cdot \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_d \end{bmatrix}$

Absorbing the bias by prepending a column of ones:

$\begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1d} \\ 1 & x_{21} & \cdots & x_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nd} \end{bmatrix} \cdot \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{bmatrix} \;\Rightarrow\; \hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\theta}$

• $\hat{\mathbf{y}}$: prediction
• $\mathbf{X}$: input features (one sample per row, each with $d$ features)
• $\boldsymbol{\theta}$: model parameters ($d$ weights and 1 bias)


Linear Prediction

• Example: temperature of the building, with the MODEL (the parameters $\boldsymbol{\theta}$) on the right:

  $\begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \end{bmatrix} = \begin{bmatrix} 1 & 25 & 50 & 2 & 50 \\ 1 & -10 & 50 & 0 & 10 \end{bmatrix} \cdot \begin{bmatrix} 0.2 \\ 0.64 \\ 0 \\ 1 \\ 0.14 \end{bmatrix}$
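A minimal NumPy check of this prediction, reading the slide's numbers as the design matrix above and $\boldsymbol{\theta} = (0.2, 0.64, 0, 1, 0.14)^T$ (the exact grouping of the printed numbers is an assumption from the slide layout):

```python
import numpy as np

# design matrix X (leading 1 = bias feature) and model parameters θ as read from the example
X = np.array([[1.0,  25.0, 50.0, 2.0, 50.0],
              [1.0, -10.0, 50.0, 0.0, 10.0]])
theta = np.array([0.2, 0.64, 0.0, 1.0, 0.14])

y_hat = X @ theta    # ŷ = Xθ: one predicted building temperature per row of X
print(y_hat)         # [25.2, -4.8] under this reading of the slide's numbers
```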
How to Obtain the Model?

• Data points $\mathbf{X}$ and labels (ground truth) $\mathbf{y}$.
• The model parameters $\boldsymbol{\theta}$ produce an estimation $\hat{\mathbf{y}}$.
• A loss function compares the estimation against the ground truth, and optimization adjusts the model parameters.
How to Obtain the Model?

• Loss function: measures how good my estimation is (how good my model is) and tells the optimization method how to make it better.

• Optimization: changes the model in order to improve the loss function (i.e., to improve my estimation).


Linear Regression: Loss Function

• Prediction: temperature of the building.

• Minimize

  $J(\boldsymbol{\theta}) = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$

  also called the objective function, energy, or cost function.
Optimization: Linear Least Squares

• Linear least squares: an approach to fit a linear model to the data

  $\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$

• Convex problem: there exists a closed-form solution that is unique.

• The estimation comes from the linear model, so for $n$ training samples

  $\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\mathbf{x}_i\boldsymbol{\theta} - y_i\right)^2$

• In matrix notation ($n$ training samples and $n$ labels, each input vector of size $d$; more on matrix notation in the next exercise session):

  $\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = (\mathbf{X}\boldsymbol{\theta} - \mathbf{y})^T(\mathbf{X}\boldsymbol{\theta} - \mathbf{y})$

• Since the problem is convex, the optimum is where $\frac{\partial J(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = 0$.
Optimization

$\frac{\partial J(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = 2\mathbf{X}^T\mathbf{X}\boldsymbol{\theta} - 2\mathbf{X}^T\mathbf{y} = 0$

$\boldsymbol{\theta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$

• Details in the exercise session!
• We have found an analytical solution to a convex problem.
• Inputs: outside temperature, number of people, ...; true output: temperature of the building.
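A minimal NumPy sketch of this closed-form solution on made-up data (the helper name and the synthetic data are assumptions, not the lecture's code); np.linalg.solve is used instead of explicitly inverting $\mathbf{X}^T\mathbf{X}$, which is numerically more stable:

```python
import numpy as np

def fit_linear_least_squares(X_raw, y):
    """Closed-form least squares: θ = (XᵀX)⁻¹ Xᵀy, with a bias column prepended to X."""
    X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])
    return np.linalg.solve(X.T @ X, X.T @ y)   # solves (XᵀX) θ = Xᵀy

# toy usage: 100 made-up samples of [outside temp, humidity, #people, sun exposure]
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(100, 4))
true_theta = np.array([0.2, 0.64, 0.0, 1.0, 0.14])                    # [bias, θ1..θ4]
y = np.hstack([np.ones((100, 1)), X_raw]) @ true_theta + 0.1 * rng.normal(size=100)
print(fit_linear_least_squares(X_raw, y))                              # ≈ true_theta
```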
Is this the best Estimate?

• Least squares estimate:

  $J(\boldsymbol{\theta}) = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$


Maximum Likelihood



Maximum Likelihood Estimate

• $p_{data}(\mathbf{y}|\mathbf{X})$: the true underlying distribution.
• $p_{model}(\mathbf{y}|\mathbf{X}, \boldsymbol{\theta})$: a parametric family of distributions, controlled by the parameter(s) $\boldsymbol{\theta}$.

• A method of estimating the parameters of a statistical model given observations (drawn from $p_{data}(\mathbf{y}|\mathbf{X})$), by finding the parameter values that maximize the likelihood of making the observations given the parameters:

  $\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}}\, p_{model}(\mathbf{y}|\mathbf{X}, \boldsymbol{\theta})$

• MLE assumes that the training samples are independent and generated by the same probability distribution (the "i.i.d." assumption):

  $p_{model}(\mathbf{y}|\mathbf{X}, \boldsymbol{\theta}) = \prod_{i=1}^{n} p_{model}(y_i|\mathbf{x}_i, \boldsymbol{\theta})$


Maximum Likelihood Estimate

$\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}} \prod_{i=1}^{n} p_{model}(y_i|\mathbf{x}_i, \boldsymbol{\theta})$

$\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}} \sum_{i=1}^{n} \log p_{model}(y_i|\mathbf{x}_i, \boldsymbol{\theta})$

• Logarithmic property: $\log(ab) = \log a + \log b$; since $\log$ is monotonically increasing, taking the log does not change the maximizer.
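A tiny NumPy illustration (with made-up per-sample likelihoods) of why the log form is also preferred in practice: the raw product of many likelihoods underflows to zero, while the sum of logs stays finite and has the same maximizer:

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.uniform(0.1, 0.9, size=1000)   # hypothetical per-sample likelihoods p_model(y_i | x_i, θ)

print(np.prod(p))           # ≈ 0.0 — the product of 1000 small numbers underflows
print(np.sum(np.log(p)))    # finite log-likelihood; maximizing it maximizes the product too
```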


Back to Linear Regression

$\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}} \sum_{i=1}^{n} \log p_{model}(y_i|\mathbf{x}_i, \boldsymbol{\theta})$

• What shape does our probability distribution $p(y_i|\mathbf{x}_i, \boldsymbol{\theta})$ have?

• Assume Gaussian noise: $y_i = \mathbf{x}_i\boldsymbol{\theta} + \sigma_i$ with $\sigma_i \sim \mathcal{N}(0, \sigma^2)$, i.e., the mean of the distribution is $\mathbf{x}_i\boldsymbol{\theta}$.

• Gaussian: $y_i \sim \mathcal{N}(\mu, \sigma^2)$ means $p(y_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(y_i - \mu)^2}$

• Therefore $y_i \sim \mathcal{N}(\mathbf{x}_i\boldsymbol{\theta}, \sigma^2)$ and

  $p(y_i|\mathbf{x}_i, \boldsymbol{\theta}) = (2\pi\sigma^2)^{-1/2}\, e^{-\frac{1}{2\sigma^2}(y_i - \mathbf{x}_i\boldsymbol{\theta})^2}$
Back to Linear Regression

• Plugging $p(y_i|\mathbf{x}_i, \boldsymbol{\theta}) = (2\pi\sigma^2)^{-1/2}\, e^{-\frac{1}{2\sigma^2}(y_i - \mathbf{x}_i\boldsymbol{\theta})^2}$ into the original optimization problem

  $\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}} \sum_{i=1}^{n} \log p_{model}(y_i|\mathbf{x}_i, \boldsymbol{\theta})$

  gives

  $\sum_{i=1}^{n} \log\left[(2\pi\sigma^2)^{-\frac{1}{2}}\, e^{-\frac{1}{2\sigma^2}(y_i - \mathbf{x}_i\boldsymbol{\theta})^2}\right]$

• Canceling $\log$ and $e$:

  $\sum_{i=1}^{n} -\frac{1}{2}\log(2\pi\sigma^2) + \sum_{i=1}^{n} -\frac{1}{2\sigma^2}\left(y_i - \mathbf{x}_i\boldsymbol{\theta}\right)^2$

• In matrix notation:

  $-\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\right)^T\left(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\right)$

• How can we find the estimate of theta? Set $\frac{\partial J(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = 0$ (details in the exercise session!):

  $\boldsymbol{\theta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$


Linear Regression

• The Maximum Likelihood Estimate (MLE) corresponds to the Least Squares Estimate (given the assumptions).

• Introduced the concepts of loss function and optimization to obtain the best model for regression.


Image Classification



Regression vs Classification

• Regression: predict a continuous output value (e.g., temperature of a room).

• Classification: predict a discrete value
  – Binary classification: output is either 0 or 1
  – Multi-class classification: set of N classes


Logistic Regression

• CAT classifier


Sigmoid for Binary Predictions

• The inputs $x_0, x_1, x_2$ are weighted by $\theta_0, \theta_1, \theta_2$ and summed ($\Sigma$); the sum is passed through the sigmoid

  $\sigma(x) = \frac{1}{1 + e^{-x}}$

• The output lies between 0 and 1 and can be interpreted as a probability $p(y_i = 1|\mathbf{x}_i, \boldsymbol{\theta})$.


Spoiler Alert: 1-Layer Neural Network

• The same structure (a weighted sum of the inputs followed by a sigmoid, interpreted as $p(y_i = 1|\mathbf{x}_i, \boldsymbol{\theta})$) is exactly a 1-layer neural network.
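A minimal NumPy sketch of this forward pass, with a made-up sample and made-up weights (purely illustrative):

```python
import numpy as np

def sigmoid(x):
    """σ(x) = 1 / (1 + e^(-x)): squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x_i = np.array([1.0, 0.5, -1.2])     # [x0 = 1 (bias input), x1, x2], hypothetical features
theta = np.array([0.1, 2.0, 0.7])    # [θ0, θ1, θ2], hypothetical weights
p_cat = sigmoid(x_i @ theta)         # weighted sum Σ, then sigmoid -> p(y_i = 1 | x_i, θ)
print(p_cat)                         # ≈ 0.56
```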


Logistic Regression

• Probability of a binary output:

  $\hat{\mathbf{y}} = p(\mathbf{y} = 1|\mathbf{X}, \boldsymbol{\theta}) = \prod_{i=1}^{n} p(y_i = 1|\mathbf{x}_i, \boldsymbol{\theta})$

  where $\hat{y}_i = \sigma(\mathbf{x}_i\boldsymbol{\theta})$ is the prediction of our sigmoid.

• Bernoulli trial (the model for coins):

  $p(z|\phi) = \phi^z (1 - \phi)^{1-z} = \begin{cases} \phi, & \text{if } z = 1 \\ 1 - \phi, & \text{if } z = 0 \end{cases}$

• Applying it with the sigmoid prediction $\hat{y}_i$ (continuous) and the true labels $y_i$ (0 or 1):

  $\hat{\mathbf{y}} = \prod_{i=1}^{n} \hat{y}_i^{\,y_i} \left(1 - \hat{y}_i\right)^{(1-y_i)}$
Logistic Regression: Loss Function

• Probability of a binary output:

  $p(\mathbf{y}|\mathbf{X}, \boldsymbol{\theta}) = \hat{\mathbf{y}} = \prod_{i=1}^{n} \hat{y}_i^{\,y_i} \left(1 - \hat{y}_i\right)^{(1-y_i)}$

• Maximum Likelihood Estimate:

  $\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}} \log p(\mathbf{y}|\mathbf{X}, \boldsymbol{\theta})$

• Taking the log:

  $\sum_{i=1}^{n} \log\left[\hat{y}_i^{\,y_i} \left(1 - \hat{y}_i\right)^{(1-y_i)}\right] = \sum_{i=1}^{n} y_i \log\hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)$
Logistic Regression: Loss Function

$\mathcal{L}(\hat{y}_i, y_i) = -\left[y_i \log\hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$

• We maximize the likelihood, $\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}} \log p(\mathbf{y}|\mathbf{X}, \boldsymbol{\theta})$, by minimizing this loss function.

• $y_i = 1$: $\mathcal{L}(\hat{y}_i, 1) = -\log\hat{y}_i$
  To minimize $\mathcal{L}(\hat{y}_i, y_i)$, we want $\log\hat{y}_i$ to be large; since the logarithm is a monotonically increasing function, we want a large $\hat{y}_i$ (1 is the largest value our model's estimate can take!).

• $y_i = 0$: $\mathcal{L}(\hat{y}_i, 0) = -\log(1 - \hat{y}_i)$
  We want $\log(1 - \hat{y}_i)$ to be large, so we want $\hat{y}_i$ to be small (0 is the smallest value our model's estimate can take!).


Logistic Regression: Loss Function

$\mathcal{L}(\hat{y}_i, y_i) = -\left[y_i \log\hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$

• Referred to as the binary cross-entropy loss (BCE).
• Related to the multi-class loss you will see in this course (also called softmax loss).
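A small NumPy sketch of the binary cross-entropy loss as defined above, with made-up labels and sigmoid outputs (the epsilon clipping is a common practical guard against log(0), not part of the formula on the slide):

```python
import numpy as np

def bce_loss(y_hat, y, eps=1e-12):
    """Binary cross-entropy, averaged over the samples."""
    y_hat = np.clip(y_hat, eps, 1 - eps)              # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y     = np.array([1, 0, 1, 0])                        # true labels
y_hat = np.array([0.9, 0.2, 0.7, 0.4])                # hypothetical sigmoid outputs
print(bce_loss(y_hat, y))                             # ≈ 0.30: predictions mostly agree with labels
```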


Logistic Regression: Optimization

• Loss function:

  $\mathcal{L}(\hat{y}_i, y_i) = -\left[y_i \log\hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$

• Cost function (to be minimized), with $\hat{y}_i = \sigma(\mathbf{x}_i\boldsymbol{\theta})$:

  $C(\boldsymbol{\theta}) = \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}(\hat{y}_i, y_i) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log\hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$
Logistic Regression: Optimization

• No closed-form solution.
• Make use of an iterative method → gradient descent (covered later on!).
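Gradient descent is introduced later in the course; purely as a preview, here is a minimal NumPy sketch of logistic regression trained with gradient descent on made-up data. It uses the standard gradient of the averaged BCE cost, $\nabla_{\boldsymbol{\theta}} C = \frac{1}{n}\mathbf{X}^T(\sigma(\mathbf{X}\boldsymbol{\theta}) - \mathbf{y})$, which is not derived on these slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_logistic_regression(X, y, lr=0.1, steps=1000):
    """Minimize the averaged BCE cost with plain gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        y_hat = sigmoid(X @ theta)            # current predictions
        grad = X.T @ (y_hat - y) / len(y)     # gradient of the BCE cost w.r.t. θ
        theta -= lr * grad                    # gradient descent step
    return theta

# toy usage: two made-up 2D clusters, labels 0/1, bias column prepended
rng = np.random.default_rng(0)
X_raw = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
X = np.hstack([np.ones((100, 1)), X_raw])
theta = train_logistic_regression(X, y)
print(((sigmoid(X @ theta) > 0.5) == y).mean())   # training accuracy, should be close to 1
```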


Why is Machine Learning so Cool?

• We can learn from experience
  → intelligence, a certain ability to infer the future!

• Even linear models are often pretty good for complex phenomena, e.g., the weather:
  – A linear combination of time of day, day of year, etc. is often pretty good.


Next Lectures

• Next exercise session: Math Recap II

• Next lecture (Lecture 3):
  – Jumping towards our first Neural Networks and Computational Graphs


References for further Reading

• Cross validation:
  – https://medium.com/@zstern/k-fold-cross-validation-explained-5aeba90ebb3
  – https://towardsdatascience.com/train-test-split-and-cross-validation-in-python-80b61beca4b6

• General Machine Learning book:
  – Pattern Recognition and Machine Learning. C. Bishop.


See you next week ☺
