Lecture 2: Linear Regression
Basics
[Figure: nested sets – Deep Learning is a subset of Machine Learning, which is a subset of Artificial Intelligence]
Task: image classification. Challenges include occlusions and background clutter – how do we choose a good representation?
Nearest Neighbor
• The NN classifier assigns the label of the closest training sample – here, dog (see the sketch below).
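A minimal NumPy sketch of a 1-NN classifier, as one way to make this concrete (the function name and the L2 distance are assumptions; the slides do not fix a metric):

```python
import numpy as np

def nn_classify(x_train, y_train, x_query):
    """Classify a query by the label of its nearest training sample (1-NN)."""
    # L2 distance between the query and every training point
    dists = np.linalg.norm(x_train - x_query, axis=1)
    return y_train[np.argmin(dists)]

# Toy example with 2-D feature vectors and string labels
x_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_train = np.array(["cat", "cat", "dog"])
print(nn_classify(x_train, y_train, np.array([4.0, 4.5])))  # -> "dog"
```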
Machine learning setup: the task is image classification, and the experience is labeled data – images $I_i$ with labels $Y_i \in \{\text{DOG}, \text{CAT}\}$. A model $M_\theta$ with parameters $\theta$ maps images to predictions; training finds

$$\theta^{*} = \arg\min_{\theta} \sum_i D\!\left(M_\theta(I_i) - Y_i\right)$$

where $D$ is a "distance" function comparing predictions to the ground-truth labels in $\{\text{DOG}, \text{CAT}\}$.
Basic Recipe for Machine Learning
• Split your data into training, validation, and test sets (see the sketch after this list).
• Task: image classification; Experience: labeled data; Performance measure: accuracy.
[Figure: labeled DOG/CAT images divided into training, validation, and test splits]
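A minimal sketch of such a split plus the accuracy measure (the 60/20/20 ratios and the function names are illustrative assumptions, not from the slides):

```python
import numpy as np

def split_data(x, y, train=0.6, val=0.2, seed=0):
    """Shuffle and split the data into training, validation, and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_train = int(train * len(x))
    n_val = int(val * len(x))
    tr, va, te = np.split(idx, [n_train, n_train + n_val])
    return (x[tr], y[tr]), (x[va], y[va]), (x[te], y[te])

def accuracy(y_pred, y_true):
    """Performance measure: fraction of correct predictions."""
    return np.mean(y_pred == y_true)
```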
Reinforcement Learning
• Agents interact with an environment and learn from a reward signal.
Linear Regression
• Training: learn the weights from observed data. Testing: apply the learned model to new inputs.

$$\hat{y}_i = \theta_0 + \sum_{j=1}^{d} x_{ij}\,\theta_j$$

where $\theta_0, \dots, \theta_d$ are the weights (i.e., model parameters).
[Figure: 1-D data points with a fitted line; intercept $\theta_0$, input axis $x$]
Linear Prediction
• Example: predict the temperature of a building from four inputs: $x_1$ = outside temperature (weight $\theta_1$), $x_2$ = level of humidity ($\theta_2$), $x_3$ = number of people ($\theta_3$), $x_4$ = sun exposure ($\theta_4$).
• With the model $\boldsymbol{\theta} = (0.2,\, 0.64,\, 0,\, 1,\, 0.14)^T$ (bias first):

$$\begin{pmatrix}\hat{y}_1\\ \hat{y}_2\end{pmatrix} = \begin{pmatrix}1 & 25 & 50 & 2 & 50\\ 1 & -10 & 50 & 0 & 10\end{pmatrix} \cdot \begin{pmatrix}0.2\\ 0.64\\ 0\\ 1\\ 0.14\end{pmatrix}$$
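As a quick check, a minimal NumPy sketch of this computation (the weight vector $(0.2, 0.64, 0, 1, 0.14)$ and the feature order are read off the slide's matrix example):

```python
import numpy as np

# Each row is (1, outside temp, humidity, #people, sun exposure);
# the leading 1 multiplies the bias theta_0.
X = np.array([[1.0,  25.0, 50.0, 2.0, 50.0],
              [1.0, -10.0, 50.0, 0.0, 10.0]])
theta = np.array([0.2, 0.64, 0.0, 1.0, 0.14])  # model weights from the slide

y_hat = X @ theta  # predicted building temperatures
print(y_hat)       # approx. [25.2, -4.8]
```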
How to Obtain the Model?
• Given data points $\mathbf{X}$ and labels (ground truth) $\mathbf{y}$; the prediction is the temperature of the building.
• Optimization: find the weights by minimizing a loss function over the training set:

$$J(\boldsymbol{\theta}) = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$

This is also called the objective, cost, or energy function.
Optimization: Linear Least Squares
• Linear least squares: an approach to fit a linear model to the data

$$\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$

• In matrix notation:

$$\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = (\mathbf{X}\boldsymbol{\theta} - \mathbf{y})^T(\mathbf{X}\boldsymbol{\theta} - \mathbf{y})$$

• $J(\boldsymbol{\theta})$ is convex, so the optimum lies where the derivative vanishes: $\frac{\partial J(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = 0$.
Optimization (details in the exercise session!)

$$\frac{\partial J(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = 2\mathbf{X}^T\mathbf{X}\boldsymbol{\theta} - 2\mathbf{X}^T\mathbf{y} = 0$$

$$\boldsymbol{\theta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
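A minimal NumPy sketch of this closed-form solution (function names are mine; it solves the normal equations with `np.linalg.solve` rather than inverting $\mathbf{X}^T\mathbf{X}$ explicitly, a standard choice for numerical stability):

```python
import numpy as np

def fit_least_squares(X, y):
    """Closed-form linear least squares: solve X^T X theta = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Sanity check: recover known weights from noise-free synthetic data
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])  # bias column first
theta_true = np.array([0.5, 2.0, -1.0])
y = X @ theta_true
print(fit_least_squares(X, y))  # approx. [0.5, 2.0, -1.0]
```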
Maximum Likelihood
• The model defines a distribution $p_{\text{model}}(\mathbf{y}|\mathbf{X}, \boldsymbol{\theta})$, controlled by the parameter(s) $\boldsymbol{\theta}$, under the "i.i.d." assumption on the samples.
• Gaussian:

$$p(y_i) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}(y_i - \mu)^2}, \qquad y_i \sim \mathcal{N}(\mu, \sigma^2)$$
Back to Linear Regression
• Assume Gaussian noise: the linear prediction $\mathbf{x}_i\boldsymbol{\theta}$ becomes the mean of the Gaussian, so

$$p(y_i|\mathbf{x}_i, \boldsymbol{\theta}) = (2\pi\sigma^2)^{-1/2}\, e^{-\frac{1}{2\sigma^2}(y_i - \mathbf{x}_i\boldsymbol{\theta})^2}$$
• Original optimization problem (maximum likelihood):

$$\boldsymbol{\theta}_{ML} = \arg\max_{\boldsymbol{\theta}} \sum_{i=1}^{n} \log p_{\text{model}}(y_i|\mathbf{x}_i, \boldsymbol{\theta})$$

$$= \sum_{i=1}^{n} -\frac{1}{2}\log\left(2\pi\sigma^2\right) \;+\; \sum_{i=1}^{n} -\frac{1}{2\sigma^2}\left(y_i - \mathbf{x}_i\boldsymbol{\theta}\right)^2$$

• Matrix notation:

$$= -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\left(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\right)^T\left(\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\right)$$

• The first term is constant in $\boldsymbol{\theta}$, so maximizing the log-likelihood is equivalent to minimizing the least-squares objective.
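A small numeric sanity check of this equivalence (function names are mine): for $\sigma = 1$, differences of the Gaussian negative log-likelihood equal half the differences of the sum-of-squares cost, so both objectives share the same minimizer.

```python
import numpy as np

def neg_log_likelihood(theta, X, y, sigma=1.0):
    """Gaussian negative log-likelihood under y_i ~ N(x_i theta, sigma^2)."""
    n = len(y)
    residual = y - X @ theta
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + residual @ residual / (2 * sigma**2)

def sum_of_squares(theta, X, y):
    residual = y - X @ theta
    return residual @ residual

# The two objectives differ only by an affine transform: compare their
# changes between two arbitrary parameter vectors.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)
t1, t2 = rng.normal(size=3), rng.normal(size=3)
d_nll = neg_log_likelihood(t2, X, y) - neg_log_likelihood(t1, X, y)
d_ss = sum_of_squares(t2, X, y) - sum_of_squares(t1, X, y)
print(np.isclose(d_nll, d_ss / 2))  # True for sigma = 1
```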
• Setting the derivative of the log-likelihood to zero recovers the least-squares solution:

$$\boldsymbol{\theta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

Logistic Regression
[Figure: inputs $x_1, x_2$ weighted by $\theta_1, \theta_2$ are summed and passed through a sigmoid; the output can be interpreted as a probability $p(y_i = 1|\mathbf{x}_i, \boldsymbol{\theta})$]
• Bernoulli trial (model for coins), where $\phi$ is the prediction of our sigmoid:

$$p(z|\phi) = \phi^z(1-\phi)^{1-z} = \begin{cases}\phi, & \text{if } z = 1\\ 1-\phi, & \text{if } z = 0\end{cases}$$
• Probability of a binary output:

$$\hat{\mathbf{y}} = p(\mathbf{y} = 1|\mathbf{X}, \boldsymbol{\theta}) = \prod_{i=1}^{n} p(y_i = 1|\mathbf{x}_i, \boldsymbol{\theta})$$

• With the Bernoulli model for coins:

$$\hat{\mathbf{y}} = \prod_{i=1}^{n} \hat{y}_i^{\,y_i}\left(1 - \hat{y}_i\right)^{(1-y_i)}$$

• Cost function, with $\hat{y}_i = \sigma(\mathbf{x}_i\boldsymbol{\theta})$:

$$C(\boldsymbol{\theta}) = -\frac{1}{n}\sum_{i=1}^{n} \mathcal{L}\left(\hat{y}_i, y_i\right)$$

• Minimization: gradient descent – later on!
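A minimal sketch of the forward pass and the cost, assuming the binary cross-entropy form $\mathcal{L}(\hat{y}_i, y_i) = y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)$ that follows from the Bernoulli likelihood above (the slides only give the generic $\mathcal{L}$; function names are mine):

```python
import numpy as np

def sigmoid(z):
    """Squash a real number into (0, 1) so it can be read as a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y, eps=1e-12):
    """Binary cross-entropy: C = -1/n * sum(y log y_hat + (1-y) log(1-y_hat)).

    eps guards the logarithms against y_hat hitting exactly 0 or 1.
    """
    y_hat = sigmoid(X @ theta)
    return -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))

# Toy usage: two samples, two features, binary labels
X = np.array([[1.0, 2.0], [1.0, -1.0]])
y = np.array([1.0, 0.0])
print(cost(np.array([0.0, 1.0]), X, y))  # approx. 0.22
```

Minimizing this cost has no closed-form solution, which is why gradient descent is introduced in a later lecture.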