CH 1
Introduction
What is Machine Learning?
Input spaces can be large
Data visualization
Supervised Learning
f(x; θ) = Setosa,                   if petal length < 2.45
          Versicolor or Virginica,  otherwise
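The rule above can be sketched as a tiny function; the 2.45 cm threshold is from the text, while the sample inputs are illustrative:

```python
# Decision stump on a single feature (petal length, in cm).
def classify_iris(petal_length_cm):
    """Coarse iris label from petal length alone (threshold from the text)."""
    if petal_length_cm < 2.45:
        return "Setosa"
    return "Versicolor or Virginica"

short_label = classify_iris(1.4)  # a typical Setosa petal length
long_label = classify_iris(4.7)
```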
Empirical risk
Misclassification rate (0-1 loss):

L(θ) ≜ (1/N) ∑_{n=1}^N I(y_n ≠ f(x_n; θ))

More generally, for an arbitrary loss ℓ:

L(θ) ≜ (1/N) ∑_{n=1}^N ℓ(y_n, f(x_n; θ))
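Both definitions can be sketched in plain Python; the predictor, loss, and toy data below are illustrative choices, not from the text:

```python
# Empirical risk: average loss over N training pairs.
def empirical_risk(pairs, predict, loss):
    return sum(loss(y, predict(x)) for x, y in pairs) / len(pairs)

# With the 0-1 loss, the empirical risk is the misclassification rate.
zero_one = lambda y, yhat: float(y != yhat)

# Toy 1-D data and a predictor that thresholds x at 0 (hypothetical).
pairs = [(-1.0, 0), (0.5, 1), (2.0, 1), (-0.3, 1)]
predict = lambda x: int(x >= 0)

risk = empirical_risk(pairs, predict, zero_one)  # 1 mistake out of 4
```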
Training
θ* = argmin_θ L(θ) = argmin_θ (1/N) ∑_{n=1}^N ℓ(y_n, f(x_n; θ))
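Training as empirical risk minimization can be illustrated with a crude grid search over a single threshold parameter; the 1-D model and data here are hypothetical, and real models use gradient-based optimizers instead:

```python
# Empirical risk of a threshold classifier x >= theta on labeled data.
def risk(theta, data):
    return sum(y != (x >= theta) for x, y in data) / len(data)

# Toy data: negatives below 2.5, positives above (hypothetical).
data = [(1.0, False), (2.0, False), (3.0, True), (4.0, True)]

# Grid search over candidate thresholds in [0, 5].
candidates = [t / 10 for t in range(0, 51)]
theta_star = min(candidates, key=lambda t: risk(t, data))
```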
f(x; θ) = b + wᵀx = b + w_1 x_1 + w_2 x_2 + ⋯ + w_D x_D
θ = (b, w) are the model parameters, known as the bias and the weights
• to reduce notational clutter, we often absorb the bias term into the weights by defining w̃ = [b, w_1, ⋯, w_D] and x̃ = [1, x_1, ⋯, x_D], so that w̃ᵀx̃ = b + wᵀx
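The absorption trick can be checked numerically; a minimal NumPy sketch, with illustrative values for b, w, and x:

```python
import numpy as np

# Absorb the bias b into the weight vector: w_tilde = [b, w], x_tilde = [1, x].
b = 0.5
w = np.array([2.0, -1.0, 3.0])  # illustrative weights
x = np.array([1.0, 4.0, 0.5])   # illustrative input

w_tilde = np.concatenate(([b], w))
x_tilde = np.concatenate(([1.0], x))

plain = b + w @ x            # b + w^T x
absorbed = w_tilde @ x_tilde  # w_tilde^T x_tilde: same value
```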
• it is common to use the log loss
  ℓ(y, f(x; θ)) = − log p(y | f(x; θ)),
  which is small when the model assigns a high probability to the true output y for each corresponding input x
• the average negative log probability of the training set,
  NLL(θ) = −(1/N) ∑_{n=1}^N log p(y_n | f(x_n; θ)),
  is known as the negative log likelihood
• the maximum likelihood estimate is computed by minimizing the NLL:
  θ*_mle = argmin_θ NLL(θ)
Regression
MSE(θ) = (1/N) ∑_{n=1}^N (y_n − f(x_n; θ))²
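The MSE is a one-liner in plain Python; the targets and predictions below are illustrative:

```python
# Mean squared error over N examples.
def mse(y_true, y_pred):
    return sum((y - yhat) ** 2 for y, yhat in zip(y_true, y_pred)) / len(y_true)

err = mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])  # errors: 0.5, 0, -1.0
```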
Uncertainty in regression tasks
• f(x; θ) = b + wx
• least squares solution: θ* = argmin_θ MSE(θ)
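A minimal sketch of the least squares fit, using NumPy's `lstsq` on noise-free toy data generated from y = 1 + 2x (my choice of parameters, not from the text):

```python
import numpy as np

# Toy data from y = 1 + 2x, so least squares should recover
# b = 1 and w = 2 (up to floating point).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Design matrix with a column of ones to absorb the bias, as above.
X = np.column_stack([np.ones_like(x), x])
b_hat, w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
```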
• if we have multiple input features, we can write f(x; θ) = b + wᵀx
Multiple linear regression
L(θ; D_train) = (1/|D_train|) ∑_{(x,y)∈D_train} ℓ(y, f(x; θ))
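The average training loss above can be sketched with squared loss and a two-feature linear model; the parameters and data are hypothetical, not fitted:

```python
# Two-feature linear model f(x; theta) = b + w^T x (hypothetical parameters).
theta_b = 0.5
theta_w = [1.0, -2.0]
f = lambda x: theta_b + sum(wi * xi for wi, xi in zip(theta_w, x))

# Toy training set D_train of (x, y) pairs (hypothetical).
D_train = [([1.0, 0.0], 1.5),
           ([0.0, 1.0], -1.5),
           ([2.0, 1.0], 0.0)]

# Average squared loss over the training set.
avg_loss = sum((y - f(x)) ** 2 for x, y in D_train) / len(D_train)
```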