
Everything you’ve ever wanted to know about linear classifiers

Outline
• Formalization of statistical learning of classifiers
• Ways to train linear classifiers
  - Linear regression
  - Perceptron training algorithm
  - Logistic regression
  - Support vector machines
• Gradient descent and stochastic gradient descent
Formalization
• Let’s focus on statistical learning of a parametric model in a supervised scenario

Formalization
• Given: training data $\{(x_i, y_i) : 1 \le i \le n\}$
• Find: predictor $y = f(x)$
• Goal: make good predictions on test data

Source: Y. Liang
Formalization
• Given: training data $\{(x_i, y_i) : 1 \le i \le n\}$
• Find: predictor $y = f(x)$
• Goal: make good predictions on test data

What kinds of functions?

Source: Y. Liang
Formalization
• Given: training data $\{(x_i, y_i) : 1 \le i \le n\}$
• Find: predictor $y = f(x) \in \mathcal{H}$
• Goal: make good predictions on test data

Hypothesis class

Source: Y. Liang
Formalization
• Given: training data $\{(x_i, y_i) : 1 \le i \le n\}$
• Find: predictor $y = f(x) \in \mathcal{H}$
• Goal: make good predictions on test data

Connection between training and test data?

Source: Y. Liang
Formalization
• Given: training data $\{(x_i, y_i) : 1 \le i \le n\}$ i.i.d. from a distribution $P(x, y)$
• Find: predictor $y = f(x) \in \mathcal{H}$
• Goal: make good predictions on test data i.i.d. from the same distribution

Same distribution

Source: Y. Liang
Formalization
• Given: training data $\{(x_i, y_i) : 1 \le i \le n\}$ i.i.d. from a distribution $P(x, y)$
• Find: predictor $y = f(x) \in \mathcal{H}$
• Goal: make good predictions on test data i.i.d. from the same distribution

What kind of performance measure?

Source: Y. Liang
Formalization
• Given: training data i.i.d. from a distribution
• Find: predictor
• s.t. the expected loss is small:

$L(f) = \mathbb{E}_{x,y}[\ell(f, x, y)]$

Various loss functions

Source: Y. Liang
Formalization
• Given: training data i.i.d. from a distribution
• Find: predictor
• s.t. the expected loss is small

• Example losses:
  - 0-1 loss: $\ell(f, x, y) = \mathbb{I}[f(x) \ne y]$ and $L(f) = \Pr[f(x) \ne y]$
  - Squared error loss: $\ell(f, x, y) = [f(x) - y]^2$ and $L(f) = \mathbb{E}[f(x) - y]^2$

Source: Y. Liang
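
To make these two examples concrete, here is a minimal NumPy sketch of the per-example losses (illustrative code, not from the slides; y_pred stands for the predictions f(x)):

```python
import numpy as np

def zero_one_loss(y_pred, y):
    """Per-example 0-1 loss: 1 if f(x) != y, else 0."""
    return (y_pred != y).astype(float)

def squared_error_loss(y_pred, y):
    """Per-example squared error: [f(x) - y]^2."""
    return (y_pred - y) ** 2
```

Averaging zero_one_loss over a sample estimates $\Pr[f(x) \ne y]$; averaging squared_error_loss estimates $\mathbb{E}[f(x) - y]^2$.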
Formalization
• Given: training data i.i.d. from a distribution
• Find: predictor
• s.t. the expected loss is small

Can’t optimize this directly: the data distribution is unknown

Source: Y. Liang
Formalization
• Given: training data i.i.d. from a distribution
• Find: predictor
• s.t. the empirical loss on the training data is small:

$\hat{L}(f) = \frac{1}{n} \sum_{i=1}^{n} \ell(f, x_i, y_i)$

Source: Y. Liang
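
Since only a sample is available, the expectation is replaced by a training-set average. A minimal sketch reusing the per-example losses above (assuming a vectorized predictor f; names are illustrative):

```python
import numpy as np

def empirical_loss(per_example_loss, f, X, y):
    """(1/n) * sum_i loss(f(x_i), y_i): the training-set average loss."""
    return np.mean(per_example_loss(f(X), y))
```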
Supervised learning in a nutshell
1. Collect training data and labels
2. Specify model: select hypothesis class and loss function
3. Train model: find the function in the hypothesis class that minimizes the empirical loss on the training data (a sketch follows below)
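
As one illustration of step 3 (a minimal sketch, assuming a linear model $f(x) = w^\top x$, the squared error loss, and arbitrary learning-rate and iteration choices; this is not the slides' code):

```python
import numpy as np

def train_linear(X, y, lr=0.1, steps=1000):
    """Fit w for f(x) = w . x by gradient descent on the empirical squared error."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        residual = X @ w - y                 # f(x_i) - y_i for every example
        grad = (2.0 / n) * (X.T @ residual)  # gradient of (1/n) * sum [f(x_i) - y_i]^2
        w -= lr * grad                       # step along the negative gradient
    return w
```

For classification, the learned w is used through $\operatorname{sgn}(w^\top x)$; the outline's stochastic gradient descent variant replaces the full-batch gradient with a per-example one.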
Training linear classifiers
• Given: i.i.d. training data $\{(x_i, y_i) : 1 \le i \le n\}$

• Hypothesis class: linear classifiers $f(x) = \operatorname{sgn}(w^\top x)$

• Classification with bias, i.e. $f(x) = \operatorname{sgn}(w^\top x + b)$, can be reduced to the case w/o bias by letting $x \leftarrow [x;\, 1]$ and $w \leftarrow [w;\, b]$

Source: Y. Liang
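
That reduction is easy to implement: append a constant 1 feature to every input and the bias becomes one more weight. A minimal NumPy sketch (illustrative, not from the slides):

```python
import numpy as np

def absorb_bias(X, w, b):
    """Fold the bias b into the weights: [w; b] . [x; 1] = w . x + b."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # append a constant-1 feature
    w_aug = np.append(w, b)                           # append the bias as the last weight
    return X_aug, w_aug
```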
