
Regression and Classification with Linear Models

CMPSCI 383
Nov 15, 2011

Today's topics
  Learning from Examples: brief review
  Univariate Linear Regression
    Batch gradient descent
    Stochastic gradient descent
  Multivariate Linear Regression
  Regularization
  Linear Classifiers
    Perceptron learning rule
  Logistic Regression

Learning from Examples (supervised learning)

Important issues
  Generalization
  Overfitting
  Cross-validation
    Holdout cross-validation
    K-fold cross-validation
    Leave-one-out cross-validation
  Model selection

Recall Notation

Training set: (x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)

where each y_j was generated by an unknown function y = f(x).

Goal: discover a function h (the hypothesis) that best approximates the true function f.

Loss Functions

Suppose the true prediction for input x is f(x) = y, but the hypothesis gives h(x) = \hat{y}.

L(x, y, \hat{y}) = Utility(result of using y given input x)
                 - Utility(result of using \hat{y} given input x)

Simplified version: L(y, \hat{y})

Absolute value loss: L_1(y, \hat{y}) = |y - \hat{y}|
Squared error loss:  L_2(y, \hat{y}) = (y - \hat{y})^2
0/1 loss:            L_{0/1}(y, \hat{y}) = 0 if y = \hat{y}, else 1

Generalization loss: expected loss over all possible examples
Empirical loss: average loss over available examples
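As a small, self-contained illustration (not part of the slides), these losses translate directly into Python; the helper names below are made up for this sketch:

```python
def l1_loss(y, y_hat):
    # Absolute value loss: |y - y_hat|
    return abs(y - y_hat)

def l2_loss(y, y_hat):
    # Squared error loss: (y - y_hat)^2
    return (y - y_hat) ** 2

def l01_loss(y, y_hat):
    # 0/1 loss: 0 if the prediction is exact, else 1
    return 0 if y == y_hat else 1

def empirical_loss(loss_fn, ys, y_hats):
    # Empirical loss: average loss over the available examples
    return sum(loss_fn(y, yh) for y, yh in zip(ys, y_hats)) / len(ys)
```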

Univariate Linear Regression

Univariate Linear Regression contd.

Weight vector: w = [w_0, w_1]

h_w(x) = w_1 x + w_0

Find the weight vector that minimizes empirical loss, e.g., L_2:

Loss(h_w) = \sum_{j=1}^{N} L_2(y_j, h_w(x_j)) = \sum_{j=1}^{N} (y_j - h_w(x_j))^2 = \sum_{j=1}^{N} (y_j - (w_1 x_j + w_0))^2

i.e., find w* such that

w^* = \operatorname{argmin}_w Loss(h_w)

Weight Space

Finding w*

Find weights such that:

\frac{\partial}{\partial w_0} \sum_{j=1}^{N} (y_j - (w_1 x_j + w_0))^2 = 0   and   \frac{\partial}{\partial w_1} \sum_{j=1}^{N} (y_j - (w_1 x_j + w_0))^2 = 0
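Setting both partial derivatives to zero and solving gives the standard closed-form solution for the univariate case (added here for reference; the slide leaves it implicit):

w_1 = \frac{N \sum_j x_j y_j - (\sum_j x_j)(\sum_j y_j)}{N \sum_j x_j^2 - (\sum_j x_j)^2}, \qquad w_0 = \frac{\sum_j y_j - w_1 \sum_j x_j}{N}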

Gradient Descent

w_i \leftarrow w_i - \alpha \frac{\partial}{\partial w_i} Loss(w)

where \alpha is the step size or learning rate.

Gradient Descent contd.

For one training example (x, y):

w_0 \leftarrow w_0 + \alpha (y - h_w(x))   and   w_1 \leftarrow w_1 + \alpha (y - h_w(x)) x

For N training examples:

w_0 \leftarrow w_0 + \alpha \sum_j (y_j - h_w(x_j))   and   w_1 \leftarrow w_1 + \alpha \sum_j (y_j - h_w(x_j)) x_j

This is batch gradient descent.

Stochastic gradient descent: take a step for one training example at a time, as in the sketch below.
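A minimal Python sketch of both update schemes, assuming a tiny synthetic dataset and a hand-picked learning rate alpha (both are illustrative choices, not values from the lecture):

```python
import random

def predict(w, x):
    # h_w(x) = w1 * x + w0
    return w[1] * x + w[0]

def batch_gd(xs, ys, alpha=0.01, epochs=1000):
    """Batch gradient descent: accumulate the error over all N examples, then step."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        err = [y - predict(w, x) for x, y in zip(xs, ys)]
        w[0] += alpha * sum(err)
        w[1] += alpha * sum(e * x for e, x in zip(err, xs))
    return w

def stochastic_gd(xs, ys, alpha=0.01, epochs=100):
    """Stochastic gradient descent: one step per training example."""
    w = [0.0, 0.0]
    data = list(zip(xs, ys))
    for _ in range(epochs):
        random.shuffle(data)
        for x, y in data:
            e = y - predict(w, x)
            w[0] += alpha * e
            w[1] += alpha * e * x
    return w

# Toy data generated from y = 2x + 1 (an illustrative assumption)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
print(batch_gd(xs, ys))       # weights should approach [1.0, 2.0]
print(stochastic_gd(xs, ys))
```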

The Multivariate case

h_{sw}(x_j) = w_0 + w_1 x_{j,1} + \cdots + w_n x_{j,n} = w_0 + \sum_i w_i x_{j,i}

Augmented vectors: add a feature to each x by tacking on a 1: x_{j,0} = 1

Then:

h_{sw}(x_j) = w \cdot x_j = w^T x_j = \sum_i w_i x_{j,i}

And the batch gradient descent update becomes:

w_i \leftarrow w_i + \alpha \sum_j (y_j - h_w(x_j)) x_{j,i}
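A short NumPy sketch of this update; the augmented column of 1s and the learning rate are assumptions made for illustration:

```python
import numpy as np

def batch_gd_multivariate(X, y, alpha=0.01, epochs=1000):
    """Batch gradient descent for multivariate linear regression.
    X is an (N, n) data matrix; a column of 1s is prepended so that
    w[0] plays the role of the intercept (the augmented-vector trick)."""
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])   # x_{j,0} = 1
    w = np.zeros(X_aug.shape[1])
    for _ in range(epochs):
        errors = y - X_aug @ w          # y_j - h_w(x_j) for every example
        w += alpha * X_aug.T @ errors   # w_i += alpha * sum_j error_j * x_{j,i}
    return w
```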

The Multivariate case contd.

Or, solving analytically: let y be the vector of outputs for the training examples, and let X be the data matrix, in which each row is an input vector. Then

y = X w

Solving this for w^*:

w^* = (X^T X)^{-1} X^T y

where (X^T X)^{-1} X^T is the pseudoinverse of X.
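The analytic solution translates almost directly into NumPy; this sketch assumes X already carries the augmented column of 1s:

```python
import numpy as np

def fit_normal_equations(X, y):
    """Closed-form least-squares fit: w* = (X^T X)^{-1} X^T y."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y
    # In practice, np.linalg.lstsq(X, y, rcond=None)[0] is more
    # numerically stable than forming X^T X explicitly.
```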

Regularization

Cost(h) = EmpLoss(h) + \lambda Complexity(h)

Complexity(h_w) = L_q(w) = \sum_i |w_i|^q
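A sketch of the regularized cost for a linear hypothesis; lambda and q are illustrative hyperparameters, not values given on the slides:

```python
import numpy as np

def regularized_cost(w, X, y, lam=0.1, q=2):
    """EmpLoss (squared error) plus lambda times the L_q complexity penalty."""
    emp_loss = np.sum((y - X @ w) ** 2)       # empirical L2 loss
    complexity = np.sum(np.abs(w) ** q)       # L_q(w) = sum_i |w_i|^q
    return emp_loss + lam * complexity
```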

L1 vs. L2 Regularization

Linear Classification: hard thresholds

Linear Classification: hard thresholds contd.

Decision boundary:
  In the linear case: a linear separator, a hyperplane.

Linearly separable:
  Data is linearly separable if the classes can be separated by a linear separator.

Classification hypothesis:
  h_w(x) = Threshold(w \cdot x), where Threshold(z) = 1 if z \geq 0 and 0 otherwise

Perceptron Learning Rule

For a single sample (x, y):

w_i \leftarrow w_i + \alpha (y - h_w(x)) x_i

If the output is correct, i.e., y = h_w(x), then the weights don't change.
If y = 1 but h_w(x) = 0, then w_i is increased when x_i is positive and decreased when x_i is negative.
If y = 0 but h_w(x) = 1, then w_i is decreased when x_i is positive and increased when x_i is negative.

Perceptron Convergence Theorem: For any data set that's linearly separable and any training procedure that continues to present each training example, the learning rule is guaranteed to find a solution in a finite number of steps.
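A compact Python sketch of the rule; the OR dataset and the learning rate are hypothetical choices made for illustration:

```python
import numpy as np

def threshold(z):
    # Hard threshold: 1 if z >= 0, else 0
    return 1 if z >= 0 else 0

def train_perceptron(X, y, alpha=0.1, epochs=100):
    """Perceptron learning rule on an augmented data matrix X
    (first column all 1s, so w[0] acts as the bias weight)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_j, y_j in zip(X, y):
            h = threshold(w @ x_j)
            # Weights change only when the prediction is wrong.
            w += alpha * (y_j - h) * x_j
    return w

# Hypothetical toy problem: learn the OR function (linearly separable).
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])  # leading 1 = bias
y = np.array([0, 1, 1, 1])
w = train_perceptron(X, y)
print([threshold(w @ x) for x in X])   # expected: [0, 1, 1, 1]
```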

Perceptron Performance

Linear Classification with Logistic Regression

An important function: the logistic (sigmoid) function, plotted on the slide.

Logistic Regression

h_w(x) = Logistic(w \cdot x) = \frac{1}{1 + e^{-w \cdot x}}

For a single sample (x, y) and the L_2 loss function:

w_i \leftarrow w_i + \alpha (y - h_w(x)) h_w(x) (1 - h_w(x)) x_i

The factor h_w(x)(1 - h_w(x)) is the derivative of the logistic function.
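The corresponding update in Python, again on an augmented data matrix; alpha and the epoch count are illustrative:

```python
import numpy as np

def logistic(z):
    # Logistic(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, alpha=0.5, epochs=1000):
    """Gradient descent on the L2 loss with a logistic hypothesis.
    X is an augmented data matrix (leading column of 1s)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_j, y_j in zip(X, y):
            h = logistic(w @ x_j)
            # The update includes h*(1-h), the derivative of the logistic function.
            w += alpha * (y_j - h) * h * (1 - h) * x_j
    return w
```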

Logistic Regression Performance

(separable case)

Summary
  Learning from Examples: brief review
    Loss functions
    Generalization
    Overfitting
    Cross-validation
    Regularization
  Univariate Linear Regression
    Batch gradient descent
    Stochastic gradient descent
  Multivariate Linear Regression
  Regularization
  Linear Classifiers
    Perceptron learning rule
  Logistic Regression

Next Class

Artificial Neural Networks, Nonparametric Models, & Support Vector Machines
Secs. 18.7–18.9
