Introml 02 Regression Annotated PDF
Machine Learning
Linear Regression
Basic Supervised Learning Pipeline
[Figure: spam-filter pipeline — labeled training examples (“spam” / “ham”) feed a learning method, which fits a classifier f : X → Y (model fitting); the classifier is then applied to new inputs (prediction / generalization).]
Regression
Instance of supervised learning
Goal: Predict real valued labels (possibly vectors)
Examples:
X → Y
Flight route → Delay (minutes)
Real estate objects → Price
Customer & ad features → Click-through probability
Running example: Diabetes
[Efron et al. '04]
Features X:
Age
Sex
Body mass index
Average blood pressure
Six blood serum measurements (S1-S6)
Label (target) Y: quantitative measure of disease
progression
Regression
[Figure: scatter plot of training points over input x and label y]
How should we measure goodness of fit?
Example: linear regression
[Figure: scatter plot of training points with a fitted regression line]
Homogeneous representation
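A common convention, sketched here as an illustration: the affine model f(x) = w^T x + b can be turned into a purely linear one by appending a constant-1 entry to every input, so the offset b becomes just another weight:

```latex
f(x) = w^\top x + b = \tilde{w}^\top \tilde{x},
\qquad
\tilde{x} = \begin{pmatrix} x \\ 1 \end{pmatrix},
\quad
\tilde{w} = \begin{pmatrix} w \\ b \end{pmatrix}.
```

With this trick, the least-squares formulas need no separate intercept term.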
Quantifying goodness of fit
Data set D = {(x_1, y_1), ..., (x_n, y_n)}, with inputs x_i ∈ R^d and labels y_i ∈ R.

[Figure: scatter plot of the data with a candidate fit]
Least-squares linear regression optimization
[Legendre 1805, Gauss 1809]
Given data set D = {(x_1, y_1), ..., (x_n, y_n)}:

w* = arg min_w Σ_{i=1}^n (y_i − w^T x_i)^2
How to solve? Example: Scikit Learn
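A minimal sketch of least-squares linear regression with scikit-learn. The data here is a tiny synthetic stand-in (y = 2x + 1 exactly), not the diabetes data from the slides:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data: y = 2*x + 1 exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()  # fits an intercept by default
model.fit(X, y)

print(model.coef_)                        # learned weight(s)
print(model.intercept_)                   # learned offset
print(model.predict(np.array([[4.0]])))   # prediction for a new input
```

Because the data is noise-free, the fit recovers the slope 2 and intercept 1 exactly.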
Demo
Disease progression
w* = (X^T X)^{-1} X^T y

Hereby: X is the n × d matrix whose rows are the inputs x_i^T, and y is the vector of labels (y_1, ..., y_n).
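A minimal NumPy sketch of this closed-form solution on synthetic data. Solving the normal equations with `np.linalg.solve` is preferred over forming the matrix inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))           # n = 50 samples, d = 3 features
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                         # noise-free, so the fit is exact

# Closed-form least squares: w* = (X^T X)^{-1} X^T y,
# computed by solving (X^T X) w = X^T y.
w_star = np.linalg.solve(X.T @ X, X.T @ y)
print(w_star)
```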
Method 2: Optimization
The objective function R̂(w) = Σ_i (y_i − w^T x_i)^2 is convex!
Gradient Descent
Start at an arbitrary w_0 ∈ R^d
For t = 1, 2, ... do: w_{t+1} = w_t − η_t ∇R̂(w_t)
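A sketch of this update rule for the squared-error objective, with a fixed step size and synthetic data (both chosen here for illustration):

```python
import numpy as np

def gradient_descent(X, y, eta=0.002, steps=5000):
    """Minimize R(w) = sum_i (y_i - w^T x_i)^2 by gradient descent."""
    w = np.zeros(X.shape[1])                # arbitrary start w_0
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y)      # gradient of the squared-error objective
        w = w - eta * grad                  # w_{t+1} = w_t - eta * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
w_true = np.array([3.0, -1.0])
y = X @ w_true
print(gradient_descent(X, y))
```

With noise-free data and a small enough step size, the iterates approach the least-squares optimum.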
Convergence of gradient descent
Under mild assumptions, if the step size is sufficiently small, gradient descent converges to a stationary point (gradient = 0).
For convex objectives, it therefore finds an optimal solution!
Computing the gradient
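For the squared-error objective, the gradient is ∇R̂(w) = −2 Σ_i x_i (y_i − w^T x_i) = 2 X^T (X w − y). A sketch that checks this analytic gradient against central finite differences on synthetic data:

```python
import numpy as np

def objective(w, X, y):
    return np.sum((y - X @ w) ** 2)

def gradient(w, X, y):
    # d/dw sum_i (y_i - w^T x_i)^2 = -2 sum_i x_i (y_i - w^T x_i) = 2 X^T (X w - y)
    return 2.0 * X.T @ (X @ w - y)

# Finite-difference check of the analytic gradient at a random point.
rng = np.random.default_rng(2)
X = rng.normal(size=(10, 3))
y = rng.normal(size=10)
w = rng.normal(size=3)

eps = 1e-6
numeric = np.array([
    (objective(w + eps * e, X, y) - objective(w - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
print(np.max(np.abs(numeric - gradient(w, X, y))))
```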
Demo: Gradient descent
Choosing a stepsize
What happens if we choose a poor stepsize?
Adaptive step size
Can update the step size adaptively. Examples:
1) Via line search (optimizing the step size at every iteration)
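One way to make line search concrete: for the quadratic least-squares objective, the step size that exactly minimizes R̂(w − η g) along the gradient direction has a closed form, η = ‖g‖² / (2‖Xg‖²). A sketch on synthetic data:

```python
import numpy as np

def gd_exact_line_search(X, y, steps=100):
    """Gradient descent on sum_i (y_i - w^T x_i)^2, recomputing the
    exact (optimal) step size along -g at every iteration."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        g = 2.0 * X.T @ (X @ w - y)
        if np.allclose(g, 0):
            break                                        # already stationary
        eta = (g @ g) / (2.0 * np.sum((X @ g) ** 2))     # argmin_eta R(w - eta*g)
        w = w - eta * g
    return w

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))
w_true = np.array([1.5, -0.5])
y = X @ w_true
print(gd_exact_line_search(X, y))
```

For general (non-quadratic) objectives no such closed form exists, and one falls back to approximate schemes such as backtracking.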
Demo: Gradient Descent for Linear Regression
Gradient descent vs closed form
Why would one ever run gradient descent when a closed-form solution exists?
Computational complexity
May not need an optimal solution
Many problems don't admit a closed-form solution
Other loss functions
So far: Measure goodness of fit via squared error
Many other loss functions possible (and sensible!)
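To illustrate why the choice of loss matters, a small sketch (with made-up residuals) comparing squared and absolute error when one data point is an outlier:

```python
import numpy as np

# Residuals of some fit; one point is an outlier.
residuals = np.array([0.5, -0.3, 0.2, -0.4, 10.0])

squared = np.sum(residuals ** 2)       # outlier alone contributes 10^2 = 100
absolute = np.sum(np.abs(residuals))   # outlier alone contributes 10

print(squared, absolute)
```

The outlier dominates the squared loss far more than the absolute loss, which is one reason absolute (or Huber) loss yields more robust fits.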
Fitting nonlinear functions
How about functions like this:

[Figure: scatter plot of data following a curved, nonlinear trend]
Linear regression for polynomials
We can fit nonlinear functions via linear regression, using nonlinear features of our data (basis functions):

f(x) = Σ_{i=1}^d w_i φ_i(x)
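A sketch of this idea with monomial basis functions φ_i(x) = x^i: build the feature matrix with `np.vander` and solve the resulting linear least-squares problem (synthetic, noise-free data for illustration):

```python
import numpy as np

# Fit a cubic by linear regression on polynomial basis functions.
rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=30)
y = 2.0 - 1.0 * x + 0.5 * x**3              # true function: 2 - x + 0.5 x^3

Phi = np.vander(x, N=4, increasing=True)    # columns: 1, x, x^2, x^3
w, *_ = np.linalg.lstsq(Phi, y, rcond=None) # ordinary least squares on features
print(w)                                    # weights on the basis functions
```

The model is still linear in the weights w, so everything above (closed form, gradient descent) applies unchanged; only the features are nonlinear in x.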