Class 2 B
Learning:
Feature Scaling
• Idea: Ensure that features have similar scales
[Figure: two panels, "Before Feature Scaling" and "After Feature Scaling", each with θ1 on the horizontal axis and θ2 on the vertical axis, both ranging from 0 to 20]
• Makes gradient descent converge much faster
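The convergence claim can be checked on a toy problem. The sketch below is illustrative only: the data are synthetic, there is no intercept term, and the step size (one over the largest eigenvalue of XᵀX/n) is an assumption made here, not something the slides prescribe. It counts gradient descent iterations on raw features with very different scales and on the same features after standardization.

```python
import numpy as np

def gd_iterations(X, y, tol=1e-6, max_iter=20_000):
    """Batch gradient descent on the least-squares cost.

    Returns the number of iterations until the gradient norm drops below
    tol, or max_iter if that never happens within the budget.
    """
    n = X.shape[0]
    # Assumption: step size 1 / (largest eigenvalue of X^T X / n), which is
    # a stable choice for this quadratic cost.
    alpha = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()
    theta = np.zeros(X.shape[1])
    for it in range(max_iter):
        grad = X.T @ (X @ theta - y) / n
        if np.linalg.norm(grad) < tol:
            return it
        theta -= alpha * grad
    return max_iter

rng = np.random.default_rng(0)
n = 100
x1 = rng.uniform(0, 1, n)       # small-scale feature (synthetic)
x2 = rng.uniform(0, 1000, n)    # large-scale feature (synthetic)
X_raw = np.column_stack([x1, x2])
y = 3.0 * x1 + 0.002 * x2 + rng.normal(0, 0.1, n)

# Standardize each column to zero mean and unit variance.
X_scaled = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

iters_raw = gd_iterations(X_raw, y)
iters_scaled = gd_iterations(X_scaled, y - y.mean())  # y centered: no intercept
print("iterations, raw features:   ", iters_raw)
print("iterations, scaled features:", iters_scaled)
```

On data like this, the raw run is expected to exhaust its iteration budget while the scaled run stops after far fewer steps, because scaling makes the cost surface much less elongated.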
Feature Standardization
• Rescales features to have zero mean and unit variance
– Let μj be the mean of feature j: $\mu_j = \frac{1}{n} \sum_{i=1}^{n} x_j^{(i)}$
– Replace each value with: $x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j}$ for j = 1...d (not x0!)
• sj is the standard deviation of feature j
• Could also use the range of feature j (maxj – minj) for sj
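The standardization recipe above could be sketched in NumPy as follows. The function name `standardize`, the `use_range` flag, and the sample values are hypothetical, and the population standard deviation (NumPy's default, `ddof=0`) is an assumption, since the slides do not say which estimator to use. The constant feature x0 is added only after scaling, so it is never rescaled.

```python
import numpy as np

def standardize(X, use_range=False):
    """Rescale each column of X (n examples by d features, without x0).

    Returns the rescaled matrix plus the per-feature mu and s, so the same
    transformation can later be applied to new examples.
    Note: a constant column gives s = 0 and would divide by zero.
    """
    mu = X.mean(axis=0)
    if use_range:
        s = X.max(axis=0) - X.min(axis=0)   # range of feature j
    else:
        s = X.std(axis=0)                   # standard deviation of feature j
    return (X - mu) / s, mu, s

# Hypothetical data: two features on very different scales.
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])

X_std, mu, s = standardize(X)
X_rng, _, _ = standardize(X, use_range=True)

# Prepend x0 = 1 after scaling, so the intercept feature stays untouched.
X_design = np.column_stack([np.ones(X.shape[0]), X_std])
print(X_design)
```

Keeping `mu` and `s` from the training set matters in practice: new examples should be rescaled with those same values rather than with statistics of their own.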
[Figure: plots of Price (vertical axis) against Size (horizontal axis)]
Overfitting:
• The learned hypothesis may fit the training set very well ($J(\theta) \approx 0$)
• ...but fails to generalize to new examples
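A small experiment can make this concrete. The sketch below is an illustration under stated assumptions: the linear "true" price model, the noise level, and the polynomial degrees are all made up, and the error is reported as plain mean squared error rather than the cost J. It fits a straight line and a degree-7 polynomial to eight noisy training points, then evaluates both on fresh points.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_prices(size):
    # Hypothetical ground truth: price grows linearly with size, plus noise.
    return 1.0 + 2.0 * size + rng.normal(0.0, 0.3, size.shape)

size_train = np.linspace(0.0, 1.0, 8)
price_train = make_prices(size_train)
size_test = np.linspace(0.05, 0.95, 50)
price_test = make_prices(size_test)

def mse(coeffs, size, price):
    """Mean squared error of the polynomial with the given coefficients."""
    return np.mean((np.polyval(coeffs, size) - price) ** 2)

results = {}
for degree in (1, 7):
    coeffs = np.polyfit(size_train, price_train, degree)
    results[degree] = (mse(coeffs, size_train, price_train),
                       mse(coeffs, size_test, price_test))
    print(f"degree {degree}: train MSE = {results[degree][0]:.3g}, "
          f"test MSE = {results[degree][1]:.3g}")
```

With eight points, the degree-7 polynomial passes through every training example, so its training error is essentially zero, yet its error on the new points is not. That gap is the failure to generalize described above.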
Regularization
• Linear regression objective function
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$
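As a rough sketch, the regularized objective can be computed as below. It assumes the linear hypothesis h_θ(x) = θᵀx with a leading column of ones in X for x0, and a regularization weight called `lam` here; the function name and the tiny data set are hypothetical.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Squared-error fit term plus (lam / 2) * sum of theta_j^2 for j >= 1.

    X is n by (d + 1) with X[:, 0] == 1; theta[0] is not penalized.
    """
    n = X.shape[0]
    residuals = X @ theta - y                   # h_theta(x^(i)) - y^(i)
    fit_term = residuals @ residuals / (2 * n)
    penalty = lam / 2 * np.sum(theta[1:] ** 2)  # skips theta_0
    return fit_term + penalty

# Hypothetical data lying exactly on the line y = x.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])

perfect_fit = np.array([0.0, 1.0])
print(regularized_cost(perfect_fit, X, y, lam=0.0))  # fit term only
print(regularized_cost(perfect_fit, X, y, lam=2.0))  # adds (2 / 2) * 1^2
```

Even a perfect fit pays a price once `lam` is positive, which is what pushes the optimizer toward smaller coefficients.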
Understanding Regularization
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$
• Note that $\sum_{j=1}^{d} \theta_j^2 = \lVert \theta_{1:d} \rVert_2^2$
– This is the squared magnitude (squared L2 norm) of the feature coefficient vector!
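The identity is easy to confirm numerically. In this minimal check the values of θ are made up, and the slice `theta[1:]` plays the role of θ_{1:d}, leaving out θ0.

```python
import numpy as np

# Hypothetical parameters: theta_0 = 5, theta_1 = 3, theta_2 = -4.
theta = np.array([5.0, 3.0, -4.0])

sum_of_squares = np.sum(theta[1:] ** 2)         # sum over j = 1..d of theta_j^2
squared_norm = np.linalg.norm(theta[1:]) ** 2   # squared L2 norm of theta_{1:d}

print(sum_of_squares, squared_norm)
```

Both quantities ignore θ0, so the intercept can be as large as the data require without being penalized.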
[Figure: plots with Size on the horizontal axis]
• Gradient update:
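$$\theta_0 \leftarrow \theta_0 - \alpha \underbrace{\frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)}_{\frac{\partial}{\partial \theta_0} J(\theta)}$$
$$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} - \underbrace{\alpha \lambda \theta_j}_{\text{regularization}}$$
(In the second update, the two subtracted terms together equal $\alpha \frac{\partial}{\partial \theta_j} J(\theta)$.)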
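For readers who want the intermediate step, the updates follow from differentiating the objective. The derivation below is a sketch that assumes the linear hypothesis $h_\theta(x) = \theta^\top x$ with $x_0 = 1$ and the penalty $\frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$; the last line regroups the update for $\theta_j$ into a form in which each step first shrinks the coefficient and then applies the usual unregularized correction.

```latex
% Assumes h_\theta(x) = \theta^\top x with x_0 = 1, and the penalty
% (\lambda / 2) \sum_{j=1}^{d} \theta_j^2, which does not involve \theta_0.
\begin{align*}
\frac{\partial}{\partial \theta_0} J(\theta)
  &= \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) \\
\frac{\partial}{\partial \theta_j} J(\theta)
  &= \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}
     + \lambda \theta_j \qquad (j = 1, \dots, d) \\
\theta_j
  &\leftarrow \theta_j \left( 1 - \alpha \lambda \right)
     - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}
\end{align*}
```

When $0 < \alpha \lambda < 1$, the factor $(1 - \alpha \lambda)$ is slightly below one, so the regularization term acts as a small multiplicative shrinkage of $\theta_j$ at every step.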
Regularized Linear Regression
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$
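$$\theta_0 \leftarrow \theta_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)$$
$$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} - \alpha \lambda \theta_j$$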
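Putting the two updates into a loop gives a basic trainer. This is a minimal sketch rather than a tuned implementation: it assumes h_θ(x) = θᵀx with a leading column of ones in X, a fixed step size and iteration count chosen for this toy data, and hypothetical names (`fit_regularized_gd`, `lam`).

```python
import numpy as np

def fit_regularized_gd(X, y, alpha=0.1, lam=0.0, num_iters=2000):
    """Batch gradient descent for regularized linear regression.

    X is n by (d + 1) with X[:, 0] == 1. theta_0 gets the plain update;
    theta_1..theta_d also receive the -alpha * lam * theta_j term.
    """
    n = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        error = X @ theta - y            # h_theta(x^(i)) - y^(i)
        grad = X.T @ error / n           # (1/n) * sum of error * x_j^(i)
        theta0_new = theta[0] - alpha * grad[0]
        rest_new = theta[1:] - alpha * grad[1:] - alpha * lam * theta[1:]
        theta = np.concatenate(([theta0_new], rest_new))  # simultaneous update
    return theta

# Hypothetical data: one standardized feature, targets exactly 2 + 3 * x.
x_raw = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x = (x_raw - x_raw.mean()) / x_raw.std()
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x

theta_unreg = fit_regularized_gd(X, y, lam=0.0)
theta_reg = fit_regularized_gd(X, y, lam=1.0)
print(theta_unreg)  # close to [2, 3]
print(theta_reg)    # close to [2, 1.5]
```

Because the feature here has zero mean and unit variance, the regularized minimizer works out to θ1 = 3 / (1 + λ): with λ = 1 the slope is halved, while the unpenalized intercept θ0 stays at 2.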