Regularization
Hanoi, 2018
Outline
Overfitting Problem
Regularization
Overfitting Problem (regression)

Under fitting:  h_θ(x) = θ_0 + θ_1 x
OK:             h_θ(x) = θ_0 + θ_1 x + θ_2 x^2
Over fitting:   h_θ(x) = θ_0 + θ_1 x + θ_2 x^2 + θ_3 x^3 + θ_4 x^4 + ...
Overfitting Problem (classification)

Under fitting:  h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2)
OK:             h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_1^2 + θ_4 x_2^2 + θ_5 x_1 x_2)
Over fitting:   h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_1^2 + θ_4 x_2^2 + θ_5 x_1^3 + θ_6 x_2^3 + ...)
Overfitting Problem

Under fitting:
Under fitting refers to a model that can neither model the training
data nor generalize to new data. An under-fit machine learning model
is not a suitable model, and this is obvious from its poor
performance on the training data itself.

Over fitting:
Overfitting happens when a model learns the detail and noise in the
training data to the extent that it negatively impacts the
performance of the model on new data.
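The contrast between the two failure modes can be reproduced numerically. The sketch below (a minimal illustration assuming NumPy; the toy data, seed, and polynomial degrees are mine, not from the slides) fits a low-degree and a high-degree polynomial to the same noisy sample and compares training error with error on unseen points:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy observations of a quadratic ground truth.
true_f = lambda x: 1.0 + 2.0 * x - 1.5 * x ** 2
x_train = np.linspace(-1.0, 1.0, 10)
y_train = true_f(x_train) + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(-0.9, 0.9, 50)   # unseen inputs
y_test = true_f(x_test)

def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coef = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    return train_mse, test_mse

train_lo, test_lo = fit_and_score(2)   # the "OK" quadratic model
train_hi, test_hi = fit_and_score(9)   # degree 9 interpolates all 10 points:
                                       # training error collapses toward zero,
                                       # but test error does not follow
```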
Regularization
Overfitting Problem

OK:            h_θ(x) = θ_0 + θ_1 x + θ_2 x^2
Over fitting:  h_θ(x) = θ_0 + θ_1 x + θ_2 x^2 + θ_3 x^3 + θ_4 x^4

Idea: penalize large parameters so that the fitted θ_3 ≈ 0 and θ_4 ≈ 0,
pulling the quartic model back toward the quadratic one.

Regularization: minimize the cost function

    E(θ) = (1/2m) [ Σ_{i=1}^m (h_θ(x^(i)) − y^(i))^2 + λ Σ_{j=1}^n θ_j^2 ]

where λ Σ_{j=1}^n θ_j^2 is the regularization component.
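The cost function above translates directly into code. A minimal sketch, assuming NumPy (the function name and toy data are mine): the bias term θ_0 is excluded from the penalty, matching the sum over j = 1..n:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost:
    E(theta) = (1/2m) [ sum_i (h(x^(i)) - y^(i))^2 + lam * sum_{j>=1} theta_j^2 ].
    X contains the bias column x_0 = 1; theta[0] is not penalized."""
    m = len(y)
    residual = X @ theta - y
    penalty = lam * np.sum(theta[1:] ** 2)
    return (residual @ residual + penalty) / (2 * m)

# Toy data lying exactly on the line y = x, so the residual term is zero
# and the cost is driven entirely by the penalty.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 1.0, 2.0])
theta = np.array([0.0, 1.0])
```

With lam = 0 this reduces to the ordinary least-squares cost; on the perfect fit above, only a nonzero λ makes the cost positive.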
Regularization

Question:
What if λ is set to an extremely large number (too large for our
problem)? Which of the following statements is correct?
1. The algorithm works fine.
2. The algorithm fails to eliminate overfitting.
3. The algorithm results in under fitting.
4. Gradient descent will fail to converge.
Outline
Overfitting Problem
Regularization
2
Regularization with Linear Regression

Regularization: minimize the cost function

    E(θ) = (1/2m) [ Σ_{i=1}^m (h_θ(x^(i)) − y^(i))^2 + λ Σ_{j=1}^n θ_j^2 ]

Gradient descent:

    θ_j := θ_j − α ∂E(θ)/∂θ_j
Regularization with Linear Regression

Gradient descent. Repeat until converged:
{
    θ_0 := θ_0 − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_0^(i)

    θ_j := θ_j − α [ (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_j^(i) + (λ/m) θ_j ],   j = 1..n

    or equivalently:

    θ_j := θ_j (1 − α λ/m) − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_j^(i),   j = 1..n
}
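The update rule above can be sketched as a short NumPy loop (function name, step size, iteration count, and toy data are mine, chosen only for illustration):

```python
import numpy as np

def gd_ridge(X, y, lam, alpha=0.1, iters=1000):
    """Batch gradient descent for regularized linear regression:
      theta_0 := theta_0 - alpha*(1/m)*sum((h - y) * x_0)
      theta_j := theta_j*(1 - alpha*lam/m) - alpha*(1/m)*sum((h - y) * x_j)
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m    # (1/m) * sum((h - y) * x_j)
        grad[1:] += (lam / m) * theta[1:]   # penalty gradient; bias excluded
        theta -= alpha * grad
    return theta

# Toy data on the line y = 1 + 2x (first column of X is the bias x_0 = 1).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta_unreg = gd_ridge(X, y, lam=0.0)    # recovers roughly [1, 2]
theta_reg = gd_ridge(X, y, lam=10.0)     # slope shrunk toward 0 by the penalty
```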
Regularization with Linear Regression
(X X ) X Y
T 1 T
0 0 ... 0
0 1 0 0 1 T
(X X
T
) X Y
... ... 1 ...
0 0 0 1
15
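The closed-form solution is a few lines of NumPy. A minimal sketch (the function name and toy data are mine); note the (0,0) entry of M is zeroed so the bias is not penalized:

```python
import numpy as np

def ridge_normal_equation(X, y, lam):
    """Closed-form theta = (X^T X + lam*M)^(-1) X^T y, where M is the
    identity with its (0,0) entry set to 0 (theta_0 unregularized)."""
    n = X.shape[1]
    M = np.eye(n)
    M[0, 0] = 0.0
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(X.T @ X + lam * M, X.T @ y)

# Same toy data as before: y = 1 + 2x with a bias column.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
```

With lam = 0 this is the ordinary normal equation; increasing λ shrinks the slope while leaving the intercept unpenalized.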
Regularization with Logistic Regression

    E(θ) = −(1/m) Σ_{i=1}^m [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ] + (λ/2m) Σ_{j=1}^n θ_j^2

Gradient descent:

    θ_j := θ_j − α ∂E(θ)/∂θ_j
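The regularized cross-entropy cost can be written out directly. A minimal NumPy sketch (names and toy data are mine, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic-regression cost: cross-entropy over the m
    examples plus (lam/2m) * sum_{j>=1} theta_j^2 (bias not penalized)."""
    m = len(y)
    h = sigmoid(X @ theta)
    ce = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    return ce + lam * np.sum(theta[1:] ** 2) / (2 * m)

# Two-point toy set; at theta = 0 every h is 0.5, so the unpenalized
# cost is exactly log 2 regardless of lambda.
X = np.array([[1.0, -1.0], [1.0, 1.0]])
y = np.array([0.0, 1.0])
```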
Regularization with Logistic Regression

Gradient descent. Repeat until converged:
{
    θ_0 := θ_0 − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_0^(i)

    θ_j := θ_j − α [ (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_j^(i) + (λ/m) θ_j ],   j = 1..n

    or equivalently:

    θ_j := θ_j (1 − α λ/m) − α (1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) x_j^(i),   j = 1..n
}
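The updates have the same shape as in the linear case, with h_θ(x) = g(θᵀx). A sketch below (step size, iteration count, and the linearly separable toy data are my choices); on separable data the unregularized parameters keep growing, while the penalty keeps the regularized ones bounded:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gd_logistic(X, y, lam, alpha=0.5, iters=2000):
    """Batch gradient descent for regularized logistic regression."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m
        grad[1:] += (lam / m) * theta[1:]   # penalty gradient; bias excluded
        theta -= alpha * grad
    return theta

# Linearly separable 1-D toy set (bias column plus one feature).
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta_unreg = gd_logistic(X, y, lam=0.0)   # weights keep growing
theta_reg = gd_logistic(X, y, lam=1.0)     # penalty caps the weight norm
```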
Regularization with Logistic Regression
t 1 : t H 1 E
1
E m
( h ( x (i )
) y (i )
) x0
i
0 1
E ... m (h( x ) y ) x1 1
(i ) (i ) i
m
...
E
n 1
m
(h( x ) y ) xn n
(i ) (i ) i
19
Regularization with Logistic Regression

Hessian matrix:

    H = (1/m) Σ_{i=1}^m h_θ(x^(i)) (1 − h_θ(x^(i))) x^(i) (x^(i))ᵀ + (λ/m) M

where M is again the identity matrix with its top-left entry set to 0:

    M = [ 0 0 ... 0
          0 1 ... 0
          ...
          0 0 ... 1 ]
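Putting the gradient and Hessian together gives a compact Newton iteration. A sketch assuming NumPy (function name, iteration count, and toy data are mine); with λ > 0 the Hessian stays nonsingular even on separable data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, lam, iters=20):
    """Newton's method for regularized logistic regression:
      theta := theta - H^(-1) grad E
      H = (1/m) sum h(1-h) x x^T + (lam/m) * diag(0, 1, ..., 1)
    """
    m, n = X.shape
    theta = np.zeros(n)
    D = np.eye(n)
    D[0, 0] = 0.0                      # bias row/column of the penalty is zero
    for _ in range(iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (h - y) / m
        grad[1:] += (lam / m) * theta[1:]
        H = (X.T * (h * (1 - h))) @ X / m + (lam / m) * D   # X^T W X + penalty
        theta -= np.linalg.solve(H, grad)
    return theta

# Same separable toy set as in the gradient-descent example.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = newton_logistic(X, y, lam=1.0)
```

Newton's method typically reaches the regularized optimum in a handful of iterations here, where gradient descent needs thousands.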
References

Stanford OpenClassroom, Machine Learning:
http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning

Andrew Ng slides:
https://datajobs.com/data-science-repo/Generalized-Linear-Models-%5BAndrew-Ng%5D.pdf