
Lecture 5

Regularization

Dr. Le Huu Ton

Hanoi, 2018
Outline

Overfitting Problem

Regularization

Regularization with Linear Regression

Regularization with Logistic Regression

Overfitting Problem

Under fitting:  h(x) = θ₀ + θ₁x
OK:             h(x) = θ₀ + θ₁x + θ₂x²
Over fitting:   h(x) = θ₀ + θ₁x + θ₂x² + θ₃x³ + θ₄x⁴ + ...
Overfitting Problem

Decision boundaries in the x₁–x₂ plane:

Under fitting:  h(x) = g(θ₀ + θ₁x₁ + θ₂x₂)
OK:             h(x) = g(θ₀ + θ₁x₁ + θ₂x₂ + θ₃x₁² + θ₄x₂² + θ₅x₁x₂)
Over fitting:   h(x) = g(θ₀ + θ₁x₁ + θ₂x₂ + θ₃x₁² + θ₄x₂² + θ₅x₁³ + θ₆x₂³ + ...)
Overfitting Problem

Under fitting:
Under fitting refers to a model that can neither model the training
data nor generalize to new data.
An underfit machine learning model is not a suitable model, and this
is obvious because it performs poorly even on the training data.

Over fitting:
Overfitting happens when a model learns the detail and noise in the
training data to the extent that it negatively impacts the
performance of the model on new data.
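The contrast above can be seen numerically. The sketch below (my own illustration, not from the slides) fits polynomials of increasing degree to noisy samples of a quadratic target: the high-degree fit drives the training error down while the error on fresh data grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a quadratic target: degree 1 underfits, degree 9 overfits.
x_train = np.linspace(0, 1, 10)
y_train = x_train**2 + rng.normal(0, 0.05, size=x_train.shape)
x_test = np.linspace(0, 1, 100)
y_test = x_test**2  # noiseless test targets

errors = {}
for degree in (1, 2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    errors[degree] = (train_mse, test_mse)
    print(f"degree {degree}: train MSE {train_mse:.5f}, test MSE {test_mse:.5f}")
```

The degree-9 polynomial can interpolate all ten training points, so its training error is near zero, yet it wiggles between them and generalizes worse than the degree-2 fit.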
Regularization

Regularization is a technique used in an attempt to solve the
overfitting problem.

Regularization is done by reducing the magnitude of some coefficients θⱼ.
Overfitting Problem

Over fitting:  θ₀ + θ₁x + θ₂x² + θ₃x³ + θ₄x⁴
OK:            θ₀ + θ₁x + θ₂x²

Regularization: reduce the values of θ₃ and θ₄ by minimizing the cost function

E(θ) = (1/2m) Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + 9999·θ₃² + 9999·θ₄²

⇒ θ₃ ≈ 0, θ₄ ≈ 0
Regularization

Small values of the coefficients θ₀, θ₁, ..., θₙ give:
- a simpler hypothesis h(x)
- a model less prone to overfitting

Regularization: add a regularization component to the cost function

E(θ) = (1/2m) [ Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + λ Σⱼ₌₁ⁿ θⱼ² ]

where λ Σⱼ₌₁ⁿ θⱼ² is the regularization component.
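The slides do not include code, so here is a minimal NumPy sketch of the regularized cost above (the function name is my own). Note that the sum of the penalty starts at j = 1, so θ₀ is not penalized.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """E(theta) = (1/2m) * [ sum (h(x)-y)^2 + lam * sum_{j>=1} theta_j^2 ]."""
    m = len(y)
    residuals = X @ theta - y                  # h(x^(i)) - y^(i) for a linear hypothesis
    penalty = lam * np.sum(theta[1:] ** 2)     # the intercept theta[0] is excluded
    return (np.sum(residuals ** 2) + penalty) / (2 * m)
```

With λ = 0 this reduces to the ordinary least-squares cost; larger λ makes large coefficients more expensive.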
Regularization

Question:
If λ is set to an extremely large number (too large for our problem),
which of the following statements is correct?
1. The algorithm works fine.
2. The algorithm fails to eliminate overfitting.
3. The algorithm results in underfitting.
4. Gradient descent fails to converge.
Regularization with Linear Regression

Regularization: minimize the cost function

E(θ) = (1/2m) [ Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + λ Σⱼ₌₁ⁿ θⱼ² ]

Gradient descent:

θⱼ := θⱼ − α ∂E(θ)/∂θⱼ
Regularization with Linear Regression

Gradient descent:
Repeat until converged:
{
  θ₀ := θ₀ − α (1/m) Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) x₀⁽ⁱ⁾

  θⱼ := θⱼ − α [ (1/m) Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾ + (λ/m) θⱼ ],   j = 1..n

  which is equivalent to

  θⱼ := θⱼ (1 − αλ/m) − α (1/m) Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾,   j = 1..n
}
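A single step of this update can be sketched in NumPy as follows (a minimal illustration, function name my own; the slides give only the formula). The key detail is that θ₀ receives no λ/m term.

```python
import numpy as np

def gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent step; the bias theta[0] is not shrunk."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m       # (1/m) * sum (h(x^(i)) - y^(i)) x_j^(i)
    grad[1:] += (lam / m) * theta[1:]      # add (lambda/m) * theta_j for j >= 1
    return theta - alpha * grad
```

When the residuals are all zero, the step still shrinks every penalized θⱼ by the factor (1 − αλ/m), matching the equivalent form above, while θ₀ stays put.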
Regularization with Linear Regression

Normal equation without regularization:

θ = (XᵀX)⁻¹ XᵀY

Normal equation with regularization:

θ = (XᵀX + λM)⁻¹ XᵀY

where M is the (n+1)×(n+1) matrix

    ⎡ 0 0 ... 0 ⎤
M = ⎢ 0 1 ... 0 ⎥
    ⎢ ...       ⎥
    ⎣ 0 0 ... 1 ⎦

i.e. the identity matrix with the top-left entry (corresponding to θ₀) set to 0.
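The regularized normal equation translates directly into NumPy (a sketch with my own function name; `np.linalg.solve` is used instead of an explicit matrix inverse, which is numerically preferable):

```python
import numpy as np

def normal_equation(X, y, lam=0.0):
    """Solve (X^T X + lam * M) theta = X^T y, with M = identity but M[0,0] = 0."""
    n = X.shape[1]
    M = np.eye(n)
    M[0, 0] = 0.0                      # leave the intercept theta_0 unpenalized
    return np.linalg.solve(X.T @ X + lam * M, X.T @ y)
```

With λ = 0 this is the ordinary normal equation; with λ > 0 the added diagonal also makes the matrix invertible even when XᵀX is singular.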
Regularization with Logistic Regression

Logistic regression: minimize the cost function

E(θ) = −(1/m) Σᵢ₌₁ᵐ [ y⁽ⁱ⁾ log h(x⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − h(x⁽ⁱ⁾)) ] + (λ/2m) Σⱼ₌₁ⁿ θⱼ²

Gradient descent:

θⱼ := θⱼ − α ∂E(θ)/∂θⱼ
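The regularized cross-entropy cost above can be sketched in NumPy like this (function name my own; as before, the penalty skips θ₀):

```python
import numpy as np

def logistic_cost(theta, X, y, lam):
    """Regularized cross-entropy cost; the L2 penalty excludes theta[0]."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))          # sigmoid hypothesis h(x)
    loss = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    return loss + lam / (2 * m) * np.sum(theta[1:] ** 2)
```

At θ = 0 the hypothesis outputs 0.5 everywhere, so the unregularized part of the cost equals log 2 regardless of the data.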
Regularization with Logistic Regression

Gradient descent:
Repeat until converged:
{
  θ₀ := θ₀ − α (1/m) Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) x₀⁽ⁱ⁾

  θⱼ := θⱼ − α [ (1/m) Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾ + (λ/m) θⱼ ],   j = 1..n

  which is equivalent to

  θⱼ := θⱼ (1 − αλ/m) − α (1/m) Σᵢ₌₁ᵐ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾,   j = 1..n
}

The update has the same form as for linear regression, but here h(x) = g(θᵀx) is the sigmoid hypothesis.
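A minimal NumPy sketch of one step of this update (my own function names; the slides give only the formulas). Structurally it is the same as the linear-regression step, with the sigmoid applied to Xθ:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient step for logistic regression; theta[0] unpenalized."""
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m    # (1/m) * sum (h(x^(i)) - y^(i)) x_j^(i)
    grad[1:] += (lam / m) * theta[1:]            # shrink every theta_j except theta_0
    return theta - alpha * grad
```

Iterating this step on a small labeled set drives the predictions toward the correct side of 0.5.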
Regularization with Logistic Regression

Newton's method with regularization:

θ⁽ᵗ⁺¹⁾ := θ⁽ᵗ⁾ − H⁻¹ ∇E

     ⎡ (1/m) Σᵢ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) x₀⁽ⁱ⁾            ⎤
∇E = ⎢ (1/m) Σᵢ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) x₁⁽ⁱ⁾ + (λ/m) θ₁ ⎥
     ⎢ ...                                        ⎥
     ⎣ (1/m) Σᵢ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) xₙ⁽ⁱ⁾ + (λ/m) θₙ ⎦
Regularization with Logistic Regression

Hessian matrix:

                                                      ⎡ 0 0 ... 0 ⎤
H = (1/m) Σᵢ₌₁ᵐ h(x⁽ⁱ⁾)(1 − h(x⁽ⁱ⁾)) x⁽ⁱ⁾(x⁽ⁱ⁾)ᵀ + (λ/m) ⎢ 0 1 ... 0 ⎥
                                                      ⎢ ...       ⎥
                                                      ⎣ 0 0 ... 1 ⎦
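Putting the regularized gradient and Hessian together, one Newton update can be sketched as follows (a minimal illustration with my own function names, assuming the formulas on the two slides above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_step(theta, X, y, lam):
    """One Newton update theta := theta - H^{-1} grad(E), with L2 regularization."""
    m, n = X.shape
    h = sigmoid(X @ theta)
    grad = X.T @ (h - y) / m
    grad[1:] += (lam / m) * theta[1:]        # no penalty term for theta_0
    M = np.eye(n)
    M[0, 0] = 0.0                            # identity with top-left entry zeroed
    w = h * (1 - h)                          # sigmoid derivative weights h(x)(1-h(x))
    H = (X.T * w) @ X / m + (lam / m) * M    # (1/m) sum w_i x^(i) (x^(i))^T + (lam/m) M
    return theta - np.linalg.solve(H, grad)
```

Because Newton's method uses curvature information, it typically reaches the minimum of this smooth convex cost in only a handful of iterations, far fewer than gradient descent needs.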
References

OpenClassroom Machine Learning course:
http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning

Andrew Ng's slides:
https://www.google.com.vn/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&sqi=2&ved=0ahUKEwjNt4fdvMDPAhXIn5QKHZO1BSgQFggfMAE&url=https%3A%2F%2Fdatajobs.com%2Fdata-science-repo%2FGeneralized-Linear-Models-%5BAndrew-Ng%5D.pdf&usg=AFQjCNGq37q2uVFcpGhNqH-5KZSlJ_HSxg&sig2=vnCEvyvKQGCuryttAPcokw&bvm=bv.134495766,d.dGo