09 Regularization
Regularization consists in adding a penalty term to the loss function. This technique is used in both regression and classification problems. One of the most common forms is quadratic regularization, in which the penalty is λθᵀθ = λ‖θ‖₂² (shown below).
Why regularization?
To counteract the effects of overfitting.
To make the optimization problem better conditioned.
Note: the tuning parameter λ must be determined separately, in the same way as the model complexity (e.g., by cross-validation).
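The conditioning point can be seen numerically. A minimal NumPy sketch with synthetic, nearly collinear regressors (hypothetical data, not from the lecture): adding λI to ΦᵀΦ bounds its smallest eigenvalue away from zero, drastically reducing the condition number.

```python
# Minimal NumPy sketch (synthetic, nearly collinear regressors) of the
# conditioning argument: Phi^T Phi is almost singular, while
# Phi^T Phi + lambda*I has its smallest eigenvalue bounded below by lambda.
import numpy as np

rng = np.random.default_rng(0)
N = 100
x = rng.standard_normal(N)
# Column 3 is a near-duplicate of column 1 -> Phi^T Phi is ill conditioned
Phi = np.column_stack([np.ones(N), x, x**2, x + 1e-6 * rng.standard_normal(N)])
p = Phi.shape[1]

lam = 1.0
A = Phi.T @ Phi
print("cond(Phi^T Phi)            =", np.linalg.cond(A))
print("cond(Phi^T Phi + lambda I) =", np.linalg.cond(A + lam * np.eye(p)))
```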
Consider the regularized least-squares problem

$$\min_{\theta} V(\theta), \qquad V(\theta) = \sum_{t=1}^{N} \varepsilon^2(t) + \lambda\,\theta^T \theta = \|Y - \Phi\,\theta\|^2 + \lambda\,\|\theta\|_2^2$$
This estimation method is called Ridge regression. By setting the gradient of V(θ) to zero it is easy to find the normal equations.
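Indeed, the gradient is

$$\nabla V(\theta) = -2\,\Phi^T \left(Y - \Phi\,\theta\right) + 2\,\lambda\,\theta,$$

and equating it to zero yields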
" T
Φ Φ + λ Ip θ = ΦT Y
#
where I_p is the p × p identity matrix and p is the complexity of the model. The regularized LS estimator is thus given by
$$\hat{\theta} = \left(\Phi^T \Phi + \lambda\, I_p\right)^{-1} \Phi^T Y$$
The penalty term λ‖θ‖₂² is also called a shrinkage penalty in statistics because it has the effect of shrinking the estimated parameters towards zero.
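A minimal NumPy sketch of this estimator (Phi, Y, and lam below are synthetic placeholders): in practice the normal equations are solved directly rather than forming the matrix inverse.

```python
# Minimal NumPy sketch of the regularized LS (ridge) estimator derived above.
import numpy as np

def ridge_estimate(Phi: np.ndarray, Y: np.ndarray, lam: float) -> np.ndarray:
    """Solve (Phi^T Phi + lam*I_p) theta = Phi^T Y for theta."""
    p = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ Y)

# Toy usage with synthetic data
rng = np.random.default_rng(1)
Phi = rng.standard_normal((200, 4))
theta_true = np.array([1.0, -0.5, 0.0, 2.0])
Y = Phi @ theta_true + 0.1 * rng.standard_normal(200)
print(ridge_estimate(Phi, Y, lam=0.1))  # close to theta_true for small lam
```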
Moreover, the covariance of the regularized estimator is smaller, in the matrix sense, than that of the ordinary LS estimator:

$$\mathrm{cov}(\hat{\theta}) = \sigma_w^2 \left(\Phi^T \Phi + \lambda\, I_p\right)^{-1} \Phi^T \Phi \left(\Phi^T \Phi + \lambda\, I_p\right)^{-1} < \sigma_w^2 \left(\Phi^T \Phi\right)^{-1}$$

Regularization thus reduces the variance of the estimate, at the price of introducing a bias.
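A small numerical check of this inequality (synthetic Φ, hypothetical noise variance σ_w²): the difference between the LS and ridge covariance matrices should be positive definite, i.e. have all-positive eigenvalues.

```python
# Numerical check that the ridge covariance is smaller than the LS covariance
# in the matrix sense: cov_ls - cov_ridge is positive definite for lambda > 0.
import numpy as np

rng = np.random.default_rng(2)
Phi = rng.standard_normal((100, 3))
sigma_w2, lam = 0.25, 1.0  # hypothetical noise variance and penalty weight

A = Phi.T @ Phi
R = np.linalg.inv(A + lam * np.eye(3))
cov_ridge = sigma_w2 * R @ A @ R       # covariance of the ridge estimator
cov_ls = sigma_w2 * np.linalg.inv(A)   # covariance of the ordinary LS estimator

# All eigenvalues of the difference are positive -> matrix inequality holds
print(np.linalg.eigvalsh(cov_ls - cov_ridge))
```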
Regularization can also be exploited for the identification of dynamic models and for classification problems. In general, the penalized problem

$$\min_{\theta}\; \|Y - \Phi\,\theta\|^2 + \lambda\,\|\theta\|_2^2$$

can be compared with the constrained problem

$$\min_{\theta}\; \|Y - \Phi\,\theta\|^2 \quad \text{subject to} \quad \|\theta\|_2^2 \le K$$

For every λ there is some K such that the solutions θ̂ of the two optimization problems are the same. The two approaches can be related using Lagrange multipliers.
The parameter K can be seen as a budget for how large the norm of θ is allowed to be. Note that K plays the role of 1/λ: decreasing K, like increasing λ, has the effect of shrinking the estimated parameters towards zero, as the sketch below illustrates.
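A minimal sketch of this trade-off (synthetic data, hypothetical values of λ): as λ grows, the norm of θ̂(λ) shrinks, i.e. the implicit budget K decreases.

```python
# Minimal sketch of the lambda <-> K trade-off: increasing lambda shrinks
# ||theta_hat||_2, i.e. tightens the implicit budget K on the parameter norm.
import numpy as np

rng = np.random.default_rng(3)
Phi = rng.standard_normal((150, 4))
Y = Phi @ np.array([2.0, -1.0, 0.5, 1.5]) + 0.2 * rng.standard_normal(150)

for lam in [0.0, 0.1, 1.0, 10.0, 100.0]:  # hypothetical penalty weights
    theta_hat = np.linalg.solve(Phi.T @ Phi + lam * np.eye(4), Phi.T @ Y)
    print(f"lambda = {lam:6.1f}   ||theta_hat||_2 = {np.linalg.norm(theta_hat):.3f}")
```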