3 Regularizations
Shusen Wang
The ℓ2-Norm Regularization
Linear Regression

Task
Input: feature matrix 𝐗 ∈ ℝ^{n×d} and labels 𝐲 ∈ ℝ^n.
Output: vector 𝐰 ∈ ℝ^d such that 𝐗𝐰 ≈ 𝐲.
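A minimal NumPy sketch of the task (the synthetic data and variable names are illustrative assumptions, not from the slides): build 𝐗 and 𝐲, then find 𝐰 with 𝐗𝐰 ≈ 𝐲 by least squares.

import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 20
X = rng.normal(size=(n, d))                 # feature matrix, X in R^{n x d}
y = rng.normal(size=n)                      # labels, y in R^n
w, *_ = np.linalg.lstsq(X, y, rcond=None)   # w in R^d such that Xw ≈ y
print(X.shape, y.shape, w.shape)            # (100, 20) (100,) (20,)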
Methods
• Least squares regression:
    min_𝐰 (1/n) ‖𝐗𝐰 − 𝐲‖₂².
• Ridge regression:
    min_𝐰 (1/n) ‖𝐗𝐰 − 𝐲‖₂² + γ ‖𝐰‖₂².
• Ridge regression improves the conditioning: κ = λ_max(𝐗ᵀ𝐗 + nγ𝐈) / λ_min(𝐗ᵀ𝐗 + nγ𝐈), and γ ↑ ⟹ κ ↓.
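A NumPy sketch of both methods (the synthetic data and variable names are assumptions for illustration, not code from the slides): each estimator solves its normal equations, and the loop shows the condition number κ shrinking as γ grows.

import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 20
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Least squares: min_w (1/n)||Xw - y||_2^2  =>  (X^T X) w = X^T y
w_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: min_w (1/n)||Xw - y||_2^2 + gamma*||w||_2^2
#        =>  (X^T X + n*gamma*I) w = X^T y
gamma = 0.1
w_ridge = np.linalg.solve(X.T @ X + n * gamma * np.eye(d), X.T @ y)

# Larger gamma => smaller kappa = cond(X^T X + n*gamma*I).
for g in [0.0, 0.01, 0.1, 1.0]:
    print(g, np.linalg.cond(X.T @ X + n * g * np.eye(d)))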
[Figure: the ℓ1-norm ball ‖𝐰‖₁ ≤ 𝑡 in two dimensions; the w₁ axis is marked at −𝑡 and 𝑡.]
The ℓ1-Norm Constraint
• LASSO: min_𝐰 (1/(2n)) ‖𝐗𝐰 − 𝐲‖₂², s.t. ‖𝐰‖₁ ≤ 𝑡.
• It is a convex optimization model.
• The optimal solution 𝐰⋆ is sparse (i.e., most entries are zeros).
• Smaller 𝑡 ⟹ sparser 𝐰⋆.
• Sparsity ⟹ feature selection. Why?
• Let 𝐱̂ be a test feature vector. The prediction 𝐰⋆ᵀ𝐱̂ = ∑ⱼ wⱼ⋆ x̂ⱼ ignores every feature whose weight is zero, so the nonzero entries of 𝐰⋆ pick out the features that matter.
• Another form: min_𝐰 (1/(2n)) ‖𝐗𝐰 − 𝐲‖₂² + γ ‖𝐰‖₁ (see the sketch below).
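A sketch using scikit-learn, whose Lasso minimizes exactly the penalized form above, with α playing the role of γ (the synthetic data is an assumption for illustration). Larger α, i.e. a smaller budget 𝑡, yields a sparser 𝐰⋆.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d = 100, 50
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:5] = rng.normal(size=5)          # only 5 informative features
y = X @ w_true + 0.1 * rng.normal(size=n)

# Lasso objective: (1/(2n))||Xw - y||_2^2 + alpha*||w||_1
for alpha in [0.01, 0.1, 0.5]:
    w_star = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(X, y).coef_
    print(alpha, int(np.sum(w_star != 0)), "nonzero entries out of", d)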
Loss Function
• Linear regression: L(𝐰; 𝐱ᵢ, yᵢ) = ½ (𝐰ᵀ𝐱ᵢ − yᵢ)²
• Logistic regression: L(𝐰; 𝐱ᵢ, yᵢ) = log(1 + exp(−yᵢ 𝐰ᵀ𝐱ᵢ))
• SVM: L(𝐰; 𝐱ᵢ, yᵢ) = max{0, 1 − yᵢ 𝐰ᵀ𝐱ᵢ}
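The three losses as per-sample functions, in a minimal NumPy sketch assuming yᵢ ∈ {−1, +1} for logistic regression and the SVM (the function names are assumptions, not from the slides).

import numpy as np

def squared_loss(w, x, y):       # linear regression
    return 0.5 * (w @ x - y) ** 2

def logistic_loss(w, x, y):      # logistic regression
    return np.log1p(np.exp(-y * (w @ x)))

def hinge_loss(w, x, y):         # SVM
    return max(0.0, 1.0 - y * (w @ x))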
Regularized ERM
• Regularized empirical risk minimization:
    min_{𝐰∈ℝ^d} (1/n) ∑ᵢ₌₁ⁿ L(𝐰; 𝐱ᵢ, yᵢ) + R(𝐰).
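The objective written as code, a sketch assuming loss is any per-sample loss from the block above and R is any regularizer (the helper name erm_objective is hypothetical, not from the slides).

import numpy as np

def erm_objective(w, X, y, loss, R):
    # (1/n) * sum_i L(w; x_i, y_i) + R(w)
    n = X.shape[0]
    return sum(loss(w, X[i], y[i]) for i in range(n)) / n + R(w)

# Example: the l1-regularized SVM objective at some w would be
# erm_objective(w, X, y, hinge_loss, lambda v: 0.1 * np.sum(np.abs(v)))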
Regularization
• ℓ1-norm: R(𝐰) = γ ‖𝐰‖₁
• ℓ2-norm: R(𝐰) = γ ‖𝐰‖₂²
• Elastic net: R(𝐰) = γ₁ ‖𝐰‖₁ + γ₂ ‖𝐰‖₂²
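The three regularizers as code, continuing the sketch above (the γ defaults are illustrative assumptions); any of them can be passed as R to erm_objective.

import numpy as np

def l1(w, gamma=0.1):
    return gamma * np.sum(np.abs(w))             # gamma * ||w||_1

def l2_squared(w, gamma=0.1):
    return gamma * np.sum(w ** 2)                # gamma * ||w||_2^2

def elastic_net(w, gamma1=0.1, gamma2=0.1):
    return gamma1 * np.sum(np.abs(w)) + gamma2 * np.sum(w ** 2)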