Regularization for
Deep Learning
Safak Ozkan
April 15, 2017
Chapter 7: Regularization for Deep Learning
L2 Parameter Regularization
L1 Parameter Regularization
Norm Penalties and Constrained Optimization
Regularization and Under-Constrained Problems
Dataset Augmentation
Noise Robustness
Injecting Noise at Output Targets
Early Stopping
Semi-Supervised Learning
Multi-Task Learning
Parameter Tying and Parameter Sharing
Bagging and Other Ensemble Methods
Dropout
Adversarial Training
Tangent Distance, Manifold Tangent Classifier
Definition: a regularized objective adds a parameter norm penalty Ω(θ) to the loss,
J̃(θ; X, y) = J(θ; X, y) + α Ω(θ),
where α ≥ 0 weighs the penalty term relative to the standard objective J.
L2 Regularization
(a.k.a. Weight decay, Tikhonov regularization, Ridge regression)
The penalty is Ω(w) = (1/2)‖w‖₂², so the objective becomes
J̃(w; X, y) = J(w; X, y) + (α/2) wᵀw,
where α is the regularization parameter and (α/2) wᵀw is the additional term.
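A minimal sketch (my own, not from the slides) of how the penalty enters a gradient step: the gradient of (α/2) wᵀw is αw, so each update shrinks the weights toward zero. The toy linear-regression data below is a hypothetical example.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100)

    alpha = 0.1       # regularization parameter
    lr = 0.01         # learning rate
    w = np.zeros(5)

    for _ in range(1000):
        grad_loss = X.T @ (X @ w - y) / len(y)   # gradient of the data term
        grad_penalty = alpha * w                 # gradient of (alpha/2) * w.T w
        w -= lr * (grad_loss + grad_penalty)     # "weight decay" update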
L2 Regularization
Minimizing J(w) + (α/2)‖w‖₂² is equivalent to optimizing J(w)
such that ‖w‖₂² ≤ k, for some constant k that depends on α.
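A hedged sketch of this constrained view (my own illustration, with an arbitrarily chosen k): instead of adding the penalty, project the weights back onto the ball ‖w‖₂² ≤ k after each gradient step.

    import numpy as np

    rng = np.random.default_rng(7)
    X = rng.normal(size=(100, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100)

    k, lr = 4.0, 0.01                         # constraint: ||w||_2^2 <= k
    w = np.zeros(5)
    for _ in range(2000):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
        norm_sq = w @ w
        if norm_sq > k:                       # project back onto the constraint set
            w *= np.sqrt(k / norm_sq)
    print(w, w @ w)                           # ||w||^2 stays at (or below) k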
L2 Regularization
Figure: contours of the objective around the unregularized solution w*, together with the contours of the L2 penalty; the regularized solution lies between the two, shrunk least along directions where the objective has large curvature (large Hessian eigenvalues) and most along low-curvature directions.
L2 Regularization
At the minimum w̃ of the (quadratically approximated) regularized objective, the gradient vanishes:
α w̃ + H (w̃ − w*) = 0  ⟹  w̃ = (H + α I)⁻¹ H w*,
where H is the Hessian of J at the unregularized solution w*.
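A small numerical check of this relation (my own sketch): for a quadratic objective with known Hessian H and minimum w*, gradient descent on the regularized objective reaches the same point as the closed form (H + αI)⁻¹ H w*.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(4, 4))
    H = A @ A.T + np.eye(4)                 # a positive-definite Hessian
    w_star = rng.normal(size=4)             # unregularized minimum of the quadratic
    alpha = 0.5

    # Closed form from the analysis above: w_tilde = (H + alpha*I)^(-1) H w*
    w_tilde = np.linalg.solve(H + alpha * np.eye(4), H @ w_star)

    # Gradient descent on J(w) + (alpha/2)||w||^2, where grad J = H (w - w*)
    w = np.zeros(4)
    for _ in range(5000):
        w -= 0.01 * (H @ (w - w_star) + alpha * w)

    print(np.allclose(w, w_tilde))          # True: both reach the same minimum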
L2 Regularization
Normal Equations for Linear Regression
Assume a linear-regression cost ‖Xw − y‖². The normal equations (XᵀX) w = Xᵀ y then become (XᵀX + α I) w = Xᵀ y under L2 regularization.
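A minimal numpy sketch of the two solutions (illustrative data of my own): weight decay simply adds αI to the Gram matrix XᵀX before solving.

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(50, 3))
    y = X @ np.array([2.0, 0.0, -1.0]) + 0.1 * rng.normal(size=50)
    alpha = 0.5

    # Ordinary least squares:        (X^T X) w = X^T y
    w_ols = np.linalg.solve(X.T @ X, X.T @ y)

    # With L2 regularization:        (X^T X + alpha*I) w = X^T y
    w_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ y)

    print(w_ols, w_ridge)           # the ridge solution is shrunk toward zero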
L1 Regularization
(a.k.a. LASSO)
The penalty is Ω(w) = ‖w‖₁ = Σᵢ |wᵢ|, giving J̃(w; X, y) = J(w; X, y) + α ‖w‖₁.
The L1 regularization term induces sparsity: many weights are driven exactly to zero.
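A hedged sketch of why the L1 term induces sparsity (my own illustration, using a proximal/ISTA-style update rather than plain gradient descent): the penalty becomes a soft-thresholding step that sets small coordinates exactly to zero.

    import numpy as np

    def soft_threshold(v, t):
        """Proximal operator of t*||v||_1: shrink toward zero, clip small values to 0."""
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 10))
    true_w = np.zeros(10)
    true_w[[0, 3]] = [2.0, -1.5]            # only two informative features
    y = X @ true_w + 0.1 * rng.normal(size=100)

    alpha, lr = 0.1, 0.01
    w = np.zeros(10)
    for _ in range(2000):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of the smooth data term
        w = soft_threshold(w - lr * grad, lr * alpha)

    print(np.round(w, 3))                   # most coordinates are exactly zero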
Under-Constrained Problems
E.g. logistic regression on linearly separable classes: without regularization the weights grow without bound, and closed-form expressions that invert XᵀX fail when XᵀX is singular; weight decay makes the problem well posed.
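A small illustration of the under-constrained case (my own toy example): with perfectly collinear columns XᵀX is singular, so the normal equations have no unique solution, but adding αI makes the system invertible.

    import numpy as np

    X = np.array([[1.0, 2.0],
                  [2.0, 4.0],
                  [3.0, 6.0]])        # second column = 2 * first column
    y = np.array([1.0, 2.0, 3.0])
    alpha = 0.1

    print(np.linalg.matrix_rank(X.T @ X))               # 1: X^T X is singular
    w = np.linalg.solve(X.T @ X + alpha * np.eye(2), X.T @ y)
    print(w)                                             # regularization picks a unique solution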
Data Augmentation
The best way to improve generalization of a model is
to train it on more data.
Data Augmentation works particularly well for
Object Recognition tasks.
Injecting noise into the input works well for
Speech Recognition.
Figure: an original input image alongside augmented versions produced by affine distortion, added noise, and elastic deformation.
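A minimal sketch of input-side augmentation in the spirit of the figure (hypothetical transforms of my own choosing, plain numpy): random horizontal flips, small translations, and additive noise.

    import numpy as np

    def augment(image, rng):
        """Return a randomly transformed copy of a (H, W) image array."""
        out = image.copy()
        if rng.random() < 0.5:
            out = out[:, ::-1]                               # random horizontal flip
        shift = rng.integers(-2, 3)
        out = np.roll(out, shift, axis=1)                    # small random translation
        out = out + rng.normal(scale=0.05, size=out.shape)   # additive noise
        return out

    rng = np.random.default_rng(4)
    image = rng.random((28, 28))                 # placeholder "original input image"
    batch = np.stack([augment(image, rng) for _ in range(8)])
    print(batch.shape)                           # (8, 28, 28): 8 augmented variants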
Noise Robustness
Adding input noise with small variance is, for some models,
equivalent to imposing a norm penalty on the weights.
Noise on weights: a stochastic implementation of
Bayesian inference (uncertainty in the weights is
represented by a probability distribution).
For linear regression with Gaussian weight noise of variance η, the modified cost function is
E[(ŷ(x) − y)²] + η E[‖∇_W ŷ(x)‖²],
where the second term acts as a regularization term, favoring weights for which small perturbations have little effect on the output.
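A hedged sketch of weight-noise injection (one possible implementation of my own, not the slide's): Gaussian noise with variance η is added to the weights before each gradient evaluation.

    import numpy as np

    rng = np.random.default_rng(5)
    X = rng.normal(size=(200, 4))
    y = X @ np.array([1.0, -1.0, 0.5, 2.0]) + 0.1 * rng.normal(size=200)

    eta = 0.01        # weight-noise variance
    lr = 0.01
    w = np.zeros(4)

    for _ in range(2000):
        noise = rng.normal(scale=np.sqrt(eta), size=w.shape)
        w_noisy = w + noise                         # perturb the weights for this step
        grad = X.T @ (X @ w_noisy - y) / len(y)     # gradient evaluated at noisy weights
        w -= lr * grad
    print(np.round(w, 2))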
Early Stopping
Early stopping behaves like L2 regularization with an effective regularization parameter
α ≈ 1 / (τ ε),
where τ is the number of parameter update steps and ε is the learning rate: fewer steps or a smaller learning rate correspond to stronger regularization.
Early Stopping
Early stopping: terminate training while validation-set performance is still better.
Figure 7.3: learning curves showing how the negative log-likelihood loss changes over time (x-axis: time in epochs; y-axis: loss). The training loss keeps decreasing while the validation loss eventually starts to rise again.
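A minimal early-stopping loop in the sense of this figure (my own toy sketch on synthetic linear-regression data): stop once the validation loss has not improved for "patience" consecutive epochs, and keep the best parameters seen.

    import numpy as np

    rng = np.random.default_rng(6)
    X = rng.normal(size=(200, 20))
    w_true = rng.normal(size=20)
    y = X @ w_true + 0.5 * rng.normal(size=200)
    X_train, y_train = X[:150], y[:150]
    X_val, y_val = X[150:], y[150:]

    lr, patience = 0.1, 10
    w = np.zeros(20)
    best_val, best_w, bad_epochs = np.inf, w.copy(), 0

    for epoch in range(1000):
        grad = X_train.T @ (X_train @ w - y_train) / len(y_train)
        w -= lr * grad                                   # one training step per epoch
        val_loss = np.mean((X_val @ w - y_val) ** 2)     # validation-set performance
        if val_loss < best_val:
            best_val, best_w, bad_epochs = val_loss, w.copy(), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:                   # stop while validation is still good
                break

    w = best_w                                           # return to the best parameters seen
    print(epoch, best_val)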