Chapter 1

Batch Normalization layer

You have a minibatch of data and you are taking it through your network. You have inserted
batch normalization layers throughout the network; they take the input x and make sure that,
for every feature dimension across the batch, you have unit Gaussian activations.

Basic Idea

Batch normalization potentially helps in two ways: faster learning and higher overall accuracy.
It also allows you to use a higher learning rate, potentially providing another boost in speed.

Why does this work? Well, we know that normalization (shifting inputs to zero-mean and unit
variance) is often used as a pre-processing step to make the data comparable across features.
As the data flows through a deep network, the weights and parameters adjust those values,
sometimes making the data too big or too small again - a problem the authors refer to as
"internal covariate shift". By normalizing the data in each mini-batch, this problem is largely
avoided.

Basically, rather than just performing normalization once in the beginning, you're doing it all
over the place. Of course, this is a drastically simplified view of the matter (for one thing, I'm
completely ignoring the post-processing updates applied to the entire network), but hopefully
this gives a good high-level overview.

Adding batch normalization typically slows training by about 30%.

Extended explanation

So imagine you have N samples in your minibatch and D features / neuron activations at
some point. This matrix X is the input to the batch normalization layer. The layer evaluates
the empirical mean and variance along every feature and uses them to make sure that each
column of X is a unit Gaussian. You can do this because the transformation is differentiable,
so you can backpropagate through it.

Figure 1.1: Compute empirical mean and variance for each dimension
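
As a minimal NumPy sketch of this step (the shapes, values, and variable names below are
illustrative, not taken from the figure):

import numpy as np

# Hypothetical minibatch: N samples, D features / activations.
N, D = 32, 100
X = np.random.randn(N, D) * 5.0 + 3.0   # not zero-mean, not unit-variance

# Empirical mean and variance along every feature (per column of X).
mu = X.mean(axis=0)                     # shape (D,)
var = X.var(axis=0)                     # shape (D,)

# Normalize so that each column is approximately unit Gaussian.
eps = 1e-5                              # small constant for numerical stability
X_hat = (X - mu) / np.sqrt(var + eps)

print(X_hat.mean(axis=0)[:3])           # roughly 0 per feature
print(X_hat.std(axis=0)[:3])            # roughly 1 per feature

Every operation here is differentiable, which is why the layer can sit inside the network and
still be trained with backpropagation.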

So what you usually have are fully connected or convolutional layers, each followed by a batch
normalization layer before the non-linearity, so that everything is roughly unit Gaussian at
each step of the network. One problem is that it is not clear that tanh wants exactly unit
Gaussian inputs: you want tanh to be able to make its outputs more or less diffuse (more or
less saturated), and with strict normalization it would not be able to do that.

Figure 1.2: In network
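
A hedged sketch of that placement, assuming a small fully connected network written in
PyTorch (the layer sizes are made up):

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 100),    # fully connected layer
    nn.BatchNorm1d(100),    # normalizes each of the 100 activations over the batch
    nn.Tanh(),              # the non-linearity now sees roughly unit Gaussian inputs
    nn.Linear(100, 10),
)

The same pattern applies to convolutional layers, with BatchNorm2d normalizing each channel
instead of each unit.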

To solve this you do not only normalize X but also allow the network to scale by a learned
parameter gamma and shift by a learned parameter beta. So after you have centred your data
you are allowing the network, through backprop, to scale and shift the distribution. Also note
that the network can learn to undo this layer (it can learn to make the batch normalization
layer an identity mapping).
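
A small NumPy sketch of the scale-and-shift step and of how the layer can undo itself (the
variable names and shapes are illustrative):

import numpy as np

N, D = 32, 100
X = np.random.randn(N, D) * 5.0 + 3.0
eps = 1e-5
mu, var = X.mean(axis=0), X.var(axis=0)
X_hat = (X - mu) / np.sqrt(var + eps)   # normalized minibatch, as before

gamma = np.ones(D)                      # learned scale, initialized to 1
beta = np.zeros(D)                      # learned shift, initialized to 0
y = gamma * X_hat + beta                # output of the batch normalization layer

# With gamma = sqrt(var + eps) and beta = mu the layer reproduces its input
# exactly, i.e. it has learned to be the identity.
assert np.allclose(np.sqrt(var + eps) * X_hat + mu, X)

Backprop is free to move gamma and beta anywhere in between: fully normalized, fully undone,
or whatever helps the loss.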

As you sweep through different choices of initialization values with and without batch norm
you'll see a huge difference. With batch norm it will work with much bigger settings of the
initial scale, so you don't have to worry as much about initialization; it really helps.

It also acts somewhat as a regularizer: with batch norm, when some input x goes through the
network, its representation in some layer of the network is not only a function of x but also of
whatever other examples happen to be with x in the batch. Normally all examples are processed
independently in parallel, but batch norm ties them together, so your representation at some
layer depends on whatever batch you happened to be sampled into, which jitters your place in
the representation space of that layer. This has a nice regularizing effect.

Figure 1.3: Equations

Figure 1.4: Algorithm

During testing it works a little differently. At test time you want this to be a deterministic
function, so µ and σ are fixed: they are the ones that you used during training.

Figure 1.5: During testing it works a little differently
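
A hedged sketch of the test-time computation, assuming running_mean and running_var were
accumulated during training (the function and variable names are made up for illustration):

import numpy as np

def batchnorm_inference(x, running_mean, running_var, gamma, beta, eps=1e-5):
    # Deterministic: the output depends only on x, not on the other examples in a batch.
    x_hat = (x - running_mean) / np.sqrt(running_var + eps)
    return gamma * x_hat + beta

D = 100
running_mean, running_var = np.zeros(D), np.ones(D)   # estimated during training
gamma, beta = np.ones(D), np.zeros(D)                 # learned during training
x = np.random.randn(1, D)                             # a single test example
y = batchnorm_inference(x, running_mean, running_var, gamma, beta)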
