DL Unit-3
UNIT III: Deep Learning: Deep Feed Forward network, regularizations, training deep models,
dropouts, Convolution Neural Network, Recurrent Neural Network, and Deep Belief Network.
Each hidden layer is composed of neurons, and the neurons are connected to each other. A
neuron processes the input signal it receives and then propagates the result to the layer above it.
The strength of the signal passed to a neuron in the next layer depends on the weights, the bias,
and the activation function.
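As a small illustration, a single neuron's output can be sketched in Python. The sigmoid activation and the specific weights, bias, and inputs below are arbitrary choices for demonstration:

```python
import math

def neuron_output(inputs, weights, bias):
    # Weighted sum of the incoming signals plus the bias term.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Sigmoid activation squashes the sum into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Example: two inputs feeding one neuron.
signal = neuron_output([0.5, -1.0], weights=[0.8, 0.2], bias=0.1)
```

Changing the weights or the bias changes the strength of the signal this neuron passes on.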
The network consumes large amounts of input data and processes them through multiple
layers; the network can learn increasingly complex features of the data at each layer.
Deep neural networks provide state-of-the-art accuracy in many tasks, from object
detection to speech recognition. They can learn features automatically, without predefined
knowledge explicitly coded by the programmers.
The learning occurs in two phases.
The first phase consists of applying a nonlinear transformation to the input to create a
statistical model as output.
The second phase aims at improving the model with a mathematical method based on
derivatives (gradient descent).
The neural network repeats these two phases hundreds to thousands of times until it has reached a
tolerable level of accuracy. Each repetition of this two-phase process is called an iteration.
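The two phases can be sketched on a toy one-parameter model (linear, with the nonlinearity dropped for simplicity). The squared-error objective, learning rate, and target value here are illustrative assumptions, not from the text:

```python
# Toy model: prediction = w * x with a single training example x = 1.0.
def train(w, target, lr=0.1, iterations=100):
    for _ in range(iterations):
        prediction = w * 1.0          # phase 1: produce the model's output
        error = prediction - target
        gradient = 2 * error          # phase 2: derivative of (prediction - target)^2
        w -= lr * gradient            # use the derivative to improve the model
    return w

w = train(w=0.0, target=3.0)  # each loop pass is one "iteration"
```

After enough iterations the weight settles near the target value, which is the "tolerable level of accuracy" in this toy setting.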
Shallow neural network: The Shallow neural network has only one hidden layer between the input
and output.
Deep neural network: Deep neural networks have more than one hidden layer. For instance,
Google's GoogLeNet model for image recognition has 22 layers.
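A minimal deep feed-forward network (two stacked layers of weighted sums with a ReLU nonlinearity between them) can be sketched as follows; the tiny weight matrices are made-up values for illustration:

```python
def relu(values):
    # Rectified linear activation applied element-wise between layers.
    return [max(0.0, v) for v in values]

def layer(inputs, weights, biases):
    # One fully connected layer: each row of weights feeds one neuron.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(x, layers):
    # Hidden layers use ReLU; the final layer is left linear.
    for weights, biases in layers[:-1]:
        x = relu(layer(x, weights, biases))
    weights, biases = layers[-1]
    return layer(x, weights, biases)

net = [([[0.5, -0.3], [0.1, 0.8]], [0.0, 0.1]),  # hidden layer (2 neurons)
       ([[1.0, -1.0]], [0.0])]                   # output layer (1 neuron)
y = forward([1.0, 2.0], net)
```

Adding more `(weights, biases)` pairs to `net` is what makes the network "deep".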
Regularizations:
In Unit - II
2) Data Preparation:
Collect and preprocess your data. This involves cleaning, normalizing, and splitting the
data into training, validation, and test sets.
Data augmentation techniques can be applied to artificially increase the size of the
training dataset, especially in computer vision tasks.
3) Loss Function:
Choose a suitable loss function that quantifies the difference between the model's
predictions and the actual target values. The choice of the loss function depends on the
type of problem (e.g., mean squared error for regression, categorical cross-entropy for
classification).
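The two example losses named above can be written out directly (a minimal sketch for plain Python lists):

```python
import math

def mean_squared_error(y_true, y_pred):
    # Average squared difference; a standard regression loss.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def categorical_cross_entropy(y_true, y_pred):
    # For a one-hot target, this reduces to -log(probability of the true class).
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))
```

For instance, predicting class probabilities [0.1, 0.8, 0.1] when the true class is the second one gives a loss of -log(0.8); the loss shrinks as the model assigns more probability to the correct class.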
4) Optimizer:
Select an optimizer to minimize the loss function and update the model parameters.
Common optimizers include stochastic gradient descent (SGD), Adam, RMSprop, etc.
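The core of the simplest of these, SGD, is a one-line update rule (sketched here without momentum or the extras that Adam and RMSprop add):

```python
def sgd_step(params, grads, lr=0.01):
    # Move every parameter a small step against its gradient.
    return [p - lr * g for p, g in zip(params, grads)]

updated = sgd_step([1.0, 2.0], [10.0, -5.0], lr=0.1)
```

The learning rate `lr` controls the step size; Adam and RMSprop differ mainly in how they adapt this step per parameter.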
5) Training Process:
Iteratively feed batches of training data into the model.
Calculate the loss on the training data and backpropagate it through the network to
update the weights.
Repeat this process for multiple epochs (passes through the entire training dataset).
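The loop described above can be sketched end to end on a toy problem. The dataset, batch size, and learning rate below are made-up values, and the single-weight model lets the backpropagation step collapse to one derivative:

```python
# Made-up dataset: the true relationship is y = 2 * x.
data = [(float(x), 2.0 * x) for x in range(1, 9)]

def fit(epochs=50, batch_size=4, lr=0.01):
    w = 0.0                                        # single trainable weight
    for _ in range(epochs):                        # one epoch = one full pass
        for i in range(0, len(data), batch_size):  # feed batches of data
            batch = data[i:i + batch_size]
            # Mean gradient of the squared error (w*x - y)^2 over the batch;
            # in a deep network this step is done by backpropagation.
            grad = sum(2 * x * (w * x - y) for x, y in batch) / len(batch)
            w -= lr * grad                         # update the weight
    return w
```

After 50 epochs the weight has converged to the true slope of 2.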
6) Validation:
Periodically evaluate the model on a separate validation dataset to monitor its
performance and detect overfitting.
Adjust hyperparameters (learning rate, batch size, etc.) based on validation performance.
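One common way to act on validation performance is early stopping; a minimal sketch (the `patience` parameter and the loss curve are illustrative):

```python
def early_stop_epoch(val_losses, patience=2):
    # Stop once validation loss has failed to improve for `patience` epochs,
    # a common sign that the model has started to overfit.
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses) - 1
```

In practice one would also restore the weights saved at the best epoch rather than the final ones.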
7) Regularization:
Apply regularization techniques (e.g., dropout, L1 or L2 regularization) to prevent
overfitting.
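The L1 and L2 penalties mentioned here are simply extra terms added to the loss; a minimal sketch (the coefficient `lam` is a hyperparameter one would tune):

```python
def l2_penalty(weights, lam=0.01):
    # L2 regularization adds lam * sum(w^2) to the loss,
    # pushing weights toward small values.
    return lam * sum(w * w for w in weights)

def l1_penalty(weights, lam=0.01):
    # L1 regularization adds lam * sum(|w|), encouraging sparse weights.
    return lam * sum(abs(w) for w in weights)
```

During training, the gradient of this penalty is backpropagated along with the gradient of the data loss.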
Remember that the field of deep learning is highly dynamic, and best practices can evolve.
Staying updated with the latest research and tools is essential for effective model training.
Additionally, the computational resources required for training deep models can be substantial, and
techniques like distributed training and hardware acceleration (e.g., GPUs, TPUs) are often
employed to speed up the process.
Dropout:
This is one of the most interesting types of regularization techniques. It also produces
very good results and is consequently the most frequently used regularization technique in the field
of deep learning.
To understand dropout, consider a standard fully connected neural network.
So what does dropout do? At every iteration, it randomly selects some nodes and removes them,
along with all of their incoming and outgoing connections.
So each iteration uses a different set of nodes, and this results in a different set of outputs. Dropout
can also be thought of as an ensemble technique in machine learning.
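The mechanism can be sketched on a single layer of activations. This uses the "inverted dropout" convention common in modern libraries; the drop probability `p` is a hyperparameter:

```python
import random

def dropout(activations, p=0.5, training=True):
    # Inverted dropout: during training, zero each activation with
    # probability p and scale the survivors by 1/(1 - p) so the
    # expected value of the layer's output is unchanged.
    # At test time, activations pass through untouched.
    if not training:
        return list(activations)
    return [0.0 if random.random() < p else a / (1.0 - p)
            for a in activations]
```

Because a fresh random mask is drawn at every iteration, each training step effectively trains a different thinned sub-network, which is the ensemble view mentioned above.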
In a feed-forward neural network, the information only moves in one direction — from the
input layer, through the hidden layers, to the output layer. The information moves straight through
the network and never touches a node twice.
Feed-forward neural networks have no memory of the input they receive and are bad at
predicting what’s coming next. Because a feed-forward network only considers the current input, it
has no notion of order in time. It simply can’t remember anything about what happened in the past
except its training.
In an RNN the information cycles through a loop. When it makes a decision, the network considers
the current input and also what it has learned from the inputs it received previously.
This looping information flow is the key difference between an RNN and a feed-forward
neural network.
A usual RNN has a short-term memory. In combination with an LSTM, it can also have a long-term
memory (more on that later).
Imagine you have a normal feed-forward neural network and give it the word "neuron" as an
input and it processes the word character by character. By the time it reaches the character "r," it has
already forgotten about "n," "e" and "u," which makes it almost impossible for this type of neural
network to predict which character would come next.
A recurrent neural network, however, is able to remember those characters because of its
internal memory. It produces output, copies that output and loops it back into the network.
Therefore, an RNN has two inputs: the present and the recent past. This is important because
the sequence of data contains crucial information about what is coming next, which is why an RNN
can do things other algorithms can't.
A feed-forward neural network assigns, like all other deep learning algorithms, a weight
matrix to its inputs and then produces the output. Note that RNNs apply weights to the current and
also to the previous input. Furthermore, a recurrent neural network will also tweak the weights for
both through gradient descent and backpropagation through time (BPTT).
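A single recurrent step can be sketched as follows (one-dimensional for clarity; the weight values are arbitrary, and real RNNs use weight matrices):

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.9, b=0.0):
    # The new hidden state mixes the current input (w_x * x) with the
    # previous hidden state (w_h * h_prev): the "recent past".
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(inputs):
    h = 0.0               # the memory starts empty
    for x in inputs:      # the same weights are reused at every time step
        h = rnn_step(x, h)
    return h
```

Feeding the same values in a different order produces a different final state, which is exactly the notion of order in time that a feed-forward network lacks.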
Also note that while feed-forward neural networks map one input to one output, RNNs can map one
to many, many to many (translation) and many to one (classifying a voice).
The CNN receives an image, say of a cat. This image, in computer terms, is a collection of
pixels: generally, one channel for a greyscale picture and three channels for a color picture.
During the feature learning (i.e., hidden layers), the network will identify unique features,
for instance, the tail of the cat, the ear, etc.
Once the network has thoroughly learned how to recognize a picture, it can provide a
probability for each class it knows. The label with the highest probability becomes the
prediction of the network.
Deep learning has proved to be a very powerful tool because of its ability to handle large amounts
of data. Interest in models with hidden layers has surpassed traditional techniques, especially in
pattern recognition. One of the most popular kinds of deep neural network is the Convolutional
Neural Network (CNN).
In their early days, the most that CNNs could do was recognize handwritten digits. They were
mostly used in the postal sector to read zip codes, pin codes, etc. The important thing to remember
about any deep learning model is that it requires a large amount of data to train and also a lot of
computing resources. This was a major drawback for CNNs at that period, and hence they remained
limited to the postal sector and failed to enter the wider world of machine learning.
A CNN uses a special technique called convolution. In mathematics, convolution is an operation
on two functions that produces a third function expressing how the shape of one is modified by
the other.
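For images, the operation amounts to sliding a small kernel over the pixel grid; a minimal sketch for plain nested lists:

```python
def convolve2d(image, kernel):
    # "Valid" convolution: slide the kernel over the image and take the
    # sum of element-wise products at each position. (Like most deep
    # learning libraries, this does not flip the kernel, so strictly it
    # computes cross-correlation.)
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]
```

In a CNN the kernel entries are learned weights, and each kernel comes to detect one feature (an edge, a texture, a cat's ear) wherever it appears in the image.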
The bottom line is that the role of the ConvNet is to reduce the images into a form that is easier to
process, without losing the features that are critical for getting a good prediction.
Deep Belief Network:
Notice that, apart from the output layer, each pair of adjacent layers among the input and hidden
layers composes a Restricted Boltzmann Machine (RBM); a Deep Belief Network is a stack of
such RBMs.