
UNIT-III

UNIT III: Deep Learning: Deep Feed Forward network, regularizations, training deep models,
dropouts, Convolution Neural Network, Recurrent Neural Network, and Deep Belief Network.

What is Deep Learning?


Deep learning is a subset of machine learning based on artificial neural networks with representation learning; its models mimic the network of neurons in the brain. It is called deep learning because it makes use of deep neural networks. The learning can be supervised, semi-supervised, or unsupervised.
Deep learning algorithms are constructed with connected layers.

 The first layer is called the Input Layer


 The last layer is called the Output Layer
 All layers in between are called Hidden Layers. The word deep means the network joins neurons across more than two layers.

Each hidden layer is composed of neurons. The neurons are connected to each other. Each neuron processes the input signal it receives and propagates the result to the layer above it. The strength of the signal given to a neuron in the next layer depends on the weights, bias, and activation function.
The network consumes large amounts of input data and processes it through multiple layers; the network can learn increasingly complex features of the data at each layer.
A deep neural network provides state-of-the-art accuracy in many tasks, from object detection to speech recognition. Such networks learn features automatically, without predefined knowledge explicitly coded by the programmers.

Deep learning Process

The learning occurs in two phases.

 The first phase consists of applying a nonlinear transformation to the input and creating a statistical model as output.
 The second phase aims at improving the model with a mathematical method based on the derivative (gradient-based optimization).
The neural network repeats these two phases hundreds to thousands of times until it reaches a tolerable level of accuracy. Each repetition of this two-phase cycle is called an iteration.
Shallow neural network: A shallow neural network has only one hidden layer between the input and output.
Deep neural network: Deep neural networks have more than one hidden layer. For instance, Google's GoogLeNet model for image recognition counts 22 layers.

Feed-forward neural networks:


This is the simplest type of artificial neural network. With this architecture, information flows in only one direction: forward. The flow of information starts at the input layer, goes through the "hidden" layers, and ends at the output layer. There is no loop; information stops at the output layer.
In other words, the information moves straight through the network and never touches a node twice.

Feed-forward neural networks have no memory of the input they receive and are bad at
predicting what’s coming next. Because a feed-forward network only considers the current input, it
has no notion of order in time. It simply can’t remember anything about what happened in the past
except its training.
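To make this concrete, here is a minimal sketch of such a feed-forward network. It uses PyTorch purely for illustration; the layer sizes, activation function, and input data are assumptions chosen for the example, not something specified in these notes.

```python
import torch
import torch.nn as nn

# A minimal feed-forward (fully connected) network:
# information flows input -> hidden -> output, with no loops.
model = nn.Sequential(
    nn.Linear(4, 8),   # input layer -> hidden layer (sizes are arbitrary examples)
    nn.ReLU(),         # nonlinear activation in the hidden layer
    nn.Linear(8, 3),   # hidden layer -> output layer
)

x = torch.randn(1, 4)      # one input sample with 4 features
y = model(x)               # a single forward pass; each input is processed independently
print(y.shape)             # torch.Size([1, 3])
```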

Regularizations:
In Unit - II

Training deep models:


Training deep models in deep learning involves optimizing the parameters of a deep neural
network to learn from data and make predictions. Here's a general overview of the process:

1) Define the Architecture:


 Choose the type of neural network architecture that suits your problem (e.g.,
convolutional neural networks (CNNs) for image data, recurrent neural networks
(RNNs) for sequential data).
 Determine the number of layers, the type of activation functions, and the number of
neurons in each layer.

2) Data Preparation:
 Collect and preprocess your data. This involves cleaning, normalizing, and splitting the
data into training, validation, and test sets.
 Data augmentation techniques can be applied to artificially increase the size of the
training dataset, especially in computer vision tasks.
3) Loss Function:
 Choose a suitable loss function that quantifies the difference between the model's
predictions and the actual target values. The choice of the loss function depends on the
type of problem (e.g., mean squared error for regression, categorical cross-entropy for
classification).
4) Optimizer:
 Select an optimizer to minimize the loss function and update the model parameters.
Common optimizers include stochastic gradient descent (SGD), Adam, RMSprop, etc.
5) Training Process:
 Iteratively feed batches of training data into the model.
 Calculate the loss on the training data and backpropagate it through the network to
update the weights.
 Repeat this process for multiple epochs (passes through the entire training dataset).
6) Validation:
 Periodically evaluate the model on a separate validation dataset to monitor its
performance and detect overfitting.
 Adjust hyperparameters (learning rate, batch size, etc.) based on validation performance.

7) Test the Model:


 After training, evaluate the model on a test set that it has never seen before to assess
its generalization performance.

8) Fine-Tuning and Hyperparameter Tuning:


 Adjust model architecture or hyperparameters based on validation and test
performance.

9) Regularization:
 Apply regularization techniques (e.g., dropout, L1 or L2 regularization) to prevent
overfitting.

10) Transfer Learning (Optional):


 If relevant, use pre-trained models and fine-tune them on your specific task,
especially when dealing with limited data.

11) Monitoring and Visualization:


 Monitor training and validation metrics over time.
 Visualize model performance, such as learning curves, confusion matrices, and
feature maps.

Remember that the field of deep learning is highly dynamic, and best practices can evolve.
Staying updated with the latest research and tools is essential for effective model training.
Additionally, the computational resources required for training deep models can be substantial, and
techniques like distributed training and hardware acceleration (e.g., GPUs, TPUs) are often
employed to speed up the process.
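As a rough illustration of steps 3-6 above (loss function, optimizer, training loop, and validation), here is a minimal PyTorch sketch. The random tensors stand in for a prepared and split dataset, and the architecture, learning rate, batch size, and epoch count are all arbitrary assumptions made for the example.

```python
import torch
import torch.nn as nn

# Toy data standing in for a prepared, split dataset (step 2).
X_train, y_train = torch.randn(256, 10), torch.randint(0, 3, (256,))
X_val,   y_val   = torch.randn(64, 10),  torch.randint(0, 3, (64,))

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
loss_fn = nn.CrossEntropyLoss()                             # step 3: loss function
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # step 4: optimizer

for epoch in range(20):                              # step 5: multiple epochs
    for i in range(0, len(X_train), 32):             # iterate over mini-batches
        xb, yb = X_train[i:i + 32], y_train[i:i + 32]
        loss = loss_fn(model(xb), yb)                # forward pass + loss
        optimizer.zero_grad()
        loss.backward()                              # backpropagate the loss
        optimizer.step()                             # update the weights

    with torch.no_grad():                            # step 6: validation
        val_loss = loss_fn(model(X_val), y_val)
    print(f"epoch {epoch}: val_loss={val_loss.item():.3f}")
```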

Dropout:
This is one of the most interesting regularization techniques. It also produces very good results and is consequently one of the most frequently used regularization techniques in the field of deep learning.

To understand dropout, let’s say our neural network structure is akin to the one shown below:

So what does dropout do? At every iteration, it randomly selects some nodes and removes them
along with all of their incoming and outgoing connections as shown below.

So each iteration has a different set of active nodes, and this results in a different set of outputs. Dropout can also be thought of as an ensemble technique in machine learning.
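A minimal sketch of dropout in practice, assuming PyTorch; the dropout probability and layer sizes are illustrative. Note that dropout is active during training (different nodes are removed on each forward pass) and disabled at evaluation time.

```python
import torch
import torch.nn as nn

# Dropout applied between layers: during training, each hidden unit is
# dropped (zeroed) with probability p, along with its connections.
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly removes ~half the hidden activations each iteration
    nn.Linear(64, 2),
)

x = torch.randn(1, 20)
model.train()            # dropout active: a different set of nodes drops on each pass
print(model(x))
print(model(x))          # usually differs from the previous output
model.eval()             # dropout disabled at test time; output is deterministic
print(model(x))
```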

Recurrent neural networks (RNNs):


An RNN is a multi-layered neural network that can store information in context nodes, allowing it to learn data sequences and output a number or another sequence. In simple words, it is an artificial neural network whose connections between neurons include loops. RNNs are well suited for processing sequences of inputs.

Recall that in a feed-forward neural network the information only moves in one direction, from the input layer through the hidden layers to the output layer, and the network has no memory of the inputs it has received; it has no notion of order in time.
In an RNN, by contrast, the information cycles through a loop. When it makes a decision, it considers the current input and also what it has learned from the inputs it received previously.

The two images below illustrate the difference in information flow between a RNN and a feed-
forward neural network.

A plain RNN has a short-term memory. In combination with an LSTM, it can also have a long-term memory (more on that later).
Imagine you have a normal feed-forward neural network and give it the word "neuron" as an
input and it processes the word character by character. By the time it reaches the character "r," it has
already forgotten about "n," "e" and "u," which makes it almost impossible for this type of neural
network to predict which character would come next.
A recurrent neural network, however, is able to remember those characters because of its
internal memory. It produces output, copies that output and loops it back into the network.
Therefore, an RNN has two inputs: the present and the recent past. This is important because the sequence of data contains crucial information about what is coming next, which is why an RNN can do things other algorithms can't.
A feed-forward neural network assigns, like all other deep learning algorithms, a weight
matrix to its inputs and then produces the output. Note that RNNs apply weights to the current and
also to the previous input. Furthermore, a recurrent neural network will also tweak the weights for
both through gradient descent and backpropagation through time (BPTT).
Also note that while feed-forward neural networks map one input to one output, RNNs can map one
to many, many to many (translation) and many to one (classifying a voice).
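The loop below is a minimal sketch of this idea, assuming PyTorch; the input and hidden sizes are arbitrary. At each time step the RNN receives two things: the current input and the hidden state carrying what it has learned from the previous inputs.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

# A sequence of 5 time steps, each an 8-dimensional input (e.g., a character embedding).
seq = torch.randn(1, 5, 8)

h = torch.zeros(1, 1, 16)          # hidden state: the network's "memory" of the past
for t in range(seq.size(1)):
    # At each step the RNN sees the current input and the hidden state
    # produced from the previous inputs, then updates that hidden state.
    out, h = rnn(seq[:, t:t + 1, :], h)

print(out.shape)  # torch.Size([1, 1, 16]) -- output after the final time step
```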

Common uses of RNN

 Help securities traders to generate analytic reports


 Detect abnormalities in the contracts of financial statements
 Detect fraudulent credit-card transactions
 Provide a caption for images
 Power chatbots
 The standard uses of RNNs occur when practitioners are working with time-series data or
sequences (e.g., audio recordings or text).
Convolutional neural networks (CNN):
A CNN is a multi-layered neural network with a unique architecture designed to extract increasingly complex features of the data at each layer to determine the output. CNNs are well suited for perceptual tasks.

Convolutional Neural Network


A CNN is mostly used when there is an unstructured data set (e.g., images) and the practitioners need to extract information from it.
For instance, if the task is to predict an image caption:

 The CNN receives an image of, let's say, a cat. In computer terms, this image is a collection of pixels: generally one channel for a grayscale picture and three channels for a color picture.
 During the feature learning (i.e., in the hidden layers), the network will identify unique features, for instance, the tail of the cat, the ears, etc.
 When the network has thoroughly learned how to recognize a picture, it can provide a probability for each class it knows. The label with the highest probability becomes the prediction of the network.
Deep learning has proved to be a very powerful tool because of its ability to handle large amounts of data. Interest in using hidden layers has surpassed traditional techniques, especially in pattern recognition. One of the most popular deep neural networks is the Convolutional Neural Network.

In its early days, the most a CNN could do was recognize handwritten digits. It was mostly used in the postal sector to read zip codes, pin codes, etc. The important thing to remember about any deep learning model is that it requires a large amount of data to train and also requires a lot of computing resources. This was a major drawback for CNNs at that time, and hence CNNs were limited to the postal sector and failed to enter the wider world of machine learning.

It uses a special technique called convolution. In mathematics, convolution is an operation on two functions that produces a third function expressing how the shape of one is modified by the other.

Bottom line is that the role of the ConvNet is to reduce the images into a form that is easier to
process, without losing features that are critical for getting a good prediction.
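As a small sketch of this reduction, assuming PyTorch, the convolution-plus-pooling step below turns one grayscale image into a smaller stack of feature maps; the image size, number of filters, and kernel size are illustrative assumptions.

```python
import torch
import torch.nn as nn

# One grayscale 28x28 image: a single input channel (one channel for grayscale,
# three for color, as described above).
image = torch.randn(1, 1, 28, 28)

conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2)

features = pool(torch.relu(conv(image)))
print(features.shape)  # torch.Size([1, 8, 14, 14]) -- smaller spatial size, richer features
```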

Deep Belief neural network:


Deep belief networks (DBNs) are a machine learning algorithm that looks remarkably similar to a deep neural network, but is actually not one.
In order to understand DBNs, one must first understand Restricted Boltzmann Machines, or RBMs. An RBM is a shallow, two-layer network that merely resembles a feed-forward neural network.
An RBM consists of a two-layer network of fully connected nodes with both forward and backward connections (a cycle). The forward and backward connections have the restriction that they share weights. On each pass through the network (either forward or backward), a gradient update is performed which affects the forward and backward connections simultaneously. This is, notably, not backpropagation! Instead, a training schedule known as contrastive divergence is used, which is based on a metric known as KL divergence.
A deep belief network consists of a sequence of Restricted Boltzmann Machines which are sequentially connected. Each Boltzmann machine is trained until convergence, then frozen; the result of the "output" layer of that machine is then fed as input to the next Boltzmann machine in the sequence, which is itself trained until convergence, and so forth, until the entire network has been trained.

Notice that, apart from the output layer, every pair of adjacent layers among the input and hidden layers composes an RBM.
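The sketch below is a rough, illustrative implementation of a single RBM trained with one step of contrastive divergence (CD-1), written in PyTorch with toy binary data; the layer sizes, learning rate, and number of steps are arbitrary assumptions, not a prescribed recipe.

```python
import torch

# A tiny Restricted Boltzmann Machine trained with one step of
# contrastive divergence (CD-1). Sizes and learning rate are arbitrary.
n_visible, n_hidden, lr = 6, 4, 0.1
W = torch.randn(n_visible, n_hidden) * 0.1   # weights shared by both directions
b_v = torch.zeros(n_visible)                 # visible bias
b_h = torch.zeros(n_hidden)                  # hidden bias

def sample(p):                               # draw binary states from probabilities
    return torch.bernoulli(p)

v0 = torch.randint(0, 2, (16, n_visible)).float()   # a toy batch of binary data

for step in range(100):
    # Forward pass: visible -> hidden
    p_h0 = torch.sigmoid(v0 @ W + b_h)
    h0 = sample(p_h0)
    # Backward pass: hidden -> visible (reconstruction), using the same weights
    p_v1 = torch.sigmoid(h0 @ W.t() + b_v)
    v1 = sample(p_v1)
    p_h1 = torch.sigmoid(v1 @ W + b_h)
    # Contrastive divergence update: positive minus negative statistics,
    # applied to weights and both biases simultaneously (no backpropagation).
    W += lr * (v0.t() @ p_h0 - v1.t() @ p_h1) / v0.size(0)
    b_v += lr * (v0 - v1).mean(0)
    b_h += lr * (p_h0 - p_h1).mean(0)
```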
