DEEP LEARNING NOTES

1. Applications of deep learning where it has outperformed humans.

• Object and Image Recognition: DeepFace is a deep learning facial recognition system created by a research group at Facebook. It identifies human faces in digital images.
• Voice and Speech Recognition: Google DeepMind released WaveNet and Baidu released DeepSpeech. WaveNet generates speech automatically: it learns to mimic human voices by itself and improves over time, and differentiating its speech from that of a real human is much harder than one might imagine. DeepSpeech is an end-to-end deep learning system for recognizing speech. A deep network created by Oxford and Google DeepMind scientists, LipNet, reached a 93 percent success rate in reading people's lips, where an average human lip reader succeeds only 52 percent of the time.
• Art Imitation: Deep learning can transfer the style of an original artwork onto a new image based on its analysis of that style. An example is DeepArt.io, a company whose apps use deep learning to learn hundreds of different styles that you can apply to your photos.
• Google Sunroof: Google's Project Sunroof uses aerial photos from Google Earth to create a 3D model of your roof and separate it from surrounding trees and shadows. It then uses the sun's trajectory to predict how much energy solar panels on your roof could produce, given the location's specifications.
• Image and Video Generation: The generative adversarial network (GAN) framework has emerged as a powerful tool for image and video synthesis, allowing visual content to be generated in an unconditional or input-conditioned manner. It has enabled the generation of high-resolution photorealistic images and videos.

2. Describe the architecture of a deep feedforward ANN.


Deep feedforward artificial neural networks, also known as multilayer perceptrons (MLPs),
consist of multiple layers of interconnected nodes that process inputs and produce outputs.
The architecture of a deep feedforward neural network typically consists of three types of
layers:
Input layer: This layer receives the input data and passes it on to the next layer. Each node
in this layer represents a feature of the input data.
Hidden layers: These layers perform intermediate computations that transform the input
data into a representation that is more useful for the task at hand. Each node in a hidden
layer receives inputs from the nodes in the previous layer, performs a computation on the
inputs using a set of weights, and passes the result on to the next layer. The number of
hidden layers and the number of nodes in each layer can vary depending on the complexity
of the problem being solved.
Output layer: This layer produces the final output of the network. The number of nodes in
the output layer is typically determined by the number of classes or regression targets in the
problem being solved. Each node in the output layer represents a class or a target value.
In addition to the layers, a deep feedforward neural network also includes a set of weights
that determine the strength of the connections between the nodes in adjacent layers.
During training, these weights are adjusted using a process called backpropagation, in which
the error between the predicted output and the true output is propagated backwards
through the network to update the weights.
Overall, the architecture of a deep feedforward neural network allows it to model complex
nonlinear relationships between inputs and outputs, making it a powerful tool for a wide
range of machine learning tasks.
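As a minimal sketch, such a network can be written in a few lines of PyTorch (the layer sizes below are illustrative assumptions, not part of the notes):

import torch
import torch.nn as nn

# Input layer: 20 features; two hidden layers; output layer: 3 classes
mlp = nn.Sequential(
    nn.Linear(20, 64),   # input -> first hidden layer
    nn.ReLU(),
    nn.Linear(64, 64),   # first -> second hidden layer
    nn.ReLU(),
    nn.Linear(64, 3),    # second hidden -> output layer (one node per class)
)

x = torch.randn(5, 20)   # a batch of 5 examples with 20 features each
print(mlp(x).shape)      # torch.Size([5, 3])

During training, the weights of these layers would be adjusted by backpropagation, for example with torch.optim.SGD.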

3. Decaying learning rate


Decaying learning rate is a technique used in training machine learning models to gradually
reduce the size of the step taken by the optimization algorithm during gradient descent. The
learning rate is a hyperparameter that determines the size of the update to the model's
parameters at each iteration of the optimization process.
The idea behind decaying the learning rate is to let the optimization algorithm take larger steps at the beginning of training, when the parameters are far from optimal, and then gradually reduce the step size as the optimization approaches the optimum, allowing the algorithm to settle into a more accurate solution.
There are several methods for decaying the learning rate, including:
Step decay: The learning rate is reduced by a constant factor after a fixed number of epochs
or iterations.
Exponential decay: The learning rate is exponentially decayed over time, reducing it by a
fixed factor after each epoch or iteration.
Time-based decay: The learning rate is reduced based on the total number of iterations or
epochs that have passed during training.
Inverse time decay: The learning rate is reduced in proportion to the inverse of the elapsed training time, e.g. lr = lr0 / (1 + k * t) for iteration or epoch t, so the decay itself slows as training progresses.
Decaying the learning rate can help prevent overfitting, improve model generalization, and
reduce training time by allowing the optimization process to converge more quickly to an
optimal solution.
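As a minimal sketch, these schedules can be written as plain Python functions (the initial rate and decay constants below are illustrative assumptions):

import math

def step_decay(lr0, epoch, drop=0.5, epochs_per_drop=10):
    # Reduce the rate by a constant factor every fixed number of epochs
    return lr0 * drop ** (epoch // epochs_per_drop)

def exponential_decay(lr0, epoch, k=0.05):
    # Multiply the rate by e^(-k) after each epoch
    return lr0 * math.exp(-k * epoch)

def inverse_time_decay(lr0, epoch, k=0.1):
    # Decay proportional to 1 / (1 + k * epoch)
    return lr0 / (1 + k * epoch)

for epoch in (0, 10, 50):
    print(epoch, step_decay(0.1, epoch), exponential_decay(0.1, epoch), inverse_time_decay(0.1, epoch))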
4. Why CNN is preferred over ANN for image data as input.
Convolutional Neural Networks (CNNs) are a type of artificial neural network that is
specifically designed to work with image data. They are preferred over traditional Artificial
Neural Networks (ANNs) for image data inputs because of several reasons:
Local connectivity and parameter sharing: In image data, each pixel's value depends heavily
on the values of the neighboring pixels. CNNs take advantage of this to extract local features
from the image. This helps to reduce the number of parameters in the model, making it
easier to train.
Translation Invariance: CNNs are able to recognize features in an image regardless of where
they are located in the image. This is because they use convolutional filters that slide over
the image to detect features, allowing the network to learn to recognize the same feature in
different parts of the image.
Pooling Layers: CNNs often use pooling layers to downsample the feature maps. Pooling
helps to reduce the spatial dimensionality of the feature maps and makes the model more
robust to small translations or distortions in the input image.
Hierarchical Representation: CNNs learn hierarchical representations of an image, starting
from low-level features such as edges and curves, and moving up to higher-level features
like shapes and objects. This makes the model more capable of recognizing complex
patterns and structures in an image.
In summary, CNNs are specifically designed to work with image data, taking advantage of
the local connectivity and parameter sharing, translation invariance, pooling layers, and
hierarchical representation, making them more effective and efficient than traditional ANNs
for image classification tasks.
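A rough parameter count makes the local-connectivity argument concrete. Assuming a 224x224x3 input image (an illustrative size), one fully connected layer with 100 units needs about 15 million weights, while a convolutional layer with 100 3x3 filters needs fewer than 3,000:

dense_params = 224 * 224 * 3 * 100      # one weight per input value per unit
conv_params = (3 * 3 * 3 + 1) * 100     # each 3x3x3 filter (+ bias) is shared across the image
print(dense_params)                     # 15052800
print(conv_params)                      # 2800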

6. Explain the different layers of a CNN.


Convolutional Neural Networks (CNNs) are composed of different types of layers that work
together to extract features from the input image and classify it into different classes. The
main types of layers in a CNN are:
Convolutional Layer: The convolutional layer is the core building block of a CNN. It applies a
set of filters to the input image, extracting different features from the image. The filters
slide over the image, and at each position, the dot product is taken between the filter and
the portion of the image that the filter is currently covering. The output of the convolutional
layer is a feature map that highlights the locations of the detected features.
ReLU Layer: The ReLU (Rectified Linear Unit) layer applies the activation function f(x) =
max(0,x) element-wise to the output of the convolutional layer. This introduces non-
linearity into the network, allowing it to learn more complex features and decision
boundaries.
Pooling Layer: The pooling layer is used to downsample the feature maps outputted by the
convolutional layers. The most common type of pooling layer is the Max Pooling layer,
which extracts the maximum value from each sub-region of the feature map. This helps to
reduce the dimensionality of the feature map and makes the network more robust to small
changes in the input.
Fully Connected Layer: The fully connected layer is a traditional neural network layer that takes the flattened output of the previous layers, applies a set of weights followed by a bias term and an activation function such as ReLU or softmax, and uses the result to classify the input image.
Dropout Layer: The dropout layer is used to prevent overfitting in the network by randomly
dropping out some of the nodes in the network during training. This forces the network to
learn more robust features that are less dependent on any particular set of nodes.
In summary, the different layers of a CNN work together to extract features from the input
image and classify it into different classes. The convolutional layer extracts different features
from the image, the ReLU layer introduces non-linearity, the pooling layer downsamples the
feature map, the fully connected layer classifies the input, and the dropout layer prevents
overfitting.
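As a minimal sketch, these layers can be composed into a small PyTorch classifier (the filter counts, 32x32 input size, and 10-class output are illustrative assumptions):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer: 16 feature maps
    nn.ReLU(),                                   # non-linearity
    nn.MaxPool2d(2),                             # downsample 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 16x16 -> 8x8
    nn.Flatten(),
    nn.Dropout(0.5),                             # dropout against overfitting
    nn.Linear(32 * 8 * 8, 10),                   # fully connected classifier head
)

logits = model(torch.randn(1, 3, 32, 32))        # one 32x32 RGB image
print(logits.shape)                              # torch.Size([1, 10])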
7. ReLU
The Rectified Linear Unit (ReLU) activation function is one of the most widely used activation
functions in Convolutional Neural Networks (CNNs). Its significance lies in the fact that it
introduces non-linearity into the network, allowing it to learn more complex features and
decision boundaries.
The ReLU activation function f(x) is defined as f(x) = max(0, x). It is a simple and efficient function that returns 0 for any negative input and the input itself for any positive input. This means it is fast to compute and far less prone to the vanishing gradient problem that affects saturating activation functions such as the sigmoid or hyperbolic tangent.
The ReLU activation function has several advantages in CNNs:
Non-Linearity: ReLU introduces non-linearity into the network, allowing it to learn more
complex features and decision boundaries.
Sparse Activation: ReLU produces sparse activation in the network, meaning that only a
small subset of neurons are activated for any given input. This helps to reduce the number
of computations required and prevent overfitting.
Faster Convergence: ReLU allows the network to converge faster during training because it largely avoids the vanishing gradient problem that can occur with saturating activation functions.
Scalability: ReLU scales well to deep neural networks, which is important for CNNs that
often have many layers.
In summary, the ReLU activation function is significant in CNNs because it introduces non-
linearity, produces sparse activation, allows for faster convergence, and is scalable to deep
neural networks. These advantages make it one of the most widely used activation functions
in CNNs.
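The function itself is a one-liner; a small NumPy sketch:

import numpy as np

def relu(x):
    # f(x) = max(0, x), applied element-wise
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))   # [0.  0.  0.  1.5 3. ]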
8. Pooling Layers
Pooling layers are commonly used in Convolutional Neural Networks (CNNs) to reduce the
spatial size of the feature maps and introduce invariance to small translations and rotations
of the input images. Pooling layers work by summarizing the information contained in a
local region of the feature map into a single value.
Dimensionality Reduction: Pooling layers reduce the spatial dimensionality of the feature
maps, which reduces the number of parameters in the network and prevents overfitting. By
summarizing the information contained in a local region of the feature map into a single
value, pooling layers help to reduce the size of the data that the network needs to process.
Computationally Efficient: Pooling layers are computationally efficient, as they require only a
small number of parameters and computations. This makes them suitable for use in large-
scale CNNs, where the number of parameters can quickly become prohibitively large.
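As a minimal sketch, 2x2 max pooling with stride 2 keeps only the largest value in each non-overlapping 2x2 window (NumPy, assuming an even-sized feature map):

import numpy as np

def max_pool_2x2(fmap):
    # fmap: 2D feature map with even height and width
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 1, 5, 2],
                 [2, 2, 3, 4]])
print(max_pool_2x2(fmap))
# [[4 2]
#  [2 5]]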
9. Explain the role of Convolutional layer in CNN
The Convolutional layer is a fundamental building block of Convolutional Neural Networks
(CNNs) and plays a critical role in their ability to learn features from image data.
Feature Extraction: The Convolutional layer extracts local features from the input image by
applying a set of learnable filters to the image. Each filter looks for a specific feature in the
image, such as edges or corners, and produces a feature map that highlights the locations of
that feature.
10. Parameter Sharing:
In a Convolutional layer, each neuron applies a filter to a local region of the input image to
extract a specific feature. Instead of learning separate parameters for each neuron,
parameter sharing is used to enforce that all neurons in a given feature map share the same
set of weights. This reduces the number of parameters in the network, making it easier to
train and less prone to overfitting.
Parameter sharing allows the network to learn features that are translation invariant,
meaning they can be detected regardless of their location in the image. This is because each
neuron applies the same filter to different regions of the image, so it can detect the same
feature in different locations.
11. Sparsity of Connections:
In a Convolutional layer, each neuron is connected only to a small local region of the input
image, rather than to all neurons in the previous layer. This sparsity of connections reduces
the number of computations required in the network and also helps prevent overfitting.
By only connecting each neuron to a small local region of the input image, the network can
learn features that are spatially invariant, meaning they can be detected regardless of their
location in the image. This is because each neuron is sensitive to the same feature,
regardless of where it appears in the image.
12. RCNN for object detection
Region-based Convolutional Neural Networks (R-CNNs) are a type of CNN used for object
detection, which involves detecting the location of objects in an image and classifying them.
R-CNNs are composed of three stages: region proposal, feature extraction, and object
classification.
In the region proposal stage, a selective search algorithm is used to generate a set of
candidate regions, which are likely to contain objects. The algorithm works by segmenting
the image into smaller regions and merging similar regions to form larger ones, which are
then used as proposals.
In the feature extraction stage, each proposal region is fed into a convolutional neural
network (CNN) to extract a fixed-length feature vector. The CNN is typically pre-trained on a
large dataset, such as ImageNet, to learn generic features that can be useful for object
detection.
In the object classification stage, a separate classifier, such as a Support Vector Machine
(SVM), is trained for each object class. The feature vector extracted from each proposal
region is then fed into all the classifiers, and the class with the highest score is assigned to
the region.
Finally, a bounding box regression algorithm is applied to refine the bounding boxes of the
detected objects based on the original proposals.
Overall, R-CNNs have been shown to be effective for object detection in various settings, but
they can be computationally expensive due to the need for multiple CNN evaluations for
each proposal.
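The three stages can be sketched as pseudocode; selective_search, cnn_features, the per-class svms, and bbox_regressor are hypothetical placeholders passed in as parameters, not a real API:

def rcnn_detect(image, selective_search, cnn_features, svms, bbox_regressor):
    detections = []
    for region in selective_search(image):     # stage 1: generate region proposals
        crop = image.crop(region)              # warp each proposal to a fixed size
        feats = cnn_features(crop)             # stage 2: fixed-length feature vector
        scores = {cls: svm(feats) for cls, svm in svms.items()}  # stage 3: per-class scores
        cls = max(scores, key=scores.get)      # assign the highest-scoring class
        box = bbox_regressor(feats, region)    # refine the proposal's bounding box
        detections.append((cls, scores[cls], box))
    return detections

The per-proposal call to cnn_features is the expensive part mentioned above: every proposal requires its own CNN evaluation.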
13. YOLOV1 for object detection.
YOLO (You Only Look Once) is a popular object detection algorithm that uses a single neural
network to predict bounding boxes and class probabilities for objects in an image. YOLOv1 is
the first version of YOLO, which was introduced in 2015 by Joseph Redmon et al.
The YOLOv1 algorithm consists of a single neural network that takes the entire image as
input and divides it into a grid of cells. Each cell is responsible for predicting a fixed number
of bounding boxes, along with their associated class probabilities. The bounding boxes are
defined by their center coordinates, width, and height, and each box is associated with a
confidence score that indicates the probability that the box contains an object.
To generate the predictions, YOLOv1 uses a series of convolutional layers to extract features
from the input image, followed by a set of fully connected layers that predict the bounding
boxes and class probabilities for each cell. The loss function used to train the network is a
combination of localization loss, which penalizes inaccurate bounding box predictions, and
classification loss, which penalizes incorrect class predictions.
One of the main advantages of YOLOv1 is its speed. Since it requires only a single forward pass through the neural network to generate predictions for the entire image, it can process images in real time on a GPU. However, YOLOv1 can struggle with small objects, heavily occluded objects, and multiple objects that overlap with each other, since each grid cell predicts only a small fixed number of boxes.
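The output geometry is easy to compute. With the original paper's settings of a 7x7 grid (S = 7), 2 boxes per cell (B = 2), and 20 classes (C = 20), each cell predicts B * 5 box values plus C class probabilities:

S, B, C = 7, 2, 20
values_per_cell = B * 5 + C          # 5 numbers per box: x, y, w, h, confidence
output_shape = (S, S, values_per_cell)
print(output_shape)                  # (7, 7, 30)
print(S * S * values_per_cell)       # 1470 predictions per image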
14. YOLOV3 for object detection.
The YOLOv3 algorithm still uses a single neural network to predict bounding boxes and class
probabilities for objects in an image. However, it incorporates several enhancements to
improve performance, such as:
Darknet-53 Backbone: YOLOv3 uses a new backbone network called Darknet-53, which is a
deeper and more powerful version of the previous Darknet-19 network used in YOLOv2. The
Darknet-53 backbone allows for better feature extraction and improved accuracy.
Feature Pyramid Network (FPN): YOLOv3 uses FPN to extract features at multiple scales,
which allows it to detect objects of different sizes more accurately. This technique involves
concatenating features from different layers of the neural network to produce a feature
map that contains information from multiple scales.
Prediction Head: The prediction head in YOLOv3 predicts bounding boxes and class
probabilities at three different scales, allowing it to detect small, medium, and large objects.
The prediction head also uses anchor boxes, which are pre-defined bounding boxes that the
model uses to predict object locations and sizes.
Objectness Score: In YOLOv3, each predicted box carries an objectness (confidence) score, and the final class-specific score is the product of this box confidence and the conditional class probabilities. This helps the model better distinguish between objects and background regions.
Overall, YOLOv3 is faster and more accurate than its predecessors, with a mAP (mean average precision, measured at an IoU threshold of 0.5) of 57.9 on the COCO dataset, significantly higher than YOLOv2's score of 44.0. YOLOv3 is also capable of real-time object detection on a GPU, at an average speed of around 30 frames per second.
15. Components of RNN
Recurrent Neural Networks (RNNs) are a type of artificial neural network designed to process sequential data, where the current output depends on previous inputs. Unlike feedforward neural networks, which process input data in a single pass through the network, RNNs maintain an internal state, or "memory", of previous inputs, which allows them to process sequences of variable length and model temporal dependencies. They are widely used for tasks such as speech recognition, natural language processing, and time-series prediction.
RNNs are characterized by a set of recurrent connections, which enable information to be
passed from one time step to the next. At each time step, the network takes an input x(t),
and produces an output y(t) and a hidden state h(t), which is used to maintain a
representation of the previous inputs. The hidden state at each time step is calculated as a
function of the current input and the previous hidden state, using a set of weights and
biases that are learned during training.
One of the key advantages of RNNs is their ability to handle inputs of variable length. Unlike
feedforward neural networks, which require fixed-size input vectors, RNNs can process
sequences of arbitrary length, and are able to capture dependencies between inputs that
are separated by many time steps. This makes them particularly well-suited to tasks such as
speech recognition, natural language processing, and time series analysis.
The main components of an RNN are:
Input: The input to an RNN is a sequence of vectors, where each vector represents an
element in the sequence, such as a word in a sentence or a data point in a time series.
Hidden State: The hidden state of an RNN is a vector that represents the network's memory
of previous inputs. The hidden state is updated at each time step by combining the current
input vector with the previous hidden state.
Recurrent Connection: The recurrent connection is a loop that connects the hidden state at
one time step to the hidden state at the next time step. This connection allows the network
to maintain a memory of previous inputs.
Activation Function: The activation function is a non-linear function that is applied to the
combined input and hidden state at each time step. The activation function allows the
network to model complex relationships between the input and output.
Output: The output of an RNN is a sequence of vectors that represent the network's
prediction for each element in the input sequence.
Loss Function: The loss function is used to measure the error between the predicted output
and the true output. The goal of training an RNN is to minimize the loss function by
adjusting the network's parameters, such as the weights and biases.
There are different types of RNNs, such as Simple RNN, LSTM (Long Short-Term Memory),
and GRU (Gated Recurrent Unit), which have different variations in the way the hidden state
is updated and the activation function is applied. These variations have been introduced to
address the problem of vanishing gradients, which can occur when training RNNs with long
input sequences.
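As a minimal sketch, one forward pass through a simple RNN in NumPy, with the state update h(t) = tanh(W_xh * x(t) + W_hh * h(t-1) + b) (the sizes and random weights are illustrative):

import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input weights
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # recurrent weights
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)                     # initial hidden state ("memory")
sequence = rng.normal(size=(5, input_size))   # a sequence of 5 input vectors

for x in sequence:
    # Recurrent connection: the new state depends on the input and the previous state
    h = np.tanh(W_xh @ x + W_hh @ h + b)

print(h.shape)   # (8,) -- the network's memory after the whole sequence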
16. Vanishing gradients problem in RNN
The vanishing gradient problem is a common issue that can arise when training Recurrent
Neural Networks (RNNs), particularly those with long input sequences. It occurs when the
gradients of the loss function with respect to the parameters of the network become very
small as they are backpropagated through time, making it difficult to update the parameters
and learn meaningful representations of the input data.
In RNNs, the hidden state at each time step is computed by combining the current input
with the previous hidden state using a set of weights. During backpropagation, the gradient
of the loss function with respect to these weights is calculated and used to update the
weights. However, when the weights are small, or when the activation function used in the
network saturates (i.e., approaches the minimum or maximum value), the gradient can
become very small, making it difficult to learn long-term dependencies in the input
sequence.
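A toy illustration of the effect: backpropagation through time multiplies the gradient by the recurrent weight and the activation's derivative once per step, so with a recurrent weight of 0.5 (an illustrative value with magnitude below 1) the gradient shrinks geometrically:

import numpy as np

w = 0.5            # recurrent weight; |w| < 1 shrinks the gradient each step
grad = 1.0
for t in range(50):
    h = np.tanh(w * 1.0)       # a saturating activation at a fixed pre-activation
    grad *= w * (1 - h ** 2)   # chain rule: d tanh(z)/dz = 1 - tanh(z)^2
print(grad)                    # ~5e-21: effectively zero after 50 time steps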
17. Why long sequence inputs are not preferred for RNN.
Long sequence inputs are not preferred for RNNs due to the vanishing gradient problem,
which can make it difficult to effectively learn long-term dependencies in the input
sequence.
As the length of the input sequence increases, the gradient that is backpropagated through
time to update the weights of the network becomes smaller and smaller. This can make it
difficult for the network to effectively learn the relationships between the inputs at different
time steps, particularly if the inputs are separated by a large number of time steps.
The vanishing gradient problem is particularly pronounced in standard RNN architectures,
such as Simple RNN, which have a single recurrent connection that is repeated over multiple
time steps. This architecture can cause the gradients to become very small as they are
backpropagated through time, making it difficult to learn long-term dependencies in the
input sequence.
However, there are other types of RNN architectures that have been developed to address
the vanishing gradient problem, such as Long Short-Term Memory (LSTM) and Gated
Recurrent Units (GRU). These architectures use more complex connections between the
recurrent units to regulate the flow of information and prevent the gradients from
vanishing.
Despite these advances, long sequence inputs can still pose challenges for RNNs. In practice,
it is often necessary to truncate or split the input sequence into shorter segments to make it
more manageable for the network to learn from. Alternatively, other architectures such as
Convolutional Neural Networks (CNNs) or Transformers may be used to process long
sequences more effectively.
18. Explain the working of the LSTM.
(Ans with the diagram as in ppt)
Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN) that is
designed to address the vanishing gradient problem and enable the network to effectively
learn long-term dependencies in the input sequence.
LSTM networks consist of a series of LSTM cells, each of which contains a memory cell, an
input gate, an output gate, and a forget gate. These gates are responsible for controlling the
flow of information into and out of the memory cell, and the memory cell is used to store
information over time.
The working of LSTM can be explained as follows:
Forget gate: The first step in an LSTM cell is the forget gate, which determines which
information from the previous hidden state and the current input should be discarded. The
forget gate takes as input the previous hidden state h(t-1) and the current input x(t) and
outputs a value between 0 and 1 for each element in the memory cell, indicating how much
of the previous information to keep.
Input gate: The input gate determines which new information to add to the memory cell.
The input gate takes as input the previous hidden state h(t-1) and the current input x(t), and
passes them through a sigmoid function to generate a value between 0 and 1 for each
element in the memory cell, indicating the importance of the new information.
Candidate value: The candidate value is a vector of new information that could be added to
the memory cell. The candidate value is calculated using a tanh function that squashes the
values between -1 and 1.
Update the memory cell: The next step is to update the memory cell by combining the
values from the forget gate and the input gate. The forget gate determines which values to
discard from the previous memory cell, while the input gate determines which values to add
from the candidate value.
Output gate: The output gate determines which information to output from the current
hidden state. The output gate takes as input the previous hidden state h(t-1), the current
input x(t), and the updated memory cell c(t), and passes them through a sigmoid function to
generate a value between 0 and 1 for each element in the hidden state, indicating how
much of the information to output.
Hidden state: The final step is to generate the new hidden state using the updated memory
cell and the output gate. The hidden state is calculated using a tanh function that squashes
the values between -1 and 1, and is then multiplied element-wise with the output gate to
determine which information to output.
This process is repeated for each time step in the input sequence, allowing the LSTM to
learn and remember long-term dependencies in the input sequence. By selectively retaining
or discarding information from previous time steps, the LSTM can effectively handle
vanishing gradients and make use of the relevant information in the input sequence.
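As a minimal sketch, one LSTM cell step in NumPy following the gates described above (the standard formulation with a concatenated [h, x] input; the sizes and random weights are illustrative):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    z = np.concatenate([h_prev, x])         # previous hidden state + current input
    f = sigmoid(W["f"] @ z + b["f"])        # forget gate: what to discard
    i = sigmoid(W["i"] @ z + b["i"])        # input gate: what to add
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate values in (-1, 1)
    c = f * c_prev + i * c_tilde            # update the memory cell
    o = sigmoid(W["o"] @ z + b["o"])        # output gate: what to expose
    h = o * np.tanh(c)                      # new hidden state
    return h, c

rng = np.random.default_rng(0)
nx, nh = 3, 5
W = {k: rng.normal(scale=0.1, size=(nh, nh + nx)) for k in "fico"}
b = {k: np.zeros(nh) for k in "fico"}
h, c = np.zeros(nh), np.zeros(nh)
for x in rng.normal(size=(4, nx)):          # 4 time steps
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)                     # (5,) (5,)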
19. Discuss GAN with its applications.
Generative Adversarial Networks (GANs) are a type of deep learning algorithm that can
learn to generate realistic-looking data by training two neural networks in competition with
each other. The first network, known as the generator, produces new samples of data that
are intended to be indistinguishable from real data. The second network, known as the
discriminator, tries to distinguish between the real data and the fake data produced by the
generator.
The GAN training process involves iteratively updating the generator and discriminator
networks, with the generator trying to produce increasingly realistic data and the
discriminator trying to become better at distinguishing between real and fake data. Over
time, the generator learns to produce data that is more and more difficult for the
discriminator to distinguish from real data, resulting in a model that can generate high-
quality synthetic data.

One of the key advantages of GANs is their ability to generate large quantities of realistic-
looking data without requiring any explicit knowledge of the underlying data distribution.
This makes them particularly useful in a wide range of applications, including:
1. Image and video generation: GANs can be used to generate high-quality images and
videos that are virtually indistinguishable from real ones. This has applications in fields such
as computer graphics, virtual reality, and gaming.
2. Data augmentation: GANs can be used to generate new data samples that can be used to
augment existing datasets, which can improve the performance of machine learning models
trained on limited data.
3. Style transfer: GANs can be used to transfer the style of one image to another, resulting in
a new image that combines the content of one image with the style of another. This has
applications in fields such as art and design.
4. Anomaly detection: GANs can be used to identify anomalous data samples that do not
conform to the expected data distribution, which has applications in fields such as fraud
detection and cybersecurity.
5. Drug discovery: GANs can be used to generate new drug molecules with specific
properties, which can accelerate the drug discovery process and lead to the development of
new treatments for various diseases.
Overall, GANs are a powerful tool for generating realistic-looking data, and their applications span a wide range of fields. While they have shown great promise, there are still many challenges to be addressed, including the need for better training algorithms, more efficient architectures, and improved methods for evaluating the quality of generated data.
20. A working model of GAN
The working model of a Generative Adversarial Network (GAN) involves two neural
networks - a generator and a discriminator - that are trained in competition with each other.
The generator is trained to generate realistic-looking data, while the discriminator is trained
to distinguish between real data and the fake data produced by the generator.
Here's how the GAN model works in more detail:
Initialization: Both the generator and discriminator are initialized with random weights.
Generator: The generator takes a random noise vector as input and produces a new data
sample as output. This data sample is intended to be indistinguishable from real data.
Discriminator: The discriminator takes as input either a real data sample or a fake data
sample produced by the generator, and outputs a probability that the input is real.
Adversarial training: The generator and discriminator are trained in competition with each
other. The generator tries to produce fake data that is indistinguishable from real data,
while the discriminator tries to become better at distinguishing between real and fake data.
Loss function: The generator and discriminator are trained using a loss function that
measures how well they are performing their respective tasks. The generator's loss is based
on how well the discriminator is fooled by its fake data, while the discriminator's loss is
based on how well it can distinguish between real and fake data.
Update weights: The weights of both the generator and discriminator are updated using
backpropagation and stochastic gradient descent, in order to minimize their respective loss
functions.
Repeat: Steps 2-6 are repeated for a fixed number of epochs, or until some stopping
criterion is met.
Through this adversarial training process, the generator learns to produce increasingly
realistic-looking data that can fool the discriminator, while the discriminator learns to
become better at distinguishing between real and fake data. Over time, the generator
produces data that is virtually indistinguishable from real data, resulting in a model that can
generate high-quality synthetic data.
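As a minimal sketch, this training loop in PyTorch on toy 2-D data (the network sizes, noise dimension, and "real" data distribution are illustrative assumptions):

import torch
import torch.nn as nn

noise_dim, data_dim = 8, 2
G = nn.Sequential(nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, data_dim) * 0.5 + 2.0   # toy "real" distribution
    fake = G(torch.randn(64, noise_dim))           # generator output from random noise

    # Discriminator step: label real samples 1, fake samples 0
    opt_d.zero_grad()
    d_loss = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the discriminator output 1 on fakes
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_g.step()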
21. What is reinforcement learning and its applications
Refer to your notes and ppts
