
NEW L J INSTITUTE OF ENGINEERING & TECHNOLOGY,

BODAKDEV, AHMEDABAD

35 QUESTIONS FOR IMPROVEMENT TEST

B.E. SEMESTER-VII [CSE-AIML]

SUBJECT: Deep Learning: Principles and Practices (3174201)

SYLLABUS
Unit : 1 Introduction to Deep Learning

Unit : 2 Neural Network

Unit : 3 Neural Network Based Fuzzy Systems

Unit : 4 Tensorflow

Unit : 5 Deep Learning Algorithms


Important Questions

Unit-1 Introduction to Deep Learning


Q.1 What are the real time applications of Deep Learning? List and explain.
Ans. Deep learning has a wide range of applications across multiple industries and fields.

Some of the most common applications include:

1. Speech recognition: Deep learning is used in speech recognition, voice identification,
and voice synthesis. Applications include voice-controlled assistants, voice-enabled
devices, and voice-controlled robots.
2. Predictive analytics: Deep learning is used to analyze historical data and make
predictions about future events. Applications include fraud detection, customer churn
prediction, and demand forecasting.
3. Recommender systems: Deep learning is used to analyze patterns in data to
recommend items to users. Applications include movie and music recommendations,
news recommendations, and product recommendations.
4. Healthcare: Deep learning is used to analyze medical images and patient data, to
improve diagnosis and treatment, and to identify potential health risks. Applications
include cancer diagnosis, drug discovery, and personalized medicine.
5. Finance: Deep learning is used to detect fraudulent transactions, to identify potential
risks, and to make predictions about stock prices. Applications include credit fraud
detection, algorithmic trading, and risk management.
6. Marketing: Deep learning is used to analyze customer data, to predict customer
behavior and to personalize marketing campaigns. Applications include customer
segmentation, customer lifetime value prediction, and personalization
7. Gaming: Deep learning is used to train agents to play games, and to develop intelligent
game-playing algorithms. Applications include game bots, game-playing AI, and
adaptive game design.
8. Robotics: Deep learning is used to enable robots to learn from experience and adapt to
their environment. Applications include autonomous vehicles, drones, and industrial
robots.
9. Cyber Security: Deep learning is used to detect patterns in network traffic, and to
identify and respond to cyber threats. Applications include intrusion detection and
prevention, and malware detection.
10. Visual Recognition

Deep Learning models can learn complex data representations, enabling them to
achieve state-of-the-art performance in tasks such as image classification, object
detection, and face recognition.

11. Pixel Restoration

Deep learning algorithms have been applied to various tasks in the pixel restoration
industry, including image denoising, super-resolution, and inpainting. These
algorithms have proven to be very effective at restoring images that have been degraded
by noise or other imperfections.

12. Computer vision

Deep learning is used in image and video recognition, object detection, semantic
segmentation, and other computer vision tasks. Applications include self-driving cars,
security cameras, and image recognition for mobile devices.

13. Composing Music

Deep learning algorithms have created systems that can automatically compose music.
These systems use a long short-term memory (LSTM) neural network to generate
musical sequences. LSTM networks are well-suited to music composition because they
can learn complex dependencies and remember long-term information.

14. Natural language processing and pattern recognition

Deep Learning algorithms have revolutionized Natural Language Processing by making
it possible to automatically extract meaning from text. Deep learning is used in natural
language understanding, machine translation, Question answering, sentiment analysis,
and other natural language processing tasks. Applications include chatbots, virtual
assistants, and language-based search engines.

15. Detecting Developmental Delay in Children

Deep learning is a type of machine learning that is well-suited for detecting patterns in
data. It can be used to detect a developmental delay in children by looking for patterns
in data that indicate a delay in development. Deep learning can detect these patterns by
learning from large amounts of data, and this makes it an effective tool for detecting
developmental delays in children.

16. Fraud Detection, News Aggregation and Fake News Detection, Image Recognition
and Processing

A deep learning approach to image recognition can involve the use of a convolutional
neural network to automatically learn relevant features from sample images and
automatically identify those features in new images.

Q.2 Pen down advantages of Deep Learning.


Ans. Deep learning has several advantages over traditional machine learning methods, some
of the main ones include:
1. Automatic feature learning: Deep learning algorithms can automatically learn features
from the data, which means that they don’t require the features to be hand-engineered.
This is particularly useful for tasks where the features are difficult to define, such as
image recognition.
2. Handling large and complex data: Deep learning algorithms can handle large and
complex datasets that would be difficult for traditional machine learning algorithms to
process. This makes it a useful tool for extracting insights from big data.
3. Improved performance: Deep learning algorithms have been shown to achieve state-
of-the-art performance on a wide range of problems, including image and speech
recognition, natural language processing, and computer vision.
4. Handling non-linear relationships: Deep learning can uncover non-linear relationships
in data that would be difficult to detect through traditional methods.
5. Handling structured and unstructured data: Deep learning algorithms can handle
both structured and unstructured data such as images, text, and audio.
6. Predictive modeling: Deep learning can be used to make predictions about future events
or trends, which can help organizations plan for the future and make strategic decisions.
7. Handling missing data: Deep learning algorithms can handle missing data and still
make predictions, which is useful in real-world applications where data is often
incomplete.
8. Handling sequential data: Deep learning algorithms such as Recurrent Neural
Networks (RNNs) and Long Short-term Memory (LSTM) networks are particularly
suited to handle sequential data such as time series, speech, and text. These algorithms
have the ability to maintain context and memory over time, which allows them to make
predictions or decisions based on past inputs.
9. Scalability: Deep learning models can be easily scaled to handle an increasing amount
of data and can be deployed on cloud platforms and edge devices.
10. Generalization: Deep learning models can generalize well to new situations or contexts,
as they are able to learn abstract and hierarchical representations of the data.
Deep learning has several advantages over traditional machine learning methods,
including automatic feature learning, handling large and complex data, improved
performance, handling non-linear relationships, handling structured and unstructured
data, predictive modeling, handling missing data, handling sequential data, scalability
and generalization ability.

Q.3 What are disadvantages of Deep Learning?
Ans. While deep learning has many advantages, there are also some disadvantages to consider:
1. High computational cost: Training deep learning models requires significant
computational resources, including powerful GPUs and large amounts of memory. This
can be costly and time-consuming.
2. Overfitting: Overfitting occurs when a model is trained too well on the training data and
performs poorly on new, unseen data. This is a common problem in deep learning,
especially with large neural networks, and can be caused by a lack of data, a complex
model, or a lack of regularization.
3. Lack of interpretability: Deep learning models, especially those with many layers, can
be complex and difficult to interpret. This can make it difficult to understand how the
model is making predictions and to identify any errors or biases in the model.
4. Dependence on data quality: Deep learning algorithms rely on the quality of the data
they are trained on. If the data is noisy, incomplete, or biased, the model’s performance
will be negatively affected.
5. Data privacy and security concerns: As deep learning models often rely on large
amounts of data, there are concerns about data privacy and security. Misuse of data by
malicious actors can lead to serious consequences like identity theft, financial loss and
invasion of privacy.
6. Lack of domain expertise: Deep learning requires a good understanding of the domain
and the problem you are trying to solve. If the domain expertise is lacking, it can be
difficult to formulate the problem and select the appropriate algorithm.
7. Unforeseen consequences: Deep learning models can lead to unintended consequences,
for example, a biased model can discriminate against certain groups of people, leading to
ethical concerns.
8. Limited to the data they are trained on: Deep learning models can only make predictions
based on the data they have been trained on. They may not be able to generalize to new
situations or contexts that were not represented in the training data.
9. Black box models: Some deep learning models are considered "black-box" models, as
it is difficult to understand how the model is making predictions and to identify the factors
that influence the predictions.

Q.4 What are some of the major challenges in the field of deep learning?
Ans. 1. Lots and lots of data

Deep learning algorithms are trained to learn progressively using data. Large data sets are
needed to make sure that the machine delivers the desired results. Just as the human brain needs a lot
of experience to learn and deduce information, the analogous artificial neural network
requires copious amounts of data.

2. Overfitting

Overfitting occurs when a model learns to perform well on the training data but fails to
generalize to unseen data. It remains a significant challenge in deep learning, especially
when dealing with limited data.
3. Hyperparameter Optimization
Hyperparameters are the parameters whose value is defined prior to the commencement of
the learning process. Changing the value of such parameters by a small amount can invoke
a large change in the performance of your model.
4. Computational Resources
Training deep learning models, especially large ones like deep neural networks and
Transformers, demands substantial computational resources, including GPUs and TPUs.
Access to these resources can be a barrier for many researchers and organizations.
5. Robustness and Adversarial Attacks
Deep learning models can be vulnerable to adversarial attacks, where small, imperceptible
changes to input data can lead to incorrect predictions. Ensuring the robustness of models
against such attacks is an ongoing challenge.
6. Generalization
While deep learning models often perform well on specific tasks, achieving good
generalization across a wide range of tasks with a single model remains a challenge.
7. Transfer Learning and Few-Shot Learning
Developing techniques that allow models to transfer knowledge from one task or domain to
another with limited data (transfer learning) or with very few examples (few-shot learning)
is an active area of research.
8. Real-time Inference
Deploying deep learning models in real-time applications with low latency requirements,
such as autonomous vehicles or robotics, presents challenges in terms of model efficiency
and speed.
9. Long-Term Memory and Reasoning
Enabling deep learning models to perform reasoning over long sequences or make decisions
based on long-term context is an ongoing research challenge.

Unit-2 Neural Network


Q.5 What is an Artificial Neural Network (ANN)? How do you calculate the output of a neuron
in an ANN?
Ans.  Artificial Neural Networks represent deep learning within artificial intelligence. Certain
application scenarios are too heavy or out of scope for traditional machine learning
algorithms to handle; neural networks step in and fill the gap in such scenarios.
 Artificial neural networks are inspired by the biological neurons within the human body
which activate under certain circumstances resulting in a related action performed by the
body in response. Artificial neural nets consist of various layers of interconnected
artificial neurons powered by activation functions that help in switching them ON/OFF.
Like traditional machine learning algorithms, here too, there are certain values that neural nets
learn in the training phase.

 Briefly, each neuron receives a multiplied version of inputs and random weights, which
is then added with a static bias value (unique to each neuron layer); this is then passed to
an appropriate activation function which decides the final value to be given out of the
neuron. There are various activation functions available as per the nature of input values.
Once the output is generated from the final neural net layer, loss function (input vs output)
is calculated, and backpropagation is performed where the weights are adjusted to make
the loss minimum.

Elements of a Neural Network


 Input Layer: This layer accepts input features. It provides information from the outside
world to the network, no computation is performed at this layer, nodes here just pass on
the information(features) to the hidden layer.
 Hidden Layer: Nodes of this layer are not exposed to the outer world, they are part of
the abstraction provided by any neural network. The hidden layer performs all sorts of
computation on the features entered through the input layer and transfers the result to the
output layer.
 Output Layer: This layer brings the information learned by the network out to the outer
world.
 Finding optimal values of the weights is what the overall training process revolves around.
 Training data teaches neural networks and helps improve their accuracy over time. Once the
learning algorithms are fine-tuned, they become powerful computer science and AI tools
because they allow us to very quickly classify and cluster data. Using neural networks,
speech and image recognition tasks can happen in minutes instead of the hours they take
when done manually. Google’s search algorithm is a well-known example of a neural
network.
 Weights are numeric values that are multiplied by inputs. In backpropagation, they are
modified to reduce the loss. In simple words, weights are machine learned values from
Neural Networks. They self-adjust depending on the difference between predicted
outputs vs training inputs.
 Activation Function is a mathematical formula that helps the neuron to switch ON/OFF.


How do neural networks work?


Think of each individual node as its own linear regression model, composed of input data,
weights, a bias (or threshold), and an output. The formula would look something like this:
Σ wi xi + bias = w1x1 + w2x2 + w3x3 + bias
output = f(x) = 1 if Σ wi xi + b >= 0; 0 if Σ wi xi + b < 0
x1 and x2 are the two inputs. They could be integers, floats, etc. Here, for example, we assume
them to be 1 and 0 respectively.
When these inputs pass through the connections (through W1 and W2), they are adjusted
depending on the connection weights.
Let us assume that W1 = 0.5, W2 = 0.6 and W3 = 0.2. To adjust the inputs by the weights, we
compute:
x1 * W1 + x2 * W2 = 1 * 0.5 + 0 * 0.6 = 0.5
Here we multiply each input by its weight and add the results together. Hence, 0.5 is the value
adjusted by the weights of the connections. The connections can be thought of as the dendrites of
an artificial neuron.
Now, to process this information we need an activation function (which can be thought of as the
soma). Here, the simple sigmoid function is used:
f(x) = f(0.5) = 0.6224

The above value can be taken as the output of the neuron (the axon). This value is then
multiplied by W3:
0.6224 * W3 = 0.6224 * 0.2 = 0.12448
Finally, we apply the activation function to this value:
f(x) = f(0.12448) = 0.5310
Hence y (the final prediction) = 0.5310

In this example the weights were randomly generated. Artificial Neural Networks are supervised
machine learning models commonly used for both regression and classification problems.
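
The arithmetic above can be reproduced with a few lines of Python/NumPy. This is only a sketch of the forward pass, using the weight values W1 = 0.5, W2 = 0.6 and W3 = 0.2 assumed in the example:

import numpy as np

def sigmoid(x):
    # Logistic activation: maps any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

# Inputs and weights taken from the example above
x1, x2 = 1.0, 0.0
w1, w2, w3 = 0.5, 0.6, 0.2

# Weighted sum of the inputs (no bias term, as in the example)
z = x1 * w1 + x2 * w2        # 0.5

# Activation of the neuron (the "soma")
hidden = sigmoid(z)          # about 0.622

# Scale by the output connection weight, then activate again
y = sigmoid(hidden * w3)     # about 0.531

print(hidden, y)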

Q.6 Describe the concept of backpropagation and its significance in training neural networks.
Ans. Backpropagation is a fundamental algorithm for training Artificial Neural Networks (ANNs).
It is an abbreviation for "backward propagation of errors," and it is used to update the
network's weights and biases so that the network can learn to make better predictions. Here's
how the backpropagation algorithm works:

1. Initialization:
 Initialize the weights and biases in the network with small random values.
 Define a learning rate (a hyperparameter that controls the step size during weight
updates).
2. Forward Pass:

 Input data is fed forward through the network.
 For each layer, calculate the weighted sum of inputs and apply an activation function
to produce the output of that layer. This output is often referred to as the "activation."
3. Compute Error (Loss):
 Calculate the error (often called the loss) between the network's predictions and the
true target values. Common loss functions include Mean Squared Error (MSE) for
regression tasks and Cross-Entropy Loss for classification tasks.
4. Backward Pass (Backpropagation):
 Start from the output layer and move backward through the layers of the network.
 Compute the gradient of the loss with respect to the activations and the weights for
each layer. This is done using the chain rule of calculus.
For each layer:
 Calculate the gradient of the loss with respect to the layer's activations (dL/da), where
L is the loss and a is the activation.
 Use this gradient to calculate the gradient of the loss with respect to the weighted sum
of inputs (dL/dz), where z is the weighted sum.
 Calculate the gradients of the loss with respect to the layer's weights (dL/dw) and
biases (dL/db) using dL/dz.
 Update the layer's weights and biases using these gradients and the learning rate:
 w_new = w_old - (learning_rate * dL/dw)
 b_new = b_old - (learning_rate * dL/db)
5. Repeat:
 Repeat steps 2 to 4 for a specified number of iterations (epochs) or until the loss
converges to a satisfactory level.
6. Training Evaluation:
 Periodically evaluate the trained network on a validation dataset to monitor its
performance and decide when to stop training (early stopping) to prevent overfitting.
7. Inference:
 After training, the network can be used for making predictions on new, unseen data by
performing a forward pass through the trained network.
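
A minimal NumPy sketch of steps 2 to 5 above for a single sigmoid output neuron trained with Mean Squared Error is given below; the toy dataset, learning rate and number of epochs are illustrative assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset (illustrative): 4 samples with 2 features, targets follow logical OR
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1.0, 1.0, 1.0, 0.0])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=2)     # 1. initialise weights with small random values
b = 0.0                               #    and the bias
lr = 0.5                              #    learning rate

for epoch in range(1000):             # 5. repeat forward and backward passes
    z = X @ w + b                     # 2. forward pass: weighted sum of inputs
    a = sigmoid(z)                    #    activation
    loss = np.mean((a - y) ** 2)      # 3. compute the error (MSE)

    # 4. backward pass: the chain rule gives dL/dw and dL/db
    dL_da = 2.0 * (a - y) / len(y)
    da_dz = a * (1.0 - a)
    dL_dz = dL_da * da_dz
    dL_dw = X.T @ dL_dz
    dL_db = dL_dz.sum()

    w -= lr * dL_dw                   # weight update: w_new = w_old - lr * dL/dw
    b -= lr * dL_db                   # bias update:   b_new = b_old - lr * dL/db

print("final loss:", loss)

In a multi-layer network the same chain-rule step is simply repeated layer by layer, starting from the output layer and moving backwards.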


Q. 7 What is activation function in Neural Networks? What is the role of activation function
in Neural Networks?

Ans. In deep learning, an activation function is a mathematical function that determines the output
of a neuron in a neural network, given its input. Activation functions introduce non-linearity to
the network, enabling it to learn and approximate complex relationships in data. Each neuron
applies its activation function to the weighted sum of its inputs and passes the result to the next
layer.

Activation functions serve two primary purposes in neural networks:

Introducing Non-Linearity: Without non-linear activation functions, neural networks would
be limited to representing linear transformations of the input data. Non-linear activation
functions allow the network to capture intricate patterns and relationships within the data.

Enabling Complex Transformations: Activation functions introduce non-linearities that
allow neural networks to approximate any continuous function. This property is essential for
the network's ability to model and solve a wide range of tasks, from image recognition to natural
language processing.

Commonly used activation functions include sigmoid, tanh, ReLU, and softmax; these are
discussed in detail in the next question.

Activation functions play a crucial role in shaping the network's behaviour and determining its
ability to learn complex patterns. Choosing the right activation function depends on the task,
architecture, and characteristics of the data being used.
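
The need for non-linearity can be seen with a short NumPy check (the random weights are purely illustrative): two stacked linear layers without an activation collapse into one linear layer, whereas inserting a ReLU breaks that equivalence.

import numpy as np

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(4, 2))
x = rng.normal(size=3)

# Two linear layers with no activation are equivalent to a single linear layer
two_linear = (x @ W1) @ W2
collapsed = x @ (W1 @ W2)
print(np.allclose(two_linear, collapsed))    # True: no extra expressive power

# Adding a non-linear activation (ReLU) between the layers removes this equivalence
relu = lambda z: np.maximum(z, 0.0)
nonlinear = relu(x @ W1) @ W2
print(np.allclose(nonlinear, collapsed))     # generally False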

Q. 8 List different activation functions. Explain any two in detail along with formula and
graph.

Ans. Common Activation Functions


 Sigmoid: S-shaped curve with a smooth transition; its range between 0 and 1 makes it
useful for binary classification tasks.
 ReLU (Rectified Linear Unit): Linear for positive values and zero for negative values; simple
and able to mitigate vanishing gradient problems.
 Tanh (Hyperbolic Tangent): Similar to sigmoid but centered at zero; its range between -1 and 1
makes it useful for zero-centered data.
 Softmax: Converts a vector of scores into probabilities and emphasizes the highest score,
which makes it suitable for multi-class classification.

Sigmoid Activation Function


The sigmoid activation function is one of the fundamental activation functions used in deep
learning. It is a non-linear function that maps any input value to a value between 0 and 1. The
sigmoid function is defined as:

σ(x) = 1 / (1 + e^(-x))

where x is the input value.

In the graph of the sigmoid function, the x-axis represents the input values, and the y-axis
represents the output values after applying the sigmoid function. The sigmoid function maps
input values from negative infinity to positive infinity onto the range [0, 1]. As the input becomes
more positive, the output approaches 1, and as the input becomes more negative, the output
approaches 0. This characteristic makes the sigmoid function suitable for producing probabilities
and modeling binary classification problems.

Here's a breakdown of the sigmoid activation function in the context of deep learning:

1. Purpose of Sigmoid Function: The sigmoid function is commonly used to introduce non-
linearity to the outputs of neural network layers. It converts the output of a neuron into a

probability-like value, making it suitable for tasks involving binary classification or situations
where you want to model a probability distribution.

2. Range and Properties: The sigmoid function maps input values from negative infinity to
positive infinity onto the range [0, 1]. As the input becomes more positive, the output
approaches 1, and as the input becomes more negative, the output approaches 0. This
characteristic makes the sigmoid function suitable for producing probabilities.

3. Gradient and Vanishing Gradient Problem: While the sigmoid function introduces non-
linearity, it suffers from the vanishing gradient problem. For very large or very small inputs,
the gradient of the sigmoid becomes close to zero, leading to slow convergence during training.
This can result in slow learning and difficulty training deep networks.

4. Use Cases: The sigmoid activation function is often used in the following scenarios:

 Output Layer for Binary Classification: In binary classification problems, the sigmoid function
is applied to the output layer to convert the network's raw output into a probability score.
 Hidden Layers in Simple Models: It can be used in hidden layers of shallow networks or simpler
models where the vanishing gradient problem is less pronounced.

5. Drawbacks and Alternatives:

 Vanishing Gradient: The vanishing gradient problem can lead to slow convergence and
difficulty training deep networks.

 Not Suitable for Multi-Class Classification: While sigmoid can handle binary classification, it's
not directly suitable for multi-class classification tasks. Although the sigmoid activation function
has its uses, other activation functions like ReLU, which mitigate the vanishing gradient problem,
are often preferred in deep networks due to their better training properties.
Softmax Activation Function
The softmax activation function is commonly used in neural networks for multi-class
classification problems. It takes as input a vector of real numbers and transforms them into a
probability distribution over multiple classes. The formula for the softmax function is as
follows:

For a vector of input values z = [z1, z2, …, zk], where k is the number of classes, the softmax
activation for class i is given by:

softmax(zi) = e^(zi) / (e^(z1) + e^(z2) + … + e^(zk))

 e represents the base of the natural logarithm (Euler's number).


 zi is the input value for class i.
 The denominator is the sum of exponentials of all input values across all classes.

The softmax function takes the input values and exponentiates them, making them positive. It
then normalizes them by dividing each exponentiated value by the sum of all exponentiated
values, ensuring that the resulting values are between 0 and 1 and that they sum up to 1. These
resulting values can be interpreted as probabilities, with each value indicating the probability
of the input belonging to a specific class.

Range: The output of the softmax activation function for each class is in the range [0, 1], and
the sum of all class probabilities is equal to 1.

ReLU activation function


The Rectified Linear Unit (ReLU) activation function is a widely used non-linear activation
function in neural networks. It introduces non-linearity by allowing the network to learn
complex relationships in the data. The formula for the ReLU activation function is as follows:

f(x) = max(0, x)

 x is the input to the function.
 The function returns x if x is greater than or equal to 0; otherwise, it returns 0.


In mathematical notation, you can represent the ReLU function as f(x) = max(0, x).

Range: The range of the ReLU activation function is [0, ∞), which means that for any input value
greater than or equal to 0, the output will be the same as the input, and for any input value less
than 0, the output will be 0. In other words, it transforms negative values to 0 and leaves
non-negative values unchanged.
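
The three activation functions discussed above can be sketched in NumPy as follows (a framework-independent illustration; the sample input vector is arbitrary):

import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)); output lies in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # f(x) = max(0, x); output lies in [0, infinity)
    return np.maximum(0.0, x)

def softmax(z):
    # Subtracting max(z) first keeps the exponentials numerically stable
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-2.0, 0.0, 1.5])
print(sigmoid(z))    # approximately [0.119, 0.5, 0.818]
print(relu(z))       # [0.0, 0.0, 1.5]
print(softmax(z))    # three probabilities that sum to 1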

Q. 9 Explain Feed Forward Neural Network in detail.

Ans. A feedforward neural network (FFNN) is a type of artificial neural network (ANN) in which
the connections between nodes do not form a cycle. This means that information flows in one
direction only, from the input layer to the output layer, without any feedback loops.

FFNNs are the simplest type of ANN, and they are widely used in a variety of deep learning
applications, such as image classification, object detection, and natural language processing.

FFNN Architecture

An FFNN is typically composed of three layers:

1. Input layer: This layer receives the input data, which can be images, text, or other types of data.

2. Hidden layer(s): These layers perform the computation of the FFNN. They are composed of
neurons, which are connected to each other in a weighted manner. The weights are learned
during the training process.

3. Output layer: This layer produces the output of the FFNN, which can be a prediction, a
classification, or a regression value.

FFNN Training

FFNNs are trained using a supervised learning algorithm. This means that they are trained on a
dataset of labeled examples, where the input data is paired with the desired output. The FFNN
learns to predict the output for a given input by minimizing the error between its predictions
and the desired outputs.

The most common training algorithm for FFNNs is backpropagation. Backpropagation is a
gradient descent algorithm that updates the weights of the FFNN in order to minimize the error.
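
As a minimal sketch (the data shapes, layer sizes and hyperparameters are assumed for illustration only), an FFNN classifier can be defined and trained with backpropagation in TensorFlow/Keras as follows:

import numpy as np
import tensorflow as tf

# Illustrative data: 100 samples, 20 features, 3 classes
X = np.random.rand(100, 20).astype("float32")
y = np.random.randint(0, 3, size=100)

# Feedforward architecture: input -> hidden -> hidden -> output (no cycles)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),   # hidden layer 1
    tf.keras.layers.Dense(16, activation="relu"),                      # hidden layer 2
    tf.keras.layers.Dense(3, activation="softmax"),                    # output layer
])

# Compiling selects the loss function and the gradient-descent variant used by backpropagation
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# fit() repeatedly runs the forward pass, backpropagation and weight updates
model.fit(X, y, epochs=5, batch_size=16, verbose=0)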

FFNN Applications

FFNNs are used in a wide variety of deep learning applications, including:

 Image classification: FFNNs can be trained to classify images into different categories, such as
cats, dogs, and cars.
 Object detection: FFNNs can be trained to detect objects in images, such as cars, pedestrians,
and traffic signs.
 Natural language processing: FFNNs can be trained to perform natural language processing
tasks, such as machine translation, text summarization, and sentiment analysis.

Advantages of FFNNs

FFNNs have several advantages over other types of ANNs:

 They are relatively simple to implement and train.


 They are very efficient and can be scaled to large datasets.
 They are able to learn complex relationships in the data.

Disadvantages of FFNNs

FFNNs also have some disadvantages:

 They can be susceptible to overfitting, which is when the model learns the training data too
well and is unable to generalize to new data.

 They can be difficult to interpret, as it can be difficult to understand how the model makes its
predictions.

Overall, FFNNs are a powerful and versatile tool for deep learning. They are used in a wide
variety of applications and have several advantages over other types of ANNs.

Q. 10 What are deep neural networks (DNNs), and how do they differ from shallow networks?

Ans. Deep Neural Networks (DNNs) are a class of artificial neural networks that consist of multiple
interconnected layers of artificial neurons, also known as nodes or units. These networks are
characterized by their depth, meaning they have a substantial number of hidden layers
between the input and output layers. The depth of DNNs allows them to learn and represent
complex, hierarchical features and patterns in data, making them particularly powerful for
various machine learning tasks.

 The main difference between deep neural networks (DNNs) and shallow neural networks
(SNNs) is the number of hidden layers. DNNs have multiple hidden layers, while SNNs
have only one or two hidden layers.

 Another difference is that DNNs are able to learn more complex patterns in the data than
SNNs. This is because DNNs can learn hierarchical representations of the data, which
means that they can learn to decompose the data into simpler parts and then learn how
these parts interact with each other.

 DNNs are also more robust to noise and uncertainty in the data than SNNs. This is
because DNNs can learn to extract the important information from the data and ignore
the noise.

 Here are some examples of tasks that DNNs are well-suited for:

1. Image classification
2. Object detection
3. Natural language processing
4. Machine translation
5. Speech recognition

 Here are some examples of tasks that SNNs are well-suited for:

1. Simple pattern recognition


2. Classification of low-dimensional data
3. Control systems

 Overall, DNNs are more powerful and versatile than SNNs. However, SNNs can be a
good choice for simple tasks or when computational resources are limited.

Q. 11 In what scenarios might Binary Neural Networks be particularly useful?


Ans.  Binary Neural Networks are particularly advantageous in scenarios where computational
efficiency, reduced memory usage, and fast inference are essential.
 They are commonly considered for deployment on resource-constrained devices, edge
computing, and embedded systems.
 BNNs use binary activations and/or binary weights, which can lead to more efficient
computations and reduced memory requirements. Binary operations (like multiplication
and accumulation) are simpler and faster to execute on hardware.
 BNNs can require less memory because binary values use fewer bits compared to
floating-point numbers.
 BNNs are much smaller than traditional neural networks because they use binary values
instead of floating-point numbers. This makes them more suitable for deployment on
mobile devices and other resource-constrained devices.
 BNNs can be inferred much faster than traditional neural networks because binary
operations are much faster than floating-point operations. This makes them suitable for
real-time applications.
 BNNs consume less power than traditional neural networks because binary operations
require less energy than floating-point operations. This makes them suitable for battery-
powered devices.
 BNNs are more robust to noise than traditional neural networks because binary values
are less susceptible to small changes in the input. This makes them suitable for noisy
environments.
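
A simplified NumPy sketch of the core idea is shown below: weights and activations are binarized with the sign function, so the dot product reduces to counting sign agreements, which hardware can implement with XNOR and popcount operations. Real BNN training schemes (for example straight-through gradient estimators) are more involved and are not shown here.

import numpy as np

def binarize(x):
    # Map real values to {-1, +1}; zero is mapped to +1 by convention
    return np.where(x >= 0, 1.0, -1.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))          # full-precision weights
a = rng.normal(size=4)               # full-precision activations

Wb, ab = binarize(W), binarize(a)    # 1-bit weights and activations

# With binary operands, multiply-accumulate becomes simple sign bookkeeping
z = ab @ Wb
print(z)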

Unit-3: Neural Network Based Fuzzy Systems
Q. 12 What is a Neural Network Based Fuzzy System (NN-FS), and what is its primary purpose
in deep learning?
Ans. A Neural Network Based Fuzzy System (NN-FS) is a hybrid computational model that
combines the concepts of neural networks and fuzzy logic to perform various tasks,
primarily in the domain of machine learning and control systems. Its primary purpose is to
enhance the model's ability to handle uncertainty, make decisions, or perform data-driven
tasks effectively.

Primary Purposes in Deep Learning:

1. Uncertainty Handling: The primary purpose of a Neural Network Based Fuzzy System
is to handle uncertainty and imprecision in data effectively.
2. Enhanced Decision-Making: By incorporating fuzzy logic concepts, NN-FS enhances the
decision-making capabilities of neural networks.
3. Pattern Recognition: In deep learning, NN-FS can be used for pattern recognition tasks
where fuzzy representations and reasoning help capture complex relationships in data,
especially when dealing with noisy or ambiguous information.
4. Interpretability: NN-FS models can be more interpretable than traditional neural
networks, making them useful for applications where explainability and transparency are
essential, such as medical diagnosis or finance.
5. Control Systems: NN-FS is applied in control systems where it can make intelligent
decisions and adjustments based on qualitative sensor data.
Q. 13 How does an NN-FS combine fuzzy logic and neural network techniques to solve complex
problems?
Ans. Fuzzy systems can be integrated with neural networks to create hybrid models, often referred
to as Neural Network-Based Fuzzy Systems (NN-FS). This integration combines the strengths
of fuzzy logic, which handles uncertainty and qualitative reasoning, with the learning and
adaptive capabilities of neural networks.

The integration of fuzzy systems with neural networks creates a powerful approach for handling
complex, real-world problems. It combines the qualitative reasoning of fuzzy logic with the
data-driven learning capabilities of neural networks, resulting in adaptive, interpretable, and
high-performance systems. The choice of architecture and training method can vary based on
the specific requirements of the application.
Here's a step by step process on how these two can be integrated:

1. Input Fuzzification:

The process begins with fuzzifying input data using fuzzy membership functions. This step
converts crisp input data into fuzzy sets, representing the degree to which each input belongs
to predefined linguistic terms (e.g., "low," "medium," "high").

2. Fuzzy Rule Base:

Fuzzy rules define the relationships between the fuzzy input variables and fuzzy output
variables. These rules capture the expert knowledge or decision-making logic. For example, "If
Temperature is Cold and Humidity is Low, Then Increase Heater."

3. Fuzzy Inference:

Fuzzy inference is performed using the fuzzy rules. It calculates the degree to which each rule
is satisfied based on the fuzzified input data. This involves operations such as fuzzy AND,
fuzzy OR, and aggregation.

4. Neural Network Integration:

 The neural network, which can be a feedforward network, recurrent network, or other
architectures, is used to model the relationships between the fuzzy conditions (input variables)
and the fuzzy conclusions (output variables).
 The output of the fuzzy inference, which is a set of fuzzy values, serves as the input to the
neural network.

5. Neural Network Learning:

 The neural network is trained using supervised learning methods. Training data consists of
input-output pairs, where the inputs are fuzzy inference results and the outputs are the desired
targets.
 The neural network learns to approximate the mapping between fuzzy input conditions and
fuzzy conclusions.

6. Defuzzification:

After the neural network has been trained, the fuzzy conclusions derived from the neural
network are typically defuzzified to obtain crisp (non-fuzzy) output values. Various
defuzzification methods, such as centroid or weighted average, can be used.

7. Learning and Adaptation:

Neural networks within the NN-FS can adapt and improve their performance over time through
ongoing learning. This adaptability allows the system to adjust to changing conditions and
optimize rule sets.


8. Interpretability and Transparency:

NN-FS models can be designed to maintain the interpretability and transparency of traditional
fuzzy logic systems. This means that the rules and linguistic variables are human-readable and
can be understood and validated by domain experts.

9. Hybrid Modeling and Decision-Making:

The integration of fuzzy systems and neural networks allows for robust and flexible decision-
making in applications where imprecise or uncertain data is prevalent.
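
As a rough illustration of steps 1 to 5 above, the sketch below fuzzifies a single input variable (temperature) with triangular membership functions and feeds the membership degrees into a small Keras network. The membership breakpoints, network size and toy target are assumptions made only for this example.

import numpy as np
import tensorflow as tf

def tri(x, a, b, c):
    # Triangular membership function: rises from a to b, falls from b to c
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def fuzzify_temperature(t):
    # Degrees of membership in "cold", "warm" and "hot" (breakpoints assumed)
    return np.array([tri(t, -10, 0, 15), tri(t, 5, 20, 30), tri(t, 25, 35, 50)])

# Step 1: fuzzified readings become the features given to the neural network
temps = np.linspace(-5, 45, 200)
X = np.stack([fuzzify_temperature(t) for t in temps]).astype("float32")
y = (temps < 18).astype("float32")     # illustrative target: "turn the heater on"

# Steps 4-5: the network learns the mapping from fuzzy conditions to the decision
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(3,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # crisp output in [0, 1]
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=10, verbose=0)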

Q. 14 Describe how neural networks can be used to realize basic fuzzy logic operators like AND,
OR, and NOT.

Ans. In neural networks, basic fuzzy logic operators (fuzzy AND, fuzzy OR, and fuzzy NOT) can
be realized using various network architectures and activation functions. These operators are
essential components for building Neural Network-Based Fuzzy Systems (NN-FS).

Fuzzy AND Operator:


The fuzzy AND operator computes the intersection of two or more fuzzy sets. In neural
networks, this can be achieved using an element-wise minimum operation. Here's how to realize
a fuzzy AND operator:
import tensorflow as tf

def fuzzy_and(a, b):
    # Use element-wise minimum to compute the fuzzy AND
    return tf.minimum(a, b)

Fuzzy OR Operator:
The fuzzy OR operator computes the union of two or more fuzzy sets. In neural networks, this
can be achieved using an element-wise maximum operation. Here's how to realize a fuzzy OR
operator:
def fuzzy_or(a, b):
    # Use element-wise maximum to compute the fuzzy OR
    return tf.maximum(a, b)

Fuzzy NOT Operator:


The fuzzy NOT operator computes the complement of a fuzzy set. In neural networks, this can
be realized using an element-wise complement operation.
Here's how to realize a fuzzy NOT operator:

def fuzzy_not(a):
    # Use element-wise complement to compute the fuzzy NOT
    return 1.0 - a


These basic fuzzy logic operators can be implemented as custom activation functions within a
neural network. You can incorporate them into the network architecture when building NN-FS
models to perform fuzzy reasoning and inference.

For more complex fuzzy operations, such as fuzzy implication (IF-THEN rules), you can design
custom neural network layers or modules to capture the relationships between fuzzy conditions
and conclusions. These custom layers can be trained using appropriate loss functions that align
with the desired fuzzy reasoning behavior. The neural network will learn to approximate the
fuzzy logic operations based on the training data.
To realize basic fuzzy logic operators (AND, OR, and NOT) in a neural network context, you
can implement custom activation functions or layers, as the TensorFlow code above demonstrates.
These operators are applied element-wise on tensors, allowing you to integrate them into your
neural network models.
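
For example, using the functions defined above, the operators can be applied to membership-degree tensors as follows (the membership values are arbitrary illustrations):

import tensorflow as tf

# Membership degrees of two fuzzy conditions, e.g. "temperature is cold"
# and "humidity is low", for a batch of three samples
cold = tf.constant([0.9, 0.4, 0.1])
low_humidity = tf.constant([0.7, 0.8, 0.2])

print(fuzzy_and(cold, low_humidity))   # element-wise minimum: [0.7, 0.4, 0.1]
print(fuzzy_or(cold, low_humidity))    # element-wise maximum: [0.9, 0.8, 0.2]
print(fuzzy_not(cold))                 # complement: [0.1, 0.6, 0.9]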
Q. 15 Explain the concept of neural network-based fuzzy logic inference. How does it work to
make decisions based on fuzzy rules?
Ans. Neural Network-Based Fuzzy Logic Inference (NN-FS) is a hybrid approach that combines
fuzzy logic and neural networks to perform reasoning and decision-making in situations
involving uncertainty and imprecision. It integrates the qualitative reasoning of fuzzy logic
with the learning and adaptability of neural networks. Here's a detailed explanation of NN-
FS:

1. Fuzzification:
The process starts by fuzzifying the input data. Fuzzification involves converting crisp,
numerical input values into fuzzy sets using membership functions. These membership
functions represent the degree to which an input belongs to different linguistic terms (e.g.,
"low," "medium," "high"). Fuzzification allows the model to handle imprecise and vague
input information.
2. Fuzzy Rules and Rule Base:
 Fuzzy rules are defined to capture expert knowledge or decision-making logic. Each
fuzzy rule is typically in the form of "IF [antecedent] THEN [consequent]." The
antecedent part of the rule consists of conditions based on the fuzzified inputs, and
the consequent part represents the output or decision.
 A rule base is a collection of these fuzzy rules that govern how the system behaves
under different conditions. For example, a rule base for a climate control system
might include rules like "IF Temperature is Cold AND Humidity is Low THEN
Increase Heater."
3. Fuzzy Inference:
 Fuzzy inference involves evaluating the degree to which each rule is satisfied
based on the fuzzified input data. Common methods for fuzzy inference include:

1. Mamdani Fuzzy Inference: This method computes the minimum degree of
membership for each antecedent condition within a rule and applies the
minimum operator for aggregation.
2. Sugeno Fuzzy Inference: In this method, the antecedent conditions are
combined with a weighted average to obtain a crisp output.
 The output of the fuzzy inference step is a collection of fuzzy sets representing
the degree of truth for each rule's consequent. These fuzzy sets are often combined
in the next step.
4. Aggregation of Rule Outputs:
The fuzzy sets obtained from the inference step are aggregated to form a combined fuzzy
output. Common aggregation operators include maximum (union) and weighted average. The
result represents the overall fuzzy output that combines the influence of multiple rules.
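
A minimal end-to-end sketch of this pipeline, using Mamdani-style (min/max) inference and centroid defuzzification, is given below. The membership functions, the two rules and all numbers are assumptions chosen only for illustration.

import numpy as np

def tri(x, a, b, c):
    # Triangular membership: rises from a to b, falls from b to c
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# 1. Fuzzification of a crisp temperature reading (breakpoints assumed)
temp = 12.0
mu_cold = tri(temp, -10, 0, 20)    # degree to which 12 degrees is "cold"
mu_warm = tri(temp, 10, 25, 40)    # degree to which it is "warm"

# Output universe: heater power in percent, with two fuzzy output sets
power = np.linspace(0, 100, 501)
power_high = tri(power, 50, 75, 100)
power_low = tri(power, 0, 25, 50)

# 2./3. Rules and Mamdani (min) inference:
#   IF temperature IS cold THEN power IS high
#   IF temperature IS warm THEN power IS low
clipped_high = np.minimum(mu_cold, power_high)   # rule 1: firing strength clips its consequent
clipped_low = np.minimum(mu_warm, power_low)     # rule 2: likewise

# 4. Aggregation (max) followed by centroid defuzzification to a crisp output
aggregated = np.maximum(clipped_high, clipped_low)
crisp_power = np.sum(power * aggregated) / np.sum(aggregated)
print(round(float(crisp_power), 1))              # crisp heater power in percent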
Q. 16 Describe the architecture of Neural Network based Fuzzy System (NN-FS).
Ans. The structure of an Adaptive Neuro-Fuzzy Inference System can be represented by the
following figure:

ANFIS structure

The ANFIS system consists of five layers:

Layer 1- Input Layer: The input layer consists of neurons that represent the input variables to
the system.

Layer 2 – Fuzzification Layer: The fuzzification layer consists of neurons that convert the
input variables into fuzzy sets.

Layer 3 – Relational Layer: The relational layer consists of neurons that represent the
fuzzy rules of the system.

Layer 4 – Aggregation Layer: The aggregation layer aggregates the fuzzy sets produced by the
relational layer to produce a single fuzzy set.

Layer 5 – Output Layer: The output layer defuzzifies the fuzzy set produced by the aggregation
layer to produce a crisp output.

The ANFIS system works by first fuzzifying the input variables. This means that the input
variables are converted into fuzzy sets, which represent the degree to which the input variables
belong to different linguistic categories. For example, the input variable "temperature" could
be fuzzified into the fuzzy sets "cold", "cool", "warm", and "hot".

Once the input variables have been fuzzified, they are passed to the rule layer. The rule layer
applies the fuzzy rules to the fuzzy sets to produce a fuzzy output. The fuzzy rules are typically
defined in terms of the linguistic categories of the input and output variables. For example, the
following fuzzy rule could be used to model the relationship between the temperature and the
fan speed:

IF temperature IS cold THEN fan speed IS slow

The normalization layer normalizes the firing strengths of the fuzzy rules. This ensures that the
sum of the firing strengths of all of the fuzzy rules is equal to 1.

Finally, the output layer defuzzifies the fuzzy output to produce a crisp output. This means that
the fuzzy output is converted into a single numerical value.

Q. 17 How are NN-FS used in the design of neuro fuzzy controllers, and what are their
advantages in control systems?

Ans. Neuro-fuzzy controllers (NFCs) are a powerful tool for the control of complex systems. They are able to learn from data,
adapt to changes in the environment, achieve high accuracy, and are interpretable. This makes
them well-suited for a wide range of control applications.

NFCs are typically composed of two main components:

Fuzzy inference system (FIS): The FIS is responsible for applying fuzzy rules to the inputs of
the controller to generate a fuzzy output.

Neural network: The neural network is used to learn the fuzzy rules and the parameters of the
membership functions of the FIS.

1. Hybrid Modeling: NN-FS controllers integrate the symbolic reasoning and linguistic
variables of fuzzy logic with the adaptive learning capabilities of neural networks. This hybrid
approach leverages the strengths of both paradigms to create robust control systems.

2. Adaptability: Neural networks in NN-FS controllers can adapt to changing dynamics and
disturbances in the controlled system. This adaptability is crucial for control systems operating
in uncertain or dynamic environments.

3. Learning from Data: Neural networks can learn control strategies from historical or real-
time data. This learning capability allows the controller to continuously improve its
performance and adapt to evolving conditions.

4. Nonlinear Control: Control systems often need to handle nonlinear dynamics. Neural
networks excel at modeling nonlinear relationships, making them well-suited for controlling
systems with complex and nonlinear behaviors.

5. Real-Time Control: NN-FS controllers can provide real-time decision-making, which is
vital in applications such as robotics, process control, and autonomous vehicles where rapid
adjustments are necessary.

6. Noise Tolerance: Neural network-based fuzzy controllers are robust to noisy input data,
which is essential in practical control systems where sensor measurements may be imprecise or
subject to interference.

7. Improved Performance: NN-FS controllers can outperform traditional PID (Proportional-
Integral-Derivative) controllers in systems with complex or unknown dynamics. They can learn
optimal control policies even in cases where human-designed rules may be challenging to
develop.

8. Interpretable Decisions: By using fuzzy logic, NN-FS controllers produce human-readable
and interpretable control rules. This is valuable in control systems where transparency and the
ability to validate decisions are critical.

9. Adaptive Tuning: The parameters of the fuzzy rules and neural network weights can be
fine-tuned during operation, allowing the controller to continually optimize its performance
based on feedback and changing conditions.

10. Applications in Various Domains: NN-FS controllers are applicable to a wide range of
control systems, including robotics, automotive control, industrial automation, and process
control, among others.

Q. 18 Provide examples of recent real-world applications of NN-FS, such as in robotics, finance,
or healthcare.

Ans. Neural Network-Based Fuzzy Systems (NN-FS) have found applications in various domains
due to their ability to handle uncertainty and enhance decision-making in complex systems.
Here are some recent real-world applications of NN-FS:

1. Medical Diagnosis and Decision Support:


NN-FS has been applied in medical diagnosis systems. For instance, it has been used to
assist in diagnosing diseases like diabetes, heart diseases, and cancer by handling uncertain
medical data and providing interpretable diagnostic results.
2. Energy Management and Smart Grids:
In smart grid systems, NN-FS is used to optimize energy distribution and consumption. It
helps in making real-time decisions regarding load balancing, energy generation, and fault
detection while considering uncertain factors such as weather conditions and energy
demand.
3. Natural Language Processing (NLP):
NN-FS can enhance sentiment analysis and language understanding in NLP applications.
It's used to handle linguistic variables and imprecise language constructs, making sentiment
analysis more nuanced and interpretable.
4. Industrial Automation and Control Systems:
In industrial automation, NN-FS is employed to control processes and make decisions in
uncertain environments. It can optimize parameters, control systems, and predict system
behavior while accommodating uncertain factors in manufacturing.
5. Autonomous Vehicles and Robotics:
Autonomous vehicles and robotics use NN-FS for decision-making in dynamic and
uncertain environments. It aids in obstacle avoidance, path planning, and decision-making
under conditions where data may be imprecise or incomplete.
6. Environmental Monitoring:
NN-FS is used for environmental monitoring to assess air quality, predict weather
conditions, and manage resources. It can handle uncertain data sources and provide more
accurate predictions and recommendations.
7. Supply Chain Management:
In supply chain management, NN-FS can optimize inventory management and demand
forecasting. It considers variables like market demand, supplier reliability, and
transportation uncertainties to improve decision-making.
8. Financial Risk Assessment:

NN-FS is used in financial risk assessment to evaluate investment risks and make trading
decisions in stock markets. It accommodates uncertain market conditions and financial
data.
9. Agricultural Decision Support:
In agriculture, NN-FS helps in decision-making related to crop management, irrigation,
and pest control. It accounts for uncertainties in weather patterns and soil conditions.
10. Smart Healthcare Systems:
In smart healthcare, NN-FS can be used for personalized treatment recommendation
systems that consider individual patient data, medical history, and uncertainty in medical
conditions.
11. Quality Control in Manufacturing:
In manufacturing industries, NN-FS aids in quality control by analyzing sensor data and
making decisions about product quality while considering noisy or uncertain data.
12. Traffic Management and Control:
NN-FS is applied in traffic management systems to optimize traffic flow, signal timings,
and route planning in urban areas with unpredictable traffic patterns.

These real-world applications demonstrate the versatility of NN-FS in handling uncertainty
and enhancing decision-making across various domains, making it a valuable tool for
addressing complex problems in today's data-driven world.
Q. 19 Can NN-FS be integrated with deep learning architectures like convolutional neural
networks (CNNs) or recurrent neural networks (RNNs)? If so, how?

Ans. Integrating a Neural Network Based Fuzzy System (NN-FS) with deep learning architectures
like Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs) can be a
valuable approach for enhancing the ability of deep learning models to handle uncertainty and
make more interpretable decisions. Here are some ways to integrate NN-FS with CNNs and
RNNs:

1. Preprocessing Using Fuzzy Logic:

 Use fuzzy logic to preprocess the input data before feeding it to a deep learning model. For
example, you can represent data as linguistic variables with fuzzy sets and apply fuzzy inference
to fuzzify and preprocess the data.

2. Fuzzy Activation Functions:

 Replace the standard activation functions in certain layers of the deep learning model with fuzzy
activation functions. These fuzzy activation functions can take into account the uncertainty in
the inputs and provide more robust, fuzzy outputs.

3. Hybrid Models:

 Create hybrid models that combine deep learning layers with NN-FS components. For example,
in a CNN, you can incorporate a layer that uses fuzzy logic to evaluate and interpret the features
extracted by the convolutional layers. This can be particularly useful in computer vision tasks.

4. Fuzzy Rules in RNNs:

 Integrate fuzzy rules or reasoning into RNNs, especially in natural language processing tasks.
The fuzzy reasoning can help the model make decisions based on linguistic or imprecise input.

5. Uncertainty Modeling:

 Use NN-FS for uncertainty modeling. Deep learning models, especially in regression tasks, may
benefit from including fuzzy sets to represent output uncertainty. For instance, in a regression
task, the output could be a fuzzy number or a fuzzy set, providing a range of possible values.

6. Enhanced Interpretability:

 Utilize NN-FS to make deep learning models more interpretable. You can introduce fuzzy rules
and linguistic variables that explain the model's decisions in a human-understandable manner.

7. Combination of Predictions:

 Combine predictions from the deep learning model and NN-FS. For instance, in a classification
task, you can use the deep learning model for feature extraction and prediction and then apply
NN-FS to further refine or validate the results based on fuzzy rules.

8. Fine-Tuning and Post-Processing:

 After training a deep learning model, use NN-FS for fine-tuning or post-processing. The fuzzy
system can help improve the model's predictions by considering the certainty or ambiguity of its
outputs.

9. Task-Specific Integration:

 The integration of NN-FS with deep learning models should be task-specific. Depending on the
application, you may choose different levels of integration, from simple preprocessing to creating
complex hybrid models.

10. Hyperparameter Optimization:

 NN-FS can be used to optimize the hyperparameters of deep learning models. Fuzzy logic can help in tuning parameters and learning rates based on qualitative rules and linguistic variables.
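As a concrete illustration of strategy 1 above (preprocessing with fuzzy logic), here is a minimal sketch in Python/NumPy: a numeric feature such as temperature is mapped to membership degrees for the linguistic terms "low", "medium", and "high" using triangular membership functions, and those degrees are what get fed to the deep learning model. The function names and breakpoints are illustrative assumptions, not part of any standard API.

Example (Python):

import numpy as np

def triangular(x, a, b, c):
    # Triangular membership function with feet at a and c and peak at b.
    return np.maximum(np.minimum((x - a) / (b - a + 1e-9),
                                 (c - x) / (c - b + 1e-9)), 0.0)

def fuzzify_temperature(t):
    # Map a raw temperature (deg C) to membership degrees for low/medium/high.
    low = triangular(t, -10.0, 0.0, 15.0)
    medium = triangular(t, 10.0, 20.0, 30.0)
    high = triangular(t, 25.0, 35.0, 50.0)
    return np.stack([low, medium, high], axis=-1)

# Raw sensor readings become a 3-dimensional fuzzy feature vector per sample,
# which can then be fed to a CNN/RNN/MLP in place of (or alongside) the raw value.
raw = np.array([5.0, 18.0, 33.0])
print(fuzzify_temperature(raw))   # shape (3, 3): one membership vector per reading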

Unit-4 Tensorflow
Q. 20 What is TensorFlow, and why is it widely used in deep learning and machine learning?

Ans. TensorFlow is an open-source machine learning framework developed by the Google Brain team.
It has gained widespread popularity in the fields of deep learning and machine learning for
several compelling reasons:

Flexibility and Versatility: TensorFlow is a highly flexible framework that allows developers
to create a wide range of machine learning models, from traditional machine learning algorithms
to complex deep learning models. It supports various machine learning tasks, including
classification, regression, clustering, natural language processing, and computer vision.

Deep Learning Capabilities: TensorFlow offers built-in support for deep learning. It includes
high-level APIs like Keras for building deep neural networks and lower-level functionalities for
advanced users, making it suitable for both beginners and experts in deep learning.

Scalability: TensorFlow can scale from running on a single machine to distributed computing
environments, allowing the training of deep learning models on large datasets across multiple
GPUs and TPUs (Tensor Processing Units).

Community and Ecosystem: TensorFlow has a vast and active user community, which results
in extensive documentation, tutorials, and a wealth of pre-built models and code examples. This
ecosystem simplifies the development of machine learning applications.

Production-Ready: TensorFlow is designed to facilitate the transition from model development to deployment in real-world applications. TensorFlow Serving and TensorFlow Lite are tools that enable the deployment of models to production environments, such as mobile devices, servers, and the cloud.

Hardware Compatibility: TensorFlow is hardware-agnostic and can run on various platforms, including CPUs, GPUs, TPUs, and even mobile and embedded devices. This flexibility ensures efficient utilization of available hardware resources.

Support for Multiple Programming Languages: While TensorFlow is primarily associated with Python, it offers interfaces for other languages, such as TensorFlow.js for JavaScript, TensorFlow Lite for mobile and embedded devices, and TensorFlow for C++ and Java, making it accessible to a broad audience.

Visualization Tools: TensorFlow includes TensorBoard, a suite of data visualization tools that
help users understand the behavior and performance of their models. It is crucial for debugging
and monitoring training processes.

Integration with Other Libraries: TensorFlow integrates well with other popular machine
learning libraries, including scikit-learn and popular deep learning frameworks like PyTorch.

Customization: TensorFlow allows users to define and customize their own operations and
layers. This flexibility is valuable for implementing novel models and experimental architectures.
Community and Corporate Support: TensorFlow is actively maintained and supported by
Google, which provides ongoing updates, improvements, and support. Its robust ecosystem
benefits from contributions from both individuals and organizations.
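To illustrate the flexibility and high-level Keras API mentioned above, here is a minimal sketch of defining and compiling a small classifier in TensorFlow; the layer sizes and input shape are arbitrary assumptions made only for the example.

Example (Python):

import tensorflow as tf

# A small fully connected classifier built with the high-level Keras API.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),           # e.g. flattened 28x28 images (assumed)
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.summary()
# model.fit(x_train, y_train, epochs=5, batch_size=32)  # assuming training data is available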

Q. 21 Discuss the trade-off between mini batch gradient descent and stochastic gradient descent
(SGD) in deep learning optimization.
Ans. The trade-off between Mini-Batch Gradient Descent (MBGD) and Stochastic Gradient Descent (SGD) in deep learning optimization primarily concerns how gradients are computed and how parameter updates are made during training. It is easiest to see by also comparing both against full-batch Gradient Descent (GD). Here's a discussion of this trade-off:

Gradient Descent:

 Pros:
 Simplicity: Gradient Descent (GD) is conceptually straightforward. It computes the gradient
using the entire training dataset, making it easy to understand and implement.
 Robust Convergence: GD provides a smoother convergence path due to the use of the entire
dataset. It's less prone to erratic updates.
 Cons:
 Slow Convergence: Because GD processes the entire dataset in each iteration, it can be
computationally expensive and slow, especially for large datasets.

 Memory Intensive: It requires holding the entire dataset in memory, which can be a limitation
for very large datasets.

Mini-Batch Gradient Descent (MBGD):

 Pros:
 More Stable: MBGD computes the gradient based on a batch of data, which provides a more stable estimate of the true gradient than using a single data point (as in Stochastic Gradient Descent, or SGD).
 Speed: MBGD is faster to converge than GD for large datasets, as it processes a smaller batch of data in each iteration.

 Cons:
 Fewer Updates than SGD: While MBGD is faster than GD for large datasets, it performs fewer parameter updates per unit of time than SGD, which updates after every single example.
 Memory Usage: MBGD still needs to hold a full batch in memory at each step (more than SGD's single example), but it is far less memory-intensive than GD, which requires the entire dataset.

Trade-off Considerations:

 Convergence Speed: GD provides smoother convergence due to the use of the entire dataset but is the slowest. MBGD provides a balance between smooth convergence and speed. If the fastest possible updates are desired, stochastic methods like SGD might be preferred.
 Memory Usage: GD is the most memory-intensive, requiring the entire dataset in memory. MBGD is more memory-efficient but still requires a reasonable amount of memory per batch. In contrast, stochastic methods (like SGD) are the most memory-efficient.
 Regularization: MBGD, like GD, can provide some implicit regularization due to the averaging effect over the batch. This can help prevent overfitting, especially when using smaller batch sizes.
 Learning Rate: The choice of learning rate becomes critical for MBGD as well. It needs to be tuned carefully to balance the trade-off between convergence speed and stability.
In practice, Mini-Batch Gradient Descent (using a small batch of data) is often a popular choice as it combines some of the benefits of both GD and SGD. Mini-batch GD strikes a balance between convergence speed and smooth optimization while being memory-efficient, and it is the preferred method for training deep neural networks. The specific choice of optimization technique depends on factors like the dataset size, available computational resources, and the optimization requirements of the task at hand.
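The trade-off above can be seen directly in a training loop: the batch size chosen for each gradient step is the only thing that changes between full-batch GD (batch size = m), mini-batch GD (e.g. 32), and SGD (batch size = 1). The following NumPy sketch of linear-regression training is illustrative only; the data and hyperparameters are assumptions made for the example.

Example (Python):

import numpy as np

def train(X, y, batch_size, lr=0.01, epochs=50):
    # Gradient descent on mean-squared error; batch_size selects the GD variant.
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(epochs):
        idx = np.random.permutation(m)
        for start in range(0, m, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = 2.0 / len(batch) * Xb.T @ (Xb @ w - yb)  # gradient on this batch
            w -= lr * grad
        # batch_size == m  -> full-batch GD (one smooth update per epoch)
        # batch_size == 32 -> mini-batch GD (balance of speed and stability)
        # batch_size == 1  -> SGD (many noisy updates per epoch)
    return w

X = np.random.randn(1000, 5)
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.1 * np.random.randn(1000)
w_minibatch = train(X, y, batch_size=32)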
Q. 22 Explain the concept of overfitting in the context of deep learning optimization. How can
regularization techniques, such as L1 and L2 regularization, help combat overfitting in
deep learning?

Ans. Overfitting in Deep Learning Optimization

Overfitting is a common issue in deep learning optimization that occurs when a model learns the
training data too well, including the noise and random fluctuations. This can lead to poor
performance on unseen data, as the model is unable to generalize to new examples.

Causes of Overfitting

There are several factors that can contribute to overfitting in deep learning, including:

 Model Complexity: A model with too many parameters is more likely to overfit the training data.
 Small Training Dataset: A training dataset that is too small may not contain enough
representative examples to generalize well to unseen data.
 Noisy Training Data: Training data that contains a lot of noise or outliers can lead the model to
learn these patterns instead of the underlying relationships in the data.

Regularization Techniques to Combat Overfitting

Regularization techniques are used to prevent overfitting by penalizing the model for having too
many complex parameters. This can be done in a number of ways, including:

 L1 Regularization: L1 regularization adds a penalty to the absolute value of the model parameters. This encourages the model to learn sparse parameters, where most of the parameters are zero.
 L2 Regularization: L2 regularization adds a penalty to the squared value of the model
parameters. This encourages the model to learn smaller parameters.

How Regularization Works

Regularization works by adding a penalty term to the loss function of the model. The penalty term is proportional to the complexity of the model, so a more complex model will have a larger penalty. This encourages the model to learn simpler parameters that are more likely to generalize to unseen data.

Benefits of Regularization

Regularization can help to improve the generalization performance of deep learning models. This
is because regularization forces the model to learn more general representations of the data,
which are less likely to overfit to the training data.

L1 vs. L2 Regularization

L1 and L2 regularization are both effective at reducing overfitting in deep learning. However,
they have different properties:

 L1 regularization: L1 regularization tends to produce sparser models, where more of the parameters are zero. This can be beneficial for problems where there are a lot of irrelevant features.
 L2 regularization: L2 regularization tends to produce smoother models, where the parameters are
smaller. This can be beneficial for problems where the data is noisy.

Choosing the Right Regularization Technique

The best regularization technique to use will depend on the specific problem and dataset. In
general, it is a good idea to try both L1 and L2 regularization to see which one works best.
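In TensorFlow/Keras, L1 and L2 penalties can be attached to layer weights via kernel regularizers. The sketch below is illustrative; the layer sizes and penalty strengths are arbitrary assumptions.

Example (Python):

import tensorflow as tf

l1 = tf.keras.regularizers.l1(0.01)   # penalty on |w|  -> encourages sparse weights
l2 = tf.keras.regularizers.l2(0.01)   # penalty on w^2  -> encourages small weights

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100,)),
    tf.keras.layers.Dense(64, activation='relu', kernel_regularizer=l2),
    tf.keras.layers.Dense(64, activation='relu', kernel_regularizer=l1),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# The regularization terms are added to the loss automatically during training.
model.compile(optimizer='adam', loss='binary_crossentropy')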

Q. 23 Explain the concept of transfer learning and how it can be used to improve optimization in
deep learning.

Ans. In transfer learning, the knowledge of an already trained machine learning model is applied to
a different but related problem. For example, if you trained a simple classifier to predict whether
an image contains a backpack, you could use the knowledge that the model gained during its
training to recognize other objects like sunglasses.

With transfer learning, we basically try to exploit what has been learned in one task to improve
generalization in another. We transfer the weights that a network has learned at “task A” to a
new “task B.”

The general idea is to use the knowledge a model has learned from a task with a lot of available
labeled training data in a new task that doesn't have much data. Instead of starting the learning
process from scratch, we start with patterns learned from solving a related task.

Transfer learning is mostly used in computer vision and natural language processing tasks like
sentiment analysis due to the huge amount of computational power required.

How it can be used to improve optimization

Because the pre-trained weights already encode useful general features, transfer learning gives the optimizer a much better starting point than random initialization, so training on the new task converges faster, needs less labeled data, and is less prone to overfitting.

How Transfer Learning Works

Transfer learning typically involves the following steps:

Choose a Pre-trained Model: Select a pre-trained model that is relevant to the new task. The
pre-trained model should have been trained on a large dataset and have demonstrated good
performance on a similar task.

Freeze the Pre-trained Model: Freeze the weights of the pre-trained model to prevent them from
being updated during training. This allows the model to retain the general features it has learned
from the previous task.

Add New Layers: Add new layers to the pre-trained model that are specific to the new task.
These new layers will learn the task-specific features.

Fine-tune the Model: Fine-tune the weights of the new layers and the top layers of the pre-
trained model using the training data for the new task. This allows the model to adapt the pre-
trained features to the new task.
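A common way to carry out these steps in TensorFlow/Keras is sketched below, assuming an image task with 5 target classes; the base network, input size, and class count are illustrative assumptions, not prescribed by the text.

Example (Python):

import tensorflow as tf

# 1. Choose a pre-trained model (ImageNet weights, without its classification head).
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False,
                                         weights='imagenet')

# 2. Freeze the pre-trained weights so they are not updated during training.
base.trainable = False

# 3. Add new task-specific layers on top.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation='softmax')   # new task: 5 classes (assumed)
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# 4. Fine-tune: later, unfreeze some top layers of `base` and re-compile with a
#    lower learning rate before continuing training on the new dataset.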

Q. 24 What is the role of hardware accelerators (e.g., GPUs and TPUs) in speeding up the
optimization of deep neural networks?

Ans.  TensorFlow provides a high-level API for GPU acceleration through CUDA and cuDNN
libraries. This allows TensorFlow to take advantage of the computational power of NVIDIA
GPUs. For TPUs (Tensor Processing Units), TensorFlow supports TPU-specific hardware
accelerators provided by Google. You can use Google Colab or the TensorFlow Cloud TPU
to access TPUs for deep learning tasks, often resulting in significantly faster training times.
 TensorFlow has native support for GPUs, which are specialized hardware designed for
parallel processing, making them ideal for deep learning tasks. Here's how TensorFlow
enables GPU acceleration:
 GPU-Compatible Operations: TensorFlow automatically identifies operations that can be
executed on a GPU and schedules them accordingly. This is done through the use of CUDA
(Compute Unified Device Architecture) and cuDNN (CUDA Deep Neural Network)
libraries.
 TensorFlow is designed to work seamlessly with Google's TPUs, which are custom hardware accelerators optimized for machine learning workloads. Here's how TensorFlow enables TPU acceleration:

 Native TPU Support: TensorFlow has a specific library called "TPU" that provides native
support for TPUs. You can use the tf.distribute.TPUStrategy to distribute training across
TPUs, and the operations will be automatically compiled to run on TPU hardware.

Cloud TPU Support: Google Cloud offers TPUs as a service, and TensorFlow provides direct
integration with Google Cloud TPUs, allowing you to train and deploy models on these
dedicated TPU accelerators.
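A small hedged sketch of how this looks in practice: TensorFlow can list the accelerators it sees and place a model under a distribution strategy. The TPU-resolver lines only succeed in an environment where a TPU is actually attached (such as Colab or Cloud TPU); otherwise the code falls back to the default CPU/GPU strategy.

Example (Python):

import tensorflow as tf

# List the accelerators TensorFlow can see on this machine.
print("GPUs:", tf.config.list_physical_devices('GPU'))

# On a TPU-enabled environment, build the model under a TPUStrategy so that
# training is distributed across the TPU cores.
try:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
except (ValueError, tf.errors.NotFoundError):
    strategy = tf.distribute.get_strategy()   # default (CPU/GPU) strategy

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
        tf.keras.layers.Dense(1)
    ])
    model.compile(optimizer='adam', loss='mse')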

Q. 25 What is the ‘Adam’ optimizer in deep learning? Explain its formula and mathematical equations.

Ans. The Adam (Adaptive Moment Estimation) optimization algorithm is an advanced gradient-
based optimization technique used to update the weights and biases of neural networks
during training. Adam addresses several challenges of traditional gradient descent methods
like stochastic gradient descent (SGD) and its variants. Here's how Adam works and how
it mitigates some of these challenges:

How Adam Works:

1. Initialization: Adam initializes the parameters, including the learning rate (α), first
moment vector (m), and second moment vector (v), for each weight and bias in the neural
network.
2. Mini-Batch Processing: Like other gradient-based optimizers, Adam processes the
training data in mini-batches, typically chosen randomly from the dataset.
3. Gradient Computation: For each mini-batch, Adam computes the gradient of the loss
function with respect to the network's parameters.
4. First Moment (m): Adam calculates the first moment (m) as an exponentially decaying
average of past gradients. This moving average captures the direction of the gradient
descent and is similar to the momentum term in some other optimization algorithms.
5. Second Moment (v): Adam computes the second moment (v) as an exponentially decaying
average of the squared gradients. This moving average helps in adapting the learning rates
for each parameter based on the magnitudes of the gradients.
6. Bias Correction: To address the bias introduced by initializing m and v with zeros, Adam
applies bias correction. It adjusts m and v by scaling them with a factor that depends on the
current iteration.
7. Update Parameters: Finally, Adam uses the computed first moment (m) and second
moment (v) to update the parameters of the network. It adjusts the learning rate for each
parameter individually based on the history of gradients and squared gradients. This
adaptability allows for faster convergence and stable training.
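Since the question asks for the formulas, the standard update equations of Adam (stated here in the usual textbook notation, for step t with gradient g_t) are:

m_t = β1 * m_(t-1) + (1 - β1) * g_t                (first moment estimate)
v_t = β2 * v_(t-1) + (1 - β2) * g_t^2              (second moment estimate)
m̂_t = m_t / (1 - β1^t)                             (bias-corrected first moment)
v̂_t = v_t / (1 - β2^t)                             (bias-corrected second moment)
θ_t = θ_(t-1) - α * m̂_t / (√(v̂_t) + ε)             (parameter update)

where α is the learning rate, β1 and β2 are the decay rates (commonly 0.9 and 0.999), and ε is a small constant (commonly 10^-8) used to avoid division by zero.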

Advantages of Adam:

1. Adaptive Learning Rates: One of the primary advantages of Adam is its adaptive learning
rate. It scales the learning rate for each parameter based on the historical gradients, ensuring
that parameters with large gradients receive smaller updates, and parameters with small
gradients receive larger updates. This adaptability speeds up convergence and reduces the
need for manual learning rate tuning.
2. Momentum-like Updates: Adam's first moment term (m) acts like a momentum term. It
helps in smoothing out the update direction, which can lead to faster convergence and
escape from local minima.
3. Efficient Memory Usage: Adam efficiently maintains moving averages of the gradients
(m and v), allowing for more stable training without requiring an excessive amount of
memory.
4. Convergence Speed: Adam often converges faster than traditional gradient descent
methods, especially in scenarios where the loss landscape has irregular topography.
5. Robustness: Adam is robust to noisy gradients and variations in learning rates, which can
be problematic for other optimizers.

Challenges Addressed:

Adam addresses several challenges compared to traditional gradient descent methods:

1. Adaptive Learning Rate: Adam adapts the learning rate for each parameter individually,
reducing the risk of divergence due to a fixed learning rate.
2. Convergence Speed: By using moving averages of gradients, Adam speeds up
convergence, especially in deep networks.
3. Memory Efficiency: Adam maintains efficient memory usage while using historical
gradient information.
4. Robustness: It is robust to noisy gradients and can handle noisy data effectively.
Q. 26 Explain RMSProp optimizer in detail.

Ans.
RMSProp, which stands for Root Mean Square Propagation, is an adaptive learning rate
optimization algorithm commonly used in deep learning. It was introduced by Geoff Hinton
in 2012 and aimed to address the limitations of gradient descent algorithms, particularly in
dealing with varying gradients.

Background on Gradient Descent

Gradient descent is a widely used optimization algorithm for minimizing a loss function. It
works by iteratively updating the model parameters in the direction of the negative gradient.
The gradient represents the direction of steepest descent, and the learning rate determines the
step size taken in that direction.

However, gradient descent can encounter issues when dealing with varying gradients. For
instance, if some gradients are significantly larger than others, the model may take large steps
in those directions, leading to oscillations or even divergence. This can hinder the optimization
process and prevent the model from converging to the optimal solution.

Introducing RMSProp

RMSProp addresses this issue by adapting the learning rate for each parameter individually.
It maintains a moving average of the squared gradients (hence the name RMSProp), which
serves as an estimate of the variance of the gradient. This moving average is updated at each
iteration, and the learning rate for each parameter is divided by the square root of its
corresponding moving average.

This adaptive approach helps to stabilize the learning rate and prevent overly large steps in
directions with large gradients. It also allows for faster updates in directions with smaller
gradients, accelerating the overall optimization process.

Formula and Components of RMSProp

The RMSProp update rule can be expressed as:

E[g²]_t = β * E[g²]_(t-1) + (1 - β) * g_t^2
g_t_corrected = g_t / √(E[g²]_t + ε)
Δθ_t = -η * g_t_corrected

where:

 E[g²]_t is the moving average of squared gradients at time step t
 g_t is the gradient at time step t
 β is the decay rate for the moving average, typically set to 0.9
 ε is a small constant to avoid division by zero
 g_t_corrected is the normalized gradient
 Δθ_t is the update to the model parameter
 η is the learning rate
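A minimal NumPy sketch of this update rule (the parameter shapes, gradient values, and hyperparameters are illustrative assumptions):

Example (Python):

import numpy as np

def rmsprop_update(theta, grad, avg_sq, lr=0.001, beta=0.9, eps=1e-8):
    # One RMSProp step: returns the updated parameters and the updated moving average.
    avg_sq = beta * avg_sq + (1 - beta) * grad ** 2        # E[g^2]_t
    theta = theta - lr * grad / np.sqrt(avg_sq + eps)      # parameter update
    return theta, avg_sq

theta = np.zeros(3)
avg_sq = np.zeros(3)
grad = np.array([0.5, -1.2, 0.3])          # gradient from some mini-batch (assumed)
theta, avg_sq = rmsprop_update(theta, grad, avg_sq)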

Benefits of RMSProp

RMSProp offers several advantages over traditional gradient descent algorithms:

 Adaptive learning rate: RMSProp's adaptive learning rate helps to stabilize the optimization
process and prevent oscillations or divergence.

 Handling varying gradients: RMSProp effectively deals with gradients of varying magnitudes,
preventing large steps in directions with large gradients and allowing for faster updates in
directions with smaller gradients.

 Faster convergence: RMSProp often leads to faster convergence compared to traditional gradient descent, especially for complex tasks with varying gradients.

Drawbacks of RMSProp

While RMSProp is a powerful optimizer, it also has some drawbacks:

 Hyperparameter tuning: RMSProp requires careful tuning of hyperparameters, such as the learning rate and decay rate, to achieve optimal performance.

 Memory overhead: RMSProp maintains moving averages for squared gradients, which can
increase memory consumption, especially for large models.

 Computational cost: The additional computation for maintaining moving averages can slightly
increase the computational cost compared to traditional gradient descent.

Applications of RMSProp

 RMSProp is widely used in various deep learning applications, including:

 Image classification: RMSProp is frequently used to optimize neural networks for image
classification tasks.

 Natural language processing (NLP): RMSProp is commonly used for training language
models and other NLP tasks.

 Speech recognition: RMSProp can be used to improve the accuracy of speech recognition
systems.

Q. 27 Explain the Word2Vec model and its use in generating word embeddings.

Ans. Word2Vec is a neural network-based technique for generating word embeddings. Word
embeddings are numerical representations of words that capture their semantic and syntactic
relationships. Word2Vec is a powerful tool for natural language processing (NLP) tasks, as it
allows us to represent words in a way that is more meaningful than simply using their one-hot
encodings.

How Word2Vec Works

Word2Vec works by training a neural network to predict the context words of a given word.
The context words are the words that appear within a certain window of the target word in a
text corpus. The neural network learns to predict the context words by learning a representation
of the target word that captures its semantic and syntactic relationships.

Two Word2Vec Architectures

There are two main architectures of Word2Vec: Continuous Bag-of-Words (CBOW) and Skip-
gram.

 CBOW: The CBOW architecture takes the context words as input and predicts the target word.

 Skip-gram: The Skip-gram architecture takes the target word as input and predicts the context
words.

Training with Negative Sampling

Both CBOW and Skip-gram are trained using a negative sampling algorithm. Negative
sampling involves training the model to distinguish between real context words and negative
context words. Negative context words are words that are randomly sampled from the
vocabulary and are less likely to be the real context words of the target word.

Generating Word Embeddings

Once the Word2Vec model is trained, the word embeddings can be extracted from the hidden
layer of the neural network. The word embeddings can then be used for a variety of NLP tasks,
such as:

 Word similarity: Word embeddings can be used to measure the similarity between two words
by computing the cosine similarity of their embeddings.
 Analogical reasoning: Word embeddings can be used to perform analogical reasoning tasks by
finding the word that is most similar to the sum of two other word embeddings.
 Machine translation: Word embeddings can be used to improve the performance of machine
translation systems by providing a more meaningful representation of the words in the input
sentence.
 Text classification: Word embeddings can be used to improve the performance of text
classification systems by providing a more meaningful representation of the words in the input
text.
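As an illustration of training a Skip-gram model with negative sampling and querying word similarity, the widely used gensim library can be applied as sketched below. This assumes gensim 4.x is installed, and the toy corpus is made up purely for the example.

Example (Python):

from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (illustrative only).
sentences = [
    ["deep", "learning", "uses", "neural", "networks"],
    ["word", "embeddings", "capture", "semantic", "relationships"],
    ["neural", "networks", "learn", "word", "embeddings"],
]

# sg=1 selects the Skip-gram architecture; negative=5 enables negative sampling.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, negative=5)

vector = model.wv["neural"]                 # the learned embedding for a word
similar = model.wv.most_similar("neural")   # nearest words by cosine similarity
print(vector.shape, similar[:3])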

Word2Vec Applications

Word2Vec is a versatile tool that can be used for a variety of NLP tasks. Here are a few
examples of how Word2Vec is used in practice:

 Recommender systems: Word2Vec can be used to improve the performance of recommender systems by providing a more meaningful representation of user queries and items.
 Chatbots: Word2Vec can be used to improve the performance of chatbots by providing a more
natural and engaging way for users to interact with the bot.
 Sentiment analysis: Word2Vec can be used to perform sentiment analysis on text data by
identifying the sentiment of individual words and phrases.

 Spam detection: Word2Vec can be used to detect spam messages by identifying patterns in the
language used in the messages.

Unit-5: Deep Learning Algorithms


Q. 28 What is Convolutional Neural Network (CNN)? Explain in detail about various layers of
CNN.

Ans. A Convolutional Neural Network (CNN) is a deep learning architecture designed primarily for processing grid-like data such as images. Here's a simplified representation of a typical CNN architecture and its layers:

1. Input Image: The input image is the initial data that the CNN processes. It could be a
grayscale image or a multi-channel (color) image.
2. Convolutional Layer:
 Convolutional filters (kernels) slide over the input image.
 Each filter extracts specific features, like edges or textures.
 Multiple filters create multiple feature maps.
 Activation function (ReLU) is applied to introduce non-linearity.
3. Pooling Layer:
 Max pooling is commonly used (e.g., 2x2 window with the maximum value).
 Reduces spatial dimensions and retains important features.
4. Convolutional Layer (Optional):
 Additional convolutional layers capture higher-level features.
 Each layer can learn more complex patterns.
5. Pooling Layer (Optional):
 Further reduces spatial dimensions.
6. Flattening:
 The feature maps are flattened into a 1D vector.
7. Fully Connected Layers:
 Traditional neural network layers.
 Each neuron is connected to every neuron in the previous layer.
 Neurons learn complex combinations of features.

8. Output Layer:
 Activation function depends on the task.
 For binary classification, sigmoid is common.
 For multi-class classification, softmax is used.
9. Training:
 Backpropagation and gradient descent update weights.
 Loss function measures the difference between predictions and actual labels.
 Weights are adjusted to minimize the loss.
10. Regularization and Optimization:
 Dropout, batch normalization prevent overfitting.
 Optimization algorithms (Adam, SGD) update weights efficiently.
11. Transfer Learning (Optional):
 Pre-trained CNNs on large datasets can be fine-tuned.
 Useful when labeled data is limited for the target task.

The actual architecture can vary significantly based on the specific task, dataset, and
desired performance. CNNs can be quite deep with many layers, and recent advancements
have led to architectures like ResNet, Inception, and more, which incorporate various
techniques to improve learning and performance.
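A minimal Keras sketch of the layer stack described above, assuming 28×28 grayscale inputs and 10 output classes (these shapes are illustrative assumptions):

Example (Python):

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),                 # 1. input image (assumed size)
    layers.Conv2D(32, (3, 3), activation='relu'),    # 2. convolution + ReLU
    layers.MaxPooling2D((2, 2)),                     # 3. max pooling
    layers.Conv2D(64, (3, 3), activation='relu'),    # 4. deeper convolution
    layers.MaxPooling2D((2, 2)),                     # 5. further pooling
    layers.Flatten(),                                # 6. flatten to a 1D vector
    layers.Dense(128, activation='relu'),            # 7. fully connected layer
    layers.Dropout(0.5),                             # 10. regularization (dropout)
    layers.Dense(10, activation='softmax')           # 8. output layer (multi-class)
])

model.compile(optimizer='adam',                      # 10. optimizer for weight updates
              loss='sparse_categorical_crossentropy',# 9. loss measured during training
              metrics=['accuracy'])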
Q. 29 What is backpropagation through time (BPTT), and how is it used to train RNNs?

Ans. Backpropagation Through Time (BPTT) is a variation of the backpropagation algorithm used to train Recurrent Neural Networks (RNNs) for sequence-based tasks. It extends the traditional backpropagation algorithm to account for the temporal nature of sequential data and the recurrent connections within RNNs. Here's how BPTT works and how it's used to train RNNs:

Basic Steps followed in BPTT:

 Forward Pass: Execute the forward pass of the RNN for each time step, processing the
sequence from beginning to end. Compute the predictions and loss at each time step.
 Backward Pass: Starting from the last time step, calculate gradients of the loss with
respect to the model's parameters (weights and biases) and the hidden state at that time
step. Then, move to the previous time step and calculate gradients again, using the
gradients from the next time step to compute the gradients for the current step. Repeat
this process until reaching the first time step.
 Weight Updates: After computing gradients for all time steps, update the model's
parameters using gradient-based optimization algorithms to minimize the loss.


 We train it at a particular time "t" as well as all that has happened before time "t" like t-
1, t-2, t-3.
 S1, S2, S3 are the hidden states at time t1, t2, t3, respectively, and Ws is the associated
weight matrix.
 x1, x2, x3 are the inputs at time t1, t2, t3, respectively, and Wx is the associated weight
matrix.
 Y1, Y2, Y3 are the outcomes at time t1, t2, t3, respectively, and Wy is the associated
weight matrix.
 At time t0, we feed input x0 to the network and obtain output y0. At time t1, we provide input x1 to the network and receive output y1. To calculate the outcome at each timestamp, the network uses the current input x and the hidden state from the previous timestamp. The standard equations for the hidden state and the output at each step are summarized in the sketch at the end of this answer.

Backpropagation Through Time


To calculate the error in an RNN, it is important to note that Ws, Wx, and Wy do not change across the timestamps, which means that for all inputs in a sequence, the values of these weights are the same.

Now we need to calculate the error gradients with respect to Ws, Wx, and Wy. It is relatively easy to calculate the loss derivative with respect to Wy, as this derivative depends only on the values at the current timestamp. But when calculating the derivative of the loss with respect to Ws and Wx, it becomes tricky (the corresponding expressions are given in the sketch at the end of this answer).


The value of s3 depends on s2, which is itself a function of Ws. Therefore we cannot calculate the derivative of s3 while treating s2 as a constant. In RNN networks, the derivative has two parts, explicit and implicit: we treat all other inputs as constant in the explicit part, whereas we sum over all the indirect paths through earlier hidden states in the implicit part. The general expressions for the gradients with respect to Ws and, similarly, Wx are given in the sketch at the end of this answer.

Now that we have calculated all three derivatives, we can easily update the weights. This
algorithm is known as Backpropagation through time, as we used values across all the
timestamps to calculate the gradients.
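For reference, the standard equations that the omitted figures and formulas refer to can be written as follows, in the same notation as above (a standard textbook formulation, stated here as a sketch):

s_t = tanh(Wx * x_t + Ws * s_(t-1))          (hidden state at time t)
y_t = Wy * s_t                               (output at time t)

∂L3/∂Wy = ∂L3/∂y3 * ∂y3/∂Wy                  (depends only on the current timestamp)

∂L3/∂Ws = Σ (k = 0 to 3) ∂L3/∂y3 * ∂y3/∂s3 * ∂s3/∂s_k * ∂s_k/∂Ws
∂L3/∂Wx = Σ (k = 0 to 3) ∂L3/∂y3 * ∂y3/∂s3 * ∂s3/∂s_k * ∂s_k/∂Wx

Summing over k is what makes this "backpropagation through time": the gradient at timestamp 3 accumulates contributions from every earlier timestamp through the chain of hidden states.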
Q. 30 Explain the architecture of an LSTM unit, including its input, output, and internal
components.

Ans. LSTM stands for long short-term memory networks, used in the field of Deep Learning. It is a
variety of recurrent neural networks (RNNs) that are capable of learning long-term
dependencies, especially in sequence prediction problems. LSTM has feedback connections,
i.e., it is capable of processing the entire sequence of data, apart from single data points such as
images. This finds application in speech recognition, machine translation, etc. LSTM is a
special kind of RNN, which shows outstanding performance on a large variety of problems.

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN) architecture designed to overcome the vanishing gradient problem and capture long-term dependencies in sequential data. They have gained significant popularity in various tasks involving sequential data, such as natural language processing, speech recognition, and time series forecasting. Here's an explanation of the architecture of LSTM networks:

1. Cell State (C_t):


The cell state is a vector that runs through the entire chain of LSTM units with minor linear
interactions. It acts as a conveyor belt, carrying information across time steps.

It can theoretically store and propagate information over long sequences, which helps LSTM
networks capture long-term dependencies.

2. Hidden State (h_t):


The hidden state is another vector that also runs through the LSTM chain, but it selectively
carries relevant information.

It's responsible for capturing and summarizing the relevant features from the current input and
previous hidden state.

3. Three Gates:
LSTMs have three gates that regulate the flow of information through the cell state and the
hidden state:
a) Forget Gate (f_t): It determines which information from the previous cell state should be
thrown away or kept. It takes the previous hidden state (h_(t-1)) and the current input (x_t) as
inputs and produces an output between 0 and 1 for each element in the cell state.
b) Input Gate (i_t): It decides what new information should be added to the cell state. The input gate consists of two parts:
 Update Gate (u_t): This gate decides what values will be added to the cell state candidate
(C~_t). It looks at the previous hidden state and the current input.
 New Candidate Values (~C_t): These are the values that the LSTM is considering to add to
the cell state.
Mathematically, the new candidate values (~C_t) are calculated as:

~C_t = tanh(W_c * [h_(t-1), x_t] + b_c)


Here, W_c represents the weight matrix for the candidate values, [h_(t-1), x_t] represents the
concatenation of the previous hidden state (h_(t-1)) and the current input (x_t), and b_c is the
bias term.

New candidate value (~C_t) is the result of applying a hyperbolic tangent activation function
to a linear combination of the previous hidden state and the current input. It represents the
information that the LSTM is considering to add to the cell state, and this information is
controlled by the input gate (i_t). The update gate (u_t) further modulates how much of this
new information is incorporated into the cell state.

c) Output Gate (o_t): It controls what information from the cell state should be used to compute
the hidden state output. The output gate takes the previous hidden state (h_(t-1)) and the current
input (x_t) and produces an output between 0 and 1.

4. Cell State Update:


The new candidate values (~C_t) are scaled by the input gate (i_t) to determine which parts of the candidate values should be added to the cell state. The update is done element-wise:

C_t = (f_t * C_(t-1)) + (i_t * ~C_t)


5. Final Output:
The final output of the LSTM cell is the hidden state (h_t) at the current time step. This output
can be used for making predictions, and it also becomes the previous hidden state for the next
time step.
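For completeness, the remaining gate equations, written in the same notation as the candidate-value formula above, are the standard LSTM equations (σ denotes the sigmoid function); they are included here as a reference sketch:

f_t = σ(W_f * [h_(t-1), x_t] + b_f)      (forget gate)
i_t = σ(W_i * [h_(t-1), x_t] + b_i)      (input gate)
o_t = σ(W_o * [h_(t-1), x_t] + b_o)      (output gate)
C_t = (f_t * C_(t-1)) + (i_t * ~C_t)     (cell state update)
h_t = o_t * tanh(C_t)                    (hidden state / final output)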

LSTM networks can have multiple LSTM cells stacked on top of each other, forming a deep
LSTM architecture. The output from one LSTM cell becomes the input to the next one. This
stacking allows the network to capture complex hierarchical patterns in sequential data.
Q. 31 Describe the two main components of a GAN and their respective roles.

Ans. There are two main components of GAN:

1. Generator:

 The generator is a neural network that takes random noise (usually sampled from a simple
distribution like Gaussian) as input and generates data samples.
 It consists of several layers of neurons, typically using convolutional or fully connected
layers, depending on the type of data being generated (e.g., images, text, audio).
 The generator's output is often a tensor that matches the dimensionality and format of the
real data.
 Generated images are then fed to the Discriminator Model.

 The main goal of the Generator is to fool the Discriminator by generating images that look like
real images and thus makes it harder for the Discriminator to classify images as real or fake.

2. Discriminator:

 The discriminator is another neural network that takes data (generated by generator or real-
from the training set) as input and attempts to distinguish between real and fake data.
 Generated data come from the Generator and the real data come from the training data.
 The discriminator's output is a single scalar value, which represents the probability that the
input data is real (1) or fake (0).
 The discriminator model is the simple binary classification model.
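A minimal, hedged Keras sketch of the two components for generating 28×28 images from 100-dimensional noise (the sizes are illustrative assumptions; the adversarial training loop is omitted):

Example (Python):

import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 100   # dimensionality of the random noise input (assumed)

# Generator: noise vector -> fake 28x28 image.
generator = tf.keras.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(128, activation='relu'),
    layers.Dense(28 * 28, activation='sigmoid'),
    layers.Reshape((28, 28, 1))
])

# Discriminator: image -> probability that the image is real (1) or fake (0).
discriminator = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])
discriminator.compile(optimizer='adam', loss='binary_crossentropy')

noise = tf.random.normal((16, latent_dim))
fake_images = generator(noise)            # generator output fed to the discriminator
scores = discriminator(fake_images)       # probabilities of the images being real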

Q. 32 Explain the SOM architecture, including its input layer, weight vectors, and topology.

Ans. A Self Organizing Map (or Kohonen Map, or SOM) is a type of Artificial Neural Network that is inspired by biological models of neural systems from the 1970s. It follows an unsupervised learning approach and trains its network through a competitive learning algorithm. SOM is used for clustering and mapping (or dimensionality reduction) techniques to map multidimensional data onto a lower-dimensional space, which allows people to reduce complex problems for easy interpretation. A SOM has two layers: one is the Input layer and the other one is the Output layer.
The architecture of a Self Organizing Map consists of an input layer with the n input features of a sample, fully connected to an output layer of cluster units (for example, two clusters).

How does a SOM work?

Let's say we have input data of size (m, n), where m is the number of training examples and n is the number of features in each example. First, the SOM initializes weights of size (n, C), where C is the number of clusters. Then, iterating over the input data, for each training example it updates the winning vector (the weight vector with the shortest distance, e.g. Euclidean distance, from the training example). The weight update rule is given by:
w_ij = w_ij(old) + alpha(t) * (x_ik - w_ij(old))
where alpha is the learning rate at time t, j denotes the winning vector, i denotes the i-th feature of the training example, and k denotes the k-th training example from the input data. After training the SOM network, the trained weights are used for clustering new examples. A new example falls in the cluster of its winning vector.

Algorithm

Training:
Step 1: Initialize the weights w_ij; random values may be assumed. Initialize the learning rate α.
Step 2: Calculate the squared Euclidean distance for each output unit j:
D(j) = Σ (w_ij – x_i)^2, where i = 1 to n and j = 1 to m
Step 3: Find the index J for which D(j) is minimum; this is the winning unit.
Step 4: For each unit j within a specific neighborhood of J, and for all i, calculate the new weight:
w_ij(new) = w_ij(old) + α[x_i – w_ij(old)]
Step 5: Update the learning rate using:
α(t+1) = 0.5 * α(t)
Step 6: Test the stopping condition.
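A minimal NumPy sketch of this training algorithm for C clusters; for simplicity only the winning unit is updated (a neighborhood of size zero), and the data and iteration counts are illustrative assumptions:

Example (Python):

import numpy as np

def train_som(X, n_clusters, alpha=0.5, epochs=100):
    # Competitive learning as in Steps 1-6; weights have shape (n_features, C).
    n_features = X.shape[1]
    W = np.random.rand(n_features, n_clusters)           # Step 1: random weights
    for _ in range(epochs):
        for x in X:
            d = ((W - x[:, None]) ** 2).sum(axis=0)      # Step 2: squared distances D(j)
            j = int(np.argmin(d))                        # Step 3: winning index J
            W[:, j] += alpha * (x - W[:, j])             # Step 4: update the winner's weights
        alpha *= 0.5                                     # Step 5: decay the learning rate
    return W

X = np.random.rand(50, 4)            # 50 samples with 4 features (assumed)
weights = train_som(X, n_clusters=2)
clusters = [int(np.argmin(((weights - x[:, None]) ** 2).sum(axis=0))) for x in X]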

Q. 33 Write a short note on RBM.

Ans. A Restricted Boltzmann Machine (RBM) is a type of artificial neural network that is used for unsupervised learning. It is a generative model that is capable of learning a probability distribution over a set of input data.

What are Boltzmann Machines?


A Boltzmann machine is a network of neurons in which all the neurons are connected to each other. In this machine, there are two layers, named the visible layer (or input layer) and the hidden layer. The visible layer is denoted as v and the hidden layer is denoted as h. In a Boltzmann machine, there is no output layer. Boltzmann machines are stochastic and generative neural networks capable of learning internal representations, and they are able to represent and (given enough time) solve tough combinatoric problems.
What are Restricted Boltzmann Machines (RBM)?
The term “restricted” refers to the fact that we are not allowed to connect neurons of the same layer to each other. In other words, two neurons of the input layer, or two neurons of the hidden layer, cannot connect to each other, although the hidden layer and visible layer can be connected to each other. As there is no output layer in this machine, the question arises of how we are going to identify and adjust the weights, and how to measure whether our prediction is accurate or not. Both questions are answered by the two-phase procedure described below.
How do Restricted Boltzmann Machines work?
In RBM there are two phases through which the entire RBM works:
1st Phase: In this phase, we take the input layer and, using the weights and biases, activate the hidden layer. This process is called the Feed Forward Pass. In the Feed Forward Pass we identify the positive association and the negative association.
Feed Forward Equation:
 Positive Association — When the association between the visible unit and the hidden unit
is positive.
 Negative Association — When the association between the visible unit and the hidden unit
is negative.

2nd Phase: As we don’t have any output layer, instead of calculating an output we reconstruct the input layer through the activated hidden state. This process is called the Feed Backward Pass: we simply backtrack from the activated hidden neurons to the input layer. After performing this, we have a reconstructed input obtained through the activated hidden state, so we can calculate the error and adjust the weights in this way:
Feed Backward Equation:
 Error = Reconstructed Input Layer – Actual Input Layer
 Adjust Weight = Input * error * learning rate (e.g. 0.1)
After doing all the steps, we get the pattern that is responsible for activating the hidden neurons. To understand how it works, consider an example: suppose visible unit V1 activates hidden units h1 and h2, and visible unit V2 activates hidden units h2 and h3. Now, when a new visible unit, say V5, comes into the machine and also activates h1 and h2, we can easily backtrace through the hidden units and identify that the characteristics of the new V5 unit match those of V1. This is because V1 also activated the same hidden units earlier.

Q. 34 Write a short note on DBN.

Ans. DBN Introduction

 DBN is an unsupervised probabilistic deep learning algorithm.

 DBN is a generative hybrid graphical model. Top two layers are undirected. Lower layers have
directed connections from layers above.
 Deep Belief Networks (DBNs) are a type of deep learning architecture combining unsupervised
learning principles and neural networks.
 They are composed of layers of Restricted Boltzmann Machines (RBMs), which are trained one at a time in an unsupervised manner. The output of one RBM is used as the input to the next RBM, and the final output is used for supervised learning tasks such as classification or regression.

Deep Belief Network

DBNs have been used in various applications, including image recognition, speech recognition, and natural language processing. DBNs also differ from other deep learning algorithms such as plain autoencoders and standalone restricted Boltzmann machines (RBMs): they operate on an input layer with one neuron for each element of the input vector and pass the data through numerous levels before arriving at the final layer, where outputs are produced using probabilities acquired from the earlier layers.

Architecture of DBN

The basic structure of a DBN is composed of several layers of RBMs.

It is a stack of Restricted Boltzmann Machines (RBMs) or Autoencoders.

The top two layers of a DBN form an undirected, symmetric associative memory.

The connections between all lower layers are directed, with the arrows pointed toward the layer
that is closest to the data. Lower Layers have directed acyclic connections that convert
associative memory to observed variables. The lowest layer or the visible units receives the
input data. Input data can be binary or real.


 There are no intra-layer connections, just as in an RBM.
 Hidden units represent features that capture the correlations present in the data.
 Two layers are connected by a matrix of symmetric weights W.
 Every unit in each layer is connected to every unit in each neighboring layer.
 Each RBM in a DBN is trained independently using contrastive divergence, which is an
unsupervised learning method. The gradient of the log-likelihood of the data for the RBM's
parameters can be approximated using this method. The output of one trained RBM is then used
as the input for the subsequent RBM, which is done by stacking the trained RBMs on top of
one another.
 After the DBN has been trained, supervised learning tasks can be performed on it by adjusting
the weights of the final layer using a supervised learning technique like backpropagation. This
fine-tuning process can improve the DBN's performance on the specific task it was trained for.

Q. 35 Describe the basic architecture of an MLP, including the input layer, hidden layers, and
output layer.

Ans. A multilayer perceptron (MLP) neural network belongs to the class of feedforward neural networks. It is an Artificial Neural Network in which all nodes are interconnected with nodes of different layers.
The word Perceptron was first defined by Frank Rosenblatt in his perceptron program.
Perceptron is a basic unit of an artificial neural network that defines the artificial neuron in the
neural network. It is a supervised learning algorithm that contains nodes’ values, activation
functions, inputs, and node weights to calculate the output.
The Multilayer Perceptron (MLP) Neural Network works only in the forward direction. All
nodes are fully connected to the network. Each node passes its value to the coming node only
in the forward direction. The MLP neural network uses a Backpropagation algorithm to increase
the accuracy of the training model.

Structure of MultiLayer Perceptron Neural Network
This network has three main layers that combine to form a complete Artificial Neural Network. These layers are as follows:
Input Layer
It is the initial or starting layer of the Multilayer perceptron. It takes input from the training data
set and forwards it to the hidden layer. There are n input nodes in the input layer. The number
of input nodes depends on the number of dataset features. Each input vector variable is
distributed to each of the nodes of the hidden layer.
Hidden Layer
It is the heart of all Artificial neural networks. This layer comprises all computations of the
neural network. The edges of the hidden layer have weights multiplied by the node values. This
layer uses the activation function.
There can be one or more hidden layers in the model. The number of hidden layer nodes should be chosen carefully: too few nodes in the hidden layer make the model unable to work efficiently with complex data, while too many nodes can result in an overfitting problem.
Output Layer
This layer gives the estimated output of the Neural Network. The number of nodes in the output layer depends on the type of problem. For a single target variable, one node is used. For an N-class classification problem, the ANN uses N nodes in the output layer.
Working of MultiLayer Perceptron Neural Network
 The input nodes represent the features of the dataset.
 Each input node passes its input value to the hidden layer.
 In the hidden layer, each edge has a weight that is multiplied by the incoming value. All the weighted values arriving at a hidden node are summed together to generate its output.
 The activation function is used in the hidden layer to identify the active nodes.
 The output is passed to the output layer.
 The difference between the predicted and actual output is calculated at the output layer.
 The model uses backpropagation after calculating the predicted output to adjust the weights and improve accuracy.
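A minimal NumPy sketch of the forward pass described above, for one hidden layer; the sizes, activation choices, and target label are illustrative assumptions, and the backpropagation step is omitted:

Example (Python):

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    # Input layer -> hidden layer (ReLU) -> output layer (sigmoid).
    hidden = relu(W1 @ x + b1)          # weighted sums + activation in the hidden layer
    output = sigmoid(W2 @ hidden + b2)  # estimated output of the network
    return output

n_inputs, n_hidden, n_outputs = 4, 8, 1
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(n_hidden, n_inputs)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_outputs, n_hidden)), np.zeros(n_outputs)

x = rng.normal(size=n_inputs)                  # one sample's feature vector (assumed)
y_pred = mlp_forward(x, W1, b1, W2, b2)
error = y_pred - 1.0                           # difference from an assumed true label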
