DL Endsem 2024 FlyHigh Services


Q. Explain Pooling Layer with its need and different types. →
Pooling layers reduce the spatial dimensions of feature maps while retaining important information.
Need:
1. Dimensionality reduction for computational efficiency and controlling overfitting, balancing model complexity and performance.
2. Achieve translation invariance by summarizing local features, ensuring robustness to shifts in the input data distribution.
3. Preserve essential features while reducing spatial dimensions, ensuring efficient representation learning.
Types of Pooling Layer:
1. Max Pooling: - Retains the maximum value, capturing prominent features; commonly employed for downsampling in CNNs. - Discards non-maximal information, simplifying subsequent computations, albeit potentially losing finer details. - Facilitates efficient reduction of spatial dimensions, crucial for scalable feature extraction in deep networks. - Introduces slight translation invariance, aiding robust feature extraction across varying spatial locations.
2. Average Pooling: - Computes the average, preserving overall statistical information; useful for simple dimensionality reduction. - May blur features by averaging, reducing discriminative power; suitable for noise reduction in feature maps. - Provides smoother downsampling, offering a more gradual transition between feature representations. - Less prone to overfitting due to its smoothing effect on features, enhancing generalization performance.
3. Min Pooling: 1. Min pooling is a pooling operation used in convolutional neural networks (CNNs) for downsampling feature maps. 2. It aggregates information by selecting the minimum value within each pooling region. 3. Unlike max pooling, which emphasizes dominant features, min pooling highlights the smallest features.

Working of CNN OR CNN architecture OR features of CNN:
1. Convolutional Layer: The core building block of a CNN is the convolutional layer. It applies a set of learnable filters (also known as kernels) to the input image. Each filter is a small grid that slides over the input image, performing element-wise multiplication with the region it covers and then summing up the results to produce a single value in the output feature map.
2. Activation Function: After the convolution operation, an activation function is applied element-wise to introduce non-linearity into the network. Common choices for activation functions include ReLU (Rectified Linear Unit), which replaces negative values with zero, helping the network learn faster and preventing the vanishing gradient problem.
3. Pooling Layer: Pooling layers are often inserted between successive convolutional layers to reduce the spatial dimensions of the feature maps while retaining important information. Max pooling is a commonly used pooling operation, which extracts the maximum value from a small region of the input feature map. It helps in reducing computational complexity and controlling overfitting.
4. Fully Connected Layer: After several convolutional and pooling layers, the high-level reasoning in the neural network is done via fully connected layers. These layers connect every neuron in one layer to every neuron in the next layer, similar to traditional neural networks. They take the high-level filtered features from the convolutional layers and use them to classify the input image into various classes.
5. Loss Function: CNNs are trained using a loss function that measures the difference between the predicted output and the true labels of the training data. Common loss functions for classification tasks include cross-entropy loss. The goal during training is to minimize this loss function using optimization algorithms like Stochastic Gradient Descent (SGD) or Adam.

Q. ReLU Layer: ReLU, or Rectified Linear Unit, is an activation function commonly used in neural networks, including Convolutional Neural Networks (CNNs). Mathematically, ReLU is defined as f(x) = max(0, x).
Advantages of ReLU over Sigmoid:
1. Sparse Activation: One significant advantage of ReLU over sigmoid is that it induces sparsity in the network. When the input to a ReLU neuron is negative, the neuron output is zero. This means that fewer neurons are activated, leading to sparser representations in the network. Sparse activations can help reduce computational complexity and memory requirements.
2. Avoidance of Vanishing Gradient: Sigmoid activation functions can suffer from the vanishing gradient problem, especially in deep networks. As the network learns, gradients can become very small, making it difficult for lower layers to update their weights effectively. ReLU mitigates this problem since it does not saturate in the positive region, allowing gradients to flow more freely during backpropagation and speeding up convergence.
3. Computationally Efficient: ReLU is computationally more efficient compared to sigmoid activation. This is because the ReLU function involves simpler operations (e.g., max and comparison) compared to the exponential calculation involved in the sigmoid function.
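A minimal NumPy sketch (our own illustration, not from the notes) tying together the pooling operations and the ReLU definition f(x) = max(0, x) described above; the function names and the 4x4 feature map are made up for the example.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Apply max or average pooling with a square window."""
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

def relu(x):
    return np.maximum(0, x)  # f(x) = max(0, x)

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 8, 1],
                 [3, 4, 9, 5]], dtype=float)

print(pool2d(fmap, mode="max"))     # [[6. 4.] [7. 9.]]
print(pool2d(fmap, mode="avg"))     # [[3.75 2.25] [4.   5.75]]
print(relu(np.array([-2.0, 0.5])))  # [0.  0.5]
```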
Q. Explain all the features of pooling layer →
Pooling Window: This defines the size of the grid that the pooling operation is applied to across the feature map. Common sizes are 2x2 or 3x3, but they can be larger depending on the application.
Strides: This controls how much the pooling window moves across the feature map after each pooling operation. A stride of 1 means it moves one step at a time, while a stride of 2 means it jumps two steps.
Padding: Padding can be applied to the edges of the feature map to control the output size. This is useful to avoid shrinking the feature map too drastically, especially when using large strides. There are different padding options like "same" or "valid" which determine how much padding is added to maintain a specific output size.
Pooling Operation: This is the core function that determines how the values within the pooling window are summarized. The two main types are: Max Pooling: Takes the maximum value within the window. This captures the most dominant feature in that region. Average Pooling: Calculates the average value within the window, providing a summarized representation of the features in that area.

Q. Explain Dropout Layer in Convolutional Neural Network. →
1. A dropout layer randomly deactivates neurons during training, preventing overfitting in CNNs.
2. It improves network generalization by encouraging neurons to learn more robust features.
3. Dropout introduces noise, forcing the network to rely on diverse features for classification.
4. During each training iteration, dropout randomly sets a fraction of neurons' outputs to zero.
5. This prevents co-adaptation of neurons, encouraging each neuron to learn independently useful features.
6. Dropout effectively acts as an ensemble technique, training multiple models simultaneously with shared parameters.
7. It imposes regularization, reducing the risk of the model memorizing the training data.
8. Dropout is particularly effective in deep CNNs with a large number of parameters.
9. It enhances network resilience to noisy inputs and perturbations during inference.
10. Dropout layers are essential for building robust and generalizable convolutional neural networks.
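A minimal NumPy sketch of the dropout behaviour just described; the rate value and the inverted-dropout scaling by 1/(1 - rate) are standard practice but are our illustrative assumptions, not taken from the notes.

```python
import numpy as np

def dropout(x, rate=0.5, training=True):
    """Randomly zero a fraction `rate` of activations during training.
    Scaling survivors by 1/(1-rate) keeps the expected activation
    unchanged, so nothing special is needed at inference time."""
    if not training:
        return x
    mask = (np.random.rand(*x.shape) >= rate).astype(x.dtype)
    return x * mask / (1.0 - rate)

acts = np.ones((2, 4))
print(dropout(acts, rate=0.5))  # roughly half the entries zeroed, survivors scaled to 2.0
```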
Q. Explain stride Convolution with example →
1. Stride convolution skips pixels during filtering, moving with a defined step size known as the stride.
2. For example, a stride of 2 means the filter moves two pixels at a time.
3. Consider a 5x5 input image and a 3x3 filter with a stride of 2.
4. The filter starts at the top-left corner, applies convolution, then moves 2 pixels to the right.
5. It repeats this process until it reaches the end of the row, then moves 2 pixels down.
6. Stride convolution reduces the output size compared to regular convolution with a stride of 1.
7. It aids in downsampling and reducing computational complexity in convolutional neural networks.
8. Stride convolution helps control the spatial dimensions of feature maps, crucial in network design.
9. This technique plays a vital role in efficient feature extraction and dimensionality reduction.
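The 5x5 example above can be made concrete with a short NumPy sketch (our own illustration): a 5x5 input and a 3x3 filter at stride 2 produce a 2x2 output, since (5 - 3)/2 + 1 = 2.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    h, w = image.shape
    k = kernel.shape[0]
    out = (h - k) // stride + 1
    result = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            region = image[i*stride:i*stride+k, j*stride:j*stride+k]
            result[i, j] = np.sum(region * kernel)  # element-wise multiply, then sum
    return result

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
kernel = np.ones((3, 3)) / 9.0                    # 3x3 averaging filter
print(conv2d(image, kernel, stride=2).shape)      # (2, 2)
```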
Q. Explain Padding and its types →
Padding is a technique used in convolutional neural networks (CNNs) to preserve the spatial dimensions of feature maps across convolutional layers. It involves adding additional pixels or values around the input image or feature map before applying the convolution operation.
Types of padding:
1. Valid Padding (No Padding): - In valid padding, no padding is added to the input image or feature map. - With valid padding, the convolution operation is applied only to the valid positions where the filter and the input overlap completely. - As a result, the spatial dimensions of the output feature map are reduced compared to the input. - This type of padding is often used when we want to reduce the spatial dimensions of the feature maps, such as in downsampling operations.
2. Same Padding: - In same padding, padding is added to the input image or feature map in such a way that the output feature map has the same spatial dimensions as the input. - To achieve this, the number of rows and columns of padding added to the input image or feature map is calculated based on the size of the convolutional kernel (filter) used and the desired output size. - Same padding ensures that the convolutional operation covers the entire input image or feature map, including its borders.
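A small illustrative helper (assumed names; the output-size formula out = (n + 2p - f) / s + 1 is standard) contrasting the two padding types: with "valid" padding the output shrinks, while "same" padding with p = (f - 1)/2 for an odd filter size f and stride 1 preserves the input size.

```python
def conv_output_size(n, f, stride=1, padding="valid"):
    """Output size of an n x n input convolved with an f x f filter."""
    p = 0 if padding == "valid" else (f - 1) // 2  # "same" padding for odd f, stride 1
    return (n + 2 * p - f) // stride + 1

print(conv_output_size(32, 3, padding="valid"))  # 30: output shrinks
print(conv_output_size(32, 3, padding="same"))   # 32: size preserved
```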
Q. Explain Local response normalization and need of it →
Local Response Normalization (LRN) is a technique used in convolutional neural networks (CNNs) to normalize the responses at each location in the feature maps produced by convolutional layers. - LRN operates on a local neighborhood of activations within each feature map.
Needs for LRN: - LRN addresses the issue of some neurons becoming highly activated while others remain inactive, which can lead to saturation and diminish the network's learning ability. - By normalizing responses within local neighborhoods, LRN prevents saturation and promotes more effective learning. - It enhances the contrast between different features in the feature maps, making it easier for subsequent layers to distinguish between them. - LRN acts as a form of regularization by introducing competition between neurons, which helps prevent overfitting and improves the generalization ability of the network.

Q. What are the applications of Convolution with examples? →
1. Image Processing: - Image Blurring: Convolution can be used to apply blurring effects to images. For example, a Gaussian blur filter is applied to reduce noise or smooth images. - Image Sharpening: Convolution can enhance edges and details in images by applying sharpening filters.
2. Computer Vision: - Object Detection: Convolutional Neural Networks (CNNs) use convolutional layers to detect objects in images. - Image Classification: CNNs use convolutional layers to extract features from images and classify them into different categories.
3. Natural Language Processing (NLP): - Text Classification: Convolutional Neural Networks can be applied to text data by treating words as sequences of vectors.
4. Audio Processing: - Speech Recognition: Convolutional Neural Networks can process audio signals for tasks like speech recognition by treating audio spectrograms as 2D images and applying convolutional layers to extract features.
5. Biomedical Imaging: - Medical Image Analysis: Convolutional networks are used for tasks like tumor detection, organ segmentation, and disease diagnosis in medical images such as X-rays, CT scans, and histopathological images.
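As a concrete instance of the image-blurring application above, here is a NumPy-only sketch (our own illustration) that convolves an image with a 3x3 box-blur (averaging) kernel; a Gaussian blur would use the same mechanism with different kernel weights.

```python
import numpy as np

def blur(image):
    kernel = np.ones((3, 3)) / 9.0  # each output pixel = mean of its 3x3 neighbourhood
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)
    return out

noisy = np.random.rand(6, 6) * 255
print(blur(noisy).shape)  # (4, 4) -- "valid" convolution shrinks the image
```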
Typical Settings for Convolutional Network:
1. Convolutional layers typically use small filters like 3x3 or 5x5 to extract local features.
2. Pooling layers, often max pooling, follow convolutional layers to downsample feature maps.
3. Activation functions like ReLU are commonly used to introduce non-linearity after convolution.
4. A stride size of 1 is common for convolutional layers, preserving spatial dimensions.
5. Batch normalization is frequently applied to stabilize and accelerate training.
6. Dropout regularization is sometimes used to prevent overfitting during training.
7. Increasing the depth and width of convolutional layers aids in learning hierarchical features.
8. Padding, often "same" padding, is used to maintain spatial dimensions after convolution.
9. Learning rate decay schedules help in optimizing training convergence and performance.

Fully Connected Layers:
1. Fully connected layers connect every neuron in one layer to every neuron in the next.
2. They're typically added after convolutional and pooling layers for classification tasks.
3. Input from the last convolutional layer is flattened into a vector before entering fully connected layers.
4. Activation functions like ReLU are commonly applied to fully connected layers for non-linearity.
5. Dropout regularization is often used in fully connected layers to prevent overfitting.
6. The number of neurons in the last fully connected layer matches the number of classes for classification.
7. Weight regularization techniques like L2 regularization may be applied to fully connected layers.
8. Bias terms are often included along with weights in fully connected layers.
9. Fully connected layers are usually followed by a softmax layer for classification.
10. They're responsible for combining extracted features and making final predictions.

The Interleaving between Layers:
1. Interleaving refers to the arrangement of convolutional, pooling, and fully connected layers.
2. Convolutional layers extract features from input images using learnable filters.
3. Pooling layers reduce spatial dimensions, aiding in translation invariance and computational efficiency.
4. Fully connected layers process flattened feature vectors for classification or regression tasks.
5. The typical interleaving order starts with convolutional layers and alternates with pooling layers.
6. The final layers often include fully connected layers followed by a softmax layer for classification.
7. Activation functions like ReLU are commonly applied after each layer to introduce non-linearity.
8. Regularization techniques like dropout may be interleaved between fully connected layers.

Training a Convolutional Network:
1. Convolutional networks are trained using gradient-based optimization algorithms like stochastic gradient descent (SGD).
2. During training, input images and their corresponding labels are passed through the network.
3. The network computes predictions for each input and compares them with the true labels using a loss function.
4. Common loss functions for classification tasks include cross-entropy loss.
5. Backpropagation is then used to compute gradients of the loss with respect to the network parameters.
6. These gradients are used to update the parameters in the direction that minimizes the loss.
7. Training is typically performed over multiple iterations or epochs, where the entire dataset is processed.
8. Hyperparameters such as learning rate, batch size, and optimizer choice are tuned to optimize training performance.
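Tying the four summaries above together, here is a minimal sketch assuming PyTorch is available: a conv-ReLU-pool stack followed by a fully connected layer, trained for one step with cross-entropy loss and SGD. The layer sizes, batch, and hyperparameters are illustrative only.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # "same" padding, 3x3 filters
    nn.ReLU(),
    nn.MaxPool2d(2),                            # 28x28 -> 14x14
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                            # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                  # last FC layer: one neuron per class
)

criterion = nn.CrossEntropyLoss()               # applies softmax internally
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

images = torch.randn(32, 1, 28, 28)             # a fake mini-batch
labels = torch.randint(0, 10, (32,))

optimizer.zero_grad()
loss = criterion(model(images), labels)         # forward pass + loss
loss.backward()                                 # backpropagation
optimizer.step()                                # parameter update
```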
Unit - 4

Q. What is RNN? What is need of RNN? Explain in brief the working of RNN (Recurrent Neural Network) →
Recurrent Neural Networks (RNNs) are a class of artificial neural networks designed to handle sequential data by introducing loops within the network architecture, allowing information to persist across different time steps. - RNNs share the same set of weights across all time steps, enabling them to learn from sequential data and capture long-term dependencies.

Q. Need of RNN: - Sequential Data Processing: RNNs are essential for tasks where the order of input elements matters, such as time series prediction, speech recognition, and natural language processing. - Temporal Dependencies: RNNs are capable of capturing temporal dependencies and patterns in sequential data, enabling them to model complex relationships over time. - Variable Length Inputs: RNNs can handle input sequences of variable length, making them suitable for tasks with inputs of different lengths, such as text processing and speech recognition.

Q. Working of RNN: - Temporal Processing: At each time step, an RNN receives an input and the hidden state from the previous time step, combines them using activation functions, and produces an output and a new hidden state. - Parameter Sharing: RNNs share the same set of weights across all time steps, enabling them to learn from sequential data by capturing temporal dependencies and patterns. - Backpropagation Through Time (BPTT): RNNs are trained using the backpropagation through time (BPTT) algorithm, which unfolds the network through time and applies backpropagation to calculate gradients and update weights. - Long Short-Term Memory (LSTM) and GRU: Variants like LSTM and GRU introduce gating mechanisms to control the flow of information and address issues like vanishing gradients, enabling RNNs to learn long-range dependencies more effectively.

Q. RNN architecture:
Basic Components: Input Layer: Receives the elements of the sequence one at a time. Hidden Layer(s): This is where the magic happens. It contains special recurrent units that process the current input along with information from the previous element(s). This "memory" allows the network to understand the context of the sequence. Output Layer: Generates the output for the current element, which can be a prediction, classification, or another element in the sequence itself.
Recurrent Unit: 1. Processes current input: It receives the current element from the input layer. 2. Combines with past information: It combines the current input with the output from the previous recurrent unit in the sequence. This output carries the information from past elements. 3. Activation Function: Applies an activation function (like tanh or ReLU) to transform the combined information. 4. Output: Generates an output that considers both the current input and the context from previous elements.
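A NumPy sketch of the recurrent update just described, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b), with the same weights shared at every time step (parameter sharing); sizes and values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, steps = 4, 3, 3
W_xh = rng.normal(size=(hidden_size, input_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)                       # h_0 initialised to zeros
for t in range(steps):
    x_t = rng.normal(size=input_size)           # input at time step t
    h = np.tanh(W_xh @ x_t + W_hh @ h + b)      # new hidden state
    y_t = h                                     # output (here: the hidden state itself)
    print(f"t={t+1}, h={np.round(h, 3)}")
```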
Types of RNNs (Recurrent Neural Networks):
Vanilla RNN: The simplest RNN, but prone to the vanishing gradient problem, limiting its ability to learn long-term dependencies.
Long Short-Term Memory (LSTM): Introduces gating mechanisms to control information flow and effectively learn long-term dependencies in sequences. Well-suited for tasks like speech recognition and machine translation.
Bidirectional RNN (BiRNN): Processes data in both forward and backward directions simultaneously, capturing context from both sides of a sequence. Useful for tasks like sentiment analysis and text summarization.

Training RNNs:
1. Prepare your data: - Split your data into sequences (e.g., sentences for text data). - Decide on the input and output sequence lengths.
2. Choose your RNN architecture: Select the type of RNN (e.g., LSTM) based on your task and data characteristics.
3. Define the loss function: This function measures how far the RNN's predictions deviate from the actual targets.
4. Backpropagation: Similar to feed-forward networks, errors are propagated backward through the unfolded computational graph to update the network weights. However, the recurrent connections introduce additional complexity in calculating gradients.
5. Optimization: Use optimization algorithms like Adam or RMSProp to adjust the weights based on the calculated gradients.
6. Iteration: Repeat steps 3-5 for multiple epochs (full passes through the training data) until the RNN converges and achieves the desired performance.

LSTM (Long Short-Term Memory): LSTMs are a type of Recurrent Neural Network (RNN) designed to address the vanishing gradient problem that plagues standard RNNs. This problem limits RNNs' ability to learn long-term dependencies in sequences. LSTMs introduce a concept called a "cell" that controls the flow of information through the network.

Working of LSTM OR Architecture of LSTM:
1. Input Gate: Decides which information from the current input and the previous cell's output is relevant.
2. Forget Gate: Determines what information to forget from the previous cell's output.
3. Cell State: Acts as the memory of the cell, storing the relevant information.
4. Output Gate: Controls what information from the cell state is passed on as the output to the next cell.
LSTM Uses: 1. Machine translation 2. Speech recognition 3. Text generation 4. Time series forecasting

Bidirectional LSTM: Bidirectional LSTMs are an extension of LSTMs that process data in both forward and backward directions simultaneously. This allows the BiLSTM to capture dependencies that might be missed by a standard LSTM that only processes data in one direction.
How a Bidirectional LSTM works: 1. It consists of two separate LSTM layers. 2. One layer processes the input sequence in the forward direction (left to right). 3. The other layer processes the reversed input sequence in the backward direction (right to left). 4. The outputs from both layers are then combined (usually by concatenation) at each time step.
Bidirectional LSTM Uses: 1. Sentiment analysis 2. Machine translation 3. Text summarization 4. Question answering
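A NumPy sketch of the bidirectional pattern described above; simple tanh units stand in for full LSTM cells to keep the example short, so this shows the forward/backward/concatenate structure rather than a real BiLSTM.

```python
import numpy as np

def run_rnn(xs, W_x, W_h):
    h, outs = np.zeros(W_h.shape[0]), []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h)
        outs.append(h)
    return outs

rng = np.random.default_rng(1)
xs = [rng.normal(size=4) for _ in range(5)]                    # a 5-step input sequence
W_xf, W_hf = rng.normal(size=(3, 4)), rng.normal(size=(3, 3))  # forward weights
W_xb, W_hb = rng.normal(size=(3, 4)), rng.normal(size=(3, 3))  # backward weights

fwd = run_rnn(xs, W_xf, W_hf)                  # left to right
bwd = run_rnn(xs[::-1], W_xb, W_hb)[::-1]      # right to left, re-aligned per step
combined = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
print(combined[0].shape)  # (6,) -- forward and backward states concatenated
```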
Q. Explain Unfolding computational graphs with example →
Unfolding computational graphs is a technique used to visualize and understand the computations performed by recurrent neural networks (RNNs) over multiple time steps. It involves expanding the network architecture across time, creating a graph that represents the flow of information through the network over a sequence of input steps.
Example of unfolding computational graphs: Suppose we have an RNN with one input neuron, one hidden neuron, and one output neuron. We'll unfold this RNN over three time steps to predict the next item in a sequence based on the previous items.
1. Initial State: - At time step t = 0, the RNN starts with an initial hidden state h_0, which is typically initialized to zeros.
2. Processing Time Steps: - At each time step t = 1, 2, 3, the RNN receives an input x_t and produces an output y_t based on the current input and the previous hidden state. - The hidden state h_t at each time step is calculated using the input x_t and the previous hidden state h_{t-1}, along with the network's parameters (weights and biases).
3. Unfolding: - To unfold the RNN, we replicate the network architecture across time steps, creating a sequence of interconnected layers representing each time step. - Each layer represents one time step in the sequence, with connections between the input, hidden, and output neurons.

Q. Explain Encoder-Decoder Sequence to Sequence architecture with its application. →
Encoder-Decoder Sequence to Sequence (Seq2Seq) Architecture: The encoder-decoder architecture is a powerful tool in deep learning for tasks that involve converting one sequence of data to another sequence. It's particularly useful for problems like machine translation, text summarization, and code generation. Here's how it works:
Encoder: The encoder takes an input sequence (e.g., a sentence in one language). It typically consists of recurrent neural networks (RNNs) like LSTMs, which process the input sequence element by element.
Decoder: The decoder receives the context vector from the encoder. It also uses another RNN to generate the output sequence one element at a time (e.g., a translated sentence in another language).
Training: The network is trained on paired examples of input and desired output sequences. During training, the model learns to map the input sequences to their corresponding output sequences by minimizing the difference between the predicted and actual outputs.
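A compact sketch of the encoder-decoder pattern, assuming PyTorch is available; the vocabulary size, dimensions, and the use of teacher forcing (feeding the target sequence to the decoder during training) are our illustrative choices, not details from the notes.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab=100, emb=32, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(vocab, emb)
        self.tgt_emb = nn.Embedding(vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_emb(src))   # final state = context vector
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)                     # logits per output position

model = Seq2Seq()
src = torch.randint(0, 100, (8, 12))   # batch of 8 input sequences
tgt = torch.randint(0, 100, (8, 10))   # paired target sequences (teacher forcing)
logits = model(src, tgt)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 100), tgt.reshape(-1))
loss.backward()
print(logits.shape)  # torch.Size([8, 10, 100])
```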
Applications of Encoder-Decoder Seq2Seq:
1. Machine Translation: This is a classic application where the encoder processes a sentence in one language, and the decoder generates the corresponding translation in another language.
2. Text Summarization: The encoder reads a long document, and the decoder condenses it into a shorter summary capturing the key points.
3. Chatbots: Encoders can process user queries, and decoders can generate natural language responses for chatbots.
4. Code Generation: Encoders can analyze code descriptions or comments, and decoders can generate the corresponding code.

Q. Differentiate between Recurrent and Recursive Neural Network →
Recurrent Neural Network (RNN): 1. Handles sequential data like time series and text. 2. Processes data step-by-step, with recurrent connections. 3. Maintains memory of past states for temporal dependencies. 4. Suitable for tasks like language modeling and speech recognition. 5. Unfolds the network architecture over time for processing. 6. Examples: vanilla RNNs, LSTMs, GRUs.
Recursive Neural Network (ReNN): 1. Handles hierarchical data like trees and graphs. 2. Processes data recursively, capturing hierarchical relationships. 3. Captures relationships between different parts of the structure. 4. Suitable for tasks like parsing and sentiment analysis in trees. 5. Constructs a tree-like structure mirroring the input hierarchy. 6. Examples: RNTNs, Recursive Autoencoders, Tree-LSTM networks.

Q. Explain how the memory cell in the LSTM is implemented computationally? →
1. Forget Gate: We have the previous hidden state (h_{t-1}), the previous cell state (C_{t-1}), and the current input (x_t) as vectors. The forget gate passes h_{t-1} and x_t through a fully connected layer followed by a sigmoid activation function: F_t = σ(W_f [h_{t-1}, x_t] + b_f).
2. Input Gate: Similar to the forget gate, the current input (x_t) and the previous hidden state (h_{t-1}) are passed through a fully connected layer followed by a sigmoid activation function: I_t = σ(W_i [h_{t-1}, x_t] + b_i).
3. Candidate Cell State (C̃_t): A new candidate value is created from the same inputs using a fully connected layer with a hyperbolic tangent (tanh) activation function: C̃_t = tanh(W_c [h_{t-1}, x_t] + b_c).
4. Output Gate: Similar to the previous gates, h_{t-1} and x_t are passed through a fully connected layer with a sigmoid activation function: O_t = σ(W_o [h_{t-1}, x_t] + b_o).
5. Updating Cell State and Hidden State: The cell state is updated as C_t = F_t ⊙ C_{t-1} + I_t ⊙ C̃_t. Finally, the output gate is element-wise multiplied with tanh of the new cell state to get the output of the current cell: h_t = O_t ⊙ tanh(C_t).

Q. Justify RNN is better suited to treat sequential data than a feed-forward neural network. →
Feedforward Network Limitations: Independent Processing: Standard feedforward networks process each data point in isolation. They lack internal memory to consider past information, which is crucial for understanding sequential data. Fixed-Length Inputs: Feedforward networks typically require a fixed input size. This becomes a limitation for sequences of varying lengths, like sentences or time series data.
RNN Advantages for Sequential Data: Internal Memory: RNNs have recurrent connections that allow them to store information from previous elements in the sequence. Variable Length Inputs: RNNs can handle sequences of varying lengths by processing them element by element. Modeling Dependencies: RNNs can capture long-term dependencies within sequences.
them element by element. Modeling
as vectors.This operation is denoted as: Ft =
Dependencies: RNNs can capture long-
σ(Wf [Ct-1, Xt] + bf) 2. Input Gate:
term dependencies within sequences.
Similar to the forget gate, the current input
Deep Recurrent Networks: 1. Deep recurrent neural networks (RNNs) have multiple layers, allowing for more complex modeling of temporal dependencies. 2. They can capture intricate patterns in sequential data, making them suitable for tasks like language modeling and time series prediction. 3. Deep RNNs enable hierarchical feature learning, extracting high-level representations from raw input sequences. 4. Training deep RNNs can be challenging due to vanishing and exploding gradients, requiring careful initialization and regularization techniques.

The Challenge of Long-Term Dependencies: 1. Long-term dependencies refer to relationships between distant elements in a sequence. 2. Traditional RNNs struggle to capture long-term dependencies due to vanishing gradient problems. 3. This limits their ability to remember information over extended time horizons. 4. Addressing long-term dependencies is crucial for tasks like speech recognition and machine translation. 5. Architectural modifications and specialized training algorithms are used to mitigate the challenge of long-term dependencies in RNNs.

Echo State Networks: 1. Echo State Networks (ESNs) are a type of recurrent neural network with a fixed random hidden layer. 2. They leverage reservoir computing, where the dynamics of the reservoir capture temporal information. 3. ESNs are trained by adjusting only the output weights, simplifying optimization. 4. They are particularly suited for tasks requiring processing of temporal data with nonlinear dynamics. 5. ESNs have been successfully applied in areas such as time series prediction and signal processing tasks.

Leaky Units and Other Strategies for Multiple Time Scales: 1. Leaky units in RNNs maintain long-term memory by allowing information to persist over time. 2. They address vanishing gradient issues by preserving information flow through the network. 3. Other strategies include using gated units like LSTMs and GRUs to manage multiple time scales. 4. These units regulate the flow of information, facilitating the capture of long-term dependencies. 5. Strategies for handling multiple time scales improve the ability of RNNs to model complex sequential data.

Optimization for Long-Term Dependencies: 1. Optimizing RNNs for long-term dependencies involves addressing vanishing and exploding gradient problems. 2. Techniques like gradient clipping and careful weight initialization help stabilize training. 3. Architectural modifications like skip connections and highway networks facilitate information flow over longer sequences. 4. Adaptive learning rate algorithms such as AdaGrad and RMSProp help in optimizing RNN training. 5. Optimization for long-term dependencies aims to enhance the ability of RNNs to capture and retain information over extended time periods.

Explicit Memory: 1. Explicit memory mechanisms in RNNs enable the model to store and retrieve information over time. 2. Memory cells like those in LSTM and GRU architectures maintain long-term dependencies. 3. Attention mechanisms focus on relevant parts of the input sequence, aiding in memory recall. 4. External memory modules, such as the Neural Turing Machine, provide additional storage capacity. 5. Explicit memory enhances RNNs' ability to handle tasks requiring the retention and manipulation of information over extended sequences.
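A tiny sketch of the leaky-unit idea from the section above: a running average μ_t = α μ_{t-1} + (1 - α) v_t whose coefficient α sets the time scale on which old information fades. The signal and α values are illustrative assumptions.

```python
import numpy as np

def leaky_update(mu_prev, v_t, alpha):
    # alpha near 1 -> long memory (slow time scale); alpha near 0 -> fast
    return alpha * mu_prev + (1 - alpha) * v_t

signal = np.sin(np.linspace(0, 6, 50)) + 0.3 * np.random.randn(50)
slow = fast = 0.0
for v in signal:
    slow = leaky_update(slow, v, alpha=0.95)  # smooth, long-term trend
    fast = leaky_update(fast, v, alpha=0.30)  # tracks recent values
print(round(slow, 3), round(fast, 3))
```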
Performance Metrics: 1. For RNNs, common performance metrics include accuracy, loss, and perplexity for language modeling tasks. 2. In sequence generation tasks, metrics like BLEU score and ROUGE score evaluate output quality. 3. For time series prediction, metrics such as mean squared error (MSE) and mean absolute error (MAE) are used. 4. Classification tasks in RNNs often utilize metrics like precision, recall, and F1-score. 5. Performance metrics help assess RNN models' effectiveness in capturing temporal dependencies and making accurate predictions.

Default Baseline Models: 1. Simple RNNs serve as baseline models for many sequential tasks due to their simplicity and interpretability. 2. LSTM and GRU networks are commonly used as baseline models for tasks requiring memory retention. 3. Basic recurrent architectures without advanced features are employed to establish performance benchmarks. 4. These baseline models provide a starting point for comparing the performance of more complex RNN architectures.

Determining Whether to Gather More Data: 1. Assessing model performance on existing data helps determine if additional data is needed. 2. Techniques like learning curves and validation performance analysis aid in evaluating data sufficiency. 3. If the model exhibits high variance or instability, gathering more diverse data can improve generalization. 4. Domain-specific considerations, such as data imbalance or data quality, influence the decision to collect more data. 5. Balancing the trade-off between data collection costs and potential performance improvements guides the decision-making process.

Selecting Hyperparameters: 1. Grid search and random search are common approaches for selecting hyperparameters in RNNs. 2. Hyperparameters include learning rate, batch size, number of layers, and hidden units. 3. Cross-validation helps assess hyperparameter performance across different subsets of data. 4. Bayesian optimization methods efficiently search the hyperparameter space to find optimal configurations. 5. Hyperparameter tuning impacts RNN model performance, requiring careful consideration and experimentation.

Unit - 5

Q. Boltzmann machine →
A Boltzmann machine is a type of stochastic recurrent neural network composed of interconnected binary neurons. These neurons are arranged in a bipartite graph, with visible neurons representing input data and hidden neurons capturing higher-order features. The network learns to generate data by adjusting connection weights through unsupervised learning, aiming to model the underlying probability distribution of the input data. Boltzmann machines utilize a probabilistic approach based on the Boltzmann distribution, where the probability of a configuration is determined by its energy relative to the system's temperature.

Q. Architecture of Boltzmann machine. →
Layers: Visible Layer: This layer consists of visible units or nodes that represent the input data. Hidden Layer: This layer consists of hidden units or nodes that are not directly observable. They play a crucial role in learning complex patterns within the data.
Connections: Full Connectivity: Unlike some neural networks, BM architectures have full connectivity. This means every unit in the visible layer is connected to every unit in the hidden layer, and vice versa.
Weight Symmetry: Symmetric Weights: The connections between visible and hidden units have symmetric weights.
Neurons: Stochastic Units: Unlike traditional neural networks with activation functions, BM units are stochastic. This means they don't use deterministic activation functions to produce outputs.
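A NumPy sketch of the stochastic units described above, in the bipartite visible/hidden form: each hidden unit switches on with a probability given by a sigmoid of its total input, rather than via a deterministic activation. Sizes, names, and values are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
n_visible, n_hidden = 6, 4
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # symmetric weights
b_h = np.zeros(n_hidden)

v = rng.integers(0, 2, size=n_visible).astype(float)   # binary visible vector
p_h = 1.0 / (1.0 + np.exp(-(v @ W + b_h)))             # P(h_j = 1 | v)
h = (rng.random(n_hidden) < p_h).astype(float)         # stochastic binary sample
print(p_h.round(3), h)
```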

Q. Explain GAN (Generative Adversarial Network) architecture with an example →
A Generative Adversarial Network (GAN) is a deep learning architecture that uses two neural networks competing against each other to create new data. Imagine it as a game between a counterfeiter and a detective.
GAN Architecture: Generator: This network acts like the counterfeiter. It takes random noise as input and tries to generate new data, like images or text, that are similar to the real data from a training dataset. Discriminator: This network acts like the detective. It analyzes both the real data from the training set and the generated data from the generator. The discriminator's job is to determine if the data is real or fake.
Example: Generating New Cat Images: The generator would take random noise as input and try to create a cat image. The discriminator would then analyze this generated image and a real image from the dataset. The discriminator would try to identify which image is the real cat and which one is the fake created by the generator.

Q. Do GANs (Generative Adversarial Networks) find real or fake images? If yes, explain it in detail →
Yes, Generative Adversarial Networks (GANs) are used to generate fake images that are realistic enough to be mistaken for real images. How GANs work to generate realistic fake images:
1. Generator: - The generator takes random noise as input and generates images from this noise. - Initially, the generator produces random noise that resembles nothing like real images. - Over time, as it is trained, the generator learns to generate images that increasingly resemble real images through feedback from the discriminator.
2. Discriminator: - The discriminator is a binary classifier trained to distinguish between real and fake images. - Initially, the discriminator is trained on a dataset of real images and is quite good at distinguishing them from fake ones. - As training progresses, the discriminator learns to differentiate between real and fake images more accurately.
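A minimal adversarial training loop, assuming PyTorch is available, that mirrors the counterfeiter/detective game above: the discriminator is trained to label real data 1 and generated data 0, then the generator is trained to make the discriminator output 1 on its samples. The tiny fully connected networks and 2-D "data" are stand-ins for real image models.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))  # noise -> fake sample
D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(100):
    real = torch.randn(32, 2) + 3.0                 # the "real" data distribution
    noise = torch.randn(32, 8)
    fake = G(noise)

    # Discriminator step: real -> 1, fake -> 0 (detach so G is not updated here)
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    d_loss.backward()
    opt_d.step()

    # Generator step: try to fool D into outputting 1 on fake samples
    opt_g.zero_grad()
    g_loss = bce(D(G(noise)), torch.ones(32, 1))
    g_loss.backward()
    opt_g.step()
```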
Generative Model: 1. Generates new data samples. 2. Handled by the generator network. 3. Learns the underlying data distribution. 4. Minimizes the difference between real and generated data distributions. 5. Produces synthetic data samples.
Discriminative Model: 1. Discriminates between real and fake data samples. 2. Handled by the discriminator network. 3. Learns to classify data into real or fake categories. 4. Maximizes the probability of correctly classifying real and generated data. 5. Outputs a binary classification (real or fake) for each input data sample.

Deep Generative Model: 1. Deep generative models are neural networks designed to learn and generate complex data distributions. 2. They capture high-dimensional data like images, text, and audio by modeling the underlying probability distribution. 3. Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) are popular architectures in this category. 4. VAEs learn latent representations of data, allowing for probabilistic inference and generation. 5. GANs consist of a generator and a discriminator trained adversarially to generate realistic samples. 6. These models enable tasks like image generation, style transfer, and data augmentation. 7. Deep generative models facilitate unsupervised and semi-supervised learning by learning meaningful representations. 8. They provide a framework for exploring and understanding complex data distributions.

Q. Applications of GANs →
1. Image Generation: - One of the most prominent applications of GANs is in image generation, where they are used to create realistic images from random noise. - GANs can generate high-resolution images of human faces, animals, landscapes, and more with remarkable detail and realism.
2. Image-to-Image Translation: - GANs can perform image-to-image translation, where they learn to transform images from one domain to another while preserving important characteristics. - For example, GANs can convert satellite images to maps, sketches to realistic images, black and white photos to color, and low-resolution images to high-resolution.
3. Super Resolution: - GANs can be used for super-resolution tasks, where they generate high-resolution images from low-resolution inputs. - By learning to fill in missing details and upscaling images, GANs can enhance the quality of images beyond their original resolution.
4. Text-to-Image Synthesis: - GANs can synthesize images from textual descriptions, enabling users to generate images based on natural language input. - By learning the correspondence between textual descriptions and visual features, GANs can create images that match the semantics of the input text.
Deep Belief Networks: 1. Deep Belief Networks (DBNs) are probabilistic graphical models with multiple layers of latent variables. 2. They consist of a stack of Restricted Boltzmann Machines (RBMs) or a combination of RBMs and fully connected layers. 3. DBNs are trained layer by layer in an unsupervised manner, followed by fine-tuning using backpropagation. 4. They learn hierarchical representations of data, capturing complex patterns and correlations. 5. DBNs are used for tasks like feature learning, classification, and dimensionality reduction. 6. They leverage the generative power of RBMs and the discriminative power of neural networks. 7. DBNs have been applied in various domains, including image recognition, speech processing, and natural language processing. 8. Training DBNs can be computationally intensive, requiring large datasets and specialized hardware.

Q. How does GAN training scale with batch size? →
Larger Batch: Faster Training: With a larger batch size, you update the networks with more information in each iteration, potentially leading to faster training in terms of wall-clock time. This is because you're making fewer total updates for the same number of epochs (passes through the entire dataset). Improved Gradients: A larger batch size can provide a more accurate estimate of the true gradient by averaging over more data points. This can, in some cases, lead to smoother convergence during training.
Smaller Batch Size: Stability and Exploration: Smaller batches can lead to more stable training, especially in the beginning. The discriminator has less data to learn from, allowing the generator more room to explore and experiment. Lower Memory Requirements: Smaller batches require less memory, making them suitable for training on hardware with limited resources.

Q. Explain different types of GAN. →
1. Vanilla GAN: - The original GAN architecture proposed by Ian Goodfellow in 2014 consists of a generator and a discriminator network. - The generator generates fake samples, and the discriminator tries to distinguish between real and fake samples.
2. Conditional GAN (cGAN): - Conditional GANs extend vanilla GANs by conditioning both the generator and discriminator on additional information. - They generate samples conditioned on auxiliary information, such as class labels or input images.
3. Deep Convolutional GAN (DCGAN): - DCGANs use deep convolutional neural networks in both the generator and discriminator architectures. - They leverage convolutional layers to learn hierarchical features and generate high-quality images with greater stability.
4. Wasserstein GAN (WGAN): - WGANs improve the training stability of GANs by using the Wasserstein distance (also known as Earth Mover's distance) as the training objective instead of the Jensen-Shannon divergence or Kullback-Leibler divergence.
5. Progressive GAN (ProgGAN): - Progressive GANs generate high-resolution images by progressively growing both the generator and discriminator architectures.
Unit - 6

Q. Explain dynamic programming algorithms for reinforcement learning →
1. Policy Evaluation: - DP algorithms are often used for policy evaluation, where the goal is to estimate the value function V(s) for a given policy π. - The Bellman expectation equation is used as the basis for iterative algorithms such as iterative policy evaluation and the value iteration algorithm. - These algorithms update the value function for each state based on the expected return obtained by following the current policy.
2. Policy Improvement: - DP algorithms can also be used for policy improvement, where the goal is to improve the current policy based on the estimated value function. - The policy improvement theorem states that if a policy π' is greedy with respect to the value function V_π, then π' is equal to or better than π. - By iteratively evaluating and improving the policy, DP algorithms can converge to an optimal policy that maximizes the expected return.
3. Value Iteration: - Value iteration is a DP algorithm used to find the optimal value function V*(s) and the optimal policy π* iteratively. - It combines both policy evaluation and policy improvement steps in each iteration. - Value iteration updates the value function using the Bellman optimality equation until convergence to the optimal value function.

Q. What is deep reinforcement learning? Explain in detail. →
Deep reinforcement learning (DRL) is a subfield of machine learning and artificial intelligence that combines reinforcement learning (RL) techniques with deep learning methods, particularly deep neural networks, to solve complex decision-making tasks. Explanation of deep reinforcement learning:
1. Reinforcement Learning: - Reinforcement learning is a type of machine learning where an agent learns to make decisions by interacting with an environment.
2. Deep Learning: - Deep learning is a subset of machine learning that focuses on learning hierarchical representations of data using deep neural networks.
3. Integration of Deep Learning and Reinforcement Learning: - Deep reinforcement learning combines the principles of reinforcement learning with deep learning techniques to handle high-dimensional input spaces and complex decision-making tasks. - Deep neural networks are used to approximate the value function, policy, or action-value function in reinforcement learning algorithms. - By leveraging deep neural networks, DRL algorithms can learn directly from raw sensory inputs, such as images or raw sensor data, without the need for handcrafted features.

Q. Explain Simple reinforcement learning for Tic-Tac-Toe. →
1. Environment: The environment is the Tic-Tac-Toe board represented as a 3x3 grid.
2. States: A list of 9 elements, where each element is 'X', 'O', or empty, or a dictionary where keys are cell positions and values are 'X', 'O', or empty.
3. Actions: The agent's actions are the available moves it can make on the board. A valid action specifies the empty cell where the agent wants to place its 'X'.
4. Rewards: Win Reward: +1 for winning the game. Loss Reward: -1 for losing the game. Draw Reward: 0 for a draw.
5. Q-Learning Algorithm: We can use a simple Q-learning algorithm to train the agent. Q-learning is an off-policy algorithm, meaning the agent can learn from experience even if it's not following the optimal policy yet.
6. Learning Process: 1. Start 2. Agent's Move 3. Take Action 4. Opponent's Move 5. Reward 6. Update Q-value 7. Repeat
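A NumPy sketch of value iteration as described in point 3 of the dynamic-programming question above, repeatedly applying the Bellman optimality backup V(s) ← max_a Σ_s' T(s, a, s') (R(s, a) + γ V(s')). The 3-state, 2-action MDP is made up for the example.

```python
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(4)
T = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # T[s, a, s']
R = rng.normal(size=(n_states, n_actions))                        # R[s, a]

V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * T @ V          # Q[s,a] = R[s,a] + gamma * sum_s' T[s,a,s'] V[s']
    V_new = Q.max(axis=1)          # Bellman optimality backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        break                      # converged to the optimal value function
    V = V_new

policy = Q.argmax(axis=1)          # greedy policy w.r.t. the optimal values
print(np.round(V, 3), policy)
```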
Q. Markov Decision Process →
A Markov Decision Process (MDP) is a mathematical framework used to model sequential decision-making problems in the presence of uncertainty.
1. States (S): - A Markov Decision Process consists of a set of states, denoted by S, representing all possible configurations or situations in which the system can be. - States can represent physical locations, game board configurations, or any other relevant situation in the problem domain.
2. Actions (A): - For each state s in S, the agent can choose an action a in A(s) from a set of available actions. - Actions represent the decisions or choices that the agent can make at each state.
3. Transition Probabilities (T): - Transition probabilities describe the likelihood of transitioning from one state to another after taking a particular action.
4. Rewards (R): - At each state-action pair (s, a), the agent receives an immediate reward R(s, a). - Rewards represent the immediate benefit or cost associated with taking a particular action in a specific state.
5. Policy (π): - A policy π is a mapping from states to actions, specifying the agent's decision-making strategy. - The policy determines which action to take in each state to maximize the expected cumulative reward.

Q. What are the challenges of reinforcement learning? Explain any four in detail. →
1. Sample Efficiency: RL algorithms typically learn from interacting with the environment, which can be time-consuming and resource-intensive, especially in real-world applications where each interaction may incur costs or risks. Example: In robotics, training an RL agent to perform complex tasks such as grasping objects or navigating through cluttered environments may require thousands or even millions of interactions with the physical robot, which can be time-consuming and expensive.
2. Exploration vs. Exploitation: RL agents must balance exploration (trying new actions to discover potentially better strategies) and exploitation (leveraging known strategies to maximize immediate rewards), which can be challenging, especially in environments with unknown dynamics or stochastic rewards. Example: In recommendation systems, an RL agent must explore different options (e.g., recommending new items to users) to discover user preferences while exploiting known preferences to maximize user satisfaction and engagement.
3. Generalization and Transfer Learning: RL agents often struggle to generalize learned policies to new environments or tasks, requiring extensive retraining when the environment changes or when transferring learned policies to similar but different tasks. Example: An RL agent trained to play a specific video game may struggle to adapt its learned policy to play a similar but slightly different game, requiring additional training or fine-tuning.
4. Reward Design and Sparse Rewards: Designing appropriate reward functions that effectively guide the learning process towards desired behaviors is challenging, particularly in complex environments with sparse or deceptive rewards.
Q-Learning: 1. Q-learning is a reinforcement learning algorithm used to learn optimal policies in Markov decision processes (MDPs). 2. It estimates the value of taking a particular action in a given state by considering future rewards. 3. The Q-value function represents the maximum expected cumulative reward achievable from a state-action pair. 4. Q-learning iteratively updates Q-values based on observed rewards and transitions. 5. It uses the Bellman equation to update Q-values towards the optimal policy. 6. Q-learning is model-free and can handle environments with stochastic transitions and rewards. 7. Exploration strategies like epsilon-greedy are employed to balance exploration and exploitation. 8. It is well-suited for discrete action spaces and environments with a finite number of states. 9. Q-learning can be unstable when dealing with large state spaces or continuous action spaces. 10. Extensions like Double Q-learning and Prioritized Experience Replay address some of Q-learning's limitations.

Deep Q-Networks (DQN): 1. Deep Q-Networks (DQN) extend Q-learning to handle high-dimensional state spaces by using deep neural networks. 2. They approximate the Q-value function using a neural network parameterized by weights. 3. DQN employs experience replay, storing transitions in a replay buffer for more efficient learning. 4. Target networks are used to stabilize training by decoupling the target and online Q-networks. 5. DQN uses gradient descent to minimize the mean squared error between predicted and target Q-values. 6. It has been successfully applied to challenging tasks such as Atari games and robotic control. 7. DQN suffers from overestimation bias, which can be mitigated using techniques like Double DQN. 8. Dueling DQN separates value estimation from action selection, improving sample efficiency. 9. Rainbow DQN combines various improvements to achieve state-of-the-art performance. 10. DQN is a foundational algorithm in deep reinforcement learning and continues to be a focus of research.
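A sketch of the tabular Q-learning update with epsilon-greedy exploration, as summarized above: Q(s, a) ← Q(s, a) + α (r + γ max_a' Q(s', a') − Q(s, a)). The environment that supplies rewards and next states is assumed to exist elsewhere; the sizes and hyperparameters are illustrative.

```python
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1
rng = np.random.default_rng(5)

def choose_action(s):
    if rng.random() < epsilon:            # explore with probability epsilon
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))           # otherwise exploit the best known action

def update(s, a, reward, s_next):
    # Bellman-based update toward r + gamma * max_a' Q(s', a')
    target = reward + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

update(s=0, a=choose_action(0), reward=1.0, s_next=3)
print(Q[0])
```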
Deep Q Recurrent Networks: 1. Deep Q
Recurrent Networks combine Q-learning
with recurrent neural networks (RNNs) to
handle sequential decision-making tasks. 2.
They extend DQN to capture temporal
dependencies in sequential data. 3. RNNs,
like Long Short-Term Memory (LSTM) or
Gated Recurrent Unit (GRU), are used to
model state transitions over time. 4. Deep Q
Recurrent Networks are suitable for tasks
with partial observability and delayed
rewards. 5. They can learn policies for tasks
like navigation, robotics, and video game
playing. 6. Experience replay is adapted to
handle sequences of experiences rather than
individual transitions. 7. Target networks
and other stability techniques used in DQN
are also applied in Deep Q Recurrent
Networks. 8. Hyperparameter tuning and
regularization are crucial for training stable
and effective models. 9. Deep Q Recurrent
Networks require careful consideration of
the trade-offs between memory capacity
and computational efficiency. 10. Despite
challenges, they offer a powerful
framework for learning policies in dynamic
and sequential environments.
