UNIT-2 ADL

Recent trends in DL architecture


● Self-supervised learning: This approach trains models on unlabeled data,
allowing them to learn useful representations from vast amounts of
information without needing human-annotated labels for everything. This
matters because labeling data is expensive and time-consuming.
● Neuroscience-inspired architectures: Researchers are increasingly
drawing inspiration from the human brain to develop more efficient and
powerful deep learning models. This includes exploring new network
structures and learning algorithms that mimic how the brain processes
information.
● Vision Transformers (ViT): This architecture has shown great promise in
computer vision tasks like image classification and object detection. Unlike
traditional CNNs (Convolutional Neural Networks), ViTs split an image into
patches and process all patches together with self-attention, which can lead
to better performance, especially when trained on large datasets.
● Hybrid model integration: Combining different types of AI models, such as
deep learning and symbolic AI (rule-based systems), is becoming more
common. This allows the strengths of each approach to be leveraged for
more robust and interpretable systems.
● Focus on explainability (XAI): As deep learning models become more
complex, there's a growing need to understand how they make decisions.
There's a lot of research on developing techniques to make these models
more interpretable, which is crucial for building trust and ensuring fairness.

Residual Network
Intro-
Deep Neural Networks are becoming deeper and more complex. Adding more layers
to a neural network can make it more powerful for image-related tasks, but beyond a
point it can also cause accuracy to drop. That's where Residual Networks come in.

Practitioners add many layers in order to extract important features from complex
images: the first layers may detect edges, while later layers may detect recognizable
shapes, like the tires of a car. But if we add more than about 30 layers to a plain
network, its performance suffers and accuracy drops. This is contrary to the
expectation that adding layers makes a neural network better. The cause is not
overfitting, which could be handled with dropout and regularization techniques; it is
mainly due to the well-known vanishing gradient problem.
Core idea:
● Traditional deep neural networks can suffer from vanishing or exploding
gradients during training. This makes it difficult for the network to learn
complex relationships, especially in very deep architectures.
● ResNets address this by introducing residual connections, also called skip
connections. These connections bypass some layers in the network and
directly add the input to the output of a later layer.
● The building block of a ResNet is the residual block, which typically consists
of convolutional layers, activation functions (like ReLU), and batch
normalization. The input to the block is added to the output of the
convolutional layers within the block.
The skip connection works as follows: a residual block has a 3 x 3 convolution layer
followed by a batch normalization layer and a ReLU activation function, then another
3 x 3 convolution layer and a batch normalization layer. The skip connection bypasses
both of these layers and adds the block's input directly before the final ReLU
activation. Such residual blocks are repeated to form a residual network.
Residual Network: In order to solve the problem of the vanishing/exploding
gradient, this architecture introduced the concept of Residual Blocks. In this
network, we use a technique called skip connections. A skip connection
connects the activations of a layer to later layers by skipping some layers in
between; this forms a residual block. ResNets are made by stacking these
residual blocks together.
The approach behind this network is that instead of the layers learning the
underlying mapping H(x) directly, we allow the network to fit the residual
mapping F(x) = H(x) - x, so that the block's output becomes F(x) + x.
The advantage of adding this type of skip connection is that if any layer hurts
the performance of the architecture, it can effectively be skipped (its residual
pushed towards zero by regularization). This allows training very deep neural
networks without the problems caused by the vanishing/exploding gradient.
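To make the residual block concrete, here is a minimal PyTorch-style sketch, assuming a block that keeps the number of channels fixed; the class and layer names are illustrative, not taken from any particular ResNet implementation.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: two 3x3 conv + BN layers with a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = x                          # saved for the skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                  # add the input before the final ReLU
        return self.relu(out)

# Stacking such blocks forms a (toy) residual network.
x = torch.randn(1, 64, 32, 32)
net = nn.Sequential(ResidualBlock(64), ResidualBlock(64))
y = net(x)                                    # same shape as the input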

What are Skip Connections?

Skip Connections (or Shortcut Connections), as the name suggests, skip some of the
layers in the neural network and feed the output of one layer as the input to later
layers.
Skip Connections were introduced to solve different problems in different
architectures. In the case of ResNets, skip connections solved the degradation
problem addressed earlier, whereas in the case of DenseNets, they ensured feature
reusability.

The primary difference between ResNets and DenseNets lies in how the skip
connection combines features: DenseNets concatenate the output feature maps of a
layer with those of later layers, whereas ResNets add them together. In other words,
DenseNets use concatenation for their skip connections, while ResNets use
summation.

1. ResNet: explained above.

2. DenseNet: The idea behind concatenation is to reuse features learned in earlier
layers in deeper layers as well. This concept is known as Feature Reusability. As a
result, DenseNets can learn mappings with fewer parameters than a traditional CNN,
since there is no need to re-learn redundant feature maps.
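The difference between the two skip-connection styles can be seen in a small sketch (the tensor shapes below are illustrative, not tied to any specific network):

import torch

x = torch.randn(1, 64, 32, 32)                # input feature map
f_x = torch.randn(1, 64, 32, 32)              # output of a block applied to x

resnet_style = x + f_x                        # summation: shape stays (1, 64, 32, 32)
densenet_style = torch.cat([x, f_x], dim=1)   # concatenation: channels grow to (1, 128, 32, 32)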

Image Denoising using Deep Learning

Image denoising is the process of removing noise from images. Deep learning has

revolutionized this field by offering powerful techniques that can achieve impressive

results. Here's a breakdown of how deep learning tackles image denoising:

The Challenge of Noise

Images can be corrupted by noise during acquisition, transmission, or compression.

This noise can come in various forms, like Gaussian noise (random variations in

pixel intensity) or salt-and-pepper noise (randomly occurring black or white pixels).

Noise makes images appear grainy, blurry, or distorted, reducing their quality and

usefulness.
Traditional Denoising Methods

Traditional image denoising methods often rely on filters or statistical techniques.

These methods can be effective for certain types of noise, but they may struggle to

preserve image details while removing noise completely.

Deep Learning for Denoising

Deep learning models, particularly convolutional neural networks (CNNs), have

emerged as powerful tools for image denoising. Here's how they work:

● Training on Noisy/Clean Image Pairs: Deep learning models are trained on
large datasets of paired noisy and clean images. The model learns to identify
the noise patterns in the noisy image and map it to the corresponding clean
image.
● Network Architectures: Common architectures used for denoising include
autoencoders and U-Nets (a minimal sketch follows this list). Autoencoders
learn to compress the image into a lower-dimensional representation and then
reconstruct it, effectively denoising it in the process. U-Nets combine features
from different network layers to capture both high-level and low-level details,
allowing for better noise removal while preserving image features.
● Advantages of Deep Learning: Deep learning models offer several
advantages over traditional methods:
○ Learning Complex Noise Patterns: Deep learning models can learn
complex noise patterns that might be difficult to capture with traditional
filters.
○ Preserving Image Details: Deep learning models can be trained to
remove noise while preserving important image details like edges and
textures.
○ Adaptability: These models can be adapted to different types of noise
by training on specific noise models.
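As a rough illustration of the noisy-to-clean training setup, here is a minimal PyTorch sketch of a small convolutional denoising autoencoder; the layer sizes, the Gaussian noise level, and the random tensors standing in for a real dataset are all illustrative assumptions.

import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Tiny convolutional autoencoder for image denoising (illustrative sizes)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAutoencoder()
clean = torch.rand(8, 1, 28, 28)                 # stand-in for a batch of clean images
noisy = clean + 0.3 * torch.randn_like(clean)    # corrupt with Gaussian noise
loss = nn.MSELoss()(model(noisy), clean)         # learn the mapping noisy -> clean
loss.backward()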

Here are some additional points to consider:

● Challenges: Training deep learning models for denoising requires large

datasets and significant computational resources. Additionally, interpreting

how these models remove noise can be challenging.

● Applications: Deep learning-based denoising has applications in various

fields, including:

○ Medical imaging: Denoising medical images can improve visualization

and diagnosis.

○ Astronomy: Denoising telescope images can reveal fainter objects

and improve clarity.

○ Microscopy: Denoising microscopic images can enhance the

visualization of biological structures.

Overall, deep learning has become a powerful tool for image denoising, offering

significant advancements over traditional methods. As research continues, we can

expect even more sophisticated models that can achieve even better denoising

performance.

SEMANTIC SEGMENTATION

Semantic segmentation is a deep learning technique that assigns a label or
category to every pixel in an image. This technique is employed to identify
groups of pixels representing distinct categories. For instance, in autonomous
vehicles, semantic segmentation is crucial for recognizing vehicles,
pedestrians, traffic signs, pavement, and other road features.

Semantic segmentation is a natural step in the progression from coarse to fine
inference. The origin could be located at classification, which consists of
making a prediction for a whole input. The next step is localization / detection,
which provides not only the classes but also additional information regarding
the spatial location of those classes. Finally, semantic segmentation achieves
fine-grained inference by making dense predictions, inferring labels for every
pixel, so that each pixel is labeled with the class of its enclosing object or
region.

What are the existing Semantic Segmentation approaches?

A general semantic segmentation architecture can be broadly thought of as an
encoder network followed by a decoder network (a minimal sketch follows the list):

● The encoder is usually a pre-trained classification network like VGG/ResNet.
● The task of the decoder is to semantically project the
discriminative features (lower resolution) learnt by the encoder
onto the pixel space (higher resolution) to get a dense
classification.
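To make the encoder-decoder idea concrete, here is a minimal PyTorch sketch of a toy segmentation network; it is not a published architecture, and the layer sizes, the class name TinySegNet, and the choice of 21 classes are illustrative assumptions.

import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Toy encoder-decoder for semantic segmentation (illustrative, not a real model)."""
    def __init__(self, num_classes=21):
        super().__init__()
        # Encoder: downsamples the image while learning discriminative features
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Decoder: projects features back onto the pixel space at full resolution
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2), nn.ReLU(),
            nn.Conv2d(32, num_classes, kernel_size=1),   # one score per class per pixel
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))             # (N, num_classes, H, W)

logits = TinySegNet()(torch.randn(1, 3, 224, 224))
pred = logits.argmax(dim=1)                              # dense per-pixel class labels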
Unlike classification where the end result of the very deep network is
the only important thing, semantic segmentation not only requires
discrimination at pixel level but also a mechanism to project the
discriminative features learnt at different stages of the encoder onto
the pixel space. Different approaches employ different mechanisms as
a part of the decoding mechanism. Let’s explore the 3 main
approaches:

1 — Region-Based Semantic Segmentation

The region-based methods generally follow the “segmentation using


recognition” pipeline, which first extracts free-form regions from an
image and describes them, followed by region-based classification. At
test time, the region-based predictions are transformed to pixel
predictions, usually by labeling a pixel according to the highest scoring
region that contains it.

[Figure: R-CNN architecture]
R-CNN (Regions with CNN features) is one representative work among the
region-based methods. It performs semantic segmentation based on the object
detection results. To be specific, R-CNN first utilizes selective search to extract
a large quantity of object proposals and then computes CNN features for each
of them. Finally, it classifies each region using class-specific linear SVMs.
Compared with traditional CNN structures, which are mainly intended for
image classification, R-CNN can address more complicated tasks such as
object detection and image segmentation, and it has even become an
important basis for both fields. Moreover, R-CNN can be built on top of any
CNN benchmark structure, such as AlexNet, VGG, GoogLeNet, or ResNet.

For the image segmentation task, R-CNN extracted 2 types of


features for each region: full region feature and foreground feature,
and found that it could lead to better performance when concatenating
them together as the region feature. R-CNN achieved significant
performance improvements due to using the highly discriminative
CNN features. However, it also suffers from a couple of drawbacks for
the segmentation task:

● The feature is not compatible with the segmentation task.


● The feature does not contain enough spatial information for
precise boundary generation.
● Generating segment-based proposals takes time and would
greatly affect the final performance.
Due to these bottlenecks, more recent work has been proposed to address
these problems, including SDS, Hypercolumns, and Mask R-CNN.

2 — Fully Convolutional Network-Based Semantic Segmentation

The original Fully Convolutional Network (FCN) learns a mapping from pixels to
pixels, without extracting region proposals. The FCN pipeline is an extension of
the classical CNN. The main idea is to make the classical CNN take
arbitrary-sized images as input. The restriction of CNNs to accept and produce
labels only for inputs of a specific size comes from the fully-connected layers,
which have a fixed size. In contrast, FCNs only have convolutional and pooling
layers, which gives them the ability to make predictions on arbitrary-sized
inputs, as the sketch below illustrates.
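A small sketch of this point, assuming a hypothetical feature extractor that outputs 512 channels and an illustrative 21-class problem: replacing a fully-connected classifier with a 1 x 1 convolution lets the same head run on feature maps of any spatial size.

import torch
import torch.nn as nn

# A 1x1 convolution plays the role of a fully-connected classifier,
# so the network no longer requires a fixed input size.
head = nn.Conv2d(512, 21, kernel_size=1)

features_a = torch.randn(1, 512, 7, 7)       # features from a 224x224 input
features_b = torch.randn(1, 512, 12, 15)     # features from a larger, non-square input

print(head(features_a).shape)                # torch.Size([1, 21, 7, 7])
print(head(features_b).shape)                # torch.Size([1, 21, 12, 15])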

[Figure: FCN architecture]
One issue with this specific FCN is that, by propagating through several
alternating convolutional and pooling layers, the resolution of the output
feature maps is downsampled. Therefore, the direct predictions of FCN are
typically in low resolution, resulting in relatively fuzzy object boundaries. A
variety of more advanced FCN-based approaches have been proposed to
address this issue, including SegNet, DeepLab-CRF, and Dilated Convolutions.

3 — Weakly Supervised Semantic Segmentation

Most of the relevant methods in semantic segmentation rely on a large number
of images with pixel-wise segmentation masks. However, manually annotating
these masks is quite time-consuming, frustrating, and commercially expensive.
Therefore, some weakly supervised methods have recently been proposed that
perform semantic segmentation using only annotated bounding boxes.

[Figure: BoxSup training]

For example, BoxSup employed the bounding box annotations as supervision
to train the network and iteratively improve the estimated masks for semantic
segmentation. Simple Does It treated the weak supervision limitation as an
issue of input label noise and explored recursive training as a de-noising
strategy. Pixel-level Labeling interpreted the segmentation task within the
multiple-instance learning framework and added an extra layer to constrain the
model to assign more weight to important pixels for image-level classification.

Object Detection

Object detection with deep learning is a powerful technique for identifying and
locating objects within images and videos. It's a crucial component in many computer
vision applications like self-driving cars, facial recognition, and medical image
analysis.

● Two-Stage Detectors:
○ This approach involves two stages: a region proposal stage and a
classification stage.
○ In the first stage, the model proposes candidate regions where objects
might be present.
○ Then, in the second stage, the model classifies these proposed regions
and refines the bounding boxes around the objects.
○ Examples of two-stage detectors include R-CNN (Regions with CNN
features) and its variants like Fast R-CNN and Faster R-CNN; a usage
sketch with Faster R-CNN follows this list.
● Single-Stage Detectors:
○ This approach is faster and simpler than two-stage detectors.
○ The model directly predicts bounding boxes and class labels for objects
in a single step.
○ Single-stage detectors are often preferred for real-time applications
due to their speed.
○ Popular single-stage detectors include YOLO (You Only Look Once)
and SSD (Single Shot MultiBox Detector).
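As a rough usage sketch, the following shows how a pre-trained two-stage detector can be run with torchvision's Faster R-CNN; the exact weight-selection argument can differ between torchvision versions, and the random tensor stands in for a real image.

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Load a Faster R-CNN detector pre-trained on COCO (a two-stage detector).
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)              # dummy RGB image with values in [0, 1]
with torch.no_grad():
    predictions = model([image])             # the model expects a list of images

# Each prediction is a dict with bounding boxes, class labels, and confidence scores.
boxes = predictions[0]["boxes"]
labels = predictions[0]["labels"]
scores = predictions[0]["scores"]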

Benefits of Deep Learning for Object Detection

-High Accuracy

-Real-Time Capability

-Adaptability

Applications of Object Detection: self-driving cars, facial recognition, object
recognition and tracking, medical imaging.

Neural Attention Models

Attention mechanisms have become a fundamental concept in deep learning,

particularly for tasks involving sequences like natural language processing (NLP)

and computer vision. Here's a breakdown of what they are and how they

revolutionized deep learning models:

Focus Like a Human: Attention in Deep Learning

● Unlike traditional deep learning models that process all parts of the input data

equally, attention models allow the network to focus on specific, relevant parts

of the input.
● This is similar to how humans focus their attention when reading a sentence

or looking at a scene. We don't pay equal attention to every word or detail, but

rather prioritize the information that's most important for understanding the

context.

How Attention Works

There are different ways to implement attention mechanisms, but the core idea
involves three steps (a minimal code sketch follows the list):

1. Calculating Scores: The model assigns a score to each element in the input

sequence. This score reflects how relevant that element is to the current

processing step.

2. Softmax Distribution: A softmax function is used to convert these scores into a

probability distribution. This distribution indicates the weight or importance of

each element.

3. Context Vector Creation: A context vector is created by taking a weighted sum

of the elements in the input sequence, using the attention weights calculated

in step 2. This context vector essentially captures the most relevant

information from the input.
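The three steps map naturally onto (scaled) dot-product attention. Below is a minimal sketch, assuming query, key, and value tensors of matching dimension; the function name and toy shapes are illustrative.

import torch
import torch.nn.functional as F

def dot_product_attention(query, keys, values):
    """Minimal scaled dot-product attention following the three steps above."""
    d_k = keys.size(-1)
    scores = query @ keys.transpose(-2, -1) / d_k ** 0.5   # 1. relevance score per element
    weights = F.softmax(scores, dim=-1)                     # 2. softmax -> probability distribution
    context = weights @ values                              # 3. weighted sum = context vector
    return context, weights

# Example: one query attending over a sequence of 5 elements of dimension 8
query = torch.randn(1, 1, 8)
keys = values = torch.randn(1, 5, 8)
context, weights = dot_product_attention(query, keys, values)
print(weights.sum(dim=-1))   # each attention distribution sums to 1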

Benefits of Attention Models

Attention mechanisms have significantly improved the performance of deep learning

models in various tasks:

● Machine Translation: By focusing on relevant words in the source sentence,

attention models can generate more accurate and nuanced translations.

● Text Summarization: Attention helps identify key points in a document, leading

to more concise and informative summaries.

● Speech Recognition: Attention allows models to focus on the speaker's voice

and ignore background noise, improving recognition accuracy.


● Image Captioning: Attention models can attend to specific objects and regions

in an image, leading to more accurate and descriptive captions.

Beyond Sequences: Attention's Growing Impact

While initially developed for sequential data, attention mechanisms are being

explored for other tasks as well, such as:

● Visual Question Answering: Attention can help models focus on relevant parts

of an image to answer questions about it.

● Time Series Forecasting: Attention can be used to identify important patterns

in past data points that might influence future predictions.

Overall, attention models have become a powerful tool in deep learning, allowing

models to focus on the most critical information and achieve superior performance

on various tasks. As research continues, we can expect even broader applications of

attention mechanisms in the future of deep learning.

Neural Machine Translation

Neural machine translation (NMT) is a cutting-edge approach to machine


translation that leverages the power of deep learning for superior translation
quality compared to traditional methods.
Working:

From Rule-Based to Deep Learning

● Traditional machine translation relied on rule-based systems or statistical

methods. These approaches involved defining complex rules or using

statistical probabilities to translate text.

● NMT takes a different approach. It utilizes deep neural networks, specifically

designed to learn complex relationships between languages.

The Encoder-Decoder Architecture

NMT models typically use an encoder-decoder architecture (a minimal sketch follows the list):

● Encoder: This part takes the source language sentence as input and

processes it through a series of neural network layers. The encoder

essentially captures the meaning and context of the source sentence.

● Decoder: The decoder takes the encoded representation from the encoder

and generates the target language sentence word by word. During this

process, the decoder might attend back to the source sentence encoded by

the encoder to ensure accuracy and fluency.
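For a concrete picture of the encoder-decoder split, here is a toy PyTorch sequence-to-sequence sketch without attention; the vocabulary sizes, hidden size, GRU layers, and random token IDs are all illustrative assumptions rather than a real NMT system.

import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    """Toy encoder-decoder translator (no attention, illustrative sizes)."""
    def __init__(self, src_vocab, tgt_vocab, hidden=256):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, hidden)
        self.tgt_embed = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, state = self.encoder(self.src_embed(src_ids))            # encode the source sentence
        dec_out, _ = self.decoder(self.tgt_embed(tgt_ids), state)   # decode conditioned on it
        return self.out(dec_out)                                    # scores over the target vocabulary

model = TinySeq2Seq(src_vocab=8000, tgt_vocab=8000)
src = torch.randint(0, 8000, (2, 10))      # batch of 2 source sentences, 10 tokens each
tgt = torch.randint(0, 8000, (2, 12))      # shifted target sentences (teacher forcing)
logits = model(src, tgt)                   # shape (2, 12, 8000)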

Learning from Massive Datasets

NMT models are trained on massive datasets of text that have already been

translated by humans. These datasets serve as a reference for the model to learn

how to map sentences in one language to their corresponding translations in

another.

Advantages of Neural Machine Translation

NMT offers several advantages over traditional methods:


● Higher Quality Translations: NMT models can produce more natural-sounding

and accurate translations, especially for complex sentences or phrases.

● Context-Aware Translation: NMT takes context into account when translating,

leading to more nuanced and accurate translations.

● Ability to Learn New Languages: NMT models can be relatively easily adapted

to translate between new languages by training them on corresponding

datasets.

Challenges and Considerations

While powerful, NMT also has some limitations:

● Data Dependency: NMT models heavily rely on large amounts of training

data, which might not be available for all language pairs.

● Explainability: Understanding how NMT models arrive at a specific translation

can be challenging compared to rule-based systems.

● Computational Cost: Training NMT models requires significant computational

resources.

Applications of Neural Machine Translation

NMT is finding its way into various real-world applications:

● Real-time translation: NMT powers features like Google Translate, enabling

communication across language barriers.

● Document translation: Businesses can use NMT to translate documents,

emails, or websites to reach a wider audience.

● Multilingual customer service: NMT can be used to provide customer support

in multiple languages, enhancing customer experience.

Overall, neural machine translation has revolutionized machine translation, offering a

more accurate and natural way to bridge the language gap. As NMT models
continue to develop and training data becomes more available, we can expect even

more impressive translation capabilities in the future.

BASELINE METHODS

In deep learning, baseline models are fundamental for establishing a benchmark to

evaluate the performance of more complex models. They serve as a reference point

to gauge the effectiveness of new architectures or training approaches.

Here's a deeper dive into what baseline models are and why they're important:

Why Baseline Models Matter

Imagine you're developing a new deep learning model for image classification. You

train the model and achieve a certain level of accuracy. But is this accuracy good?

Without a baseline for comparison, it's difficult to assess how well your model is

performing. Here's where baseline models come in:

● Establishing a Benchmark: By training and evaluating a simple baseline

model on the same task and data, you create a reference point. You can then

compare the performance of your new model to the baseline. If your model

significantly outperforms the baseline, it's an indication that your approach is

effective.

● Understanding Data Difficulty: Baseline models can also help you

understand the inherent difficulty of the dataset you're working with. If a

simple model achieves high accuracy, it suggests that the task itself might be

relatively easy. Conversely, a low baseline accuracy indicates a challenging

task where more sophisticated models might be necessary.

● Managing Expectations: Baseline models help set realistic expectations for

the performance of your new model. By understanding the limitations of

simpler approaches, you can focus your efforts on developing models that can

achieve significant improvements over the baseline.


Common Types of Baseline Models in Deep Learning

There are several approaches to creating a baseline model, depending on the
specific task and data (a short example follows the list):

● Random Guessing: This is the simplest baseline, where the model randomly

predicts a class label or output for each input. The accuracy of random

guessing gives you a lower bound for performance on a classification task.

● Majority Class Classifier: In classification problems, this baseline always

predicts the most frequent class in the training data. This is a good starting

point to see how well a model can learn to differentiate between classes

compared to simply picking the most common one.

● Simple Statistical Models: Linear regression for continuous target variables

or logistic regression for binary classification can be used as baselines. These

models capture basic linear relationships in the data and provide a benchmark

for more complex deep learning models.

● Pre-trained Models on Smaller Datasets: Sometimes, pre-trained models

on smaller datasets related to your task can be a good baseline. These

models can capture some underlying patterns in the data and serve as a

reference for more complex architectures trained on the full dataset.
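For instance, scikit-learn makes it easy to compare a majority-class baseline with a simple statistical model; the Iris dataset here is only a stand-in for whatever data the deep learning model would be trained on.

from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Majority-class baseline: always predicts the most frequent class in the training data.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("majority-class accuracy:", baseline.score(X_test, y_test))

# Simple statistical baseline: logistic regression.
simple = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("logistic regression accuracy:", simple.score(X_test, y_test))

# A new deep learning model should be judged against numbers like these.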

Choosing the Right Baseline Model

The best type of baseline model depends on the specific task and the complexity of

the data. Here are some general guidelines:

● For simple tasks with well-defined patterns, random guessing or

majority class classification might be sufficient.

● For more complex tasks with intricate relationships, consider using

simple statistical models or pre-trained models as baselines.


Remember, the goal of a baseline model is not to achieve optimal

performance, but to provide a clear starting point for evaluating the

effectiveness of your deep learning models. By incorporating baselines into your

development workflow, you can gain valuable insights into your data, manage

expectations, and ultimately build more powerful and efficient deep learning models.

DATA REQUIREMENTS
Data is the fuel that drives deep learning models. The amount of data you need

depends on several factors, but it's generally true that deep learning models require

significantly more data compared to traditional machine learning algorithms. Here's

a breakdown of why data is so crucial and how much you might need:

Why Deep Learning Needs a Lot of Data

Deep learning models have complex architectures with many parameters. These

parameters act like learnable filters that extract patterns from the data. The more

data you have, the better the model can learn these patterns and generalize well to

unseen examples.

● High Capacity for Complex Patterns: Deep learning models can learn very

complex patterns from data. However, this also means they are prone to

overfitting if they don't have enough data to learn from. Overfitting happens

when the model memorizes the training data too well and fails to perform well

on new data.

● Statistical Learning: Deep learning models rely on statistical learning

techniques. They learn by identifying patterns that appear frequently in the

data. With more data, these patterns become more statistically robust, leading

to better model performance.


How Much Data is Enough?

There's no one-size-fits-all answer to how much data you need. Here are some

factors to consider:

● Model Complexity: More complex models with many parameters typically

require more data to avoid overfitting.

● Data Quality: High-quality, well-labeled data is essential. Noisy or poorly

labeled data can hinder learning, even with a large dataset.

● Task Difficulty: More complex tasks like image recognition with fine-grained

details might require more data than simpler tasks like sentiment analysis.

Here are some general guidelines:

● Small Datasets (1000s-10,000s of data points): This might be sufficient for

very simple tasks or as a starting point for transfer learning (using pre-trained

models on a different task).

● Medium Datasets (100,000s-1,000,000s of data points): This is a common

range for many deep learning tasks, especially with careful model design and

data augmentation techniques (artificially creating more data from existing

data).

● Large Datasets (Millions-Billions of data points): These are often used for

very complex tasks like image recognition with millions of categories or large

language models trained on massive amounts of text data.

Mitigating Data Scarcity

Several techniques can help address data scarcity (a short sketch follows the list):

● Transfer Learning: Leverage pre-trained models on a related task with a

large dataset and fine-tune them for your specific task with less data.
● Data Augmentation: Artificially create more data from your existing dataset

through techniques like cropping, flipping, or adding noise. This can help

improve the model's ability to generalize to unseen variations.

● Active Learning: This approach focuses on acquiring data points that are

most informative for the model's learning process.
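As a brief sketch of the first two ideas with torchvision (the specific transforms, the frozen ResNet-18 backbone, and the 10-class head are illustrative choices, and the weight-loading argument may vary between torchvision versions):

import torch.nn as nn
from torchvision import models, transforms

# Data augmentation: each pass over the dataset sees a slightly different image.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),        # random cropping
    transforms.RandomHorizontalFlip(),        # random flipping
    transforms.ColorJitter(brightness=0.2),   # mild photometric perturbation
    transforms.ToTensor(),                    # applied to PIL images from a dataset
])

# Transfer learning: reuse a pretrained backbone and replace only the final layer.
backbone = models.resnet18(weights="DEFAULT")
for p in backbone.parameters():
    p.requires_grad = False                   # freeze the pretrained weights
backbone.fc = nn.Linear(backbone.fc.in_features, 10)  # new head for a 10-class task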

Conclusion

Data is a critical element for deep learning success. While the amount of data

required can vary greatly depending on the specific task and model, it's safe to say

that deep learning models are data-hungry. By understanding the role of data and

employing techniques to mitigate scarcity, you can effectively train deep learning

models and achieve good performance.

HYPERPARAMETER TUNING

Hyperparameter tuning is the process of finding the optimal configuration for a
model's hyperparameters. These hyperparameters are settings that control the
learning process of the model (for example, the learning rate, batch size, or
number of layers), but unlike regular parameters, they are not learned from the
data itself.
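One common strategy is random search over a small space of candidate settings. The sketch below is illustrative: the search space, the number of trials, and the placeholder train_and_validate function are assumptions standing in for a real training loop.

import random

search_space = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [32, 64, 128],
}

def train_and_validate(learning_rate, batch_size):
    # Placeholder: train a model with these settings and return validation accuracy.
    return random.random()

best_config, best_score = None, float("-inf")
for _ in range(5):                                    # try 5 random configurations
    config = {k: random.choice(v) for k, v in search_space.items()}
    score = train_and_validate(**config)
    if score > best_score:
        best_config, best_score = config, score

print("best configuration:", best_config)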
