
DEEP LEARNING

UNIT -I INTRODUCTION: Computer vision overview, Historical context,
Biological Neuron, Idea of computational units, McCulloch–Pitts unit and
Thresholding logic, Linear Perceptron, Perceptron Learning Algorithm, Linear
separability, Convergence theorem for Perceptron Learning Algorithm, Image
classification, Basics of Optimization
Computer vision means the extraction of information from images, videos, and other visual
content. Sometimes computer vision tries to mimic human vision. It is a subset of Artificial
Intelligence that collects information from digital images or videos and analyzes it to determine
the attributes of the scene.

The entire process involves acquiring, screening, analyzing, identifying, and extracting
information from images. This extensive processing helps computers understand visual content
and act on it accordingly. Computer vision projects translate digital visual content into precise
descriptions to gather multi-dimensional data. This data is then turned into a computer-readable
form to aid the decision-making process. The main objective of this branch of Artificial
Intelligence is to teach machines to collect information from images.
Applications of Computer Vision
 Medical Imaging: Computer vision helps in MRI reconstruction, automated pathology
diagnosis, computer-aided surgery, and more.
 AR/VR: Object occlusion, outside-in tracking, and inside-out tracking for virtual and
augmented reality.
 Smartphones: Photo filters (including animated filters on social media), QR code scanners,
panorama construction, computational photography, face detectors, and image-recognition
tools such as Google Lens and Night Sight are all computer vision applications.
 Internet: Image search, mapping, photo captioning, aerial imaging for maps, video
categorization, and more.
Historical context: The history of deep learning dates back to 1943, when Warren McCulloch
and Walter Pitts created a computational model based on the neural networks of the human brain.
McCulloch and Pitts used a combination of mathematics and algorithms they called threshold
logic to mimic the thought process. Since then, deep learning has evolved steadily, with two
significant breaks in its development. The development of the basics of a continuous
backpropagation model is credited to Henry J. Kelley in 1960. Stuart Dreyfus came up with a
simpler version based only on the chain rule in 1962. Thus the concept of backpropagation
existed in the early 1960s, but it did not become practically useful until the mid-1980s.
Biological Neuron: A biological neural network is a network of neurons connected together by
axons and dendrites; the connections between neurons are made at synapses.
In recent years, “deep learning” AI models have often been touted as “working like the brain,” in
that they are composed of artificial neurons mimicking those of biological brains. From the
perspective of a neuroscientist, however, the differences between deep learning neurons and
biological neurons are numerous and distinct. Below, we describe a few key characteristics of
biological neurons and how they are simplified to obtain deep learning neurons. We then
speculate on how these differences impose limits on deep learning networks, and how movement
toward more realistic models of biological neurons might advance AI as we currently know it.
Biological Neurons: Typical biological neurons are individual cells, each composed of the main
body of the cell along with many tendrils that extend from that body. The body, or soma, houses
the machinery for maintaining basic cell functions and energy processing (e.g., the DNA-
containing nucleus, and organelles for building proteins and processing sugar and oxygen). There
are two types of tendrils: dendrites, which receive information from other neurons and bring it to
the cell body, and axons, which send information from the cell body to other neurons.
Idea of computational units: Consider the computational demands of deep learning in five
prominent application areas: image classification, object detection, question answering, named
entity recognition, and machine translation. Progress in all five is strongly reliant on increases in
computing power. Extrapolating forward, this reliance suggests that progress along current lines
is rapidly becoming economically, technically, and environmentally unsustainable. Continued
progress will require dramatically more computationally efficient methods, which will either
have to come from changes to deep learning itself or from moving to other machine learning
methods. Even when the first neural networks were created, performance was limited by the
available computation. In the past decade, these constraints have relaxed thanks to specialized
hardware (e.g. GPUs) and a willingness to spend more on processors. However, because the
computational needs of deep learning scale so rapidly, they are quickly becoming burdensome
again.
In the future, the computational limits of deep learning will become constraining for a range of
applications, making the achievement of important benchmark milestones impossible if current
trajectories hold. These limits are likely to force deep learning toward less computationally
intensive methods of improvement, and to push machine learning toward techniques that are
more computationally efficient than deep learning.
https://www.youtube.com/watch?v=FgOOvDQbJVY
McCulloch–Pitts unit and Thresholding logic: The McCulloch–Pitts neuron is the simplest
computational unit: it takes binary inputs, sums them, and outputs 1 if the sum reaches a fixed
threshold and 0 otherwise (thresholding logic). By choosing the threshold appropriately, a single
unit can implement Boolean functions such as AND and OR.
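As a minimal illustrative sketch (not from the syllabus material), a McCulloch–Pitts unit can be
written as a thresholded sum of binary inputs; the choice of threshold turns the same unit into an
AND or an OR gate:
def mcculloch_pitts(inputs, threshold):
    """Fire (return 1) if the sum of the binary inputs reaches the threshold."""
    return 1 if sum(inputs) >= threshold else 0

# AND gate: both inputs must be on (threshold = 2).
print(mcculloch_pitts([1, 1], threshold=2))  # 1
print(mcculloch_pitts([1, 0], threshold=2))  # 0

# OR gate: any single input on is enough (threshold = 1).
print(mcculloch_pitts([0, 1], threshold=1))  # 1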
Linear Perceptron:
The Perceptron
The original Perceptron was designed to take a number of binary inputs and produce one binary
output (0 or 1).
The idea was to use different weights to represent the importance of each input, and to require
that the weighted sum of the inputs be greater than a threshold value before making a decision
like yes or no (true or false, 0 or 1).

Perceptron Learning Algorithm: The perceptron learning algorithm starts with the weights (and
bias) set to zero or to small random values. Training examples are presented one at a time:
whenever an example is misclassified, the weights are updated towards the correct label
(w ← w + η·y·x and b ← b + η·y, where η is the learning rate and y ∈ {−1, +1}); correctly
classified examples leave the weights unchanged. Passes over the data are repeated until no
mistakes remain.
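A minimal sketch of this learning rule on a small, made-up, linearly separable dataset (not the
lecture's own code):
import numpy as np

# Toy linearly separable data: label +1 if x1 + x2 > 0, else -1.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 1.0          # learning rate

for epoch in range(10):
    errors = 0
    for xi, yi in zip(X, y):
        # Predict with the current linear threshold unit.
        pred = 1 if np.dot(w, xi) + b > 0 else -1
        if pred != yi:
            # Misclassified: nudge the weights toward the correct side.
            w += lr * yi * xi
            b += lr * yi
            errors += 1
    if errors == 0:   # converged: every point is classified correctly
        break

print("weights:", w, "bias:", b)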


Linear separability: Linear separability is a concept in machine learning that refers to the
ability to separate data points in binary classification problems using a linear decision boundary.
If the data points can be separated using a line, linear function, or flat hyperplane, they are
considered linearly separable.
Linear separability implies the existence of a hyperplane separating the two classes. To see why
this can fail, consider a dataset with two features x1 and x2 in which the points (−1, −1), (1, 1),
(−3, −3), (4, 4) belong to one class and (−1, 1), (1, −1), (−5, 2), (4, −8) belong to the other.
import matplotlib.pyplot as plt

c1x1 = [-1, 1, -3, 4]
c1x2 = [-1, 1, -3, 4]
plt.scatter(c1x1, c1x2, label='class1')   # class1 points lie on the line x2 = x1

c2x1 = [-1, 1, -5, 4]
c2x2 = [1, -1, 2, -8]
plt.scatter(c2x1, c2x2, label='class2')   # class2 points fall on both sides of that line

plt.xlabel('x1')
plt.ylabel('x2')
plt.legend()
plt.show()
This dataset is not linearly separable, because there is no linear function that can separate class1
from class2. Now consider a 1-dimensional representation z in terms of x1 and x2 such that the
dataset becomes linearly separable in terms of z. Defining z = x1*x2, the data can be represented
in one dimension and becomes linearly separable. Under this mapping, the class1 points
(−1, −1), (1, 1), (−3, −3), (4, 4) become 1, 1, 9, 16, and the class2 points (−1, 1), (1, −1), (−5, 2),
(4, −8) become −1, −1, −10, −32.
z1 = [1, 1, 9, 16]        # z values for class1 (all positive)
y1 = [0, 0, 0, 0]
plt.scatter(z1, y1, label='class1')

z2 = [-1, -1, -10, -32]   # z values for class2 (all negative)
y2 = [0, 0, 0, 0]
plt.scatter(z2, y2, label='class2')

plt.scatter([0], [0], label='hyperplane')   # the point z = 0 acts as the separating "hyperplane"
plt.xlabel('z = x1*x2')
plt.legend()
plt.show()

As you can see in the above plot, z = 0 now separates class1 from class2. Data that are not
linearly separable in their original form can become linearly separable after a non-linear
transformation (z = x1*x2 in this example, which represents the data in one dimension).

Convergence theorem for Perceptron Learning Algorithm: The perceptron convergence
theorem states that if the training data are linearly separable, the perceptron learning algorithm is
guaranteed to find a separating hyperplane after a finite number of weight updates. More
precisely, if every input lies within a ball of radius R and the two classes can be separated with
margin γ, then the number of updates is at most (R/γ)², regardless of the order in which the
examples are presented. If the data are not linearly separable, the algorithm does not converge.
Image Classification: The task of identifying what an image represents is called image
classification. An image classification model is trained to recognize various classes of images.
For example, you may train a model to recognize photos representing three different types of
animals: rabbits, hamsters, and dogs.
https://www.youtube.com/watch?v=iYmMyKE6FN4
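As a rough sketch only (hypothetical 32x32 RGB images of the three classes above, with made-up
variable names such as train_images; not tied to the linked video), such a classifier could be set up
in Keras as follows:
from keras.models import Sequential
from keras.layers import Flatten, Dense

# Hypothetical setup: 32x32 RGB photos of three classes (rabbit, hamster, dog).
model = Sequential()
model.add(Flatten(input_shape=(32, 32, 3)))   # turn each image into a flat feature vector
model.add(Dense(128, activation='relu'))      # hidden layer
model.add(Dense(3, activation='softmax'))     # one probability per class
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# model.fit(train_images, train_labels, epochs=10)   # assumes labelled training images exist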
Basics of Optimization: https://www.youtube.com/watch?v=e275iFzTn84
UNIT -II
NEURAL NETWORKS-I: Feedforward Networks: Architecture design,
Multilayer Perceptron, Gradient Descent, Backpropagation, regularization,
autoencoders. Deep Neural Networks: Difficulty of training deep neural networks,
Greedy layer wise training-Better
Feedforward Networks: A feed-forward neural network is an artificial neural network in which
the connections between nodes do not form cycles. A recurrent neural network, in which some
routes are cycled, is its polar opposite. The feed-forward model is the most basic type of neural
network because the input is only processed in one direction: the data always flows forward and
never backwards.

The neural network advanced from the perceptron, a prominent machine learning algorithm.
Frank Rosenblatt, a psychologist, invented the perceptron in the 1950s and 1960s, based on
earlier work by Warren McCulloch and Walter Pitts.
Neural Network Architecture and Operation:
A weight is assigned to each input to an artificial neuron. First, the inputs are multiplied by their
weights, and then a bias is added to the result; this is called the weighted sum. The weighted sum
is then passed through an activation function, which is typically non-linear.
The first layer is the input layer; although it is drawn as a set of neurons, it simply holds the data
that is sent into the neural network. The output layer is the final layer. The dataset and the type of
problem determine the number of neurons in the first and final layers. The number of hidden
layers, and the number of neurons in each, is usually determined by trial and error.

The first neuron in the first hidden layer is connected to all of the inputs from the previous layer.
The second neuron in the first hidden layer is likewise connected to all of the preceding layer's
inputs, and so on for every neuron in the first hidden layer. The outputs of the first hidden layer
are then treated as inputs to the neurons in the second hidden layer, and each of these neurons is
connected to all of the preceding neurons.
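A minimal sketch (with made-up numbers) of the operation described above: each input is
multiplied by its weight, a bias is added, and the weighted sum is passed through a non-linear
activation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])      # inputs coming from the previous layer
w = np.array([0.1, 0.4, -0.3])      # one weight per input
b = 0.2                             # bias

weighted_sum = np.dot(w, x) + b     # multiply inputs by weights, add the bias
output = sigmoid(weighted_sum)      # pass the weighted sum through a non-linear activation
print(output)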

What is a Feed-Forward Neural Network and how does it work?

If the weighted sum of the inputs is greater than a predetermined threshold, which is normally set
at zero, the output value is usually 1, and if the sum is less than the threshold, the output value is
usually -1. The single-layer perceptron is a popular feed-forward neural network model that is
frequently used for classification, and it can be trained, which gives it basic machine learning
capability.

The neural network can compare the outputs of its nodes with the desired values using a rule
known as the delta rule, allowing the network to alter its weights through training to produce
more accurate output values. This training procedure is a form of gradient descent. The technique
for updating weights in multi-layered perceptrons is virtually the same; however, the process is
referred to as back-propagation. In such networks, the output values produced by the final layer
are used to adjust the weights of each hidden layer inside the network.

Multi-layer Perceptron
A multi-layer perceptron is also known as an MLP. It consists of fully connected dense layers,
which transform any input dimension to the desired dimension. A multi-layer perceptron is a
neural network that has multiple layers; to create a neural network, we combine neurons so that
the outputs of some neurons are the inputs of other neurons.
A multilayer perceptron is a fully connected class of feed-forward neural network. It consists of
three types of layers: the input layer, the output layer, and the hidden layers. The input layer
receives the input signal to be processed. The required task, such as prediction or classification,
is performed by the output layer. An arbitrary number of hidden layers placed between the input
and output layers are the true computational engine of the MLP. As in any feed-forward network,
the data in an MLP flows in the forward direction from the input layer to the output layer. The
neurons in the MLP are trained with the backpropagation learning algorithm. MLPs can
approximate any continuous function and can solve problems that are not linearly separable.
The major use cases of MLPs are pattern classification, recognition, prediction, and
approximation.

A multilayer perceptron loosely reflects the organization of the human brain. An MLP is a
feed-forward ANN with multiple hidden layers between the input and output. The number of
hidden layers depends on the data mining task. Every neuron in a hidden layer is connected to
the neurons of the next layer. The connections between the neurons carry weights, whose values
are updated during the learning phase. The learning phase is repeated until the error falls below
a threshold level. The input layer holds the values of the features. The output layer predicts the
classification based on the information passed on by the input layer. The classified output is
compared with the observed one to calculate the error. According to the error, the network
weights are updated from the output layer toward the input layer through the intermediate
layers. The transmitted information is determined by the combination of the connecting weights,
the node values, and the activation function.
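As a rough sketch only (hypothetical feature count and class count, not tied to any particular
dataset), such an MLP can be written in Keras as follows:
from keras.models import Sequential
from keras.layers import Dense

# Hypothetical MLP: 20 input features, two hidden layers, 4 output classes.
mlp = Sequential()
mlp.add(Dense(64, input_dim=20, activation='relu'))   # hidden layer 1
mlp.add(Dense(32, activation='relu'))                 # hidden layer 2
mlp.add(Dense(4, activation='softmax'))               # output layer: one unit per class
mlp.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# mlp.fit(X_train, y_train, epochs=20)                # training updates the weights by backpropagation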
Gradient Descent: Gradient descent is an optimization algorithm commonly used to train
machine learning models and neural networks. Training data helps these models learn over
time, and the cost function within gradient descent acts as a barometer, gauging accuracy with
each iteration of parameter updates. The learning happens during backpropagation while
training the neural-network-based model: gradient descent is used to optimize the weights and
biases based on the cost function, and the cost function evaluates the difference between the
actual and predicted outputs. A gradient is simply a derivative that describes how the output of
a function changes with a small variation in its inputs. Gradient descent (GD) is a widely used
optimization algorithm in deep learning that minimizes the cost function of a neural network
model during training. It works by iteratively adjusting the weights or parameters of the model
in the direction of the negative gradient of the cost function until the minimum of the cost
function is reached.
Gradient Descent Procedure

The procedure starts with initial values for the coefficient or coefficients of the function, and the
cost of those coefficients is evaluated using the cost function.

At each iteration, the derivative (slope) of the cost function at the current point is computed. The
slope indicates the direction in which to move the coefficient values to obtain a lower cost on the
next iteration. A learning-rate parameter (alpha), also known as the step size, controls how much
the coefficients change on each update. The procedure repeats until the cost of the coefficients is
as close to 0.0 as possible.
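As a minimal sketch of this procedure (a toy one-coefficient least-squares problem with made-up
data, not taken from the source):
import numpy as np

# Toy data generated from y = 3x, so the ideal coefficient is 3.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0        # initial value of the coefficient
alpha = 0.01   # learning rate (step size)

for step in range(200):
    pred = w * x
    cost = np.mean((pred - y) ** 2)       # evaluate the cost of the current coefficient
    grad = np.mean(2 * (pred - y) * x)    # derivative (slope) of the cost with respect to w
    w -= alpha * grad                     # move against the gradient

print(w, cost)   # w approaches 3 and the cost approaches 0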

Gradient descent variants differ in the number of training examples used to calculate the error
before the model is updated. When applying gradient descent, data scientists choose between
different configurations, each with its own trade-offs in accuracy and efficiency.

There are two main types of gradient descent configuration: batch and stochastic. Each type has
its pros and cons, and the data scientist must understand the differences to select the best
approach for the problem at hand.

Backpropagation:
Backpropagation computes the gradient of a loss function with respect to the weights of the
network for a single input–output example, and does so efficiently, computing the gradient one
layer at a time, iterating backward from the last layer to avoid redundant calculations of
intermediate terms in the chain rule; this can be derived through dynamic programming. Gradient
descent, or a variant such as stochastic gradient descent, is then commonly used to perform the
actual weight updates.
Strictly the term backpropagation refers only to the algorithm for computing the gradient, not
how the gradient is used; but the term is often used loosely to refer to the entire learning
algorithm – including how the gradient is used, such as by stochastic gradient descent. In
1986 David E. Rumelhart et al. published an experimental analysis of the technique. This
contributed to the popularization of backpropagation and helped to initiate an active period of
research in multilayer perceptrons.
Backpropagation, or backward propagation of errors, is an algorithm that is designed to test for
errors working back from output nodes to input nodes. It is an important mathematical tool for
improving the accuracy of predictions in data mining and machine learning. Essentially,
backpropagation is an algorithm used to calculate derivatives quickly.
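As a compact, hedged illustration (a toy two-layer network on random data, biases omitted for
brevity; not the original 1986 formulation), backpropagation computes the gradients one layer at
a time via the chain rule, and a gradient descent step then updates the weights:
import numpy as np

np.random.seed(0)
X = np.random.randn(4, 3)                   # 4 samples, 3 features
y = np.array([[0.0], [1.0], [1.0], [0.0]])  # binary targets

W1 = np.random.randn(3, 5) * 0.1            # input -> hidden weights
W2 = np.random.randn(5, 1) * 0.1            # hidden -> output weights
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for step in range(1000):
    # Forward pass.
    h = sigmoid(X @ W1)                     # hidden activations
    out = sigmoid(h @ W2)                   # network output
    loss = np.mean((out - y) ** 2)

    # Backward pass: propagate the error from the output layer back to the input layer.
    d_out = 2 * (out - y) / len(y) * out * (1 - out)   # gradient at the output pre-activation
    dW2 = h.T @ d_out
    d_h = d_out @ W2.T * h * (1 - h)                   # chain rule through the hidden layer
    dW1 = X.T @ d_h

    # Gradient descent step on both weight matrices.
    W2 -= 0.5 * dW2
    W1 -= 0.5 * dW1

print(loss)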

There are two leading types of backpropagation networks:

1. Static backpropagation. Static backpropagation is a network developed to map static inputs
to static outputs. Static backpropagation networks can solve static classification problems,
such as optical character recognition (OCR).
2. Recurrent backpropagation. The recurrent backpropagation network is used for fixed-point
learning. Recurrent backpropagation activation feeds forward until it reaches a fixed value.
The key difference here is that static backpropagation offers instant mapping and recurrent
backpropagation does not.

Regularization:
Regularization is a technique used in machine learning and deep learning to prevent overfitting
and improve the generalization performance of a model. It involves adding a penalty term to the
loss function during training.
This penalty discourages the model from becoming too complex or having large parameter
values, which helps in controlling the model’s ability to fit noise in the training data.
Regularization methods include L1 and L2 regularization, dropout, early stopping, and more. By
applying regularization, models become more robust and better at making accurate predictions
on unseen data.
Before we dive deeper into the topic, picture a series of models of increasing complexity fitted to
the same training data. As the models become more complex, they try to learn the details and the
noise in the training data too well, which ultimately results in poor performance on unseen data.
In other words, as model complexity increases, the training error keeps reducing but the testing
error does not.

Different Regularization Techniques in Deep Learning

Now that we have an understanding of how regularization helps in reducing overfitting, let us
look at a few of the most common techniques.
L2 & L1 regularization
L1 and L2 are the most common types of regularization. These update the general cost function
by adding another term known as the regularization term.
Cost function = Loss (say, binary cross entropy) + Regularization term
Due to the addition of this regularization term, the values of the weight matrices decrease,
because the penalty favors a neural network with smaller weight matrices, which leads to simpler
models. Therefore, it will also reduce overfitting to quite an extent.
However, this regularization term differs in L1 and L2.
In L2, we have:
Cost function = Loss + (lambda / 2m) * Σ ||w||²
Here, lambda is the regularization parameter. It is a hyperparameter whose value is tuned for
better results. L2 regularization is also known as weight decay, as it forces the weights to decay
towards zero (but not exactly zero).
In L1, we have:
Cost function = Loss + (lambda / 2m) * Σ ||w||
In this, we penalize the absolute value of the weights. Unlike L2, the weights may be reduced to
exactly zero here.
Hence, it is very useful when we are trying to compress our model. Otherwise, we usually
prefer L2 over it.
In Keras, we can directly apply regularization to any layer using the regularizers module.
Below is sample code to apply L2 regularization to a Dense layer.
from keras import regularizers
from keras.layers import Dense

model.add(Dense(64, input_dim=64,
                kernel_regularizer=regularizers.l2(0.01)))  # assumes an existing Sequential model
Note: Here the value 0.01 is the regularization parameter, i.e., lambda, which we need to
optimize further. We can optimize it using the grid-search method.
Similarly, we can also apply L1 regularization, as sketched below.
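A hedged sketch of L1 regularization in the same style (again assuming an existing Sequential
model named model):
from keras import regularizers
from keras.layers import Dense

model.add(Dense(64, input_dim=64,
                kernel_regularizer=regularizers.l1(0.01)))  # 0.01 is again the lambda to be tuned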
Dropout
This is one of the most interesting regularization techniques. It also produces very good results
and is consequently one of the most frequently used regularization techniques in the field of
deep learning.
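During training, dropout randomly ignores (drops) a fraction of a layer's outputs so that the
network cannot rely too heavily on any single neuron. A minimal hedged Keras sketch with
hypothetical layer sizes:
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(128, input_dim=64, activation='relu'))
model.add(Dropout(0.5))                    # randomly drop 50% of the activations, during training only
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy')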

Autoencoders:
Autoencoders are very useful in the field of unsupervised machine learning. You can use them to
compress data and reduce its dimensionality.
The main difference between autoencoders and Principal Component Analysis (PCA) is that
while PCA finds the directions along which you can project the data with maximum variance,
autoencoders learn to reconstruct the original input given just a compressed version of it.
Anyone who needs the original data can reconstruct an approximation of it from the compressed
representation using the autoencoder.
An Autoencoder consists of three layers:

1. Encoder
2. Code
3. Decoder
The Encoder layer compresses the input image into a latent space representation. It encodes the
input image as a compressed representation in a reduced dimension. 
The compressed image is a distorted version of the original image.
The Code layer represents the compressed input fed to the decoder layer. 
The decoder layer decodes the encoded image back to the original dimension. The decoded
image is reconstructed from the latent space representation and is a lossy reconstruction of the
original image.
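A minimal Keras sketch of the encoder-code-decoder structure just described, assuming
hypothetical 784-dimensional inputs (for example, flattened 28x28 images):
from keras.models import Model
from keras.layers import Input, Dense

inputs = Input(shape=(784,))                       # original input
code = Dense(32, activation='relu')(inputs)        # encoder: compress to a 32-dimensional code
outputs = Dense(784, activation='sigmoid')(code)   # decoder: reconstruct the original dimension

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')  # trained to reproduce its own input
# autoencoder.fit(x_train, x_train, epochs=10)     # assumes x_train exists; the target equals the input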
Autoencoders have various use-cases like:

 Anomaly detection: autoencoders can identify data anomalies by flagging inputs that are
reconstructed with unusually high error. This can be helpful for anomaly detection in financial
markets, where you can use it to identify unusual activity and predict market trends.
 Data denoising (image and audio): autoencoders can help clean up noisy pictures or audio
files by learning to reconstruct the clean signal from a corrupted input.
 Image inpainting: autoencoders have been used to fill in gaps in images by learning how to
reconstruct missing pixels based on surrounding pixels. For example, if you're trying to
restore an old photograph that's missing part of its right side, the autoencoder could learn how
to fill in the missing details based on what it knows about the rest of the photo.
 Information retrieval: autoencoders can be used as content-based image retrieval systems
that allow users to search for images based on their content.

Deep Neural Networks:


A deep neural network (DNN) is an ANN with multiple hidden layers between the input and
output layers. Similar to shallow ANNs, DNNs can model complex non-linear relationships.
The main purpose of a neural network is to receive a set of inputs, perform progressively
complex calculations on them, and give output to solve real world problems like classification.
We restrict ourselves to feed forward neural networks.
We have an input, an output, and a flow of sequential data in a deep network.

Neural networks are widely used in supervised learning and reinforcement learning problems.
These networks are based on a set of layers connected to each other.
In deep learning, the number of hidden layers, mostly non-linear, can be large; say about 1000
layers.
DL models often produce much better results than ordinary shallow ML models.
We mostly use the gradient descent method for optimizing the network and minimising the loss
function.
We can use ImageNet, a repository of millions of labelled digital images, to classify a dataset into
categories like cats and dogs. DL nets are increasingly used for dynamic images apart from static
ones, and for time series and text analysis.
Training the data sets forms an important part of deep learning models. In addition,
backpropagation is the main algorithm in training DL models.
DL deals with training large neural networks with complex input-output transformations.
One example of DL is the mapping of a photo to the name of the person(s) in the photo, as social
networks do; describing a picture with a phrase is another recent application of DL.

Neural networks are functions that have inputs like x1, x2, x3, ... that are transformed to outputs
like z1, z2, z3, and so on, in two (shallow networks) or several (deep networks) intermediate
operations, also called layers. The weights and biases change from layer to layer; 'w' and 'v' are
the weights or synapses of the layers of the neural network.
The best use case of deep learning is the supervised learning problem. Here, we have a large set
of data inputs with a desired set of outputs.

Difficulty of training deep neural networks


Data and computation
One of the main challenges of neural networks and deep learning is the need for large amounts of
data and computational resources. Neural networks learn from data by adjusting their parameters
to minimize a loss function, which measures how well they fit the data. However, to achieve
high accuracy and generalization, they often require millions or billions of data points, which
may not be available or accessible for some tasks or domains. Moreover, training and deploying
neural networks can be very expensive and time-consuming, as they involve complex
mathematical operations and multiple layers of neurons. Therefore, data and computation are key
factors that affect the feasibility and scalability of neural networks and deep learning.
Interpretability and explainability
Another challenge of neural networks and deep learning is the lack of interpretability and
explainability of their outputs and decisions. Neural networks are often considered as black
boxes, as it is hard to understand how they process the input data and what features they learn
and use. This can pose problems for applications that require transparency, accountability, and
trust, such as healthcare, finance, or law. For example, how can we trust a neural network that
diagnoses a disease or recommends a treatment, if we do not know how it arrived at that
conclusion? How can we debug or improve a neural network that makes a mistake or fails to
perform as expected? Therefore, interpretability and explainability are essential for ensuring the
reliability and ethics of neural networks and deep learning.
Robustness and security
A related challenge of neural networks and deep learning is the lack of robustness and security
against adversarial attacks and noise. Neural networks are vulnerable to subtle perturbations or
modifications of the input data, which can cause them to produce incorrect or misleading
outputs. For example, adding a small amount of noise or changing a few pixels in an image can
fool a neural network into misclassifying it as a different object. This can have serious
consequences for applications that rely on accurate and consistent recognition, such as face
recognition, autonomous driving, or biometric authentication. Therefore, robustness and security
are crucial for ensuring the safety and integrity of neural networks and deep learning.
Generalization and transfer
A final challenge of neural networks and deep learning is the difficulty of generalizing and
transferring their knowledge and skills to new or different domains or tasks. Neural networks
tend to overfit the data they are trained on, which means they perform well on the training data
but poorly on unseen or novel data. This can limit their ability to adapt to changing or diverse
environments or scenarios. Moreover, neural networks tend to learn specific and low-level
features that may not be relevant or useful for other domains or tasks. This can prevent them
from leveraging their existing knowledge and skills to learn new or related ones. Therefore,
generalization and transfer are important for enhancing the versatility and efficiency of neural
networks and deep learning.

Greedy layer wise training:


In the early days of deep learning, abundant computational resources were not available for
training a deep learning model. In addition, deep learning practitioners suffered from the
vanishing gradients problem and the exploding gradients problem.

This was an unfortunate combination when one wanted to train a model with increasing depth.
What depth would be best? From what depth would we suffer from vanishing and/or exploding
gradients? And how can we try to find out without wasting a lot of resources?

Greedy layer-wise training of a neural network is one of the answers that was proposed for
solving this problem. By adding a hidden layer every time the model has finished training, it
becomes possible to find what depth is adequate given your training set.
It works quite simply. You start with a simple neural network: an input layer, a hidden layer,
and an output layer. You train it for a fixed number of epochs, say 25. Then, after training, you
freeze all the layers except for the last one, and you cut that last layer off the network. At the
tail of your cut-off network, you now add a new layer, for example a densely connected one.
You then re-add the trained final layer, and you end up with a network that is one layer deeper.
Because all layers except for the last two are frozen, your progress so far will help you train
the final two better.

The idea behind this strategy is to find an optimum number of layers for training your neural
network.

Implementing greedy layer-wise training with PyTorch

Let's now take a look at how you can implement greedy layer-wise training with PyTorch. Even
though the strategy is really old (in 2022, it's 15 years ago that it was proposed!), there are cases
when it may be really useful today.

Implementing greedy layer-wise training with PyTorch involves multiple steps:

1. Importing all dependencies, including PyTorch.
2. Defining the nn.Module structure; in other words, your PyTorch model.
3. Creating a definition for getting the global configuration.
4. Creating another one for getting the model configuration.
5. Retrieving the DataLoader through another definition.
6. Writing a definition for adding a layer to an existing model, while freezing all existing
layers.
7. Creating a definition for training a model.
8. Wrapping everything together.
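The following is only a rough, simplified sketch of the idea (toy random tensors stand in for a real
DataLoader, and the layer sizes are arbitrary), not the full recipe above: after each round of
training, the existing hidden layers are frozen, a new hidden layer is inserted, and the trained
output layer is kept and trained further.
import torch
from torch import nn

# Toy data: 64 samples, 10 features, 2 classes (placeholder for a real DataLoader).
X = torch.randn(64, 10)
y = torch.randint(0, 2, (64,))

def train(model, epochs=25):
    # Only parameters that are not frozen get optimized.
    opt = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return model

# Start with input -> one hidden layer -> output.
hidden = [nn.Linear(10, 32), nn.ReLU()]
output = nn.Linear(32, 2)
model = train(nn.Sequential(*hidden, output))

# Greedily deepen: freeze the trained hidden layers, add a new one, keep training the output layer.
for _ in range(3):
    for layer in hidden:
        for p in layer.parameters():
            p.requires_grad = False           # freeze everything trained so far
    hidden += [nn.Linear(32, 32), nn.ReLU()]  # new trainable hidden layer
    model = train(nn.Sequential(*hidden, output))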
