
Fundamentals of Deep Learning

Dr. D. SUDHEER
Assistant Professor
Department of CSE
VNR VJIET (NAAC: A++, NIRF: 113)
Hyderabad, Telangana.

©Dr. SUDHEER DEVULAPALLI 1


Fundamentals of Deep Learning

• Deep learning is a subfield of machine learning concerned with
algorithms inspired by the structure and function of the brain,
called artificial neural networks. These neural networks are a set of
algorithms, modeled loosely after the human brain, that are designed
to recognize patterns.
• They interpret sensory data through a kind of machine
perception, labeling, or clustering raw input. The patterns they
recognize are numerical, contained in vectors, into which all real-
world data, be it images, sound, text, or time series, must be
translated.
• AI represents the techniques that give machines the ability to think and
act like human beings.
• Machine Learning (ML) represents computer algorithms that can observe and
analyze data on their own.
• Deep Learning (DL) is a subset of machine learning where machines
learn on their own using artificial neural networks (ANNs).



Fig. Ref.(hcs-pharma.com)
Machine Learning (ML) | Deep Learning (DL)
ML uses algorithms developed from statistical methods, which are used to train the model and provide predictions. | DL uses ANNs, which are modeled on the biological neural networks of the human brain.
In ML the input data is well structured and comes with features. | In DL the data need not be structured. It works on many formats of data, including images, audio, and video.
Requires less computing power. | Typically requires a GPU.
ML requires extracted features. | DL automatically extracts the features.
• Deep learning is a class of machine learning that performs particularly
well on unstructured data. On such data, deep learning techniques often
outperform classical machine learning techniques. It enables
computational models to learn features progressively from data at
multiple levels.
Applications:

• Google traffic analysis
• Tesla self-driving cars
• Image recognition
• Sophia AI robot


More applications:

• Sentiment analysis
• Language translation
• Music composition
• Product recommendation
• Voice recognition
• Stock market prediction
• Medical diagnosis
• Spam detection



Common architectural principles of Deep Networks

• The word architecture refers to the overall structure of the
network: how many units it should have and how these units
should be connected to each other.
• Most neural networks are organized into groups of units called
layers.
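• In the common chain-based arrangement, the structure of the first layer is given by
$h^{(1)} = g^{(1)}\left(W^{(1)\top} x + b^{(1)}\right)$, where $x$ is the input, $W^{(1)}$ and $b^{(1)}$
are the layer's weights and biases, and $g^{(1)}$ is its activation function.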
1. Universal Approximation Properties and Depth:
• The universal approximation theorem states that a feedforward
network with a linear output layer and at least one hidden layer
with any “squashing” activation function (such as the logistic
sigmoid activation function) can approximate any Borel
measurable function from one finite-dimensional space to another
with any desired non-zero amount of error, provided that the
network is given enough hidden units.
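As a rough illustration of the statement above, a network with one sigmoid hidden layer and a linear output computes a function of the form below. The symbols $N$ (number of hidden units), $c_j$, $w_j$, $b_j$, $K$ and $\varepsilon$ are notation introduced here, not taken from the slides, and the bound is written for the simpler case of a continuous scalar target $f$ on a compact set $K$.

```latex
\hat{f}(x) = \sum_{j=1}^{N} c_j \, \sigma\!\left(w_j^{\top} x + b_j\right),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}

% For every \varepsilon > 0 there exist N and parameters c_j, w_j, b_j such that
\sup_{x \in K} \left| f(x) - \hat{f}(x) \right| < \varepsilon
```

The theorem guarantees that such parameters exist; it does not say how large $N$ must be or that training will find them.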
2. Other considerations:
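• Skip connections going from layer i to layer i+2 or higher can be added
to the main chain. They make it easier for gradients to flow from the
output layers to layers nearer the input.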
• Another key consideration of architecture design is exactly how to
connect a pair of layers to each other.
Core concepts of architectural principles of deep networks:
• Parameters
• Layers
• Activation functions
• Loss functions
• Optimization methods
• Hyperparameters
Hyperparameters fall into several categories:
• Layer size
• Magnitude (momentum, learning rate)
• Regularization
• Activations (and activation function families)
• Weight initialization strategy
• Loss functions
• Settings for epochs during training (mini-batch size)
• Normalization scheme for input data (vectorization)
Ref: Deep Learning: A Practitioner's Approach, Josh Patterson and Adam
Gibson.
Building Blocks of Deep Networks

• Building deep networks goes beyond basic feed-forward
multilayer neural networks.
• In some cases, deep networks combine smaller networks as
building blocks into larger networks.
• Specific building blocks include:
  • Feed-forward multilayer neural networks
  • RBMs
  • Autoencoders
RBM:
• Restricted Boltzmann Machines (RBMs) are used for unsupervised learning
in deep learning for the following:
  • Feature extraction
  • Dimensionality reduction
• The “restricted” part of the name “Restricted Boltzmann
Machines” means that connections between nodes of the same
layer are prohibited (i.e., there are no visible-visible or
hidden-hidden connections along which signal passes).
• A Boltzmann machine is a network of symmetrically connected,
neuron-like units that make stochastic decisions about whether to be on or off.
• A standard RBM has a visible layer and a hidden layer.
• There are five main parts of a basic RBM:
  • Visible units
  • Hidden units
  • Weights
  • Visible bias units
  • Hidden bias units
• Every visible unit is connected to every hidden unit, yet no units
from the same layer are connected.
• Each layer of an RBM can be imagined as a row of nodes.
• The nodes of the visible and hidden layers are connected by
connections with associated weights.
• Hidden units are feature detectors, learning features from the input
data.
• The initial weights are randomly generated.
• The technique known as pretraining using RBMs means teaching it to
reconstruct the original data from a limited sample of that data.
Training of RBM:
Gibbs sampling is the first part of the training. Given an input
vector v, we use p(h | v) to sample the hidden values h. Given the
hidden values h, we then use p(v | h) to reconstruct new input
values v.
Weight updating
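The sampling and weight-update steps can be sketched in a few lines of plain Python. This is a minimal sketch under assumptions the slides do not state: binary units, sigmoid conditionals, and one step of contrastive divergence (CD-1) as the update rule. All function and variable names are illustrative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_hidden(v, W, b_h, rng):
    """p(h_j = 1 | v) = sigmoid(b_h[j] + sum_i v[i] * W[i][j])."""
    probs = [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(b_h))]
    return probs, [1 if rng.random() < p else 0 for p in probs]


def sample_visible(h, W, b_v, rng):
    """p(v_i = 1 | h) = sigmoid(b_v[i] + sum_j h[j] * W[i][j])."""
    probs = [sigmoid(b_v[i] + sum(h[j] * W[i][j] for j in range(len(h))))
             for i in range(len(b_v))]
    return probs, [1 if rng.random() < p else 0 for p in probs]


def cd1_update(v0, W, b_v, b_h, lr, rng):
    """One CD-1 step (assumed update rule); modifies W, b_v, b_h in place."""
    ph0, h0 = sample_hidden(v0, W, b_h, rng)   # positive phase: v -> h
    _, v1 = sample_visible(h0, W, b_v, rng)    # reconstruction: h -> v
    ph1, _ = sample_hidden(v1, W, b_h, rng)    # negative phase: v' -> h'
    for i in range(len(v0)):
        for j in range(len(b_h)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
    for i in range(len(v0)):
        b_v[i] += lr * (v0[i] - v1[i])
    for j in range(len(b_h)):
        b_h[j] += lr * (ph0[j] - ph1[j])
    return v1


rng = random.Random(0)
n_visible, n_hidden = 4, 2
# Small random initial weights, zero biases.
W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
b_v = [0.0] * n_visible
b_h = [0.0] * n_hidden

v0 = [1, 0, 1, 0]
reconstruction = cd1_update(v0, W, b_v, b_h, lr=0.1, rng=rng)
```

Repeating `cd1_update` over many training vectors nudges the weights so that reconstructions look more like the data.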
Autoencoders

• We use autoencoders to learn compressed representations of
datasets.
• We use them to reduce a dataset’s dimensionality.
• The output of the autoencoder network is a reconstruction of the
input data in the most efficient form.
• The key difference between a multilayer perceptron and an autoencoder
is that an autoencoder has the same number of output units as input units.
• Autoencoders use unlabelled data for unsupervised learning.
• An autoencoder is trained to reproduce its own input data.
• A more recent type of autoencoder is the variational autoencoder (VAE),
introduced by Kingma and Welling.
• The VAE is similar to compression and denoising autoencoders in
that they are all trained in an unsupervised manner to reconstruct
inputs.
• However, the mechanisms that the VAEs use to perform training are
quite different.
• VAEs use a probabilistic approach for the forward pass instead of the
standard deterministic neural network process.
The architecture of autoencoders
Let’s start with a quick overview of autoencoders’ architecture.
Autoencoders consist of 3 parts:
1. Encoder: A module that compresses the input data (from the train,
validation, and test sets) into an encoded representation that is
typically much smaller than the input data.
2. Bottleneck: A module that contains the compressed knowledge
representations and is therefore the most important part of the
network.
3. Decoder: A module that helps the network “decompress” the
knowledge representations and reconstructs the data back from its
encoded form. The output is then compared with a ground truth, which
for an autoencoder is the original input.
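The three parts can be sketched as a forward pass in plain Python. This is only an illustration under assumptions: linear layers without activations, random untrained weights, a 4-value input and a 2-value bottleneck, and mean squared error as the comparison with the ground truth. The names are hypothetical.

```python
import random


def linear(x, W, b):
    """Dense layer without activation: y[j] = b[j] + sum_i x[i] * W[i][j]."""
    return [b[j] + sum(x[i] * W[i][j] for i in range(len(x)))
            for j in range(len(b))]


def init_layer(n_in, n_out, rng):
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return W, [0.0] * n_out


def autoencode(x, encoder, decoder):
    code = linear(x, *encoder)      # encoder output = bottleneck representation
    x_hat = linear(code, *decoder)  # decoder reconstructs the input
    return code, x_hat


def reconstruction_loss(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


rng = random.Random(0)
encoder = init_layer(4, 2, rng)  # 4 inputs -> 2-unit bottleneck
decoder = init_layer(2, 4, rng)  # 2-unit bottleneck -> 4 outputs

x = [0.9, 0.1, 0.8, 0.2]
code, x_hat = autoencode(x, encoder, decoder)
loss = reconstruction_loss(x, x_hat)
```

Training would adjust the encoder and decoder weights to reduce `loss`; note that the output has the same size as the input while the bottleneck is smaller.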
Major architectures of Deep Networks
• Unsupervised Pretrained Networks (UPNs)
  • Autoencoders
  • Deep Belief Networks (DBNs)
  • Generative Adversarial Networks (GANs)
• Convolutional Neural Networks (CNNs)
• Recurrent Neural Networks
• Recursive Neural Networks
Deep Belief Networks (DBN)

• DBNs are composed of layers of Restricted Boltzmann
Machines (RBMs) for the pretrain phase and then a feed-forward
network for the fine-tune phase.
• We use RBMs to extract higher-level features from the raw input
vectors.
• The fundamental purpose of RBMs in the context of deep learning
and DBNs is to learn these higher-level features of a dataset in an
unsupervised training fashion.
• In DBN, RBMs learn progressively higher-level features using the
learned features from a lower level RBM pretrain layer as the input to
a higher-level RBM pretrain layer.
• Learning these features in an unsupervised fashion is considered the
pretrain phase of DBNs.
• Each hidden layer of the RBM in the pretrain phase learns
progressively more complex features from the distribution of the
data.
Fine tuning DBN:

1. Gentle back propagation
• The pretrain phase with RBMs learns higher-order features from
the data.
• We take these weights as initial values and tune them a bit more,
using backpropagation with a low learning rate, to find
good values for our final neural network model.
2. Output layer
• The first layer of a deep network learns how to reconstruct the
original dataset.
• The subsequent layers learn how to reconstruct the probability
distributions of the activations of the previous layer.
• The output layer of a neural network is tied to the overall objective.
This is typically logistic regression, with the number of features equal
to the number of inputs of the final layer, and the number of outputs
equal to the number of classes.
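The shape of such an output layer can be sketched as multinomial logistic regression (softmax) in plain Python. The sizes (3 inputs from the final hidden layer, 2 classes) and the weight values are made up for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores into class probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def output_layer(features, W, b):
    """Logistic-regression output: one weight column and bias per class."""
    logits = [b[k] + sum(features[i] * W[i][k] for i in range(len(features)))
              for k in range(len(b))]
    return softmax(logits)


# Activations of the final hidden layer (3 features, illustrative values).
features = [0.2, -1.0, 0.5]
# Weight matrix: rows = inputs of the final layer, columns = classes.
W = [[0.1, -0.3],
     [0.4, 0.2],
     [-0.5, 0.6]]
b = [0.0, 0.1]

class_probs = output_layer(features, W, b)
predicted_class = class_probs.index(max(class_probs))
```

The number of rows in `W` equals the number of inputs from the final layer, and the number of columns equals the number of classes, as described above.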
Generative Adversarial Networks

• GANs have been shown to be quite adept at synthesizing
new images based on other training images.
• We can extend this concept to model other domains such as the
following:
  • Sound
  • Video
  • Text to image
• GANs are an example of a network that uses unsupervised
learning to train two models in parallel.
• The network is forced to efficiently represent the training data,
making it more effective at generating data similar to the training
data.
• The generative model in GANs generates synthetic images while a
secondary “discriminator” network tries to classify these generated
images.
• This secondary discriminator network attempts to classify the output
images as real or synthetic.
• The goal here is to make images realistic enough that the
discriminator network is fooled to the point that it cannot distinguish
the difference between the real and the synthetic input data.
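The competition between the two networks can be illustrated through their loss functions. The sketch below assumes the common binary cross-entropy formulation with a non-saturating generator loss, which the slides do not specify. The inputs are the discriminator's predicted probabilities that a sample is real, and the values are made up.

```python
import math

EPS = 1e-12  # avoids log(0)


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real images are labeled 1, synthetic images 0."""
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating loss: low when the discriminator calls fakes real."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)


# Early in training: the discriminator separates real from synthetic well.
confident_real, confident_fake = [0.9, 0.8], [0.1, 0.2]
# Goal state: the discriminator is fooled and outputs 0.5 everywhere.
fooled_real, fooled_fake = [0.5, 0.5], [0.5, 0.5]

g_loss_early = generator_loss(confident_fake)
g_loss_fooled = generator_loss(fooled_fake)
d_loss_fooled = discriminator_loss(fooled_real, fooled_fake)
```

As the generator improves, its loss falls while the discriminator's loss rises toward the value for pure guessing (2 ln 2 in this formulation).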
Different GANs

• Deep convolutional GAN (DCGAN): generates an image from a random input
vector.
• Conditional GAN: also uses label information to generate data of a
specific class.
Recurrent Neural Networks
• ANNs and CNNs are feed-forward networks; they do not remember the
inputs from the previous steps.
• In some applications, such as stock market prediction or sentiment
analysis, the network should remember the previous steps.
• When the model cannot make a decision based on the current data alone,
we need a recurrent neuron.
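A recurrent neuron can be sketched as a function that takes both the current input and its own previous output. The example below is a minimal scalar version with made-up weights (`w_x`, `w_h`, `b` are illustrative names); it shows that two sequences ending in the same input leave the neuron in different states.

```python
import math


def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """New state depends on the current input x and the previous state h_prev."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def final_state(sequence):
    h = 0.0  # initial state: nothing remembered yet
    for x in sequence:
        h = rnn_step(x, h)
    return h


# Both sequences end with the same input (1.0) but start differently.
state_a = final_state([1.0, 0.0, 1.0])
state_b = final_state([-1.0, 0.0, 1.0])
```

A feed-forward neuron would give the same output for the final input in both cases; here `state_a` and `state_b` differ because the earlier inputs are carried forward in the state.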

Fig. Traditional RNN

Fig. LSTM neuron


• It is useful to remember the past data along with the present data to make a
decision.
• For example, the words at the beginning of a sentence can be as important as
the last words for understanding its meaning.
• An LSTM keeps information about earlier words along with the recent words to
make a decision.

• An LSTM maintains two kinds of memory: long-term memory and short-term memory.


• Long-term memory represents all the words starting from the first word.
• Short-term memory represents the recent words from the past state of the model.
• If an LSTM kept on storing data, it would reach a point where it could not
store any more.
• It therefore removes unwanted information from time to time.
• Removing or keeping data is implemented by gates.

LSTM gates:
1. Forget gate: forgets irrelevant information.
2. Input gate: adds new, updated information.
3. Output gate: passes the updated information on.
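One step of an LSTM cell with these three gates can be sketched with scalars. This follows the standard LSTM equations, which the slides do not spell out; the parameter names in the dictionary `p` and their values are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step; returns (short-term state h, long-term state c)."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate info
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    c = f * c_prev + i * g  # drop part of the old memory, add new information
    h = o * math.tanh(c)    # pass on a filtered view of the updated memory
    return h, c


params = {name: 0.5 for name in
          ("wf", "uf", "wi", "ui", "wg", "ug", "wo", "uo")}
params.update({"bf": 0.0, "bi": 0.0, "bg": 0.0, "bo": 0.0})

h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.5]:
    h, c = lstm_step(x, h, c, params)

# A strongly negative forget-gate bias makes the cell discard old memory.
forgetful = dict(params, bf=-50.0)
h_reset, c_reset = lstm_step(1.0, 0.0, 10.0, forgetful)
```

In the last call the previous long-term state of 10.0 is almost entirely erased, so `c_reset` contains only the newly added information.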
Layers of RNN
There are two important layers: 1. Embedding 2. LSTM
1. Embedding
• It converts positive integers (indexes) into dense vectors of values.
• The input values must lie in a fixed range (0 to input_dim - 1).
• It is especially useful in language tasks such as translation, where the
vectors help capture the meaning of words.

Embedding(input_dim,output_dim,input_length)
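What such a layer does can be sketched as a lookup table in plain Python. The class `EmbeddingTable` below is hypothetical and is not the library layer; it only mirrors the `input_dim` and `output_dim` arguments, with random untrained vectors.

```python
import random


class EmbeddingTable:
    """Hypothetical stand-in: maps integer ids to fixed-size vectors."""

    def __init__(self, input_dim, output_dim, seed=0):
        rng = random.Random(seed)
        self.input_dim = input_dim
        # One row of output_dim values per possible integer id.
        self.table = [[rng.uniform(-0.05, 0.05) for _ in range(output_dim)]
                      for _ in range(input_dim)]

    def __call__(self, token_ids):
        for t in token_ids:
            if not 0 <= t < self.input_dim:
                raise ValueError(f"id {t} is outside 0..{self.input_dim - 1}")
        return [self.table[t] for t in token_ids]


embed = EmbeddingTable(input_dim=10, output_dim=3)
sentence = [4, 7, 4]  # integer-encoded words; length 3 is the input length
vectors = embed(sentence)
```

Each id in `sentence` becomes a 3-value vector, and the repeated id 4 maps to the same vector both times. In a real layer the table entries are learned during training.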



LSTM Layer
• It remembers the previous output and uses it together with the current input.
• It is also possible to create an RNN without LSTM.

LSTM(input_units,return_sequences=True)
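The effect of `return_sequences=True` can be illustrated with a simplified recurrent layer. The sketch below uses a plain tanh recurrence instead of a full LSTM cell, so only the shape of the result is meant to mirror the flag; the function name and weights are illustrative.

```python
import math


def run_recurrent_layer(inputs, return_sequences=False, w_x=0.5, w_h=0.8):
    """Process a non-empty sequence; return every state or only the last one."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)  # simplified recurrence, not an LSTM
        states.append(h)
    return states if return_sequences else states[-1]


inputs = [1.0, 0.5, -0.5]
all_states = run_recurrent_layer(inputs, return_sequences=True)
last_state = run_recurrent_layer(inputs, return_sequences=False)
```

With the flag set, the layer yields one output per time step, which is what a following recurrent layer needs as its input; without it, only the final output is returned.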
