DL Unit-1
Deep Learning:
Deep learning algorithms are especially useful when a problem has a very large number of inputs and outputs. Deep learning models can learn which features matter largely on their own, with little guidance from the programmer, which helps with the problems that arise in high-dimensional data.
Deep learning is a branch of machine learning, which is itself a subset of artificial intelligence. The idea behind artificial intelligence is to mimic human behaviour, and deep learning follows the same idea: neural networks loosely imitate the human brain, and deep learning builds algorithms on top of them.
Deep learning is implemented with deep networks, which are neural networks with multiple hidden layers.
In the example given above, the raw image data is fed to the input layer. The input layer determines patterns of local contrast, that is, it differentiates on the basis of colour, luminosity, and so on. The 1st hidden layer then determines facial features: it fixates on the eyes, nose, lips, etc., and matches those features against a face template. The 2nd hidden layer determines the correct face, as can be seen in the above image, after which the result is sent to the output layer. More hidden layers can be added to solve more complex problems, for example, finding a particular kind of face with a particular complexion. In general, as the number of hidden layers increases, the network can solve more complex problems.
Machine learning is a subset of AI that enables a machine to automatically learn from data, improve its performance from past experience, and make predictions. Machine learning comprises a set of algorithms that work on huge amounts of data. Data is fed to these algorithms to train them, and on the basis of that training they build a model and perform a specific task.
Based on the methods and way of learning, machine learning is mainly divided into four types, which are:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
Supervised Learning:
Supervised learning is a type of machine learning that uses labelled data to train machine
learning models.
In labelled data, the output is already known. The model just needs to map the inputs to the
respective outputs.
It is the easiest to understand and the simplest to implement. It is very similar to teaching a
child with the use of flash cards.
An example of supervised learning is to train a system that identifies the image of an
animal.
These algorithms learn from past data, called training data, analyse it, and use that analysis to predict outcomes for new data within the known classifications.
The algorithm can be trained further by comparing its outputs with the actual ones and using the errors to adjust the model.
The primary objective of the supervised learning technique is to map the input
variable (a) with the output variable (b).
1. Classification: These refer to algorithms that address classification problems where the
output variable is categorical; for example, yes or no, true or false, male or female, etc.
Real-world applications of this category are evident in spam detection and email filtering.
For example, consider an input dataset of parrot and crow images. Initially, the machine is
trained to understand the pictures, including the parrot and crow’s colour, eyes, shape, and size.
Post-training, an input picture of a parrot is provided, and the machine is expected to identify the
object and predict the output. The trained machine checks for the various features of the object,
such as colour, eyes, shape, etc., in the input picture, to make a final prediction. This is the process
of object identification in supervised machine learning.
Advantages:
o Since supervised learning works with a labelled dataset, we have an exact idea of the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior experience.
Disadvantages:
o These algorithms are not able to solve complex tasks.
o It may predict the wrong output if the test data is different from the training data.
o Training the algorithm requires a lot of computation time.
Unsupervised learning is similar to how a human learns to think from their own experiences, which makes it closer to real AI.
Unsupervised learning works on unlabelled and uncategorized data, which makes it especially important.
In the real world, we do not always have input data with corresponding outputs; unsupervised learning is needed for such cases.
Association: Association learning refers to identifying typical relations between the variables of a
large dataset. It determines the dependency of various data items and maps associated variables.
Typical applications include web usage mining and market data analysis.
Popular association rule learning algorithms include the Apriori Algorithm, Eclat Algorithm, and FP-Growth Algorithm.
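The dependency between data items is usually measured with support and confidence. The sketch below is a minimal illustration of those two measures, not an implementation of Apriori, Eclat, or FP-Growth; the transactions and the function names are made up for the example.

```python
# Toy market-basket data (hypothetical, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) = support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

bread_support = support({"bread"}, transactions)                   # 3 of 4 baskets
bread_to_milk = confidence({"bread"}, {"milk"}, transactions)      # 2 of the 3 bread baskets
```

A rule such as "bread implies milk" is typically kept only when both its support and its confidence exceed thresholds chosen by the analyst.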
Recommendation Systems: Recommendation systems widely use unsupervised learning techniques for building recommendation applications for different web applications and e-commerce websites.
Anomaly Detection: Anomaly detection is a popular application of unsupervised learning,
which can identify unusual data points within the dataset. It is used to discover fraudulent
transactions.
Singular Value Decomposition: Singular Value Decomposition (SVD) factorizes a data matrix and is used to extract particular information from a dataset, for example, information about each user located at a particular location.
Semi-supervised learning represents the intermediate ground between supervised learning (with labelled training data) and unsupervised learning (with no labelled training data): it uses a combination of labelled and unlabelled datasets during the training period.
To overcome the drawbacks of supervised learning and unsupervised learning
algorithms, the concept of Semi-supervised learning is introduced. The main aim
of semi-supervised learning is to effectively use all the available data, rather than only
labelled data like in supervised learning.
Initially, similar data is clustered using an unsupervised learning algorithm; the clusters then help to assign labels to the unlabelled data.
Consider an example of a college student.
A student learning a concept under a teacher’s supervision in college is termed supervised
learning.
In unsupervised learning, a student self-learns the same concept at home without a teacher’s
guidance. Meanwhile, a student revising the concept after learning under the direction of a
teacher in college is a semi-supervised form of learning.
Advantages and disadvantages of Semi-supervised Learning
Advantages:
o The algorithm is simple and easy to understand.
o It is highly efficient.
o It is used to solve drawbacks of Supervised and Unsupervised Learning algorithms.
Disadvantages:
o Iteration results may not be stable.
o We cannot apply these algorithms to network-level data.
o Accuracy is low.
Reinforcement Learning:
Reinforcement learning works on a feedback-based process in which an AI agent (a software component) automatically explores its surroundings by trial and error: taking actions, learning from experience, and improving its performance.
The agent is rewarded for each good action and punished for each bad action; hence the goal of a reinforcement learning agent is to maximize its rewards.
In reinforcement learning there is no labelled data as in supervised learning; agents learn from their experiences only.
The reinforcement learning process is similar to how a human being learns; for example, a child learns various things from experience in day-to-day life. An example of reinforcement learning is playing a game, where the game is the environment, the moves of the agent at each step define states, and the goal of the agent is to get a high score. The agent receives feedback in the form of punishments and rewards.
Because of the way it works, reinforcement learning is employed in fields such as game theory, operations research, information theory, and multi-agent systems.
A reinforcement learning problem can be formalized as a Markov Decision Process (MDP). In an MDP, the agent constantly interacts with the environment and performs actions; after each action, the environment responds and generates a new state.
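The agent-environment loop of an MDP can be sketched with tabular Q-learning, one common reinforcement learning algorithm (the notes do not name a specific one). Everything in the sketch is an assumption made for illustration: a five-state corridor, a reward of 1 for reaching the last state, and hypothetical names such as step and q_learning.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (terminal)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Environment response: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]      # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(1000):                      # safety cap on episode length
            action = rng.choice(ACTIONS)           # purely exploratory behaviour
            next_state, reward, done = step(state, action)
            # Move the estimate towards reward + discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q = q_learning()
# Greedy policy: the action with the highest learned value in each non-terminal state.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Because Q-learning is off-policy, the agent can act randomly while learning and still end up with value estimates under which moving right, towards the reward, is preferred in every state.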
Text Mining
Text mining, one of the major applications of NLP (natural language processing), is now being implemented with the help of reinforcement learning by Salesforce.
Advantages and Disadvantages of Reinforcement Learning
Advantages:
It helps in solving complex real-world problems that are difficult to solve with general techniques.
The learning model of RL is similar to the way human beings learn; hence it can produce highly accurate results.
It helps in achieving long-term results.
Disadvantage:
Supervised and unsupervised learning are two techniques of machine learning, but they are used in different scenarios and with different datasets. The two learning methods are compared in the table below.
Supervised Learning | Unsupervised Learning
Supervised learning algorithms are trained using labelled data. | Unsupervised learning algorithms are trained using unlabelled data.
Supervised learning model takes direct feedback to check if it is predicting correct output or not. | Unsupervised learning model does not take any feedback.
Supervised learning model predicts the output. | Unsupervised learning model finds the hidden patterns in data.
In supervised learning, input data is provided to the model along with the output. | In unsupervised learning, only input data is provided to the model.
The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset.
Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
Supervised learning can be used for those cases where we know the input as well as corresponding outputs. | Unsupervised learning can be used for those cases where we have only input data and no corresponding output data.
Supervised learning model produces an accurate result. | Unsupervised learning model may give less accurate result as compared to supervised learning.
Supervised learning is not close to true Artificial intelligence as in this, we first train the model for each data, and then only it can predict the correct output. | Unsupervised learning is more close to the true Artificial Intelligence as it learns similarly as a child learns daily routine things by his experiences.
Below are some of the key features of a good deep learning framework:
Biological picture:
A lot of the motivation for deep nets came from looking at the brain.
The field of deep learning got started when scientists tried to approximate certain circuits in the brain. For example, the Convolutional Neural Network (ConvNet), a cornerstone of modern computer vision, was inspired by a paper about neurons in the monkey striate cortex. Another example is the field of Reinforcement Learning, one of the hottest areas of AI, which was built on our understanding of how the brain processes rewards.
Skeptics claim that deep learning stretches the brain metaphor too far. Some of their critiques are:
1. Scale: The largest neural nets have around 100 million ‘connections’. The human brain has over
100 trillion connections. With faster GPUs and cluster computing, the size of deep learning
models has increased by orders of magnitude in the past few years. This pattern will probably
continue.
2. Backprop: Some scientists believe that the backpropagation algorithm has no biological
correlate. Others disagree.
3. Architecture: Certain areas of the brain are able to store really complicated memories, perform symbolic operations, and build models of the outside world. These are things that state-of-the-art deep learning cannot do. There has been a push in research to change this.
Mathematical Picture:
These adjustable parameters, often called weights, are real numbers that can be seen as
‘knobs’ that define the input–output function of the machine
Mathematicians will tell you that deep learning is just complicated regression. Regression is the art of finding a function that explains the relationship between an input and an output. The simplest example is from middle school math, where we used two points on a Cartesian plane to find the parameters m and b in f(x) = mx + b.
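The middle school example can be written out directly. This is a small sketch with a hypothetical helper name, line_through, and made-up points; a deep network does the same kind of parameter fitting, only with far more parameters and an iterative procedure.

```python
def line_through(p1, p2):
    """Return slope m and intercept b of the line f(x) = m*x + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line cannot be written as f(x) = m*x + b")
    m = (y2 - y1) / (x2 - x1)   # rise over run
    b = y1 - m * x1             # solve y1 = m*x1 + b for b
    return m, b

m, b = line_through((1, 3), (3, 7))   # m = 2.0, b = 1.0
```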
There are three types of problems that are straightforward to diagnose with regard to poor performance of a deep learning neural network model; they are:
Problems with Learning: Problems with learning manifest in a model that cannot effectively learn a training dataset, or that shows slow progress or bad performance when learning the training dataset.
Problems with Generalization: Problems with generalization manifest in a model that overfits the training dataset and performs poorly on new data.
Problems with Predictions: Problems with predictions manifest in the stochastic training algorithm having a strong influence on the final model, causing a high variance in behavior and performance.
We can summarize techniques that assist with each of these problems as follows:
Better Learning: Techniques that improve or accelerate the adaptation of neural network model weights in response to a training dataset.
Better Generalization: Techniques that improve the response of a neural network model to new data.
Better Predictions: Techniques that reduce the variance in the performance of a final model.
Convolutional layer: in this layer a filter is applied to every part of the image, searching for the same feature everywhere in the image. The layer involves shift, multiply and sum operations. Its purpose is to identify the basic patterns that make up the object in the image. The output of this layer is a new, filtered image (a feature map) in which features are identified. During the training phase, this layer learns the features that are most useful for image classification.
ReLU layer: ReLU stands for rectified linear unit. Once features are extracted, the next step is to pass them to the ReLU layer. This layer performs an element-wise operation, f(x) = max(0, x), which sets all negative values to zero and thereby introduces non-linearity into the network.
Pooling layer: provides down-sampling of the output, which reduces the dimensionality of the feature map. With a typical 2×2 window, the pooling layer reduces each spatial dimension of a feature map by a factor of 2. Two types of pooling can be applied in a CNN: average pooling and max pooling. In average pooling each patch of the feature map is replaced by its average value, and in max pooling each patch is replaced by its maximum value.
These three steps are applied multiple times to find the best features of the data.
Flattening layer: this layer transforms the matrix into a vector form so that it can be fed into a
fully connected NN classifier.
Fully connected layer: at this layer the data is in a one-dimensional structure. This layer classifies the input pattern using the high-level features extracted by the previous layers, and gives the probability that the input belongs to each label. For example, if the image is of a cat, features representing things like whiskers or fur provide high probabilities for the label “cat”. The softmax activation function is used to assign a probability to each label.
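The convolution, ReLU and pooling steps described above can be sketched in plain Python on a tiny image. The 5×5 image, the 2×2 edge filter and the function names are assumptions chosen for illustration; in a real CNN the filter values are learned and a library implementation would be used.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image; multiply and sum at each position (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Element-wise max(0, x): negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping size x size max pooling (ragged edges are dropped)."""
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

# Dark left half, bright right half: a single vertical edge.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]          # responds to dark-to-bright vertical edges

feature_map = conv2d(image, kernel)  # 4x4 map, non-zero only where the edge is
features = max_pool(relu(feature_map))
```

The 5×5 input becomes a 4×4 feature map after convolution and a 2×2 map after pooling, with the strongest response in the column where the edge was found.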
RNN: a Recurrent Neural Network is applied to problems where the data changes over time (sequential data). A single time step of the input is provided to the network at a time. In the first step, the current state is calculated from the current input and the previous state. In the next step, the current state becomes the previous state. Once all time steps are completed, the output is calculated from the final state. The output obtained is then compared with the expected output and the error is calculated. This error is backpropagated through the network to update the weights. In this way the RNN is trained. As the error is propagated back over many time steps, the gradients can shrink towards zero; this is called the vanishing gradient problem. Because of it, a plain RNN effectively remembers only recent steps, but some problems require information from distant states; for this, LSTM is used.
LSTMs have a chain-like structure. An LSTM first identifies unnecessary information to be thrown away from the cell state. This is done by a sigmoid layer called the forget gate layer. The model then identifies the information that is necessary for further processing, using a tanh activation function. After this, the outputs of the previous stages are combined to update the cell state.
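The recurrent state update can be sketched with a scalar RNN cell. This is a forward pass only, with hand-picked weights (w_x, w_h and b are assumptions, not trained values), and it does not implement an LSTM.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0, h0=0.0):
    """Run a one-unit RNN over a sequence and return the state after every time step."""
    h = h0
    states = []
    for x in inputs:
        # Current state from the current input and the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)   # this state becomes the "previous state" of the next step
    return states

# A single pulse at the first time step, followed by silence.
states = rnn_forward([1.0, 0.0, 0.0])
```

The influence of the first input fades with each later step, which hints at why a plain RNN struggles to keep information from distant time steps.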
Autoencoders: an autoencoder is an unsupervised algorithm that uses backpropagation with the desired output set equal to the input. Autoencoders are related to principal component analysis (PCA), but can learn non-linear representations. Training uses unsupervised layer-by-layer pretraining followed by fine-tuning with backpropagation. The compression is lossy. Autoencoders work well on non-linear data and can learn multiple representations of the data. An autoencoder can use convolutional layers to learn features from data instead of dense layers. Autoencoders are widely used in image reconstruction, image colorization, and dimensionality reduction. Hinton et al. reconstructed 784-pixel images using autoencoders with better quality than the principal component analysis technique. An autoencoder is useful only for the kind of data it was trained on; it cannot be applied directly to other applications.
Restricted Boltzmann Machine (RBM): a Restricted Boltzmann Machine is a deep learning technique applied to unlabeled data to build non-linear generative models. An RBM contains two layers, called the visible layer and the hidden layer. Each node of the visible layer is connected to all nodes in the hidden layer, and no nodes are connected to other nodes in the same layer. Training an RBM increases the probability of the training vectors in the visible layer so that it can probabilistically reconstruct the unlabeled data. The energy function (E) of the configuration is used for this.
Stochastic Gradient Descent: When the objective function is convex, gradient descent algorithms with a suitable learning rate find the optimal minimum without getting trapped in a local minimum. Depending on the values of the function and the learning rate (step size), the algorithm may arrive at the optimum along different paths and in different manners.
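The behaviour on a convex function can be sketched as follows. Note the assumption: this is plain (deterministic) gradient descent on a made-up one-dimensional function; the stochastic variant would estimate the gradient from a random sample of the training data at every step.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# f(x) = (x - 3)^2 is convex with its only minimum at x = 3; its gradient is 2(x - 3).
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Starting from a different x0 or using a different learning_rate changes the path taken, but on this convex function the iterates still approach the same minimum as long as the step size is small enough.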
Learning Rate Decay: Adjusting the learning rate can improve the performance and reduce the training time of stochastic gradient descent algorithms. A widely used technique is to reduce the learning rate gradually: large changes are made at the beginning of training, and the learning rate is then lowered as training proceeds. This allows fine-tuning of the weights in the later stages.
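One simple way to reduce the learning rate gradually is time-based decay. The formula and the parameter values below are one assumed choice among several common schedules (step decay and exponential decay are others).

```python
def time_based_decay(initial_lr, decay, epoch):
    """Learning rate after `epoch` epochs: initial_lr / (1 + decay * epoch)."""
    return initial_lr / (1.0 + decay * epoch)

# Learning rates for the first four epochs: large steps first, smaller steps later.
schedule = [time_based_decay(0.1, 0.5, epoch) for epoch in range(4)]
```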
Dropout: The overfitting problem in deep neural networks can be addressed using the dropout technique, which randomly drops units and their connections during training. Dropout is an effective regularization method that reduces overfitting and improves generalization error. It gives improved performance on supervised learning tasks in computer vision, computational biology, document classification, and speech recognition.
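Randomly dropping units can be sketched on a list of activations. The sketch assumes the "inverted dropout" variant, in which the surviving activations are scaled up during training so that nothing needs to change at prediction time; the notes do not say which variant is meant, and the function name is hypothetical.

```python
import random

def dropout(activations, drop_prob, rng, training=True):
    """Zero each activation with probability drop_prob; rescale the survivors."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)          # dropout is switched off at prediction time
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)                    # fixed seed so the run is repeatable
dropped = dropout([1.0] * 10, drop_prob=0.5, rng=rng)
unchanged = dropout([1.0, 2.0], drop_prob=0.5, rng=rng, training=False)
```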
Max-Pooling: In max-pooling a filter size is predefined, and this filter is then applied across non-overlapping sub-regions of the input, taking the maximum of the values contained in the window as the output. Max-pooling reduces dimensionality, and with it the computational cost of learning the parameters of later layers.
Batch Normalization: Batch normalization reduces covariate shift, thereby accelerating the training of deep neural networks. It normalizes the inputs to a layer for each mini-batch as the weights are updated during training. Normalization stabilizes learning and reduces the number of training epochs. The stability of a neural network can be increased by normalizing the output of the previous activation layer.
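The normalization step can be sketched for a mini-batch of scalar activations. The scale gamma, shift beta and the small constant eps are the usual ingredients of batch normalization, but their values here are assumptions, and the running statistics used at prediction time are left out.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise a mini-batch to roughly zero mean and unit variance, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

normalised = batch_norm([2.0, 4.0, 6.0, 8.0])
```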
Skip-gram: Word embeddings can be learned using the skip-gram model. In the skip-gram model, if two vocabulary terms share a similar context, then those terms are treated as similar. For example, the sentences “cats are mammals” and “dogs are mammals” are meaningful sentences that share the context “are mammals”, so “cats” and “dogs” end up with similar embeddings. Skip-gram can be implemented by considering a context window containing n terms and training a neural network to predict the surrounding terms in the window from the term at its centre.
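The training examples for such a model are (centre word, context word) pairs taken from a sliding window. The sketch below only generates these pairs; the embedding network that would be trained on them is omitted, and the function name and window size are assumptions.

```python
def skipgram_pairs(tokens, window=1):
    """Pair every word with each neighbour at most `window` positions away."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("cats are mammals".split(), window=1)
```

Running the same function on "dogs are mammals" yields pairs with the same context words, which is what pushes the embeddings of "cats" and "dogs" towards each other during training.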
Transfer learning: In transfer learning, a model trained on a particular task is reused on another, related task. The knowledge obtained while solving a particular problem can be transferred to another network, which is to be trained on a related problem. This allows for rapid progress and enhanced performance while solving the second problem.
During data flow, input nodes receive data, which travels through the hidden layers and exits through the output nodes. The network contains no links that could send information back from the output nodes.
Feed-forward neural networks serve as the basis for object detection in photos, as in the Google Photos app.
For a network, we need at least two neurons. These neurons transfer information via the synapse between the dendrites of one and the axon terminal of another.
A probable model of an artificial neuron looks like this:
The circles are neurons or nodes, which apply their functions to the data, and the lines/edges connecting them carry the weights/information being passed along. Each column is a layer. The first layer, which receives your data, is the input layer. All the layers between the input layer and the output layer are the hidden layers.
In this model, you have input data, you weight it, and pass it through the function in the neuron, called the threshold function or activation function. Basically, the neuron sums all of the weighted values and compares the sum with a certain value. If the neuron fires a signal, the result is (1); if nothing is fired, it is (0). That output is then weighted and passed along to the next neuron, and the same sort of function is run.
We can have a sigmoid (s-shaped) function as the activation function. As for the weights, they are random to start with, and they are unique per input into the node/neuron. In a typical "feed forward" network, the most basic type of neural network, your information passes straight through the network you created, and you compare the output to what you hoped the output would have been, using your sample data.
From here, you need to adjust the weights to help you get your output to match your desired output. The act of sending data straight through a neural network is called a feed-forward pass. Our data goes from the input, to the layers, in order, then to the output. When we go backwards and begin adjusting weights to minimize loss/cost, this is called back propagation. A deep neural network (DNN) is an ANN with multiple hidden layers between the input and output layers.
Modeled loosely on the human brain, a neural net consists of thousands or even millions of
simple processing nodes that are densely interconnected. Most of today’s neural nets are
organized into layers of nodes, and they’re “feed-forward,” meaning that data moves through them
in only one direction. An individual node might be connected to several nodes in the layer beneath
it, from which it receives data, and several nodes in the layer above it, to which it sends data.
To each of its incoming connections, a node will assign a number known as a “weight.”
When the network is active, the node receives a different data item — a different number — over
each of its connections and multiplies it by the associated weight. It then adds the resulting
products together, yielding a single number. If that number is below a threshold value, the node
passes no data to the next layer. If the number exceeds the threshold value, the node “fires,” which
in today’s neural nets generally means sending the number — the sum of the weighted inputs —
along all its outgoing connections.
When a neural net is being trained, all of its weights and thresholds are initially set to
random values. Training data is fed to the bottom layer — the input layer — and it passes through
the succeeding layers, getting multiplied and added together in complex ways, until it finally
arrives, radically transformed, at the output layer. During training, the weights and thresholds are
continually adjusted until training data with the same labels consistently yield similar outputs.
A typical neuron consists of the following four parts, with the help of which we can explain its working:
Dendrites − They are tree-like branches, responsible for receiving information from the other neurons the neuron is connected to. In a sense, they are like the ears of the neuron.
Soma − It is the cell body of the neuron and is responsible for processing the information received from the dendrites.
Axon − It is like a cable through which the neuron sends information.
Synapses − They are the connections between the axon and the dendrites of other neurons.
Activation Function:
An activation function defines the output of a node given an input or set of inputs. It basically decides whether to activate or deactivate a neuron to get the desired output. It also performs a nonlinear transformation on the input to get better results in a complex neural network.
Some activation functions also normalize the output of any input to a bounded range, such as -1 to 1. An activation function must be efficient and should reduce computation time, because neural networks are sometimes trained on millions of data points.
In any neural network, the activation function basically decides whether a given input or piece of received information is relevant or irrelevant.
A neuron basically computes a weighted sum of its inputs plus a bias; this sum is then passed through an activation function to get an output.
Y = ∑ (weights*input) + bias
Here Y can be anything in the range -infinity to +infinity, so we have to bound the output to get the desired prediction or generalized results.
Y = Activation function (∑ (weights*input) + bias)
So, we pass the neuron's weighted sum to an activation function to bound the output values.
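The two formulas above can be sketched as a single artificial neuron. The sigmoid is used here as one possible activation function, and the input values, weights and bias are made up for the example.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0, and sigmoid(0) is 0.5.
y = neuron([1.0, 2.0], [0.5, -0.25], bias=0.0)
```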
Without an activation function, the weights and biases would only perform a linear transformation, and the neural network would be just a linear regression model. A linear equation is a polynomial of degree one, which is simple to solve but limited in its ability to model complex problems or higher-degree polynomials.
In contrast, adding activation functions to a neural network applies a non-linear transformation to the input and makes the network capable of solving complex problems such as language translation and image classification.
In addition, activation functions are differentiable, which makes it possible to implement backpropagation, the optimization strategy that computes the gradients of the loss function in the neural network.
In this function, we set the threshold value to 0. It is very simple and useful for binary classification problems.
2) Linear Function
It is a simple straight-line activation function, where the output is directly proportional to the weighted sum of the neuron's inputs. Linear activation functions give a wide range of activations, and a line with a positive slope increases the firing rate as the input rate increases.
Y = mZ
We use the Leaky ReLU function instead of ReLU to avoid this problem: in Leaky ReLU the output range is expanded to include small negative values, which enhances the performance.
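The difference between the two functions is easiest to see side by side. The slope of 0.01 for negative inputs is a commonly used default and is an assumption here, not a value given in these notes.

```python
def relu(z):
    """Standard ReLU: negative inputs are cut to zero."""
    return max(0.0, z)

def leaky_relu(z, slope=0.01):
    """Leaky ReLU: negative inputs keep a small, non-zero output."""
    return z if z > 0 else slope * z

negative_relu = relu(-2.0)          # 0.0, so no signal passes for negative inputs
negative_leaky = leaky_relu(-2.0)   # a small negative value instead of zero
```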
5) Sigmoid Activation Function
The sigmoid activation function is widely used because it does its task efficiently. It is basically a probabilistic approach to decision making, with outputs ranging between 0 and 1. When we have to make a decision or predict an output, we use this activation function because its small output range can be interpreted as a probability.
7) Softmax Activation Function
Softmax is used mainly at the last layer, i.e. the output layer, for decision making, in the same way as the sigmoid activation. Softmax assigns each class a value according to its score, and these values sum to one.
For binary classification, both sigmoid and softmax are equally applicable, but for multi-class classification problems we generally use softmax, together with the cross-entropy loss.
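Softmax and the cross-entropy loss can be sketched as follows. The three scores are made-up output-layer values for a three-class problem; subtracting the largest score before exponentiating is a standard trick to avoid overflow and does not change the result.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Loss for one example: negative log-probability given to the correct class."""
    return -math.log(probs[true_index])

probs = softmax([2.0, 1.0, 0.1])
loss_if_class_0 = cross_entropy(probs, 0)   # small: class 0 got the highest probability
loss_if_class_2 = cross_entropy(probs, 2)   # larger: class 2 got the lowest probability
```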
Back propagation in Neural Networks: The principle behind the back propagation algorithm is to reduce the error caused by the randomly initialized weights and biases so that the network produces the correct output. The system is trained with the supervised learning method, where the error between the system’s output and a known expected output is presented to the system and used to modify its internal state. We need to update the weights so that the loss approaches its global minimum. This is how back propagation in neural networks works.
Backpropagation is an algorithm commonly used to train neural networks. When
the neural network is initialized, weights are set for its individual elements, called neurons.
Inputs are loaded, they are passed through the network of neurons, and the network
provides an output for each one, given the initial weights. Backpropagation helps to adjust
the weights of the neurons so that the result comes closer and closer to the known true
result.
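The weight adjustment can be sketched for the smallest possible case: a single sigmoid neuron trained on the logical OR function. This is a simplification; a real network applies the same chain-rule step layer by layer. The squared-error loss, the learning rate, the zero initial weights (chosen instead of random ones so the run is repeatable) and the number of epochs are all assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Logical OR as a toy labelled dataset: (inputs, expected output).
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]

def train(epochs=2000, lr=1.0):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x, target in DATA:
            y = sigmoid(w[0] * x[0] + w[1] * x[1] + b)   # forward pass
            # Backward pass for loss = 0.5 * (y - target)^2, using the chain rule:
            # dL/dz = (y - target) * sigmoid'(z), where sigmoid'(z) = y * (1 - y).
            delta = (y - target) * y * (1.0 - y)
            grad_w[0] += delta * x[0]
            grad_w[1] += delta * x[1]
            grad_b += delta
        # Move each parameter a small step against its gradient.
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b

w, b = train()
predictions = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for x, _ in DATA]
```

After training, the output for (0, 0) falls below 0.5 and the outputs for the other three inputs rise above it, so the neuron's results have moved towards the known true results.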