DL Unit-1

UNIT-1
Introduction: Various paradigms of learning problems, Perspectives and Issues in deep
learning framework, review of fundamental learning techniques. Feed forward neural
network: Artificial Neural Network, activation function, multi-layer neural network.
Deep Learning:
Deep learning algorithms are especially useful when there is a large number of inputs and
outputs. Deep learning models can learn the relevant features themselves with little guidance
from the programmer, and they are very helpful in dealing with the problem of dimensionality.
Deep learning is a branch of machine learning, which is itself a subset of artificial
intelligence. Just as the idea behind artificial intelligence is to mimic human behaviour,
the idea of deep learning is "to build algorithms that can mimic the brain", since the neural
networks it relies on are loosely modelled on the human brain.
Deep learning is implemented with deep networks, which are neural networks with multiple
hidden layers.
In the example given above, the raw image data is fed to the input layer. The input layer
picks up patterns of local contrast, that is, it differentiates on the basis of colour,
luminosity, and so on. The first hidden layer then detects facial features such as the eyes,
nose, and lips, and places those features on a face template. The second hidden layer
identifies the face itself, as can be seen in the above image, and passes the result to the
output layer. More hidden layers can be added to solve more complex problems, for example
to find a particular kind of face with a given complexion. In general, as the number of
hidden layers increases, the network is able to solve more complex problems.
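The idea of stacking hidden layers can be sketched in a few lines of Python. This is only an illustration: the layer sizes, the weight values, and the choice of a sigmoid activation are assumptions made for the example, not part of the face-recognition network described above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer: each output is sigmoid(w . x + b)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """Pass the input through every layer in order."""
    activation = inputs
    for weights, biases in layers:
        activation = dense(activation, weights, biases)
    return activation

# Hypothetical network: 3 inputs -> 2 hidden units -> 2 hidden units -> 1 output.
layers = [
    ([[0.2, -0.1, 0.4], [0.5, 0.3, -0.2]], [0.0, 0.1]),  # hidden layer 1
    ([[0.7, -0.6], [0.1, 0.9]], [0.0, 0.0]),             # hidden layer 2
    ([[1.0, -1.0]], [0.0]),                              # output layer
]
output = forward([1.0, 0.5, -1.0], layers)
print(output)
```

Making the network deeper only means appending more (weights, biases) pairs to the list.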
Various paradigms of learning problems:
Machine learning is a subset of AI that enables a machine to learn automatically from data,
improve its performance from past experience, and make predictions. It comprises a set of
algorithms that work on large amounts of data. Data is fed to these algorithms to train them,
and on the basis of that training they build a model and perform a specific task.
These ML algorithms help to solve different business problems such as regression,
classification, forecasting, clustering, and association.
Based on the method and way of learning, machine learning is divided into four main types,
which are:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
Supervised Learning:
• Supervised learning is a type of machine learning that uses labelled data to train machine
learning models.
• In labelled data, the output is already known. The model just needs to map the inputs to the
respective outputs.
• It is the easiest type to understand and the simplest to implement. It is very similar to
teaching a child with flash cards.
• An example of supervised learning is training a system to identify the animal in an image.
• These algorithms learn from the past data that is supplied, called training data, analyse
it, and use this analysis to predict the outcome for new data within the known
classifications.
• The algorithm can be trained further by comparing its outputs on the training data with the
actual ones and using the errors to adjust the model.
• The primary objective of the supervised learning technique is to map the input
variable (a) to the output variable (b).
Supervised machine learning is further classified into two broad categories:
1. Classification: Classification algorithms address problems where the output variable is
categorical; for example, yes or no, true or false, male or female, etc.
Real-world applications of this category include spam detection and email filtering.
Some known classification algorithms include the Random Forest Algorithm,
Decision Tree Algorithm, Logistic Regression Algorithm, and Support Vector Machine
Algorithm.
2. Regression: Regression algorithms handle problems where the output variable is
continuous. In the simplest case the input and output variables are assumed to have a linear
relationship, but this is not required in general. Examples include weather prediction,
market trend analysis, etc. Popular regression algorithms include the Simple Linear
Regression Algorithm, Multivariate Regression Algorithm, Decision Tree Algorithm, and
Lasso Regression.
For example, consider an input dataset of parrot and crow images. Initially, the machine is
trained to understand the pictures, including the parrot and crow’s colour, eyes, shape, and size.
Post-training, an input picture of a parrot is provided, and the machine is expected to identify the
object and predict the output. The trained machine checks for the various features of the object,
such as colour, eyes, shape, etc., in the input picture, to make a final prediction. This is the process
of object identification in supervised machine learning.
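The parrot and crow example can be sketched as a tiny classifier. The feature values below are invented for illustration, and the 1-nearest-neighbour rule is chosen only because it is short; it is not one of the algorithms listed above and is not claimed to be how an image classifier is actually built.

```python
import math

# Hypothetical labelled training data: (colour_score, size_cm) -> label.
# The numbers are invented for illustration only.
training_data = [
    ((0.9, 30.0), "parrot"),
    ((0.8, 33.0), "parrot"),
    ((0.1, 45.0), "crow"),
    ((0.2, 48.0), "crow"),
]

def predict(features, data):
    """Return the label of the closest training example (1-nearest neighbour)."""
    nearest = min(data, key=lambda item: math.dist(features, item[0]))
    return nearest[1]

print(predict((0.85, 31.0), training_data))  # parrot
```

The labelled pairs play the role of the training images, and `predict` maps a new input to one of the known classes. In practice the features would be scaled first, since here the size value dominates the distance.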
Advantages and Disadvantages of Supervised Learning
Advantages:
o Since supervised learning works with a labelled dataset, we can have an exact idea about
the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior experience.
Disadvantages:
o These algorithms may struggle with very complex tasks.
o It may predict the wrong output if the test data is different from the training data.
o It requires a lot of computation time to train the algorithm.
Applications of Supervised Learning

Some common applications of Supervised Learning are given below:
Image Segmentation: Supervised learning algorithms are used in image segmentation. In this
process, image classification is performed on different image data with pre-defined labels.
Medical Diagnosis: Supervised algorithms are also used in the medical field for diagnosis
purposes. This is done by using medical images and past data labelled with disease
conditions. With such a process, the machine can identify a disease for new patients.
Fraud Detection: Supervised learning classification algorithms are used for identifying
fraudulent transactions, fraudulent customers, etc. This is done by using historic data to
identify the patterns that can point to possible fraud.
Spam Detection: In spam detection and filtering, classification algorithms are used. These
algorithms classify an email as spam or not spam. The spam emails are sent to the spam folder.
Speech Recognition: Supervised learning algorithms are also used in speech recognition. The
algorithm is trained with voice data, and it can then be used for tasks such as
voice-activated passwords, voice commands, etc.
Unsupervised Machine Learning:

• Unsupervised learning is a type of machine learning in which models are trained on an
unlabelled dataset and are allowed to act on that data without any supervision.
• The goal of unsupervised learning is to find the underlying structure of the dataset, group
the data according to similarities, and represent the dataset in a compressed format.
• An unsupervised learning algorithm aims to group the unsorted dataset based on the input's
similarities, differences, and patterns.
• In unsupervised learning, there is no instructor or teacher, and the algorithm must learn to
make sense of the data without this guide.
• Unsupervised learning is helpful for finding useful insights from the data.
• Unsupervised learning is similar to the way a human learns to think from their own
experiences, which makes it closer to real AI.
• Unsupervised learning works on unlabelled and uncategorized data, which makes it all the
more important.
• In the real world, we do not always have input data with the corresponding output, and to
solve such cases we need unsupervised learning.
Unsupervised machine learning is further classified into two types:

Clustering: The clustering technique refers to grouping objects into clusters based on parameters
such as similarities or differences between objects. For example, grouping customers by the
products they purchase.
Some known clustering algorithms include the K-Means Clustering Algorithm, Mean-Shift
Algorithm, and DBSCAN Algorithm. Principal Component Analysis and Independent Component
Analysis are often listed alongside them, although strictly speaking they are
dimensionality-reduction techniques rather than clustering algorithms.
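As a rough illustration of clustering, the following sketch runs K-Means on one-dimensional data. The spend values and the starting centroids are assumptions made for the example; real implementations choose the initial centroids more carefully and work on many features at once.

```python
def k_means_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data with caller-supplied initial centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical customer spend values forming two obvious groups.
spend = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(spend, centroids=[0.0, 5.0])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

No labels are supplied anywhere; the two groups emerge from the distances between the points alone.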
Association: Association learning refers to identifying typical relations between the variables of a
large dataset. It determines the dependency of various data items and maps associated variables.
Typical applications include web usage mining and market data analysis.
Popular association rule learning algorithms include the Apriori Algorithm, Eclat Algorithm,
and FP-Growth Algorithm.
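Association rules are usually scored with two quantities, support and confidence. The sketch below computes both on a handful of invented shopping baskets; it shows only the scoring step, not the candidate search that Apriori, Eclat, or FP-Growth perform.

```python
# Hypothetical market-basket transactions, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # 2 of the 3 bread baskets contain milk
```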
Advantages of Unsupervised Learning

• Unsupervised learning can be used for tasks where supervised learning is not applicable,
because it does not need labelled input data.
• Unsupervised learning is preferable when labels are scarce, as it is easier to get
unlabelled data than labelled data.
Disadvantages of Unsupervised Learning
• Unsupervised learning is intrinsically more difficult than supervised learning, as it does
not have corresponding outputs to learn from.
• The result of an unsupervised learning algorithm might be less accurate, as the input data
is not labelled and the algorithm does not know the exact output in advance.
Applications of Unsupervised Learning
• Network Analysis: Unsupervised learning is used for identifying plagiarism and copyright
issues through document network analysis of the text of scholarly articles.
• Recommendation Systems: Recommendation systems widely use unsupervised learning
techniques to build recommendation features for web applications and e-commerce websites.
• Anomaly Detection: Anomaly detection is a popular application of unsupervised learning,
which can identify unusual data points within the dataset. It is used to discover fraudulent
transactions.
• Singular Value Decomposition: Singular Value Decomposition, or SVD, is used to extract
particular information from a database, for example information about each user
located at a particular location.
Semi-Supervised Learning:
• Semi-supervised learning is a type of machine learning that lies between
supervised and unsupervised machine learning.
• It represents the intermediate ground between supervised learning (with labelled training
data) and unsupervised learning (with no labelled training data), and uses a combination
of labelled and unlabelled datasets during the training period.
• The concept of semi-supervised learning was introduced to overcome the drawbacks of
supervised and unsupervised learning algorithms. The main aim of semi-supervised learning
is to use all the available data effectively, rather than only the labelled data as in
supervised learning.
• Initially, similar data is clustered with an unsupervised learning algorithm, and
the clusters then help to assign labels to the unlabelled data.
• Consider the example of a college student.
A student learning a concept under a teacher's supervision in college is supervised
learning.
In unsupervised learning, a student learns the same concept alone at home without a
teacher's guidance. A student revising the concept after first learning it under the
direction of a teacher in college is a semi-supervised form of learning.
Advantages and disadvantages of Semi-supervised Learning
Advantages:
o The algorithm is simple and easy to understand.
o It is highly efficient.
o It addresses drawbacks of supervised and unsupervised learning algorithms.
Disadvantages:
o Iteration results may not be stable.
o We cannot apply these algorithms to network-level data.
o Accuracy can be low.
Reinforcement Learning:
• Reinforcement learning works on a feedback-based process in which an AI agent (a
software component) automatically explores its surroundings by trial and error: taking
actions, learning from experience, and improving its performance.
• The agent is rewarded for each good action and punished for each bad action; hence the
goal of a reinforcement learning agent is to maximize the rewards.
• In reinforcement learning there is no labelled data as in supervised learning, and agents
learn from their experiences only.
• The reinforcement learning process is similar to the way a human being learns;
for example, a child learns various things through experience in day-to-day life. An
example of reinforcement learning is playing a game, where the game is the environment,
the moves of the agent at each step define the states, and the goal of the agent is to get a
high score. The agent receives feedback in terms of punishments and rewards.
Due to its way of working, reinforcement learning is employed in different fields such
as game theory, operations research, information theory, and multi-agent systems.
• A reinforcement learning problem can be formalized as a Markov Decision
Process (MDP). In an MDP, the agent constantly interacts with the environment and performs
actions; after each action, the environment responds and generates a new state.
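The agent-environment loop of an MDP can be sketched with tabular Q-learning on a toy problem. Everything specific here is an assumption made for illustration: the five-state corridor, the reward of 1 at the right-hand end, the discount factor, the learning rate, and the purely random exploration policy. Q-learning is used as one common algorithm; the notes above do not name a particular one.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right
GAMMA, ALPHA = 0.9, 0.5

def step(state, action):
    """Environment: return (next_state, reward) for a move in the corridor."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)              # explore with random moves
            next_state, reward = step(state, action)  # environment responds
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q

q = train()
# Greedy policy: in each non-goal state pick the action with the highest value.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy moves right in every state, because that is the shortest route to the only reward.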
Categories of Reinforcement Learning

Reinforcement learning is categorized mainly into two types of methods/algorithms:
• Positive Reinforcement Learning: Positive reinforcement increases the tendency that the
required behaviour will occur again by adding something. It strengthens the behaviour of
the agent and has a positive impact on it.
• Negative Reinforcement Learning: Negative reinforcement works in the opposite way to
positive RL. It also increases the tendency that the specific behaviour will occur again,
but it does so by removing or avoiding a negative condition rather than by adding a reward.
Real-world Use cases of Reinforcement Learning

• Video games:
RL algorithms are very popular in gaming applications, where they are used to reach
super-human performance. Well-known systems that use RL algorithms are AlphaGo and
AlphaGo Zero, which play the board game Go.
• Resource Management:
The "Resource Management with Deep Reinforcement Learning" paper showed how RL can be
used to automatically learn to allocate and schedule computer resources among waiting jobs
in order to minimize average job slowdown.
• Robotics:
RL is widely used in robotics applications. Robots are used in industrial and
manufacturing settings, and these robots are made more capable with reinforcement learning.
Different industries have their own vision of building intelligent robots using AI
and machine learning technology.
• Text Mining:
Text mining, one of the great applications of NLP (natural language processing), has been
implemented with the help of reinforcement learning by the company Salesforce.
Advantages and Disadvantages of Reinforcement Learning
Advantages:
• It helps in solving complex real-world problems that are difficult to solve with general
techniques.
• The learning model of RL is similar to the way human beings learn, so it can reach very
accurate results on suitable problems.
• It helps in achieving long-term results.
Disadvantages:
• RL algorithms are not preferred for simple problems.
• RL algorithms require huge amounts of data and computation.
• Too much reinforcement learning can lead to an overload of states, which can weaken the
results.
Difference between Supervised and Unsupervised Learning:
Supervised and unsupervised learning are two techniques of machine learning, but they are
used in different scenarios and with different datasets. The table below sets out the
differences between the two learning methods.

Supervised Learning | Unsupervised Learning
Supervised learning algorithms are trained using labelled data. | Unsupervised learning algorithms are trained using unlabelled data.
Supervised learning model takes direct feedback to check whether it is predicting the correct output. | Unsupervised learning model does not take any feedback.
Supervised learning model predicts the output. | Unsupervised learning model finds the hidden patterns in data.
In supervised learning, input data is provided to the model along with the output. | In unsupervised learning, only input data is provided to the model.
The goal of supervised learning is to train the model so that it can predict the output when it is given new data. | The goal of unsupervised learning is to find the hidden patterns and useful insights from the unknown dataset.
Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
Supervised learning can be categorized into Classification and Regression problems. | Unsupervised learning can be classified into Clustering and Association problems.
Supervised learning can be used for those cases where we know the input as well as the corresponding outputs. | Unsupervised learning can be used for those cases where we have only input data and no corresponding output data.
Supervised learning model generally produces a more accurate result. | Unsupervised learning model may give a less accurate result as compared to supervised learning.
Supervised learning is not close to true Artificial Intelligence, as we first have to train the model on labelled data and only then can it predict the correct output. | Unsupervised learning is closer to true Artificial Intelligence, as it learns in the way a child learns daily routine things from experience.
It includes various algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision Tree, Bayesian Logic, etc. | It includes various algorithms such as clustering (for example K-Means) and the Apriori algorithm.
Perspectives and Issues in deep learning framework:

A deep learning framework is an interface, library, or tool that allows us to build deep
learning models more easily and quickly, without getting into the details of the underlying
algorithms. Frameworks provide a clear and concise way of defining models using a
collection of pre-built and optimized components.
Below are some of the key features of a good deep learning framework:
1. Optimized for performance
2. Easy to understand and code
3. Good community support
4. Parallelizes processes to reduce computation time
5. Automatically computes gradients
Biological picture:
A lot of the motivation for deep nets came from looking at the brain.
The field of deep learning got started when scientists tried to approximate certain circuits
in the brain. For example, the Convolutional Neural Network (ConvNet), a cornerstone of
modern computer vision, was inspired by a paper about neurons in the monkey striate
cortex. Another example is the field of Reinforcement Learning, a very active area of AI,
which was built on our understanding of how the brain processes rewards.
The architecture of neural networks is inspired by connections between neurons.
Skeptics claim that deep learning stretches the brain metaphor too far. Some of their critiques are:
1. Scale: When this critique was written, the largest neural nets had around 100 million
'connections', while the human brain has over 100 trillion connections. With faster GPUs and
cluster computing, the size of deep learning models has increased by orders of magnitude in
the past few years. This pattern will probably continue.
2. Backprop: Some scientists believe that the backpropagation algorithm has no biological
correlate. Others disagree.
3. Architecture: Certain areas of the brain are able to store really complicated memories, perform
symbolic operations, and build models of the outside world. These are things that
state-of-the-art deep learning cannot do. There has been a push in research to change this.
Mathematical Picture:
The adjustable parameters of a model, often called weights, are real numbers that can be
seen as 'knobs' that define the input-output function of the machine.
Mathematicians will tell you that deep learning is just complicated regression. Regression
is the art of finding a function that explains the relationship between an input and an output. The
simplest example is from middle school math, when we used two points on a Cartesian plane to
find the variables m and b in f(x) = mx + b.
Deep learning as a type of regression (images from 2015 ImageNet paper)

Deep learning is the same thing except that f(x) can be arbitrarily complex and nonlinear. The
input could be the pixels of an image and the output a caption which describes the image. The
input could be a sentence in French and the output a sentence in English. Obviously,
functions that perform these mappings must be extremely complicated. In fact, deep learning
models often have millions of free parameters instead of just an m and a b. That said, it's just
regression.
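The middle-school case can be written out directly. The sketch below recovers m and b from two points and, as a small step towards regression proper, fits the same line to several points by ordinary least squares. The sample points are invented for the example.

```python
def line_through(p1, p2):
    """Return slope m and intercept b of f(x) = m*x + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)   # assumes x1 != x2 (the line is not vertical)
    b = y1 - m * x1
    return m, b

def least_squares_line(points):
    """Fit f(x) = m*x + b to many points by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    m = covariance / variance
    return m, mean_y - m * mean_x

print(line_through((1.0, 3.0), (3.0, 7.0)))                        # (2.0, 1.0)
print(least_squares_line([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]))    # (2.0, 1.0)
```

A deep network plays the same game with millions of parameters in place of m and b, and an iterative optimiser in place of the closed-form solution.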
Computer Science Picture:

The tech industry. Tech companies love to build hype around vague keyphrases such as
‘cloud’, ‘big data’, and ‘machine learning.’ Inevitably, ‘deep learning’ will join the list. There are
really two approaches to deep learning in tech. The first says: this gets great results and makes us
a profit – let’s pour a ton of money into it. The second says: this is the cure-all, next-big-thing,
singularity-inducing pinnacle of computer science - let’s pour a ton of money into it. There are a
few notable exceptions.
There are three types of problems that are straightforward to diagnose with regard to poor
performance of a deep learning neural network model; they are:
• Problems with Learning: Problems with learning manifest in a model that cannot
effectively learn a training dataset, or that shows slow progress or bad performance when
learning the training dataset.
• Problems with Generalization: Problems with generalization manifest in a model that
overfits the training dataset and performs poorly on a holdout dataset.
• Problems with Predictions: Problems with predictions manifest when the stochastic training
algorithm has a strong influence on the final model, causing a high variance in behaviour
and performance.
We can summarize techniques that assist with each of these problems as follows:
• Better Learning: Techniques that improve or accelerate the adaptation of neural network
model weights in response to a training dataset.
• Better Generalization: Techniques that improve the performance of a neural network
model on a holdout dataset.
• Better Predictions: Techniques that reduce the variance in the performance of a final
model.
Review of fundamental learning techniques:

Deep learning is a subarea of machine learning that emerged from the concept of the neural
network. A neural network is inspired by, and loosely resembles, the human nervous system
and the structure of the brain. An artificial neural network with a single hidden layer is a
shallow network; as the number of hidden layers increases the network gets deeper, and it is
then referred to as a deep neural network (DNN). Deep learning can be applied to almost every
machine learning problem.
The layers in a DNN fall into three broad categories: the input layer, multiple hidden layers,
and the output layer. The input layer feeds the input to the network, the hidden layers process
that input, and the output layer produces the output of the system. The hidden layers build a
distributed representation of the input that captures the main driving variables of the data.
The working of a deep learning model is divided into two phases, training and testing. In the
training phase the model parameters are initialised with random numbers, and some models are
additionally pre-trained. In each iteration the training data is read and processed, and the
training error is calculated by comparing the obtained output with the expected output. The
parameters are then updated according to the training error through error backpropagation.
After each iteration a check is made to decide whether the termination conditions of the
iterative training are met or whether training should continue. In the testing phase the
trained model is evaluated on data it did not see during training.
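The training phase described above can be sketched as a loop. To keep the example short it trains a single sigmoid neuron on the logical AND function, so the backward pass is one application of the chain rule rather than backpropagation through several layers. The dataset, learning rate, tolerance, and epoch limit are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(data, learning_rate=1.0, max_epochs=5000, tolerance=0.01, seed=0):
    rng = random.Random(seed)
    # Step 1: initialise the parameters with small random numbers.
    weights = [rng.uniform(-0.5, 0.5) for _ in data[0][0]]
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(max_epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        squared_error = 0.0
        for inputs, target in data:
            # Step 2: forward pass, then compare obtained and expected output.
            output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = output - target
            squared_error += error ** 2
            # Step 3: backward pass. For a sigmoid output with cross-entropy
            # loss, the gradient w.r.t. the pre-activation is (output - target).
            for i, x in enumerate(inputs):
                grad_w[i] += error * x
            grad_b += error
        # Step 4: update the parameters against the averaged gradient.
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(data)
        # Step 5: stop once the mean squared training error is small enough.
        if squared_error / len(data) < tolerance:
            break
    return weights, bias

and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_neuron(and_gate)
for inputs, target in and_gate:
    output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
    print(inputs, target, round(output, 3))
```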
Techniques of deep learning:

CNN: A convolutional neural network, also known as a ConvNet, is a feedforward neural network
that is generally used to analyse static data such as images. A CNN is built from three main
kinds of layer: convolutional, pooling, and fully connected. In image identification, a CNN
takes the input image, processes it, and classifies it into a certain category, e.g. cat, dog,
or tiger. In a computer, an image is stored as an array of pixels. Like traditional machine
learning models, a CNN has to be trained on data to solve a particular problem. First the
architecture of the network is decided: how many layers are used, how they are arranged,
which layer types to use, and how many neurons each layer has. Well-known CNN architectures
include AlexNet, GoogLeNet, Inception-ResNet, and VGG. Once the network architecture is
decided, the weights and biases of the network are initialised. At first these are selected
randomly, and they are then adjusted through backpropagation. The objective of this phase is
to find the best possible values of the network parameters, and hence the best data features,
so that new data can be identified accurately. For example, when we build a classifier for
cats and dogs, we look for parameters that give a probability for "dog" of 1, or at least
higher than that for "cat", on every dog image, and a probability for "dog" of 0, or at least
lower than that for "cat", on every cat image. The whole CNN process is divided into two
parts, feature learning and classification. In feature learning, three steps are performed
many times to detect different features: the convolution operation, ReLU, and pooling. In
classification, the image is assigned to a class; this phase consists of three operations:
flattening, fully connected layers, and softmax.
Convolutional layer: In this layer a filter is applied to every part of the image, searching
for the same feature everywhere in the image. The layer involves shift, multiply, and sum
operations. Its purpose is to identify the basic patterns of which the object in the image is
made up. The output of this layer is a new, filtered image (a feature map) in which the
features are identified. In the training phase this layer learns the filters that extract the
most useful features for image classification.
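The shift, multiply, and sum operations can be sketched in plain Python. The 4x4 image and the 2x2 vertical-edge filter below are invented for the example, and the function uses no padding and a stride of 1; in a trained CNN the filter values are learned rather than written by hand.

```python
def convolve2d(image, kernel):
    """Valid (no padding, stride 1) 2-D cross-correlation, as used in CNN layers."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):          # shift the filter down
        row = []
        for c in range(out_cols):      # shift the filter right
            # Multiply the overlapping values element-wise and sum them.
            total = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(k_rows) for j in range(k_cols))
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4x4 image with a vertical edge, and a vertical-edge filter.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the column where the dark-to-light edge sits, which is the sense in which the filter "finds" that feature everywhere in the image.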
ReLU layer: ReLU stands for rectified linear unit. Once features are extracted, the next step
is to pass them through the ReLU layer. This layer performs an element-wise operation,
max(0, x): it sets all negative values to zero and so introduces non-linearity into the
network. ReLU is a different activation function from the sigmoid. When a feature map is
displayed as an image, applying ReLU amounts to removing the negative (black) regions.
Pooling layer: This layer down-samples its input, which reduces the dimensionality of the
feature map. With the common choice of a 2x2 window and a stride of 2, pooling halves the
width and height of each feature map. Two types of pooling can be applied in a CNN, average
pooling and max pooling. In average pooling each window of the feature map is replaced by its
average value, and in max pooling each window is replaced by its maximum value.
These three steps are applied multiple times to find the best features of the data.
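ReLU and 2x2 pooling are small enough to write out in full. The feature-map values below are invented, and the pooling function assumes non-overlapping 2x2 windows on a map with even width and height.

```python
def relu(feature_map):
    """Set every negative value to zero, element by element."""
    return [[max(0, v) for v in row] for row in feature_map]

def average(values):
    return sum(values) / len(values)

def pool2x2(feature_map, reducer=max):
    """Non-overlapping 2x2 pooling; pass reducer=average for average pooling."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            window = [feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            row.append(reducer(window))
        pooled.append(row)
    return pooled

# Hypothetical 4x4 feature map produced by a convolutional layer.
feature_map = [
    [1, -2, 3, 0],
    [-1, 5, -6, 2],
    [0, 1, 2, -3],
    [4, -1, 0, 1],
]
activated = relu(feature_map)
print(pool2x2(activated))                   # [[5, 3], [4, 2]]
print(pool2x2(activated, reducer=average))  # [[1.5, 1.25], [1.25, 0.75]]
```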
Flattening layer: This layer transforms the matrix into a vector so that it can be fed into a
fully connected NN classifier.
Fully connected layer: At this layer the data is in a one-dimensional structure. This layer
classifies the input pattern using the high-level features extracted by the previous layers.
It gives the probability that a certain feature belongs to a label. For example, if the image
is of a cat, features representing things like whiskers or fur give high probabilities for
the label "cat". The softmax activation function is used to assign a probability to each label.
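A minimal sketch of the softmax step follows. The three labels come from the earlier cat, dog, tiger example; the raw scores are invented, and subtracting the maximum score is a standard numerical-stability trick that does not change the result.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

labels = ["cat", "dog", "tiger"]
scores = [2.0, 1.0, 0.1]   # hypothetical outputs of the fully connected layer
probabilities = softmax(scores)
predicted = labels[probabilities.index(max(probabilities))]
print(predicted)  # cat
```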
RNN: A recurrent neural network is applied to problems where the data changes over time. The
input is provided to the network one time step at a time. In the first step the current state
is calculated from the current input and the previous state. In the next step the current
state becomes the previous state. Once all time steps are completed, the output is calculated
from the final state. The output obtained is then compared with the expected output and the
error is calculated. This error is backpropagated through the network to update the weights
and other parameters. In this way the RNN is trained. The process of calculating the error and
updating the weights across the time steps is called backpropagation through time. During this
process the gradients can shrink towards zero as they are passed back over many steps, which is
known as the vanishing gradient problem. Because of it, a plain RNN in practice remembers only
recent steps, but some problems require information from long-distant states; for this the
LSTM (long short-term memory) network is used.
An LSTM has a chain-like structure. The LSTM first identifies unnecessary information, which is
thrown away from the cell state. This is done by a sigmoid layer called the forget gate. The
model then identifies the information that is necessary for further processing; the candidate
values for this are produced with the tanh activation function. Finally, the outputs of these
stages are combined to update the cell state.
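The recurrence at the heart of a plain RNN can be sketched with scalars. The weights, the tanh activation, and the input sequence are assumptions made for illustration; a real RNN uses weight matrices and vector states, and an LSTM adds the gates described above.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Scalar RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h = h0
    states = []
    for x in inputs:                           # one time step of input at a time
        h = math.tanh(w_x * x + w_h * h + b)   # current state from input and previous state
        states.append(h)                       # this state is "previous" on the next step
    return states

states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0)
print(states)
```

With a recurrent weight below 1, the influence of the first input fades a little more at every step, which gives a feel for why long-range information is hard for a plain RNN to keep.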
Autoencoders: An autoencoder is an unsupervised algorithm that uses backpropagation with the
desired output set equal to the input. Autoencoders are related to principal component
analysis (PCA), but unlike PCA they can model non-linear structure in the data. Deep
autoencoders can be trained with unsupervised layer-by-layer pretraining followed by
fine-tuning with backpropagation. The compression an autoencoder performs is lossy. An
autoencoder can learn multiple representations of the data. Autoencoders can use convolutional
layers instead of dense layers to learn features from image data. They are widely used in
image reconstruction, image colorization, and dimensionality reduction. Hinton et al. reported
reconstructions of 784-pixel images with autoencoders that were better than those obtained
with principal component analysis. An autoencoder is data-specific: it works only on data
similar to what it was trained on and cannot simply be applied to another application.
Restricted Boltzmann Machine (RBM): A Restricted Boltzmann Machine is a deep learning
technique applied to unlabeled data to build non-linear generative models. An RBM contains two
layers, called the visible layer and the hidden layer. Each node of the visible layer is
connected to all nodes in the hidden layer, and no nodes are connected to other nodes in the
same layer. Training an RBM increases the probability that the model assigns to the visible
vectors in the training data, so that it can probabilistically reconstruct the unlabeled data.
The energy function E of a configuration is used for this.
Deep Learning Methods:

Some of the powerful techniques that can be applied to deep learning algorithms to reduce
the training time and to optimize the model are discussed in the following section.
Back propagation: When solving an optimization problem with a gradient-based method,
backpropagation can be used to calculate the gradient of the function at each iteration.
Stochastic Gradient Descent: Stochastic gradient descent estimates the gradient from a single
sample or a small batch at each step rather than from the whole dataset. When the objective
function is convex, gradient descent finds the optimal minimum without getting trapped in a
local minimum; the loss of a deep network is generally not convex, so this guarantee does not
carry over. Depending on the values of the function and on the learning rate or step size, the
algorithm may arrive at the optimum along different paths.
Learning Rate Decay: Adjusting the learning rate can increase the performance and reduce the
training time of stochastic gradient descent algorithms. The most widely used technique is to
reduce the learning rate gradually: large changes are made at the beginning of training, and
the learning rate is then lowered as training proceeds. This allows fine-tuning of the weights
in the later stages.
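Stochastic gradient descent and learning rate decay can be shown together on a one-parameter problem. The data, the initial rate, and the time-based decay schedule rate = initial_rate / (1 + decay * t) are assumptions made for the example; other schedules (step decay, exponential decay) are equally common.

```python
import random

def sgd_fit_slope(data, initial_rate=0.05, decay=0.01, steps=200, seed=0):
    """Fit y = w * x by SGD on squared error with time-based learning-rate decay."""
    rng = random.Random(seed)
    w = 0.0
    for t in range(steps):
        x, y = rng.choice(data)                   # "stochastic": one random sample per step
        gradient = 2.0 * (w * x - y) * x          # d/dw of (w*x - y)^2
        rate = initial_rate / (1.0 + decay * t)   # large steps early, smaller steps later
        w -= rate * gradient
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # hypothetical noiseless samples of y = 2x
w = sgd_fit_slope(data)
print(round(w, 4))  # 2.0
```

The squared-error objective here is convex in w, which is why the estimate settles on the true slope regardless of the order in which samples are drawn.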
Dropout: The overfitting problem in deep neural networks can be addressed using the dropout
technique. This method is applied by randomly dropping units and their connections during
training. Dropout offers an effective regularization method to reduce overfitting and improve
generalization error. Dropout gives improved performance on supervised learning tasks in
computer vision, computational biology, document classification, and speech recognition.
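A minimal sketch of dropout on one layer's activations is shown below. It uses the "inverted dropout" variant, which rescales the surviving units during training so that nothing needs to change at test time; that variant, the drop probability, and the activation values are assumptions made for the example.

```python
import random

def dropout(activations, drop_probability, rng, training=True):
    """Inverted dropout: zero units at random and rescale the survivors."""
    if not training or drop_probability == 0.0:
        return list(activations)      # at test time every unit is kept
    keep = 1.0 - drop_probability
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
hidden = [0.5, 1.0, 1.5, 2.0]         # hypothetical hidden-layer activations
dropped = dropout(hidden, drop_probability=0.5, rng=rng)
print(dropped)
```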
Max-Pooling: In max-pooling a filter size is predefined, and this filter is then applied across the
non-overlapping sub-regions of the input, taking the maximum of the values contained in the window
as the output. Dimensionality, as well as the computational cost of learning the parameters of
later layers, can be reduced using max-pooling.
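The operation can be sketched on a small 2-D input stored as a list of rows. The function name max_pool_2d and the 4x4 example are assumptions for illustration; the window moves in steps equal to its size, so the sub-regions do not overlap.

```python
def max_pool_2d(matrix, size=2):
    """Max-pool a 2-D list over non-overlapping size x size windows."""
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [matrix[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(max(window))   # keep only the largest value
        pooled.append(pooled_row)
    return pooled

image = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool_2d(image)  # 4x4 input reduced to 2x2
```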
Batch Normalization: Batch normalization is intended to reduce internal covariate shift, thereby
accelerating the training of deep neural networks. It normalizes the inputs to a layer, for each
mini-batch, while the weights are updated during training. Normalization stabilizes learning and
reduces the number of training epochs. The stability of a neural network can be increased by
normalizing the output of the previous activation layer.
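The normalization step for a single feature over one mini-batch can be sketched as follows. This is an illustrative sketch only: the names gamma, beta and eps follow common usage and are assumptions here, and a real layer would also learn gamma and beta and track running statistics for use at test time.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    # eps guards against division by zero when the batch variance is tiny.
    return [gamma * (v - mean) / math.sqrt(variance + eps) + beta
            for v in values]

normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```

After this step the batch has a mean close to 0 and a variance close to 1, whatever the scale of the incoming values.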
Skip-gram: Word embedding algorithms can be modeled using skip-gram. In the skip-gram
model, if two vocabulary terms share a similar context, then those terms are treated as similar.
For example, the sentences "cats are mammals" and "dogs are mammals" are meaningful sentences that
share the same context "are mammals." Skip-gram can be implemented by sliding a context window of
n terms over the text and training the neural network to predict the surrounding terms in the
window from the center term.
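The training data for such a model can be sketched as a list of (center, context) pairs. The sketch assumes the common formulation in which the center term is the input and each neighbour inside the window is a prediction target; the function name skipgram_pairs and the window size are hypothetical.

```python
def skipgram_pairs(tokens, window=1):
    """Return (center, context) pairs for every token and its neighbours."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:                       # a term is not its own context
                pairs.append((center, tokens[j]))
    return pairs

cat_pairs = skipgram_pairs(["cats", "are", "mammals"])
dog_pairs = skipgram_pairs(["dogs", "are", "mammals"])
```

Both "cats" and "dogs" are paired with the context term "are", which is why a model trained on such pairs tends to place them close together.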
Transfer learning: In transfer learning, a model trained on a particular task is reused for
another related task. The knowledge obtained while solving a particular problem can be
transferred to another network, which is to be trained on a related problem. This allows for rapid
progress and enhanced performance while solving the second problem.
Deep Learning Frameworks:

A deep learning framework helps in modeling a network more rapidly without going into the details
of the underlying algorithms. Each framework is built differently and for different purposes.
TensorFlow, developed by Google Brain, supports languages such as Python, C++ and R. It
enables us to deploy our deep learning models on CPUs as well as GPUs.
Keras is an API, written in Python, that runs on top of TensorFlow. It enables fast experimentation.
It supports both CNNs and RNNs and runs on CPUs and GPUs.
PyTorch can be used for building deep neural networks as well as executing tensor computations.
PyTorch is a Python-based package that provides tensor computations and a framework to create
computational graphs.
Caffe was developed by Yangqing Jia, and it is open source as well. Caffe stands out from other
frameworks in its speed of processing as well as learning from images. The Caffe Model Zoo
gives access to pre-trained models, which enable us to solve various problems with little effort.
Deeplearning4j is implemented in Java, which can make it more efficient than Python-based
frameworks. The ND4J tensor library used by Deeplearning4j provides the capability to
work with multi-dimensional arrays or tensors. This framework supports CPUs and GPUs.
Deeplearning4j works with images, CSV as well as plain text.
Feed forward neural network:

Feed forward neural networks are artificial neural networks in which the connections between nodes
do not form loops. When such a network has several layers it is also known as a multi-layer neural
network, and all information is passed forward only.
During data flow, input nodes receive data, which travels through the hidden layers and exits at
the output nodes. No links exist in the network that could be used to send information back from
the output node.
A feed forward neural network approximates functions in the following way:
• A classifier can be described by a target function y = f*(x).
• This function assigns an input x to a category y.
• The feed forward model defines a mapping y = f(x; θ) and learns the value of the parameters θ
that gives the closest approximation of f*.
Feed forward neural networks serve as the basis for object detection in photos, for example in the
Google Photos app.
Artificial Neural Network:

This is one of the simplest types of artificial neural networks. In a feedforward neural network, the
data passes from the input nodes through the network until it reaches the output node.
A neural network mimics biological neurons. A neuron has dendrites, a nucleus, an axon, and axon
terminals. A network needs at least two neurons. These neurons transfer information via the
synapse between the dendrites of one and the axon terminal of another.
[Figure: a probable model of an artificial neuron]
[Figure: a neural network with input, hidden, and output layers]
The circles are neurons or nodes, each applying its function to the data, and the lines/edges
connecting them are the weights/information being passed along. Each column is a layer. The first
layer, which receives your data, is the input layer, and the last layer is the output layer. All
the layers between the input layer and the output layer are the hidden layers.
In this model, you have input data, you weight it, and pass it through the function in the
neuron that is called the threshold function or activation function. Basically, the neuron sums all
of the weighted values and compares the sum with a certain threshold value. If the sum exceeds the
threshold, the neuron fires a signal and the result is (1); otherwise nothing is fired and the
result is (0). That output is then weighted and passed along to the next neuron, and the
same sort of function is run.
We can have a sigmoid (s-shape) function as the activation function. As for the weights,
they are just random to start, and they are unique per input into the node/neuron. In a typical "feed
forward", the most basic type of neural network, you have your information pass straight through
the network you created, and you compare the output to what you hoped the output would have
been using your sample data.
From here, you need to adjust the weights to help you get your output to match your
desired output. The act of sending data straight through a neural network is called a feed forward
pass. Our data goes from the input, to the layers, in order, then to the output. When we go
backwards and begin adjusting weights to minimize loss/cost, this is called back propagation. A
deep neural network (DNN) is an ANN with multiple hidden layers between the input and output
layers.
Modeled loosely on the human brain, a neural net consists of thousands or even millions of
simple processing nodes that are densely interconnected. Most of today’s neural nets are
organized into layers of nodes, and they’re “feed-forward,” meaning that data moves through them
in only one direction. An individual node might be connected to several nodes in the layer beneath
it, from which it receives data, and several nodes in the layer above it, to which it sends data.
To each of its incoming connections, a node will assign a number known as a “weight.”
When the network is active, the node receives a different data item — a different number — over
each of its connections and multiplies it by the associated weight. It then adds the resulting
products together, yielding a single number. If that number is below a threshold value, the node
passes no data to the next layer. If the number exceeds the threshold value, the node “fires,” which
in today’s neural nets generally means sending the number — the sum of the weighted inputs —
along all its outgoing connections.
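The behaviour of a single node described above can be sketched in a few lines. The function name node_output is hypothetical, and representing "passes no data" as 0.0 is an assumption made for the sketch.

```python
def node_output(inputs, weights, threshold):
    """Weighted sum of the inputs; the node fires only above the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    # When the node fires it sends the weighted sum itself; otherwise 0.0
    # stands in for "no data passed on" (an assumption of this sketch).
    return total if total > threshold else 0.0

fired = node_output([1.0, 2.0], [0.5, 0.25], threshold=0.5)
silent = node_output([1.0, 2.0], [0.5, 0.25], threshold=2.0)
```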
When a neural net is being trained, all of its weights and thresholds are initially set to
random values. Training data is fed to the bottom layer — the input layer — and it passes through
the succeeding layers, getting multiplied and added together in complex ways, until it finally
arrives, radically transformed, at the output layer. During training, the weights and thresholds are
continually adjusted until training data with the same labels consistently yield similar outputs.
A typical neuron consists of the following four parts, with the help of which we can explain its
working:
• Dendrites − They are tree-like branches, responsible for receiving information from the
other neurons the neuron is connected to. In a sense, they are like the ears of the
neuron.
• Soma − It is the cell body of the neuron and is responsible for processing the information
received from the dendrites.
• Axon − It is just like a cable through which the neuron sends information.
• Synapses − They are the connections between the axon and the dendrites of other neurons.
Activation Function:
An activation function defines the output of a node for a given input or set of inputs. It
decides whether a neuron is activated or deactivated in order to produce the desired output. It
also performs a nonlinear transformation on the input, which gives better results in a complex
neural network.
Some activation functions also normalize the output of a node into a bounded range, such as 0 to 1
or -1 to 1. An activation function must be computationally efficient, because a neural network is
sometimes trained on millions of data points.
In any neural network, the activation function decides whether the input or received
information is relevant or irrelevant.
A neuron computes a weighted sum of its inputs plus a bias; this sum is then passed through an
activation function to get an output.
Y = ∑ (weights * inputs) + bias
Here Y can take any value between -infinity and +infinity. So, we have to bound
our output to get the desired prediction or generalized results.
Y = Activation function (∑ (weights * inputs) + bias)
So, we pass the weighted sum of the neuron through an activation function to bound the output values.
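The two formulas above can be written as a short sketch. The function names and example numbers are assumptions for illustration, and the sigmoid is used only as one possible activation function.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Y = activation(sum(weights * inputs) + bias)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # unbounded value
    return activation(z)                                    # bounded output

y = neuron([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
```

Here the weighted sum is 0.5 * 1.0 - 0.25 * 2.0 + 0 = 0, and the sigmoid maps it to 0.5.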
Without an activation function, the weights and biases would only perform a linear transformation,
and the neural network would be just a linear regression model. A linear equation is a polynomial
of degree one, which is simple to solve but limited in its ability to model complex problems or
higher-degree polynomials.
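A small sketch makes the point concrete: stacking two layers without an activation function gives exactly the same result as one linear layer. The one-input layers and their coefficients below are arbitrary values chosen for illustration.

```python
def make_linear(weight, bias):
    """Return a one-input linear layer x -> weight * x + bias."""
    def layer(x):
        return weight * x + bias
    return layer

layer1 = make_linear(2.0, 1.0)
layer2 = make_linear(-3.0, 0.5)

def stacked(x):
    """Two layers applied in sequence, with no activation in between."""
    return layer2(layer1(x))

# -3 * (2x + 1) + 0.5 = (-3 * 2) x + (-3 * 1 + 0.5): still a single line.
collapsed = make_linear(-3.0 * 2.0, -3.0 * 1.0 + 0.5)
```

However many such layers are stacked, the network can only represent a straight line, which is why a nonlinear activation is needed.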
In contrast, adding an activation function to the neural network applies a non-linear
transformation to the input and makes the network capable of solving complex problems such as
language translation and image classification.
In addition, most activation functions are differentiable, which makes it possible to apply
backpropagation, the optimization strategy that computes the gradients of the loss function in
the neural network.
Types of Activation Functions:

1) Binary Step Function
This activation function is very basic, and it comes to mind first when we try to bound the output.
It is basically a threshold-based classifier: we choose a threshold value to decide whether the
neuron should be activated or deactivated.
f(x) = 1 if x ≥ 0, else 0
Here the threshold value is set to 0. The function is very simple and useful for binary
classification problems.
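As a minimal sketch, the step function can be written as below. Treating an input exactly equal to the threshold as "activated" is an assumption of this sketch, and the function name is hypothetical.

```python
def binary_step(x, threshold=0.0):
    """Return 1 when x reaches the threshold, otherwise 0."""
    return 1 if x >= threshold else 0

outputs = [binary_step(x) for x in (-2.0, -0.1, 0.0, 3.5)]
```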
2) Linear Function
It is a simple straight-line activation function in which the output is directly proportional to
the weighted sum of the inputs. Linear activation functions give a wide range of activations, and
a line with a positive slope increases the firing rate as the input increases.
Y = mZ
where Z is the weighted sum of the inputs and m is the slope.
3) ReLU (Rectified Linear Unit) Activation Function

Rectified linear unit (ReLU) is currently the most widely used activation function. Its output
ranges from 0 to infinity: f(x) = max(0, x). All negative values are converted to zero. Because
the output and the gradient are both zero for negative inputs, a neuron whose inputs stay
negative stops updating and can no longer fit the data, which creates a problem.
The Leaky ReLU function is used instead of ReLU to avoid this; in Leaky ReLU the range is
extended to negative values, which can improve performance.
4) Leaky ReLU Activation Function

We need the Leaky ReLU activation function to solve the ‘Dying ReLU’ problem. As
discussed under ReLU, all the negative input values are set to zero. In Leaky ReLU, negative
inputs are not set to zero but are multiplied by a small slope (for example 0.01), giving a
value near zero, which addresses this major issue of the ReLU activation function.
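The difference between the two functions can be sketched as follows. The slope value 0.01 is a commonly used default and is an assumption here, as are the function names.

```python
def relu(x):
    """ReLU: negative inputs become zero."""
    return max(0.0, x)

def relu_grad(x):
    """Gradient of ReLU: zero for negative inputs, so no learning signal."""
    return 1.0 if x > 0 else 0.0

def leaky_relu(x, slope=0.01):
    """Leaky ReLU: negative inputs are scaled by a small slope instead."""
    return x if x > 0 else slope * x

def leaky_relu_grad(x, slope=0.01):
    """Gradient of Leaky ReLU: small but non-zero for negative inputs."""
    return 1.0 if x > 0 else slope

negative_input = -3.0
pair = (relu(negative_input), leaky_relu(negative_input))
```

For a negative input the ReLU gradient is exactly zero, while the Leaky ReLU gradient stays at the small slope, so the weights can still be updated.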
5) Sigmoid Activation Function
The sigmoid activation function is widely used. Its output ranges between 0 and 1, so it
can be read as a probability; when we have to make a decision or predict the probability
of an output, this activation function is a natural choice.
The equation for the sigmoid function is

f(x) = 1 / (1 + e^(-x))
The sigmoid function causes a problem known as the vanishing gradient problem. It occurs
because large positive or negative inputs are squashed into the range 0 to 1, where the
function is nearly flat, so the derivatives become very small and the weights in earlier
layers barely update. To reduce this problem, another activation function such as ReLU is
used, whose derivative does not shrink for positive inputs.
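The size of the sigmoid derivative can be checked with a short sketch. The function names are assumptions; the derivative formula f'(x) = f(x) * (1 - f(x)) follows directly from the sigmoid equation.

```python
import math

def sigmoid(x):
    """Numerically stable sigmoid, f(x) = 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)                  # avoids overflow for very negative x
    return e / (1.0 + e)

def sigmoid_grad(x):
    """Derivative of the sigmoid: f(x) * (1 - f(x)), at most 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Backpropagating through 10 sigmoid layers multiplies 10 such derivatives.
best_case_chain = sigmoid_grad(0.0) ** 10
```

Even in the best case (derivative 0.25 at x = 0), the product over ten layers is below one millionth, which illustrates why gradients vanish in deep sigmoid networks.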
6) Hyperbolic Tangent Activation Function (Tanh)

This activation function is often slightly better than the sigmoid function in hidden
layers. Like the sigmoid function, it is used to differentiate between two classes, but it
maps negative inputs to negative outputs and ranges between -1 and 1, so its output is
centered on zero.
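The relation between tanh and the sigmoid can be sketched with the identity tanh(x) = 2 * sigmoid(2x) - 1, which shows that tanh is a rescaled sigmoid centered on zero. The helper names below are hypothetical.

```python
import math

def sigmoid(x):
    """Sigmoid, f(x) = 1 / (1 + e^(-x)); fine for moderate x."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh_via_sigmoid(x):
    """tanh expressed through the sigmoid: 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

samples = [-2.0, -0.5, 0.0, 0.7, 3.0]
values = [tanh_via_sigmoid(x) for x in samples]
```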
7) Softmax Activation Function
Softmax is used mainly at the last layer, i.e. the output layer, for decision making, in the
same way as the sigmoid activation. Softmax assigns each class a probability according to
its score, and these probabilities sum to one.
For binary classification, both sigmoid and softmax are applicable, but for multi-class
classification problems we generally use softmax together with the cross-entropy
loss.
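Softmax and the cross-entropy loss can be sketched together. The function names and example scores are assumptions for illustration; subtracting the largest score before exponentiating is a standard trick to avoid overflow and does not change the result.

```python
import math

def softmax(scores):
    """Turn a list of scores into probabilities that sum to one."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Loss for one example: negative log-probability of the true class."""
    return -math.log(probs[target_index])

probs = softmax([2.0, 1.0, 0.1])        # three-class example
loss_if_class_0 = cross_entropy(probs, 0)
loss_if_class_2 = cross_entropy(probs, 2)
```

The loss is small when the true class received a high probability and large when it received a low one.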
Multi-layer neural network:

A multilayer perceptron is a neural network connecting multiple layers in a
directed graph, which means that the signal path through the nodes only goes one way.
Each node, apart from the input nodes, has a nonlinear activation function. An MLP uses
backpropagation as a supervised learning technique. Since there are multiple layers of
neurons, MLP is a deep learning technique.
MLP is widely used for solving problems that require supervised learning as well
as research into computational neuroscience and parallel distributed processing.
Applications include speech recognition, image recognition and machine translation.
A multilayer perceptron (MLP) is a deep, artificial neural network. It is composed
of more than one perceptron. An MLP is composed of an input layer to receive the signal, an
output layer that makes a decision or prediction about the input, and in between those two,
an arbitrary number of hidden layers that are the true computational engine of the MLP.
MLPs with one hidden layer and enough hidden units are capable of approximating any
continuous function on a bounded input range to any desired accuracy.
Multilayer perceptrons are often applied to supervised learning problems: they train
on a set of input-output pairs and learn to model the correlation (or dependencies) between
those inputs and outputs. Training involves adjusting the parameters, or the weights and
biases, of the model in order to minimize error. Backpropagation is used to make those
weight and bias adjustments relative to the error, and the error itself can be measured in a
variety of ways, including by root mean squared error (RMSE).
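The forward pass of a small MLP and the RMSE measure can be sketched as follows. The layer sizes, weights and function names are arbitrary assumptions for illustration, and the sigmoid is used as the nonlinear activation in every layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum plus bias per node, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(x, layers):
    """Pass x through each (weights, biases) layer in order."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x

def rmse(predictions, targets):
    """Root mean squared error between predictions and targets."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))

# 2 inputs -> 2 hidden nodes -> 1 output; weights chosen arbitrarily.
layers = [
    ([[0.5, -0.5], [0.25, 0.75]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                     # output layer
]
prediction = mlp_forward([1.0, 2.0], layers)
error = rmse(prediction, [1.0])
```

Training would then adjust the weights and biases in layers to make this error smaller.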
Back propagation in Neural Networks: The principle behind the back propagation algorithm
is to reduce the error produced by the randomly initialized weights and biases so that the
network produces the correct output. The system is trained in the supervised learning method,
where the error between the system’s output and a known expected output is presented to the
system and used to modify its internal state. We need to update the weights so that the loss
moves toward its minimum, ideally the global minimum. This is how back propagation in neural
networks works.
When the gradient is negative, an increase in weight decreases the error.
When the gradient is positive, a decrease in weight decreases the error.
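Both rules are covered by a single update, new weight = weight - learning rate * gradient. The sketch below uses a toy loss (w - 3)^2 chosen only for illustration; the function names and the learning rate are assumptions.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

def update(w, learning_rate=0.1):
    """Move the weight against the sign of the gradient."""
    return w - learning_rate * gradient(w)

w_low, w_high = 1.0, 5.0      # gradient is negative at 1.0, positive at 5.0
new_low, new_high = update(w_low), update(w_high)
```

At w = 1.0 the gradient is negative and the update increases the weight; at w = 5.0 the gradient is positive and the update decreases it. In both cases the loss goes down.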
Backpropagation is an algorithm commonly used to train neural networks. When
the neural network is initialized, weights are set for its individual elements, called neurons.
Inputs are loaded, they are passed through the network of neurons, and the network
provides an output for each one, given the initial weights. Backpropagation helps to adjust
the weights of the neurons so that the result comes closer and closer to the known true
result.
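The whole procedure can be sketched on the smallest possible network: one input, one sigmoid hidden neuron and one linear output neuron. Everything in this sketch (the network shape, the squared-error loss, the starting weights and the function names) is an assumption chosen to keep the chain rule visible; it is not a general implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Forward pass: hidden h = sigmoid(w1*x + b1), output y = w2*h + b2."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    return h, w2 * h + b2

def loss(params, x, target):
    """Squared error, halved so that its derivative is simply (y - target)."""
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2

def backward(params, x, target):
    """Backpropagation: apply the chain rule from the output back to the input."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = y - target                  # dLoss/dy
    dw2, db2 = dy * h, dy            # output-layer gradients
    dz = dy * w2 * h * (1.0 - h)     # through the hidden sigmoid
    dw1, db1 = dz * x, dz            # hidden-layer gradients
    return [dw1, db1, dw2, db2]

def train(params, x, target, learning_rate=0.1, steps=200):
    """Repeat forward pass, backward pass and weight update."""
    params = list(params)
    for _ in range(steps):
        grads = backward(params, x, target)
        params = [p - learning_rate * g for p, g in zip(params, grads)]
    return params

initial = [0.5, 0.0, 1.0, 0.0]       # w1, b1, w2, b2
trained = train(initial, x=1.0, target=1.0)
```

After training, loss(trained, 1.0, 1.0) is lower than loss(initial, 1.0, 1.0), meaning the output has moved closer to the known true result.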