DL Unit2
School of Computing
Department of Computer Science & Engineering
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology
UNIT II - FUNDAMENTALS OF NEURAL NETWORKS
Basics of Neural Networks - Neural network representation - History and cognitive basis of neural computation - Perceptrons - Perceptron Learning Algorithm - Multilayer Perceptrons (MLPs) - Representation Power of MLPs - Back Propagation.
History of Artificial Neural Networks
The history of ANNs stems from the 1940s, the decade of the first electronic computer.
However, the first important step took place in 1957, when Rosenblatt introduced the first concrete neural model, the perceptron. Rosenblatt also took part in constructing the first successful neurocomputer, the Mark I Perceptron. After this, the development of ANNs proceeded as described in the figure.
• Real-time Operations
Neural networks can operate in real time, carrying out computations in parallel, and easily adapt to their changing environments.
• Adaptive Learning
Neural networks can learn how to perform different tasks based on the data given during training, producing the right output.
• Hardware Dependence
The components of a neural network depend on one another; that is, neural networks require (or are highly dependent on) processors with adequate processing capacity.
• Input Layer
• Output Layer
• The input layer computes the weighted input for every node, and the activation function is applied to get the result as output.
Return true if the sum > 1.5 ("Yes I will go to the Concert")
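Below is a minimal sketch of this perceptron decision rule. The 1.5 threshold follows the concert example above, but the factor names and weight values are hypothetical, chosen only for illustration.

def perceptron_decision(inputs, weights, threshold=1.5):
    # Weighted sum of the inputs.
    total = sum(x * w for x, w in zip(inputs, weights))
    # Step activation: fire only if the sum exceeds the threshold.
    return total > threshold

# Hypothetical binary factors: good weather, a friend is going, venue is nearby.
x = [1, 1, 0]
w = [1.0, 0.8, 0.5]
print(perceptron_decision(x, w))  # True -> "Yes I will go to the Concert"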
• It will not be useful when there are multiple classes in the target variable.
• Gradients are calculated to update the weights and biases during the backpropagation process. Since the gradient of the step function is zero (everywhere except at the threshold), the weights and biases do not update.
Sigmoid: σ(z) = 1 / (1 + e^(-z)), where z = Σ (xi * wi) + b
• The logistic sigmoid function can cause a neural network to get stuck during training.
• TanH also has the vanishing gradient problem, but the gradient is
stronger for TanH than sigmoid (derivatives are steeper).
• TanH is zero-centered, so the gradients are not forced to move in a specific direction.
• ReLU stands for Rectified Linear Unit and is one of the most commonly used activation functions in practice (a short code sketch of these activation functions follows this list).
• It mitigates the vanishing-gradient problem, because the maximum value of the gradient of the ReLU function is one.
• It also addresses the problem of saturating neurons, since the slope does not flatten out for positive inputs. The range of ReLU is 0 to infinity.
• Since only a fraction of the neurons are activated at any time, the ReLU function is far more computationally efficient than the sigmoid and TanH functions.
• ReLU accelerates the convergence of gradient descent towards the global
minimum of the loss function due to its linear, non-saturating property.
• One of its limitations is that it should only be used within hidden layers of
an artificial neural network model.
• Some gradients can be fragile during training.
• For activations in the region x < 0 of ReLU, the gradient is 0, so the corresponding weights are not adjusted during gradient descent.
• The dying ReLU problem can occur: neurons that go into that state stop responding to variations in the input (since the gradient is 0, nothing changes).
• No learning of the slope value ‘a’ takes place, and the exploding-gradient problem can still occur.
• Softmax mimics the one-hot encoded labels better than the absolute values.
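The following is a minimal NumPy sketch of the activation functions discussed above. The Leaky ReLU slope a = 0.01 is an assumed default (the bullet about the fixed ‘a’ value is taken here to refer to such a leaky variant), and the example input vector is arbitrary.

import numpy as np

def sigmoid(z):
    # Logistic sigmoid: output in (0, 1); saturates for large |z|.
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Zero-centered, output in (-1, 1); steeper gradients than sigmoid.
    return np.tanh(z)

def relu(z):
    # Rectified Linear Unit: 0 for z < 0, linear for z >= 0.
    return np.maximum(0.0, z)

def leaky_relu(z, a=0.01):
    # Small fixed slope 'a' for z < 0 avoids completely dead neurons.
    return np.where(z > 0, z, a * z)

def softmax(z):
    # Converts raw scores into a probability distribution over classes.
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
for f in (sigmoid, tanh, relu, leaky_relu, softmax):
    print(f.__name__, f(z))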
• The data enters the input nodes, travels through the hidden layers, and eventually exits at the output nodes (a small forward-pass code sketch follows the layer descriptions below).
Input layer:
It contains the neurons that receive the input. The data is then passed on to the next layer. The total number of neurons in the input layer equals the number of input variables in the dataset.
Hidden layer:
This is the intermediate layer, which lies between the input and output layers. This layer typically has a large number of neurons that apply transformations to the inputs and then pass the results on to the output layer.
Output layer:
It is the final layer, and its form depends on how the model is constructed. The output layer produces the predicted value of the target feature, i.e., the desired outcome.
Neuron weights
Weights describe the strength of a connection between neurons. A weight is typically initialized to a small random value and is adjusted as the network learns; it is not restricted to the range 0 to 1.
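Below is a minimal sketch of a single forward pass through such a network. The layer sizes (3 inputs, 4 hidden neurons, 2 outputs), the sigmoid activation, and the random weight initialization are illustrative assumptions, not part of the notes above.

import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 3 input variables, 4 hidden neurons, 2 output neurons.
W1 = rng.normal(scale=0.1, size=(3, 4))  # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(scale=0.1, size=(4, 2))  # hidden -> output weights
b2 = np.zeros(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    # Data enters the input nodes, travels through the hidden layer,
    # and exits at the output nodes.
    h = sigmoid(x @ W1 + b1)  # hidden-layer activations
    y = sigmoid(h @ W2 + b2)  # output-layer activations
    return y

x = np.array([0.5, -1.2, 0.3])  # one example with 3 input variables
print(forward(x))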
Error = y - y’
where
y – actual output, y’ – predicted output
Binary Classification
Loss function: refers to the error for a single training example.
Cost function: refers to the average of the loss function over the entire training dataset.
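As a sketch of this distinction, the snippet below uses binary cross-entropy (a standard choice for binary classification, assumed here rather than stated in the notes): the loss is computed per example, and the cost averages those losses over the dataset. The label and prediction values are made up for illustration.

import numpy as np

def bce_loss(y_true, y_pred, eps=1e-12):
    # Loss: binary cross-entropy error for a single training example.
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def cost(y_true, y_pred):
    # Cost: average of the per-example losses over the whole training set.
    return float(np.mean([bce_loss(t, p) for t, p in zip(y_true, y_pred)]))

y_true = [1, 0, 1, 1]          # actual labels
y_pred = [0.9, 0.2, 0.7, 0.4]  # predicted probabilities
print(bce_loss(y_true[0], y_pred[0]))  # loss for one example
print(cost(y_true, y_pred))            # cost for the dataset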
Classes: (Orange, Apple, Tomato)
The machine learning model will output a probability distribution over these 3 classes for a given input. The class with the highest probability is taken as the winning class for the prediction.
Output = [P(Orange), P(Apple), P(Tomato)]
The actual probability distribution for each class is shown below.
Orange = [1,0,0]
Apple = [0,1,0]
Tomato = [0,0,1]
y(Tomato) = [0, 0, 1]
The cross-entropy for a single observation is CE = - Σ yi * log(ŷi), where yi is the actual (one-hot) probability and ŷi is the predicted probability for class i. The error in classification for the complete model is given by the categorical cross-entropy, which is simply the mean of the cross-entropy over all N training examples.
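A small sketch of this computation for the three-class example above follows. Only the one-hot labels come from the notes; the predicted probability vectors are invented for illustration.

import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Cross-entropy for a single observation: -sum_i y_i * log(yhat_i).
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred))

# One-hot label for the classes [Orange, Apple, Tomato].
y_tomato = np.array([0, 0, 1])
# Hypothetical model output: a probability distribution over the 3 classes.
p = np.array([0.10, 0.25, 0.65])
print(cross_entropy(y_tomato, p))  # error for this single observation

# Categorical cross-entropy: mean cross-entropy over all N training examples.
Y = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # actual one-hot labels
P = np.array([[0.70, 0.20, 0.10],
              [0.10, 0.80, 0.10],
              [0.10, 0.25, 0.65]])                # predicted distributions
print(np.mean([cross_entropy(y, p) for y, p in zip(Y, P)]))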