Back-propagation is the essence of neural net training. It is the method of fine-tuning the weights of a neural net based on the error rate obtained in the previous epoch (i.e., iteration). Proper tuning of the weights reduces the error rate and makes the model more reliable by improving its generalization.

Backpropagation is a short form for "backward propagation of errors." It is a standard method of training artificial neural networks. The method calculates the gradient of a loss function with respect to all the weights in the network.
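
As a concrete illustration, the sketch below applies the chain rule by hand on a tiny one-hidden-layer network with a sigmoid hidden activation and a mean-squared-error loss (the architecture, variable names, and learning rate are illustrative assumptions, not part of the definition above):

import numpy as np

# Tiny illustrative dataset: 4 samples with 3 input features and 1 target value
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
y = rng.normal(size=(4, 1))

# Randomly initialized weights: 3 inputs -> 5 hidden units -> 1 output
W1 = rng.normal(size=(3, 5)); b1 = np.zeros((1, 5))
W2 = rng.normal(size=(5, 1)); b2 = np.zeros((1, 1))
lr = 0.1  # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(100):
    # Forward pass
    z1 = X @ W1 + b1
    h = sigmoid(z1)                  # hidden activations
    y_hat = h @ W2 + b2              # linear output layer
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass: the chain rule applied layer by layer
    d_yhat = 2 * (y_hat - y) / len(X)    # dLoss/d(y_hat)
    dW2 = h.T @ d_yhat                   # dLoss/dW2
    db2 = d_yhat.sum(axis=0, keepdims=True)
    d_h = d_yhat @ W2.T                  # error propagated back to the hidden layer
    d_z1 = d_h * h * (1 - h)             # sigmoid'(z1) = h * (1 - h)
    dW1 = X.T @ d_z1                     # dLoss/dW1
    db1 = d_z1.sum(axis=0, keepdims=True)

    # Weight update based on the error from this epoch
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2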

Key Points About Backpropagation:

- It simplifies the network structure by removing weighted links that have the least effect on the trained network.
- You need to study a group of input and activation values to develop the relationship between the input and hidden unit layers.
- It helps to assess the impact that a given input variable has on a network output. The knowledge gained from this analysis should be represented in rules.
- Backpropagation is especially useful for deep neural networks working on error-prone projects, such as image or speech recognition.
- Backpropagation takes advantage of the chain and power rules, which allows it to function with any number of outputs.

Gradient Descent is an optimization algorithm that is used when training a machine learning model. Starting from initial parameter values, it tweaks those parameters iteratively to drive a given (typically convex) function toward its local minimum.

For gradient descent to reach the local minimum, we must set the learning rate to an appropriate value, which is neither too low nor too high. This matters because if the steps it takes are too big, it may never reach the local minimum, bouncing back and forth across the convex function instead of converging. If we set the learning rate to a very small value, gradient descent will eventually reach the local minimum, but that may take a long time.
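
A small sketch of this trade-off, assuming the toy convex function f(x) = x^2 (whose gradient is 2x), a starting point of x = 5, and three illustrative learning rates:

def descend(lr, x=5.0, steps=20):
    # Repeatedly step against the gradient of f(x) = x**2, i.e. f'(x) = 2*x
    for _ in range(steps):
        x = x - lr * 2 * x
    return x

print(descend(lr=1.1))    # too high: every step overshoots, so |x| grows instead of shrinking
print(descend(lr=0.001))  # too low: heading toward the minimum, but still near 4.8 after 20 steps
print(descend(lr=0.1))    # appropriate: ends up close to the minimum at x = 0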

Steps to implement Gradient Descent


1. Randomly initialize the parameter values.
2. Update the values by taking a step in the direction of the negative gradient, scaled by the learning rate.
3. Repeat until the slope (gradient) is approximately 0.
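
A minimal sketch of these three steps for a single-variable function; the example function, learning rate, and tolerance are illustrative assumptions, and the loop stops once the slope is close to 0, since it rarely becomes exactly 0:

import random

def gradient_descent(df, lr=0.1, tol=1e-6, max_steps=10_000):
    x = random.uniform(-10, 10)      # Step 1: randomly initialize the value
    for _ in range(max_steps):
        slope = df(x)
        if abs(slope) < tol:         # Step 3: repeat until the slope is ~0
            break
        x = x - lr * slope           # Step 2: update against the gradient
    return x

# Example: minimize f(x) = (x - 3)**2, whose derivative is 2*(x - 3)
print(gradient_descent(lambda x: 2 * (x - 3)))   # converges to about 3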

ReLU is a non-linear activation function that is used in multi-layer neural networks or deep neural networks. Traditionally, prevalent non-linear activation functions such as the sigmoid (or logistic) function and the hyperbolic tangent were used in neural networks to obtain the activation values for each neuron. More recently, the ReLU function has been used instead to calculate the activation values in traditional neural network or deep neural network paradigms. The reasons for replacing the sigmoid and hyperbolic tangent with ReLU include:
1. Computation saving - the ReLU function is able to accelerate the training of deep neural networks compared to traditional activation functions, since the derivative of ReLU is 1 for any positive input. Because this derivative is constant, deep neural networks do not need to spend additional time computing error terms during the training phase.

2. Solving the vanishing gradient problem - the ReLU function does not trigger the vanishing gradient problem as the number of layers grows, because it does not saturate for positive inputs (it has no asymptotic upper bound). Thus, the earliest layer (the first hidden layer) is able to receive the error signal coming from the last layers and adjust all the weights between layers. By contrast, a traditional activation function like the sigmoid is restricted between 0 and 1 and its derivative is at most 0.25, so the errors become very small by the time they reach the first hidden layer. This scenario leads to a poorly trained neural network.
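
A simplified sketch of this effect, assumed here for illustration only: it compares the layer-wise activation-derivative factors that backpropagation multiplies together (the weight terms are ignored for clarity), so a factor of 1 preserves the error signal while a factor of at most 0.25 shrinks it layer after layer:

import math

def relu_grad(z):
    return 1.0 if z > 0 else 0.0       # derivative of ReLU: 1 for positive inputs, 0 otherwise

def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)               # derivative of the sigmoid: at most 0.25 (at z = 0)

z = 1.5        # an illustrative positive pre-activation value
layers = 20    # an illustrative network depth

print(relu_grad(z) ** layers)      # 1.0: the error reaching the first hidden layer is preserved
print(sigmoid_grad(z) ** layers)   # ~3e-17: the error has effectively vanished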

-Nishita Verma
BTBM/18/120
Section D
Sem 5
