
Starting from the left, we have:

1. The input layer of our model in orange.
2. Our first hidden layer of neurons in blue.
3. Our second hidden layer of neurons in magenta.
4. The output layer (a.k.a. the prediction) of our model in green.

The arrows that connect the dots show how all the neurons are interconnected and how data travels
from the input layer all the way through to the output layer.

We have a set of inputs and a set of target values — and we are trying to get predictions that match
those target values as closely as possible.

By repeatedly calculating Z (the weighted sum of the inputs plus the bias) and applying the
activation function to it at each successive layer, we can move from input to output. This process
is known as forward propagation: moving forward from input to output by making some computations
and calculating the activation at each neuron in each layer.
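
To make forward propagation concrete, here is a minimal NumPy sketch of a network with the same shape as the figure described above (two hidden layers feeding one output). The layer sizes, the random weights, and the choice of sigmoid activation are illustrative assumptions, not values taken from this article:

    import numpy as np

    def sigmoid(z):
        # Activation function: squashes each value into the range (0, 1).
        return 1.0 / (1.0 + np.exp(-z))

    # Illustrative sizes: 3 inputs -> 4 neurons -> 4 neurons -> 1 output.
    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)  # first hidden layer
    W2, b2 = rng.standard_normal((4, 4)), np.zeros(4)  # second hidden layer
    W3, b3 = rng.standard_normal((1, 4)), np.zeros(1)  # output layer

    def forward(x):
        # At each layer: Z = W @ a + b (weighted sum plus bias),
        # then the activation function is applied to Z.
        a1 = sigmoid(W1 @ x + b1)      # first hidden layer
        a2 = sigmoid(W2 @ a1 + b2)     # second hidden layer
        return sigmoid(W3 @ a2 + b3)   # output layer: the prediction

    print(forward(np.array([0.5, -1.2, 3.0])))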

• Weights: A weight decides how much influence an input will have on the output.
• Bias: The bias is like the intercept added in a linear equation. It is an additional parameter in the
neural network that adjusts the output along with the weighted sum of the inputs to the
neuron. The bias also allows you to shift the activation function to the left or right.
• Cost function: A cost function is a measure of the error between the value your model predicts and
the actual value.

The purpose of the cost function is to be minimized; its returned value is usually
called the cost, loss, or error. The goal is to find the values of the model parameters for which the cost
function returns as small a number as possible.
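
A common concrete example is the mean squared error. Here is a minimal sketch (the sample predictions and targets are made up for illustration):

    import numpy as np

    def mse_cost(predictions, targets):
        # Mean squared error: the average squared difference between
        # what the model predicts and what the value actually is.
        return np.mean((predictions - targets) ** 2)

    y_hat = np.array([0.8, 0.3, 0.6])   # model predictions
    y = np.array([1.0, 0.0, 1.0])       # target values
    print(mse_cost(y_hat, y))           # smaller is better; 0 is a perfect fit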

• Gradient descent:
The gradient is a numerical calculation that tells us how to adjust the parameters of
the network in such a way that its output error is minimized.

Gradient descent is an optimization algorithm used to minimize some function by iteratively
moving in the direction of steepest descent, as defined by the negative of the gradient. In machine
learning, we use gradient descent to update the parameters of our model.

The gradient of a function is the vector whose elements are its partial derivatives with respect to
each parameter. 

So each element of the gradient tells us how the cost function would change if we applied a small
change to that particular parameter — so we know what to tweak and by how much.
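
Putting this together, here is a minimal sketch of gradient descent that approximates each partial derivative with a finite difference and then steps against the gradient. The toy cost function, learning rate, and step count are illustrative assumptions:

    import numpy as np

    def cost(params):
        # Toy cost function whose minimum is at params = [3, -2].
        return (params[0] - 3.0) ** 2 + (params[1] + 2.0) ** 2

    def numerical_gradient(f, params, eps=1e-6):
        # Each element answers: how does the cost change if we apply
        # a small change eps to that particular parameter?
        grad = np.zeros_like(params)
        for i in range(len(params)):
            bumped = params.copy()
            bumped[i] += eps
            grad[i] = (f(bumped) - f(params)) / eps
        return grad

    params = np.array([0.0, 0.0])
    learning_rate = 0.1
    for _ in range(200):
        # Update rule: move in the direction of the negative gradient.
        params -= learning_rate * numerical_gradient(cost, params)
    print(params)  # approaches [3, -2]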
