4 - Activation Functions in Neural Networks
Layers are the vertically stacked components that make up a neural network. In the accompanying figure, each dotted box marks one layer. A neural network has three different types of layers.
Input Layer
The input layer comes first. It accepts the data and forwards it to the rest of the network. This layer is where features enter the network: it feeds the network with data from the outside world, and no computation is performed here; the nodes simply pass the information (features) on to the hidden units.
Hidden Layer
The hidden layer is the second kind of layer. Its nodes are not visible to the outside world; they are part of the abstraction that any neural network provides. The hidden layer processes the features received from the input layer and passes the results on to the output layer. A neural network can have one or many hidden layers; the example above has one. In practice, hidden layers are what give neural networks their exceptional performance and intricacy: they carry out several tasks concurrently, including transforming the data and automatically generating features.
Output Layer
The output layer is the final kind of layer. It exposes the knowledge the network has acquired to the outside world and contains the answer to the problem: for example, after passing raw photos to the input layer, we receive the prediction from the output layer.
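To make the three layer types concrete, here is a minimal sketch of a forward pass through a network with one hidden layer, written in Python with NumPy; the layer sizes, random weights and ReLU activation are illustrative assumptions, not details taken from the text above.

# Sketch of a forward pass through input, hidden and output layers (illustrative)
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))            # input layer: 4 features from the outside world
W1 = rng.normal(size=(4, 5))         # weights from the input layer to 5 hidden units
b1 = np.zeros(5)
W2 = rng.normal(size=(5, 3))         # weights from the hidden layer to 3 output nodes
b2 = np.zeros(3)

hidden = np.maximum(0, x @ W1 + b1)  # hidden layer: transform the features (ReLU)
output = hidden @ W2 + b2            # output layer: raw scores (logits) for each class
print(output)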
Data science makes extensive use of the rectified linear unit (ReLU) function and of the sigmoid family of functions, which includes the logistic function, the hyperbolic tangent, and the arctangent function.
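For reference, here is a short sketch of the sigmoid-family functions named above; this is my own illustration in Python, not code from the text.

# Sketch of the sigmoid-family activation functions (illustrative)
import math

def logistic(x):
    # logistic (standard sigmoid): squashes x into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # hyperbolic tangent: squashes x into the range (-1, 1)
    return math.tanh(x)

def arctan(x):
    # arctangent: squashes x into the range (-pi/2, pi/2)
    return math.atan(x)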
Activation Function
Definition
An activation function determines the output of a node given its weighted input; by introducing non-linearity, it allows the network to learn complex patterns in the data.
o ReLU (Rectified Linear Unit) Function
Currently, ReLU is the most widely used activation function in the world, since practically all convolutional neural networks and deep learning systems employ it.
Both the function and its derivative are monotonic.
The problem, however, is that all negative values instantly become zero, which reduces the model's capacity to fit or learn from the data effectively. In other words, any negative input to a ReLU activation function is immediately mapped to zero in the graph, so negative values are not represented properly in the final output.
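A minimal sketch of ReLU in Python (my own illustration, not code from the text) makes this behaviour explicit: negative inputs are clipped to zero, while positive inputs pass through unchanged.

# Sketch of the ReLU activation function (illustrative)
def relu(x):
    # negative inputs are clipped to zero; positive inputs pass through unchanged
    return max(0.0, x)

print([relu(v) for v in [-2.0, -0.5, 0.0, 1.5, 3.0]])  # [0.0, 0.0, 0.0, 1.5, 3.0]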
o Softmax Function
Output: (figure omitted; an example of softmax output probabilities for the classes dog, cat, horse and cheetah)
Notation: we can represent all the logits as a vector, v, apply the activation function, S, to this vector to produce the probabilities vector, p, and write the operation as p = S(v), where each component is given by p_i = e^(v_i) / Σ_j e^(v_j).
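A small Python sketch of this operation (my own illustration, written to mirror the std_norm example later in this section) could look as follows.

# Sketch of the softmax activation (illustrative)
import math

def softmax(input_vector):
    # exponentiate each logit, then normalize so the outputs sum to 1
    exps = [math.exp(i) for i in input_vector]
    total = sum(exps)
    p = [round(e / total, 3) for e in exps]
    return p

print(softmax([3.2, 1.3, 0.2, 0.8]))  # example logits; prints [0.775, 0.116, 0.039, 0.07]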
Note that the labels (dog, cat, horse and cheetah) are in string format. We need to define a way to represent these labels as numerical values. Two common options are:
1. Integer encoding
2. One-hot encoding
When to use integer encoding: integer encoding is used when the labels are ordinal in nature, that is, when they carry some order. For example, consider a classification problem where we want to classify a service as poor, neutral or good; we can then encode these classes as poor = 0, neutral = 1 and good = 2. Clearly, the labels have an order, and the encoding gives weight to the labels accordingly.
Integer encoding becomes a problem when the labels carry no such order. Consider, for example, the Iris dataset encoded as 0 for “Iris setosa”, 1 for “Iris versicolor” and 2 for “Iris virginica”. The model may assume a natural ordering of the labels (2 > 1 > 0) and give more weight to one class over another, when in fact these are just labels with no specific ordering implied.
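The following sketch (my own illustration; the mappings are assumptions chosen to match the examples above) shows what integer encoding looks like in Python.

# Sketch of integer encoding (illustrative; mappings match the examples above)
service_classes = ["poor", "neutral", "good"]  # ordinal labels with a real order
service_to_int = {label: i for i, label in enumerate(service_classes)}
print(service_to_int)  # {'poor': 0, 'neutral': 1, 'good': 2}

iris_classes = ["Iris setosa", "Iris versicolor", "Iris virginica"]  # no real order
iris_to_int = {label: i for i, label in enumerate(iris_classes)}
print(iris_to_int)  # {'Iris setosa': 0, 'Iris versicolor': 1, 'Iris virginica': 2}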
One-hot encoding
For categorical variables where no such ordinal relationship exists, integer encoding is not enough, and one-hot encoding is preferred.
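As an illustration (again a sketch of my own, not code from the text), one-hot encoding represents each label as a vector of zeros with a single 1 marking the class.

# Sketch of one-hot encoding (illustrative)
labels = ["dog", "cat", "horse", "cheetah"]

def one_hot(label, labels):
    # a vector of zeros with a single 1 at the position of the given label
    return [1 if l == label else 0 for l in labels]

print(one_hot("cat", labels))  # [0, 1, 0, 0]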
But one may ask: why not use standard normalization, that is, take each logit and divide it by the sum of all the logits to get the probabilities? Why take the exponents? Here are two reasons.
# Standard normalization
def std_norm(input_vector):
    # divide each logit by the sum of all the logits, rounded to 3 decimal places
    p = [round(i / sum(input_vector), 3) for i in input_vector]
    return p
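For instance, applied to an assumed logits vector (values chosen only for illustration), standard normalization gives:

# Example usage of std_norm (logit values are illustrative assumptions)
print(std_norm([3.2, 1.3, 0.2, 0.8]))  # [0.582, 0.236, 0.036, 0.145]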