Activation Functions: Sigmoid, Tanh, Relu, Leaky Relu, Prelu, Elu, Threshold Relu and Softmax Basics For Neural Networks and Deep Learning
Himanshu S
Let’s start with the basics of neurons and neural networks, what an activation
function is, and why we need one:
https://himanshuxd.medium.com/activation-functions-sigmoid-relu-leaky-relu-and-softmax-basics-for-neural-networks-and-deep-8d9c70eed91e 1/15
7/8/2021 Activation Functions : Sigmoid, tanh, ReLU, Leaky ReLU, PReLU, ELU, Threshold ReLU and Softmax basics for Neural Networks and Deep Learnin…
First proposed in 1943 by Warren McCulloch and Walter Pitts, neural networks are the
techniques powering the best speech recognizers and translators on our smartphones,
through something called “deep learning”, which employs several layers of neural nets.
Neural nets are modeled loosely on the human brain, where thousands or even millions
of nodes are densely connected with each other. Just as the brain “fires” neurons, in an
Artificial Neural Network (ANN) an artificial neuron is fired by sending it a signal from
an incoming node multiplied by some weight. The neuron can be visualized as holding a
number that comes from the ending branches (synapses) supplied at that neuron: for a
layer of a Neural Network (NN), we multiply each input to the neuron by the weight held
by that synapse and sum all of those up to get our output.
import math
import numpy as np

class Neuron(object):
    # ...
    def forward(self, inputs):
        """Assume inputs and weights are 1-D numpy arrays and bias is a number."""
        # Weighted sum of the inputs plus the bias
        cell_body_sum = np.sum(inputs * self.weights) + self.bias
        # Sigmoid activation function squashes the sum into (0, 1)
        firing_rate = 1.0 / (1.0 + math.exp(-cell_body_sum))
        return firing_rate
Neuron.py
For example (see D in above figure), if the weights are w1, w2, w3, …, wN and the inputs
are i1, i2, i3, …, iN, we get the summation: w1*i1 + w2*i2 + w3*i3 + … + wN*iN
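As a small illustration, the summation above can be written as a dot product. The weight and input values here are made up for the example; they do not come from the figure.

```python
import numpy as np

# Hypothetical example values (not from the article)
weights = np.array([0.5, -1.0, 2.0])   # w1, w2, w3
inputs = np.array([1.0, 2.0, 3.0])     # i1, i2, i3

# w1*i1 + w2*i2 + w3*i3 written as a dot product
summation = np.dot(weights, inputs)
print(summation)
```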
Across several layers of a neural network and its connections, wX and iX can take widely
varied values, and the summation S varies according to whether the particular neuron is
activated or not. To normalize this and prevent drastically different ranges of values,
we use what is called an Activation Function, which maps these values into a bounded
range such as (0, 1) or (-1, 1) to keep the whole process statistically balanced. This is
not just to preserve the sanity of the code but also to reduce the complexity and
computing power required, which would be higher when working on unactivated
inputs.
Activation functions are commonly chosen based on a few desirable properties, such as:
Range — When the range of the activation function is finite, gradient-based training
methods tend to be more stable, because pattern presentations significantly affect
only limited weights. When the range is infinite, training is generally more efficient
because pattern presentations significantly affect most of the weights. In the latter
case, smaller learning rates are typically necessary.
Approximates identity near the origin — When activation functions have this
property, the neural network will learn efficiently when its weights are initialized
with small random values. When the activation function does not approximate
identity near the origin, special care must be used when initializing the weights.
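To make the second property concrete, here is a rough check of how closely tanh and the logistic sigmoid follow the identity function f(x) = x for small inputs. The sample points are arbitrary, and the helper name `sigmoid` is just for this sketch.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

# A few arbitrary points close to the origin
x = np.array([-0.1, -0.01, 0.0, 0.01, 0.1])

# Largest distance from the identity function on these points
tanh_gap = np.max(np.abs(np.tanh(x) - x))
sigmoid_gap = np.max(np.abs(sigmoid(x) - x))

print("tanh gap:", tanh_gap)
print("sigmoid gap:", sigmoid_gap)
```

On these points tanh stays very close to x, while sigmoid does not because sigmoid(0) = 0.5.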
(There are some linear and simple functions, such as the Binary Step Function or the
Linear Function f = ax, but since they are not widely used and are undesirable as
activation functions, we won’t be discussing them.)
Sigmoid Function
Although the sigmoid function and its derivative are simple and help reduce the time
required for making models, there is a major drawback: information is lost because the
derivative has a short range (it never exceeds 0.25).
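A minimal sketch of the logistic sigmoid and its derivative, to show how small the derivative's range is. The function names and the sampling grid are choices made for this example.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Sample the derivative on a grid that includes x = 0
x = np.linspace(-10.0, 10.0, 2001)
max_grad = sigmoid_derivative(x).max()

print("largest derivative:", max_grad)              # reached at x = 0
print("derivative at x = 10:", sigmoid_derivative(10.0))
```

The derivative peaks at x = 0 and falls off quickly on both sides, so inputs far from zero pass back almost no gradient.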
So the more layers there are in our neural network (or the deeper our neural network
is), the more our information gets compressed and lost at each layer; this amplifies at
each step and causes major data loss overall. The vanishing and exploding gradient
problem is present with sigmoid functions. In addition, since the sigmoid output is
always positive, all our output neurons have a positive output too, which is not ideal.
Not being centered at 0 makes the sigmoid function a poor choice for the early layers,
although it can still be used in the last layer.
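A back-of-the-envelope sketch of why depth makes this worse: each sigmoid layer contributes a derivative factor of at most 0.25 to the backpropagated gradient. This deliberately ignores the weights (a simplifying assumption), so it only bounds the activation part of the product.

```python
# Largest possible value of the sigmoid derivative
MAX_SIGMOID_GRAD = 0.25

def gradient_bound(num_layers):
    """Upper bound on the product of sigmoid derivatives across layers.

    Assumption: weights are ignored, so this is only the activation factor.
    """
    return MAX_SIGMOID_GRAD ** num_layers

for n in (1, 5, 10, 20):
    print(n, gradient_bound(n))
```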
Besides the logistic function, sigmoid functions include the ordinary arctangent, the
hyperbolic tangent, the Gudermannian function, and the error function, as well as the
generalised logistic function and algebraic functions.
tanh(x)
This does not, however, mean that tanh is devoid of the vanishing or exploding gradient
problem; it persists even in the case of tanh. But unlike sigmoid, tanh is centered at
zero, which makes it a better choice than the sigmoid function. Other functions, which
we will see below, are therefore employed more often in machine learning.
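A short sketch relating tanh to the logistic sigmoid through the identity tanh(x) = 2·sigmoid(2x) − 1, together with the tanh derivative 1 − tanh²(x). The helper `sigmoid` and the sampling grid are assumptions of this example.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 101)

# tanh is a rescaled, shifted sigmoid
tanh_from_sigmoid = 2.0 * sigmoid(2.0 * x) - 1.0

# Derivative of tanh: equals 1 at x = 0 and shrinks toward 0 for large |x|
tanh_grad = 1.0 - np.tanh(x) ** 2

print("max difference:", np.max(np.abs(np.tanh(x) - tanh_from_sigmoid)))
print("largest gradient:", tanh_grad.max())
print("gradient at x = 5:", tanh_grad[-1])
```

The output is centered at zero, but the gradient still shrinks for large inputs, which is why the vanishing gradient problem persists.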
The rectifier is, as of 2018, the most popular activation function for deep neural networks.
Most deep learning applications right now make use of ReLU instead of logistic
activation functions, in areas such as computer vision, speech recognition, and natural
language processing. In practice, ReLU also converges several times faster than the tanh
or sigmoid functions.
Some of the ReLU variants include: Softplus (SmoothReLU), Noisy ReLU, Leaky ReLU,
Parametric ReLU and the Exponential Linear Unit (ELU). We will discuss some of these below.
ReLU
ReLU: A Rectified Linear Unit (a unit employing the rectifier) outputs 0 if the input
is less than 0, and the raw input otherwise. That is, if the input is greater than 0,
the output is equal to the input. The operation of ReLU is closer to the way our
biological neurons work.
ReLU f(x)
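A minimal NumPy sketch of ReLU and its gradient. Treating the gradient at exactly x = 0 as 0 is a convention assumed here, since ReLU is not differentiable at that point.

```python
import numpy as np

def relu(x):
    """ReLU: 0 for negative inputs, the input itself otherwise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Gradient of ReLU; the value at x == 0 is set to 0 by convention."""
    return (x > 0).astype(float)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))
print(relu_grad(x))
```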
ReLU is non-linear and, unlike the sigmoid function, does not saturate for positive
inputs, so gradients are not squashed during backpropagation. For larger neural
networks, building models based on ReLU is also much faster than with sigmoids. Its
advantages include:
Sparse activation: For example, in a randomly initialized network, only about 50% of
hidden units are activated (having a non-zero output).
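A rough experiment illustrating the sparsity claim. The layer sizes, the zero-mean Gaussian initialization, the missing bias term, and the random inputs are all assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 1000 random inputs, one hidden layer of 128 ReLU units
inputs = rng.standard_normal((1000, 64))
weights = rng.standard_normal((64, 128)) * 0.1   # zero-mean random init, no bias

hidden = np.maximum(0.0, inputs @ weights)

# Fraction of hidden activations that are non-zero
active_fraction = np.mean(hidden > 0)
print("active fraction:", active_fraction)
```

With symmetric, zero-mean weights the pre-activations are equally likely to be positive or negative, so roughly half of the units end up active.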
ReLUs are not without drawbacks: ReLU is not zero-centered, and it is non-differentiable
at zero, although it is differentiable everywhere else.
One condition on ReLU concerns its usage: it is generally used only in hidden layers and
not elsewhere. This is due to the limitation mentioned below.
Sigmoid Vs ReLU
Another problem we see in ReLU is the dying ReLU problem, where some ReLU neurons
essentially die: they remain inactive no matter what input is supplied. No gradient
flows through them, and if a large number of dead neurons are present in a neural
network, its performance is affected. This can be corrected by using what is called
Leaky ReLU, where the slope to the left of x=0 in the above figure is changed, causing a
leak and extending the range of ReLU.
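A small sketch of what a dead ReLU neuron looks like. The large negative bias is a made-up value chosen to force the situation; in practice a neuron can drift into this state during training.

```python
import numpy as np

rng = np.random.default_rng(0)

inputs = rng.standard_normal((1000, 8))
weights = rng.standard_normal(8)
bias = -100.0   # hypothetical bias that pushes every pre-activation below zero

pre_activation = inputs @ weights + bias
outputs = np.maximum(0.0, pre_activation)

# Local ReLU gradient: 1 where the neuron is active, 0 where it is not
local_grad = (pre_activation > 0).astype(float)

print("largest output:", outputs.max())
print("inputs with non-zero gradient:", int(local_grad.sum()))
```

Because the local gradient is zero for every input, no update reaches the weights or the bias, and the neuron stays inactive.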
With Leaky ReLU there is a small slope for negative inputs, so instead of not firing at
all, our neurons still output some value and pass back some gradient, which makes our
layer much more optimized too.
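A minimal sketch of Leaky ReLU and its gradient. The slope value 0.01 is a commonly used default assumed here; the article does not specify one.

```python
import numpy as np

def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: x for positive inputs, negative_slope * x otherwise."""
    return np.where(x > 0, x, negative_slope * x)

def leaky_relu_grad(x, negative_slope=0.01):
    """Gradient: 1 for positive inputs, negative_slope otherwise (never 0)."""
    return np.where(x > 0, 1.0, negative_slope)

x = np.array([-10.0, 0.0, 5.0])
print(leaky_relu(x))
print(leaky_relu_grad(x))
```

Since the gradient is never exactly zero, a neuron stuck on the negative side can still receive updates.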
Along with the weights and biases, we tune the slope parameter, which is learned by
employing backpropagation across multiple layers.
PReLU
for a ≤ 1
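A sketch of PReLU in which the slope a is treated as a learnable parameter. The starting value of a, the learning rate, and the upstream gradient of ones are all hypothetical choices for illustration; a real framework would compute the update automatically.

```python
import numpy as np

def prelu(x, a):
    """PReLU: x for positive inputs, a * x otherwise."""
    return np.where(x > 0, x, a * x)

def prelu_grad_a(x, upstream):
    """Gradient of the loss with respect to a: df/da is x where x <= 0, else 0."""
    return np.sum(upstream * np.where(x > 0, 0.0, x))

a = 0.25                          # hypothetical initial slope
x = np.array([-2.0, -1.0, 3.0])
upstream = np.ones_like(x)        # pretend gradient arriving from the next layer

grad_a = prelu_grad_a(x, upstream)

# One plain gradient-descent step on a (learning rate 0.1 is an assumption)
a_updated = a - 0.1 * grad_a

print(prelu(x, a), grad_a, a_updated)
```

For a ≤ 1, this function is the same as max(x, a·x).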
Exponential Linear Units (ELU) are used to speed up the deep learning process. This is
done by making the mean activations closer to zero. An alpha constant is used, which
must be a positive number.
ELUs have been shown to produce more accurate results than ReLU and also to converge
faster. ELU and ReLU are the same for positive inputs, but for negative inputs ELU
saturates smoothly (toward -alpha) whereas ReLU cuts off sharply at zero.
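A minimal sketch of ELU using its standard definition, f(x) = x for x > 0 and alpha·(e^x − 1) otherwise. The default alpha = 1.0 is an assumption of this example.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: x for positive inputs, alpha * (exp(x) - 1) otherwise."""
    # np.minimum keeps exp() from overflowing on large positive inputs,
    # whose exp() result is discarded by np.where anyway
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

x = np.array([-50.0, -1.0, 0.0, 2.0])
print(elu(x))
print(elu(x, alpha=1.5))
```

Very negative inputs approach -alpha, while positive inputs pass through unchanged as in ReLU.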
are allowed but they are capped; this greatly improves accuracy. It follows: f(x) = x for
x > theta, f(x) = 0 otherwise, where theta is a float >= 0 (the threshold location of
activation).
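The rule above translates directly into NumPy. The function name and the default theta = 1.0 are assumptions made for this sketch.

```python
import numpy as np

def thresholded_relu(x, theta=1.0):
    """f(x) = x for x > theta, 0 otherwise (theta >= 0)."""
    return np.where(x > theta, x, 0.0)

x = np.array([0.5, 1.0, 2.0])
print(thresholded_relu(x))
```

With theta = 0 this reduces to the ordinary ReLU.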
The softmax function is often used in the final layer of a neural network-based classifier.
Such networks are commonly trained under a log loss (or cross-entropy) regime, giving a
non-linear variant of multinomial logistic regression.
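A minimal sketch of softmax with the cross-entropy loss for a single example. Subtracting the maximum logit before exponentiating is a standard numerical-stability step added here; the logits and the helper names are made up for the example.

```python
import numpy as np

def softmax(logits):
    """Softmax over the last axis, shifted by the max for numerical stability."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

def cross_entropy(probs, true_class):
    """Log loss for one example, given the index of the true class."""
    return -np.log(probs[true_class])

logits = np.array([2.0, 1.0, 0.1])   # hypothetical final-layer outputs
probs = softmax(logits)

print(probs, probs.sum())
print("loss if class 0 is correct:", cross_entropy(probs, 0))
```

The outputs are positive and sum to 1, so they can be read as class probabilities.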
Softmax Graphed
Softmax Function