
How to Choose the Best Learning Rate

for Neural Network (Beginner Approach)


In this article, before diving into the parameter-tuning topic, I'm going to introduce the artificial neural network. Why? Because it's important to start with the concept first. Neural networks are a branch of artificial intelligence that is quite broad and closely related to other disciplines, and a little background is enough to see why they're such a strong choice across applications. I think it's safe to assume that everyone reading this article has at least heard of neural networks, and you're probably also aware that they've turned out to be an extremely powerful tool when applied to a wide variety of important problems, like text translation and image recognition.

Overview of Neural Network

The architecture of a neural network is drawn below:

The architecture of the Neural Network

As a college student majoring in Mathematics, the neural network concept is a natural fit for me, because it is built from calculus and the vector-matrix notation and operations I'm already familiar with. Fundamentally, a neural network is just a mathematical function that takes a variable in and gives another variable back, where both of these variables can be vectors. We can see the illustration below:

Both of these variables can be vectors
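To make this concrete, here is a minimal NumPy sketch of a network as a function from one vector to another. The sizes (4 inputs, 3 outputs) and the sigmoid activation are my own illustrative choices, not anything fixed by the diagram:

```python
import numpy as np

# A neural network is just a function: vector in, vector out.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))  # weight matrix: 4 inputs -> 3 outputs
b = np.zeros(3)              # bias vector

def f(x):
    # one affine map followed by a sigmoid nonlinearity
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

x = np.array([0.5, -1.0, 2.0, 0.1])  # input vector
y = f(x)                             # output vector
print(y.shape)                       # (3,)
```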

The mathematical treatment here is kept to a minimum, consistent with the primary aims of clarity and correctness. Derivations, theorems, and proofs are included when they serve to illustrate the important features of a particular neural network. For example, the mathematical derivation of the backpropagation training algorithm makes the correct order of operations clear.

The diagram above shows a typical multilayer net, so called because it has more than one layer of connections. Every connection carries a weight that is fitted by an iterative training process. Typically, there is a layer of units between the input and the output called hidden units, which form the hidden layer; every connection into and out of this layer has its own weight, as sketched below:
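Here is a minimal NumPy sketch of such a multilayer net, with a weight matrix for each layer of connections. The unit counts and the sigmoid activation are illustrative assumptions:

```python
import numpy as np

# Multilayer net: input -> hidden layer -> output.
# W1, W2 are the weights on the two layers of connections.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(5, 4))  # input (4 units) -> hidden (5 units)
b1 = np.zeros(5)
W2 = rng.normal(size=(2, 5))  # hidden (5 units) -> output (2 units)
b2 = np.zeros(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = sigmoid(W1 @ x + b1)     # hidden layer activations
    return sigmoid(W2 @ h + b2)  # output layer

print(forward(np.array([1.0, 0.0, -1.0, 0.5])))
```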

Hyperparameter Subject

A neural network needs some hyperparameters. One of them is the Learning Rate (LR), which takes a value between 0 and 1 and is used by Gradient Descent (GD), the optimizer of the neural net. If you don't know what a hyperparameter is, I'll explain first: hyperparameters are the variables that determine the network structure and how the network is trained. In my experience, two common ones are the number of hidden layers and the learning rate. Hyperparameters are set before training (before optimizing the weights and biases), as the sketch below shows.
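As a concrete sketch, using Keras as an assumed framework (the layer sizes and loss are illustrative choices of mine), both kinds of hyperparameters are fixed before training ever starts:

```python
import tensorflow as tf

# Both hyperparameters from the paragraph above -- the number of
# hidden layers and the learning rate -- are chosen here, before
# fit() ever touches a weight.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),  # hidden layer 1
    tf.keras.layers.Dense(16, activation="relu"),                    # hidden layer 2
    tf.keras.layers.Dense(1),                                        # output layer
])
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # LR fixed up front
    loss="mse",
)
# Training (model.fit(...)) only starts after these choices are made.
```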

Generally, GD works with the derivative of the function itself. The learning rate controls the rate, or speed, at which the model learns. Specifically, it controls the amount of apportioned error with which the weights of the model are updated each time they are updated, such as at the end of each batch of training examples. An example is given below:

example of function

Z symbolizes the function to optimize, and w is the weight in a neuron. As an example, take Z = w² + 1 and try to minimize it as well as possible. We can initialize the weight at w = 1; the minimum lies at w = 0, where Z reaches its smallest value, Z = 1.

First, we must know the formula for updating the weight in every neuron:

w_new = w_old − learning_rate × dZ/dw
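Applying this rule to the example above: for Z = w² + 1 the derivative is dZ/dw = 2w, so a short gradient-descent loop (the learning-rate value is an illustrative choice) drives w from 1 toward 0:

```python
# Gradient descent on Z = w**2 + 1, whose minimum Z = 1 sits at w = 0.
w = 1.0              # initial weight, as in the example above
learning_rate = 0.1  # illustrative value

for step in range(25):
    grad = 2 * w                  # dZ/dw
    w = w - learning_rate * grad  # the update rule above

print(w, w**2 + 1)  # w is now close to 0, Z close to its minimum of 1
```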

Gradient Descent can include Decay, a mechanism that updates the learning rate every epoch. Note that "update" in this context means updating the learning rate, not the weights. If we're not using decay, the learning rate stays constant from the first epoch until the last. If we're using decay, the update can be written as below:

lr_new = lr_old / (1 + decay × epoch)   (one common form: inverse time decay)
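Here is a small sketch of this schedule, assuming the inverse-time-decay form given above (all values are illustrative):

```python
# lr_old is the learning rate initialized at the first epoch.
initial_lr = 0.1
decay = 1e-2

for epoch in range(5):
    lr = initial_lr / (1 + decay * epoch)  # the decay update above
    print(f"epoch {epoch}: learning rate = {lr:.5f}")

# Without decay (decay = 0), lr would stay at 0.1 for every epoch.
```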

The old learning rate, i.e., the learning rate initialized at the first epoch, usually has a value of 0.1 or 0.01, while decay is a parameter greater than 0, typically set to a value like 1e-1, 1e-2, 1e-3, or 1e-4. The value of the learning rate must be selected carefully, because the function Z must be minimized as well as possible; an unsuitable learning rate will make training diverge. In mathematics, convergence is the property (exhibited by certain infinite series and functions) of approaching a limit more and more closely as an argument (variable) of the function increases or decreases, or as the number of terms of the series increases. The three learning-rate cases are drawn below:

graph of a function with a suitable learning rate

graph of a function with a learning rate that is too small

graph of a function with a learning rate that is too large
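These three regimes can be reproduced numerically on the toy function Z = w² + 1 from earlier; the specific learning-rate values below are my own illustrative picks:

```python
# The three regimes on Z = w**2 + 1, where dZ/dw = 2w.
def run_gd(lr, steps=20, w=1.0):
    for _ in range(steps):
        w = w - lr * (2 * w)  # gradient step
    return w

print(run_gd(0.1))    # suitable: w shrinks smoothly toward 0
print(run_gd(0.001))  # too small: w has barely moved after 20 steps
print(run_gd(1.1))    # too large: |w| grows every step -> divergence
```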

Final Thought

Based on the three graphs above, a suitable learning rate combined with decay can make training converge (that is, reach the solution quickly). The learning rate defines how quickly a network updates its parameters. In conclusion, you must run many experiments to learn how your model improves. A learning rate that is too small slows down the learning process but converges smoothly; one that is too large speeds up learning but may not converge at all. I prefer using a decaying learning rate, which updates the value of the learning rate every epoch.

Next, I'll be discussing improved neural network architectures, like convolutional neural networks, which relate to my Bachelor's degree final project. Thank you.

Resources

Fausett, Laurene. Fundamentals of Neural Networks: Architectures, Algorithms, and Applications. Prentice Hall, 1994.

Yaldi, Gusri. "Improving the Neural Network Testing Performance for Trip Distribution Modelling by Transforming Normalized Data Nonlinearly". IJASEIT (2017).

Vasudevan, Shrihari. "Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks". Entropy 22.5 (2020): 560.
