
How to Choose the Best Learning Rate

for Neural Network (Beginner Approach)


In this article, before diving into the parameter-tuning topic, I'm going to introduce the artificial neural network. Why? Because it's important to start with the concept first. Neural networks are a branch of artificial intelligence that is quite broad and closely related to other disciplines, and a little background is enough to see why they're such a strong choice across applications. I think it's safe to assume that everyone reading this article has at least heard of neural networks, and you're probably also aware that they've turned out to be an extremely powerful tool when applied to a wide variety of important problems, like text translation and image recognition.

Overview of Neural Network

The architecture of a neural network is drawn below:

The architecture of the Neural Network

As a college student majoring in Mathematics, the neural network concept is a natural fit for me, because it is built from calculus and the vector-matrix notation and operations I'm already familiar with. Fundamentally, a neural network is just a mathematical function that takes a variable in and gives another variable back, where both of these variables can be vectors. We can see the illustration below:

Both of these variables can be vectors
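To make this concrete, here is a minimal NumPy sketch of a network as a function from one vector to another. The sizes (4 inputs, 3 outputs) and the sigmoid activation are my own illustrative choices, not anything fixed by the diagram:

```python
import numpy as np

# A neural network is just a function: vector in, vector out.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))  # weight matrix: 4 inputs -> 3 outputs
b = np.zeros(3)              # bias vector

def f(x):
    # one affine map followed by a sigmoid nonlinearity
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

x = np.array([0.5, -1.0, 2.0, 0.1])  # input vector
y = f(x)                             # output vector
print(y.shape)                       # (3,)
```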

The mathematical treatment here is kept to a minimum, consistent with the primary aims of clarity and correctness. Derivations, theorems, and proofs are included when they serve to illustrate the important features of a particular neural network. For example, the mathematical derivation of the backpropagation training algorithm makes the correct order of operations clear.

The diagram above shows a typical multilayer net, so called because it has more than one layer of connections. Every connection carries a weight that is fitted by an iterative training process. Typically, there is a layer of units between the input and the output called hidden units, which form the hidden layer; every connection into and out of this layer has its own weight, as sketched below:
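Here is a minimal NumPy sketch of such a multilayer net, with a weight matrix for each layer of connections. The unit counts and the sigmoid activation are illustrative assumptions:

```python
import numpy as np

# Multilayer net: input -> hidden layer -> output.
# W1, W2 are the weights on the two layers of connections.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(5, 4))  # input (4 units) -> hidden (5 units)
b1 = np.zeros(5)
W2 = rng.normal(size=(2, 5))  # hidden (5 units) -> output (2 units)
b2 = np.zeros(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    h = sigmoid(W1 @ x + b1)     # hidden layer activations
    return sigmoid(W2 @ h + b2)  # output layer

print(forward(np.array([1.0, 0.0, -1.0, 0.5])))
```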

Hyperparameter Subject

A neural network needs some hyperparameters. One of them is the Learning Rate (LR), which takes a value between 0 and 1 and is used by Gradient Descent (GD), the optimizer of the neural net. If you don't know what a hyperparameter is, I'll explain first: hyperparameters are the variables that determine the network structure and how the network is trained. In my experience, two common ones are the number of hidden layers and the learning rate. Hyperparameters are set before training (before optimizing the weights and biases), as the sketch below shows.
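As a concrete sketch, using Keras as an assumed framework (the layer sizes and loss are illustrative choices of mine), both kinds of hyperparameters are fixed before training ever starts:

```python
import tensorflow as tf

# Both hyperparameters from the paragraph above -- the number of
# hidden layers and the learning rate -- are chosen here, before
# fit() ever touches a weight.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),  # hidden layer 1
    tf.keras.layers.Dense(16, activation="relu"),                    # hidden layer 2
    tf.keras.layers.Dense(1),                                        # output layer
])
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # LR fixed up front
    loss="mse",
)
# Training (model.fit(...)) only starts after these choices are made.
```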

Generally, GD works with the derivative of the function itself. The learning rate controls the rate, or speed, at which the model learns. Specifically, it controls the amount of apportioned error with which the weights of the model are updated each time they are updated, such as at the end of each batch of training examples. An example is given below:

example of function

Z symbolizes the function to optimize, and w is the weight in a neuron. As an example, take Z = w² + 1 and try to minimize it as well as possible. We can initialize the weight at w = 1; the minimum lies at w = 0, where Z reaches its smallest value, Z = 1.

First, we must know the formula for updating the weight in every neuron:

w_new = w_old − learning_rate × dZ/dw
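Applying this rule to the example above: for Z = w² + 1 the derivative is dZ/dw = 2w, so a short gradient-descent loop (the learning-rate value is an illustrative choice) drives w from 1 toward 0:

```python
# Gradient descent on Z = w**2 + 1, whose minimum Z = 1 sits at w = 0.
w = 1.0              # initial weight, as in the example above
learning_rate = 0.1  # illustrative value

for step in range(25):
    grad = 2 * w                  # dZ/dw
    w = w - learning_rate * grad  # the update rule above

print(w, w**2 + 1)  # w is now close to 0, Z close to its minimum of 1
```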

Gradient Descent can include Decay, a mechanism that updates the learning rate every epoch. Note that "update" in this context means updating the learning rate, not the weights. If we're not using decay, the learning rate stays constant from the first epoch until the last. If we're using decay, the update can be written as below:

lr_new = lr_old / (1 + decay × epoch)   (one common form: inverse time decay)
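Here is a small sketch of this schedule, assuming the inverse-time-decay form given above (all values are illustrative):

```python
# lr_old is the learning rate initialized at the first epoch.
initial_lr = 0.1
decay = 1e-2

for epoch in range(5):
    lr = initial_lr / (1 + decay * epoch)  # the decay update above
    print(f"epoch {epoch}: learning rate = {lr:.5f}")

# Without decay (decay = 0), lr would stay at 0.1 for every epoch.
```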

The old learning rate, i.e., the learning rate initialized at the first epoch, usually has a value of 0.1 or 0.01, while decay is a parameter greater than 0, typically set to a value like 1e-1, 1e-2, 1e-3, or 1e-4. The value of the learning rate must be selected carefully, because the function Z must be minimized as well as possible; an unsuitable learning rate will make training diverge. In mathematics, convergence is the property (exhibited by certain infinite series and functions) of approaching a limit more and more closely as an argument (variable) of the function increases or decreases, or as the number of terms of the series increases. The three learning-rate cases are drawn below:

graph of a function with a suitable learning rate

graph of a function with a learning rate that is too small

graph of a function with a learning rate that is too large
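These three regimes can be reproduced numerically on the toy function Z = w² + 1 from earlier; the specific learning-rate values below are my own illustrative picks:

```python
# The three regimes on Z = w**2 + 1, where dZ/dw = 2w.
def run_gd(lr, steps=20, w=1.0):
    for _ in range(steps):
        w = w - lr * (2 * w)  # gradient step
    return w

print(run_gd(0.1))    # suitable: w shrinks smoothly toward 0
print(run_gd(0.001))  # too small: w has barely moved after 20 steps
print(run_gd(1.1))    # too large: |w| grows every step -> divergence
```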

Final Thought

Based on the three graphs above, a suitable learning rate combined with decay can make training converge (that is, reach the solution quickly). The learning rate defines how quickly a network updates its parameters. In conclusion, you must run many experiments to learn how your model improves. A learning rate that is too small slows down the learning process but converges smoothly; one that is too large speeds up learning but may not converge at all. I prefer using a decaying learning rate, which updates the value of the learning rate every epoch.

Next, I'll be discussing improved neural network architectures, like convolutional neural networks, which relate to my Bachelor's degree final project. Thank you.

Resources

Fausett, Laurene. Fundamentals of Neural Networks: Architectures, Algorithms, and Applications. Prentice Hall, 1994.

Yaldi, Gusri. "Improving the Neural Network Testing Performance for Trip Distribution Modelling by Transforming Normalized Data Nonlinearly". IJASEIT (2017).

Vasudevan, Shrihari. "Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks". Entropy 22.5 (2020): 560.
