DL Unit 1
By
Dr Nisarg Gandhewar
Overview of Syllabus
•Neural Networks
•Stochastic GD
•AdaGrad
•RMSProp
•Adam
https://www.linkedin.com/pulse/how-does-machine-learning-work-rohit-jayale/
https://www.freepik.com/free-vector/isometric-people-working-with-technology_5083803.htm#query=DeFi&position=6&from_view=search&track=sph
Artificial Neural Network (ANN)
It is an information processing system inspired by biological neural networks. An artificial neural network is a computational model that mimics the way nerve cells work in the human brain.
Biological Neuron vs Artificial Neuron
Mathematical Intuition
Artificial Neural Network (ANN)
•Single-layer perceptron (SLP): the simplest type of feedforward neural network, consisting of a single layer of input nodes fully connected to a layer of output nodes. It can only classify linearly separable cases with a binary target.
•Multilayer perceptron (MLP): a fully connected feedforward ANN with at least three layers (input, output, and at least one hidden layer).
Types of ANN
•Feedforward Network
•Feedback (Recurrent) Network
•Feedback (Recurrent) Network
•It is represented by a cyclic directed graph, i.e., a graph that contains a cycle.
•It produces an output, copies that output, and loops it back into the network, so the output depends on inputs seen previously.
Forward propagation and Backward propagation
Forward Propagation Example
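The worked numbers from the slide's figure are not reproduced in this text, so here is a minimal forward-pass sketch with illustrative values (the weights, biases, and inputs below are assumptions, not the slide's):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 2-2-1 network (illustrative values only):
x  = [0.5, 0.8]                  # inputs
W1 = [[0.1, 0.4], [0.2, 0.3]]    # input -> hidden weights
b1 = [0.1, 0.1]                  # hidden biases
W2 = [0.5, 0.6]                  # hidden -> output weights
b2 = 0.2                         # output bias

# Hidden layer: weighted sum of inputs, then sigmoid
h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
# Output layer: weighted sum of hidden activations, then sigmoid
y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
print(y)
```

Each layer repeats the same two steps: a weighted sum plus bias, followed by the activation function.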
Back Propagation Example
Compute the new value of weight w5 using backpropagation after one iteration in the ANN below. Use the sigmoid activation function throughout the network and a learning rate of 0.5.
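The figure with the network's actual values is not reproduced here, so the following sketch uses assumed values (h1, w5, b, target) to show the chain-rule update for an output-layer weight w5 with sigmoid activation and learning rate 0.5:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values (the slide's figure is not reproduced here):
h1, w5, b, target, lr = 0.6, 0.4, 0.0, 0.5, 0.5

# Forward pass for the output neuron
net_o1 = w5 * h1 + b
out_o1 = sigmoid(net_o1)

# Backward pass for E = 1/2 * (target - out_o1)^2, using the chain rule:
# dE/dw5 = dE/dout * dout/dnet * dnet/dw5
dE_dout   = out_o1 - target          # derivative of the squared error
dout_dnet = out_o1 * (1.0 - out_o1)  # derivative of the sigmoid
dnet_dw5  = h1                       # derivative of the weighted sum
grad_w5 = dE_dout * dout_dnet * dnet_dw5

w5_new = w5 - lr * grad_w5           # gradient-descent weight update
print(w5_new)
```

The same chain rule, extended through one more layer, gives the updates for the input-to-hidden weights.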
Activation Function
•It is a function that introduces non-linearity into the neural network.
•It converts linear input signals into non-linear output signals.
•It also helps transform the output to a specific range, depending on the type of function used.
•Sigmoid
•ReLU
•Leaky ReLU
•Tanh
•Softmax
Sigmoid
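A minimal sketch of the sigmoid and its derivative (the derivative is the form backpropagation uses):

```python
import math

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)); output range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25, the derivative's maximum
```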
ReLU (Rectified Linear Unit)
Leaky ReLU
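A small sketch of both rectifier variants; alpha = 0.01 is a common default, not a value from the slides:

```python
def relu(z):
    # max(0, z): passes positive inputs unchanged, zeroes out negatives
    return max(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Like ReLU, but keeps a small slope alpha for z < 0, so negative
    # inputs still propagate a gradient (avoids "dying" ReLU units)
    return z if z > 0 else alpha * z

print(relu(-3.0), leaky_relu(-3.0))
```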
Tanh
• The tanh function became preferred over the sigmoid function because its zero-centered output gave better performance for multi-layer neural networks.
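A short sketch illustrating that tanh is a rescaled, zero-centered sigmoid:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# tanh is a rescaled sigmoid: tanh(z) = 2*sigmoid(2z) - 1.
# Its output range (-1, 1) is zero-centered, unlike sigmoid's (0, 1).
z = 0.7
print(math.tanh(z), 2.0 * sigmoid(2.0 * z) - 1.0)  # the two values agree
```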
Softmax
• The softmax activation function transforms the raw outputs (logits) of the neural network into a vector of probabilities, essentially a probability distribution over the output classes.
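A numerically stable softmax sketch (subtracting the max logit before exponentiating is a standard stability trick):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, exponentiate, normalize
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))  # the probabilities sum to 1
```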
Loss function
• The Loss function is a method of evaluating how well your algorithm is modeling your dataset.
Regression Loss function
• Common regression losses include Mean Squared Error (MSE) and Mean Absolute Error (MAE).
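A sketch of two common regression losses; the sample values are illustrative:

```python
def mse(y_true, y_pred):
    # Mean Squared Error: penalizes large errors quadratically
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    # Mean Absolute Error: more robust to outliers than MSE
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.0]
y_pred = [2.5, 5.0, 4.0]
print(mse(y_true, y_pred), mae(y_true, y_pred))
```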
Optimization
Optimizers
•Optimizers determine how to change the weights and learning rate of a neural network in order to reduce the loss.
2D View of optimizers
3D View of optimizers
Types of Optimizers
•Gradient Descent (GD)
•Stochastic Gradient Descent (SGD)
•Mini-Batch Gradient Descent
•SGD with Momentum
•Nesterov Accelerated Gradient (NAG)
•AdaGrad
•RMSProp
•Adam
Gradient Descent (GD)
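The GD slide's figure is not reproduced here; as a sketch, here is gradient descent on the toy function f(w) = (w - 3)^2 (an assumption chosen for illustration, with minimum at w = 3):

```python
# Batch gradient descent on f(w) = (w - 3)^2, gradient f'(w) = 2*(w - 3).
w = 0.0      # initial parameter
lr = 0.1     # learning rate
for _ in range(100):
    grad = 2.0 * (w - 3.0)
    w -= lr * grad   # step against the gradient
print(w)  # close to 3.0
```

In a real network, `grad` is the gradient of the loss over the whole training set, computed by backpropagation.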
Stochastic Gradient Descent (SGD)
•To overcome some of the disadvantages of the GD algorithm, the SGD
algorithm comes into the picture as an extension of the Gradient Descent.
•In the SGD algorithm, we compute the derivative by taking one data point at a time, i.e., the model's parameters are updated more frequently.
•If the dataset has 10K records, SGD will update the model parameters 10K times per epoch.
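A sketch of per-sample updates on a toy linear-regression problem (the data and hyperparameters are illustrative, not from the slides):

```python
import random

# SGD for simple linear regression y = w*x, one sample per update.
# Data generated from the true relationship y = 2x (illustrative).
random.seed(0)
data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]

w, lr = 0.0, 0.05
for epoch in range(50):
    random.shuffle(data)               # visit samples in random order
    for x, y in data:                  # one parameter update per record
        grad = 2.0 * (w * x - y) * x   # d/dw of the squared error (w*x - y)^2
        w -= lr * grad
print(w)  # approaches the true slope 2.0
```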
Mini-Batch Gradient Descent
•It simply splits the training dataset into small batches and performs an
update for each of those batches.
•It can reduce the variance of the parameter updates, leading to more stable convergence.
SGD with Momentum/ Momentum Based GD
• It accelerates SGD by adding a fraction of the previous update to the current one (analogy: grace marks carried over into an exam score).
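A sketch of the momentum update on a toy quadratic; beta = 0.9 is a common default, not a value from the slides:

```python
# SGD with momentum on f(w) = (w - 3)^2. The velocity term accumulates
# past gradients, so updates build up speed in a consistent direction
# and oscillations across the valley are dampened.
w, lr, beta = 0.0, 0.05, 0.9
velocity = 0.0
for _ in range(200):
    grad = 2.0 * (w - 3.0)
    velocity = beta * velocity - lr * grad   # decayed history + new gradient
    w += velocity
print(w)  # close to 3.0
```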
Nesterov Accelerated Gradient (NAG)
•NAG improves on momentum by evaluating the gradient at the look-ahead position (the current parameters plus the momentum step) rather than at the current position.
AdaGrad (Adaptive Gradient Descent)
•Gradient Descent and other conventional optimization techniques use a fixed learning rate throughout training. However, this uniform learning rate may not be the best option for all parameters.
•AdaGrad adjusts the learning rate for each parameter based on its prior gradients, enabling the algorithm to traverse the optimization landscape efficiently.
•The intuition behind AdaGrad: use a different learning rate for each neuron in each hidden layer, varying across iterations.
•Convergence is faster and more reliable than with plain SGD when the scaling of the weights is unequal.
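A sketch of the AdaGrad update on a toy quadratic (values are illustrative): each parameter divides its step by the square root of its accumulated squared gradients, so frequently-updated parameters get smaller effective learning rates.

```python
import math

# AdaGrad on f(w) = (w - 3)^2.
w, lr, eps = 0.0, 1.0, 1e-8
cache = 0.0
for _ in range(500):
    grad = 2.0 * (w - 3.0)
    cache += grad ** 2                         # running sum of squared gradients
    w -= lr * grad / (math.sqrt(cache) + eps)  # per-parameter adaptive step
print(w)  # approaches 3.0
```

Because `cache` only ever grows, the step size shrinks monotonically, which is the limitation RMSProp addresses next.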
RMS-Prop (Root Mean Square Propagation)
•Root Mean Squared Propagation, or RMSProp, is an extension of gradient descent and the
AdaGrad version of gradient descent that uses a decaying average of partial gradients in the
adaptation of the step size for each parameter.
•The use of a decaying moving average allows the algorithm to forget early gradients and focus on the
most recently observed partial gradients seen during the progress of the search, overcoming the
limitation of AdaGrad.
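A sketch showing the decaying average replacing AdaGrad's ever-growing sum (the hyperparameters are common defaults, not from the slides):

```python
import math

# RMSProp on f(w) = (w - 3)^2: a decaying average of squared gradients
# replaces AdaGrad's growing sum, so the step size does not vanish.
w, lr, beta, eps = 0.0, 0.01, 0.9, 1e-8
avg_sq = 0.0
for _ in range(1000):
    grad = 2.0 * (w - 3.0)
    avg_sq = beta * avg_sq + (1.0 - beta) * grad ** 2  # decaying average
    w -= lr * grad / (math.sqrt(avg_sq) + eps)
print(w)  # close to 3.0
```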
Adam (Adaptive Moment Estimation)
• Adam optimizer is one of the most popular and famous gradient descent optimization
algorithms.
•It is a method that computes adaptive learning rates for each parameter.
•It typically converges faster than plain SGD.
• Adam’s primary goal is to stabilize the training process and help neural networks converge to
optimal solutions.
Adam: Mathematical Formulation
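The formulation slide's content is a figure not reproduced here; these are the standard Adam update rules, sketched on a toy quadratic with the usual defaults b1 = 0.9, b2 = 0.999:

```python
import math

# Adam on f(w) = (w - 3)^2, following the standard update rules:
#   m_t = b1*m + (1-b1)*g          (first moment: mean of gradients)
#   v_t = b2*v + (1-b2)*g^2        (second moment: uncentered variance)
#   m_hat = m_t / (1 - b1^t),  v_hat = v_t / (1 - b2^t)   (bias correction)
#   w = w - lr * m_hat / (sqrt(v_hat) + eps)
w, lr, b1, b2, eps = 0.0, 0.1, 0.9, 0.999, 1e-8
m = v = 0.0
for t in range(1, 1001):
    g = 2.0 * (w - 3.0)
    m = b1 * m + (1.0 - b1) * g
    v = b2 * v + (1.0 - b2) * g ** 2
    m_hat = m / (1.0 - b1 ** t)    # corrects the zero-initialization bias
    v_hat = v / (1.0 - b2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)
print(w)  # converges near 3.0
```

Adam combines momentum's first-moment smoothing with RMSProp's second-moment scaling, which is why it needs both moving averages.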
Adam (Adaptive Moment Estimation)
• Adaptive Learning Rates: Adam adjusts the learning rates for each parameter individually
based on historical gradients.
• Efficient Memory Usage: Adam maintains a moving average of past gradients and squared
gradients for each parameter. This allows the algorithm to effectively use memory resources and
results in efficient parameter updates.
• Suitability for Large Datasets and High-Dimensional Spaces: Adam performs well on large
datasets and high-dimensional parameter spaces.
Selection of optimizer
•Selecting an appropriate optimizer in deep learning involves considering various factors, like
characteristics of your data, the architecture of your neural network, and the specific problem you are
trying to solve.
•For smaller datasets or more straightforward problems, traditional stochastic gradient descent (SGD) might be sufficient.
Memory Constraints:
Some optimizers use memory more efficiently than others. If you have memory constraints, note that plain SGD keeps no extra state, momentum keeps one extra buffer per parameter, while adaptive methods such as AdaGrad (one accumulator per parameter) and Adam (two moving averages per parameter) require additional memory.
Text Books
•Deep Learning from Scratch: Building with Python from First Principles, Seth Weidman, O'Reilly
Reference Books
•Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, MIT Press