Gradient Descent Based Optimization
Darshankumar Kapadiya (21MCEC02)
Yagnesh M. Bhadiyadra (21MCEC11)
Introduction

● The most common optimization algorithm in machine learning and deep learning
● A first-order iterative optimization method: it uses only the first derivative of the cost function to update the parameters

Θ := Θ - α ∇J(Θ)

where
J(Θ) is the cost function and
α is the learning rate
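
As a minimal illustration of this update rule, the sketch below runs plain gradient descent on a toy one-dimensional cost J(Θ) = (Θ - 3)^2; the cost, starting point and learning rate are made up for illustration.

def grad_J(theta):
    return 2 * (theta - 3.0)                # dJ/dTheta for J(Theta) = (Theta - 3)^2

theta = 0.0                                 # initial guess
alpha = 0.1                                 # learning rate
for _ in range(100):
    theta = theta - alpha * grad_J(theta)   # Theta := Theta - alpha * grad J(Theta)
print(theta)                                # converges to ~3.0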
Introduction

● The idea is to take repeated steps in the opposite direction of the gradient of the function at the current point, because this is the direction of steepest descent.
● Gradient descent can be used with linear regression, logistic regression and polynomial regression techniques (a small example follows the source link).

Source: https://en.wikipedia.org/wiki/File:Gradient_Descent_in_2D.webm#filelinks
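
As an illustration of the regression use case above, here is a small sketch of batch gradient descent fitting a simple linear regression y ≈ w·x + b by minimizing mean squared error; the tiny dataset is synthetic and chosen only so the fitted values are easy to check.

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 9.0]               # roughly y = 2x + 1

w, b, alpha, n = 0.0, 0.0, 0.05, 5
for _ in range(2000):
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n   # dJ/dw
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n       # dJ/db
    w, b = w - alpha * dw, b - alpha * db
print(round(w, 2), round(b, 2))              # close to 2 and 1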
Related Work

Related Work: Application
● Optimizing User Interface Layouts via Gradient Descent: Peitong Duan, Casimir Wierzynski, Lama Nachman (2020)
○ Two main concerns when we talk about UI optimization:
■ Efficient to navigate: quantified by the user's error rate during navigation
■ Intuitive interface: quantified by task completion time / delay in the user's decisions
A new gradient-descent-based neural network outperforms the older analytical models for UI optimization used in human-computer interaction and ergonomics.
Related Work: Application
● For online games, the goal is usually to maximize user engagement, i.e. how long users spend playing the game.
● To scope the work, the focus is on tuning the size and location of each element in the UI to minimize task completion time and error rate.
● The model is trained on data samples collected through crowdsourcing.
● Two parameters to which gradient descent is applied (see the sketch below):
○ Size of the target
○ Location of the target
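
A minimal, hypothetical sketch of this idea: gradient descent jointly updates a target's location (x, y) and size s. The paper learns the cost from crowdsourced data with a neural network; here a hand-made quadratic cost stands in for it, and the "ideal" position (0.5, 0.5) and size 0.15 are made-up values in normalized screen units.

def grad_cost(x, y, s):
    # gradient of a stand-in cost pulling the element toward an assumed
    # good layout: centred at (0.5, 0.5) with size 0.15
    return 2 * (x - 0.5), 2 * (y - 0.5), 2 * (s - 0.15)

x, y, s, alpha = 0.9, 0.2, 0.05, 0.1
for _ in range(200):
    gx, gy, gs = grad_cost(x, y, s)
    x, y, s = x - alpha * gx, y - alpha * gy, s - alpha * gs
print(round(x, 2), round(y, 2), round(s, 2))   # -> 0.5 0.5 0.15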

Related Work: Application
● A is the undo icon,
● B is the upload icon,
● C is the slider,
● D is the button group that controls which set of
stickers are displayed,
● E is the set of stickers (icon group type),
● F is the button group with the save (checkmark)
and cancel ("X") buttons.
The colored rectangles in the photo are not part of
the UI; they are the drop targets for Task Type 4 (drag
and drop).

Source: https://dl.acm.org/doi/10.1145/3313831.3376589
Related Work: Application
● We can test the efficiency of the optimization by interacting with the UI the same way we collected the data.
● The model can also be tested against Fitts' law, which predicts how long it takes users to point to a visual target as a function of the target's distance and size (see the sketch below).
● For this, we will have to develop a task routine.

Source: https://dl.acm.org/doi/10.1145/3313831.3376589
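
For reference, a small sketch of Fitts' law in its common Shannon form, T = a + b · log2(D/W + 1). The constants a and b below are made-up placeholders; in practice they would be fitted from data gathered by the task routine.

import math

def fitts_time(distance, width, a=0.2, b=0.1):
    index_of_difficulty = math.log2(distance / width + 1)   # in bits
    return a + b * index_of_difficulty                       # predicted seconds

print(fitts_time(distance=400, width=40))   # farther / smaller targets take longer
print(fitts_time(distance=400, width=80))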
Related Work: Parallelized Gradient Descent

● Asynchronous Parallel Stochastic Gradient Descent: Janis Keuper and Franz-Josef Pfreundt (2015)
○ Proposed a novel method to overcome shortcomings of the MapReduce framework on parallel systems.
○ MapReduce scales poorly for batch gradient descent because every update requires reading the entire dataset at once.
○ Stochastic GD is also hard to parallelise because of its inherently sequential nature.
○ They proposed an algorithm based on one-sided, asynchronous inter-process communication (a rough sketch of the asynchronous idea follows).
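
A rough, Hogwild-style illustration of asynchronous SGD, used here only for clarity: several workers apply stochastic updates to shared parameters without synchronizing with each other. This is an assumption for the sketch; the paper's method uses one-sided, asynchronous inter-process communication across nodes, not Python threads.

import random
import threading

# toy data: y = 3x + noise, model y_hat = w * x
data = [(x, 3.0 * x + random.gauss(0, 0.1)) for x in [i / 100 for i in range(100)]]
params = [0.0]        # shared parameter vector (a single weight here)
lr = 0.1

def worker(steps):
    for _ in range(steps):
        x, y = random.choice(data)            # pick one sample (stochastic)
        grad = 2 * (params[0] * x - y) * x    # gradient of the squared error
        params[0] -= lr * grad                # unsynchronized write to shared state

threads = [threading.Thread(target=worker, args=(2000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(round(params[0], 2))                    # close to 3.0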

Related Work: Improved AdaGrad Algorithm
● AdaGrad: the Adaptive Gradient algorithm
○ It keeps a running sum of squared gradients for each parameter and uses it to adapt the step size in each direction, unlike normal gradient descent, which uses a single learning rate for all parameters.
○ It is useful for data with some sparse and some dense parameters. Sparse parameters converge more slowly than dense ones, so AdaGrad dynamically adjusts the learning rate of each parameter based on its accumulated gradients.
sum_of_gradient_squared = previous_sum_of_gradient_squared + gradient^2
delta = -learning_rate * gradient / sqrt(sum_of_gradient_squared)
theta += delta
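
A runnable version of the pseudocode above on a toy one-dimensional objective f(θ) = (θ - 5)^2; the small eps term is a common addition (an assumption here, not part of the pseudocode) to avoid division by zero on the first step.

import math

theta, learning_rate, eps = 0.0, 1.0, 1e-8
sum_of_gradient_squared = 0.0
for _ in range(200):
    gradient = 2 * (theta - 5.0)             # f'(theta)
    sum_of_gradient_squared += gradient ** 2
    delta = -learning_rate * gradient / (math.sqrt(sum_of_gradient_squared) + eps)
    theta += delta
print(round(theta, 3))                       # approaches 5.0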

Related Work: Improved AdaGrad Algorithm
● AdaGrad: the Adaptive Gradient algorithm (visual comparison)
○ Cyan: normal gradient descent, white: AdaGrad

Source: https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c
Related Work: Improved AdaGrad Algorithm
● An improved AdaGrad gradient descent optimization algorithm: N. Zhang, D. Lei, and J.F. Zhao (2018)
○ AdaGrad introduces a new difficulty: towards the end of training, the learning rate becomes very small for many epochs/iterations.
○ To overcome this difficulty, they propose a new AdaGrad variant that accumulates the length of the gradient instead of the squared gradient used in the original algorithm (a hedged sketch follows the source link below).

Source: https://www.researchgate.net/publication/331422180_An_Improved_Adagrad_Gradient_Descent_Optimization_Algorithm
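
A hedged sketch of the modification described above, under the assumption that "length of the gradient" means accumulating |g| instead of g^2 while the rest of the AdaGrad update stays the same. This is one plausible reading for illustration, not a verified reproduction of Zhang et al.'s exact equations.

import math

theta, learning_rate, eps = 0.0, 1.0, 1e-8
sum_of_gradient_length = 0.0
for _ in range(200):
    gradient = 2 * (theta - 5.0)             # same toy objective as before
    sum_of_gradient_length += abs(gradient)  # |g| rather than g**2
    delta = -learning_rate * gradient / (math.sqrt(sum_of_gradient_length) + eps)
    theta += delta
print(round(theta, 3))                       # approaches 5.0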
Related Work: Improved AdaGrad Algorithm

● They test their approach on both the Reuters dataset and the IMDB dataset. The results show that their approach has a more stable convergence process and can reduce overfitting over multiple epochs.

Research Gap

● Most research articles address one of two things when it comes to gradient descent based optimization:
○ Either they modify the existing gradient descent algorithm to improve the convergence rate, or
○ They use gradient descent for a specific business application.
● Very few articles talk about combining both worlds.
● There is inadequate research in the field of neural-network-based automated UI optimization to begin with.
● Existing work only talks about optimizing UIs using simple gradient descent based techniques.

Research Gap

● Method shortcomings
○ It only finds a local minimum, not necessarily the global one, and is guaranteed to reach the global minimum only for convex problems.
○ It is very slow when the dataset is very large.
○ If the learning rate is not set properly, the iterates can move in the wrong direction and diverge rather than converge (see the example below).
○ Frequent updates are computationally expensive because all resources are used to process one training sample at a time.
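
A tiny illustration of the learning-rate point above (a toy example, not from the cited work): on J(θ) = θ², a step size that is too large makes the iterates grow instead of shrink.

def run(alpha, steps=10, theta=1.0):
    for _ in range(steps):
        theta = theta - alpha * 2 * theta    # gradient of theta^2 is 2*theta
    return theta

print(run(alpha=0.1))    # ~0.11: converges toward the minimum at 0
print(run(alpha=1.1))    # ~6.19: the magnitude grows every step (diverges)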

Problem Statement

“Use an optimized AdaGrad algorithm to automate the optimization of User Interface layouts”

References
1. YouTube, YouTube, 7 Oct. 2016, www.youtube.com/watch?v=43fqzaSH0CQ.
Accessed 13 Oct. 2021.
2. Keuper, Janis, and Franz-Josef Pfreundt. “Asynchronous Parallel Stochastic Gradient
Descent.” Proceedings of the Workshop on Machine Learning in High-Performance
Computing Environments, 2015, doi:10.1145/2834892.2834893.
3. “When to Use (and Not to Use) Asynchronous Programming: 20 Pros Reveal the
Best Use Cases.” Stackify, 31 Mar. 2021, stackify.com/when-to-use-asynchronous-
programming/.
4. “MPI Topic: One-Sided Communication.” Pages.tacc.utexas.edu,
pages.tacc.utexas.edu/~eijkhout/pcse/html/mpi-2onesided.html.

References
5. Jiang, Lili. “A Visual Explanation of Gradient Descent Methods (Momentum,
AdaGrad, RMSProp, Adam).” Medium, Towards Data Science, 21 Sept. 2020,
https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-
momentum-adagrad-rmsprop-adam-f898b102325c.
6. Zhang, N., Lei, D., and Zhao, J.F. (2018). An Improved Adagrad Gradient Descent
Optimization Algorithm. pp. 2359-2362. doi:10.1109/CAC.2018.8623271.
7. Peitong Duan, Casimir Wierzynski, and Lama Nachman. 2020. Optimizing User
Interface Layouts via Gradient Descent. In Proceedings of the 2020 CHI Conference
on Human Factors in Computing Systems (CHI '20). Association for Computing
Machinery, New York, NY, USA, 1–12.
DOI:https://doi.org/10.1145/3313831.3376589
References
8. “Advantages and Disadvantages of Stochastic Gradient Descent.” Asquero,
https://www.asquero.com/article/advantages-and-disadvantages-of-stochastic-
gradient-descent/.

Thank you!
