Gradient Descent Based Optimization: Darshankumar Kapadiya (21MCEC02) Yagnesh M. Bhadiyadra (21MCEC11)
Gradient Descent Based Optimization
Darshankumar Kapadiya (21MCEC02)
Yagnesh M. Bhadiyadra (21MCEC11)
Introduction
Introduction
Θ := Θ − α∇J(Θ)
Where,
J(Θ) is the cost function and
α is the learning rate
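As a minimal sketch of this update rule (the one-dimensional quadratic cost below is an assumed example, not from the slides):

```python
def gradient_descent(grad_fn, theta, alpha=0.1, steps=100):
    """Repeatedly apply the update theta := theta - alpha * dJ/dtheta."""
    for _ in range(steps):
        theta -= alpha * grad_fn(theta)
    return theta

# Example: J(theta) = (theta - 3)^2, with gradient 2 * (theta - 3);
# each step contracts the iterate toward the minimizer theta = 3.
theta_min = gradient_descent(lambda t: 2 * (t - 3), theta=0.0)
```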
Introduction
Source: https://en.wikipedia.org/wiki/File:Gradient_Descent_in_2D.webm#filelinks
Related Work
Related Work: Application
● Optimizing User Interface Layouts via Gradient Descent: Peitong Duan, Casimir
Wierzynski, Lama Nachman (2020)
○ Two main concerns when we talk about UI optimization:
■ Efficient to navigate
■ Intuitive interface
A new gradient-descent-based neural network outperforms older analytical models for UI
optimization used in human-computer interaction and ergonomics.
Related Work: Application
● For online games, the goal is usually to maximize user engagement, or how long users
spend playing the game.
● To scope our work, we focus on tuning the size and location of each element in the UI to
minimize task completion time and error rate.
● The model is trained on data samples collected through crowdsourcing.
● Two parameters on which we apply gradient descent:
○ Size of the target
○ Location of the target
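As a hedged sketch of what this could look like: if a differentiable model predicts task completion time from an element's size and location, both parameters can be moved down the model's gradient. The predictor below is a made-up stand-in (a distance penalty plus terms discouraging too-small and too-large targets), not the paper's trained network:

```python
import numpy as np

IDEAL_SPOT = np.array([0.5, 0.5])  # assumed easy-to-reach screen position

def predicted_time(size, loc):
    """Toy predictor: far-from-ideal placement, tiny targets, and
    oversized targets all increase the predicted completion time."""
    return np.sum((loc - IDEAL_SPOT) ** 2) + 1.0 / size + size

size, loc, lr = 0.2, np.array([0.9, 0.1]), 0.05
for _ in range(500):
    g_size = 1.0 - 1.0 / size ** 2      # d/dsize of (1/size + size)
    g_loc = 2.0 * (loc - IDEAL_SPOT)    # d/dloc of the distance penalty
    size -= lr * g_size
    loc -= lr * g_loc
```

Under these assumed penalties the size settles near 1 and the location near the ideal spot; the real work replaces `predicted_time` with a network trained on the crowdsourced samples.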
Related Work: Application
● A is the undo icon,
● B is the upload icon,
● C is the slider,
● D is the button group that controls which set of
stickers are displayed,
● E is the set of stickers (icon group type),
● F is the button group with the save (checkmark)
and cancel ("X") buttons.
The colored rectangles in the photo are not part of
the UI; they are the drop targets for Task Type 4 (drag
and drop).
Source : https://dl.acm.org/doi/10.1145/3313831.3376589
Related Work: Application
● We can test the efficiency of the
optimization by interacting with the UI
the same way we collected the data.
● The model can also be tested against Fitts’
Law, which predicts how long it takes users
to point to a visual target as a function of
the target’s distance and size.
● For this we will have to develop a task
routine.
Source : https://dl.acm.org/doi/10.1145/3313831.3376589
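For reference, Fitts' Law is usually written MT = a + b·log2(D/W + 1), where D is the distance to the target, W is its width, and a, b are empirically fitted constants. A quick sketch with illustrative (not fitted) coefficients:

```python
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted pointing time: MT = a + b * log2(D/W + 1)."""
    index_of_difficulty = math.log2(distance / width + 1)  # Shannon formulation
    return a + b * index_of_difficulty

t_near_big = fitts_movement_time(distance=100, width=50)   # easy target
t_far_small = fitts_movement_time(distance=400, width=10)  # hard target
```

A task routine would collect measured times for varying D and W, then fit a and b by regression.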
Related Work : Parallelized Gradient Descent
Related Work : Improved AdaGrad Algorithm
● AdaGrad Algorithm : Adaptive Gradient Descent Algorithm
○ It keeps track of the sum of squared gradients and uses it to adapt the step size in
each direction, unlike the single fixed learning rate of normal gradient descent.
○ It is used for data with a mix of sparse and dense parameters. Sparse parameters
converge more slowly than dense ones, so this algorithm dynamically adjusts the
learning rate per parameter based on its convergence.
sum_of_gradient_squared += gradient ** 2                       # accumulate squared gradients
delta = -learning_rate * gradient / (sum_of_gradient_squared ** 0.5 + 1e-8)  # small eps avoids division by zero
theta += delta
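The three update lines above can be wrapped into a small runnable routine; the quadratic test objective and step count below are illustrative assumptions:

```python
import numpy as np

def adagrad(grad_fn, theta0, learning_rate=1.0, steps=200, eps=1e-8):
    """Minimize a function given its gradient, using the AdaGrad update."""
    theta = np.array(theta0, dtype=float)
    sum_of_gradient_squared = np.zeros_like(theta)
    for _ in range(steps):
        gradient = grad_fn(theta)
        sum_of_gradient_squared += gradient ** 2   # per-parameter accumulator
        theta -= learning_rate * gradient / (np.sqrt(sum_of_gradient_squared) + eps)
    return theta

# Example: f(x, y) = x^2 + 10*y^2; gradient magnitudes differ by 10x across
# the two directions, exactly where AdaGrad's per-parameter scaling helps.
theta = adagrad(lambda t: np.array([2.0 * t[0], 20.0 * t[1]]), [5.0, 5.0])
```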
Related Work : Improved AdaGrad Algorithm
● AdaGrad Algorithm : Adaptive Gradient Descent Algorithm
○ Cyan : Normal Gradient Descent, White: AdaGrad algorithm
Source: https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c
Related Work : Improved AdaGrad Algorithm
● An improved AdaGrad gradient descent optimization algorithm : N. Zhang, D. Lei,
and J.F. Zhao (2018)
○ AdaGrad introduces a new difficulty: toward the end of training, the learning
rate becomes very small for many epochs/iterations
○ To overcome this difficulty, they propose a modified AdaGrad algorithm in
which they accumulate the length of the gradient instead of the squared
gradient used in the original AdaGrad algorithm.
Source: https://www.researchgate.net/publication/331422180_An_Improved_Adagrad_Gradient_Descent_Optimization_Algorithm
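One plausible reading of the modification (we have not verified the paper's exact formula; the accumulator below sums gradient lengths |g| instead of g², so it grows more slowly and the effective learning rate decays less aggressively):

```python
import numpy as np

def improved_adagrad_step(theta, gradient, accum, learning_rate=0.1, eps=1e-8):
    """One update step accumulating the gradient's length rather than its square."""
    accum += np.linalg.norm(gradient)   # |g| grows slower than g^2 for large g
    theta = theta - learning_rate * gradient / (np.sqrt(accum) + eps)
    return theta, accum

# Smoke test: minimizing f(theta) = theta^2, whose gradient is 2 * theta.
theta, accum = np.array([5.0]), 0.0
for _ in range(100):
    theta, accum = improved_adagrad_step(theta, 2.0 * theta, accum)
```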
Related Work : Improved AdaGrad Algorithm
● They test their approach on both the Reuters dataset and the IMDB dataset. The results
show that their approach has a more stable convergence process and can reduce
overfitting over multiple epochs.
Research Gap
Research Gap
● Most of the research articles talk about two things when it comes to gradient descent
based optimization:
○ Either they try to modify the existing gradient descent algorithm to improve
convergence rate or
○ They use gradient descent for a specific business application
● Very few articles talk about combining both worlds.
● There is inadequate research in the field of neural-network-based automated UI
optimization to begin with.
● Existing work only discusses optimizing UIs using simple gradient descent based
techniques.
Research Gap
● Method shortcomings
○ It only finds a local minimum, not the global one, so it is reliable only for
convex problems
○ It is very slow when the dataset is very large
○ If the learning rate is not set properly, updates can move in the wrong
direction and diverge rather than converge
○ Frequent updates are computationally expensive, since all resources are spent
processing one training sample at a time.
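The learning-rate pitfall above is easy to demonstrate on an assumed quadratic cost J(θ) = θ²:

```python
def run(alpha, theta=1.0, steps=20):
    """Gradient descent on J(theta) = theta^2, whose gradient is 2*theta."""
    for _ in range(steps):
        theta -= alpha * 2 * theta   # each step multiplies theta by (1 - 2*alpha)
    return theta

small_step = run(alpha=0.1)   # |1 - 0.2| = 0.8 < 1: converges toward 0
big_step = run(alpha=1.1)     # |1 - 2.2| = 1.2 > 1: magnitude grows every step
```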
Problem Statement
“ Use an optimized AdaGrad
algorithm to automate the
optimization of user interface
layouts ”
References
1. YouTube, YouTube, 7 Oct. 2016, www.youtube.com/watch?v=43fqzaSH0CQ.
Accessed 13 Oct. 2021.
2. Keuper, Janis, and Franz-Josef Pfreundt. “Asynchronous Parallel Stochastic Gradient
Descent.” Proceedings of the Workshop on Machine Learning in High-Performance
Computing Environments, 2015, doi:10.1145/2834892.2834893.
3. “When to Use (and Not to Use) Asynchronous Programming: 20 Pros Reveal the
Best Use Cases.” Stackify, 31 Mar. 2021, stackify.com/when-to-use-asynchronous-
programming/.
4. “MPI Topic: One-Sided Communication.” Pages.tacc.utexas.edu,
pages.tacc.utexas.edu/~eijkhout/pcse/html/mpi-2onesided.html.
5. Jiang, Lili. “A Visual Explanation of Gradient Descent Methods (Momentum,
AdaGrad, RMSProp, Adam).” Medium, Towards Data Science, 21 Sept. 2020,
https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-
momentum-adagrad-rmsprop-adam-f898b102325c.
6. Zhang, N., D. Lei, and J.F. Zhao. “An Improved Adagrad Gradient Descent
Optimization Algorithm.” 2018, pp. 2359–2362, doi:10.1109/CAC.2018.8623271.
7. Peitong Duan, Casimir Wierzynski, and Lama Nachman. 2020. Optimizing User
Interface Layouts via Gradient Descent. In Proceedings of the 2020 CHI Conference
on Human Factors in Computing Systems (CHI '20). Association for Computing
Machinery, New York, NY, USA, 1–12.
DOI:https://doi.org/10.1145/3313831.3376589
8. “Advantages and Disadvantages of Stochastic Gradient Descent.” Asquero,
https://www.asquero.com/article/advantages-and-disadvantages-of-stochastic-
gradient-descent/.
Thank you!