Finding Local Minima Efficiently in
Decentralized Optimization
(PEDESTAL ALGO)
Badarla Rohan Naidu, Gopendra Singh
(2022101072, 2022101003)
Outline
1. Introduction
2. Related works
3. Algorithms
4. Experiments
5. Theorems
6. Conclusion
Motivation
• Decentralized optimization is a distributed optimization approach in which models are trained in parallel across multiple worker nodes over a decentralized communication network.
• It has advantages over centralized approaches, such as reducing communication costs and avoiding issues with network latency.
• Achieving second-order optimality (escaping saddle points and finding local minima) is an important problem in nonconvex optimization.
• The two primary gradient-based methods for second-order optimality are perturbed gradient descent and negative curvature descent.
• The study of second-order optimality for decentralized stochastic optimization algorithms is still limited.
• A stochastic algorithm that achieves second-order optimality is needed for decentralized problems, especially for large machine learning models.
Introduction
Goal: escaping saddle points and locating local minima in nonconvex optimization.
The paper proposes a novel algorithm, PEDESTAL, which is the first decentralized stochastic gradient-based algorithm to achieve second-order optimality with non-asymptotic analysis.
Related works
Decentralized First-Order: Covers D-PSGD and its variants for non-IID data. D-GET, D-SPIDER-SFO, and GT-HSGD achieve O(ε^-3) complexity using variance reduction, such as the efficient STORM estimator.
Centralized Second-Order: Discusses perturbed and negative curvature methods, such as PGD with O(ε^-8) and CNC-SGD with Õ(ε^-5) complexity under assumptions. SSRGD and Pullback improve this to Õ(ε^-3.5) and Õ(ε^-3) via variance reduction.
SGD Limitations: The authors argue that SGD's saddle-escape guarantees hold only under assumptions and differ from their goal of finding local minima in general. Experiments show that SGD and variance-reduced methods may struggle to escape saddle points efficiently, which motivates the study of second-order stationarity for variance-reduced algorithms.
Algorithm
1. Overview
1. PEDESTAL algorithm for decentralized communication networks.
2. Nodes connected by weight matrix W.
3. Initial parameters x0 on all nodes.
4. xt: model parameter, vt: gradient estimator, yt: gradient tracker.
5. zt: temporary model parameter awaiting communication.
2. Algorithm Basics
1. esc(i): counts the iterations spent in the current escaping phase on the i-th node.
2. First iteration: the gradient estimator is computed with batch size b0.
3. Later iterations: vt is computed from a mini-batch.
3. Phase Transition
1. If ∥yt∥ < Cv, the node switches to the escaping phase.
2. Draw a perturbation ξ from the ball B0(r) of radius r centered at 0, update zt = xt + ξ, and set esc(i) to 0.
3. The anchor x̃(i) = xt is saved to monitor the escaping phase.
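The communication step that turns zt into the next xt can be sketched as follows. This is a minimal illustration, not the authors' code: the ring weights of 1/3, the function names, and the gradient-tracking update y ← W(y + v_new − v_old) are assumptions based on the usual gradient-tracking form, which the slides do not spell out. Each array stacks one row per node.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic W for a ring: each node averages itself and its two
    neighbours with weight 1/3 (an assumed, common choice of weights)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W

def communicate(W, z, y, v_new, v_old):
    """One round of neighbour averaging.

    z, y, v_new, v_old have shape (n_nodes, dim). The tracker update below is
    the standard gradient-tracking rule, assumed here for illustration.
    """
    x_next = W @ z                      # mix the temporary parameters z_t
    y_next = W @ (y + v_new - v_old)    # track the average gradient estimate
    return x_next, y_next
```

Because W is doubly stochastic, the node average of x after mixing equals the node average of z before mixing, which is why the averaged iterate x̄ is a natural object to analyze.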
Algorithm
• Escaping Phase Management
• Increment esc(i) until ∥xt − x̃(i)∥ > Cd, which breaks the escaping phase.
• If not in the escaping phase, update zt = xt − ηyt.
• Termination Condition
• If esc(i) > CT for any node, x̄_{t−CT} is a candidate stationary point.
• Terminate when at least 10 nodes satisfy esc(i) ≥ CT.
• Decentralized and Adaptive
• Phases are determined by each node's individual status.
• No coordination protocol required.
• Adaptive and independent operation.
• Conclusion
• PEDESTAL enables decentralized optimization.
• Adaptive phase transition based on local conditions.
• Effective termination criterion for convergence.
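The per-node phase logic described on these slides can be sketched as below. This is a hedged illustration under stated assumptions, not the paper's pseudocode: the sentinel esc = -1 for "not escaping", the uniform sampling from the ball, the choice to keep taking the step x − ηy while escaping, and all function names are assumptions introduced here.

```python
import numpy as np

def sample_ball(rng, dim, r):
    """Uniform sample from the Euclidean ball of radius r centered at 0
    (uniformity is an assumption; the slides only say 'from B0(r)')."""
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    radius = r * rng.uniform() ** (1.0 / dim)
    return radius * direction

def local_step(x, y, state, rng, eta, r, C_v):
    """Compute z for one node. state = {'esc': int, 'anchor': array or None};
    esc == -1 means the node is not in an escaping phase (assumed sentinel)."""
    if state["esc"] < 0 and np.linalg.norm(y) < C_v:
        state["esc"] = 0                 # enter the escaping phase
        state["anchor"] = x.copy()       # remember where the escape started
        return x + sample_ball(rng, x.size, r)
    return x - eta * y                   # ordinary tracked-gradient step

def update_phase(x_new, state, C_d):
    """After communication: leave the phase if the iterate moved farther than
    C_d from the anchor, otherwise count one more escaping iteration."""
    if state["esc"] >= 0:
        if np.linalg.norm(x_new - state["anchor"]) > C_d:
            state["esc"] = -1
            state["anchor"] = None
        else:
            state["esc"] += 1

def should_terminate(states, C_T, min_nodes):
    """True once at least min_nodes nodes have esc >= C_T."""
    return sum(s["esc"] >= C_T for s in states) >= min_nodes
```

Each node only reads its own x, y and state, which is what lets the phases change without any broadcast or aggregation step.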
Algorithm
• Negative Curvature Descent:
• Neon & Neon2: Nested loops, inefficient in decentralized settings.
• Partial node participation compromises computation accuracy.
• Stepsize Consistency:
• PEDESTAL: Maintains consistent stepsizes for uniform convergence.
• Avoids normalization issues and ensures uniformity across nodes.
• PEDESTAL-S:
• Offers a small-batch version, reducing gradient complexity.
• Achieves second-order optimality with lower batch sizes.
• Termination Criteria:
• PEDESTAL terminates when a fraction of nodes remain in the escaping phase.
• No bound on maximum escaping iterations due to asynchronous transitions.
• Stuck Region Analysis:
• Adapts the "small stuck region" lemma for decentralized settings.
• Utilizes total moving distance and considers consensus error.
• Overall:
• PEDESTAL optimizes saddle point detection for decentralized optimization.
Experiments
• Matrix Factorization
• Matrix Sensing
Matrix Factorization
• Objective: Approximate the matrix M by a low-rank matrix UV^T, where U ∈ R^(N×r) and V ∈ R^(l×r).
• Optimization Problem:
• The problem can be written as minimizing the Frobenius norm ∥M − UV^T∥_F.
• Solved on the MovieLens-100k dataset, predicting users' ratings for movies.
• Dataset:
• MovieLens-100k: 100,000 ratings of 1,682 movies by 943 users.
• Ratings scaled to [0, 1].
• Experimental Setup:
• n = 50 worker nodes.
• Users assigned to worker nodes by a random or a Dirichlet distribution.
• Ring, toroidal, and undirected exponential graphs for communication.
• Baselines:
• D-PSGD, GT-DSGD, D-GET, D-SPIDER-SFO, GT-HSGD
• Results:
• PEDESTAL achieves the best performance, escaping saddle points and finding second-order stationary points effectively.
• Baselines struggle to escape saddle points efficiently.
• Variance reduction methods show worse performance than SGD-based algorithms, indicating a trade-off between gradient noise reduction and saddle point escape capability.
• Conclusion:
• PEDESTAL's compatibility between fast convergence and saddle point avoidance is a significant contribution.
• Future Directions:
• Further exploration of the PEDESTAL algorithm on different matrix factorization tasks and datasets.
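To make the factorization objective concrete, here is a minimal sketch of the loss and its gradients. It assumes the squared Frobenius norm with a factor of 1/2 and a fully observed matrix M; the real ratings matrix has missing entries, which this sketch ignores. The function names are hypothetical.

```python
import numpy as np

def mf_loss(M, U, V):
    """0.5 * ||M - U V^T||_F^2 (squared norm and 1/2 factor are assumptions)."""
    R = M - U @ V.T
    return 0.5 * np.sum(R ** 2)

def mf_grad(M, U, V):
    """Gradients of mf_loss with respect to U and V."""
    R = M - U @ V.T
    grad_U = -R @ V        # d/dU of 0.5 * ||M - U V^T||_F^2
    grad_V = -R.T @ U      # d/dV of the same objective
    return grad_U, grad_V
```

At U = 0, V = 0 both gradients vanish although the loss can still be decreased by moving along the leading singular directions of M. That point is a saddle for any nonzero M, which is the kind of stationary point a first-order method can stall at.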
CODE
Performance Comparison
Matrix sensing
• Objective: Solve a decentralized matrix sensing problem to recover a low-rank symmetric matrix M* = U*(U*)^T, where U* ∈ R^(d×r) has small rank r.
• Experimental Setup:
• Number of worker nodes: n = 20
• Synthetic dataset with sensing matrices {A_i}, i = 1, ..., N, and observations b_i = ⟨A_i, M*⟩
• Decentralized optimization problem formulated with the loss function L_i(U)
• Matrix dimensions: d = 50 or d = 100, r = 3
• Ground truth low-rank matrix M* = U*(U*)^T with entries from the Gaussian distribution N(0, 1/d)
• Data Distribution:
• Random distribution and Dirichlet distribution Dir_20(0.3) for assigning data to worker nodes
• Network Topology:
• Ring topology, toroidal topology, and undirected exponential graph.
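The sensing setup can be sketched in a few lines. This is an illustration under assumptions, not the experiment code: the per-sample loss is taken to be 0.5 · (⟨A_i, UU^T⟩ − b_i)^2 averaged over samples, the sensing matrices are drawn as standard Gaussians, and the N(0, 1/d) entries are applied to U*. All function names are hypothetical.

```python
import numpy as np

def make_problem(rng, d, r, N):
    """Synthetic instance: U* with N(0, 1/d) entries (assumed to refer to U*),
    Gaussian sensing matrices A_i (assumed), and b_i = <A_i, U* U*^T>."""
    U_star = rng.normal(scale=1.0 / np.sqrt(d), size=(d, r))
    A = rng.normal(size=(N, d, d))
    b = np.einsum("nij,ij->n", A, U_star @ U_star.T)
    return U_star, A, b

def sensing_loss(U, A, b):
    """Mean over i of 0.5 * (<A_i, U U^T> - b_i)^2 (assumed form of L_i)."""
    residual = np.einsum("nij,ij->n", A, U @ U.T) - b
    return 0.5 * np.mean(residual ** 2)

def sensing_grad(U, A, b):
    """Gradient of sensing_loss: mean_i r_i * (A_i + A_i^T) U."""
    residual = np.einsum("nij,ij->n", A, U @ U.T) - b
    S = np.einsum("n,nij->ij", residual, A) / len(b)
    return (S + S.T) @ U
```

The gradient is zero both at the ground truth U* and at U = 0, so a method started near zero has to escape that stationary point before it can recover M*.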
Matrix sensing
• Initialization:
• Initial value of U set to [u0, 0, 0], where u0 is drawn from a Gaussian distribution and scaled such that ∥u0∥ ≤ max eig(M*)
• Algorithms Compared:
• PEDESTAL vs. decentralized baselines including D-PSGD, GT-DSGD, D-GET, D-SPIDER-SFO, and GT-HSGD
• Results:
• PEDESTAL effectively reaches and escapes saddle points, finding the local minimum
• Baselines remain stuck at saddle points
• Smallest eigenvalue of the Hessian matrix at the converged point is significantly closer to 0 for PEDESTAL than for the baselines
• Conclusion:
• PEDESTAL demonstrates superior performance in escaping saddle points and finding the local minimum in the decentralized matrix sensing problem.
• Future Directions:
• Further exploration of the PEDESTAL algorithm in different optimization tasks and real-world scenarios.
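The "smallest Hessian eigenvalue" diagnostic mentioned in the results can be approximated numerically. The sketch below is one possible way to do it, assuming only a gradient function is available; the central-difference scheme, the step size h, and the function names are assumptions, and the slides do not say how the eigenvalue was computed in the experiments.

```python
import numpy as np

def numerical_hessian(grad, x, h=1e-5):
    """Central-difference Hessian built column by column from a gradient
    function grad(x) -> array of the same shape as the 1-D point x."""
    d = x.size
    H = np.zeros((d, d))
    for k in range(d):
        e = np.zeros(d)
        e[k] = h
        H[:, k] = (grad(x + e) - grad(x - e)) / (2.0 * h)
    return 0.5 * (H + H.T)   # symmetrize away finite-difference noise

def smallest_hessian_eigenvalue(grad, x, h=1e-5):
    """Smallest eigenvalue of the approximated Hessian at x."""
    return float(np.linalg.eigvalsh(numerical_hessian(grad, x, h))[0])

def toy_grad(x):
    """Gradient of the toy saddle f(x) = x0^2 - x1^2."""
    return np.array([2.0 * x[0], -2.0 * x[1]])
```

A clearly negative value at a point with near-zero gradient indicates a saddle, while a value close to 0 or positive is consistent with a second-order stationary point. For matrix-valued parameters such as U, the point would first be flattened to a vector.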
CODE
Performance Comparison
Assumptions
Convergence theorems
Proof
The proof is explained in detail in the paper, with supporting lemmas.
Conclusion
This paper proposes a novel algorithm, PEDESTAL, to find local minima in nonconvex decentralized optimization. PEDESTAL is the first decentralized stochastic algorithm to achieve second-order optimality with non-asymptotic analysis. It addresses the drawbacks of the previous deterministic counterpart by letting the phase change independently on each node, which avoids consensus protocols of broadcast or aggregation. The paper proves that PEDESTAL can reach an O(ε, √ε)-second-order stationary point with a gradient complexity of Õ(ε^-3), which matches state-of-the-art results of the centralized counterpart and of decentralized methods that find a first-order stationary point. The experiments on matrix sensing and matrix factorization tasks validate the performance of PEDESTAL.
Thank you
