Index
Decentralized Optimization
(PEDESTAL ALGO)
6. Conclusion
Motivation
• Decentralized optimization is a distributed optimization approach where models are
trained in parallel across multiple worker nodes over a decentralized communication
network.
• Achieving second-order optimality (escaping saddle points and finding local minima) is
an important problem in nonconvex optimization.
Algorithm
1. Overview
1. PEDESTAL algorithm for decentralized communication networks.
2. Nodes connected by weight matrix W.
3. Initial parameters x0 on all nodes.
4. xt: model parameter, vt: gradient estimator, yt: gradient tracker.
5. zt: temporary model parameter awaiting communication.
2. Algorithm Basics
1. esc(i): counts iterations in current escaping phase on i-th node.
2. First iteration: gradient estimator based on batch size b0.
3. Later iterations: vt is computed from a mini-batch.
3. Phase Transition
1. If ∥yt∥ < Cv, switch to escaping phase.
2. Draw perturbation ξ from B0(r), update zt = xt + ξ, set esc(i) to 0.
3. Anchor x̃ (i) = xt saved to monitor escaping phase.
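The phase-transition step above can be sketched for a single node as follows. This is a minimal illustration, not the authors' implementation: it assumes ξ is drawn uniformly from the ball B0(r), and the helper names `sample_ball` and `maybe_enter_escaping` are hypothetical.

```python
import numpy as np

def sample_ball(dim, r, rng):
    """Draw a point from the Euclidean ball of radius r centred at 0.

    Assumption: the draw is uniform over the ball (the slides only say
    "from B0(r)").
    """
    direction = rng.standard_normal(dim)
    direction /= np.linalg.norm(direction)
    # Scaling the radius by u**(1/dim) makes the draw uniform in volume.
    radius = r * rng.uniform() ** (1.0 / dim)
    return radius * direction

def maybe_enter_escaping(x_t, y_t, C_v, r, rng):
    """If ||y_t|| < C_v, return (z_t, anchor, esc) for a new escaping phase.

    Returns None when the gradient tracker is still large.
    """
    if np.linalg.norm(y_t) >= C_v:
        return None
    xi = sample_ball(x_t.size, r, rng)
    # z_t = x_t + xi, anchor = x_t, esc counter reset to 0.
    return x_t + xi, x_t.copy(), 0
```

Usage: call `maybe_enter_escaping(x_t, y_t, C_v, r, np.random.default_rng(0))` on each node; a non-None result means the node has switched to the escaping phase.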
Algorithm
• Escaping Phase Management
• Increment esc(i) until ∥xt − x̃(i)∥ > Cd, which breaks the escaping phase.
• If not in escaping phase, update zt = xt − ηyt.
• Termination Condition
• If esc(i) > CT for any node, x̄_{t−CT} is a candidate stationary point.
• Terminate when at least 10 nodes satisfy esc(i) ≥ CT.
• Decentralized and Adaptive
• Phases determined by individual node status.
• No coordination protocol required.
• Adaptive and independent operation.
• Conclusion
• PEDESTAL enables decentralized optimization.
• Adaptive phase transition based on local conditions.
• Effective termination criterion for convergence.
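The per-node bookkeeping described above can be sketched as a small state machine. This is a hedged illustration under stated assumptions, not the paper's pseudocode: `NodeState`, `local_step`, and `should_terminate` are hypothetical names, esc = -1 is used here as a convention for "not in an escaping phase", the perturbation is drawn uniformly from the ball, and the descent step zt = xt − ηyt is assumed to be applied inside an escaping phase as well, since the slides do not say what zt is there.

```python
import numpy as np
from dataclasses import dataclass
from typing import Optional

@dataclass
class NodeState:
    esc: int = -1                        # -1: not in an escaping phase (assumed convention)
    anchor: Optional[np.ndarray] = None  # x-tilde saved when the phase started

def local_step(state, x_t, y_t, eta, C_v, C_d, r, rng):
    """Return z_t for one node and update its escaping-phase state in place."""
    in_phase = state.esc >= 0
    # Moving farther than C_d from the anchor breaks the escaping phase.
    if in_phase and np.linalg.norm(x_t - state.anchor) > C_d:
        state.esc, state.anchor = -1, None
        in_phase = False
    # A small gradient tracker outside a phase triggers a perturbation.
    if not in_phase and np.linalg.norm(y_t) < C_v:
        xi = rng.standard_normal(x_t.size)
        xi *= r * rng.uniform() ** (1.0 / x_t.size) / np.linalg.norm(xi)
        state.esc, state.anchor = 0, x_t.copy()
        return x_t + xi
    if in_phase:
        state.esc += 1                   # count iterations in the current phase
    return x_t - eta * y_t               # descent step (assumed in both phases)

def should_terminate(states, C_T, min_nodes):
    """True once at least min_nodes nodes satisfy esc >= C_T."""
    return sum(s.esc >= C_T for s in states) >= min_nodes
```

Each node only reads its own state, so no coordination is needed to decide phases; only `should_terminate` looks across nodes.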
Algorithm
•Negative Curvature Descent:
•Neon & Neon2: Nested loops, inefficient in decentralized settings.
•Partial node participation compromises computation accuracy.
•Stepsize Consistency:
•PEDESTAL: Maintains consistent stepsizes for uniform convergence.
•Avoids normalization issues and ensures uniformity across nodes.
•PEDESTAL-S:
•Offers small batch version, reducing gradient complexity.
•Achieves second-order optimality with lower batch sizes.
•Termination Criteria:
•PEDESTAL terminates when a fraction of nodes remain in escaping phase.
•No bound on maximum escaping iterations due to asynchronous transitions.
•Stuck Region Analysis:
•Adapts "small stuck region" lemma for decentralized settings.
•Utilizes total moving distance and considers consensus error.
•Overall:
•PEDESTAL optimizes saddle point detection for decentralized optimization.
Experiments
• Matrix Factorization
• Matrix Sensing
Matrix Factorization
•Results:
•PEDESTAL achieves the best performance, escaping saddle points and
finding second-order stationary points effectively.
•Baselines struggle to escape saddle points efficiently.
•Variance reduction methods show worse performance than SGD-based
algorithms, indicating a trade-off between gradient noise
reduction and saddle point escape capability.
•Conclusion:
•PEDESTAL's compatibility between fast convergence and saddle point
avoidance is a significant contribution.
•Future Directions:
•Further exploration of PEDESTAL algorithm in different matrix
factorization tasks and datasets.
CODE
Performance Comparison
Matrix sensing
• Objective: Solve a decentralized matrix sensing problem to recover a low-rank
symmetric matrix M* = U*(U*)ᵀ, where U* ∈ R^{d×r} and the rank r is small.
• Experimental Setup:
• Number of worker nodes: n = 20
• Synthetic dataset with sensing matrices {A_i}_{i=1}^N and
observations b_i = ⟨A_i, M*⟩
• Decentralized optimization problem formulated with the local loss function
L_i(U)
• Matrix dimensions: d = 50 or d = 100, with r = 3
• Ground-truth low-rank matrix M* = U*(U*)ᵀ with entries
drawn from the Gaussian distribution N(0, 1/d)
• Data Distribution:
• Random distribution and Dirichlet distribution Dir_20(0.3)
for assigning data to worker nodes
• Network Topology:
• Ring topology, toroidal topology, and undirected exponential graph.
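A rough sketch of this setup, for orientation only. Several details are assumptions not stated on the slides: the ring uses uniform weights 1/3 (self plus two neighbours, valid for n ≥ 3), the sensing matrices A_i are standard Gaussian, and the N(0, 1/d) entries are taken to be those of U*. The function names `ring_weights` and `make_sensing_data` are hypothetical.

```python
import numpy as np

def ring_weights(n):
    """Doubly stochastic mixing matrix W for a ring of n >= 3 nodes.

    Assumption: each node averages itself and its two neighbours with
    weight 1/3; the slides do not specify the weights.
    """
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0
    return W

def make_sensing_data(d, r, N, rng):
    """Generate U*, M* = U* U*^T, sensing matrices A_i and observations b_i."""
    # Assumption: entries of U* are N(0, 1/d), i.e. standard deviation sqrt(1/d).
    U_star = rng.normal(0.0, np.sqrt(1.0 / d), size=(d, r))
    M_star = U_star @ U_star.T
    # Assumption: Gaussian sensing matrices.
    A = rng.standard_normal((N, d, d))
    # b_i = <A_i, M*> (Frobenius inner product).
    b = np.einsum("kij,ij->k", A, M_star)
    return U_star, M_star, A, b
```

For the slide's configuration one would call `ring_weights(20)` and `make_sensing_data(50, 3, N, rng)` with a dataset size N of one's choosing.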
Matrix sensing
•Initialization:
•Initial value of U set to [u0, 0, 0], where u0 is drawn from a Gaussian
distribution and scaled such that ∥u0∥ ≤ max eig(M*)
•Algorithms Compared:
•PEDESTAL algorithm vs. decentralized baselines including D-PSGD, GTDSGD,
D-GET, D-SPIDER-SFO, and GTHSGD
•Results:
•PEDESTAL effectively reaches and escapes saddle points, finding the local
minimum
•Baselines remain stuck at saddle points
•Smallest eigenvalue of Hessian matrix at converged optimal point
significantly closer to 0 for PEDESTAL compared to baselines
•Conclusion:
•PEDESTAL demonstrates superior performance in escaping saddle points
and finding local minimum in decentralized matrix sensing problem.
•Future Directions:
•Further exploration of PEDESTAL algorithm in different optimization tasks
and real-world scenarios.
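For reference, a hedged sketch of a matrix sensing objective and the initialization described above. The slides do not give the loss formula, so the squared-residual form L(U) = (1/2N) Σ (⟨A_i, UUᵀ⟩ − b_i)² is an assumption, as are the function names `sensing_loss`, `sensing_grad`, and `init_U`.

```python
import numpy as np

def sensing_loss(U, A, b):
    """Assumed loss: 0.5 * mean_i (<A_i, U U^T> - b_i)^2."""
    res = np.einsum("kij,ij->k", A, U @ U.T) - b
    return 0.5 * np.mean(res ** 2)

def sensing_grad(U, A, b):
    """Gradient of sensing_loss with respect to U."""
    res = np.einsum("kij,ij->k", A, U @ U.T) - b
    # S = mean_i res_i * A_i; d<A, U U^T>/dU = (A + A^T) U.
    S = np.einsum("k,kij->ij", res, A) / len(b)
    return (S + S.T) @ U

def init_U(M_star, r, rng):
    """Initial U = [u0, 0, ..., 0] with ||u0|| <= largest eigenvalue of M*."""
    d = M_star.shape[0]
    lam_max = np.linalg.eigvalsh(M_star)[-1]  # eigenvalues are in ascending order
    u0 = rng.standard_normal(d)
    norm = np.linalg.norm(u0)
    if norm > lam_max:
        u0 *= lam_max / norm                  # shrink only when the bound is violated
    U0 = np.zeros((d, r))
    U0[:, 0] = u0
    return U0
```

Only the first column of the initial U is nonzero, so the zero columns receive zero gradient from this loss until a perturbation moves them, which is why this start is a natural test of saddle point escape.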
CODE
Performance Comparison
Assumptions
Convergence theorems
Proof
The proof is explained in detail in the paper, supported by
several lemmas
Conclusion