
Chapter 14 Parallel and Distributed Algorithms

Peng Li, Lanzhou University


May 27, 2023

We introduce two generic optimization problems, consensus and sharing [1].

1 Global Variable Consensus Optimization


We first consider the case of a single global variable, with the objective and constraint terms split into $N$ parts:
\[
\min_{x\in\mathbb{R}^n} f(x) = \sum_{j=1}^{N} f_j(x). \tag{1.1}
\]

where $f_j : \mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ are convex. The goal is to solve the problem in such a way that each term can be handled by its own processing element.

For example, $x$ may represent the parameters in a model and $f_j$ the loss function associated with the $j$th block of data or measurements. We then say that $x$ is found by collaborative filtering, since the data sources "collaborate" to develop a global model. (Compare the generic two-block form $\min_x f(x) = g(x) + h(x)$ to which ADMM is usually applied.)

The problem can be rewritten with local variables $x_j \in \mathbb{R}^n$ and a common global variable $z \in \mathbb{R}^n$:
\[
\begin{aligned}
\min_{\tilde{x},\,z}\quad & \sum_j f_j(x_j) \\
\text{s.t.}\quad & x_j - z = 0, \quad j = 1,\dots,N,
\end{aligned} \tag{1.2}
\]
where $\tilde{x} = (x_1, x_2, \cdots, x_N)^\top \in \mathbb{R}^{nN}$. This is called the global consensus problem, since the constraint is that all the local variables should agree, i.e., be equal:


\[
\begin{cases}
x_1 - z = 0 \\
x_2 - z = 0 \\
\quad\vdots \\
x_N - z = 0.
\end{cases} \tag{1.3}
\]
ADMM for (1.2) can be derived either directly from the augmented Lagrangian
\[
L_\rho(x_1,\cdots,x_N;z) = \sum_j f_j(x_j) + \langle y_j, x_j - z\rangle + \frac{\rho}{2}\left\|x_j - z\right\|_2^2,
\]

or simply as a special case of constrained optimization:
\[
\min_{x\in C} f(x) = \sum_j f_j(x_j), \tag{1.4}
\]

where the constraint set is
\[
C = \left\{(x_1,\cdots,x_N) \in \mathbb{R}^{nN} : x_1 = x_2 = \cdots = x_N\right\}. \tag{1.5}
\]
The resulting ADMM algorithm is as follows.
\[
\begin{cases}
x_i^{k+1} = \arg\min_{x_i}\; f_i(x_i) + \frac{\rho}{2}\left\|x_i - z^k + y_i^k/\rho\right\|_2^2 = \operatorname{Prox}_{f_i/\rho}\left(z^k - y_i^k/\rho\right), & i = 1,\dots,N \\[2pt]
z^{k+1} = \arg\min_z\; \sum_i \frac{\rho}{2}\left\|x_i^{k+1} - z + y_i^k/\rho\right\|_2^2 = \frac{1}{N}\sum_{j=1}^N\left(x_j^{k+1} + y_j^k/\rho\right) \\[2pt]
y_i^{k+1} = y_i^k + \rho\left(x_i^{k+1} - z^{k+1}\right), & i = 1,\dots,N.
\end{cases} \tag{1.6}
\]
The processing element that handles the global variable $z$ is sometimes called the central collector or fusion center. Note that the $z$-update is simply the projection of $x^{k+1} + y^k/\rho$ onto the constraint set $C$ of "block constant" vectors, i.e., $z^{k+1}$ is the average of the $N$ blocks of $x^{k+1} + y^k/\rho$:
\[
z^{k+1} = \frac{1}{N}\sum_j x_j^{k+1} + \frac{1}{N}\sum_j y_j^k/\rho = \bar{x}^{k+1} + \bar{y}^k/\rho. \tag{1.7}
\]

Averaging the $y$-update gives
\[
\bar{y}^{k+1} = \bar{y}^k + \rho\left(\bar{x}^{k+1} - z^{k+1}\right). \tag{1.8}
\]
Moreover, substituting (1.7) into (1.8) shows that
\[
\bar{y}^{k+1} = \bar{y}^k + \rho\left(-\bar{y}^k/\rho\right) = 0, \tag{1.9}
\]
i.e., the dual variables have average value zero after the first iteration. Using $z^k = \bar{x}^k$, ADMM (1.6) can be written as
\[
\begin{cases}
x_i^{k+1} = \arg\min_{x_i}\; f_i(x_i) + \frac{\rho}{2}\left\|x_i - \bar{x}^k + y_i^k/\rho\right\|_2^2 = \operatorname{Prox}_{f_i/\rho}\left(\bar{x}^k - y_i^k/\rho\right) \\[2pt]
y_i^{k+1} = y_i^k + \rho\left(x_i^{k+1} - \bar{x}^{k+1}\right), \quad i = 1,\dots,N \\[2pt]
\bar{x}^{k+1} = \frac{1}{N}\sum_{j=1}^N x_j^{k+1},
\end{cases} \tag{1.10}
\]

which is consensus ADMM. Here each pair $(x_i^{k+1}, y_i^{k+1})$ can be updated in parallel for $i = 1,\dots,N$. This is similar to the Jacobi-style ADMM.
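As a concrete illustration, here is a minimal NumPy sketch of the consensus ADMM iteration (1.10). It assumes quadratic local terms $f_i(x) = \frac{1}{2}\|A_i x - b_i\|_2^2$ (an illustrative choice, not fixed by the text above), for which the proximal step reduces to a linear solve.

```python
import numpy as np

def consensus_admm(A_blocks, b_blocks, rho=1.0, iters=100):
    """Sketch of consensus ADMM (1.10) for f_i(x) = 0.5*||A_i x - b_i||^2.

    Here Prox_{f_i/rho}(v) solves (A_i^T A_i + rho*I) x = A_i^T b_i + rho*v.
    """
    N = len(A_blocks)
    n = A_blocks[0].shape[1]
    x = np.zeros((N, n))   # local variables x_i
    y = np.zeros((N, n))   # dual variables y_i
    xbar = np.zeros(n)     # average playing the role of z
    for _ in range(iters):
        for i in range(N):                       # parallelizable over i
            v = xbar - y[i] / rho                # prox argument xbar^k - y_i^k/rho
            Ai, bi = A_blocks[i], b_blocks[i]
            x[i] = np.linalg.solve(Ai.T @ Ai + rho * np.eye(n),
                                   Ai.T @ bi + rho * v)
        xbar = x.mean(axis=0)                    # averaging step
        y += rho * (x - xbar)                    # dual ascent step
    return xbar
```

The inner loop over $i$ is written sequentially for clarity, but each pass touches only agent $i$'s data, so it can be dispatched to $N$ processing elements.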
For consensus ADMM, the primal and dual residuals are
\[
\begin{cases}
\gamma^k = \left(x_1^k - \bar{x}^k, \cdots, x_N^k - \bar{x}^k\right)^\top \\[2pt]
s^k = -\rho\left(\bar{x}^{k-1} - \bar{x}^k, \cdots, \bar{x}^{k-1} - \bar{x}^k\right)^\top,
\end{cases} \tag{1.11}
\]
so that the squared norms are
\[
\begin{cases}
\left\|\gamma^k\right\|_2^2 = \sum_{j=1}^N\left\|x_j^k - \bar{x}^k\right\|_2^2 \\[2pt]
\left\|s^k\right\|_2^2 = N\rho^2\left\|\bar{x}^k - \bar{x}^{k-1}\right\|_2^2.
\end{cases} \tag{1.12}
\]
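In practice these norms give a simple stopping test for the iteration (1.10); below is a small sketch (the tolerance values are arbitrary placeholders).

```python
import numpy as np

def converged(x, xbar, xbar_prev, rho, eps_pri=1e-4, eps_dual=1e-4):
    """Stopping test built from the squared residual norms (1.12).

    x    : (N, n) array of local iterates x_i^k
    xbar : current average; xbar_prev: previous average
    """
    N = x.shape[0]
    r_sq = np.sum((x - xbar) ** 2)                         # ||gamma^k||_2^2
    s_sq = N * rho ** 2 * np.sum((xbar - xbar_prev) ** 2)  # ||s^k||_2^2
    return np.sqrt(r_sq) <= eps_pri and np.sqrt(s_sq) <= eps_dual
```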

2 Global Variable Consensus with Regularization

A simple variant is as follows:

\[
\min_{x,z}\; \sum_j f_j(x_j) + g(z) \quad \text{s.t.} \quad x_j - z = 0, \quad j = 1,\dots,N, \tag{2.1}
\]
where $g$ represents a simple constraint or regularization handled by the central collector; equivalently, $\min_x \sum_j f_j(x) + g(x)$.
The resulting ADMM algorithm is as follows:
\[
\begin{cases}
x_j^{k+1} = \arg\min_{x_j}\; f_j(x_j) + \frac{\rho}{2}\left\|x_j - z^k + y_j^k/\rho\right\|_2^2 = \operatorname{Prox}_{f_j/\rho}\left(z^k - y_j^k/\rho\right) \\[2pt]
z^{k+1} = \arg\min_z\; g(z) + \sum_j \frac{\rho}{2}\left\|x_j^{k+1} - z + y_j^k/\rho\right\|_2^2 = \operatorname{Prox}_{g/(\rho N)}\left(\bar{x}^{k+1} + \bar{y}^k/\rho\right) \\[2pt]
y_j^{k+1} = y_j^k + \rho\left(x_j^{k+1} - z^{k+1}\right).
\end{cases} \tag{2.2}
\]
Here
\[
z^{k+1} = \arg\min_z\; g(z) + \frac{\rho N}{2}\cdot\frac{1}{N}\sum_j\left\|z - x_j^{k+1} - y_j^k/\rho\right\|_2^2. \tag{2.3}
\]
The optimality condition is
\[
0 \in \partial g\left(z^{k+1}\right) + \frac{\rho N}{N}\sum_j\left(z^{k+1} - x_j^{k+1} - y_j^k/\rho\right) \tag{2.4}
\]
\[
= \partial g\left(z^{k+1}\right) + \rho N\left(z^{k+1} - \bar{x}^{k+1} - \bar{y}^k/\rho\right), \tag{2.5}
\]
so that
\[
z^{k+1} = \arg\min_z\; g(z) + \frac{\rho N}{2}\left\|z - \left(\bar{x}^{k+1} + \bar{y}^k/\rho\right)\right\|_2^2. \tag{2.6}
\]
Please note that averaging the $y$-update gives
\[
\bar{y}^{k+1} = \bar{y}^k + \rho\left(\bar{x}^{k+1} - z^{k+1}\right).
\]
In this case we do not in general have $\bar{y}^k = 0$, so we cannot drop the $z$-update.
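For a concrete instance, take $g(z) = \lambda\|z\|_1$ (an assumption made here purely for illustration); then the $z$-update (2.6) is an elementwise soft-thresholding.

```python
import numpy as np

def soft_threshold(v, t):
    """Prox of t*||.||_1: elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def z_update(xbar, ybar, rho, N, lam):
    """z^{k+1} = Prox_{g/(rho*N)}(xbar^{k+1} + ybar^k/rho) for g = lam*||.||_1."""
    return soft_threshold(xbar + ybar / rho, lam / (rho * N))
```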
Remark. For the problem
\[
\begin{cases}
\min\; \sum_j f_0(x_j) + r_j(x_j) \\
\text{s.t.} \quad x_i = x_j,
\end{cases} \tag{2.7}
\]
Zeng and Yin solved it by a parallel proximal gradient descent method [3].

3 Sharing

We consider the following sharing problem:
\[
\min_x\; \sum_j f_j(x_j) + g\left(\sum_j x_j\right), \tag{3.1}
\]

where $f_j$ is the local cost function for subsystem $j$, and $g$ is the shared objective. The sharing problem is important because (a) many useful problems can be put into this form, and (b) it enjoys a dual relationship with the consensus problem. Sharing can be written in ADMM form by copying the variable:
\[
\begin{cases}
\min_{x,z}\; \sum_j f_j(x_j) + g\left(\sum_j z_j\right) \\
\text{s.t.} \quad x_j - z_j = 0, \quad j = 1,\dots,N,
\end{cases} \tag{3.2}
\]
with $x = (x_1,\dots,x_N)^\top \in \mathbb{R}^{nN}$ and $z = (z_1,\dots,z_N)^\top \in \mathbb{R}^{nN}$. The ADMM iteration is
\[
\begin{cases}
x_j^{k+1} = \arg\min_{x_j}\; f_j(x_j) + \frac{\rho}{2}\left\|x_j - z_j^k + y_j^k/\rho\right\|_2^2 = \operatorname{Prox}_{f_j/\rho}\left(z_j^k - y_j^k/\rho\right) \\[2pt]
z^{k+1} = \arg\min_z\; g\left(\sum_j z_j\right) + \frac{\rho}{2}\sum_j\left\|x_j^{k+1} - z_j + y_j^k/\rho\right\|_2^2 \\[2pt]
y_j^{k+1} = y_j^k + \rho\left(x_j^{k+1} - z_j^{k+1}\right).
\end{cases} \tag{3.3}
\]

The $x_j$- and $y_j$-steps can be carried out independently in parallel for each $j = 1,\dots,N$. Next, we show how to carry out the $z$-subproblem. Let $v_j^k = x_j^{k+1} + y_j^k/\rho$; the $z$-subproblem can then be written as
\[
\begin{cases}
\min_{z,\bar{z}}\; g(N\bar{z}) + \frac{\rho}{2}\sum_j\left\|z_j - v_j^k\right\|_2^2 \\
\text{s.t.} \quad \bar{z} = \frac{1}{N}\sum_j z_j.
\end{cases} \tag{3.4}
\]
Minimizing over $z_1,\dots,z_N$ with $\bar{z}$ fixed amounts to projecting $v^k = (v_1^k,\dots,v_N^k)$ onto the affine set $\{z : \frac{1}{N}\sum_j z_j = \bar{z}\}$, which gives
\[
z_j = v_j^k + \left(\bar{z}^{k+1} - \bar{v}^k\right), \qquad \bar{v}^k = \frac{1}{N}\sum_j v_j^k. \tag{3.5}
\]
Then the $\bar{z}$-update and $y_j$-update can be computed by
\[
\begin{aligned}
\bar{z}^{k+1} &= \arg\min_{\bar{z}}\; g(N\bar{z}) + \frac{\rho}{2}\sum_j\left\|\bar{z} - \bar{v}^k\right\|_2^2 \\
&= \arg\min_{\bar{z}}\; g(N\bar{z}) + \frac{\rho N}{2}\left\|\bar{z} - \bar{v}^k\right\|_2^2 \\
&= \frac{1}{N}\arg\min_w\; g(w) + \frac{\rho}{2N}\left\|w - N\bar{v}^k\right\|_2^2 \\
&= \frac{1}{N}\operatorname{Prox}_{Ng/\rho}\left(N\bar{v}^k\right)
\end{aligned} \tag{3.6}
\]

and
\[
\begin{aligned}
y_j^{k+1} &= y_j^k + \rho\left(x_j^{k+1} - z_j^{k+1}\right) \\
&= y_j^k + \rho\left(x_j^{k+1} - v_j^k - \bar{z}^{k+1} + \bar{v}^k\right) \\
&= y_j^k + \rho\left(x_j^{k+1} - x_j^{k+1} - y_j^k/\rho - \bar{z}^{k+1} + \bar{x}^{k+1} + \bar{y}^k/\rho\right) \\
&= y_j^k + \rho\left(-y_j^k/\rho - \bar{z}^{k+1} + \bar{x}^{k+1} + \bar{y}^k/\rho\right) \\
&= \bar{y}^k + \rho\left(\bar{x}^{k+1} - \bar{z}^{k+1}\right).
\end{aligned} \tag{3.7}
\]
Equation (3.7) shows that the dual variables $y_j^{k+1}$ are all equal (i.e., in consensus) and can be replaced by a single dual variable $y \in \mathbb{R}^n$:
\[
y^{k+1} = y^k + \rho\left(\bar{x}^{k+1} - \bar{z}^{k+1}\right). \tag{3.8}
\]
Substituting the equivalent expression for $z_j$,
\[
\begin{aligned}
z_j^{k+1} &= v_j^k + \bar{z}^{k+1} - \bar{v}^k \\
&= x_j^{k+1} + y_j^k/\rho + \bar{z}^{k+1} - \bar{x}^{k+1} - \bar{y}^k/\rho \\
&= x_j^{k+1} + y^k/\rho + \bar{z}^{k+1} - \bar{x}^{k+1} - y^k/\rho \\
&= x_j^{k+1} + \bar{z}^{k+1} - \bar{x}^{k+1},
\end{aligned} \tag{3.9}
\]
into the $x$-update, we get
\[
\begin{aligned}
x_j^{k+1} &= \arg\min_{x_j}\; f_j(x_j) + \frac{\rho}{2}\left\|x_j - z_j^k + y^k/\rho\right\|_2^2 \\
&= \arg\min_{x_j}\; f_j(x_j) + \frac{\rho}{2}\left\|x_j - \left(x_j^k + \bar{z}^k - \bar{x}^k - y^k/\rho\right)\right\|_2^2 \\
&= \operatorname{Prox}_{f_j/\rho}\left(x_j^k + \bar{z}^k - \bar{x}^k - y^k/\rho\right).
\end{aligned} \tag{3.10}
\]
Combining (3.10), (3.6), and (3.8), we get the final algorithm:
\[
\begin{cases}
x_j^{k+1} = \operatorname{Prox}_{f_j/\rho}\left(x_j^k + \bar{z}^k - \bar{x}^k - y^k/\rho\right) \\[2pt]
\bar{z}^{k+1} = \frac{1}{N}\operatorname{Prox}_{Ng/\rho}\left(N\left(\bar{x}^{k+1} + y^k/\rho\right)\right) \\[2pt]
y^{k+1} = y^k + \rho\left(\bar{x}^{k+1} - \bar{z}^{k+1}\right) \\[2pt]
\bar{x}^{k+1} = \frac{1}{N}\sum_j x_j^{k+1}.
\end{cases} \tag{3.11}
\]
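A minimal sketch of (3.11), assuming quadratic local costs $f_j(x) = \frac{1}{2}\|x - c_j\|_2^2$ and a user-supplied prox for $g$ (both are illustrative assumptions):

```python
import numpy as np

def sharing_admm(c, prox_g, rho=1.0, iters=100):
    """Sketch of sharing ADMM (3.11) with f_j(x) = 0.5*||x - c_j||^2.

    c      : (N, n) array of local targets c_j
    prox_g : callable, prox_g(v, lam) = argmin_w g(w) + ||w - v||^2 / (2*lam)
    """
    N, n = c.shape
    x = np.zeros((N, n))
    y = np.zeros(n)                          # single shared dual variable
    xbar, zbar = np.zeros(n), np.zeros(n)
    for _ in range(iters):
        v = x + zbar - xbar - y / rho        # prox arguments, one row per agent
        x = (c + rho * v) / (1.0 + rho)      # Prox_{f_j/rho}(v_j), parallel in j
        xbar = x.mean(axis=0)
        zbar = prox_g(N * (xbar + y / rho), N / rho) / N  # (1/N)*Prox_{Ng/rho}(...)
        y = y + rho * (xbar - zbar)          # dual update
    return x, zbar
```

Note that only the averages $\bar{x}$, $\bar{z}$ and the single dual $y$ need to be communicated; the $x$-update is embarrassingly parallel across $j$.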

4 Duality of the Sharing Problem

The Lagrangian of (3.2) is
\[
L = \sum_j\left(f_j(x_j) + \langle y_j, x_j - z_j\rangle\right) + g\left(\sum_j z_j\right).
\]

The dual function is
\[
\begin{aligned}
\Gamma(y_1,\cdots,y_N) &= \inf_{x,z}\; L(x_1,\cdots,x_N,z_1,\cdots,z_N,y_1,\cdots,y_N) \\
&= \sum_{j=1}^N \inf_{x_j}\left[f_j(x_j) + \langle y_j, x_j\rangle\right] - \sup_z\left(\sum_j\langle y_j, z_j\rangle - g\Big(\sum_j z_j\Big)\right) \\
&= -\sum_j f_j^*(-y_j) - \sup_z\left(\sum_j\langle y_j, z_j\rangle - g\Big(\sum_j z_j\Big)\right) \\
&= \begin{cases} -\sum_j f_j^*(-y_j) - g^*(y_1), & \text{if } y_1 = \cdots = y_N, \\ -\infty, & \text{otherwise.} \end{cases}
\end{aligned} \tag{4.1}
\]

Letting $\psi = g^*$ and $h_j(y_j) = f_j^*(-y_j)$, the dual problem can be written as
\[
\begin{cases}
-\min_{y,w}\; \sum_j h_j(y_j) + \psi(w) \\
\text{s.t.} \quad y_j - w = 0, \quad j = 1,\cdots,N,
\end{cases} \tag{4.2}
\]
which is itself a consensus problem.

5 Optimal Exchange

We highlight an important special case of the sharing problem, the exchange problem, which has an appealing economic interpretation:
\[
\begin{cases}
\min_x\; \sum_j f_j(x_j) \\
\text{s.t.} \quad \sum_j x_j = 0.
\end{cases} \tag{5.1}
\]

This is a sharing problem with $g$ the indicator function of the set $\left\{(x_1,\cdots,x_N) : \sum_j x_j = 0\right\}$. The components of the vector $x_i$ represent quantities of commodities exchanged among $N$ subsystems.
(a) When $(x_i)_j \geq 0$, it can be viewed as the amount of commodity $j$ contributed by subsystem $i$ to the exchange.
(b) When $(x_i)_j < 0$, its magnitude $|(x_i)_j|$ can be viewed as the amount of commodity $j$ received by subsystem $i$ from the exchange.
The exchange problem can be treated as a generic constrained convex problem:
\[
\begin{cases}
\min\; \sum_j f_j(x_j) \\
\text{s.t.} \quad x \in C,
\end{cases} \tag{5.2}
\]
where
\[
C = \left\{x = (x_1,\cdots,x_N) \in \mathbb{R}^{nN} : \sum_j x_j = 0\right\}.
\]
The exchange ADMM algorithm is

\[
\begin{cases}
x_j^{k+1} = \operatorname{Prox}_{f_j/\rho}\left(x_j^k - \bar{x}^k - y^k/\rho\right) \\[2pt]
y^{k+1} = y^k + \rho\,\bar{x}^{k+1} \\[2pt]
\bar{x}^{k+1} = \frac{1}{N}\sum_j x_j^{k+1}.
\end{cases} \tag{5.3}
\]
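A sketch of the exchange iteration (5.3), again with quadratic costs $f_j(x) = \frac{1}{2}\|x - c_j\|_2^2$ assumed so that the prox is closed-form:

```python
import numpy as np

def exchange_admm(c, rho=1.0, iters=200):
    """Sketch of exchange ADMM (5.3) with f_j(x) = 0.5*||x - c_j||^2."""
    N, n = c.shape
    x = np.zeros((N, n))
    y = np.zeros(n)                       # dual vector, interpretable as prices
    for _ in range(iters):
        xbar = x.mean(axis=0)             # xbar^k
        v = x - xbar - y / rho            # prox argument x_j^k - xbar^k - y^k/rho
        x = (c + rho * v) / (1.0 + rho)   # Prox_{f_j/rho}(v_j), parallel in j
        xbar = x.mean(axis=0)             # xbar^{k+1}
        y = y + rho * xbar                # price update drives sum_j x_j -> 0
    return x
```

In the economic interpretation, $y$ acts as a price vector that is adjusted until the market clears, i.e., until $\sum_j x_j = 0$.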

6 Decentralized Consensus

When we solve the global consensus optimization problem
\[
\min_x\; f(x) = \sum_{j=1}^N f_j(x), \tag{6.1}
\]
we use the constrained form
\[
\begin{cases}
\min_{\hat{x}}\; \sum_{i=1}^N f_i(x_i) \\
\text{s.t.} \quad x_i - z = 0, \quad i = 1,\dots,N,
\end{cases} \tag{6.2}
\]
where $\hat{x} = (x_1,\cdots,x_N)$.
Here $z$ is the central collector. However, in many scenarios the data is collected or stored in a distributed manner, and a fusion center is either disallowed or not economical. Consequently, any computing task must be accomplished in a decentralized and collaborative manner by the agents. This approach can be powerful and efficient, as
(a) the computing tasks are distributed over all the agents;
(b) information exchange occurs only between agents with direct communication links.
There is no risk of central computation overload or network congestion.

(W. Shi, Q. Ling, K. Yuan, G. Wu, W. Yin, IEEE TSP, 2014)


For the global consensus optimization (6.1), we give a local copy $x_i$ of $x$ to agent $i$ and get
\[
\begin{cases}
\min\; \sum_i f_i(x_i) \\
\text{s.t.} \quad x_i = x_j, \quad \forall (i,j) \in \varepsilon,
\end{cases} \tag{6.3}
\]
where $x = (x_1,\cdots,x_N)^\top \in \mathbb{R}^{nN}$, or
\[
\begin{cases}
\min\; \sum_{j=1}^N f_j(x_j) \\
\text{s.t.} \quad x_j = \bar{x},
\end{cases} \tag{6.4}
\]
where $\bar{x} = \frac{1}{|\mathcal{V}|}\sum_{i\in\mathcal{V}} x_i = \frac{1}{N}\sum_{i=1}^N x_i$.
Here we consider a network consisting of $N$ agents bidirectionally connected by $E$ edges. We can describe the network as a symmetric undirected graph $G = (\mathcal{V}, \varepsilon)$, where $\mathcal{V} = \{1,\cdots,N\}$ is the set of vertices with $|\mathcal{V}| = N$, and $\varepsilon$ is the set of edges with $|\varepsilon| = E$. For any vertex $i \in \mathcal{V}$, we denote by $N_i = \{j : (i,j) \in \varepsilon\}$ its neighborhood.

For example: $\mathcal{V} = \{v_1,\cdots,v_5\}$, $\varepsilon = \{(1,2), (1,5), (1,3), (2,5), (3,2), (3,4), (4,5)\}$.

7 Distributed ADMM
In the optimization problem (6.3), the constraint can be written as $x_1 = x_2 = x_3 = \cdots = x_N$, i.e.,
\[
\begin{cases}
x_1 - x_2 = 0 \\
x_2 - x_3 = 0 \\
\quad\vdots \\
x_{N-1} - x_N = 0.
\end{cases}
\]
This can be represented via the edge-node incidence matrix $A$ as
\[
Ax = 0,
\]
where
\[
A = \begin{pmatrix} 1 & -1 & & & \\ & 1 & -1 & & \\ & & \ddots & \ddots & \\ & & & 1 & -1 \end{pmatrix} \in \mathbb{R}^{(N-1)\times N}
\]
is the incidence matrix of a chain over the agents (its Gram matrix $A^\top A$ is the corresponding graph Laplacian).
Therefore, we get the following problem:
\[
\begin{cases}
\min\; \sum_{j=1}^N f_j(x_j) \\
\text{s.t.} \quad Ax = 0,
\end{cases} \tag{7.1}
\]
or
\[
\begin{cases}
\min_{x_1,\cdots,x_N}\; \sum_{j=1}^N f_j(x_j) \\
\text{s.t.} \quad \sum_{j=1}^N A_j x_j = 0.
\end{cases} \tag{7.2}
\]
Next, we give the distributed ADMM for solving (7.1). For agent $i$:
\[
\begin{cases}
x_i^{k+1} = \arg\min_{x_i}\; f_i(x_i) + \frac{\rho}{2}\sum_{j\in N_i}\left\|x_j^k - x_i + y_{ij}^k/\rho\right\|_2^2, & i = 1,\cdots,N \\[2pt]
y_{ji}^{k+1} = y_{ji}^k + \rho\left(x_j^{k+1} - x_i^{k+1}\right).
\end{cases} \tag{7.3}
\]

Given $N_i = N_i^+ \cup N_i^-$ with
\[
\begin{cases}
N_i^+ = \{j : (i,j) \in \varepsilon,\ i < j\} \\
N_i^- = \{j : (i,j) \in \varepsilon,\ i > j\},
\end{cases} \tag{7.4}
\]
we design the following distributed ADMM:
\[
\begin{cases}
0 \in \partial f_i\left(x_i^{k+1}\right) + \rho\sum_{j\in N_i^-}\left(x_i^{k+1} - x_j^{k+1} + y_{ji}^k/\rho\right) + \rho\sum_{j\in N_i^+}\left(x_i^{k+1} - x_j^k + y_{ji}^k/\rho\right) \\[2pt]
y_{ji}^{k+1} = y_{ji}^k + \rho\left(x_j^{k+1} - x_i^{k+1}\right),
\end{cases} \tag{7.5}
\]
where the $x_i^{k+1}$-update only involves the neighborhood $N_i$: neighbors in $N_i^-$ have already been updated at iteration $k+1$, while neighbors in $N_i^+$ still carry their iteration-$k$ values.
Each such update has the generic form
\[
x_i^{k+1} = \arg\min_{x_i}\; f_i(x_i) + \frac{\rho}{2}\Big\|\sum_{j\neq i} A_j x_j^k + A_i x_i - b + y^k/\rho\Big\|_2^2.
\]

For the model (6.4), we can write it as follows [2]:
\[
\begin{cases}
\min_{x,z}\; F(x) + g(z) \\
\text{s.t.} \quad Ax + Bz = 0,
\end{cases} \tag{7.6}
\]
where
\[
\begin{cases}
A = [A_1; A_2] \in \mathbb{R}^{2En\times Nn} \\
B = [-I_{En}; -I_{En}] \in \mathbb{R}^{2En\times En}.
\end{cases} \tag{7.7}
\]
Through some transformations, we get
\[
\begin{cases}
0 \in \partial f_i\left(x_i^{k+1}\right) + w_i^k + 2\rho|N_i|\,x_i^{k+1} - \rho\left(|N_i|\,x_i^k + \sum_{j\in N_i} x_j^k\right) \\[2pt]
w_i^{k+1} = w_i^k + \rho\left(|N_i|\,x_i^{k+1} - \sum_{j\in N_i} x_j^{k+1}\right),
\end{cases} \tag{7.8}
\]
where $w = My$ with the dual variable $y$ and $M = A_1^\top - A_2^\top$.

8 Distributed Gradient Method

Next, we show how to solve (6.3) via the gradient descent method.

At iteration $k$, every agent $i$ sends its current iterate $x_i^k$ to its neighbors $j \in N_i$ and receives $x_j^k$ from them. Then every agent $i$ executes the consensus update step
\[
x_i^{k+1} = a_{ii}x_i^k + \sum_{j\in N_i} a_{ij}x_j^k = \sum_{j\in N_i\cup\{i\}} a_{ij}x_j^k,
\]
where the weights satisfy $a_{ij} > 0$ for $j \in N_i \cup \{i\}$ and $\sum_{j\in N_i\cup\{i\}} a_{ij} = 1$. Here the positive scalars $a_{ij}$, $j \in N_i \cup \{i\}$, are referred to as convex weights, and the vector $x_i^{k+1}$ is said to be a convex combination (or weighted average) of the points $x_j^k$, $j \in N_i \cup \{i\}$. Note that the weights $a_{ij}$, $j \in N_i \cup \{i\}$, are selected by agent $i$.
Let us define an $N \times N$ weight matrix $A$ with entries
\[
a_{ij}\begin{cases} > 0, & j \in N_i \cup \{i\} \\ = 0, & j \notin N_i \cup \{i\}, \end{cases} \qquad \sum_{j\in N_i\cup\{i\}} a_{ij} = 1. \tag{8.1}
\]

The sum of each row of $A$ is equal to $1$, i.e., $\sum_{j=1}^N a_{ij} = 1$; we refer to such a matrix as row stochastic. The update then becomes
\[
x_i^{k+1} = \frac{1}{\sum_{j=1}^N a_{ij}}\sum_{j=1}^N a_{ij}x_j^k. \tag{8.2}
\]

Then every agent $i$ executes the following two steps:
\[
\begin{cases}
\tilde{x}_i^k = \sum_{j=1}^N a_{ij}x_j^k \\[2pt]
x_i^{k+1} = \tilde{x}_i^k - \alpha_k\nabla f_i\left(\tilde{x}_i^k\right),
\end{cases} \tag{8.3}
\]
where $\alpha_k > 0$ is the step size. This combines consensus averaging with the local gradient step $x_i^{k+1} = x_i^k - \alpha_k\nabla f_i\left(x_i^k\right)$.
We can also have the following variant:
\[
\begin{cases}
\bar{x}_i^k = x_i^k - \alpha_k\nabla f_i\left(x_i^k\right) \\[2pt]
x_i^{k+1} = \sum_{j=1}^N a_{ij}\bar{x}_j^k.
\end{cases} \tag{8.4}
\]
Taking the average over agents,
\[
\begin{aligned}
\frac{1}{N}\sum_i x_i^{k+1} &= \frac{1}{N}\sum_i \tilde{x}_i^k - \frac{\alpha_k}{N}\sum_i\nabla f_i\left(\tilde{x}_i^k\right) \\
&= \frac{1}{N}\sum_{j=1}^N\left(\sum_{i=1}^N a_{ij}\right)x_j^k - \frac{\alpha_k}{N}\sum_{i=1}^N\nabla f_i\left(\tilde{x}_i^k\right).
\end{aligned} \tag{8.5}
\]

If the matrix $A$ is also column stochastic, i.e., $\sum_{i=1}^N a_{ij} = 1$ for all $j$ (so that $A$ is doubly stochastic), then
\[
\begin{cases}
\tilde{x}_i^k = \sum_j a_{ij}x_j^k \\[2pt]
\bar{x}^{k+1} = \bar{x}^k - \frac{\alpha_k}{N}\sum_{j=1}^N\nabla f_j\left(\tilde{x}_j^k\right),
\end{cases} \tag{8.6}
\]
so the network average follows a centralized gradient step on $\frac{1}{N}\sum_j f_j$, evaluated at the mixed iterates.

9 The Relationship Between Parallel and Distributed Computing

Figure 1: Parallel Computing

Figure 2: Distributed Computing
In the distributed setting, each agent updates using only information from its neighbors:
\[
x_i^{k+1} = \arg\min_{x_i}\; f_i(x_i) + \sum_{j\in N_i}\frac{\rho}{2}\left\|x_j^k - x_i + y_{ij}^k/\rho\right\|_2^2, \quad i = 1,\cdots,N. \tag{9.1}
\]

Example. Given a training data set
\[
S = \{(a_i, b_i) \in \mathbb{R}^n \times \mathbb{R}\}_{i=1}^N,
\]
we want to learn a parameter $x$ that minimizes
\[
\min_{x\in\chi}\; \sum_{i=1}^N f_i(x; a_i, b_i), \tag{9.2}
\]

where $\chi = [-1,1]^n$ and the $f_i$ are the loss functions associated with the data. For example, the $f_i$ may be quadratic, i.e.,
\[
\min_{x\in\chi}\; \sum_{i=1}^N\left(a_i^\top x - b_i\right)^2. \tag{9.3}
\]

The stochastic matrix $C$ is chosen as a lazy Metropolis matrix corresponding to $G$, i.e.,
\[
C = [c_{ij}] = \begin{cases}
\frac{1}{2\max\{|N_i|,\,|N_j|\}}, & (i,j) \in \varepsilon \\
0, & (i,j) \notin \varepsilon,\ i \neq j \\
1 - \sum_{j\in N_i} c_{ij}, & i = j.
\end{cases} \tag{9.4}
\]
Then we get
\[
\begin{cases}
\min_{x_i\in\chi}\; \sum_{i=1}^N\left(a_i^\top x_i - b_i\right)^2 \\
\text{s.t.} \quad (I - C)x = 0,
\end{cases} \tag{9.5}
\]
and solve it by the distributed gradient descent method with the weight matrix $C$.
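The following sketch builds the lazy Metropolis matrix (9.4) from an edge list and runs distributed projected gradient descent on the quadratic losses (9.3); the edge list, step size, and data shapes are illustrative placeholders.

```python
import numpy as np

def lazy_metropolis(edges, N):
    """Build the lazy Metropolis matrix C of (9.4) from an undirected edge list."""
    deg = np.zeros(N, dtype=int)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    C = np.zeros((N, N))
    for i, j in edges:
        w = 1.0 / (2.0 * max(deg[i], deg[j]))
        C[i, j] = C[j, i] = w
    np.fill_diagonal(C, 1.0 - C.sum(axis=1))   # c_ii = 1 - sum_{j != i} c_ij
    return C                                   # symmetric, hence doubly stochastic

def dgd_quadratic(C, a, b, alpha=0.01, iters=500):
    """Distributed projected gradient descent for f_i(x) = (a_i^T x - b_i)^2 on [-1,1]^n."""
    N, n = a.shape
    X = np.zeros((N, n))
    for _ in range(iters):
        X = C @ X                                            # consensus mixing
        G = 2.0 * (np.sum(a * X, axis=1) - b)[:, None] * a   # local gradients
        X = np.clip(X - alpha * G, -1.0, 1.0)                # step + projection onto chi
    return X

# Example usage on the five-node graph from Section 6 (illustrative data):
# edges = [(0, 1), (0, 4), (0, 2), (1, 4), (2, 1), (2, 3), (3, 4)]
# C = lazy_metropolis(edges, N=5)
```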

References

[1] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.

[2] E. Wei and A. Ozdaglar. Distributed alternating direction method of multipliers. In 2012 IEEE 51st Conference on Decision and Control (CDC), pages 5445–5450. IEEE, 2012.

[3] J. Zeng and W. Yin. On nonconvex decentralized gradient descent. IEEE Transactions on Signal Processing, 66(11):2834–2848, 2018.
