Matrix Multiplications and Collective Communication: Michael Hanke
School of Engineering Sciences
Chapter 5, p. 1/38
Outline
1 Introduction
2 The Recursive Doubling Algorithm
3 Matrix-Vector Multiplication
4 Matrix-Matrix Multiplication
5 An Application: Google

• Let A and B be two matrices of dimensions M × N and K × L
• Matrix addition C = A + B (defined if M = K and N = L):
  $c_{m,n} = a_{m,n} + b_{m,n} \quad (1 \le m \le M,\ 1 \le n \le N)$
• Matrix multiplication C = AB (defined if N = K):
  $c_{m,l} = \sum_{n=1}^{N} a_{m,n} b_{n,l}$
  $c_{m,l}$ is the product of the m-th row of A and the l-th column of B

    for n = 1:N
      for m = 1:M
        c(m,n) = a(m,n)+b(m,n);
      end
    end

• Assume that C is initialized to zero.
    for l = 1:L
      for m = 1:M
        for n = 1:N
          c(m,l) = c(m,l)+a(m,n)*b(n,l);
        end
      end
    end

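The triple loop translates almost line by line into a runnable form. The following is a minimal Python sketch, assuming matrices stored as lists of rows with 0-based indices; the function name `matmul` is chosen here for illustration only.

```python
def matmul(a, b):
    """Multiply an M x N matrix by an N x L matrix given as lists of rows."""
    M, N, L = len(a), len(b), len(b[0])
    assert all(len(row) == N for row in a), "inner dimensions must agree"
    c = [[0] * L for _ in range(M)]      # C is initialized to zero
    for l in range(L):                   # outermost loop: columns of B
        for m in range(M):               # middle loop: rows of A
            for n in range(N):           # innermost loop: scalar product
                c[m][l] += a[m][n] * b[n][l]
    return c

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```
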
Find an algorithm and a data distribution such that
1 duplication of data on different processes is avoided;
2 excessive data movement (communication!) is avoided.

Strategy
Consider the loops from the innermost to the outermost one.
1 The innermost (n-) loop is the scalar product of two vectors.
2 The n- and the m-loops together form a matrix-vector multiplication for each l.
3 Finally, find an implementation of the complete algorithm.

• The scalar product of two vectors $x, y \in \mathbb{R}^M$ is defined as
  $s = \langle x, y \rangle = x^T y = \sum_{m=1}^{M} x_m y_m$

Problem
Compute the sum s on P processors,
  $s = \omega_0 + \omega_1 + \cdots + \omega_{P-1}$

Idea (for P = 8):
  $s = \big((\omega_0 + \omega_1) + (\omega_2 + \omega_3)\big) + \big((\omega_4 + \omega_5) + (\omega_6 + \omega_7)\big)$
Neighbouring pairs are added first, then pairs of pairs, then the two halves.

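The grouping for P = 8 can be checked with a few lines of Python. This is a sequential sketch of the pairwise scheme only, assuming the number of values is a power of two; `pairwise_sum` is a name introduced here for illustration.

```python
def pairwise_sum(values):
    """Sum a list whose length is a power of two by repeatedly adding neighbours."""
    level = list(values)
    assert level and len(level) & (len(level) - 1) == 0, "length must be a power of two"
    while len(level) > 1:
        # one stage: (w0+w1), (w2+w3), ...; the additions of a stage are independent
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]

print(pairwise_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```

For 8 values the loop runs three stages, which matches the three levels of grouping in the formula.
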
Realization
• Assume that $P = 2^D$ is a power of 2. On process p, the program reads:
    s = ω_p;
    for d = 0:D-1
      q = bitflip(p,d);
      send(s,q);
      receive(h,q);
      s = s+h;
    end

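The exchange pattern can be tried out without a parallel machine. The sketch below simulates all P processes in one Python list; it assumes that `bitflip(p,d)` means "invert bit d of p", which is written here as `p ^ (1 << d)`. The function name `recursive_doubling` is chosen for illustration.

```python
def recursive_doubling(omega):
    """Simulate the butterfly exchange; omega[p] is the value held by process p."""
    P = len(omega)
    assert P >= 1
    D = P.bit_length() - 1
    assert P == 1 << D, "P must be a power of two"
    s = list(omega)
    for d in range(D):
        # partner q = bitflip(p, d): p with bit d inverted (assumed meaning)
        received = [s[p ^ (1 << d)] for p in range(P)]   # simultaneous exchange
        s = [s[p] + received[p] for p in range(P)]
    return s

print(recursive_doubling([1, 2, 3, 4, 5, 6, 7, 8]))  # [36, 36, 36, 36, 36, 36, 36, 36]
```

After D = 3 exchange steps every simulated process holds the full sum.
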
Comments
• After execution of the program, every processor contains s
• This algorithm is called Recursive Doubling (Butterfly algorithm)
• The basic algorithm can be applied in many different contexts (examples: barriers, broadcast)
• Since every processor contains the final sum, its MPI counterpart is MPI_Allreduce
• The algorithm as given may deadlock, since all processes send before they receive. Use MPI_Sendrecv

Processes
• Let $2^D < P < 2^{D+1}$
    s = ω_p;
    if p >= 2^D
      send(s,bitflip(p,D));
    end
    if p < P-2^D
      receive(h,bitflip(p,D));
      s = s+h;
    end
    if p < 2^D
      for d = 0:D-1
        send(s,bitflip(p,d));
        receive(h,bitflip(p,d));
        s = s+h;
      end
    end
    if p < P-2^D
      send(s,bitflip(p,D));
    end
    if p >= 2^D
      receive(s,bitflip(p,D));
    end

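The three phases (fold the extra processes in, run the butterfly on the first $2^D$ processes, send the result back) can be simulated sequentially. This is a sketch under the same assumption as before, namely that `bitflip(p,D)` inverts bit D of p, so process p ≥ $2^D$ is paired with p − $2^D$; `allsum_any_p` is an illustrative name.

```python
def allsum_any_p(omega):
    """Simulate the all-sum for an arbitrary process count P >= 1."""
    P = len(omega)
    assert P >= 1
    D = P.bit_length() - 1            # largest D with 2**D <= P
    base = 1 << D
    s = list(omega)
    # fold: the extra processes p >= 2**D send their value to p - 2**D
    for p in range(base, P):
        s[p - base] += s[p]
    # butterfly among the first 2**D processes
    for d in range(D):
        received = [s[p ^ (1 << d)] for p in range(base)]
        for p in range(base):
            s[p] += received[p]
    # unfold: the result is sent back to the extra processes
    for p in range(base, P):
        s[p] = s[p - base]
    return s

print(allsum_any_p([1, 2, 3, 4, 5, 6]))  # [21, 21, 21, 21, 21, 21]
```
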
• Obviously, $T_1^* = (2M - 1) t_a$
• The local computation time is $t_{\mathrm{comp},1} = (2 I_p - 1) t_a$

• For a given M × N matrix A and an N-dimensional vector x, y = Ax is given by
  $y_m = \sum_{n=1}^{N} a_{m,n} x_n = \langle a^m, x \rangle$
  where $a^m$ is the m-th row of A

Algorithm
• Once the data are distributed, all individual components $y_m$ can be computed in parallel using the Recursive Doubling Algorithm
• Do not forget to block data exchanges in order to avoid excessive startup times!
• The performance analysis is similar to the one given before

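Blocking the data exchange means that the partial results for all components $y_m$ travel together in one message per butterfly step. The simulation below is a sketch only: it assumes a column-wise distribution (process p owns columns p, p+P, ... of A and the matching entries of x), which the slides do not specify, and a power-of-two process count. The name `matvec_distributed` is illustrative.

```python
def matvec_distributed(a, x, P):
    """y = A x with the columns of A dealt out to P simulated processes."""
    M, N = len(a), len(x)
    assert P >= 1 and P & (P - 1) == 0, "P must be a power of two"
    # local work: process p handles columns p, p+P, p+2P, ... (assumed distribution)
    partial = [[sum(a[m][n] * x[n] for n in range(p, N, P)) for m in range(M)]
               for p in range(P)]
    # blocked butterfly: each exchange carries the whole length-M vector at once
    d = 1
    while d < P:
        partial = [[u + v for u, v in zip(partial[p], partial[p ^ d])]
                   for p in range(P)]
        d *= 2
    return partial[0]          # every process now holds the same y

print(matvec_distributed([[1, 2, 3, 4], [5, 6, 7, 8]], [1, 1, 1, 1], 2))  # [10, 26]
```

With this blocking, the number of messages per process is log2(P) regardless of M, so the startup time is paid only once per butterfly step.
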
• Let A be an M × N matrix and B an N × K matrix. Then C = AB is
  $c_{m,k} = \sum_{n=1}^{N} a_{m,n} b_{n,k} = \langle a^m, b_k \rangle$
  where $a^m$ is the m-th row of A and $b_k$ is the k-th column of B
• Why not simply generalize matrix-vector multiplication?
  We would need excessive additional memory (QMK + PNK)

Potential
Observation
If M = N = K, the matrix multiplication requires $M^3$ operations on $M^2$ data. So we expect a good potential for parallelization.

• Assume that we have P = M
• Assign $(a^m, b_k)$ to processor (m, k)
• Compute $c_{m,k}$ on processor (m, k)
• Advantages:
• Drawbacks:
  • A lot of communication for initializing
  • or, large memory requirements

$c_{m,m} = a_{m,1} b_{1,m} + \cdots + a_{m,m} b_{m,m} + \cdots + a_{m,M} b_{M,m}$
are available

        A        ×    B    =    C
                     b12
   a21 a22 a23       b22       c22
                     b32

• Consider element $c_{1,2}$,
  $c_{1,2} = a_{1,1} b_{1,2} + a_{1,2} b_{2,2} + \cdots$
• While $a_{1,2}$ is available, $b_{2,2}$ is not
• This can be corrected by shifting the second column of B (cyclically) upwards
• In order not to change the indices on the diagonal, the second row of A must be shifted (cyclically) to the left

• More generally, the matrices A and B must be initialized as follows:
  • The m-th row of A is shifted cyclically m − 1 positions to the left
  • The n-th column of B is shifted cyclically n − 1 positions upwards

        A                 B
   a11 a12 a13       b11 b22 b33
   a22 a23 a21       b21 b32 b13
   a33 a31 a32       b31 b12 b23

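The two shift rules can be written as index arithmetic. The sketch below uses 0-based indices, so row m is shifted m positions and column n is shifted n positions; the function name `skew` is chosen here for illustration. Run on symbolic 3 × 3 matrices, it reproduces the layout shown in the slide.

```python
def skew(a, b):
    """Initial alignment: row m of A shifted left by m, column n of B shifted up by n."""
    N = len(a)
    a_s = [[a[m][(n + m) % N] for n in range(N)] for m in range(N)]
    b_s = [[b[(m + n) % N][n] for n in range(N)] for m in range(N)]
    return a_s, b_s

A = [["a11", "a12", "a13"], ["a21", "a22", "a23"], ["a31", "a32", "a33"]]
B = [["b11", "b12", "b13"], ["b21", "b22", "b23"], ["b31", "b32", "b33"]]
a_s, b_s = skew(A, B)
for row_a, row_b in zip(a_s, b_s):
    print(" ".join(row_a), "  ", " ".join(row_b))
```
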
Algorithm
1 Initially, processor (p, q) has elements $a_{p+1,q+1}$, $b_{p+1,q+1}$
2 Shift the rows of A and the columns of B into the structure described above
3 for k = 1, . . . , N
  1 Multiply the local values on each processor and add the product to the local element of C
  2 Shift the rows of A and the columns of B cyclically by one processor
4 If necessary: undo step 2
Remarks:
• If the number of processors is much less than $N^2$, a blocked version will do.
• The last shift in step 3.2 can be omitted.

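The steps can be followed in a sequential simulation with one matrix element per processor. This is a sketch, assuming an N × N processor grid stored as nested lists with 0-based indices; `cannon` is an illustrative name, and the final shift is performed even though it could be omitted.

```python
def cannon(a, b):
    """Simulate the shift-and-multiply scheme with one matrix element per processor."""
    N = len(a)
    # step 2: initial alignment (row m of A left by m, column n of B up by n)
    a = [[a[m][(n + m) % N] for n in range(N)] for m in range(N)]
    b = [[b[(m + n) % N][n] for n in range(N)] for m in range(N)]
    c = [[0] * N for _ in range(N)]
    for _ in range(N):
        # step 3.1: every processor multiplies its local pair and accumulates
        for m in range(N):
            for n in range(N):
                c[m][n] += a[m][n] * b[m][n]
        # step 3.2: rows of A move one processor left, columns of B one up
        a = [[a[m][(n + 1) % N] for n in range(N)] for m in range(N)]
        b = [[b[(m + 1) % N][n] for n in range(N)] for m in range(N)]
    return c

print(cannon([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

In step k, processor (m, n) holds the pair a[m][(m+n+k) mod N] and b[(m+n+k) mod N][n], so over N steps it sees every term of its scalar product exactly once.
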
Assume: R = P × P and P divides N
Step 2: Use red-black communication
  $t_2 = 4 (t_{\mathrm{startup}} + I_p^2 t_{\mathrm{data}})$
Step 3.1: $t_{3,1} = 2 I_p^3 t_a$
Step 3.2: $t_{3,2} = 4 (t_{\mathrm{startup}} + I_p^2 t_{\mathrm{data}})$

Speedup
$$
\begin{aligned}
S_R &= \frac{2N^3 t_a}{2 P I_p^3 t_a + 4(P+1)\,(t_{\mathrm{startup}} + I_p^2 t_{\mathrm{data}})} \\
    &= \frac{2N^3 t_a}{2 P (N/P)^3 t_a + 4(P+1)\,(t_{\mathrm{startup}} + (N/P)^2 t_{\mathrm{data}})} \\
    &= R \, \frac{1}{1 + 2\,\dfrac{R(\sqrt{R}+1)}{N^3}\,\dfrac{t_{\mathrm{startup}}}{t_a} + 2\,\dfrac{P+1}{N}\,\dfrac{t_{\mathrm{data}}}{t_a}}
\end{aligned}
$$

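The equality of the first and the last form of the speedup can be checked numerically. The sketch below evaluates both, using $I_p = N/P$ and $R = P^2$ as in the derivation; the machine parameters are hypothetical values chosen only for illustration.

```python
import math

def speedup_direct(N, P, t_a, t_startup, t_data):
    """S_R from the operation counts, with I_p = N / P and R = P * P processors."""
    I_p = N / P
    parallel = 2 * P * I_p**3 * t_a + 4 * (P + 1) * (t_startup + I_p**2 * t_data)
    return 2 * N**3 * t_a / parallel

def speedup_closed(N, P, t_a, t_startup, t_data):
    """The same speedup written as R divided by (1 + overhead terms)."""
    R = P * P
    overhead = (2 * R * (math.sqrt(R) + 1) / N**3 * t_startup / t_a
                + 2 * (P + 1) / N * t_data / t_a)
    return R / (1 + overhead)

# hypothetical machine parameters, not taken from the slides
args = dict(N=1024, P=8, t_a=1e-9, t_startup=1e-5, t_data=1e-8)
print(speedup_direct(**args), speedup_closed(**args))
```

Both functions return the same value up to rounding, and the value stays below R = 64 because the overhead terms are positive.
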
There are a number of other algorithms available which have a better efficiency than Cannon’s algorithm.

The birth of Google
Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd: The PageRank citation ranking: Bringing order to the web. Technical Report SIDL-WP-1999-0120, http://dbpubs.stanford.edu/pub/1999-66
A search engine faces two problems:
1 Find all web pages which contain the search phrase (more generally: satisfy some search criterion)
2 Among all these pages, find the most relevant
Ad 1: Web crawler
Ad 2: PageRank citation ranking

• Every web page has a number of forward links (pointers to other web pages) and a number of backlinks (web pages pointing to the given page)
• A web page seems to be more important if the number of backlinks is large
• Simply counting the number of backlinks may not be sufficient: a backlink from a page of high importance should have a higher weight
• This is a recursive definition: a page is important if its backlinks are important.
• But what is the root of this recursion? The web does not have a root :-(

• Let m be a web page
• Let $F_m$ be the set of pages m points to
• Let $B_m$ be the set of pages that point to m
• Let $N_m = |F_m|$ be the number of forward links, and c be a normalization factor
• Then we define PageRank:
  $R_m = c \sum_{n \in B_m} \frac{R_n}{N_n}$

• Define a matrix A by
  $a_{mn} = \begin{cases} 1/N_n, & \text{if there is a link from } n \text{ to } m, \\ 0, & \text{otherwise} \end{cases}$
• Then $R = cAR$

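For $R = cAR$ to agree with the sum over $B_m$, the entry in row m and column n must be nonzero exactly when page n points to page m. The sketch below builds A in that way from a dictionary of forward links; the three-page web and the name `link_matrix` are hypothetical and serve only as an illustration.

```python
def link_matrix(links, pages):
    """A[m][n] = 1/N_n if page n links to page m, else 0 (N_n = forward links of n)."""
    index = {page: i for i, page in enumerate(pages)}
    size = len(pages)
    A = [[0.0] * size for _ in range(size)]
    for n, targets in links.items():
        for m in targets:
            A[index[m]][index[n]] = 1.0 / len(targets)
    return A

# hypothetical three-page web: x -> y, x -> z, y -> z, z -> x
links = {"x": ["y", "z"], "y": ["z"], "z": ["x"]}
A = link_matrix(links, ["x", "y", "z"])
for row in A:
    print(row)
```

Each column of A sums to 1 here, since every page distributes its rank evenly over its forward links.
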
PageRank
Definition
The PageRank of a given connectivity matrix A is an eigenvector R corresponding to the largest eigenvalue c of A.

Problems
• Given a square matrix A and an initial guess $x^{[0]} \ne 0$
• Form the sequence $x^{[k+1]} = A x^{[k]}$, k = 0, 1, . . .
• It can be shown that, for many matrices A and almost all initial guesses $x^{[0]}$, the sequence $\{x^{[k]} / \|x^{[k]}\|\}$ converges towards an eigenvector belonging to the largest eigenvalue c of A
• An estimate $c_k$ of c can be obtained by
  $c_k = \frac{\langle x^{[k+1]}, x^{[k]} \rangle}{\langle x^{[k]}, x^{[k]} \rangle}$

Algorithm
    % Let x = x0 be chosen
    err = tol;
    c = 0;
    while err >= tol*abs(c)
      nrm = 1/sqrt(<x,x>);   % recursive doubling
      x = nrm*x;
      y = A*x;               % matrix-vector multiplication
      cnew = <y,x>;          % recursive doubling
      err = abs(cnew-c);
      c = cnew;
      x = y;                 % vector transposition
    end

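A sequential version of this iteration fits in a few lines of Python. This is a sketch, not the parallel implementation: the scalar products and the matrix-vector product are computed locally, and `max_iter` is a safeguard added here that is not part of the algorithm above.

```python
import math

def power_iteration(A, x, tol=1e-10, max_iter=10000):
    """Estimate the dominant eigenvalue c and a normalized eigenvector of A."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    c = 0.0
    for _ in range(max_iter):                 # max_iter is an added safeguard
        nrm = 1.0 / math.sqrt(dot(x, x))
        x = [nrm * xi for xi in x]            # normalize the iterate
        y = [dot(row, x) for row in A]        # y = A x
        c_new = dot(y, x)                     # estimate <y,x> with <x,x> = 1
        err, c = abs(c_new - c), c_new
        if err < tol * abs(c):
            break
        x = y
    return c, x

c, x = power_iteration([[2.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
print(round(c, 6))  # 2.0
```

For the diagonal test matrix the iterate turns towards the first unit vector, the eigenvector belonging to the largest eigenvalue 2.
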
• Assume a P × Q processor mesh and A distributed accordingly
• Let y be distributed in the same way as the columns of A
• The simple case: P = Q and row and column data distributions are equal:
  • MPI_Alltoall can be used to transpose y
• The general case is much more involved. Essentially, it is equivalent to one recursive doubling step

consecutive steps are optimized individually.
• Hence, optimizing a sequential algorithm is a local problem.
• In a parallel environment, for a given algorithm the execution time depends both on the computation time and the data distribution.
• A certain data distribution may be optimal at one step of the algorithm while suboptimal for another.
• Thus, optimizing a parallel program is a global problem!