Graph Coloring in Parallel Processing and Scientific Computing
Assefaw Gebremedhin
CSCAPES Institute and Old Dominion University assefaw@cs.odu.edu www.cs.odu.edu/~assefaw
Joint work with Alex Pothen (Purdue), Doruk Bozdag and Umit Catalyurek (Ohio State Univ), Erik Boman (Sandia), Fredrik Manne (Univ of Bergen), Andrea Walther (Tech Univ of Dresden), Arijit Tarafdar and Duc Nguyen (ODU)
CSCAPES Workshop, June 2008, Santa Fe, NM
Assefaw Gebremedhin (CSCAPES) Coloring in Scientific Computing 1 / 27
[Figure: tasks S1 to S6 arranged along a time axis with slots T1, T2 and T3.]
context
S1. Determine the sparsity structure of the derivative (first or second) matrix $A \in \mathbb{R}^{m \times n}$ of the function $F$.
S2. Obtain a seed matrix $S \in \{0, 1\}^{n \times q}$ with the smallest $q$.
S3. Compute the numerical values of the entries of the compressed matrix $B = AS \in \mathbb{R}^{m \times q}$.
S4. Recover the numerical values of the entries of $A$ from $B$.
The seed matrix $S$ partitions the columns of $A$: $s_{jk} = 1$ iff column $a_j$ belongs to group $k$, and $s_{jk} = 0$ otherwise.
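Steps S2 to S4 can be illustrated with a minimal NumPy sketch. The 4x5 matrix, the column grouping, and the helper names `seed_matrix` and `recover_direct` are hypothetical and chosen only for illustration; in practice B would come from automatic differentiation or finite differences rather than from an explicit product with A.

```python
import numpy as np

def seed_matrix(groups, q):
    """Build S in {0,1}^(n x q): S[j, k] = 1 iff column j belongs to group k."""
    n = len(groups)
    S = np.zeros((n, q), dtype=int)
    S[np.arange(n), groups] = 1
    return S

def recover_direct(B, pattern, groups):
    """Step S4 (direct recovery): read each nonzero a_ij from B[i, group of j]."""
    A = np.zeros(pattern.shape)
    rows, cols = np.nonzero(pattern)
    A[rows, cols] = B[rows, np.asarray(groups)[cols]]
    return A

# Hypothetical sparse matrix (assumption, not taken from the slides).
A = np.array([
    [1.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 6.0, 0.0],
    [0.0, 0.0, 0.0, 7.0, 8.0],
])
pattern = A != 0                 # S1: sparsity structure
groups = [0, 1, 0, 1, 0]         # S2: columns in one group share no nonzero row
S = seed_matrix(groups, q=2)
B = A @ S                        # S3: compressed matrix, 4 x 2 instead of 4 x 5
A_recovered = recover_direct(B, pattern, groups)  # S4
```

Here five columns are compressed into two, and every nonzero of A appears unmodified somewhere in B because no two columns in a group have a nonzero in the same row.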
Recovery method: direct or substitution
[Figure: a sparse matrix A with columns c1 to c5 (surviving entries include a12, a22, a43, a34, a44, a15, a45), its adjacency graph Ga, and its bipartite graph Gb.]
A structurally orthogonal partition of the matrix A is equivalent to:
  a distance-2 coloring of the adjacency graph $G_a(A) = (V, E)$ when A is symmetric (McCormick, 1983)
  a partial distance-2 coloring of the bipartite graph $G_b(A) = (V_1, V_2, E)$ when A is nonsymmetric (GMP, 2005)
  a distance-1 coloring of the appropriate square graph (Coleman and Moré, 1983)
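As a rough illustration of the nonsymmetric case, the sketch below greedily colors the columns of a sparsity pattern so that two columns sharing a nonzero row never get the same color, which is a partial distance-2 coloring of the column vertices of the bipartite graph. The function name, the row-list input format, and the natural column ordering are assumptions made for this example; it is not the ColPack implementation.

```python
def partial_distance2_coloring(pattern, n_cols):
    """Greedy partial distance-2 coloring of the columns.

    pattern: list of rows, each a list of the column indices holding nonzeros.
    Returns a list giving one color (group index) per column.
    """
    # Row indices in which each column has a nonzero.
    rows_of = [[] for _ in range(n_cols)]
    for i, row in enumerate(pattern):
        for j in row:
            rows_of[j].append(i)

    color = [-1] * n_cols            # -1 marks an uncolored column
    for j in range(n_cols):
        # Colors of already colored columns reachable through a shared row.
        forbidden = set()
        for i in rows_of[j]:
            for k in pattern[i]:
                if color[k] >= 0:
                    forbidden.add(color[k])
        # First Fit: smallest color not forbidden.
        c = 0
        while c in forbidden:
            c += 1
        color[j] = c
    return color

# Hypothetical 4 x 5 pattern (assumption): row i has nonzeros in columns i and i+1.
pattern = [[0, 1], [1, 2], [2, 3], [3, 4]]
colors = partial_distance2_coloring(pattern, n_cols=5)
print(colors)  # [0, 1, 0, 1, 0]
```

The resulting color classes are exactly the column groups of a structurally orthogonal partition, so the number of colors is the number q of columns in the seed matrix.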
[Figure: a symmetric matrix H with columns h1 to h10, a star coloring of its adjacency graph, and the resulting compressed matrix, in which entries such as h21 + h23 + h25 arise from columns placed in the same group.]
Symmetrically orthogonal partition (SymOP): whenever $h_{ij} \neq 0$, either $h_j$ is the only column in its group with a nonzero at row $i$, or $h_i$ is the only column in its group with a nonzero at row $j$.
Star coloring: a vertex coloring $\phi$ of $G_a(H)$ such that $\phi$ is a distance-1 coloring and every path on 4 vertices ($P_4$) uses at least 3 colors.
SymOP is equivalent to star coloring (Coleman and Moré, 84).
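The star coloring definition can be checked by brute force on small graphs. The following sketch assumes an undirected graph without self-loops given as a dictionary of neighbor sets; the function name and input format are illustrative choices, not part of any library.

```python
def is_star_coloring(adj, color):
    """Check the two star coloring conditions on an undirected graph.

    adj: dict mapping each vertex to the set of its neighbors.
    color: mapping (dict or list) from vertex to color.
    """
    # Condition 1: distance-1 coloring (adjacent vertices differ).
    for u in adj:
        for v in adj[u]:
            if color[u] == color[v]:
                return False
    # Condition 2: every path a - b - c - d on 4 distinct vertices
    # uses at least 3 colors.
    for b in adj:
        for c in adj[b]:
            for a in adj[b]:
                if a == c:
                    continue
                for d in adj[c]:
                    if d == b or d == a:
                        continue
                    if len({color[a], color[b], color[c], color[d]}) < 3:
                        return False
    return True

# Path graph 0 - 1 - 2 - 3 (hypothetical example).
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(is_star_coloring(path, [0, 1, 0, 1]))  # False: the whole path is two-colored
print(is_star_coloring(path, [0, 1, 0, 2]))  # True
```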
[Figure: the symmetric matrix H with columns h1 to h10, an acyclic coloring of its adjacency graph using fewer colors, and the resulting compressed matrix, with entries such as h21 + h23 + h25 and h12 + h17.]
Substitutable partition: whenever $h_{ij} \neq 0$, either $h_j$ is in a group where all nonzeros in row $i$ are ordered before $h_{ij}$, or $h_i$ is in a group where all nonzeros in row $j$ are ordered before $h_{ij}$.
Acyclic coloring: a vertex coloring $\phi$ of $G_a(H)$ such that $\phi$ is a distance-1 coloring and every cycle uses at least 3 colors.
Substitutable partition is equivalent to acyclic coloring (Coleman and Cai, 86).
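An acyclic coloring can be verified by noting that, in a distance-1 coloring, every cycle uses at least 3 colors exactly when the edges joining any two color classes form a forest. The sketch below tests this with a union-find structure per color pair. The function names and the edge-list input (each undirected edge listed once) are assumptions for illustration.

```python
from collections import defaultdict

def _find(parent, x):
    """Union-find root lookup with path halving."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def is_acyclic_coloring(edges, color):
    """edges: list of undirected edges (u, v), each listed exactly once.
    color: mapping (dict or list) from vertex to color."""
    # Group edges by the unordered pair of colors at their endpoints.
    by_pair = defaultdict(list)
    for u, v in edges:
        if color[u] == color[v]:
            return False                  # not a distance-1 coloring
        by_pair[frozenset((color[u], color[v]))].append((u, v))
    # The edges of each color pair must form a forest.
    for pair_edges in by_pair.values():
        parent = {}
        for u, v in pair_edges:
            ru, rv = _find(parent, u), _find(parent, v)
            if ru == rv:
                return False              # this edge closes a two-colored cycle
            parent[ru] = rv
    return True

# Hypothetical examples: a 4-cycle and a path on 4 vertices.
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
path4 = [(0, 1), (1, 2), (2, 3)]
print(is_acyclic_coloring(cycle4, [0, 1, 0, 1]))  # False
print(is_acyclic_coloring(cycle4, [0, 1, 0, 2]))  # True
print(is_acyclic_coloring(path4, [0, 1, 0, 1]))   # True
```

The last call shows how acyclic coloring is the weaker requirement: a two-colored path is allowed here, whereas a star coloring would forbid it.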
Overview of coloring models in derivative computation

General sparsity pattern (unidirectional partition):

Matrix                      Direct recovery                             Recovery by substitution
Jacobian (nonsymmetric A)   distance-2 coloring                         NA
Hessian (symmetric A)       star coloring (restricted star coloring)    acyclic coloring (triangular coloring)

Regular sparsity pattern (discretization of structured grids):
  Formula-based coloring (Goldfarb and Toint, 1984)
  Hierarchical coloring (Hovland, 2007)
Outline
Models
  Parallel scientific computing
  Derivative computation
Choose color for v:
  forbid colors used by neighbors $N(v)$ of v
  forbid colors leading to two-colored $P_4$:
    $\forall \{w, x\} \subseteq N(v)$ where $\phi(w) = \phi(x)$, forbid colors used by $N(w)$ and $N(x)$
    $\forall$ non-single-edge star S incident on v, forbid color of hub of S
Time: $O(|V| d_2)$
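The forbidding rules for star coloring can be sketched with a naive greedy procedure. This version does not maintain the two-colored star structures mentioned above; it searches paths of length three directly, so it is slower than the stated bound and is only meant to show which colors get forbidden. The function name and the dictionary-of-sets input are assumptions.

```python
def greedy_star_coloring(adj):
    """Naive greedy star coloring (illustrative, not the fast algorithm).

    adj: dict mapping each vertex to the set of its neighbors.
    Vertices are colored in dictionary order; returns dict vertex -> color.
    """
    color = {}
    for v in adj:
        colored_nbrs = [w for w in adj[v] if w in color]
        # Rule 1: forbid colors used by neighbors of v.
        forbidden = {color[w] for w in colored_nbrs}
        # Rule 2a (v inside the path w - v - x - y): if two neighbors w, x
        # share a color, forbid the colors used by their neighbors.
        for x in colored_nbrs:
            if any(w != x and color[w] == color[x] for w in colored_nbrs):
                forbidden.update(color[y] for y in adj[x] if y in color)
        # Rule 2b (v at the end of the path v - w - x - y): if y repeats
        # the color of w, forbid the color of x.
        for w in colored_nbrs:
            for x in adj[w]:
                if x == v or x not in color:
                    continue
                if any(y != w and color.get(y) == color[w] for y in adj[x]):
                    forbidden.add(color[x])
        # First Fit: smallest color not forbidden.
        c = 0
        while c in forbidden:
            c += 1
        color[v] = c
    return color

# Path graph 0 - 1 - 2 - 3 (hypothetical example).
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(greedy_star_coloring(path))  # {0: 0, 1: 1, 2: 0, 3: 2}
```

Plain greedy distance-1 coloring would give vertex 3 color 1; rule 2b forces a third color so that the path is not two-colored.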
Choose color for v:
  forbid colors used by neighbors $N(v)$ of v
  forbid colors leading to two-colored cycles:
    $\forall$ tree T incident on v, if v is adjacent to 2 vertices of the same color, forbid the other color in T
Time: $O(|V| d_2)$
Performance comparison: new star and acyclic coloring algorithms vs. previous algorithms

Algorithm   Colors (sum)   Runtime (sum)
D2          9,240          28.2
RS          8,749          34.4
NS          7,636          930
S           7,558          162
A           4,110          32.5
D1          1,757          0.04

Table: Total number of colors and runtime, summed over all test cases.
Jacobian computation in a Simulated Moving Bed process (chromatographic separation in chemical engineering)
Hessian computation in an optimal electric power flow problem
Experiments showed:
  the technique enabled cheap Jacobian/Hessian computation where dense computation is infeasible
  observed results for each step matched analytical results
[Figure: runtime(task)/runtime(F) versus nnz/100,000, plotted for the total and for steps S1 to S4.]
A difficult task, since Greedy is inherently sequential.
For D1 coloring, several approaches based on Luby's parallel algorithm for maximal independent set exist.
Some drawbacks:
  no actual parallel implementation
  many more colors than a serial implementation
  poor parallel speedup on unstructured graphs
Relaxed partitioning (RP):
  1. break up the given problem into p, not necessarily entirely independent, subproblems of almost equal sizes
  2. solve the p subproblems concurrently
  3. detect inconsistencies in the solutions concurrently
  4. resolve any inconsistencies
RP can be used successfully if the resolution in the fourth step involves only local adjustments.
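The four steps can be mimicked for distance-1 coloring with a sequential simulation. In this sketch each of the p blocks colors its vertices while seeing only the colors finalized in earlier rounds plus its own choices, conflicts are then detected on edges between blocks, and the higher-numbered endpoint of each conflict is recolored in the next round. The contiguous block partition, the tie-break by vertex id, and all names are assumptions for illustration; a real implementation would run the blocks on separate processors and exchange messages.

```python
def first_fit(v, adj, visible):
    """Smallest color not used by the neighbors of v that are visible."""
    used = {visible[w] for w in adj[v] if w in visible}
    c = 0
    while c in used:
        c += 1
    return c

def relaxed_partition_coloring(adj, p):
    """Simulate relaxed partitioning for distance-1 coloring.

    adj: dict mapping comparable vertex ids to sets of neighbors.
    p: number of simulated processors. Returns (coloring, rounds).
    """
    vertices = sorted(adj)
    n = len(vertices)
    # Step 1: split the vertices into p blocks of almost equal size.
    owner = {v: i * p // n for i, v in enumerate(vertices)}
    color = {}
    pending = list(vertices)
    rounds = 0
    while pending:
        rounds += 1
        # Step 2: every block colors its pending vertices "concurrently";
        # it does not see what other blocks choose in this round.
        proposals = {}
        for block in range(p):
            visible = dict(color)
            for v in pending:
                if owner[v] == block:
                    visible[v] = first_fit(v, adj, visible)
                    proposals[v] = visible[v]
        color.update(proposals)
        # Step 3: detect inconsistencies (adjacent vertices, same color).
        losers = [v for v in pending
                  if any(w < v and color[w] == color[v] for w in adj[v])]
        # Step 4: resolve locally by recoloring only the losing endpoints.
        for v in losers:
            del color[v]
        pending = losers
    return color, rounds

# Hypothetical test graph: ring of 12 vertices, each joined to the
# two nearest vertices on either side.
ring = {i: {(i - 1) % 12, (i + 1) % 12, (i - 2) % 12, (i + 2) % 12}
        for i in range(12)}
coloring, rounds = relaxed_partition_coloring(ring, p=4)
```

The smallest pending vertex never loses a conflict, so the pending set shrinks every round and the loop terminates; with p = 1 the procedure reduces to ordinary sequential greedy coloring in a single round.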
Specializations of Framework
Framework can be specialized along several axes:
Coloring order:
interior vertices can be colored before, after, or interleaved with boundary vertices
Supersteps:
can be run synchronously or asynchronously
Inter-processor communication:
can be customized or broadcast-based
An answer requires considering a complex set of factors, including:
  size and density of the input graph
  number of processors
  quality of the initial partitioning
  characteristics of the platform on which the implementation is run
The determination is bound to rely on experimentation.
For moderately unstructured graphs:
  1. a superstep size s on the order of 1000
  2. asynchronous supersteps
  3. a coloring order in which interior vertices appear either strictly before or strictly after boundary vertices
  4. First Fit color choice strategy
  5. customized inter-processor communication

  1. s on the order of 100
  2. items 2 to 4 same as for moderately unstructured graphs
  3. broadcast-based communication
[Figure: two panels plotted against the number of processors (0 to 40); the labeled vertical axis is Speedup.]
[Figure: two panels, Itanium 2 and Pentium 4, plotted against the number of processors (0 to 40).]
Summary
Current accomplishments:
  Developed a unifying graph-theoretic framework for sparse derivative computation.
  Designed and implemented new sequential algorithms for distance-k, star, acyclic, and other coloring problems.
    C++ implementations assembled in a package called ColPack.
    ColPack also includes various ordering routines for greedy coloring.
  Integrated parts of ColPack with the AD tool ADOL-C.
  Developed parallel algorithms for distance-1, distance-2, and restricted star coloring.
    Algorithms scale well for a hundred processors.
    Implementations made available via Zoltan.
Planned activities:
  Integrate coloring software with tools in OpenAD.
  Develop algorithms for coloring problems in partial matrix computation.
  Develop parallel star and acyclic coloring algorithms.
  Develop parallel coloring algorithms for tera- and petascale computation.
  Collaborate with application and tool developers to plug in coloring technologies to enable CSE.
Further reading
Gebremedhin, Manne and Pothen. What Color Is Your Jacobian? Graph Coloring for Computing Derivatives. SIAM Review 47(4):629-705, 2005.
Gebremedhin, Tarafdar, Manne and Pothen. New Acyclic and Star Coloring Algorithms with Application to Computing Hessians. SIAM J. Sci. Comput. 29:1042-1072, 2007.
Gebremedhin, Pothen and Walther. Exploiting Sparsity in Jacobian Computation via Coloring and Automatic Differentiation: A Case Study in a Simulated Moving Bed Process. Fifth International Conference on AD, Bonn, Germany, Aug 2008, to appear. 12 pp.
Gebremedhin, Pothen, Tarafdar and Walther. Efficient Computation of Sparse Hessians using Coloring and Automatic Differentiation. INFORMS Journal on Computing, to appear. 30 pp.
Bozdag, Gebremedhin, Manne, Boman and Catalyurek. A Framework for Scalable Greedy Coloring on Distributed-memory Parallel Computers. J. Parallel Distrib. Comput. 68(4):515-535, 2008.
Bozdag, Catalyurek, Gebremedhin, Manne, Boman and Ozguner. Distributed-memory Parallel Graph Coloring Algorithms for Jacobian and Hessian Computation. In preparation for submission to SIAM J. Sci. Comput. 22 pp.