2010 Silva Power Grid Simulation (15132)

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 29, NO. 10, OCTOBER 2010 1523

Efficient Simulation of Power Grids


João M. S. Silva, Member, IEEE, Joel R. Phillips, Fellow, IEEE, and Luı́s Miguel Silveira, Senior Member, IEEE

Manuscript received July 24, 2009; revised March 4, 2010; accepted April 15, 2010. Date of current version September 22, 2010. This work was supported in part by FCT (INESC-ID Multiannual Funding) through the PIDDAC Program funds. This paper was recommended by Associate Editor R. Suaya.
J. M. S. Silva is with Synopsys, Mountain View, CA 94043 USA (e-mail: joao.m.santos.silva@gmail.com).
J. R. Phillips is with Cadence Design Systems, Berkeley, CA 95134 USA (e-mail: jrp@cadence.com).
L. M. Silveira is with INESC ID, the Systems and Computers Engineering Institute, Research and Development/Instituto Superior Técnico, Technical University of Lisbon, Lisbon 1000-029, Portugal (e-mail: lms@inesc-id.pt).
Digital Object Identifier 10.1109/TCAD.2010.2061512
0278-0070/$26.00 © 2010 IEEE

Abstract—Modern deep sub-micron ultra-large scale integration designs with hundreds of millions of devices require huge grids for power distribution. Such grids, operating with decreasing power supply voltages, are a design limiting factor and accurate analysis of their behavior is of paramount importance as any voltage drops can seriously impact performance or functionality. As power grid models have millions of unknowns, highly optimized special-purpose simulation tools are required to handle the time and memory complexity of solving for their dynamic behavior. In this paper, we propose a hierarchical matrix representation of the power grid model that is both space and time efficient. With this representation, reduced storage matrix factors are efficiently computed and applied in the analysis at every time-step of the simulation. Results show an almost linear complexity growth, namely $O(n \log^a(n))$, for some small constant a, in both space and time, when using this matrix representation. Comparisons of our academic implementation with production-quality code prove this method to be very efficient when dealing with the simulation of large power grid models.

Index Terms—Power-grid analysis, power-grid simulation, hierarchical matrices, power-grid-modeling, Cholesky decomposition.

I. Introduction

IN RECENT years, the relentless trend for high-performance circuits with low-power consumption has raised new challenges to designers. Higher performance has meant increased functionality fueled often by technology scaling. Achieving lower power has been met by specialized design techniques together with supply voltage scaling. However, adding more functionality implies that more devices must be powered, and thus huge power distribution networks are now deployed throughout the design. Moreover, lower supply voltage makes potential voltage drops a more serious concern, as they may impact performance or functionality of the whole design by reducing noise margins and slowing down devices [1]. Therefore, it has become clear that in modern circuits proper design and behavior of power grids is a performance limiting factor. The efficient verification of such grids is thus seen as an essential step in predicting and ensuring correct behavior and performance.

The simulation of power distribution networks is a difficult task owing to the huge number of elements in such circuits [2], [3]. Much research and development has been devoted to this problem, both in the modeling and simulation phases. In this paper, we concentrate on the analysis or simulation part of the problem. We will assume a typical model of a power grid to consist of an extremely large RC network, sometimes with millions of nodes, with a complex set of excitation sources and drains. Inductance is sometimes also included in the model, typically to model wire bonding, but by comparison it represents a very small subset of the overall model. In this paper we will restrict ourselves to the RC portion of the model as its analysis is always necessary even if inductance is included. Block techniques can be used to handle the coupling between the large RC block and the inductance-modeled packaging/wiring sub-circuit. The network excitations usually consist of the biasing sources, generally modeled as constant voltage sources, and the connection to the designed-in devices, which act as sources or drains depending on their electrical state. A simplified model for such sources is to consider them as time-varying independent current sources, but more complex models that take into account loading and feedback into the network are sometimes used. This simplified modeling approach has the important implication that the grid model becomes a linear system, thus much easier to simulate. Commercial large scale power grid simulators in use nowadays typically assume a model as described.

From a simulation standpoint, there are two families of techniques for the time domain analysis of the linear description of the power grid: iterative or direct methods. For DC-type problems that require a single system solution, iterative methods are preferred. However, robust verification techniques require that dynamic analysis be performed, which implies solving for the time evolution of the system along a given simulation interval.

Taking advantage of the problem structure and inspired by knowledge gained in solving similar problems resulting from discretized elliptic partial-differential equations (PDE), efficient algorithms have been proposed based on preconditioned conjugate gradient [4] or multigrid methods [5], [6]. These methods show good convergence ratios, but require several iterations for recomputing the solution of the system for each time-point (in essence the dynamic problem is similar to solving the same system with multiple right-hand sides). Since a few hundred time points are usually employed in the analysis, the potential advantages of iterative methods are quickly lost. Direct methods, on the other hand, find an a priori system matrix decomposition which can then be used in accelerating the solution at each time-step. Even though the
resulting matrix factors are very costly to compute and store, most commercial power grid analysis tools available nowadays use some form of highly optimized direct solver technology. A comparative analysis between direct and iterative methods for this problem is available elsewhere [2], [5] and we will not further pursue it here.

In this paper, we explore an approach based on using direct methods to solve the linear subset of equations which results from the formulation of a power distribution network. The novelty of the proposed technique is related to how we address the issues of storage and computational cost. While we still compute a factorization of the system matrix and apply the resulting factors to obtain the solution at each time-step, we propose to use a hierarchical matrix representation of the underlying system based on an H-Matrix representation [7]. H-Matrices are efficient data-sparse representations of certain densely populated matrices. With this representation, the increase in matrix density that comes from the factorization is managed to obtain a reduced storage data structure. Such a representation is then efficiently applied to generate the system solution at each time-step in the simulation window. H-Matrices show almost linear complexity in both storage and evaluation [8], [9]. The initial hierarchical factorization procedure, while having a slightly higher complexity, is nevertheless also almost linear. The efficient storage and evaluation properties of the proposed method, characterized by complexities growing as $O(n \log^a(n))$ for some small constant a, immediately pay-off in the fast evaluation that results from the application of the matrix factors to repeatedly obtain the solutions at every time-step of the analysis interval.

The remainder of this paper is organized in the following manner. In Section II, we present some background on the power grid simulation problem and in Section III we introduce H-Matrix theory to the extent required for this paper. Existing code that supports H-Matrices representations [7] can handle only 2-D problems and assumes geometrical information is available which may not always be the case. Therefore, 3-D extensions as well as an algebraic version of the H-Matrices formulation were developed and will be described in Section IV, along with several other developments that were necessary for testing purposes. In Section V, we compare our academic implementation against a highly optimized direct solver using state-of-the-art storage and reordering algorithms on several 2-D and 3-D, regular and irregular synthetic problems. Then in Section VI we show the results of applying the same techniques on real power grid models. Finally, in Section VII conclusions are drawn.

II. Background

A. Problem Model

For this paper we assumed a simplified 3-D power grid model like the one depicted in Fig. 1. An arbitrary number of metal layers dedicated to power delivery can be considered. This model also assumes that VDD and GND strips, as well as vias, are modeled resistively ($R_{strip}$ and $R_{via}$, respectively). The coupling resulting from the overlapping between metal strips in different levels is modeled through $C_{overlap}$. Notwithstanding, other kinds of parasitic effects, such as coupling between strips in the same level and so on, can be included. While simplified, this type of model is to some extent representative of what is used in commercial tools.

Fig. 1. Power grid model.

Assuming the extracted model consists of n nodes, the network equations can generally be written as

$$C \frac{dv(t)}{dt} + G v(t) = i(t) \tag{1}$$

where $C, G \in \mathbb{R}^{n \times n}$ are the matrices modeling the dynamic and static network components, respectively, $v \in \mathbb{R}^n$ is the vector of voltages at the grid nodes, and $i \in \mathbb{R}^n$ the vector of currents imposed at those same nodes. If time-varying independent current sources are assumed at the sources/drains of the network, then the formulation is akin to nodal analysis (NA).¹ In this case, G, which may reflect an unstructured grid, is a sparse matrix very similar to those encountered in (finite-difference) discretized 3-D problems (no more than seven elements per row). For power grids however, the third dimension is shallow, compared to the other dimensions, as chip height is much smaller and less dense than the die area. On the other hand, and since in the assumed model we only have capacitances between different planes in the z direction, C is a 3-diagonal sparse matrix. Both matrices are therefore extremely sparse and fairly regular. Adding additional capacitance coupling increases the density in the C matrix, but most of the properties are retained.

In order to analyze the grid in the time domain, we can use, as an example, Euler’s method and discretize the time interval of interest in steps of constant size h, obtaining

$$\underbrace{\left(\frac{C}{h} + G\right)}_{A} v(t) = \underbrace{i(t) + \frac{C}{h}\, v(t-h)}_{b(t)} \tag{2}$$

where A is the system matrix and b(t) is the right-hand side at each time-step. Analysis of the power grid then entails solving the above system, Av(t) = b(t) at every time step in the analysis interval.

¹ Considering VDD and GND bias one would use modified nodal analysis (MNA) which is the common procedure. Here, and without loss of generality, we simply compute Norton equivalents to the VDD sources and then use NA. This has two immediate advantages: the system matrix becomes symmetric and better conditioned. Alternatively, one could use superposition. In this case the solution to the VDD bias sources is trivial and (1) needs to be solved to determine the response to the remaining sources.
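As a minimal sketch of the time-stepping scheme in (2), the following pure-Python example factors A = C/h + G once and reuses the factor at every step. It is an illustration only: the dense Cholesky routine stands in for the sparse or hierarchical factorization discussed in this paper, and the function names and the toy 3-node RC chain (unit conductances and capacitances) are assumptions made for the example.

```python
import math

def cholesky(A):
    """Dense Cholesky factor L (lower triangular) such that A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_with_factor(L, b):
    """Solve L L^T x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def simulate(G, C, current_at, h, steps, v0):
    """Backward Euler on C dv/dt + G v = i(t), factoring A = C/h + G only once."""
    n = len(G)
    A = [[C[r][c] / h + G[r][c] for c in range(n)] for r in range(n)]
    L = cholesky(A)                      # setup cost, paid once
    v, history = list(v0), [list(v0)]
    for step in range(1, steps + 1):
        i_t = current_at(step * h)
        # Right-hand side b(t) = i(t) + (C/h) v(t - h)
        b = [i_t[r] + sum(C[r][c] * v[c] for c in range(n)) / h for r in range(n)]
        v = solve_with_factor(L, b)      # per-step cost: two triangular solves
        history.append(v)
    return history

# Toy 3-node RC chain (hypothetical values); node 0 is also tied to the reference.
G = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
C = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
history = simulate(G, C, lambda t: [0.0, 0.0, 1.0], h=0.1, steps=2000, v0=[0.0, 0.0, 0.0])
```

With a constant unit current injected at the last node, the final entry of `history` approaches the DC solution of Gv = i. The point of the sketch is the cost split: one factorization, then two triangular solves per time-step.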
SILVA et al.: EFFICIENT SIMULATION OF POWER GRIDS 1525

B. Problem Complexity

The trivial solution to (2) is $v = A^{-1} b$. Unfortunately, although A is sparse, $A^{-1}$ is full and therefore computing the inverse is prohibitive. Iterative methods, such as conjugate gradient (CG), preconditioned CG (e.g., incomplete Cholesky preconditioned CG (ICCG) [4]) or multigrid (MG) can be used to solve (2). Given that the system matrix, A, is symmetric and positive-definite, CG-type methods appear to be prime techniques for this problem. Furthermore their convergence rates grow slowly with problem size, their memory requirements are minimal, and the cost per iteration is quite low. However, the need for a good preconditioner which requires non-trivial effort to generate, and the fact that solutions at multiple time points along a pre-defined time interval are required, end up dwarfing increased usage of such techniques. Multigrid-type algorithms are interesting in that they possess optimal theoretical complexity properties. However, in practice, the setup costs, the need for multiple time point solutions, and the constant terms associated with the complexity estimates seem to cast a shadow on the advantages, as discussed in [5]. For dynamic analysis of power grid systems, direct methods are still the methods of choice, as the cost of computing matrix factors is amortized over all time-step solutions.

III. Hierarchical Matrices

Hierarchical Matrices, or H-Matrices [8], [9], enable matrix operations in almost linear complexity, where “almost linear” means linear up to logarithmic factors. H-Matrices are the algebraic counterpart of panel clustering techniques [10] for integral operators. Due to the existence of Green’s function for elliptic problems, these techniques can be extended to inverses [11] and factorizations [12], [13] of finite-element and finite-difference type matrices. Given the already noted structural similarity between the power grid formulation and those resulting from elliptic PDEs [5], this implies that efficient representations exist for the inverse and the LU/Cholesky factors of the underlying matrix describing the power grid model. Once that representation is formed, efficient, almost linear computations are within grasp. Theoretical and practical results indicate both storage and computational complexity to grow as $O(n \times \log^a(n))$, for some small constant a [7], [14], when using H-Matrices. In this section, we discuss how to generate and operate with such a representation.

A. H-Matrix Fundamentals

H-Matrices are efficient data-sparse representations of certain densely populated matrices. The basic idea is to split a given matrix into a hierarchy of blocks and approximate each of the blocks by a low-rank matrix. H-Matrices are super-matrices (in the sense that they may contain either full, low-rank, and other super-matrices) with an inherent hierarchy of block splitting.

Consider a splitting of $B = A^{-1}$ from (2), which we know to be full, in 2 × 2 blocks

$$B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}. \tag{3}$$

If, for instance, $B_{11}$ and $B_{22}$ are fairly dense (large number of nonzero entries) and full rank, they should be represented by full matrices. On the other hand, if $B_{12}$ and $B_{21}$ are low rank they can be represented in a factorized form

$$B_{IJ} \approx M N^T \tag{4}$$

where $M \in \mathbb{R}^{\#I \times k}$, $N \in \mathbb{R}^{\#J \times k}$ and k is the rank of the block up to some accuracy, eps (#I and #J represent, respectively, the number of rows and columns of matrix block $B_{IJ}$). This representation will be most efficient if $k \ll \#I, \#J$. Obviously, not all blocks will allow such a representation.

The goal of finding an H-Matrix representation is akin to determining the right reordering and blocking of nodes (rows/columns), that maximizes the number of blocks that can be represented in factorized form. Note that the low-rank block representation in (4) is approximate. Using a pre-specified accuracy parameter eps, we can control the rank k used to represent the low-rank blocks in factorized form [7], [11]. Error bounds for the low rank approximation of matrix blocks can be derived and are available in many cases [7]. If a block is not represented in factorized form, it may be further split and its sub-blocks recursively tested for such a representation. So, H-Matrices are matrices which result from the recursive multilevel splitting of matrix blocks, until low-rank blocks are found and represented in factorized form, or no further sparsity is available and full matrices must be used. Of course, the optimum storage scheme for a sparse stiffness matrix such as A in (2) is well known, with only nonzero elements being stored in an efficient way. The more interesting question is how to represent the corresponding LU or Cholesky factors. While a matrix resulting from an FD discretization in 1-D yields no LU or Cholesky fill-in, the same does not apply to 2-D matrices and certainly not to 3-D matrices, whose factors tend to fill up.

B. H-Matrix Splitting Approaches

In this paper, we use H-Matrices to represent the Cholesky factors of power grid models. This can be achieved through one of two distinct approaches: geometrically or algebraically.

1) H-Matrix Geometric Splitting: In the geometric approach, the criterion to decide whether a block allows a low-rank approximation is based on the geometry of the underlying medium discretization. This criterion is also called admissibility [9], [14], meaning whether the block admits a representation by a low-rank factorization or not. The admissibility check procedure, and therefore the reordering and clustering of nodes, must be such that blocks are obtained that allow for a low-rank factorization in terms of the LU or Cholesky factors, not on the original sparse matrix.

Consider two sets of nodes in the physical domain, which we term as clusters, and assume that each cluster has a radius corresponding to a bounding box that includes all nodes in the cluster. If the physical distance between clusters is much larger than the cluster radius, it is likely that the interaction between nodes in separate clusters can be represented by a low-rank approximation (this type of reasoning is akin to a multipole-type approximation [15], [16]). If the cluster size
is too large to satisfy the admissibility criteria, it is split and each resulting sub-cluster is then checked.

Therefore, in order to represent the matrix block $A_{IJ}$, which relates the nodes in I to the nodes in J in the domain of the LU or Cholesky factors, by a low-rank sub-matrix, an admissibility condition such as the following must be observed [7]:

$$\min\{\operatorname{diam}(X_I), \operatorname{diam}(X_J)\} \le \eta \times \operatorname{dist}(X_I, X_J) \tag{5}$$

where $X_I$ and $X_J$ are the supports of the local basis functions of I and J, respectively, diam(·) is the Euclidean diameter of the bounding boxes, dist(·) is the Euclidean distance between the bounding boxes, and η is an admissibility parameter (used to relax or restrict the admissibility criteria). See [7] for details.

The splitting process is repeated until a minimal block size is reached, $n_{min}$, or the blocks can be approximated in a low-rank sense. Large blocks are inefficiently approximated by (4) and must be split whenever possible in order to give origin to smaller low rank blocks. This approach is the right one when handling regular structures where geometrical distance is a good criterion for estimating (electrical) influence.

2) H-Matrix Algebraic Splitting: An alternative approach is based on the algebraic information of the matrix. This is the technique of choice for irregular structures and likely the best one for common power grids. Whenever two nodes in the conductance matrix are “connected” by an entry exhibiting a large magnitude, this means that the nodes are indeed tightly connected and should be kept together in the splitting process. This approach is akin to the standard heavy edge matching (HEM) algorithm [17]. In these methods, used for instance in the publicly available graph partitioning tool Metis [17], nodes connected by a large conductance and/or capacitance are favored to be clustered together through the multilevel splitting process. The advantage of this method over the previous one is that no geometric information on the problem is required, since the structure of the H-Matrix relies solely on the matrix entries. On the other hand, being able to follow geometric admissibility conditions leads to a slightly better approximation of the representation. The duality between geometric and algebraic splitting is quite similar to that found when considering multigrid and algebraic multigrid methods.

C. H-Matrix Representation

To build an H-Matrix corresponding to a sparse matrix, we first need to build what is known as a cluster tree over the matrix index set. This cluster tree describes the hierarchical clustering of the nodes of the matrix (from bottom to top) that yields the hierarchical partitioning of the matrix. Note that the clusters are composed of nodes which may not be adjacent, which implies row-column reordering may be required.

The difference between the algebraic and the geometric approaches for building the cluster tree is that in the geometric approach we use geometric conditions (admissibility conditions) to determine whether or not a block of the matrix will be partitioned further. On the other hand, in the algebraic approach the construction of the cluster tree is based on multilevel clustering methods which are widely used in graph partitioning. The basic idea of multilevel clustering is to start from the finest graph which represents the matrix and build clusters over its nodes, then build a coarse graph by merging the nodes in the same cluster, and repeat this process on the coarsened graphs until we reach a single cluster. This procedure uses the edge weights from the coarse graphs to make decisions on the merging of nodes.

In essence, the cluster tree is a format to represent and store the matrix. The H-Matrix representation has the same hierarchy or tree structure as the cluster tree. The matrix row/column indexes are in fact the leaves of the cluster tree.

D. H-Matrices Arithmetic and Complexity

As discussed, our goal is to directly compute a hierarchical $LL^T$ Cholesky factorization of the symmetric matrix A in (2) in an efficient manner and then proceed with forward and backward solves at each time-step (for non-symmetric formulations, an LU factorization is generated and used in a similar fashion).

In terms of complexity, the storage of a rank k factorized n × n matrix requires 2 × k × n (instead of $n^2$) elements. For the H-Matrix, the storage is $O(n \times \log(n) \times k)$, in which k is the worst-case rank of the sub-matrices of the H-Matrix [14]. The Cholesky decomposition in the H-Matrix format requires $O(n \times \log^2(n) \times k^2)$ operations and the evaluation of the $LL^T$ factors in each time-step requires $O(n \times \log(n) \times k)$ operations, which is proportional to the storage requirements of the H-Matrix representation of the Cholesky factors (for details, see [14]).

IV. Implementation Framework

The goal of this paper is to ascertain whether H-Matrix representations of power grid models can enable their efficient space and time analysis and verification. In order to achieve this, we should be able to compress the power grid model representation by efficiently computing reduced storage matrix factors which in turn must be efficiently applied in the analysis at every time-step of the simulation.

We must then convert the existing information regarding the power grid structure into a representation leading to a hierarchical Cholesky factorization of the system matrix. Additionally, we require the ability to operate with matrices in H-Matrix representation in order to compute the system solution at each time point of the simulation interval. In this paper, the factorization as well as all the required algebraic operations are performed with the proper arithmetic functionality provided by the H-Matrices library, HLib [14].

The current release of the HLib package [18] is freely available for research purposes under a simplified agreement. It consists of a set of routines for constructing and operating with hierarchical matrix structures, i.e., cluster trees, block cluster trees, low-rank factorized matrices, super-matrices, and so on. It also provides discretization functions for basic FEM and BEM operators, as well as arithmetic functionality for performing basic matrix operations like addition, multiplication, factorizations, and inversion. The library also contains routines that allow conversion of sparse and dense matrices into H-Matrices as well as a set of additional functionality relevant in this context. However, the current release of the HLib library was developed to deal with problems where inherent geometric information is available. For cases where such information is not available, applying the available methods would result in a blind approach, assuming an underlying geometry that may not exist. This scenario is nonetheless quite common if we want to simulate a power grid that is available, after extraction, at circuit-level, as a netlist or even a matrix representation resulting from nodal analysis. While such a netlist could be annotated with geometrical information, there is no standard format for that. In this case, we do not know anything about node displacement; we know only about their degree of electrical connection. To solve this problem, we must resort to some kind of algebraic H-Matrix splitting approach, as previously described. Since such functionality was not available in the HLib library, we had to add algebraic code on top of HLib. The code we developed was mostly based on the techniques described in [19], but several procedures were coded in order to test different metrics for determining connection strength, which directly impacts the clustering criteria.
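The idea of clustering by connection strength can be illustrated with a single heavy-edge matching pass. This is a hedged sketch and not the authors' algorithm or any HLib or Metis routine: the function name, the dictionary-based graph format, the deterministic visiting order, and the use of the entry magnitude as the strength metric are all assumptions chosen for the example. Only one level is shown; the coarsening of matched pairs into the next-level graph is omitted.

```python
def heavy_edge_matching(weights, priority=()):
    """One matching pass: pair each node with its strongest unmatched neighbor.

    weights:  dict {node: {neighbor: strength}}, assumed symmetric; the
              strength could be, e.g., the magnitude of an off-diagonal entry.
    priority: nodes visited first (e.g., nodes left unmatched at a previous level).
    Returns (clusters, unmatched).
    """
    first = list(priority)
    first_set = set(first)
    order = first + sorted(n for n in weights if n not in first_set)
    matched = set()
    clusters, unmatched = [], []
    for node in order:
        if node in matched:
            continue
        # Candidate partners: neighbors that have not been matched yet.
        candidates = [(abs(w), nb) for nb, w in weights[node].items()
                      if nb != node and nb not in matched]
        if not candidates:
            matched.add(node)
            unmatched.append(node)   # would be flagged as priority at the next level
            continue
        _, best = max(candidates)    # strongest connection wins
        matched.update((node, best))
        clusters.append((node, best))
    return clusters, unmatched

# Hypothetical 5-node graph; edge values play the role of |G_ij|.
weights = {
    0: {1: 1.0, 2: 10.0},
    1: {0: 1.0, 3: 10.0},
    2: {0: 10.0, 3: 1.0},
    3: {1: 10.0, 2: 1.0, 4: 2.0},
    4: {3: 2.0},
}
clusters, unmatched = heavy_edge_matching(weights)
clusters_p, unmatched_p = heavy_edge_matching(weights, priority=[4])
```

In the first call the strongly connected pairs (0, 2) and (1, 3) are formed and node 4 is left over. Visiting node 4 first in the second call matches it with node 3, which shows how a priority order changes the resulting clusters.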
Moreover, the functions available in HLib are only for 2-D FEM or BEM problems. In a power grid model, while the number of metal layers is normally much smaller than the number of vertical and horizontal wires, a third dimension is needed. So the code for 2-D data-structures was extended to handle 3-D power grid models. Besides that, the FEM formulation available in HLib was based on triangle discretization of the medium which is not the most adequate discretization for power grid models, since diagonal wires are not available in most fabrication processes. Thus, we also had to develop code for dealing with our basic FD problems.

Fig. 2. Algebraic splitting pseudo-code.
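To make the finite-difference structure concrete, the sketch below assembles a conductance matrix for a uniform 3-D resistor grid and confirms the "no more than seven elements per row" property mentioned in Section II. It is an illustrative simplification, not the authors' generator: the function name, the row-dictionary storage, and the parameters `g_strip`, `g_via`, and `g_ground` are assumptions, and every node is connected to all of its axis neighbors, which a real layered grid need not do.

```python
def grid_conductance(nx, ny, nz, g_strip=1.0, g_via=1.0, g_ground=0.0):
    """Assemble G for an nx x ny x nz resistor grid as {row: {col: value}}.

    g_strip:  conductance between in-plane neighbors (x and y directions)
    g_via:    conductance between neighbors in adjacent layers (z direction)
    g_ground: optional conductance from every node to the reference
              (hypothetical; a nonzero value makes G nonsingular)
    """
    def index(x, y, z):
        return (z * ny + y) * nx + x

    G = {r: {} for r in range(nx * ny * nz)}

    def stamp(a, b, g):
        # Standard nodal-analysis stamp of a conductance g between nodes a and b.
        G[a][a] = G[a].get(a, 0.0) + g
        G[b][b] = G[b].get(b, 0.0) + g
        G[a][b] = G[a].get(b, 0.0) - g
        G[b][a] = G[b].get(a, 0.0) - g

    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                a = index(x, y, z)
                if g_ground:
                    G[a][a] = G[a].get(a, 0.0) + g_ground
                if x + 1 < nx:
                    stamp(a, index(x + 1, y, z), g_strip)
                if y + 1 < ny:
                    stamp(a, index(x, y + 1, z), g_strip)
                if z + 1 < nz:
                    stamp(a, index(x, y, z + 1), g_via)
    return G

G = grid_conductance(4, 4, 3)
max_row_entries = max(len(row) for row in G.values())   # 7 for an interior node
```

An interior node has six neighbors plus its diagonal entry, so `max_row_entries` is 7 for this grid; nodes on faces, edges, and corners have fewer entries.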
Additionally, as we shall see in Section V, there are grids that exhibit, for various reasons, multiple disconnected parts: different VDD domains, VDD and GND domains, and so on. In order to be able to handle these, we also had to adapt the code used for the H-Matrices by artificially joining separate physical domains into the same matrix. While some of the modifications mentioned are not algorithmically complex, they required a careful analysis of the library’s architecture, changes to several existing functions, as well as the development of several other new functions.

In the following, we will describe in more detail what we believe was the most challenging and interesting extension to the library, namely the implementation of the algebraic clustering approach based on [19] (see Fig. 2). In the algebraic approach, we use the information of the matrix itself and appropriate multilevel clustering methods. Here, we describe an approach akin to HEM [17] to obtain the corresponding H-Matrix. In this approach, the geometrical proximity criterion is replaced by an electrical equivalent: proximity is determined by the strength of the connection between nodes or clusters of nodes, as revealed by the magnitude of the matrix entries. In terms of efficiency, this approach is comparable to the geometric approach based on nested dissection [13].

In Fig. 2, we present the algebraic approach pseudo-code. From the algorithm, we can see that we have two inner loops, one for the “priority” nodes and the other for the remaining nodes. Priority nodes are nodes that were left unmatched in the previous level. We want to avoid propagating these isolated nodes since that would lead to an unbalanced cluster tree and consequently to a less efficient H-Matrix representation. In both loops, each node is matched with its strongest connected neighbor which is still unmatched. They will form a cluster. This cluster will be matched with another cluster in the next level again using a similar connection strength criterion. At any iteration, if there are nodes that were left unmatched they are marked with the priority flag for the next level. The algorithm then proceeds in a hierarchical fashion. The final step of the algorithm is needed in case the power grid is formed by two or more disconnected parts, in which case we need to put them together in an appropriate manner at the top-level of the H-Matrix (line 24 in Fig. 2).

Since we are discussing implementation issues in this section, it is worthwhile pointing out that some of the functionality of the HLib library can be parameterized in several ways. In particular, the construction of H-Matrices can be controlled (at compilation time) by setting the following parameters:
1) k, for controlling the rank of the factored sub-matrices [defined in (4)]; k = 0 means the rank is dynamically adjusted to satisfy the required accuracy, eps (see next); setting k to any positive nonzero value forces the choice of the sub-matrices rank and allows direct control over the memory usage;
2) eps, for the block-wise relative accuracy of the factorization (for example eps = $10^{-2}$ may result in a low rank approximant while eps = $10^{-3} \ldots 10^{-6}$ provides better and better approximations up to an almost exact factorization);
3) η, for the admissibility condition in (5) for the geometric case;
4) $n_{min}$, as the minimum block size of the sub-matrices of the hierarchical matrices (below $n_{min}$ no splitting occurs).
Note that proper selection of the above parameters is of course problem dependent.

TABLE I
H-Matrix Approximation Error and Required Resources vs. the Accuracy Parameter for a 1024 × 1024 Regular Grid

eps        Setup Time (s)   Solve Time (s)   Space (MB)   Error
≈ 10^0     16.15            0.21             297          5.522e-01
10^-1      49.73            0.40             645          6.927e-03
10^-2      57.01            0.44             726          2.304e-03
10^-3      68.09            0.50             857          2.491e-05
10^-4      77.16            0.53             964          7.631e-06
10^-6      90.71            0.58             1096         1.025e-07
10^-9      123.24           0.69             1385         9.7611e-11
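As an illustration of how the parameter η enters the geometric admissibility condition (5), the following sketch evaluates the condition for two clusters of node coordinates. It is an assumption-laden example, not HLib code: the helper names are hypothetical, clusters are plain lists of coordinate tuples, and axis-aligned bounding boxes stand in for the supports $X_I$ and $X_J$.

```python
import math

def bounding_box(points):
    """Axis-aligned bounding box (lower corner, upper corner) of a point list."""
    dims = range(len(points[0]))
    lo = tuple(min(p[d] for p in points) for d in dims)
    hi = tuple(max(p[d] for p in points) for d in dims)
    return lo, hi

def diam(box):
    """Euclidean diameter (diagonal length) of a bounding box."""
    lo, hi = box
    return math.sqrt(sum((h - l) ** 2 for l, h in zip(lo, hi)))

def dist(box_a, box_b):
    """Euclidean distance between two boxes (0 if they touch or overlap)."""
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    gaps = [max(0.0, lb - ha, la - hb)
            for la, ha, lb, hb in zip(lo_a, hi_a, lo_b, hi_b)]
    return math.sqrt(sum(g * g for g in gaps))

def admissible(cluster_i, cluster_j, eta):
    """Condition (5): min(diam(X_I), diam(X_J)) <= eta * dist(X_I, X_J)."""
    box_i, box_j = bounding_box(cluster_i), bounding_box(cluster_j)
    return min(diam(box_i), diam(box_j)) <= eta * dist(box_i, box_j)

# Three unit-square clusters of 2-D node coordinates (hypothetical layout).
near_left = [(0, 0), (1, 0), (0, 1), (1, 1)]
touching = [(1, 0), (2, 0), (1, 1), (2, 1)]
far_right = [(5, 0), (6, 0), (5, 1), (6, 1)]

far_ok = admissible(near_left, far_right, eta=1.0)        # well separated
touching_ok = admissible(near_left, touching, eta=1.0)    # zero distance
far_strict = admissible(near_left, far_right, eta=0.25)   # stricter eta
```

Here the well-separated pair passes the test for η = 1, the touching pair fails because the box distance is zero, and lowering η to 0.25 makes the separated pair fail as well, so the block would be split further.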
V. Results
In the following, we present the experimental setup used in this paper and the corresponding results. The following methods were tested for solving the system resulting from the nodal analysis formulation of power grid models: i) hierarchical Cholesky based on geometric admissibility; ii) hierarchical Cholesky based on algebraic admissibility; and iii) sparse Cholesky. We have decided to account for three measures of efficiency. The first is the setup time, in which the hierarchical methods compute the super-matrix structures and the Cholesky decomposition, while the sparse Cholesky approach computes a matrix row/column permutation which tends to minimize fill-in in the Cholesky factors during the decomposition process. The second is the solve time, where the Cholesky factors are used to obtain the solution at the desired time-steps in a time analysis. Finally, we measured storage requirements for all methods. We also discuss how the accuracy parameter eps affects the approximation error.

A. Experimental Setup
Since our paper focuses on the efficiency of power grid simulation, and for the sake of complexity analysis, we decided to work with artificially generated matrices. These represent the main characteristics of a system resulting from the model extraction of a power grid structure. By choosing artificial matrices, we can easily control their size and characteristics in a meaningful way. In the following we will show results for 2-D and 3-D problems, representing regular and irregular grids, of increasing sizes. In the 3-D grids, the number of metal layers has been fixed at 8 (z direction). Irregularity is emulated by generating random matrix entries in the interval [0, 1) and discarding entries smaller than 0.5.
The algorithms tested were implemented in C upon the libraries HLib [7] and Cholmod [20]–[22]. HLib provides some code for 2-D FEM problems, which has been modified to handle our 2-D FD problems. As previously described, the code related to the hierarchical algebraic approach was implemented on top of the HLib data structures and uses library functions for H-Matrix arithmetic. We will use it for 3-D problems. Cholmod, from the SuiteSparse package from the University of Florida (which also provides UMFpack among other well-known tools), was used to implement the sparse Cholesky method and was compiled with the supernodal option for maximum performance. Both HLib and Cholmod used the same versions of the widely known Lapack and Blas libraries. All codes were integrated in a single executable. The comparisons we establish are thus reasonably fair even though HLib is not a commercial-quality code and our implementation is only a prototype. The sparse Cholesky code, on the other hand, has a slight edge since it is based on highly optimized code which is the basis for many commercial codes. In any case, what we consider relevant are not the absolute values obtained in the experiments but rather the comparative trends of complexity. Finally, experiments were run on a Dual AMD Opteron operating at 2.4 GHz with 16 GB of memory for increasing discretization sizes. In all experiments, the norm of the solution vector was computed in order to verify the correctness of the solution. Time was measured with the system function clock from glibc.

B. Approximation Error
As mentioned at the end of Section IV, the construction and operation of an H-Matrix can be constrained by appropriately setting several parameters. One of these, eps, is related to the accuracy of the approximation leading to the H-Matrix representation. eps is a relative threshold and it is used to control the rank of the blocks resulting from a matrix factorization (recall (4)) by truncating singular values below a certain relative magnitude.
In Table I, one can see how the parameter eps affects the accuracy of the solution as well as the resource requirements. The table shows the error obtained in the solution of a simple, regular, medium sized 2-D problem for some given right-hand side. From the table, we can observe that as eps is decreased (that is, as the tolerance is tightened), the accuracy of the approximation increases, as does the demand for resources, particularly space. Of course, if the example grid were larger (with several millions of nodes, for instance) the resource needs would increase more strongly with increasing accuracy. Nevertheless, this example shows quite clearly that the time and space requirements rise much more slowly than the added precision in the solution. In this example, it is possible to infer that an eps of around 10^-3 leads to a very good accuracy and seems to be a reasonable initial choice. Note that the extreme values of eps shown in the table are very exaggerated. The loosest value, eps ≈ 1, implies that the rank of all factorized sub-matrix blocks will be extremely small (1 or 2), leading to significant savings in memory, but at great cost in accuracy. Such a value is clearly of no use whatsoever. Similarly, the

TABLE II
Setup Times (in Seconds)

Grid         Number of Nodes   Sparse (Cholmod)   Hierarchical   Admissibility
2-D reg.          16 384              0.11             0.68      Geometric
2-D reg.          65 536              0.71             3.52      Geometric
2-D reg.         262 144              6.27            16.70      Geometric
2-D reg.       1 048 576            124.27            72.61      Geometric
2-D reg.       4 194 304           1285.64           375.41      Geometric
2-D reg.      16 777 216           9921.41          4068.08      Geometric
3-D reg.         131 072             39.09            81.86      Algebraic
3-D reg.         524 288            467.99           415.58      Algebraic
3-D reg.       2 097 152           5137.12          2230.62      Algebraic
2-D irreg.        16 384              0.12             0.64      Geometric
2-D irreg.        65 536              0.70             3.27      Geometric
2-D irreg.       262 144              7.15            15.04      Geometric
2-D irreg.     1 048 576            130.38            66.95      Geometric
2-D irreg.     4 194 304           1307.74           555.93      Geometric
2-D irreg.    16 777 216           9296.28          7425.37      Geometric
3-D irreg.       131 072             30.77            50.77      Algebraic
3-D irreg.       524 288             39.08           104.26      Algebraic
3-D irreg.     2 097 152            146.75           507.67      Algebraic
3-D irreg.     8 388 608         Out-of-mem        47838.57      Algebraic

TABLE III
Solve Times Per Time-Step (in Seconds)

Grid         Number of Nodes   Sparse (Cholmod)   Hierarchical   Admissibility
2-D reg.          16 384              0.01             0.01      Geometric
2-D reg.          65 536              0.05             0.05      Geometric
2-D reg.         262 144              0.21             0.24      Geometric
2-D reg.       1 048 576              0.92             0.93      Geometric
2-D reg.       4 194 304              4.02             3.72      Geometric
2-D reg.      16 777 216             21.72            17.74      Geometric
3-D reg.         131 072              0.27             0.39      Algebraic
3-D reg.         524 288              1.43             1.80      Algebraic
3-D reg.       2 097 152              7.26             8.58      Algebraic
2-D irreg.        16 384              0.01             0.02      Geometric
2-D irreg.        65 536              0.05             0.05      Geometric
2-D irreg.       262 144              0.20             0.21      Geometric
2-D irreg.     1 048 576              0.92             0.87      Geometric
2-D irreg.     4 194 304              4.03             3.45      Geometric
2-D irreg.    16 777 216             21.68            16.22      Geometric
3-D irreg.       131 072              0.27             0.28      Algebraic
3-D irreg.       524 288              1.43             0.87      Algebraic
3-D irreg.     2 097 152              7.25             3.80      Algebraic
3-D irreg.     8 388 608         Out-of-mem            15.49      Algebraic
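
The ratios between setup times on consecutive grids in Table II can be recomputed with a few lines. The sketch below is illustrative only: it uses the 2-D regular rows as printed, and the helper `growth` and the column constants are hypothetical names. The exponent it reports assumes a pure power law t ∝ n^p between two consecutive sizes, which is a crude local estimate since almost linear complexities also carry logarithmic factors.

```python
import math

# Table II, 2-D regular grids: (nodes, Cholmod setup [s], geometric H-matrix setup [s]).
SETUP_2D_REGULAR = [
    (16_384, 0.11, 0.68),
    (65_536, 0.71, 3.52),
    (262_144, 6.27, 16.70),
    (1_048_576, 124.27, 72.61),
    (4_194_304, 1285.64, 375.41),
    (16_777_216, 9921.41, 4068.08),
]

SPARSE, HIER = 1, 2  # column indices into each row (illustrative names)


def growth(rows, col):
    """For each pair of consecutive grids, return (larger size, time ratio, exponent p).

    p is the exponent of an assumed local power law t ~ n^p.
    """
    out = []
    for prev, cur in zip(rows, rows[1:]):
        ratio = cur[col] / prev[col]
        exponent = math.log(ratio) / math.log(cur[0] / prev[0])
        out.append((cur[0], ratio, exponent))
    return out


if __name__ == "__main__":
    for label, col in (("sparse", SPARSE), ("hierarchical", HIER)):
        for nodes, ratio, exponent in growth(SETUP_2D_REGULAR, col):
            print(f"{label:12s} n={nodes:>10d}  ratio={ratio:6.2f}  p={exponent:4.2f}")
```

On these rows, the sparse ratio peaks at about 19.8 (262 144 to 1 048 576 nodes) and then drops to about 10.3 and 7.7, while the hierarchical ratio stays between roughly 4.3 and 5.2 until the last step, where it rises to about 10.8.
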

last value, eps = 10^-9, leads to a solution of almost machine precision, a level of accuracy which for most purposes is not necessary.
The actual values shown in the table are not to be taken too literally, as the example used is very simple and highly constrained. Its value is mostly qualitative, showing how space and time can be traded for accuracy. Interestingly enough, the solve portion of the problem seems fairly optimized and varies very little with the various choices of eps.

C. Setup Time
The setup time corresponds to the time spent in creating matrix and vector structures, and computing a priori factorizations. It should be noted that the setup time is a one-time cost that will get amortized through repetitive usage of the model, in our case the solution of the power grid in a sequence of time points in the interval of interest. The results for the setup time are presented in Table II for 2-D and 3-D regular grids and 2-D and 3-D irregular grids. The results in the table are not entirely enlightening, but then again the computations being performed are also quite different and direct data comparison is thus misleading. Still, it appears, looking at raw values, that the hierarchical approaches clearly become more efficient than the sparse solver as problem sizes grow. For 2-D problems, this becomes noticeable at about 1 million nodes. However, a more careful look at the tables, in particular a comparison of the ratios between setup times on consecutive grids, sometimes shows non-monotonic growth rate behavior, which is unexpected. This happens mostly for the sparse code. The setup for the hierarchical approach appears to grow more slowly than the sparse approach, as expected, except for the largest 2-D examples, regular and irregular. Interestingly enough, the irregularity of the grid does not seem to affect the sparse approach as it affects the geometric hierarchical solver. For the 3-D irregular case it is hard to draw any conclusions since the irregularity has a larger impact in this case.

D. Solve Time
To compute the solve time, the Cholesky factors were used to solve for a given right-hand side. Results for solving a single right-hand side are presented in Table III for 2-D and 3-D regular grids and 2-D and 3-D irregular grids. In terms of the solve time, we notice a pattern similar to that of the setup time. However, the efficiency of the sparse solver slightly delays the advantage of the hierarchical methods to larger problems. In the 2-D cases, a clear advantage appears only for problems with around 4 million nodes. For the 3-D problem the advantage becomes noticeable sooner due to the added complexity of the problem, and the data clearly show the slower growth rate for the hierarchical method, which is also able to handle much larger problems.

E. Storage Requirements
Finally, we look at the storage requirements of the various methods. Only matrix structures were taken into account (since vectors do not add that much to the total memory) and among these, only matrices used repeatedly in the time analysis. We assume other matrix structures can be freed after setup,
and only evaluation matrices matter. The storage requirement results are shown in Table IV, again for 2-D and 3-D regular grids and 2-D and 3-D irregular grids. The results in the table are again similar to the other metrics, but here the sparse solver is more competitive. This is likely a result of the sparse solver targeting lower fill-in in the matrix factors, whereas the main target of the hierarchical approaches, even though they also indirectly attempt to minimize fill-in, is to compress the representation a posteriori. It is interesting to observe that for the 3-D irregular case the H-Matrix compression is clearly more efficient than the fill-in minimization ordering of the sparse approach.

TABLE IV
Storage Requirements (in Megabytes)

Grid         Number of Nodes   Sparse (Cholmod)   Hierarchical   Admissibility
2-D reg.          16 384                 7               11      Geometric
2-D reg.          65 536                33               49      Geometric
2-D reg.         262 144               151              210      Geometric
2-D reg.       1 048 576               674              873      Geometric
2-D reg.       4 194 304              2973             3555      Geometric
2-D reg.      16 777 216            13 044           14 351      Geometric
3-D reg.         131 072               256              339      Algebraic
3-D reg.         524 288              1406             1559      Algebraic
3-D reg.       2 097 152              7314             7022      Algebraic
2-D irreg.        16 384                 7               10      Geometric
2-D irreg.        65 536                33               47      Geometric
2-D irreg.       262 144               151              199      Geometric
2-D irreg.     1 048 576               674              821      Geometric
2-D irreg.     4 194 304              2973             3258      Geometric
2-D irreg.    16 777 216            13 044           13 191      Geometric
3-D irreg.       131 072               256              229      Algebraic
3-D irreg.       524 288              1406              677      Algebraic
3-D irreg.     2 097 152              7314             2996      Algebraic
3-D irreg.     8 388 608         Out-of-mem           10 495      Algebraic

As a final conclusion, we believe that the hierarchical approaches are indeed quite promising when compared to a state-of-the-art sparse direct solver. We observe that the geometric hierarchical approach can solve 16 million grid nodes using 14 GB of RAM at a rate of almost 1 million nodes per second.

VI. Experiments on Real Power Grids
A suite of small to medium sized power grid benchmarks [23] was published with the purpose of motivating research in this subject. Table V summarizes the characteristics of these benchmark grids along with some statistics. For our experiments, the Norton equivalent of all voltage sources was computed, so that instead of modified nodal analysis (MNA), simple nodal analysis (NA) could be used. Furthermore, vias between metal layers in these grids were modeled by short-circuits (zero-valued voltage sources or zero-valued resistors), which were replaced by appropriate equivalents. Table V also lists the number of resistors per node, which gives an estimate of the density of the system matrix.
In order to deal with grids that have disconnected parts (at least two disconnects are always present, representing the VDD and GND networks) we had to, as previously mentioned, adapt the code used for the H-Matrices. Two disconnects imply that, after clustering all nodes according to the HEM algorithm, we get two separate clusters. This means the first block splitting in the matrix is the splitting between the VDD and GND networks, and in fact we are dealing with two separate problems inside the same matrix. While this is transparent to both Cholmod and HLib, there is a significant consequence: we are dealing with two independent sub-problems with half the size of the original problem.
Given this situation, we chose to try our algorithms on power grids ibmpg2 and ibmpg4 since, for the purpose of establishing a comparison, they represent a good pair of examples (after short-circuit reduction and Norton equivalents are computed). The corresponding results are presented in Table VI for Cholmod and the algebraic approach of HLib (we are not considering any geometric information on the grid). The last column represents the maximum relative error of the HLib solution compared to the solution of Cholmod, that is

\[
\text{max. rel. err.} = \max_i \frac{\left| v_i^{\mathrm{Cholmod}} - v_i^{\mathrm{HLib}} \right|}{\left| v_i^{\mathrm{Cholmod}} \right|}, \quad \forall v_i \in v \tag{6}
\]

where v is the solution voltage vector. The eps parameters of H-Matrices were adjusted so that this error is below 1%. We can see that for these problems, Cholmod is still better than HLib. Notwithstanding, results for the artificially generated 2-D and 3-D grids clearly indicate H-Matrices can be more efficient than Cholmod from a certain problem dimension onward. In the case of 3-D grids, this point occurs earlier than for 2-D grids. This is due to the fact that in 3-D problems, much more fill-in results from the Cholesky factorization and thus the matrix is more complex to deal with. The ability of H-Matrices to compress the Cholesky factor makes HLib behave more efficiently sooner in the case of 3-D problems. As we can see, the real power grids that we have had access to are somewhere between 1-D and 2-D, which implies this gain of HLib vs. Cholmod will occur for higher problem dimensions. In fact, for 1-D problems, there is no fill-in, so no compression can be achieved since the optimal representation for the Cholesky factor is the original sparse matrix representation. Still, from the results in Table VI, specifically the columns indicating the ratios between the HLib and Cholmod data, we see that the performance ratio decreases as the size of the problem is increased. This agrees with our observations for the artificial 2-D and 3-D problems, although the advantage of using H-Matrices will occur for higher dimension problems, or for problems where there are more non-zero elements per row (more connections akin to 2-D or 3-D problems). Unfortunately, we did not have access to larger or denser real power grids to experiment with. As for the apparent high setup time of H-Matrices, this is in part expected since the complexity of factorization has higher

TABLE V
Statistics for IBM Power Grid Benchmarks

Benchmark        n1          v          n2           r      r/n2         i     d
ibmpg1         30 635     14 308      16 327      30 027    1.84     10 874    5
ibmpg2        127 235        330     126 905     208 325    1.64     38 046    2
ibmpg3        851 581        955     850 626   1 401 572    1.65    201 548    3
ibmpg4        953 580        962     952 618   1 560 645    1.64    277 288    2
ibmpg5      1 079 307    539 087     540 220   1 076 848    1.99    540 900    5
ibmpg6      1 670 491    836 239     834 252   1 649 002    1.98    761 616    2

n1 is the original number of nodes, n2 is the number of nodes after computing the Norton equivalent of all voltage sources and removing short-circuits, r is the number of resistors, i is the number of current sources (ports) after computing the Norton equivalent, and d is the number of disconnects. The column v is not defined in the original note; in every row it equals n1 − n2.

TABLE VI
Results for ibmpg2 and ibmpg4 with Cholmod and HLib [Raw Results and Ratios Between Results (HLib/Cholmod)]

Benchmark   Method    Setup Time (s)   Ratio (HL/Ch)   Solve Time (s)   Ratio (HL/Ch)   Space (MB)   Ratio (HL/Ch)   Max. Rel. Err.
ibmpg2      Cholmod         1.57             -               0.08             -              69            -              -
ibmpg2      HLib           30.68           19.54             0.21            2.63           202           2.93        4.897e-04
ibmpg4      Cholmod       111.62             -               1.04             -             996            -              -
ibmpg4      HLib          579.92            5.20             2.50            2.40          2432           2.44        7.506e-03
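
The ratio columns of Table VI follow directly from the raw columns, and the last column uses the maximum relative error defined in (6). The sketch below is only illustrative: it recomputes the HLib/Cholmod ratios from the printed raw values and shows one way to evaluate the error metric on toy vectors. The names `hl_over_ch` and `max_rel_err` and the toy voltages are hypothetical and do not come from the paper's code or data.

```python
# Table VI as printed: per benchmark, (setup time [s], solve time [s], space [MB]).
TABLE_VI = {
    "ibmpg2": {"Cholmod": (1.57, 0.08, 69), "HLib": (30.68, 0.21, 202),
               "printed_ratio": (19.54, 2.63, 2.93)},
    "ibmpg4": {"Cholmod": (111.62, 1.04, 996), "HLib": (579.92, 2.50, 2432),
               "printed_ratio": (5.20, 2.40, 2.44)},
}


def hl_over_ch(entry):
    """HLib/Cholmod ratio for setup time, solve time, and space."""
    return tuple(h / c for h, c in zip(entry["HLib"], entry["Cholmod"]))


def max_rel_err(v_ref, v_approx):
    """Maximum relative error in the sense of (6).

    Assumes every reference entry is nonzero (an assumption of this sketch).
    """
    return max(abs(r - a) / abs(r) for r, a in zip(v_ref, v_approx))


if __name__ == "__main__":
    for name, entry in TABLE_VI.items():
        computed = hl_over_ch(entry)
        print(name, ["%.2f" % x for x in computed], "printed:", entry["printed_ratio"])
    # Toy voltages (hypothetical), only to exercise the metric.
    print(max_rel_err([1.0, 2.0], [1.0, 1.98]))
```

The recomputed ratios agree with the printed ones to within rounding, and each of the three ratios is smaller for ibmpg4 than for ibmpg2.
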

constants than the complexity of the solution and memory requirements (although also almost linear). But again, what we would like to point out here is that the gap between the setup times of HLib and Cholmod is quickly vanishing, which seems to indicate that for larger problems using H-Matrices pays off.

VII. Conclusion
H-Matrices are hierarchical matrix representation schemes whereby blocks of the matrix are represented by low rank factorizations in a compact form. This representation enables computations and storage with almost linear complexity. In this way, Cholesky factors can be efficiently computed and represented, leading to fast system storage and solution and enabling efficient power grid analysis. Experimental results on artificially generated and realistic test cases show that using the hierarchical Cholesky representation can be very efficient in both space and time when dealing with the simulation of large power grid models. Comparisons of our academic implementation against a state-of-the-art sparse Cholesky code, using a very efficient reordering scheme, show that the hierarchical matrix representation is very competitive.

References
[1] D. Kouroussis and F. N. Najm, “A static pattern-independent technique for power grid voltage integrity verification,” in Proc. ACM/IEEE DAC, Jun. 2003, pp. 99–104.
[2] S. R. Nassif and J. N. Kozhaya, “Fast power grid simulation,” in Proc. ACM/IEEE DAC, Jun. 2000, pp. 156–161.
[3] S. Pant and E. Chiprout, “Power grid physics and implications for CAD,” in Proc. 43rd Annu. Conf. DAC, 2006, pp. 199–204.
[4] T.-H. Chen and C. C.-P. Chen, “Efficient large-scale power grid analysis based on preconditioned Krylov-subspace iterative methods,” in Proc. ACM/IEEE DAC, Jun. 2001, pp. 559–562.
[5] J. N. Kozhaya, S. N. Nassif, and F. N. Najm, “A multigrid-like technique for power grid analysis,” IEEE Trans. Comput.-Aided Des. Integr. Circuits, vol. 21, no. 10, pp. 1148–1160, Oct. 2002.
[6] Z. Zhu, B. Yao, and C.-K. Cheng, “Power network analysis using an adaptive algebraic multigrid approach,” in Proc. ACM/IEEE DAC, Jun. 2003, pp. 105–108.
[7] S. Börm, L. Grasedyck, and W. Hackbusch, Hierarchical Matrices. Leipzig, Germany: Max Planck Institute for Mathematics in the Sciences, Jun. 2006.
[8] W. Hackbusch, “A sparse matrix arithmetic based on H-matrices, part I: Introduction to H-matrices,” Computing, vol. 62, no. 2, pp. 89–108, 1999.
[9] W. Hackbusch and B. N. Khoromskij, “A sparse H-matrix arithmetic, part II: Application to multi-dimensional problems,” Computing, vol. 64, no. 1, pp. 21–47, 2000.
[10] W. Hackbusch and Z. P. Nowak, “On the fast matrix multiplication in the boundary element method by panel clustering,” Numer. Math., vol. 54, no. 4, pp. 463–491, 1989.
[11] M. Bebendorf and W. Hackbusch, “Existence of H-matrix approximants to the inverse FE-matrix of elliptic operators with L∞-coefficients,” Numer. Math., vol. 95, no. 1, pp. 1–28, 2003.
[12] M. Bebendorf, “Hierarchical LU decomposition-based preconditioners for BEM,” Computing, vol. 74, pp. 225–247, 2005.
[13] S. L. Borne, L. Grasedyck, and R. Kriemann, “Domain-decomposition based H-LU preconditioners,” in Proc. LNCSE, 2005, pp. 661–668.
[14] L. Grasedyck and W. Hackbusch, “Construction and arithmetics of H-matrices,” Computing, vol. 70, no. 4, pp. 295–334, 2003.
[15] L. Greengard and V. Rokhlin, “A fast algorithm for particle simulations,” J. Comput. Phys., vol. 73, no. 2, pp. 325–348, Dec. 1987.
[16] L. Greengard, The Rapid Evaluation of Potential Fields in Particle Systems. Cambridge, MA: MIT, 1988.
[17] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, “Multilevel hypergraph partitioning: Applications in VLSI domain,” IEEE Trans. Very Large Scale Integr., vol. 7, no. 1, pp. 69–79, Mar. 1999.
[18] W. Hackbusch, S. Börm, and L. Grasedyck. HLib Package [Online]. Available: http://www.hlib.org
[19] S. Oliveira and F. Yang, “An algebraic approach for H-matrix preconditioners,” Univ. Iowa, Iowa City, Tech. Rep. TR168, 2006.
[20] T. A. Davis and W. W. Hager, “Modifying a sparse Cholesky factorization,” SIAM J. Matrix Anal. Applicat., vol. 20, no. 3, pp. 606–627, Jul. 1999.
[21] T. A. Davis and W. W. Hager, “Multiple-rank modifications of a sparse Cholesky factorization,” SIAM J. Matrix Anal. Applicat., vol. 22, no. 4, pp. 997–1013, 2000.
[22] T. A. Davis and W. W. Hager, “Row modifications of a sparse Cholesky factorization,” SIAM J. Matrix Anal. Applicat., vol. 26, no. 3, pp. 621–639, 2005.
[23] S. R. Nassif, “Power grid analysis benchmarks,” in Proc. ASP-DAC, 2008, pp. 376–381.

João M. S. Silva (M’02) was born in Lisbon, Portugal. He received the Licenciatura, M.S., and Ph.D. degrees in electrical and computer engineering from Instituto Superior Técnico, Technical University of Lisbon, Lisbon, in 2000, 2003, and 2008, respectively.
He was a Researcher with INESC ID, the Systems and Computers Engineering Institute, Research and Development. He is currently a CAD Engineer with Synopsys, Mountain View, CA. His current research interests include computer-aided design algorithms for electronics, physical design of very large scale integration circuits, and computer architectures.

Joel R. Phillips (S’91–M’97–SM’05–F’09) received the S.B., M.S., and Ph.D. degrees from the Massachusetts Institute of Technology, Cambridge, in 1991, 1993, and 1997, respectively.
Since 1997, he has been with Cadence Design Systems, Berkeley, CA.
His current research interests include the application of numerical simulation techniques to problems in electronic design automation. He received notable paper citations from the Design Automation Conference (DAC) 2000, DAC 2002, the Proceedings of the 15th Symposium on Integrated Circuits and Systems Design, Design, Automation and Test in Europe (DATE) 2004, the 2006 International Conference on Very Large Scale Integration of System-on-Chip, and the 1995 Copper Mountain Conference on Multigrid Methods.
Dr. Phillips was an Associate Editor of the IEEE Transactions on Computer-Aided Design from 2005 to 2009, and has served on the Executive Committee of the International Conference on Computer-Aided Design (ICCAD), as well as the technical program committees of numerous conferences including DATE, DAC, and ICCAD.

Luís Miguel Silveira (S’85–M’95–SM’00) was born in Lisbon, Portugal. He received the Engineers (summa cum laude) and Masters degrees in electrical and computer engineering from the Instituto Superior Técnico (IST), Technical University of Lisbon, Lisbon, in 1986 and 1989, respectively, and the M.S., E.E., and Ph.D. degrees from the Massachusetts Institute of Technology, Cambridge, in 1990, 1991, and 1994, respectively.
He is currently a Professor of electrical and computer engineering with IST, Technical University of Lisbon, a Senior Researcher with INESC ID, the Systems and Computers Engineering Institute, Research and Development, and a founding member of the Lisbon Center of the Cadence Research Laboratories. His current research interests include various aspects of computer-aided design of integrated circuits, with emphasis on interconnect modeling and simulation, parallel computer algorithms, and the theoretical and practical issues concerning numerical simulation methods for circuit design problems.
Dr. Silveira is an Associate Editor of the IEEE Transactions on Computer-Aided Design and has served on the technical program committees of numerous conferences, including DATE and ICCAD. He has received notable paper citations at various conferences, including ECTC, ICCAD, DAC, DATE, and SBCCI. He is a member of Sigma Xi.
