Floorplanning With Graph Attention


Yiting Liu, Ziyi Ju, Zhengming Li
School of Computer Science, Fudan University, Shanghai, China
{yitingliu20,zyju20,zhengmingli21}@fudan.edu.cn

Mingzhi Dong∗
School of Computer Science, Fudan University; Zhangjiang Fudan International Innovation Center, Shanghai, China
mingzhidong@gmail.com

Hai Zhou
IC Bench, Inc., Shanghai, China
hai@icbench.com

Jia Wang
Department of ECE, Illinois Institute of Technology, Chicago, USA
jwang@ece.iit.edu

Fan Yang, Xuan Zeng
School of Microelectronics, Fudan University, Shanghai, China
{yangfan,xzeng}@fudan.edu.cn

Li Shang
School of Computer Science, Fudan University, Shanghai, China
lishang@fudan.edu.cn

ABSTRACT
Floorplanning has long been a critical physical design task with high computational complexity. Its key objective is to determine the initial locations of macros and standard cells with optimized wirelength for a given area constraint. This paper presents Flora, a graph attention-based floorplanner that learns an optimized mapping between circuit connectivity and physical wirelength, and produces a chip floorplan using efficient model inference. Flora has been integrated with two state-of-the-art mixed-size placers. Experimental studies using both academic benchmarks and industrial designs demonstrate that, compared to state-of-the-art mixed-size placers alone, Flora improves placement runtime by 18%, with 2% wirelength reduction on average.

KEYWORDS
floorplanning, physical design, electronic design automation, graph attention network, deep learning

∗ Corresponding author.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
DAC ’22, July 10–14, 2022, San Francisco, CA, USA
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9142-9/22/07...$15.00
https://doi.org/10.1145/3489517.3530484

1 INTRODUCTION
Chip floorplanning has been a critical and challenging task in the physical implementation of VLSI chips. It affects key downstream optimization objectives, among which placement wirelength is the most important one. Unlike traditional floorplanning, which only places macros, the modern version takes a netlist of mixed macro blocks and standard cells as input, and places them so as to optimize physical wirelength for a given area constraint, which closely correlates with timing and routability [10].

Proven to be NP-hard even in the classical formulation [14], chip floorplanning is difficult to solve efficiently using algorithmic approaches. Lacking effective automatic tools, most expert designers have resorted to doing the job manually, usually taking months of intense effort [13]. To improve productivity, Google has developed a floorplanner based on deep reinforcement learning [13]. Adopting the approach used in the successful AlphaGo system, it treats floorplanning as a sequence of moves, each of which places a macro on the chip. When all the macros are placed, the downstream physical implementation stages are conducted using commercial tools and a final reward is computed. Similar to AlphaGo, this final reward is back-propagated to each situation and movement in the sequence, and an evaluation network and a policy network are trained based on these rewards. It has reported better-than-human-expert results on TPU designs, generated in under 6 hours by the system.

Our work has been greatly inspired by Google’s work, and we agree that deep learning is a promising way to solve the chip floorplanning problem. On the other hand, we are rather conservative about using reinforcement learning for chip floorplanning, because decomposing the task into a sequence of moves, each of which places only one block, complicates the job. Quite unlike the game of Go, where a player can only take one move before the opponent’s unknown next move, chip designers have the complete connectivity information of the circuit netlist upfront and rarely need to do the floorplan in a sequential process, one block at a time. In reality, having a holistic view of the circuit in mind, an expert designer usually partitions the circuit into finer and finer subcircuits, and each time simultaneously places all subcircuits at one level onto the chip, optimizing the data flow among them. It is, in essence, a process of establishing a rough floorplan at a time with an optimized mapping from subcircuit connectivity to physical locations.

In this work, we pursue a Graph Attention Network (GAT)-based approach for chip floorplanning. By the general rule of thumb that “where human intuition plays, neural networks may triumph”, the main idea of our approach is to utilize GAT [16] to gain a holistic understanding of the subcircuit connectivity, learn an optimized mapping between subcircuit connectivity and physical wirelength, and then decode the physical locations of circuit blocks with efficient model inference. The resulting floorplan is then used to drive the downstream mixed-size placement task.

This work aims to answer the following research questions.


(1) How to establish an optimized mapping between circuit connectivity and physical wirelength? We propose to use a GAT-based approach. The proposed model consists of a shared GAT-based encoder followed by two task-specific autoencoders to learn an optimized mapping between circuit connectivity and physical wirelength, and to generate the physical locations of macros and standard cells, i.e., the chip floorplan, which then drives the downstream placement task.
(2) How to train a GAT-based floorplanner without the need for a large real-world dataset? Real-world optimal floorplans are challenging to obtain. We propose a methodology to generate a synthetic training dataset to train the GAT model. In the dataset, the mapping between circuit connectivity and physical wirelength is guaranteed to be optimal. In addition, it offers a broad coverage of statistical connectivity distributions, to ensure the trained model generalizes well on real-world designs.

The proposed floorplanner, called Flora, has been integrated with two state-of-the-art mixed-size placers [4, 7], and will be publicly released along with the synthetic dataset. Experimental studies using both ISPD2005 benchmarks and real-world industry designs demonstrate that, compared to state-of-the-art mixed-size placers alone, Flora consistently improves placement runtime by 18%, with 2% wirelength reduction on average. Using both academic benchmarks and industrial designs also validates the generalization capability of the proposed GAT-based approach.

The rest of this paper is organized as follows. Section 2 summarizes the related work. Section 3 describes the problem formulation. Section 4 presents the proposed approach. Section 5 presents the experimental results. We conclude the paper in Section 6.

2 RELATED WORK
Floorplanning is an NP-hard problem which determines the locations of large physical modules (e.g., embedded memories, intellectual property (IP) cores, clusters of standard cells) and enables early estimation of interconnect wirelength [1]. Researchers have proposed various representation schemes [14] and optimization algorithms [9]. Early work on floorplanning focused on macro packing and left the task of placing standard cells to downstream placers, which may limit the optimization space of physical design [8]. Recent work tackles the placement problem of macros and standard cells simultaneously as one mixed-size placement task [12].

The state-of-the-art mixed-size placers include ePlace [11], RePlAce [4] and DREAMPlace [7], which formulate the mixed-size placement problem as a constrained nonlinear optimization problem. The objective function consists of a convex wirelength function and a weighted non-convex density function. Chip placement is an iterative optimization process driven by wirelength-induced force and density-induced force. Studies show that these methods yield high-quality solutions but require a long iteration time.

Recently, Google developed a floorplanner based on deep reinforcement learning [13]. It places macros sequentially and calculates the reward iteratively, which is then back-propagated to train the policy network. It has reported results that are better and faster than those of human experts on TPU designs. However, with the complete circuit netlist information upfront, the reinforcement learning based sequential optimization process may unnecessarily complicate the job. Inspired by how circuit experts create and optimize chip floorplans, we pursue a GAT-based approach to learn a mapping from circuit connectivity to block physical locations with optimized wirelength.

3 PROBLEM FORMULATION
A floorplan instance can be modeled as a hypergraph $G = (C, E)$ with a set of objects $C$ (i.e., macros and clustered standard cells) connected by hyperedges $E$. The primary objective of chip floorplanning can be formulated as minimizing the total wirelength $W(c)$ while adhering to the density constraint $\rho(c)$:

$$\min_{c} f(c) = W(c) \quad \text{s.t.} \quad \rho(c) \le \rho_t. \tag{1}$$

The total wirelength $W(c)$ of the floorplan can be estimated as the sum of wirelength between all connected objects, which is defined in Eq. 2:

$$W(c) = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j} = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} e_{i,j}\, d_{i,j}\, a_{i,j}, \tag{2}$$

where $e_{i,j}$ and $d_{i,j}$ denote the number of connections and the distance between objects $i$ and $j$, respectively, and $a_{i,j}$ is the entry of the adjacency matrix, with $a_{i,j} = 1$ if objects $i$ and $j$ are connected and 0 otherwise. For a given circuit netlist, $e$ and $a$ are known. According to Eq. 2, the goal of floorplanning is to calculate the optimal distance $d$ between connected objects. As such, the essence of floorplanning is to construct an optimized mapping between interconnections $e$ and physical distances $d$, which then guides the downstream task to optimize and produce the final chip placement.

4 METHOD OF FLORA
This section presents Flora, the proposed GAT-based floorplanner. It first describes the overall algorithmic flow, and then presents the key components of Flora, including clustering, the GAT-based model, and synthetic training dataset generation.

4.1 Solution Overview
Figure 1 depicts the overall algorithmic flow of Flora. Given a circuit netlist represented as a weighted undirected hypergraph, Flora partitions the netlist into subcircuit clusters using the proposed c-spectral clustering algorithm. The partitioned subcircuit hypergraph, consisting of macros and clusters of standard cells, is fed into the GAT-based model to generate the chip floorplan, which is then fed into the downstream placer to produce the final chip placement. The GAT model is trained using a synthetic dataset without the need for real-world chip designs. The dataset covers a wide range of circuit connectivity distributions to ensure generalization of the proposed GAT-based floorplanner.

4.2 C-Spectral Clustering Algorithm
This section describes the proposed c-spectral clustering algorithm. Prior work has shown that spectral clustering can effectively minimize inter-cluster weighted connectivity, but with long runtime
for large-scale graphs [15]. The proposed c-spectral clustering algorithm aims to leverage the benefit of spectral clustering yet avoid its high computation cost via bottom-up hyperedge coarsening.

Algorithm 1 C-Spectral Clustering Algorithm
Input: A circuit netlist modeled by an undirected hypergraph $G = (C, E)$
Output: Cluster-level netlist
1: Sort hyperedges in a non-increasing order based on edge weights
2: Merge the objects in the same hyperedge to construct a hypergraph $S$ with smaller scale
3: Compute the Laplacian matrix $L$ and the eigenvector $q$ corresponding to the second smallest eigenvalue of $L$ for $S$
4: Sort $q$ and cluster to $k$ groups by partitioning at the biggest $k - 1$ gaps of $q$
5: Extract macros as single clusters
6: return Cluster-level netlist

The proposed c-spectral clustering algorithm is summarized in Alg. 1. Lines 1-2 describe the bottom-up hyperedge coarsening process. Since objects with high connectivity weight need to be placed closely on the chip, the coarsening process performs bottom-up clustering based on hyperedge weights, thereby reducing the scale of the graph and alleviating the computation cost of spectral clustering. Specifically, given the circuit netlist described as an undirected hypergraph $G = (C, E)$, the hyperedge coarsening process first sorts hyperedges in a non-increasing order based on edge weights. Following the sorted order, objects connected by the same hyperedge are merged into a cluster. This process continues until a predefined bound on the total number of clusters is reached.

The output of the coarsening process, i.e., a hypergraph $S$ with much smaller scale, is then fed into spectral clustering (Lines 3-4). We adopt the spectral-based clustering approach [15]. We first compute the Laplacian matrix $L$ of graph $S$ and calculate the eigenvector $q$ corresponding to the second smallest eigenvalue of $L$. Then, we sort the entries of the eigenvector $q$ and partition the objects into $k$ clusters by finding the biggest $k - 1$ gaps between consecutive sorted entries. We also apply the Lanczos algorithm [5] to improve the calculation efficiency of eigenvalues and eigenvectors. Finally, we extract macros as single clusters to balance the total areas of cells within each cluster.

4.3 GAT-based Model
This section presents the proposed GAT-based model for floorplan generation with optimized mapping between circuit connectivity and physical wirelength.

Architecture. As shown in Figure 1, the model contains three key components: a shared encoder and two task-specific autoencoders referred to as Dtask-model and Ltask-model, respectively. The shared encoder encodes the circuit netlist information into embeddings using GAT [16]. Dtask-model aims to learn embeddings to establish an optimized mapping between circuit connectivity and the physical wirelength between connected objects in the floorplan. The embeddings of Dtask-model are concatenated with the embeddings of Ltask-model, and then fed to the Ltask-model decoder to generate object physical positions, i.e., the chip floorplan.

Figure 1: Overview of the proposed method. The netlist is clustered into a feature graph and passed to the GAT-based model, in which a shared encoder (GAT layer) feeds the Dtask-model (encoder and decoder, producing node distances) and the Ltask-model (encoder and decoder, producing node positions); the resulting floorplan goes to the nonlinear optimization solver.

Shared GAT encoder. The input to the shared GAT encoder is the cluster-level netlist generated by c-spectral clustering (cf. Section 4.2), represented as a graph $G = (V, A)$ defined by the set of objects $V$ and the adjacency matrix $A$. Each object has an $N$-dimensional feature vector representing the connectivity distribution. Let $F \in \mathbb{R}^{N \times N}$ be the feature matrix containing feature vectors of all objects as rows. $A$ is an $N \times N$ binary matrix with $A_{i,j} = 1$ if objects indexed by $i$ and $j$ are connected by edges, $A_{i,j} = 0$ otherwise, and $A_{i,i} = 1$ for all $i$.

The shared GAT-based encoder learns a new representation for each object by aggregating messages from its local neighbors, referred to as the first-level object embedding $e^f \in \mathbb{R}^{N \times M}$, where $M$ is the number of new features in each object. $e^f$ is then passed to Dtask-model and Ltask-model to complete the following specific tasks.

Dtask-model. It aims to learn an optimized mapping between the clustered netlist connectivity and physical wirelength. The model consists of a GAT layer to compute an $H$-dimensional distance embedding $e^d \in \mathbb{R}^{N \times H}$ and a Multi-Layer Perceptron (MLP) [6] to decode $e^d$ to the distances of connected objects $D \in \mathbb{R}^{N \times N}$. Since GAT can assign different attention weights to neighbors, it is capable of learning the correlation between objects and encoding the more strongly correlated neighbors closer to the object in the new feature space, thus enabling the model to learn the mapping between interconnections and distances for all connected objects.

Let $Z^d \in \mathbb{R}^{N \times N}$ be the output matrix of Dtask-model containing distance information. We define the model computing $Z^d$ as follows:

$$Z^d = A f_{\theta_d}\left[ t_{\zeta_d}(e^f) \right], \tag{3}$$

where $t_{\zeta_d} : \mathbb{R}^M \to \mathbb{R}^H$ is a GAT layer with learnable parameters $\zeta_d$ to compute the distance embedding $e^d$, and $A f_{\theta_d} : \mathbb{R}^H \to \mathbb{R}^N$ is an MLP with learnable parameters $\theta_d$ multiplied by $A$ to decode the distances of connected objects. The model is trained by minimizing the mean-squared loss (MSELoss) between the ground truth $\mathit{DLabel}$ and the predicted value $Z^d$ as follows:

$$\mathcal{L}_d = \frac{1}{N} \sum_{i \in V} \left( Z^d_i - \mathit{DLabel}_i \right)^2. \tag{4}$$
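The attention step that the GAT layers above rely on is not spelled out in the text. The following is a minimal sketch of one single-head attention layer in the standard formulation of [16], written with plain Python lists. The function name `gat_layer`, the toy inputs, and the LeakyReLU slope of 0.2 are illustrative assumptions, and the final nonlinearity that usually follows aggregation is omitted; this is not the authors' implementation.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_layer(h, adj, W, a):
    """Single-head graph attention layer (standard form of [16], assumed here).

    h   : N x M input features (list of lists)
    adj : N x N 0/1 adjacency with self-loops (adj[i][i] = 1)
    W   : M x K weight matrix
    a   : attention vector of length 2K
    Returns (out, alpha): N x K outputs and N x N attention weights.
    """
    n, m, k = len(h), len(W), len(W[0])
    # Linear projection z_i = h_i W
    z = [[sum(h[i][p] * W[p][q] for p in range(m)) for q in range(k)] for i in range(n)]
    # a^T [z_i || z_j] splits into a source part and a destination part
    src = [sum(a[q] * z[i][q] for q in range(k)) for i in range(n)]
    dst = [sum(a[k + q] * z[i][q] for q in range(k)) for i in range(n)]
    alpha = [[0.0] * n for _ in range(n)]
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]]
        scores = [leaky_relu(src[i] + dst[j]) for j in nbrs]
        top = max(scores)                      # subtract max for a stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        for j, e in zip(nbrs, exps):
            alpha[i][j] = e / total            # softmax over neighbors only
        # Attention-weighted aggregation of neighbor projections
        out.append([sum(alpha[i][j] * z[j][q] for j in nbrs) for q in range(k)])
    return out, alpha

# Toy example: 3 objects, object 0 and object 2 are not connected
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, 0.5, 1.0, -1.0]
out, alpha = gat_layer(h, adj, W, a)
```

Each row of `alpha` sums to one over the object's neighbors and is zero elsewhere, which is what lets the layer weight strongly correlated neighbors more heavily than weakly correlated ones.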


Figure 2: The connectivity statistics of ISPD2005 and real-world industry designs. (a) The distribution of the number of connections $E$. (b) The distribution of the number of neighbors $B$. The red dotted line shows the connectivity statistics of the created synthetic dataset.

Algorithm 2 Synthetic Training Dataset Generation Algorithm
Input: The number of objects $n$, the distribution of the number of interconnections $P(e)$ and neighbors $P(b)$
Output: Synthetic training dataset with optimal netlist-floorplan pair
1: Determine the locations of $n$ objects
2: Generate the number of neighbors $b_i$ for each object $o_i \in G$, with $B = (b_1, \ldots, b_i, \ldots, b_n) \sim P(b)$
3: for $o_i$ in $G$ do
4:   $e_i = (e_{i,1}, \ldots, e_{i,j}, \ldots, e_{i,b_i}) \sim P(e)$
5: end for
6: $E = (e_1, \ldots, e_i, \ldots, e_n)$
7: Sort $e_i \in E$ in descending order
8: Assign connections to neighbors based on the distance
9: return Synthetic training dataset

Ltask-model. It aims to generate the chip floorplan, i.e., the locations of macros and clusters of standard cells. The architecture of Ltask-model is similar to that of Dtask-model, a GAT layer followed by an MLP, defined as follows:

$$Z^l = f_{\theta_l}\left[ t_{\zeta_l}(e^f) \oplus e^d \right], \tag{5}$$

where $Z^l$ represents the object locations predicted by Ltask-model, $t_{\zeta_l} : \mathbb{R}^M \to \mathbb{R}^T$ is a GAT layer with learnable parameters $\zeta_l$ to compute the location embedding $e^l$, $\oplus$ denotes the concatenation operator, and $f_{\theta_l} : \mathbb{R}^{T+H} \to \mathbb{R}^2$ is an MLP with learnable parameters $\theta_l$ to decode the object locations. In particular, to establish the mapping between relative distance and absolute coordinates for objects, we concatenate the location embedding $e^l$ and distance embedding $e^d$ as $e^c$ and feed it to the following MLP layer to generate the object distribution.

The model is trained using MSELoss between the ground truth $\mathit{LLabel}$ and the predicted value $Z^l$ as follows:

$$\mathcal{L}_l = \frac{1}{N} \sum_{i \in V} \left( Z^l_i - \mathit{LLabel}_i \right)^2. \tag{6}$$

By combining the loss functions from the Dtask-model and the Ltask-model, the final loss function for the GAT-based model is defined as follows:

$$\mathcal{L} = \mathcal{L}_d + \mathcal{L}_l. \tag{7}$$
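To make the data flow of Eqs. 3 to 7 concrete, the sketch below strings together the adjacency masking of the Dtask output, the concatenation used by Ltask, and the summed loss, using plain Python lists in place of tensors. Two points are assumptions rather than statements from the text: "multiplied by $A$" is read as an element-wise mask, and the squared term in Eqs. 4 and 6 is read as the squared Euclidean norm of each row difference. All function names and numbers are hypothetical.

```python
def mask_by_adjacency(raw, adj):
    """Keep decoded distances only for connected pairs (assumed element-wise A * f)."""
    return [[r * a for r, a in zip(raw_row, adj_row)] for raw_row, adj_row in zip(raw, adj)]

def concat(e_l, e_d):
    """Row-wise concatenation e^c = e^l (+) e^d used by the Ltask decoder."""
    return [row_l + row_d for row_l, row_d in zip(e_l, e_d)]

def row_mse(pred, label):
    """(1/N) * sum over objects of the squared row difference (Eqs. 4 and 6)."""
    n = len(pred)
    return sum(sum((p - t) ** 2 for p, t in zip(pr, lr)) for pr, lr in zip(pred, label)) / n

def joint_loss(z_d, d_label, z_l, l_label):
    """Eq. 7: unweighted sum of the distance loss and the location loss."""
    return row_mse(z_d, d_label) + row_mse(z_l, l_label)

# Toy example with 3 objects; object 0 and object 2 are not connected
adj = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
raw = [[2.0, 2.0, 2.0] for _ in range(3)]           # stand-in for the MLP output
z_d = mask_by_adjacency(raw, adj)
d_label = [[0.0, 2.0, 0.0], [2.0, 0.0, 2.0], [0.0, 2.0, 0.0]]
z_l = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]          # stand-in for predicted (x, y)
l_label = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
loss = joint_loss(z_d, d_label, z_l, l_label)
e_c = concat([[0.1], [0.2], [0.3]], [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Note that a framework loss such as PyTorch's `MSELoss` with its default reduction averages over all elements rather than over objects, so the scale would differ from this reading by a constant factor.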
4.4 Synthetic Training Dataset Generation
Real-world optimal floorplans are hard to obtain. Even assuming their optimal floorplans are known, the limited number of public benchmarks is far from enough to support the training of a deep model. We propose a methodology to create a synthetic training dataset to solve this problem [3].

The creation of the synthetic training dataset needs to guarantee that each created netlist-floorplan pair contains the optimal mapping between circuit connectivity and physical wirelength. As described in Section 3, the primary optimization goal of a floorplanner is to minimize the total wirelength, which can be calculated by Eq. 2. The key features we can extract from a given clustered netlist are $e$ and $a$, which can be converted to the statistical distribution of connectivity, more specifically, the distribution of the number of connections between two neighbors $E$ and the distribution of the number of neighbors for each object $B$. To construct the training dataset, we start by creating an empty chip region and placing all objects on the region. Then we calculate the probability distributions of $E$ and $B$ in various clustered circuits, denoted as $P(e)$ and $P(b)$, respectively. Next, we generate the number of neighbors $B = (b_1, \ldots, b_i, \ldots, b_n) \sim P(b)$ for each object and the number of connections $E = (e_1, \ldots, e_i, \ldots, e_n)$, $\forall e_i \sim P(e)$, between connected objects, where $b_i$ denotes the number of neighbors of object $i$ and $e_i$ denotes the connection distribution between object $i$ and its neighbors. We sort $e_i \in E$ in descending order and allocate more connections to neighbors with closer distances. The final number of connections between objects is set to the average connections of object-pairs. Then the coordinate of each object $O(x, y)$ can be regarded as the location label $\mathit{LLabel}$ and the physical distances between each connected object pair are the distance label $\mathit{DLabel}$. The construction of the dataset is depicted in Alg. 2.

As such, we can obtain the synthetic training dataset with an optimal floorplan solution (cf. Proof) while ensuring a wide coverage of the statistical connectivity distribution, thus helping the trained model generalize well on various benchmarks.

Theorem 4.1. The netlist-floorplan pair generated in Alg. 2 is optimal.

Proof. The total wirelength between object $i$ and its neighbors $B = \{b_1, b_2, \ldots, b_n\}$ can be calculated as

$$w_i = e_{i,b_1} d_{i,b_1} + e_{i,b_2} d_{i,b_2} + \cdots + e_{i,b_n} d_{i,b_n}, \tag{8}$$

where $d_{i,b_n}$ and $e_{i,b_n}$ denote the distance and the number of connections between $i$ and $b_n$. Since $e$ is known in advance, to minimize $w_i$ without violating the density constraint (i.e., non-overlap between objects), $d$ needs to be ordered inversely to $e$, that is,

$$e_{i,b_1} > e_{i,b_2} > \cdots > e_{i,b_n} \;\wedge\; d_{i,b_1} < d_{i,b_2} < \cdots < d_{i,b_n}, \tag{9}$$

the mapping of which is guaranteed in the synthetic training dataset. Swapping any neighbors’ positions will cause $w_i$ to increase, thus proving the optimality of the synthetic training dataset [3]. □

In addition, the synthetic dataset shall provide a broad coverage in terms of circuit connectivity to ensure the generalization of the trained model. As shown in Figure 2, the distributions of the two key features $E$ and $B$ vary significantly among different benchmarks (represented by different colors). In this work, the connectivity statistics of the synthetic dataset sufficiently cover those of academic benchmarks, e.g., ISPD2005, and real-world industry designs, and can be further extended to accommodate new designs with different connectivity statistics.
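The steps of Alg. 2 can be sketched as follows. This is an illustrative reading under several assumptions not fixed by the text: objects sit on a unit grid, the $b_i$ neighbors of an object are its nearest objects, $P(b)$ and $P(e)$ are given as hypothetical `(values, weights)` pairs, and "the average connections of object-pairs" is taken to mean averaging the two directed counts of a pair. The function name `generate_sample` is invented for this sketch.

```python
import math
import random

def generate_sample(n, p_b, p_e, seed=0):
    """One synthetic netlist-floorplan pair, loosely following Alg. 2.

    p_b, p_e: (values, weights) pairs standing in for P(b) and P(e).
    Returns (loc, conn, d_label): object coordinates (location label),
    symmetric connection counts, and distances of connected pairs.
    """
    rng = random.Random(seed)
    side = math.ceil(math.sqrt(n))
    # Step 1: non-overlapping locations on a unit grid (assumption)
    loc = [(i % side, i // side) for i in range(n)]
    dist = [[math.dist(loc[i], loc[j]) for j in range(n)] for i in range(n)]
    directed = [[0] * n for _ in range(n)]
    for i in range(n):
        # Step 2: number of neighbors b_i ~ P(b), capped at n - 1
        b_i = min(rng.choices(p_b[0], weights=p_b[1])[0], n - 1)
        # Steps 3-7: connection counts e_i ~ P(e), sorted in descending order
        e_i = sorted(rng.choices(p_e[0], weights=p_e[1], k=b_i), reverse=True)
        # Step 8: the closest neighbor receives the largest count
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:b_i]
        for j, e in zip(nearest, e_i):
            directed[i][j] = e
    # Final count per pair: average of the two directed counts (assumption)
    conn = [[(directed[i][j] + directed[j][i]) / 2 for j in range(n)] for i in range(n)]
    d_label = [[dist[i][j] if conn[i][j] > 0 else 0.0 for j in range(n)] for i in range(n)]
    return loc, conn, d_label

loc, conn, d_label = generate_sample(9, ([1, 2, 3], [1, 1, 1]), ([1, 2, 4], [2, 1, 1]), seed=1)
```

Here `loc` plays the role of the location label and `d_label` the role of the distance label; drawing many samples with distributions fitted to real clustered netlists would give the coverage that Figure 2 illustrates.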

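The exchange argument behind Theorem 4.1 can be checked numerically for a single object. The sketch below, with made-up connection counts and slot distances, compares the pairing of Eq. 9 (largest count with the smallest distance) against every other assignment of the same distances. It illustrates the per-object ordering argument only and is not a proof for the full netlist.

```python
from itertools import permutations

def object_wirelength(e, d):
    """w_i of Eq. 8: sum of connection count times distance over the neighbors."""
    return sum(e_k * d_k for e_k, d_k in zip(e, d))

# Hypothetical values: counts in descending order, slot distances in ascending order
e = [5, 3, 2, 1]
d = [1.0, 1.5, 2.0, 3.0]

paired = object_wirelength(e, d)  # pairing required by Eq. 9
others = [object_wirelength(e, list(p)) for p in permutations(d) if list(p) != d]
```

For these values `paired` is 16.5 and every other assignment is strictly larger: swapping the distances of two neighbors $b_p$ and $b_q$ changes $w_i$ by $(e_{i,b_p} - e_{i,b_q})(d_{i,b_q} - d_{i,b_p})$, which is positive whenever both orderings in Eq. 9 are strict.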

Table 1: Experimental results on ISPD2005 benchmarks (HPWL in units of 10^7, runtime in seconds)

                                                             Flora                      DREAMPlace
Benchmark  #cell    #net     #Movable  #Fixed  #Fixed/#cell  HPWL   Iteration  Runtime  HPWL   Iteration  Runtime
ADAPTEC1   211447   221142   210967    480     0.20%         6.46   460        180      6.55   605        227
ADAPTEC2   255023   266009   254584    407     0.16%         7.77   520        275      10.11  620        300
ADAPTEC3   451650   466758   450985    59      0.12%         15.65  521        374      15.63  726        445
ADAPTEC4   496054   515951   494785    1260    0.25%         14.28  527        427      14.29  742        545
BIGBLUE1   278164   284479   277636    528     0.19%         8.51   424        196      8.52   646        336
BIGBLUE2   557866   577235   535741    22125   3.97%         12.51  471        273      12.52  630        326
BIGBLUE3   1096812  1123170  1095583   1229    0.09%         -      -          -        -      -          -
BIGBLUE4   2177353  2229886  2169382   7970    0.33%         74.76  949        888      75.90  989        906
ratio      -        -        -         -       -             1      1          1        1.02   1.33       1.18

Table 2: Experimental results on real-world industrial designs (HPWL in units of 10^9, runtime in seconds)

                                               Flora                      RePlAce
Benchmark  #cell    #net     #Movable  #Fixed  HPWL   Iteration  Runtime  HPWL   Iteration  Runtime
design1    134990   141016   134265    725     4.83   440        200      4.89   560        231
design2    234106   218056   233382    724     15.80  521        260      16.31  810        388
design3    82185    81737    81578     607     1.17   463        166      1.19   547        205
ratio      -        -        -         -       1      1          1        1.03   1.35       1.32
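The paper does not state how the ratio row is computed. One reading that reproduces the reported Table 2 values is the ratio of column totals, baseline over Flora, rounded to two decimals; the sketch below checks that reading against the Table 2 numbers. The variable and function names are hypothetical, and the same convention is not verified here for the other tables.

```python
# (HPWL in 10^9, iterations, runtime in s) per design, copied from Table 2
flora = {"design1": (4.83, 440, 200), "design2": (15.80, 521, 260), "design3": (1.17, 463, 166)}
replace = {"design1": (4.89, 560, 231), "design2": (16.31, 810, 388), "design3": (1.19, 547, 205)}

def ratio_of_totals(baseline, reference, column):
    """Sum one column over all designs for both flows and divide the totals."""
    base_total = sum(row[column] for row in baseline.values())
    ref_total = sum(row[column] for row in reference.values())
    return round(base_total / ref_total, 2)

hpwl_ratio = ratio_of_totals(replace, flora, 0)
iteration_ratio = ratio_of_totals(replace, flora, 1)
runtime_ratio = ratio_of_totals(replace, flora, 2)
```

Under this reading the three ratios come out as 1.03, 1.35 and 1.32, matching the last row of Table 2.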

5 EXPERIMENTAL RESULTS
This section evaluates the performance of Flora using academic benchmarks and real-world industrial designs.

5.1 Experiment Setting
Flora is implemented in Python with PyTorch. The model is trained on a Linux server with a 16-core Intel Xeon Gold 6226R @ 2.9GHz and an NVIDIA 2080 Ti GPU. We integrate Flora with two state-of-the-art placers, RePlAce [4] and DREAMPlace version 3.0 [7], as two integrated floorplan-placement physical optimization flows.

We evaluate the performance of Flora by comparing the two integrated floorplanning-placement flows, i.e., Flora+RePlAce and Flora+DREAMPlace, with RePlAce and DREAMPlace alone. We use the widely used ISPD2005 benchmark suite and three real-world industrial designs provided by IC Bench, Inc. The three industrial SoC designs contain 135k, 234k, and 82k blocks, and 725, 724, and 607 IOs, respectively (see Table 2). To evaluate the performance of the proposed floorplanning method, the locations of macros in ISPD2005 benchmarks and industrial designs are set to be movable. ISPD2005 benchmarks are in GSRC Bookshelf placement format [2]. Since the publicly available version of RePlAce does not support the Bookshelf format, we only consider the three industrial designs using RePlAce. For consistent comparison, both Flora+DREAMPlace and DREAMPlace alone run the CPU version. The following experiments are performed on a workstation with an Intel i7-7700 3.6GHz CPU and 16GB memory.

5.2 Performance
5.2.1 Comparison to the state-of-the-art mixed-size placers. Table 1 and Table 2 present the performance comparison results using ISPD2005 benchmarks and three industrial designs, respectively. As shown in Table 1, on ISPD2005 benchmarks, Flora+DREAMPlace delivers 33%, 18% and 2% improvement on average, in terms of the number of placement iterations, total runtime, and HPWL, compared against DREAMPlace alone. As shown in Table 2, on three industrial designs, Flora+RePlAce delivers 35%, 32%, and 3% improvement on average, in terms of the number of placement iterations, total runtime, and HPWL, compared against RePlAce alone.

These results demonstrate that Flora can efficiently produce a high-quality chip floorplan which, when used by downstream placers as the initial solution, can effectively reduce the optimization iterations the placers require. Figure 3 shows the floorplanning and placement results of ADAPTEC2. Figure 3 (a) is the floorplan generated by Flora, which is then used by DREAMPlace as the initial chip placement to produce the final placement (Figure 3 (b)). For comparison, Figure 3 (c) shows the placement result produced by DREAMPlace alone.

Figure 3: (a) floorplan by Flora. (b) placement by Flora+DREAMPlace. (c) placement by DREAMPlace.

To further investigate the efficacy of Flora, Figure 4 shows the HPWL and density metrics during the iterative placement optimization process of Flora+DREAMPlace and DREAMPlace alone using the ADAPTEC1 benchmark. As pointed out by ePlace-MS [11], for modern mixed-size placers, the initial solution is of vital importance to the quality of the final placement results. To this end, RePlAce and DREAMPlace both include an initial phase that first gathers blocks to the center of the chip, and then adjusts the initial locations of the blocks around the center region of the chip by applying a large weight to HPWL and a small weight to density. This process indeed improves the final placement quality, but significantly slows down the overall placement process. As shown in Figure 4, for ADAPTEC1, this initial phase accounts for approximately 25% of the total iterative placement process. On the other hand, Flora produces an optimized floorplan using highly efficient model inference. Using Flora, downstream placers can bypass this time-consuming initialization phase, and effectively reduce the overall iterations and runtime with high-quality results.


Figure 4: HPWL curve (a) and density curve (b) of ADAPTEC1 during the iterative placement process.

To further investigate the impact of the initial phase of mixed-size placers, we extend the experiments by setting all fixed IO cells to be movable. Table 3 compares the performance of Flora+DREAMPlace with DREAMPlace alone. As we can see, the placement quality of DREAMPlace significantly deteriorates, with a 37% increase in HPWL compared to Flora+DREAMPlace. This result implies that IOs with fixed locations, although they account for only a small percentage of the total number of blocks in each design (Table 1, Column 6), have a significant impact on the quality of the initial phase of mixed-size placers. On the other hand, Flora is insensitive to the initial IO locations, and consistently produces a high-quality chip floorplan.

Table 3: Experimental results with movable IOs (HPWL in units of 10^7, runtime in seconds)

           Flora           DREAMPlace
Benchmark  HPWL   Runtime  HPWL    Runtime
ADAPTEC1   6.90   186.11   8.53    262.33
ADAPTEC2   8.93   279.19   13.22   279.45
ADAPTEC3   14.82  354.26   16.14   449.01
ADAPTEC4   14.61  345.76   29.05   516.89
BIGBLUE1   8.43   213.33   12.74   281.43
BIGBLUE2   10.12  408.90   18.12   762.10
BIGBLUE3   26.65  988.12   28.31   1126.34
BIGBLUE4   77.28  994.71   106.71  1223.01
ratio      1      1        1.37    1.30

5.2.2 Evaluation of the proposed c-spectral clustering approach. Table 4 evaluates the performance of the proposed c-spectral clustering method compared against spectral clustering, which is known to be one of the most effective graph clustering methods. The quality of the clustering methods is measured by three metrics: time, edge cut and HPWL, where time represents the time (s) required for clustering, edge cut denotes the ratio of cut edges to the total number of edges, and HPWL represents the final HPWL (×10^7) after placement. As shown in Table 4, with similar HPWL after placement (within 0.4% difference on average), the proposed c-spectral clustering achieves over 7x speedup compared with spectral clustering.

Table 4: Comparison between the proposed c-spectral clustering and spectral clustering (time in seconds, HPWL in units of 10^7).

           Spectral clustering      C-spectral clustering
Benchmark  Time    Edge Cut  HPWL   Time   Edge Cut  HPWL
ADAPTEC1   15.56   0.05      6.44   4.62   0.06      6.46
ADAPTEC2   35.87   0.07      7.78   6.42   0.06      7.77
ADAPTEC3   55.99   0.16      15.64  10.91  0.18      15.65
ADAPTEC4   55.80   0.11      14.30  11.11  0.11      14.28
BIGBLUE1   21.18   0.07      8.49   6.73   0.08      8.51
BIGBLUE2   64.69   0.03      12.00  16.36  0.05      12.51
BIGBLUE4   317.95  0.21      74.70  21.92  0.22      74.76
ratio      7.51    0.95      0.996  1      1         1

6 CONCLUSION
This paper presents Flora, a GAT-based floorplanner. Flora is equipped with an efficient graph clustering algorithm and a graph attention network to learn an optimized mapping between circuit connectivity and physical wirelength, and to produce a high-quality chip floorplan via efficient model inference. Flora is trained using a synthetic dataset without the need for a large real-world circuit dataset. Experimental studies demonstrate that Flora can effectively improve the efficacy of downstream placement tasks.

ACKNOWLEDGMENTS
This research is supported by National Natural Science Foundation of China under Grant No. 62090025, National Natural Science Foundation of China under Grant No. 62141407 and the young scientist project of MOE innovation platform. We thank IC Bench, Inc. for providing industrial designs for experiments, and their efficient router for supporting our future work to extend the GNN-based model to target post-routing design parameters.

REFERENCES
[1] Andrew B. Kahng, Jens Lienig, Igor L. Markov, and Jin Hu. 2011. VLSI Physical Design: From Graph Partitioning to Timing Closure.
[2] Andrew E. Caldwell, Andrew B. Kahng, and Igor L. Markov. 2000. Toward CAD-IP Reuse: The MARCO GSRC Bookshelf of Fundamental CAD Algorithms. IEEE Design & Test (2000), 72–81.
[3] Chin-Chih Chang, J. Cong, and Min Xie. 2003. Optimality and scalability study of existing placement algorithms. In Proceedings of the ASP-DAC Asia and South Pacific Design Automation Conference, 2003. 621–627.
[4] C. Cheng, A. B. Kahng, I. Kang, and L. Wang. 2019. RePlAce: Advancing Solution Quality and Routability Validation in Global Placement. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 38, 9 (2019), 1717–1730.
[5] G. Golub and C. Van Loan. 1983. Matrix Computations. Johns Hopkins University Press, Baltimore.
[6] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.
[7] Jiaqi Gu, Zixuan Jiang, Yibo Lin, and David Z. Pan. 2020. DREAMPlace 3.0: Multi-Electrostatics Based Robust VLSI Placement with Region Constraints. In 2020 IEEE/ACM International Conference On Computer Aided Design (ICCAD). 1–9.
[8] Andrew B. Kahng. 2000. Classical Floorplanning Harmful?. In Proceedings of the 2000 International Symposium on Physical Design (ISPD ’00). Association for Computing Machinery, New York, NY, USA, 207–213.
[9] K. Kiyota and K. Fujiyoshi. 2000. Simulated annealing search through general structure floorplans using sequence-pair. In 2000 IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 3. 77–80.
[10] Jingwei Lu. 2010. Fundamental Research on Electronic Design Automation in VLSI Design - Routability.
[11] J. Lu, H. Zhuang, P. Chen, H. Chang, C. C. Chang, Y. C. Wong, L. Sha, D. Huang, Y. Luo, C. C. Teng, and C. K. Cheng. 2015. ePlace-MS: Electrostatics-Based Placement for Mixed-Size Circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 34, 5 (2015), 685–698.
[12] Igor L. Markov, Jin Hu, and Myung-Chul Kim. 2015. Progress and Challenges in VLSI Placement Research. Proc. IEEE 103, 11 (2015), 1985–2003.
[13] Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Azade Nazi, et al. 2021. A graph placement methodology for fast chip design. Nature 594, 7862 (2021), 207–212.
[14] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani. 1996. VLSI Module Placement Based on Rectangle-Packing by the Sequence-Pair. Trans. Comp.-Aided Des. Integ. Cir. Sys. 15, 12 (1996), 1518–1524.
[15] Yangfeng Su, Fan Yang, and Xuan Zeng. 2012. AMOR: An Efficient Aggregating Based Model Order Reduction Method for Many-Terminal Interconnect Circuits. In Proceedings of the 49th Annual Design Automation Conference. 295–300.
[16] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2018. Graph attention networks. In ICLR.
