
Applied Sciences
Article
A Deep Reinforcement Learning Floorplanning Algorithm Based
on Sequence Pairs †
Shenglu Yu 1,2, Shimin Du 1,* and Chang Yang 1,2

1 Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China;
2111082190@nbu.edu.cn (S.Y.); 2311170013@nbu.edu.cn (C.Y.)
2 College of Science & Technology, Ningbo University, Ningbo 315300, China
* Correspondence: dushimin@nbu.edu.cn
† This manuscript is an extended version of the conference paper titled Yu, S.; Du, S. VLSI Floorplanning
Algorithm Based on Reinforcement Learning with Obstacles. In Proceedings of the Biologically Inspired
Cognitive Architectures 2023—BICA 2023, Ningbo, China, 13–15 October 2023; Springer Nature: Cham,
Switzerland, 2023; pp. 1034–1043.

Abstract: In integrated circuit (IC) design, floorplanning is an important stage in obtaining the
floorplan of the circuit to be designed. Floorplanning determines the performance, size, yield, and
reliability of very large-scale integration (VLSI) circuits. The results obtained in this step are
necessary for the subsequent stages of chip design. From a computational perspective,
VLSI floorplanning is an NP-hard problem, making it difficult to solve efficiently with classical
optimization techniques. In this paper, we propose a deep reinforcement learning floorplanning
algorithm based on sequence pairs (SP) to address the placement problem. Reinforcement learning
utilizes an agent to explore the search space in sequence pairs to find the optimal solution. Experi-
mental results on the international standard test circuit benchmarks, MCNC and GSRC, demonstrate
that the proposed deep reinforcement learning floorplanning algorithm based on sequence pairs can
produce a superior solution.

Keywords: VLSI; floorplanning; sequence pair; deep reinforcement learning; MCNC; GSRC

1. Introduction
In recent years, the rapid advancement of integrated circuit technology has led to a
significant increase in the complexity of very large-scale integration (VLSI) circuits.
According to Moore's law [1], the number of transistors on a chip doubles every 18 months.
With the continuous advancement of semiconductor technology, the scale and complexity
of integrated circuits have been increasing. Faced with such a vast scale of chip design,
traditional manual design methods are no longer able to meet growing design demands.
Therefore, electronic design automation (EDA) [2] technology has become an indispensable
trend for the future. In the design and optimization process of VLSIs, physical design [3]
plays a crucial role as an essential part of the VLSI design flow, serving as both a key link
and the core of electronic design automation technology. The floorplanning phase [4], as a
critical part of the physical design flow, not only directly determines the area and overall
layout of the integrated circuit chip, influencing the subsequent routing work, but also
directly dictates the final performance of the entire circuit. The VLSI floorplanning problem,
being a classic NP-hard problem, has a significant impact on performance metrics such as
circuit delay, power consumption, congestion, and reliability [5]. Despite being a classical
problem [6] and the subject of previous algorithms, block placement continues to pose
significant challenges [7].
As the first stage of the physical design flow, the quality of floorplanning significantly
impacts the subsequent placement and routing. Generally, research on floorplanning can
be divided into two categories. One category is based on planar graph representations.

For floorplan graphs with slicing structures [8], binary trees are widely used, where leaves
correspond to blocks and internal nodes define the vertical or horizontal merge operations
of their respective descendants. For more general non-slicing floorplan representations, sev-
eral effective forms have been developed, including sequence pairs (SP) [9], the bounded
slicing grid (BSG) [10], O-trees [11], transitive closure graphs with packed sequences
(TCG-S) [12], and B*-trees [13]. Among these, the representation of block placement with
sequence pairs, which uses positive and negative sequences to represent the geometric
relationships between any two modules, has been extended in subsequent work to handle
obstacles [14], soft modules, rectilinear blocks, and analog floorplans [15–18]. The decod-
ing time complexity of the sequence pair representation is O(N²). In order to reduce the
decoding complexity, Tang et al. [19] utilized the longest common subsequence algorithm
to decrease the decoding complexity to O(N log N). Subsequently, Tang and Wong [20]
proposed an enhanced Fast Sequence Pair (FSP) algorithm, further reducing the decoding
time complexity to O(N log log N). Another category involves the study of floorplanning
algorithms. By employing suitable planar graph representations and/or efficient perturba-
tion methods, high-quality floorplans can be achieved through linear programming [21] or
some metaheuristic methods such as simulated annealing (SA) [22,23], genetic algorithms
(GA) [24,25], memetic algorithms (MA) [26], and ant colony optimization [27].
Despite decades of research on VLSI floorplanning problems, the existing studies
indicate that current EDA floorplan tools still struggle to achieve a floorplan close to opti-
mal. These tools continue to face numerous limitations, making it challenging to obtain
satisfactory design outcomes. Existing floorplan tools generally require long runtimes and
experienced experts to spend weeks designing integrated circuit floorplans. Furthermore,
these tools have limited scalability and often require a time-consuming redesign when
faced with new problems or different constraints. Reinforcement learning (RL) [28] pro-
vides a promising direction to address these challenges. Reinforcement learning possesses
autonomy and generalization capabilities, allowing the agent in reinforcement learning,
through interactions with the environment, to automatically extract knowledge about the
space it operates in. In addition to breakthroughs in gaming [29] and robot control [30],
reinforcement learning has been applied to solve combinatorial optimization problems.
Ref. [31] proposed deep reinforcement learning (DRL) for solving the Traveling Salesman
Problem (TSP). Moreover, significant progress has been made in the application of reinforce-
ment learning to task scheduling [32], vehicle routing problems [33], graph coloring [34],
and more. Recently, integrating reinforcement learning into electronic design automation
(EDA) has become a trend. For example, the Google team [35] formulated macro-module
placement as a reinforcement learning problem and trained an agent using reinforcement
learning algorithms to place macro-modules on chips. He et al. [36] utilized the Q-learning
algorithm to train an agent that selects the best neighboring solution at each search step.
Cheng et al. [37] introduced cooperative learning to address floorplan and routing problems
in chip design. Agnesina et al. [38] proposed a deep reinforcement learning method for
VLSI placement parameter optimization. Vashisht et al. [39] utilized iterative reinforcement
learning combined with simulated annealing to place modules. Xu et al. [40] employed
graph convolutional networks and reinforcement learning methods for floorplanning under
fixed-outline constraints.
This paper proposes a deep reinforcement learning-based floorplanning algorithm
utilizing sequence pairs for the floorplanning problem. The algorithm aims to optimize
the area and wirelength of the floorplan. To evaluate the effectiveness of our algorithm,
we conduct experiments on the internationally recognized benchmark circuits MCNC
and GSRC, comparing our approach with simulated annealing and the deep Q-learning
algorithm proposed by He et al. [36]. In terms of dead space on the MCNC benchmark
circuits, our algorithm outperforms simulated annealing and the literature [36] by an
average improvement of 2.7% and 1.1%, respectively. Additionally, concerning wirelength,
our algorithm shows an average improvement of 9.1% compared to simulated annealing.
On the GSRC benchmark circuits, our algorithm demonstrates an average improvement of
7.0% and 3.7% in dead space to simulated annealing and the literature [36], respectively.
Furthermore, for wirelength, our algorithm exhibits an average improvement of 8.8% over
simulated annealing. These results validate the superior performance and robustness of
our algorithm in handling ultra-large-scale circuit designs.

2. Description of Floorplanning Problem


Generally, floorplanning involves determining the relative positions of modules. Let
B = {bi |1 ≤ i ≤ n} be a set of rectangular modules, where each module bi has a specified
width wi and height hi . N = {ni |1 ≤ i ≤ m} represents a netlist that describes the connections
between modules. The goal of floorplanning is to assign a set of coordinates to each module
bi , while ensuring that no two modules overlap.
Let (xi , yi ) denote the coordinates of the bottom-left corner of module bi . The floorplan
area A is defined as the minimum rectangular area that encompasses all modules, and it
can be calculated as follows:

A = max_i(x_i + w_i) × max_i(y_i + h_i)    (1)

We employ the widely used Half-Perimeter Wirelength (HPWL) model [41] as the
method to estimate the total wirelength, which is defined as follows:
W = ∑_{i=1}^{m} ( max_{b_i,b_j ∈ n_i} |x_i − x_j| + max_{b_i,b_j ∈ n_i} |y_i − y_j| )    (2)

Based on the optimization objective defined by the minimum rectangle area A and the
wirelength W, the formulation is as follows:

F = αA + βW (3)

Here, F is the cost of a feasible floorplan, defined as the weighted sum of the
total area A and the total wirelength W. The coefficients α and β are weight factors ranging
from 0 to 1.
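To make the objective concrete, the following is a minimal Python sketch of Equations (1)-(3). It assumes each module is described by its bottom-left corner plus width and height and each net is a list of module indices; the names bounding_area, hpwl, and cost, the default weights, and the use of module centers as pin positions are illustrative assumptions rather than the authors' implementation.

from typing import Dict, List, Tuple

# A module is (x, y, w, h): bottom-left corner plus width and height (assumed layout).
Module = Tuple[float, float, float, float]

def bounding_area(modules: Dict[int, Module]) -> float:
    """Equation (1): area of the minimum rectangle enclosing all modules."""
    width = max(x + w for x, _, w, _ in modules.values())
    height = max(y + h for _, y, _, h in modules.values())
    return width * height

def hpwl(modules: Dict[int, Module], nets: List[List[int]]) -> float:
    """Equation (2): half-perimeter wirelength, using module centers as pin positions."""
    total = 0.0
    for net in nets:
        xs = [modules[i][0] + modules[i][2] / 2 for i in net]
        ys = [modules[i][1] + modules[i][3] / 2 for i in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def cost(modules: Dict[int, Module], nets: List[List[int]],
         alpha: float = 0.5, beta: float = 0.5) -> float:
    """Equation (3): F = alpha * A + beta * W."""
    return alpha * bounding_area(modules) + beta * hpwl(modules, nets)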

3. Sequence Pair Representation


Tamarana et al. [9] proposed a graph-encoding method called sequence pair (SP) for
encoding non-sliced planar graphs. Given a non-sliced planar graph with n modules,
a sequence pair consists of a positive sequence Γ+ and a negative sequence Γ−, which
contains all the information about which subsets of modules are located above, below,
to the right, and to the left of a given module. Through the analysis of different graph
representation methods, we believe that sequence pair has unique advantages compared to
other graph representations. Firstly, the sequence pair representation is concise and easy
to understand, making it highly suitable for integration with reinforcement learning to
jointly solve graph-planning problems. Secondly, it can represent the complete solution
space and has a one-to-one correspondence with non-sliced graphs, allowing for the unique
reconstruction of non-sliced graphs from it. Lastly, compared with other methods, the
introduction of the fast sequence pair technique significantly reduces the decoding time complexity.

3.1. Properties of Sequence Pair


Sequence pair is a method used to describe the relative order between sequences. In
each pair of sequences, each sequence consists of a set of module names, where the module
names are the same in the positive sequence Γ+ and the negative sequence Γ−, but their
order is inconsistent between Γ+ and Γ−. For example, in a given pair of sequences (Γ+, Γ−),
there are four possible positional relationships between any two modules, bi and bj:
(1) If bi is positioned before bj in Γ+, i.e., <....bi....bj....>, and bi is also positioned before bj
in Γ−, i.e., <....bi....bj....>, it indicates that bi is located on the left side of bj.
(2) If bj is positioned before bi in Γ+, i.e., <....bj....bi....>, and bj is also positioned before bi
in Γ−, i.e., <....bj....bi....>, it indicates that bi is located on the right side of bj.
(3) If bi is positioned before bj in Γ+, i.e., <....bi....bj....>, and bj is positioned before bi in Γ−,
i.e., <....bj....bi....>, it indicates that bi is located above bj.
(4) If bj is positioned before bi in Γ+, i.e., <....bj....bi....>, and bi is positioned before bj in Γ−,
i.e., <....bi....bj....>, it indicates that bi is located below bj.
As an example, Figure 1 shows an inclined grid representing the relative positions
between modules in a sequence pair (Γ+, Γ−) = (<4, 3, 1, 6, 2, 5>, <6, 3, 5, 4, 1, 2>).

Figure 1. (a) displays an inclined grid representing the relative positions between modules in a
sequence pair (Γ+, Γ−) = (<4, 3, 1, 6, 2, 5>, <6, 3, 5, 4, 1, 2>); (b) corresponds to the floorplan of
the sequence pair. Each module has the following dimensions: 1 (4 × 6), 2 (3 × 7), 3 (3 × 3), 4 (2 × 3),
5 (4 × 3), 6 (6 × 4).

From the figure, it can be observed that all modules satisfy the requirements of
sequence pairs. In fact, for any given sequence pair, the positions of each module can be
efficiently determined by calculating the weighted longest common subsequence (LCS).
The time complexity of this algorithm is O(n log(log n)).
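To make these relations concrete, the following Python sketch recovers bottom-left module coordinates from a sequence pair with a straightforward O(n²) longest-path evaluation over the implied horizontal and vertical constraints; the faster weighted-LCS method mentioned above is not shown. The function name and data layout are illustrative assumptions, not the authors' implementation.

from typing import Dict, List, Tuple

def evaluate_sequence_pair(
    gamma_plus: List[int],
    gamma_minus: List[int],
    width: Dict[int, float],
    height: Dict[int, float],
) -> Dict[int, Tuple[float, float]]:
    """Return bottom-left (x, y) coordinates for each module of a sequence pair.

    Uses the relations above: a is left of b if a precedes b in both sequences;
    a is below b if a follows b in Γ+ but precedes b in Γ−.
    """
    pos_plus = {m: i for i, m in enumerate(gamma_plus)}
    pos_minus = {m: i for i, m in enumerate(gamma_minus)}

    x: Dict[int, float] = {}
    # Process in Γ+ order so every module left of b is already placed.
    for b in gamma_plus:
        lefts = [a for a in gamma_plus
                 if pos_plus[a] < pos_plus[b] and pos_minus[a] < pos_minus[b]]
        x[b] = max((x[a] + width[a] for a in lefts), default=0.0)

    y: Dict[int, float] = {}
    # Process in Γ− order so every module below b is already placed.
    for b in gamma_minus:
        belows = [a for a in gamma_minus
                  if pos_plus[a] > pos_plus[b] and pos_minus[a] < pos_minus[b]]
        y[b] = max((y[a] + height[a] for a in belows), default=0.0)

    return {m: (x[m], y[m]) for m in gamma_plus}

# Example using the sequence pair of Figure 1 and the module sizes in its caption.
if __name__ == "__main__":
    w = {1: 4, 2: 3, 3: 3, 4: 2, 5: 4, 6: 6}
    h = {1: 6, 2: 7, 3: 3, 4: 3, 5: 3, 6: 4}
    coords = evaluate_sequence_pair([4, 3, 1, 6, 2, 5], [6, 3, 5, 4, 1, 2], w, h)
    print(coords)

Running this on the Figure 1 sequence pair yields non-overlapping bottom-left coordinates for the six modules.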

3.2. Sequence Pair Representation of the Floorplan
To obtain a floorplan from a sequence pair, we first construct a geometric constraint
graph corresponding to the sequence pair. This constraint graph consists of a set of edges
(E) and a set of vertices (V). The vertex set (V) contains a node for each module name, an
added source node S, and an added receiving node T. A horizontal constraint graph (HCG)
and a vertical constraint graph (VCG) can be constructed based on the positional
relationships of each module. The specific steps for construction are as follows:
(1) For a given module x in the sequence pair (Γ+, Γ−), we obtain the list of modules that
appear before x in both Γ+ and Γ−. These modules are positioned to the left of x in the
plane graph. The group of modules that appear after x in both Γ+ and Γ− are positioned
to the right of x in the plane graph. The group of modules that appear after x in Γ+ and
before x in Γ− are positioned below x in the plane graph. Finally, the group of modules
that appear before x in Γ+ and after x in Γ− are positioned above x in the plane graph.
(2) Next, we construct a directed graph for the horizontal constraint graph based on the
left and right relationships. A directed edge E(a, b) represents module a being positioned
to the left of module b. We add a source node S connected to all nodes in the horizontal
constraint graph, and we also add a receiving node T connected to all nodes. The longest
path length from the source node S to each node in the horizontal constraint graph
represents the x coordinate of the modules in the floorplan.
(3) By computing the longest path length from the source node S to the added receiving
node T, we can obtain the width of the floorplan. Similarly, we construct the vertical
constraint graph based on the above and below relationships, and calculate the y
coordinate of the modules and the height of the plane graph in a similar manner. Figure 2
illustrates the constructed horizontal and vertical constraint graphs for the sequence pair
(Γ+, Γ−) = (<4, 3, 1, 6, 2, 5>, <6, 3, 5, 4, 1, 2>) as an example.

Figure 2. (a) represents the horizontal constraint graph and (b) represents the vertical constraint graph.

So, by constructing horizontal and vertical constraint graphs and calculating the
longest path lengths for both directions, we can determine the width and height of the
minimum bounding rectangle of the floorplan. Subsequently, floorplanning is performed
for the pair of sequences, thereby determining the size and position of the non-slicing
plane graph.

4. Reinforcement Learning
Reinforcement learning is a machine learning approach aimed at learning how to
make decisions to achieve specific goals through interaction with the environment. In
reinforcement learning, an agent observes the state of the environment, selects appropriate
actions, and continuously optimizes its strategy based on feedback from the environment
regarding its actions. This feedback is typically provided in the form of rewards [42,43] or
penalties, and the agent's objective is to learn the optimal strategy by maximizing the
long-term cumulative reward.
Almost all reinforcement learning satisfies the framework of Markov Decision
Processes (MDPs). A typical MDP, as shown in Figure 3, consists of four key elements:
(1) States S: a finite set of environmental states.
(2) Actions A: a finite set of actions taken by the reinforcement learning agent.
(3) State transition model P(s, a, s′): representing the probability of transitioning from
state s ∈ S to the next state s′ ∈ S when action a ∈ A is taken.
(4) Reward function R(s, a): representing the numerical reward for taking action a ∈ A
in state s ∈ S. This reward can be positive, negative, or zero.
The goal of an MDP is to find a policy π that maximizes the total accumulated
numerical reward. The expression for the total cumulative reward is as follows:

R_t = ∑_t γ^t r_t    (4)

where γ represents the reward discount factor, t denotes the time step, and r represents the
reward value at time step t. The state value function V_π(s) in an MDP is defined as the
expected reward value of state s under policy π, as defined in Equation (5).

V_π(s) = E_π[R_t | s_t = s] = E_π[∑_t γ^t r_t | s_t = s]    (5)

Figure 3. A typical framework for MDPs.
In this context, E_π represents the expected value of the reward function under policy
π. Similarly, the state–action value function Q_π(s, a) is the expected reward value when
action a is taken in state s under policy π, defined as follows:

Q_π(s, a) = E_π[R_t | s_t = s, a_t = a] = E_π[∑_t γ^t r_t | s_t = s, a_t = a]    (6)

4.1. The MDP Framework for Solving Floorplanning Problems
In floorplanning problems, the agent in reinforcement learning interacts with the
environment by selecting a perturbation to iteratively generate new floorplan solutions.
The objective is to minimize the total area and total wirelength, which serve as rewards
to encourage the agent to learn better strategies and ultimately find an optimal floorplan
solution. To explore better floorplan solutions, the following MDP is defined:
(1) State space S: for the floorplanning problem, a state s ∈ S represents a floorplan
solution, including a complete sequence pair (Γ+, Γ−) and the orientation of each module.
(2) Action space A: a neighboring solution of a floorplan is generated by predefined
perturbations in the action space. The following five perturbations are defined:
(a) Swap any two modules in the Γ+ sequence.
(b) Swap any two modules in the Γ− sequence.
(c) Swap one module from the Γ+ sequence with one module from the Γ− sequence.
(d) Randomly move a module to a new position in both Γ+ and Γ−.
(e) Rotate any module in the sequence pair by 90°.
(3) State transition P: given a state, applying any of the above perturbations will result
in the agent transitioning to another state, simplifying the probabilistic setting in the MDP.
(4) Reward R: allocating rewards for actions taken in a state is crucial in reinforcement
learning. In this floorplanning problem, the objective is to minimize the area and
wirelength. Thus, the reward is assigned as the reduction in the objective cost. A
positive reward is assigned whenever the agent discovers a better solution, while no
reward is assigned otherwise. The reward function is defined as follows:

r = { F(s) − F(s′),  F(s′) < F(s)
      0,             F(s′) ≥ F(s)     (7)
The reward r in this context refers to a local reward, representing the reward value ob-
tained when the current floorplan transitions from state s to state s′ through a perturbation.
Here, F represents the optimization objective function defined in Equation (3).
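As an illustration, the following Python sketch shows how the five perturbations and the local reward of Equation (7) could be realized on a sequence pair state. The State layout is an assumption for illustration, move (c) is interpreted here as exchanging the chosen pair of modules in both sequences, and the cost F is assumed to be computed separately from Equation (3); none of this is the authors' implementation.

import random
from typing import Dict, List, Tuple

# A state is (gamma_plus, gamma_minus, rotated), where rotated[m] is True if
# module m has been rotated by 90 degrees (assumed representation).
State = Tuple[List[int], List[int], Dict[int, bool]]

def perturb(state: State, action: int) -> State:
    """Apply one of the five moves (a)-(e) and return a neighboring state."""
    gp, gm, rot = list(state[0]), list(state[1]), dict(state[2])
    a, b = random.sample(gp, 2)              # two distinct module ids
    if action == 0:                          # (a) swap two modules in the Γ+ sequence
        i, j = gp.index(a), gp.index(b)
        gp[i], gp[j] = gp[j], gp[i]
    elif action == 1:                        # (b) swap two modules in the Γ− sequence
        i, j = gm.index(a), gm.index(b)
        gm[i], gm[j] = gm[j], gm[i]
    elif action == 2:                        # (c) swap a pair of modules, here in both sequences
        for seq in (gp, gm):
            i, j = seq.index(a), seq.index(b)
            seq[i], seq[j] = seq[j], seq[i]
    elif action == 3:                        # (d) move one module to a new position in both
        for seq in (gp, gm):
            seq.remove(a)
            seq.insert(random.randrange(len(seq) + 1), a)
    else:                                    # (e) rotate one module by 90 degrees
        rot[a] = not rot[a]
    return gp, gm, rot

def reward(cost_old: float, cost_new: float) -> float:
    """Equation (7): the cost reduction when the new floorplan is better, otherwise zero."""
    return cost_old - cost_new if cost_new < cost_old else 0.0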

4.2. Deep Reinforcement Learning Algorithm


After defining the MDP framework for solving the floorplanning problem, we choose
to utilize the deep reinforcement learning algorithm to train the agent. In this paper, we
employ the policy gradient (PG) algorithm based on Actor–Critic (AC) architecture, which
is a model-free policy-based algorithm. In comparison to value-based methods, policy-
based methods ensure a faster convergence. The gradient method utilizes gradient descent
to optimize the policy π. Specifically, the parameter θ is associated with the policy π that is
to be optimized. We define an objective function J(θ), as illustrated in Equation (8), where
the objective function J(θ) represents the target of obtaining the expected discounted total
reward by following the parameterized policy π. Hence, our objective is to learn a θ that
maximizes the function J(θ).

J(θ) = E_{π_θ}[∑_t γ^t r_t]    (8)
Reinforcement learning algorithms are a type of Monte Carlo method, allowing them
to learn from a sequence of episodes or a series of steps taken by an agent during its
exploration, without prior knowledge of the transition function in the MDP framework. The
parameter vector θ is learned through a deep neural network. This network is referred to as
the policy gradient network, and its weights represent the θ parameters. The policy network,
or agent, is a deep neural network with a set of hidden layers. The activation function of the
last layer of this network is “Softmax”, as the network’s output is a probability distribution
of environmental actions. The policy gradient algorithm updates the parameters θ in the
direction of actions with the highest rewards. The weights are then updated using the
computed gradient, defined as follows:

θ ← θ + α∇_θ J(θ)    (9)

α represents the learning rate. By taking the derivative of Formula (8), Formula (10) is
derived, which will be utilized to update the values of the θ parameters. The definition of
this formula is as follows:

∇_θ J(θ) = E_π[∇_θ(log π(τ|θ)) R(τ)]    (10)

This expectation is taken over trajectories τ obtained by sampling the policy π_θ.
R(τ) represents the reward accumulated over a single episode.
To train the policy network, the deep reinforcement learning algorithm is employed
as shown in Algorithm 1. At each step of the episode, the policy network predicts a
probability distribution over the different actions available in the environment given
the state description. The state, action, reward, and next state of each step in an episode are
recorded. The set of discounted rewards is used to calculate the gradient and update
the weights of the policy network.
Algorithm 1: Deep Reinforcement Learning Algorithm
Input: number of episodes, number of steps
Output: Policy π
1: Initialize θ (policy network weights) randomly
2: for e in episodes do
3: for s in steps do
4: Perform an action as predicted by the policy network
5: Record s, a, r, s′
6: Calculate the gradient as per Equation (10)
7: end
8: Update θ as per Equation (9)
9: end
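For reference, the following PyTorch sketch shows one way the Softmax policy network, the discounted return of Equation (4), and the update of Equations (9) and (10) in Algorithm 1 could be realized. The network width, the state encoding, and the env interface (reset/step returning an Equation (7) reward) are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Maps a state encoding to a probability distribution over the perturbations."""
    def __init__(self, state_dim: int, n_actions: int = 5, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions), nn.Softmax(dim=-1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def discounted_returns(rewards, gamma: float = 0.99) -> torch.Tensor:
    """Compute the discounted return of Equation (4) for one recorded episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return torch.tensor(list(reversed(returns)), dtype=torch.float32)

def train_episode(policy, optimizer, env, steps: int = 50, gamma: float = 0.99):
    """One episode of Algorithm 1: sample actions, record rewards, apply Equation (9)."""
    log_probs, rewards = [], []
    state = env.reset()                                # assumed environment interface
    for _ in range(steps):
        probs = policy(torch.as_tensor(state, dtype=torch.float32))
        dist = torch.distributions.Categorical(probs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward = env.step(action.item())        # perturbation + Equation (7) reward
        rewards.append(reward)
    returns = discounted_returns(rewards, gamma)
    loss = -(torch.stack(log_probs) * returns).sum()   # gradient ascent on J(θ), Equation (10)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Consistent with the hyperparameters in Table 1, the optimizer could be torch.optim.Adam with a learning rate of 3 × 10^-4, a discount factor of 0.99, and 50-step episodes.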
After numerous experiments for parameter tuning and optimization, the hyperparameter
settings of the algorithm in this paper are presented in Table 1.

Table 1. Reinforcement learning hyperparameters.

discount factor γ 0.99


learning rate αθ 3 × 10−4
episode length 50
number of iterations 100
number of samples 64

5. Experimental Results
5.1. Experimental Environment and Test Data
The experiment was conducted on a computer with a 12th Gen Intel(R) Core(TM)
(Intel(R), Santa Clara, CA, USA) i5-12500 3.00 GHz CPU and 16.00 GB of RAM. The
algorithm we proposed was implemented in Python 3.8 using the PyTorch library [44], with
the Adam optimizer [45] applied to train the neural network model. During training, to
save time and prevent overfitting, we employed a stopping mechanism where the training
process would halt if no better solution was found in the final 50 steps. For the simulated
annealing algorithm, we adjusted its parameters through multiple experiments and selected
the best parameter set based on the floorplan results.
Our algorithm was tested on two standard test circuit sets, MCNC and GSRC, and
compared with the simulated annealing algorithm and the deep Q-learning algorithm
proposed by He et al. [36]. The MCNC benchmark comprises five hard-module circuits and five
soft-module circuits, while the GSRC benchmark consists of six hard-module circuits and six soft-module circuits.
Hard modules allow rotation but cannot change their shape, whereas soft modules provide
area and aspect ratio, allowing multiple shapes. In this experiment, our test circuits
consisted of fixed-size hard modules. The basic information about the test circuits in
the MCNC and GSRC test circuit sets is shown in Tables 2 and 3, respectively.

Table 2. Basic information about MCNC test circuit.

MCNC Blocks I/O Pad Nets Area (×106 )


apte 9 73 97 46.56
xerox 10 2 203 19.35
hp 11 309 83 8.83
ami33 33 42 123 1.16
ami49 49 22 408 35.45

Table 3. Basic information about GSRC test circuit.

GSRC Blocks I/O Pad Nets Area


n100 100 334 885 179,501
n200 200 564 1585 175,696
n300 300 569 1893 273,170

5.2. Experimental Results and Analysis


We conducted experiments using a reinforcement learning-based VLSI floorplanning
algorithm on two internationally standardized test circuit sets, MCNC and GSRC. The
experimental results were compared with the deep Q-learning algorithm and simulated an-
nealing algorithm presented in Ref. [36]. Additionally, floorplan visualization experiments
were performed. Building upon these experiments, we extended our study by incorpo-
rating an additional experiment. The proposed algorithm and the simulated annealing
algorithm were tested on the MCNC and GSRC standard test circuit sets. Three blocks,
namely RAM, ROM, and CPU, were selected from each test circuit and placed as fixed
modules in predetermined locations. Subsequently, the placement results of our algorithm


were compared with those of the simulated annealing algorithm, followed by floorplan
visualization experiments.

5.2.1. Experimental Results of MCNC and GSRC Test Sets


First, a synergistic optimization of area and wirelength was performed on the MCNC
test set, and the experimental results are presented in Table 4. The dead space (DS) repre-
sents the proportion of gaps between macro-modules within the floorplan border to the
total floorplan area. As shown in the table, the proposed algorithm generated a floorplan
in the MCNC test circuits with a smaller area, DS, and wirelength compared to the simu-
lated annealing algorithm. In terms of layout area and DS, the proposed algorithm also
outperformed the approach described in Reference [36]. However, a direct comparison of
wirelength was not made due to the different estimation method used in Reference [36].
Regarding DS and wirelength, the proposed algorithm achieved an average improve-
ment of 2.7% and 9.1%, respectively, compared to the simulated annealing algorithm.
Furthermore, compared to Reference [36], the proposed algorithm achieved an average
improvement of 1.1% in DS. Therefore, the proposed algorithm demonstrates certain ad-
vantages over the simulated annealing algorithm and Reference [36] in area optimization
for small-scale circuits, as well as outperforming the simulated annealing algorithm in
wirelength optimization.

Table 4. Comparison of experimental results of MCNC test circuit.

MCNC | Area (×10^6): Ours / SA / Ref. [36] | DS (%): Ours / SA / Ref. [36] | Wirelength (×10^5): Ours / SA
apte 46.94 47.65 47.08 0.81 2.29 1.10 0.45 0.51
xerox 20.40 20.91 20.42 5.15 7.46 5.24 0.49 0.55
hp 9.20 9.42 9.21 4.02 6.26 4.13 0.32 0.47
ami33 1.22 1.27 1.24 4.92 8.66 6.45 0.61 0.75
ami49 37.31 38.86 38.65 4.99 8.78 8.28 6.33 6.65
Average 23.01 23.62 23.32 3.98 6.68 5.04 1.64 1.79
Normalization 1.000 1.027 1.014 1.000 1.027 1.011 1.000 1.091

Table 5 presents the experimental results comparison for three GSRC test circuits,
revealing that the proposed algorithm outperforms the simulated annealing algorithm
and Reference [36] in terms of floorplan area and DS and is also superior to the simulated
annealing algorithm in wirelength. In comparison with the simulated annealing algorithm,
the proposed algorithm achieves an average improvement of 7.0% in DS and 8.8% in
wirelength. Furthermore, when compared to Reference [36], the proposed algorithm
obtains an average improvement of 3.7% in DS. From both tables, it can be observed
that, as the size of the test circuits increases, the DS for all three methods also increases,
indicating an increased difficulty in floorplan placement. However, the proposed algorithm
demonstrates a further improvement in performance compared to the simulated annealing
algorithm and Reference [36] for large-scale circuits, offering more pronounced advantages
in floorplan area and wirelength optimization.

Table 5. Comparison of experimental results of GSRC test circuit.

GSRC | Area (×10^5): Ours / SA / Ref. [36] | DS (%): Ours / SA / Ref. [36] | Wirelength (×10^5): Ours / SA
n100 1.93 1.99 1.95 6.99 9.91 7.95 1.87 1.99
n200 1.99 2.25 2.15 11.71 21.91 18.14 3.37 3.62
n300 3.25 3.58 3.40 15.95 23.70 19.71 4.95 5.49
Average 2.39 2.61 2.50 11.55 18.51 15.27 3.40 3.70
Normalization 1.000 1.092 1.050 1.000 1.070 1.037 1.000 1.088
5.2.2. Experimental Results of MCNC and GSRC Test Sets with Obstacles
The optimization of area and wirelength for MCNC benchmark circuits was con-
ducted, and the experimental results are presented in Table 6. Since the fixed module
placement constraint was employed, the proposed algorithm is only compared with the
simulated annealing algorithm in this context. It can be observed from the table that, for
all five test circuits, the proposed algorithm achieves a smaller floorplan area, DS, and
wirelength than the simulated annealing algorithm. The proposed algorithm demonstrates
an average improvement of 9.2% in wirelength and 3.4% in DS compared to the simulated
annealing algorithm. Therefore, regarding the MCNC benchmark circuits under the fixed
placement constraint of three modules, the proposed algorithm exhibits more significant
advantages in optimizing the floorplan area and wirelength compared to the simulated
annealing algorithm.

Table 6. Comparison of experimental results of MCNC test circuits with obstacles.

MCNC | Area (×10^6): Ours / SA | Wirelength (×10^5): Ours / SA | DS (%): Ours / SA
apte 47.82 48.73 0.52 0.61 2.63 4.45
xerox 20.75 21.59 0.57 0.68 6.75 10.38
hp 9.26 9.52 0.41 0.55 4.66 7.25
ami33 1.27 1.32 0.68 0.81 8.66 12.12
ami49 38.95 41.52 6.47 6.79 8.99 14.62
Average 23.46 24.51 1.73 1.89 6.34 9.76
Normalization 1.000 1.045 1.000 1.092 1.000 1.034

Table 7 compares the experimental results of three GSRC benchmark circuits, under
the constraint of fixed module placement. It can be observed that the proposed algorithm
in this paper outperforms the simulated annealing algorithm in terms of planar floorplan
area, wirelength, and DS. Compared to the simulated annealing algorithm, the proposed
algorithm achieves an average improvement of 11.2% in wirelength and 8.5% in DS. From
Tables 5 and 6, it can be seen that, as the size of the test circuits increases, both methods
experience an increase in wirelength and DS, making the floorplan and routing more chal-
lenging. However, under the constraint of pre-placing three fixed modules, the proposed
algorithm in this paper demonstrates superior performance compared to the simulated an-
nealing algorithm in large-scale circuit testing. Lastly, comparing the experimental results
of pre-placing fixed modules in the MCNC and GSRC benchmark circuits, we can observe
that the former outperforms the latter, indicating that pre-placing fixed modules makes the
floorplanning problem more complex and challenging.

Table 7. Comparison of experimental results of GSRC test circuits with obstacles.

GSRC | Area (×10^5): Ours / SA | Wirelength (×10^5): Ours / SA | DS (%): Ours / SA
n100 1.98 2.23 1.97 2.21 9.09 19.28
n200 2.15 2.37 3.46 3.87 18.14 25.74
n300 3.42 3.78 5.03 5.66 20.18 27.78
Average 2.52 2.79 3.49 3.91 15.80 24.27
Normalization 1.000 1.107 1.000 1.112 1.000 1.085

5.3. Floorplan Visualization


5.3.1. Visualization of MCNC and GSRC Circuit Floorplan
A comparison of MCNC circuit floorplan results is shown in Figure 4. The DS of
the floorplan generated by the algorithm in this paper is only 5.0%, while the simulated
annealing algorithm produces a floorplan with a DS of 8.8%. A comparison of GSRC circuit
floorplan results is shown in Figure 5. The DS of the floorplan generated by the algorithm
in this paper for the n100 test circuit is 7.0%, which is lower than the 9.9% DS achieved
by the simulated annealing algorithm. From the figures, it is evident that the floorplans
generated by the algorithm in this paper have a lower DS and overall floorplan areas than
the floorplans generated by the simulated annealing algorithm, both in the small-scale and
large-scale test circuits in both benchmark sets. This confirms the superior performance of
the algorithm proposed in this paper.

Figure 4. The floorplans generated by the algorithm proposed in this paper (a) and the simulated
annealing algorithm (b) for the ami49 test circuit.

Figure 5. The floorplan generated by the algorithm proposed in this paper (a) and the simulated
annealing algorithm (b) for the n100 test circuit.

5.3.2. Visualization of MCNC and GSRC Circuit Floorplan with Obstacles
A comparison of MCNC circuit floorplan results with obstacles is shown in Figure 6.
The DS of the planar floorplan generated by the algorithm in this paper is only 9.0%, while
the DS of the simulated annealing algorithm is 14.6%. A comparison of GSRC circuit
floorplan results with obstacles is shown in Figure 7. The DS of the floorplan generated by
the algorithm in this paper is only 9.1%, whereas the DS of the simulated annealing
algorithm is 19.3%. This demonstrates the effectiveness of the algorithm proposed in this paper.

Figure 6. The floorplans of the ami49 test circuit generated by the algorithm proposed in this paper
(a) and the simulated annealing algorithm (b).

Figure 7. The floorplans of the n100 test circuit generated by the algorithm proposed in this paper (a)
and the simulated annealing algorithm (b).

6. Conclusions
In this paper, we investigate the floorplanning problem in the integrated circuit design flow and propose a sequence pair-based deep reinforcement learning floorplanning algorithm. Experimental results on the MCNC and GSRC benchmark circuit sets demonstrate that our algorithm outperforms the deep Q-learning algorithm and the simulated annealing algorithm in terms of both DS and wirelength. Moreover, as the circuit size increases and the difficulty of floorplanning and routing grows, the advantages of our algorithm become more pronounced. In recent years, machine-learning-based methods have been increasingly applied in the EDA field; the algorithm in this paper, however, still has some limitations, most notably its long optimization time. In future work, we aim to explore novel approaches within deep learning, such as graph neural networks, to address floorplanning problems. Such an integration may enhance the intelligence and precision of the algorithm, thereby significantly improving the quality of the floorplan optimization results.
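As an illustration of the sequence-pair representation the agent searches over, the following is a minimal sketch of how one candidate pair can be decoded into block coordinates under the standard semantics: if block a precedes block b in both sequences, a lies to the left of b; if a follows b in the first sequence but precedes it in the second, a lies below b. The block names and sizes are hypothetical, the decoding uses a straightforward O(n^2) sweep rather than the faster longest-common-subsequence evaluation, and it is not the exact implementation used in this paper.

```python
# Minimal sketch of sequence-pair decoding under the standard semantics.
# Block names and sizes are hypothetical; a production evaluator would
# typically use the faster LCS-based formulation instead of this O(n^2) sweep.

def decode_sequence_pair(gamma_plus, gamma_minus, sizes):
    """Return {block: (x, y)} lower-left coordinates for a sequence pair."""
    pos_plus = {b: i for i, b in enumerate(gamma_plus)}
    coords = {}
    # Sweeping in gamma_minus order guarantees that every block constraining
    # the current one (left-of or below) has already been placed.
    for j, b in enumerate(gamma_minus):
        x = y = 0
        for a in gamma_minus[:j]:
            ax, ay = coords[a]
            if pos_plus[a] < pos_plus[b]:   # a precedes b in both -> left of b
                x = max(x, ax + sizes[a][0])
            else:                           # a follows b in gamma_plus -> below b
                y = max(y, ay + sizes[a][1])
        coords[b] = (x, y)
    return coords

sizes = {"m1": (3, 2), "m2": (2, 3), "m3": (2, 2)}   # (width, height)
coords = decode_sequence_pair(["m1", "m2", "m3"], ["m2", "m1", "m3"], sizes)
width = max(x + sizes[b][0] for b, (x, _) in coords.items())
height = max(y + sizes[b][1] for b, (_, y) in coords.items())
print(coords, width, height)  # bounding box from which area and DS follow
```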

Author Contributions: Conceptualization, S.Y.; methodology, S.Y.; software, S.Y.; validation, C.Y.;
formal analysis, C.Y.; investigation, S.D.; data curation, S.D.; writing—original draft preparation, S.Y.;
writing—review and editing, S.D.; visualization, S.Y.; project administration, S.D. All authors have
read and agreed to the published version of the manuscript.
Funding: This work was financially supported by the National Natural Science Foundation of
China (grant no. 61871244, 61874078, 62134002), the Fundamental Research Funds for the Provincial
Universities of Zhejiang (grant no. SJLY2020015), the S&T Plan of Ningbo Science and Technology
Department (grant no. 202002N3134), and the K. C. Wong Magna Fund in Ningbo University.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author. The data are not publicly available due to further study.
Conflicts of Interest: The authors declare no conflict of interest.


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
