Transactions on Design Automation of Electronic Systems

A PUS Based Nets Weighting Mechanism for Power, Hold, and Setup Timing Optimization

Journal: Transactions on Design Automation of Electronic Systems
Manuscript ID: Draft
Manuscript Type: Paper
Date Submitted by the Author: n/a
Complete List of Authors: CHENTOUF, Mohamed; Siemens PLM Software, Siemens EDA; L'Ecole Nationale Supérieure d'Informatique et d'Analyse des Systèmes, University Mohammed V
Alaoui Ismaili, Zine El Abidine; L'Ecole Nationale Supérieure d'Informatique et d'Analyse des Systèmes, University Mohammed V
Computing Classification Systems: Application-specific integrated circuits, Timing Driven Placement, hold timing optimization, Predictive Useful-Skew, physical design
A PUS Based Nets Weighting Mechanism for Power, Hold, and Setup Timing Optimization

Mohamed Chentouf (1, 2) (Corresponding author) (mohamed_chentouf@mentor.com), Zine El Abidine Alaoui Ismaili (2) (z.alaoui@um5s.net.ma)
(1) Mentor, a Siemens Business/CSD Calypto Division, Rabat, 10010, Morocco
(2) Information, Communication and Embedded Systems (ICES) Team, University Mohammed V, Rabat, 10010, Morocco
Abstract— Power consumption has become a major constraint in VLSI design. A considerable power increase is usually seen during the hold closure step of the physical design, performed at the post-CTS and post-route stages. Hold optimization is performed by applying circuit-level changes such as buffer insertion, cell sizing, useful-skew, or cell movement. Moving the hold fixing problem to the pre-CTS stage represents a big opportunity for power saving and design closure improvement. In this paper, we present a novel power, hold, and setup driven placement algorithm. The objective is to reduce not only the setup but also the hold violations while keeping the power consumption under control. This objective is achieved by changing the weighting mechanism of a commercial Power and Timing Driven Placement (PTDP) engine to include power, hold, and electrical Design Rule Constraints (eDRC) in the weighting equation. The new weights drive the placer to place the cells that are on setup-critical paths or connected to high-power nets close to each other, and relax the weight of the cells that are on hold-critical paths so that the placer may place them far from each other. As a consequence, critical setup, power, or eDRC nets are shortened to reduce the delay, and critical hold nets are elongated to add delay, which improves the overall placement Quality of Results (QoR). This approach was deployed on 40 industrial designs of different customers, sizes, technologies, and complexities and showed very good improvement, not only in timing (setup and hold) and power consumption but also in total area and design routability. The timing gain is about 15% and 13% in TNS and THS, respectively. The total power gain is about 9%, distributed as 7% in leakage power and 9% in dynamic power.

Index Terms— Application-specific integrated circuits (ASIC), Timing Driven Placement (TDP), hold timing optimization, setup timing optimization, Predictive Useful-Skew (PUS), static timing analysis, electrical design rule constraints, electronic design automation, physical design, global routing, power optimization, Total Hold Slack (THS), Worst Hold Slack (WHS), Total Negative Slack (TNS), Worst Negative Slack (WNS), Clock Skew.

I. INTRODUCTION

Net weighting is a technique that has been extensively studied in recent decades; it is used to drive the placer to produce different results depending on the objective to minimize. It was originally used to reduce the Total Wire-Length (TWL). With technology scaling, the complexity of designs has increased considerably and the interconnect delay has exceeded the cell delay. Thus, designers and EDA providers have used timing information to calculate the net weights and drive the placer to be timing aware (Timing Driven Placement, TDP) [1].

Many algorithms were proposed to improve the placement quality in terms of timing, routability, area, or power. Kong T. proposed a new weighting algorithm that takes into consideration the number of critical paths that share a common segment and assigns a higher weight to the edges of this common segment [2]. Another approach to the timing closure problem was proposed by Papa D. et al., who showed that a linear-wire-delay model is sufficient to model the impact of buffering in the placement stage, and then developed RUMBLE, a linear-programming-based TDP that includes buffering for slack-optimal placement [3]. Wang Q. et al. used another approach, improving the placement timing by iteratively optimizing the timing-critical sub-circuit with Linear Programming and timing-driven legalization [4].

With the increase in design complexity and QoR requirements, new parameters such as power and routability were introduced in the weight calculation. [5] proposed a weighting formulation that includes the power metric to improve the traditional TDP. In [6], we proposed an enhancement that prioritizes nets in a non-linear formulation and improves the design routability. Our previous work did not take into account the hold timing requirement, due to the ideal nature of the clock at this early stage of the design implementation. However, hold closure after Clock Tree Synthesis (CTS) causes a big power increase due to the inserted delay elements. Taking the hold requirement into the weighting formulation is therefore an opportunity to further reduce the power consumption and to improve the design closure.

In this paper, we propose a new linear programming (LP) formulation that includes the hold parameter in the weight calculation based on the predictive useful-skew methodology. In [7], Chan et al. showed that applying useful-skew at the pre-CTS stage improves the timing correlation between the pre-CTS and post-CTS stages. Thus, we use the predictive useful-skew (PUS) to perform the STA (Static Timing Analysis) and estimate the hold timing at the pre-CTS stage; we then include the estimated hold in the weight calculation formula to relax the constraints on nets that are on hold-critical paths and to drive the TDP to be hold-aware. The main contributions of this work are summarized as follows:
1. A novel net weighting calculation formula that includes the hold factor, besides the setup and power parameters.

2. A flow that integrates the new formula in the placement stage and measures its benefits at the post-CTS stage of the PnR flow.

The remainder of this paper is organized as follows: Section II gives a global overview of TDP and its related work. It also gives the state of the art of the predictive useful-skew application in modern VLSI designs. Section III describes the new weighting equations that calculate the net weights based on their power, setup, and hold timing characteristics, and explains in detail how the new weighting mechanism is integrated in the Place and Route (PnR) flow. Section IV illustrates the approach on a simple case-study design, and Section V presents the results achieved with this new approach on industrial designs, especially the power, area, and timing gains. Finally, the last section concludes and draws the perspectives of our study.

II. OVERVIEW OF TIMING DRIVEN PLACEMENT AND USEFUL-SKEW PREDICTION

A. Placement Overview

To overcome the placement challenges, many approaches were developed to simplify the task and make it less computationally intensive, in order to produce a good solution for designs with multi-millions of objects (gates, pins, nets, macros) in a reasonable runtime. Historically, placement algorithms can be roughly grouped into four classes: partition-based placement [8] [9] [10], quadratic placement [11] [12], simulated-annealing-based placement [13] [14], and nonlinear placement [9].

Initially, EDA placement algorithms were timing-driven to maximize circuit performance. Net weighting was a known technique used to drive the placement to minimize a specific objective based on the weight formulation. It has two forms of implementation: static net weighting and dynamic net weighting.

Static net weighting computes the weights of nets once, at the beginning of placement. The disadvantage of this approach is that timing and wire length change during and after the placement, which makes the timing picture obsolete. Dynamic net weighting was proposed as a solution that updates the net weights along with the placement progression, based on the current state of the design.

Usually, there are two passes of TDP in a standard PnR flow: the first one is net-based and the second is path-based. The net-based approach prioritizes nets based on their timing criticality to implicitly optimize the delay of critical paths. The path-based placement overcomes the shortcoming of the net-based algorithm to further improve the weight objective, and considers all paths or a subset of the most critical paths simultaneously in its placement model [15] [16] [17]. In this context, Tsay R. et al. implemented an analytic weighting mechanism that transforms the timing information into net weights and compiles a weighted wire-length minimization engine [18]; the results were significant in terms of runtime and timing. Dunlop A. et al. proposed an iterative update of the net weights with a continuous model to improve the placement convergence and to overcome the limitation of static net weighting [39].

In the last decade, the IC market focus has shifted from circuit speed only to circuit speed-power efficiency. More parameters have been added to the placement formulation to reduce the power consumption early in the design process.

In general, the sources of power dissipation in an IC are divided into three broad categories: switching power, short-circuit power, and leakage power [19]. The leakage power is the energy consumed due to the leakage current in MOS technologies. The short-circuit power is the energy consumed due to the short-circuit current that flows during the transition time of the MOS transistors, while the switching (dynamic) power is the power needed to perform the circuit computations by charging and discharging all parasitic capacitances of the design. Much research and development has been carried out to include power reduction in the placement stage. In [20], Obermier et al. introduced the power density into the placement formulation, which led to a flat temperature distribution and a good power and heat reduction. In [5], Cheon T. et al. proposed a register clustering technique that reduces the clock power dissipation by including the power in the weight equation to shorten high-power nets, and achieved a gain of 11.4% in power with a minor timing degradation of 2% and an area overhead of 1.2%.

B. Useful-skew Overview

Two decades ago, skew minimization was of high importance in the physical implementation flow. The skew was considered a limiting factor of the maximum operating frequency. This is based on the well-known inequality (1), used to deduce the minimum clock period [21], [22]:

$$\text{clock period} \geq t_{d} + t_{skew} + t_{su} + t_{ds} \qquad (1)$$

where t_d is the longest data path delay, t_skew is the clock skew, t_su is the setup time, and t_ds is the propagation delay through the synchronous elements.

So, to maximize the circuit speed, designers strive to minimize the skew between different leaves of the clock tree. Many algorithms were developed to reduce the clock skew at different flow stages.
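As a small numerical illustration of inequality (1), the Python sketch below evaluates the minimum clock period and the corresponding maximum frequency. The helper names and delay values are hypothetical, chosen only to show that, in this zero-skew view, any skew adds directly to the clock period.

```python
def min_clock_period(t_d: float, t_skew: float, t_su: float, t_ds: float) -> float:
    """Lower bound on the clock period from inequality (1); all values in ns.

    t_d    -- longest data path delay
    t_skew -- clock skew (treated, as in (1), as a pure penalty)
    t_su   -- setup time of the capturing element
    t_ds   -- propagation delay through the synchronous element
    """
    return t_d + t_skew + t_su + t_ds


def max_frequency_mhz(period_ns: float) -> float:
    # f = 1 / T; with T in ns, 1000 / T gives the frequency in MHz
    return 1000.0 / period_ns


# Hypothetical delays, not taken from the designs studied in this paper
t_zero = min_clock_period(t_d=1.5, t_skew=0.0, t_su=0.125, t_ds=0.375)    # 2.0 ns
t_skewed = min_clock_period(t_d=1.5, t_skew=0.5, t_su=0.125, t_ds=0.375)  # 2.5 ns
print(max_frequency_mhz(t_zero), max_frequency_mhz(t_skewed))  # 500.0 400.0
```

Here 0.5 ns of skew lowers the achievable frequency from 500 MHz to 400 MHz, which is why skew minimization was long treated as a primary clock tree objective.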
Placement: Skew minimization was considered from the placement stage. In [23] and [24], sequential cell sites are mapped to predefined locations of a template clock tree in the middle of the quadratic placement. [25] proposed a modified scheme that performs iterative placement modification based on [26] and skew optimization such as in [27] to produce a skew-aware placement.

CTS: F. Niu et al. proposed an obstacle-aware zero-skew clock tree synthesis flow that consists of two steps: the first step generates the topology of the clock tree, then an Obstacle-aware Deferred Merge Embedding (ODME) algorithm is applied to complete the clock tree routing [40].

Routing: Several works have applied wire sizing to reduce clock skew [28][29][30]. Guthaus et al. [31] proposed sequential linear programming as well as quadratic-programming-based clock buffer/wire sizing to minimize clock skew. Shu et al. [32] performed wire sizing for skew minimization.

Shift from Zero Skew to Useful-skew: Zero-skew clock tree synthesis was an active field of research, but more recently it has been shown that "exact zero-skew" comes at the cost of increased power consumption and wire length. Friedman et al. pointed out that the "zero-skew" objective is over-constraining. They proposed a linear programming formulation that combines clock scheduling and clock tree topology synthesis. The proposed algorithm achieved a performance improvement of above 60% on a variety of industrial and benchmark designs [33]. This has led to a paradigm shift from skew minimization to useful-skew optimization, and useful-skew has become a technique for timing and power optimization, where the clock latencies of FFs are skewed intentionally to increase the clock frequency and the timing margins of the design [34] [13].

C. Placement and Useful-skew Combination

Recently, some placement optimization techniques were introduced after clock tree synthesis to improve the early (hold) slack while preserving an optimized late (setup) slack. In [35], Huang et al. proposed placement modifications (in-place optimizations) that predict the optimal Steiner tree topology after each move, and then optimize the clock tree with a clock tree reconnection mechanism. The main limitations of this approach are its focus on the placement optimization of FFs only, and the number of clock tree modifications, which could impact the design closure negatively, especially in complex SoCs. To overcome this limitation, we take the early slack optimization to the pre-CTS stage based on the predictive useful-skew timing information, and we combine [35], [6], and [7] to generate a (setup, hold, and power)-aware placement and to give the CTS engine a fully formulated problem, so as to get the maximum benefit from it.

Fig. 1. Reference Flow. [Flowchart: start with a fully placed and legalized design database → run the Predictive Useful Skew (PUS) → run the CTS to realize the estimated PUS → optimize the design to correct the setup and hold timing → QoR assessment]

Fig. 2. New PUS Driven Flow. [Flowchart: start with a fully placed and legalized design database → run the Predictive Useful Skew (PUS) → net weighting and incremental placement based on PUS (PUS-based net weighting: apply dynamic net weighting and incremental global placement, then legalization and global route reparation) → run the CTS to realize the estimated PUS → optimize the design to correct the setup and hold timing → QoR assessment]
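The interaction between useful skew and hold timing that motivates this work can be illustrated with the standard single-path setup and hold checks. The sketch below is a textbook static-timing model (no clock uncertainty, derating, or multi-corner analysis), not the PUS engine used in this work; the `Path` fields and all delay values are hypothetical. It shows that delaying the capture clock by a useful-skew offset gains setup slack and loses exactly the same amount of hold slack.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Hypothetical register-to-register path; all times in ns."""
    t_cq: float    # clock-to-Q delay of the launching flip-flop
    d_max: float   # longest combinational delay (setup check)
    d_min: float   # shortest combinational delay (hold check)
    t_su: float    # setup time of the capturing flip-flop
    t_hold: float  # hold time of the capturing flip-flop


def setup_slack(p: Path, period: float, lat_launch: float, lat_capture: float) -> float:
    # Late check: data must arrive before the next capture edge minus setup time
    required = period + lat_capture - p.t_su
    arrival = lat_launch + p.t_cq + p.d_max
    return required - arrival


def hold_slack(p: Path, lat_launch: float, lat_capture: float) -> float:
    # Early check: data must not arrive before the same capture edge plus hold time
    arrival = lat_launch + p.t_cq + p.d_min
    required = lat_capture + p.t_hold
    return arrival - required


p = Path(t_cq=0.25, d_max=2.0, d_min=0.25, t_su=0.125, t_hold=0.125)
for offset in (0.0, 0.25, 0.5):  # useful-skew offset added to the capture clock latency
    print(offset,
          setup_slack(p, period=2.25, lat_launch=0.0, lat_capture=offset),
          hold_slack(p, lat_launch=0.0, lat_capture=offset))
# 0.0 -0.125 0.375
# 0.25 0.125 0.125
# 0.5 0.375 -0.125
```

With these numbers, an offset of 0.25 ns repairs the setup violation while keeping a positive hold slack, whereas 0.5 ns turns the path into a hold violation; estimating this effect before CTS is what allows hold to be considered at placement time.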
III. OUR APPROACH FOR NETS WEIGHTING CALCULATION

As shown in Section II, traditional PnR flows optimize the hold timing after the CTS stage and use useful-skew scheduling to drive the clock tree synthesis engine to realize the previously computed offsets instead of targeting a zero-skew clock tree. In this section, we use the predictive useful-skew to calculate the net weights and to perform an iterative (setup, hold, and power)-driven incremental placement.

A. New Hold Aware Incremental Placement Flow

Usually, hold fixing is performed at the post-CTS stage, where the clocks are fully propagated. The main technique for hold fixing is the insertion of delay elements to slow down the data signal, which comes with a non-negligible power consumption cost.

By treating the hold fixing problem from the placement stage, we reduce the number of buffers inserted during the post-CTS hold optimization, which is very beneficial for power and area reduction as well as for routability. The outlines of the reference flow and of our new flow are illustrated in Fig. 1 and Fig. 2, respectively.

Our reference flow is a traditional flow that uses the predictive useful-skew and drives the clock tree synthesis engine to fully exploit the potential of useful-skew. In the new flow, a step is added to perform dynamic net weighting and incremental global placement based on the predictive useful-skew timing information. The weighting calculation formula from [6] is modified to include the hold factor in addition to the setup, power, and eDRC parameters. A pass of legalization and global routing repair is then performed to clean all illegal cells and repair the routing, so that the routing congestion estimate is reliable. The CTS engine is called afterward to realize the previously calculated offsets, followed by a pass of timing optimization to correct the remaining violations.

B. New Nets-Weighting Algorithm

The weighting algorithm is based on a new formulation that combines setup, hold, eDRC, and power information to drive the incremental placement. The algorithm first estimates the power of each data net and assigns it a power-based weight (w_p) projected into the interval [0, 100], as given in (2):

$$w_p = n_p \cdot \frac{100}{\mathrm{Max}(nets_{power})} \qquad (2)$$

where w_p is the power-based weight, n_p is the net power, and Max(nets_power) is the maximum net power over all data nets.

In [6], no normalization was applied during the power and timing weight calculations, which makes the net criticality very technology-dependent. It was noticed that for some designs the weighting is dominated by the timing factor, while in other designs, where the frequency is high or the technology is below 16nm, the weighting is dominated by the power factor. Applying normalization helps to standardize the weighting process and provides more control over the net criticality.

The next step in the process is to calculate a timing-based weight (w_t), which combines the setup and hold timing slacks (Algorithm 1). The algorithm first calculates the setup and hold timing criticality of the net based on the slacks and the number of critical paths going through it: net_setup is the sum of the negative setup slacks of the violated setup timing paths traversing the net and, similarly, net_hold is the sum of the negative hold slacks of the violated hold timing paths traversing the net.

The timing-based weight is then calculated depending on whether the net is setup or hold critical. A prioritization parameter α controls the ratio of each factor in the final timing-based weight and provides a knob for the setup-hold trade-off. Since net_setup and net_hold are never positive, setup-critical nets receive a positive weight, which pulls their cells together, while hold-critical nets receive a negative weight, which relaxes them.

After calculating the timing-based weight, an eDRC-based weight (w_drc) is calculated using the max transition violation (mtv) of the sink cells and the max capacitance violation (mcv) of the driver cell (Algorithm 2). The transition-based factor (net_mtv) is chosen to be the maximum of all fan-out cell mtvs, while the capacitance-based factor (net_mcv) is the fan-in cell mcv. Both factors are combined to generate the eDRC-based weight.

Algorithm 1: Timing based weight formula
1: for net ∈ design data nets do
2:   net_setup = Σ_{i=0..k} negative_setup(path_i)
3:   net_hold = Σ_{i=0..k} negative_hold(path_i)
4:   if net_setup < 0 and net_hold < 0 then
5:     w_t = (1 − α) · net_hold − α · net_setup
6:   else
7:     if net_setup ≥ 0 and net_hold < 0 then
8:       w_t = (1 − α) · net_hold
9:     else
10:      if net_setup < 0 and net_hold ≥ 0 then
11:        w_t = −α · net_setup
12:      else
13:        w_t = 0
14:      end if
15:    end if
16:  end if
17: end for
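The power-based weight (2) and the timing-based weight of Algorithm 1 can be sketched in Python as follows. The sketch assumes that the setup and hold slacks of the timing paths traversing each net are already available from the PUS-based STA (plain lists of floats here); the function names and inputs are hypothetical and do not reflect the Nitro-SoC API. It keeps the sign convention of Algorithm 1: setup-critical nets get a positive weight and hold-critical nets a negative, relaxing one.

```python
from typing import Dict, List


def power_weight(net_power: Dict[str, float]) -> Dict[str, float]:
    """Equation (2): project each net power into [0, 100] relative to the maximum."""
    max_power = max(net_power.values())
    if max_power <= 0.0:
        return {net: 0.0 for net in net_power}
    return {net: p * 100.0 / max_power for net, p in net_power.items()}


def timing_weight(setup_slacks: List[float], hold_slacks: List[float], alpha: float) -> float:
    """Algorithm 1 for one net, given the slacks of the paths traversing it."""
    net_setup = sum((s for s in setup_slacks if s < 0.0), 0.0)  # negative setup slacks only
    net_hold = sum((s for s in hold_slacks if s < 0.0), 0.0)    # negative hold slacks only

    if net_setup < 0.0 and net_hold < 0.0:
        # Both critical: the setup term raises the weight, the hold term lowers it
        return (1.0 - alpha) * net_hold - alpha * net_setup
    if net_setup >= 0.0 and net_hold < 0.0:
        return (1.0 - alpha) * net_hold   # hold only: negative weight relaxes the net
    if net_setup < 0.0 and net_hold >= 0.0:
        return -alpha * net_setup         # setup only: positive weight shortens the net
    return 0.0


print(power_weight({"n1": 2.0, "n2": 0.5}))        # {'n1': 100.0, 'n2': 25.0}
print(timing_weight([-0.5, -0.25, 0.25], [], 0.5))  # 0.375
print(timing_weight([0.25], [-0.25], 0.5))          # -0.125
print(timing_weight([-0.5], [-0.25], 0.5))          # 0.125
```

Computing the weight per net keeps the sketch close to the loop body of Algorithm 1; a design-level pass would simply apply `timing_weight` to every data net before the normalization step.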
Algorithm 2: eDRC based weight formula
1: for net ∈ design data nets do
2:   net_mtv = Max(max_transition violations of fan-out cells)
3:   net_mcv = max_capacitance violation of the fan-in cell
4:   if net_mtv < 0 and net_mcv < 0 then
5:     w_drc = −(net_mtv + net_mcv · Max(nets_mtv) / Max(nets_mcv))
6:   else
7:     if net_mtv < 0 and net_mcv > 0 then
8:       w_drc = −net_mtv
9:     else
10:      if net_mtv > 0 and net_mcv < 0 then
11:        w_drc = −(net_mcv · Max(nets_mtv) / Max(nets_mcv))
12:      else
13:        w_drc = 0
14:      end if
15:    end if
16:  end if
17: end for

A pass of normalization is done after calculating both timing-related factors to project the w_t and w_drc parameters into the interval [0, 100]. After normalization, all parameters are combined along with a fan-out factor to calculate the net placement weight (3). A new prioritization parameter (β) controls the ratio of the power weight in the final weight and provides a knob to trade off timing and power objectives in the placer. The fan-out number is used to reduce the weight of high fan-out nets: a high weight on a high fan-out net would lead the placer to put all the cells connected to that net in the same bin, which can cause severe congestion and legalization problems and, consequently, a non-routable design.

$$w_{net} = \frac{1}{\text{Fan-out}} \cdot e^{\beta\,(w_t + w_{drc}) + (1 - \beta)\,w_p} \qquad (3)$$

Finally, a new pass of normalization is performed to project all placement weight values into the interval [0, 1000].

Thousands of experiments were carried out to define the best values of α and β, but it was observed that these values are very design-dependent and that no values give the best power performance for all designs, although, in the context of a single design, the α and β knobs could be adjusted to reach the optimum for that specific case. The following combination of dynamic net weighting and incremental placement was proposed to overcome this obstacle (Fig. 3). It starts by recording the original placement and QoR. Then, random values between 0 and 1 are assigned to the prioritization parameters α and β.

Fig. 3. Net weighting and Incremental Placement based on PUS. [Flowchart: record the placement coordinates of each cell and the design QoR → set random values for α and β → net weighting + incremental placement → legalization and global route reparation → if the QoR improves, accept the changes and record the new placement and QoR; otherwise, revert the placement changes → repeat while iteration < max iteration (default 10); otherwise, exit]

The net weighting (3) is applied to drive the incremental global placement to improve the hold and setup timing and the power consumption, followed by legalization and global routing calls to legalize the placement and repair the routing topology. To maximize the benefit of the new weighting approach, no displacement constraints are applied on the movable cells during the incremental global placement; this allows cells on the hold-critical paths to be moved by the distances necessary to meet or reduce the hold violations. On the other hand, the legalization engine of Nitro-SoC is designed to limit the displacement, with a default maximum allowable displacement of 10 rows. So, after the global placement, no cell is moved by more than 10 rows during legalization, which helps to preserve the results achieved by the global placement engine. Congestion is handled in the core placement engine by introducing the fan-out parameter into the weighting equation and by calling the Nitro-SoC congestion-driven global router. Also, since the hold parameter is included in the placement, fewer buffers and inverters are required to close the hold timing at the post-CTS stage, which reduces the area, the number of nets, and the number of pins, and further improves the congestion metric.

A QoR assessment is done afterward to decide whether to accept the placement changes: the TNS, THS, and power are combined to calculate a placement score. This operation is repeated multiple times with different α and β values to improve the QoR iteratively. It was seen through multiple experiments that the values of α and β are very design-dependent and that no specific values give the optimum for all designs. The randomization and acceptance/reverting process proved to be a good solution, given the runtime/QoR trade-off. The outcome of this flow is a new placement with an improved or similar QoR. The QoR improvements achieved by this approach are due to the introduction of the hold parameter in the placement formulation and to the multiple incremental placement iterations, which improve the convergence after each accepted iteration.
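The remaining steps of Section III (Algorithm 2, the [0-100] normalization, equation (3) with the final [0-1000] projection, and the randomized accept/revert loop of Fig. 3) can be sketched as below. This is an illustration under stated assumptions, not the Nitro-SoC implementation: the text does not say how w_t and w_drc are projected into [0-100], so min-max scaling is assumed; the final [0-1000] projection is assumed to divide by the maximum weight, as in (2); and placement, legalization, and QoR scoring are abstracted into a hypothetical `run_iteration` callback that returns a score where higher is better.

```python
import math
import random
from typing import Callable, Dict, List, Tuple


def edrc_weight(net_mtv: float, net_mcv: float, max_mtv: float, max_mcv: float) -> float:
    """Algorithm 2 for one net. Violations are negative; max_mtv and max_mcv are the
    design-wide maxima (ASSUMED to be magnitudes) that rescale mcv into mtv units."""
    scale = max_mtv / max_mcv if max_mcv != 0.0 else 0.0
    if net_mtv < 0.0 and net_mcv < 0.0:
        return -(net_mtv + net_mcv * scale)
    if net_mtv < 0.0 and net_mcv > 0.0:
        return -net_mtv
    if net_mtv > 0.0 and net_mcv < 0.0:
        return -(net_mcv * scale)
    return 0.0


def minmax_0_100(values: Dict[str, float]) -> Dict[str, float]:
    """ASSUMED projection of a signed weight (w_t or w_drc) into [0, 100]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {net: 0.0 for net in values}
    return {net: 100.0 * (v - lo) / (hi - lo) for net, v in values.items()}


def net_weights(wt: Dict[str, float], wdrc: Dict[str, float], wp: Dict[str, float],
                fanout: Dict[str, int], beta: float) -> Dict[str, float]:
    """Equation (3), then the final projection into [0, 1000] (ASSUMED: divide by the max)."""
    raw = {net: math.exp(beta * (wt[net] + wdrc[net]) + (1.0 - beta) * wp[net])
                / max(fanout[net], 1)
           for net in wt}
    top = max(raw.values())
    return {net: w / top * 1000.0 for net, w in raw.items()}


def randomized_weighting(run_iteration: Callable[[float, float], float],
                         initial_score: float, max_iter: int = 10,
                         seed: int = 0) -> Tuple[float, List[Tuple[float, float, float, bool]]]:
    """Fig. 3 loop: draw alpha and beta in [0, 1), keep a trial only if its QoR score improves.

    run_iteration is HYPOTHETICAL: it stands for net weighting + incremental placement +
    legalization/global-route repair + scoring. Reverting is modelled only by not updating
    best_score; a real flow would restore the recorded cell coordinates.
    """
    rng = random.Random(seed)
    best_score = initial_score
    history = []
    for _ in range(max_iter):
        alpha, beta = rng.random(), rng.random()
        score = run_iteration(alpha, beta)
        accepted = score > best_score
        if accepted:
            best_score = score
        history.append((alpha, beta, score, accepted))
    return best_score, history


# Toy stand-in for the TNS/THS/power placement score (higher is better).
def toy_iteration(alpha: float, beta: float) -> float:
    return -((alpha - 0.3) ** 2) - ((beta - 0.7) ** 2)


best, history = randomized_weighting(toy_iteration, initial_score=-1.0, seed=1)
print(edrc_weight(-0.5, -0.25, 1.0, 0.5))            # 1.0
print(minmax_0_100({"a": -1.0, "b": 0.0, "c": 1.0}))  # {'a': 0.0, 'b': 50.0, 'c': 100.0}
print(best, sum(1 for entry in history if entry[3]))
```

Seeding the random generator makes a run reproducible, which is convenient when comparing flows; the acceptance rule only ever keeps a placement whose score is strictly better, so the returned score can never be worse than the starting QoR.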
IV. CASE STUDY – NEW VS DEFAULT PLACEMENT OF A SIMPLE DESIGN

In this section, we use a simple design of around 7k standard cells to show and explain the benefits of our placement flow (Fig. 2) compared to the default flow (Fig. 1). The starting point of both flows is a placed and legalized design from [6], which means that the additional gains achieved by the new approach are due to the introduction of PUS in the weighting mechanism. In [6], power, timing, and fan-out factors are already used for the weight calculation.

The original placement, shown in Fig. 4a, is a well-spread placement with good congestion (Fig. 4b) and is easily routable. The routing difficulty is reflected by the colors in the congestion map: blue means that the design is easily routable, green means routable, yellow means hardly routable, and red means unroutable.

Fig. 4a. Std cells placement after the TDP. Fig. 4b. Design congestion map after the TDP.

Running the reference flow on the case-study design resulted in a post-CTS implementation with a distributed clock network and no major changes in the overall placement picture (Figs. 5a and 5b). Some congestion increase is noticed around the macro corner due to the cells added during the CTS and post-CTS optimizations.

Fig. 5a. Std cells placement after the CTS and Post-CTS. Fig. 5b. Design congestion map after the CTS and Post-CTS.

The new algorithm optimizes the placement based on the predicted useful-skew before balancing the clock network and optimizing the setup and hold timings. It was evaluated on the same testcase to generate a post-CTS implementation (Fig. 6a and Fig. 6b). The design clearly went through strong placement modifications to improve the setup and hold timings, but since the congestion is monitored during each pass, the design is still routable. Although the green zones have increased in the new flow, the yellow zone is smaller than the one seen with the default flow, because the congestion feedback loop automatically reduced the cell density around the macro corner. This means that the design generated with the new flow is more easily routable than the one generated with the default flow.

Fig. 6a. Std cells placement after the PUS driven placement, CTS and Post-CTS. Fig. 6b. Design congestion map after the PUS driven placement, CTS and Post-CTS.

It can be noticed that the new flow has grouped the cells into several clusters after several placement iterations based on the predicted useful-skew mechanism. Running the incremental placement with different α and β values while monitoring the timing and congestion has helped to shorten critical setup timing paths and to insert useful-skew delays in the clock network to balance the setup and hold timings without degrading the congestion. Also, fewer buffers and inverters are needed to close the hold timing at the post-CTS stage, which reduced the area, the number of nets, and the number of pins, and further improved the congestion.
The default flow has spread the cells over the design in order to achieve the best congestion-timing trade-off in one pass, which resulted in more data cells, more clock cells, and a longer wire length.

Using the PUS-based net weighting mechanism in this small design has helped to reduce the TWL (Total Wire Length) by 21% for data nets and 30% for clock nets. The number of buffers/inverters used for CTS and post-CTS optimizations has decreased by 7%, which resulted in a power gain of 6% in leakage power and 5% in dynamic power. This power gain was achieved without compromising the design timing: since the useful skew was used for setup- and hold-driven placement adjustment in addition to the post-CTS timing optimization, a gain of 2% in TNS and 4% in THS was realized.

In the next section, we run the flow on multiple industrial designs of different technologies to generalize the study.

V. EXPERIMENTAL RESULTS

To evaluate the effectiveness of the proposed flow, we implemented the algorithm in the TCL programming language and integrated it into the Nitro Reference Flow, as shown in Fig. 2. Our initial databases are generated using Nitro-SoC's default PnR flow (NRF) with the net weighting mechanism presented in [6]. The baseline results are generated by running the CTS and post-CTS steps on the initial database, as shown in Fig. 1, while the new results are generated with a modified version of the same flow that includes our new weighting mechanism, the incremental placement iterations algorithm, and the same CTS and post-CTS flows as in the baseline (Fig. 2). Since the starting point for each testcase is the database generated by the flow presented in [6], and all the aspects and engines of the flow are the same except the weighting and the incremental placement, this setup allows us to isolate the benefits of using PUS in the weight calculation.

Both flows were executed on a set of 40 designs (Table I) with different characteristics. The benchmark circuits contain approximately 23k to 2.7M standard cells, include designs from 180nm down to 7nm, and have operating frequencies ranging from 10MHz to 1GHz.

Figs. 7, 8, 9, 10, 11, 12, and 13 summarize the QoR achievements for each metric. The following results are presented in the gain figures: (1) WNS (worst setup negative slack), (2) TNS (total setup negative slack), (3) WHS (worst hold negative slack), (4) THS (total hold negative slack), and (5) eDRC (electrical Design Rule Constraints).

As shown in Fig. 7, the hold timing gain achieved with the new approach is 15% and 13% in WHS and THS, respectively. This shows the benefit of taking hold into account at the placement stage and producing a hold-aware placement.

Another benefit of treating the hold at the placement stage is the reduction in area and utilization, since fewer delay elements are needed to correct the hold timing: the new flow uses 14% fewer buffers/inverters than the baseline (Fig. 10). This reduction represents an average 5% reduction in the designs' utilization (Fig. 10).

Reducing the utilization has given white space for the setup and eDRC optimizations to perform more circuit transformations: the setup timing has improved by 26% for WNS and 15% for TNS (Fig. 8), while the max capacitance and max transition violations are reduced by 33% and 41%, respectively (Fig. 12). This improvement is also partially due to the multiple placement improvement iterations performed before the CTS.

The clock metrics are intentionally not reported here, since we do not expect any reduction in the clock repeaters, latencies, skews, or clock wire length. We noticed that there is no specific pattern or correlation between the clock metrics and the applied weights. This behavior is expected, since these metrics depend mainly on the placement and on the clock offsets calculated by the PUS engine. Our objective is to converge the setup and hold timings with less total power consumption, which includes all the clock elements. The wire length (data and clock) reduction of 13% (Fig. 11), along with the achieved area reductions, has yielded an average total power reduction of 9% (9% average dynamic power reduction and 7% leakage power reduction), as shown in Fig. 9.

It can be noted that TC3 achieved a very good hold gain but with a negative setup timing impact. This is because hold was the dominant timing factor (the design has a hold convergence issue): the net weights were relaxed, which drove the placer to spread the cells more in order to reduce the hold violations, but resulted in more setup violations.

Design  Modes  Corners  Clocks  Max Freq (MHz)  Cells (k)  Macros  Tech (nm)
TC1     3      4        35      213             375        11      14
TC2     2      4        2       200             67         0       14
TC3     2      3        11      941             316        56      28
TC4     2      5        7       251             1077       238     28
TC5     3      8        12      641             439        203     28
TC6     2      3        4       300             682        6       7
TC7     2      4        2       200             157        0       28
TC8     5      4        1051    833             1628       235     90
TC9     2      12       67      500             2346       182     180
TC10    1      4        5       602             293        32      90
TC11    2      4        70      671             565        0       14
TC12    3      8        12      500             1069       26      28
TC13    3      7        158     556             1546       47      7
TC14    2      4        2       250             130        0       7
TC15    2      4        2       250             127        0       7
TC16    3      8        4       765             1261       0       7
TC17    2      4        1       556             53         0       7
TC18    1      2        24      752             201        60      180
TC19    3      14       29      267             1203       101     28
TC20    1      4        9       313             1268       64      90
TC21    9      5        80      200             593        41      28
TC22    2      4        6       1000            632        445     28
TC23    3      14       105     200             665        124     28
56 7
57
58
59
60
1
TC24 1 2 2 500 286 8 180
2 TC25 1 5 5 11 139 40 180 Setup Timing Gain (%)
3 TC26 2 1 3 1000 225 0 28
4 TC27 3 5 4 1000 183 0 28 130%
TC28 5 3 500 826 693 168 7
5 TC29 2 2 3 952 79 32 28
6 TC30 3 6 25 200 1219 173 28
80%
7 TC31 3 6 11 265 1430 165 28

TC32 2 18 41 645 2700 149 28
8 30%
Gain (%)
TC33 2 1 4 10 23 0 180
9 TC34 3 5 8 500 824 82 28
TC1
TC2
TC3
TC4
TC5
TC6
TC7
TC8
TC9
TC10
TC11
TC12
TC13
TC14
TC15
TC16
TC17
TC18
TC19
TC20
TC22
TC23
TC24
TC25
TC26
TC27
TC28
TC29
TC30
TC31
TC32
TC33
TC34
TC35
TC36
TC37
TC38
TC39
TC40
10 TC35 1 2 6 200 105 44 180 -20%
TC36 1 2 3 500 103 4 90
11 TC37 2 18 3 833 710 40 7
12 TC38 2 3 7 940 750 164 28
-70%
13 TC39 3 10 58 333 729 87 28

TC40 1 4 5 909 1220 75 28
14 -120%
Table I: Designs Characteristics
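As a quick consistency check on the benchmark description, the ranges quoted for cell count, frequency, and technology node can be recomputed from Table I. The sketch below is our own Python illustration, not part of the authors' flow; the values are transcribed from Table I, and the helper name `column_range` is hypothetical.

```python
"""Recompute the benchmark ranges quoted in Section V from Table I.

Illustrative sketch only: the values are transcribed from Table I, and
the helper `column_range` is a hypothetical name, not part of the flow.
"""

# design -> (max frequency in MHz, standard cells in thousands, technology node in nm)
TABLE_I = {
    "TC1": (213, 375, 14), "TC2": (200, 67, 14), "TC3": (941, 316, 28), "TC4": (251, 1077, 28),
    "TC5": (641, 439, 28), "TC6": (300, 682, 7), "TC7": (200, 157, 28), "TC8": (833, 1628, 90),
    "TC9": (500, 2346, 180), "TC10": (602, 293, 90), "TC11": (671, 565, 14), "TC12": (500, 1069, 28),
    "TC13": (556, 1546, 7), "TC14": (250, 130, 7), "TC15": (250, 127, 7), "TC16": (765, 1261, 7),
    "TC17": (556, 53, 7), "TC18": (752, 201, 180), "TC19": (267, 1203, 28), "TC20": (313, 1268, 90),
    "TC21": (200, 593, 28), "TC22": (1000, 632, 28), "TC23": (200, 665, 28), "TC24": (500, 286, 180),
    "TC25": (11, 139, 180), "TC26": (1000, 225, 28), "TC27": (1000, 183, 28), "TC28": (826, 693, 7),
    "TC29": (952, 79, 28), "TC30": (200, 1219, 28), "TC31": (265, 1430, 28), "TC32": (645, 2700, 28),
    "TC33": (10, 23, 180), "TC34": (500, 824, 28), "TC35": (200, 105, 180), "TC36": (500, 103, 90),
    "TC37": (833, 710, 7), "TC38": (940, 750, 28), "TC39": (333, 729, 28), "TC40": (909, 1220, 28),
}


def column_range(index: int) -> tuple:
    """Return (min, max) of one Table I column across all designs."""
    values = [row[index] for row in TABLE_I.values()]
    return min(values), max(values)


freq_range = column_range(0)   # MHz
cells_range = column_range(1)  # thousands of standard cells
node_range = column_range(2)   # nm

if __name__ == "__main__":
    print(f"designs: {len(TABLE_I)}")
    print(f"max frequency: {freq_range[0]} MHz to {freq_range[1]} MHz")
    print(f"cells: {cells_range[0]}k to {cells_range[1] / 1000:.1f}M")
    print(f"technology: {node_range[1]} nm down to {node_range[0]} nm")
```

Running it reports 40 designs, 10 MHz to 1000 MHz, 23k to 2.7M cells, and 180 nm down to 7 nm, which matches the ranges stated in the text.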
The reduction in the number of cells resulted in good timing and power reductions. These results were achieved at the expense of a 41% runtime increase (baseline CTS runtime vs. feature incremental placement plus CTS runtime), due to the multiple iterations of incremental placement performed in the feature runs before the CTS stage. As Fig. 13 shows, the runtime improved in some designs (TC20 and TC26). This is mainly due to the good timing improvement in the incremental placement stage, which helped speed up the optimization process since it has to work on a simpler problem than in the default flow. The runtime could be reduced further by multi-threading the algorithm implementation or by distributing the incremental placement calls across different machines: for each combination of α and β, a thread can be started to evaluate the placement impact, and the good values can be recorded and reapplied. The runtime is thus reduced at the expense of additional computational resources. Further runtime reduction could be achieved by moving the flow from the TCL level to C++, which would also allow an apples-to-apples comparison with the reference flow.

Fig. 8. Setup Timing Gain (%). [Bar chart per testcase; legend: WNS Gain (%), average gain 26%; TNS Gain (%), average gain 15%.]

Fig. 9. Power Consumption Gains (%). [Bar chart per testcase; legend: Leakage Power Gain (%), average gain 7%; Dynamic Power Gain (%), average gain 9%; Total Power Gain (%), average gain 9%.]
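The parallel exploration of α and β proposed in the runtime discussion can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: `evaluate_placement` is a hypothetical stand-in for one incremental placement run (a toy cost function here, so the sketch executes), and we assume each run is summarized by a single scalar cost where lower is better.

```python
"""Sketch of the parallel alpha/beta exploration suggested in Section V.

Assumptions (ours, not the paper's): each (alpha, beta) pair is evaluated by
an independent job, and a job returns one scalar QoR cost (lower is better).
`evaluate_placement` is hypothetical; in a real flow it would launch an
incremental placement run and read back the timing and power reports.
"""
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def evaluate_placement(alpha: float, beta: float) -> float:
    """Hypothetical QoR cost of one incremental placement run.

    A toy quadratic bowl stands in for the real tool call so the sketch runs.
    """
    return (alpha - 0.6) ** 2 + (beta - 0.3) ** 2


def sweep(alphas, betas, max_workers=4):
    """Evaluate every (alpha, beta) combination concurrently.

    Returns the best pair and a dict of all recorded costs, so that good
    values can be stored and reapplied on later runs.
    """
    grid = list(product(alphas, betas))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One job per combination; map preserves the order of `grid`.
        costs = list(pool.map(lambda ab: evaluate_placement(*ab), grid))
    results = dict(zip(grid, costs))
    best = min(results, key=results.get)
    return best, results


if __name__ == "__main__":
    best, results = sweep([0.2, 0.4, 0.6, 0.8], [0.1, 0.3, 0.5])
    print(f"evaluated {len(results)} combinations, best (alpha, beta) = {best}")
```

Threads fit this pattern when each job mostly waits on an external tool run; if the evaluation were CPU-bound Python code, `concurrent.futures.ProcessPoolExecutor` would be the usual choice instead.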
Fig. 7. Hold Timing Gain (%). [Bar chart per testcase; legend: WHS Gain (%), average gain 15%; THS Gain (%), average gain 13%.]

Fig. 10. Design Area Gains (%). [Bar chart per testcase; legend: Design Utilization Gain (%), average gain 5%; Number of Buffers/Inverters Gain (%), average gain 14%.]
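Figs. 7 to 13 report a per-design gain and its average over the testcases. The text does not state the gain formula, so the sketch below assumes one common convention: relative improvement over the baseline flow, computed on absolute values so that negative slacks, power, area, wirelength, and runtime are all treated as "smaller is better". The function names and sample numbers are hypothetical and are not data from the paper.

```python
"""Per-design gain and average gain, as plotted in Figs. 7 to 13.

Assumed convention (not stated in the paper): each metric is compared in
absolute value, "smaller is better", and
    gain = (baseline - new) / baseline * 100.
Slacks are assumed to be reported as negative-slack values (0 when met).
A negative gain is a degradation.
"""
from statistics import mean


def gain_percent(baseline: float, new: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    base, cur = abs(baseline), abs(new)
    if base == 0:
        raise ValueError("gain is undefined when the baseline value is zero")
    return (base - cur) / base * 100.0


def average_gain(pairs) -> float:
    """Arithmetic mean of the per-design gains of (baseline, new) pairs."""
    return mean(gain_percent(b, n) for b, n in pairs)


if __name__ == "__main__":
    # Hypothetical WNS values in ns for three designs (not data from the paper).
    wns = [(-0.20, -0.15), (-0.10, -0.10), (-0.05, -0.06)]
    for b, n in wns:
        print(f"baseline {b:+.2f} ns -> new {n:+.2f} ns: gain {gain_percent(b, n):+.1f}%")
    print(f"average WNS gain: {average_gain(wns):.1f}%")
```

Under this convention, a runtime that grows from 100 to 141 minutes gives a gain of -41%, which is how a runtime increase would appear in Fig. 13. Whether the figure averages are arithmetic means over designs, as assumed here, is not stated in the text.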
Fig. 11. Wire Length Gain (%). [Bar chart per testcase; legend: Global Route Data Nets Wirelength Gain (%), average gain 13%.]

Fig. 12. eDRC Gain (%). [Bar chart per testcase; legend: Max Capacitance Gain (%), average gain 33%; Max Transition Gain (%), average gain 41%.]

Fig. 13. Runtime Gain (%). [Bar chart per testcase (x-axis: TestCases); legend: Runtime Gain (%), average runtime increase 41%.]

VI. CONCLUSION

In this paper, we proposed a new weighting approach that takes the hold timing factor into account, in addition to the power, setup, and eDRC factors, when calculating net weights before and during the incremental global placement. The new algorithm is added before the CTS stage to generate a hold-friendly placement without impacting the setup timing. Adding the hold parameter to the weighting formulation using the PUS capabilities makes it possible to reduce the power consumption of the design by reducing the number of delay elements needed to fix hold violations in the post-CTS stage. By evaluating this new weighting approach on a wide variety of designs, we achieved an additional average gain of 9% in total power consumption compared to our approach proposed in [6]. The power gain is achieved while also improving setup and hold timings (TNS gain = 15%, WNS gain = 26%, THS gain = 13%, WHS gain = 15%). By taking hold timing into consideration early in the physical design process, we achieved better power reduction and design closure throughout the PnR flow.

Future work will focus on runtime reduction, using machine learning algorithms to determine the best α and β parameters based on design characteristics, so that the QoR improves in each iteration instead of looping through random values.

ACKNOWLEDGMENT

This research was supported by Mentor, a Siemens Business. We thank our colleagues from the Digital Design Implementation Solution (DDIS) division who provided insight and expertise that greatly assisted this research.

We thank Dr. Hazem El Tahawy (Mentor Graphics, Managing Director MENA Region) for initiating and supporting this work. From the Place-and-Route Solutions group in the DDIS division, we thank David Chinnery (Architect, Optimization) and Nikitin Nikita (Member of Consulting Staff, DDIS R&D CTS) for their assistance, help, and guidance throughout this research.

REFERENCES

[1] Pan, D., Halpin, B., & Ren, H. (2008). Timing-Driven Placement. Handbook of Algorithms for Physical Design Automation. doi:10.1201/9781420013481.ch21
[2] Kong, T. (n.d.). A novel net weighting algorithm for timing-driven placement. IEEE/ACM International Conference on Computer Aided Design, 2002. ICCAD 2002. doi:10.1109/iccad.2002.1167530
[3] Papa, D. A., Luo, T., Moffitt, M. D., Sze, C. N., Li, Z., Nam, G., Markov, I. L. (2008). Rumble. Proceedings of the 2008 International Symposium on Physical Design - ISPD 08. doi:10.1145/1353629.1353633
[4] Wang, Q. B., Lillis, J., & Sanyal, S. (n.d.). An LP-based methodology for improved timing-driven placement. Proceedings of the ASP-DAC 2005. Asia and South Pacific Design Automation Conference, 2005. doi:10.1109/aspdac.2005.1466542
[5] Cheon, Y., Ho, P., Kahng, A., Reda, S., Wang, Q. (2005). Power-aware placement. Proceedings. 42nd Design Automation Conference, 2005. doi:10.1109/dac.2005.193924
[6] Chentouf, M., & Ismaili, Z. E. (2018). A Novel Net Weighting Algorithm for Power and Timing-Driven Placement. VLSI Design, 2018, 1-9. doi:10.1155/2018/3905967
[7] Chan, T., Kahng, A. B., & Li, J. (2014). NOLO: A no-loop, predictive useful-skew methodology for improved timing in IC implementation. Fifteenth International Symposium on Quality Electronic Design. doi:10.1109/isqed.2014.6783368
[8] Burstein, M., Youssef, M. (1985). Timing Influenced Layout Design. 22nd ACM/IEEE Design Automation Conference. doi:10.1109/dac.1985.1585923
[9] Hur, S., Cao, T., Rajagopal, K., Parasuram, Y., Chowdhary, A., Tiourin, V., Halpin, B. (n.d.). Force directed Mongrel with physical net constraints. Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451). doi:10.1109/dac.2003.1218966
[10] Ou, S., Pedram, M. (2000). Timing-driven placement based on partitioning with dynamic cut-net control. Proceedings of the 37th Conference on Design Automation - DAC 00. doi:10.1145/337292.337548
[11] Riess, B., Ettelt, G. (n.d.). SPEED: Fast and efficient timing driven placement. Proceedings of ISCAS95 - International Symposium on Circuits and Systems. doi:10.1109/iscas.1995.521529
[12] Eisenmann, H., Johannes, F. (n.d.). Generic global placement and floorplanning. Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175). doi:10.1109/dac.1998.724480
[13] Swartz, W. (2008). Placement Using Simulated Annealing. Handbook of Algorithms for Physical Design Automation. doi:10.1201/9781420013481.ch16
[14] Bunglowala, A., Jain, M. (2014). Parallel Simulated Annealing Algorithm for Standard Cell Placement in VLSI Design. International Journal of Computer Applications, 87(1), 23-26. doi:10.5120/15172-3047
[15] Jackson, M. A., Kuh, E. S. (1989). Performance-driven placement of cell based ICs. Proceedings of the 1989 26th ACM/IEEE Conference on Design Automation Conference - DAC 89. doi:10.1145/74382.74444
[16] Srinivasan, A., Chaudhary, K., Kuh, E. (n.d.). RITUAL: A performance driven placement algorithm for small cell ICs. 1991 IEEE International Conference on Computer-Aided Design Digest of Technical Papers. doi:10.1109/iccad.1991.185188
[17] Donath, W. E., Norman, R. J., Agrawal, B. K., Bello, S. E., Han, S. Y., Kurtzberg, J. M., . . . Mcmillan, R. I. (1990). Timing driven placement using complete path delays. Conference Proceedings on 27th ACM/IEEE Design Automation Conference - DAC 90. doi:10.1145/123186.123232
[18] Tsay, R., Koehl, J. (1991). An analytic net weighting approach for performance optimization in circuit placement. Proceedings of the 28th Conference on ACM/IEEE Design Automation Conference - DAC 91. doi:10.1145/127601.122882
[19] Krishnamoorthy, A. (2004). Minimize IC Power without Sacrificing Performance. EEdesign. Available at http://www.eedesign.com/article/showArticle.jhtml?articleId=23901143
[20] Obermeier, B., Johannes, F. (n.d.). Temperature-aware global placement. ASP-DAC 2004: Asia and South Pacific Design Automation Conference 2004 (IEEE Cat. No.04EX753). doi:10.1109/aspdac.2004.1337555
[21] Bakoglu, H. (1990). Circuits, Interconnections, and Packaging for VLSI. Addison-Wesley.
[22] Jackson, M., Srinivasan, A., Kuh, E. (n.d.). Clock routing for high-performance ICs. 27th ACM/IEEE Design Automation Conference. doi:10.1109/dac.1990.114920
[23] Natesan, V., Bhatia, D. (n.d.). Clock-skew constrained cell placement. Proceedings of 9th International Conference on VLSI Design. doi:10.1109/icvd.1996.489474
[24] Venkateswaran, N., Bhatia, D. (n.d.). Clock-skew constrained placement for row based designs. Proceedings International Conference on Computer Design. VLSI in Computers and Processors (Cat. No.98CB36273). doi:10.1109/iccd.1998.727053
[25] Huang, L., Cai, Y., Zhou, Q., Hong, X., Hu, J., Lu, Y. (2005). Clock network minimization methodology based on incremental placement. Proceedings of the 2005 Conference on Asia South Pacific Design Automation - ASP-DAC 05. doi:10.1145/1120725.1120755
[26] Li, Z., Wu, W., Hong, X., Gu, J. (n.d.). Incremental placement algorithm for standard-cell layout. 2002 IEEE International Symposium on Circuits and Systems. Proceedings (Cat. No.02CH37353). doi:10.1109/iscas.2002.1011495
[27] Deokar, R., Sapatnekar, S. (n.d.). A graph-theoretic approach to clock skew optimization. Proceedings of IEEE International Symposium on Circuits and Systems - ISCAS 94. doi:10.1109/iscas.1994.408825
[28] Boese, K., Kahng, A. (n.d.). Zero-skew clock routing trees with minimum wirelength. [1992] Proceedings. Fifth Annual IEEE International ASIC Conference and Exhibit. doi:10.1109/asic.1992.270316
[29] Guthaus, M., Sylvester, D., Brown, R. (2006). Clock buffer and wire sizing using sequential programming. 2006 43rd ACM/IEEE Design Automation Conference. doi:10.1109/dac.2006.229435
[30] Zhu, Q., Dai, W. (1996). High-speed clock network sizing optimization based on distributed RC and lossy RLC interconnect models. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 15(9), 1106-1118. doi:10.1109/43.536716
[31] L, W., Li, Y., Chen, H. (2010). Minimizing clock latency range in robust clock tree synthesis. 2010 15th Asia and South Pacific Design Automation Conference (ASP-DAC). doi:10.1109/aspdac.2010.5419849
[32] Lee, D., Markov, I. L. (2010). Contango: Integrated optimization of SoC clock networks. 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010). doi:10.1109/date.2010.5457043
[33] Friedman, E. G. (1989). Performance Limitations in Synchronous Digital Systems. University of California, Irvine.
[34] Chou, H., Yu, H., Chang, S. (2011). Useful-skew clock optimization for multi-power mode designs. 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). doi:10.1109/iccad.2011.6105398
[35] Huang, C., Liu, Y., Lu, Y., Kuo, Y., Chang, Y., Kuo, S. (2016). Timing-driven cell placement optimization for early slack histogram compression. Proceedings of the 53rd Annual Design Automation Conference - DAC 16. doi:10.1145/2897937.2898105
[36] Nitro-SoC™ and Olympus-SoC™ User's Manual, Software Version 2017, August 2017.
[37] Nitro-SoC™ and Olympus-SoC™ Advanced Design Flows Guide, Software Version 2017, August 2017.
[38] Nitro-SoC™ and Olympus-SoC™ Software Version 2017.1.R2, August 2017.
[39] Dunlop, A., Agrawal, V., Deutsch, D., Jukl, M., Kozak, P., & Wiesel, M. (1984). Chip Layout Optimization Using Critical Path Weighting. 21st Design Automation Conference Proceedings. doi:10.1109/dac.1984.1585786
[40] Niu, F., Zhou, Q., Yao, H., Cai, Y., Yang, J., & Sze, C. N. (2011). Obstacle-avoiding and slew-constrained buffered clock tree synthesis for skew optimization. Proceedings of the 21st Edition of the Great Lakes Symposium on VLSI - GLSVLSI 11. doi:10.1145/1973009.1973049