Timing Closure Guide for EDI System 11
Contents
Introduction
Recommended Timing Closure Flow
Data Preparation and Validation
Flow Preparation
Pre-Placement Optimization
Floorplanning and Initial Placement
Pre-CTS Optimization
Clock Tree Synthesis
Post-CTS Optimization
Detailed Routing
Post-Route Optimization
Timing Sign Off
Additional Resources
Introduction
Achieving timing closure on a design is the process of creating a design implementation that is free from logical,
physical, and design rule violations and meets or exceeds the timing specifications for the design. For a production
chip, all physical effects, such as metal fill and coupling, must be taken into account before you can confirm that timing
closure has been achieved.
Timing Closure is not just about timing optimization. It is a complete flow that has to converge, including placement,
timing optimization, clock tree synthesis (CTS), routing and SI fixing. Each step has to reach the expected targets or else
timing closure will likely not be achieved.
This document discusses each step in the implementation flow as it relates to timing closure in EDI 10.1, and provides
recommended settings specific to high performance, congested or high utilization designs.
Below is a high level diagram showing the steps in the timing closure flow:
Software
The EDI System software is constantly being improved to provide better quality of results, reliability and ease of use. To
ensure you're running with the latest improvement we recommend running with the latest software version available from
http://downloads.cadence.com.
Foundation Flow
If you've developed your own flow scripts, you know maintaining and updating them can be time consuming and error
prone. Also, ensuring you're running with the latest recommended commands and options can be challenging.
Therefore, we recommend using the Foundation Flow scripts. The Foundation Flow provides Cadence-recommended
procedures for implementing flat, hierarchical, and low-power/CPF designs using the EDI System software. The
Foundation Flow is a starting point for building an implementation environment, and you can augment it with
design-specific content. Utilizing the Foundation Flow helps achieve timing closure because it provides the latest
recommended flow, which you can further customize based on your design requirements.
If you are new to the Foundation Flow, we recommend you start with the Foundation Flow video demonstrations. These
provide examples of setting up and using the flows. They are accessible from within the EDI System GUI by selecting
Flows - Foundation Flow Demo.
April 2012
Data Preparation
Data Validation
Determining and Setting the RC Scaling Factors
Data Preparation and Validation for Low Power Designs
This section outlines the data (libraries, constraints, netlist, etc.) required for implementing the timing closure flow and
how to validate that data.
The goals of data preparation and validation include:
Confirming that the EDI System has a complete and consistent set of design data (all library views and versions
must be consistent).
Ensuring that all tools in the flow interpret the timing constraints consistently.
Making sure logically equivalent cells are defined properly.
Creating a capacitance table file that matches the process technology.
Correlating parasitics among the prototyping and sign-off extraction tools.
Data Preparation
This section lists the data required and data setup recommended for the timing closure flow.
Timing Libraries
Every cell used in the design should be defined in the timing library.
If multiple delay corners are being analyzed then each cell needs to be characterized for each corner.
EDI System supports Non-Linear Delay Models (NLDM), ECSM, and CCS.
ECSM/CCS require the Signal Storm delay calculator (setDelayCalMode -engine signalStorm).
CCS/ECSM are less pessimistic than NLDM, so using CCS/ECSM libraries can recover roughly 5% to 10% of the
clock period in slack.
Physical Libraries
You need to have an abstract defined for every cell in either a LEF file or OpenAccess database.
Define Non-Default Rules (NDRs) for routing as needed. These can be defined in the LEF or added within EDI
System.
The technology LEF should have an optimized set of vias to be used for routing. Confirm you have the latest
technology LEF from your library vendor or foundry.
Verilog Netlist
Check the Verilog netlist for assign statements. If assign statements exist, use the following procedure to enable
optimization to work on those nets:
Run "setDoAssign on" before loading the design data, OR use "set rda_Input(assign_buffer) {1}" in the
configuration file (these commands do the same thing).
Also set "setImportMode -bufferFeedThruAssign true".
The netlist should also be unique.
Run uniquifyNetlist at the Linux command line to uniquify the netlist.
Timing Constraints
Timing constraints in the form of SDCs are required. You should have an SDC file for each operational mode required
for analysis.
Capacitance Table File
A capacitance table is used by EDI System to accurately extract parasitics. A cap table is required for each RC corner
to be analyzed. You may obtain the cap table(s) from your foundry. Alternatively, the generateCapTbl command can be
used to generate the cap table. generateCapTbl reads a detailed process description file in Interconnect Technology
(ICT) format and a LEF technology file, and outputs a capacitance table file. The Interconnect Technology (ICT) file
describes the detailed process information for a given technology (layer thicknesses, materials, profiles, dielectric
constants, and so forth). Any non-default rules for the design should be defined in the LEF file you are using.
The capacitance table file consists of two parts:
Basic Capacitance Table - Provides a table of spacing versus capacitance information for each layer.
Extended Capacitance Table - Provides encoded extraction patterns that are derived using a 3D field solver,
which provides much greater accuracy during detailed extraction.
A capacitance table file (basic or extended) is process-specific and not design-dependent, so it only needs to be
created once per technology. If a capacitance table already exists for your process, use it and do not recreate it.
The following example shows how to generate an extended capacitance table using the default field solver (Coyote).
generateCapTbl -lef tech.lef -ict process.ict -output process.xCapTbl
QRC Tech File
A QRC technology file (ICT file) is required in order to run turbo-QRC (tQRC), integrated QRC (iQRC) or standalone
QRC. A QRC tech file is required for each RC corner to be analyzed.
Data Validation
This section explains how to identify problems when importing the data and checks you can run to catch data issues
early in the flow.
Loading the Design
Once you have prepared the necessary data you are ready to import it. Run loadConfig to import the libraries, netlist
and timing environment.
loadConfig design.conf
loadConfig executes a number of checks to validate the data and highlight problems. It is important you review the log
file to understand and resolve warnings and error messages it reports. We recommend using the Log Viewer (Tools - Log
Viewer) to review the loadConfig results. The Log Viewer expands/collapses the log output by command and color
codes messages to make them easy to identify.
Following are things to look for when reviewing the loadConfig output:
loadConfig reports cells in the LEF which are not defined in the timing libraries. Look for the following and
confirm if these cells need to be analyzed for timing: **WARN: (ENCSYC-2): Timing is not defined for cell
INVXL.
A blackbox is an instance declaration in the netlist for which no module or macro definition is found. Unless your
design is being done using a blackbox style of floorplanning, there should be no blackboxes in the design. If
there are blackboxes reported, be sure to load the Verilog file that defines the logic module, and make sure you
include the LEF file that defines the macro being referenced in the netlist. The following is reported for blackbox
(empty) modules: Found empty module (bbox).
Verify the netlist is unique. The following is reported if it is not unique: *** Netlist is NOT unique.
By default, EDI System utilizes a "footprintless" flow. This means instead of relying on the "footprint" definitions
inside the timing libraries, it uses the "function" statement to determine cells which are functionally equivalent and
can be swapped during optimization. Additionally, it identifies buffers, inverters and delay cells. Inconsistencies in
how the cell functions are defined can lead to sub-optimal or erroneous optimization results. Review the log file to
confirm the buffers, inverters and delay cells are properly identified. Below is an example of what you'll see:
List of usable buffers: BUFX2 BUFX1 BUFX12 BUFX16 BUFX20 BUFX3 BUFX4 BUFX8 BUFXL CLKBUFX2
CLKBUFX1 CLKBUFX12 CLKBUFX16 CLKBUFX20 CLKBUFX3 CLKBUFX4 CLKBUFX8 CLKBUFXL
Total number of usable buffers: 18
List of unusable buffers:
Total number of unusable buffers: 0
List of usable inverters: CLKINVX2 CLKINVX1 CLKINVX12 CLKINVX16 CLKINVX20 CLKINVX3 CLKINVX4
CLKINVX8 CLKINVXL INVX1 INVX2 INVX12 INVX16 INVX20 INVX3 INVXL INVX4 INVX8
Total number of usable inverters: 18
List of unusable inverters:
Total number of unusable inverters: 0
List of identified usable delay cells: DLY2X1 DLY1X1 DLY4X1 DLY3X1
Total number of identified usable delay cells: 4
List of identified unusable delay cells:
Total number of identified unusable delay cells: 0
Also, look for cells which do not have a function defined for them: No function defined for cell 'HOLDX1'. The
cell will only be used for analysis.
Checking Timing Constraint Syntax
In addition to checking the libraries, the loadConfig command also checks the syntax of timing constraints. After running
loadConfig, check for the following problems:
Unsupported constraints
The EDI System software may not support the SDC constraints being used with the design, or the
constraints may not match the netlist. If the constraints are not supported, they may need to be
re-expressed (if possible) using constraints that the EDI System software does support.
Ignored timing constraints
Syntax errors can cause the tools to ignore certain constraints resulting in the misinterpretation of important
timing considerations. Check for warnings or errors about unaccepted SDC constraints.
The following are possible causes for ignored constraints.
A design object is not found. If the constraints refer to pins, cells, or nets that are not found in the
netlist, the constraint is ignored.
Leakage and/or dynamic power optimization is typically layered on top of the flows presented here:
For designs where leakage is a high priority, the recommended flow is to enable leakage optimization at the
beginning of the flow and allow the tool to manage the optimization. This is done by setting: setOptMode -leakagePowerEffort high
High-effort leakage optimization does come with a run time and area penalty, and is particularly time
consuming as the ratio of higher-VT cells drops.
Low leakage effort, enabled via "setOptMode -leakagePowerEffort low", only performs postRoute reclaim.
This flow has the least impact on run time but typically results in a 5-10% loss of potential leakage
savings.
For designs where dynamic power is a high priority, dynamic power optimization can be enabled with the
following, also set at the beginning of the flow: setOptMode -dynamicPowerEffort [ low | high ]
High effort does reclaim throughout the flow. Low effort only performs postRoute downsizing.
This method typically works best with a VCD or TCF to properly apply the activity rates.
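The guidance above can be collected into a short setup sketch placed at the start of the flow; the choice of
effort levels is design-dependent, so treat the values here as placeholders:

```tcl
# Leakage-critical design: high effort, tool-managed throughout the flow
# (expect a run time and area penalty)
setOptMode -leakagePowerEffort high

# Or, for a dynamic-power-critical design; low effort only performs
# postRoute downsizing, high effort reclaims throughout the flow
setOptMode -dynamicPowerEffort low

# Apply switching activity from a VCD or TCF file before enabling dynamic
# power optimization so the activity rates are meaningful
```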
Flow Preparation
Setting the Design Mode
Extraction
Timing Analysis
Setting the design mode and understanding how extraction and timing analysis are utilized during the flow are important
to achieving timing closure.
Extraction
Resistance and Capacitance (RC) extraction using the extractRC command is run frequently in the flow each time timing
analysis is performed. The setExtractRCMode options -engine and -effortLevel control which extractor is used by
extractRC. The lower the effort level, the faster extraction runs at the expense of being less accurate. Fast extraction is
used early in the flow to provide fast turnaround times so you can experiment with different floorplans and solutions. As
you progress through the flow, the effort level is increased which improves the accuracy of the extraction at the expense
of run time.
The -engine option indicates whether to use the preRoute or postRoute extraction engine.
Use "-engine preRoute" when the design has not yet been detail routed by NanoRoute. When "-engine preRoute"
is set, RC extraction is based on fast density measurements of the surrounding wires; coupling is not reported.
Use "-engine postRoute" after the design has been detail routed by NanoRoute. RC extraction is done by the
detailed measurement of the distance to the surrounding wires; coupling is reported. The -effortLevel parameter
further specifies which postroute engine is used for balancing performance versus accuracy needs.
The -effortLevel value controls which extractor is used when the postRoute engine is used.
low - Invokes the native detailed extraction engine. This is the same as specifying the "-engine postRoute" setting.
medium - Invokes the Turbo QRC (TQRC) extraction mode. TQRC performance and accuracy fall between the
native detailed extraction and IQRC engines. This engine supports distributed processing.
The TQRC engine is recommended for process nodes of 65nm and below. Note: This setting does not require a
QRC license.
high - Invokes the Integrated QRC (IQRC) extraction engine.
IQRC provides superior accuracy compared to TQRC. IQRC is recommended for extraction after ECO. In
addition, IQRC supports distributed processing. Note: IQRC requires a QRC license.
signoff - Invokes the Standalone QRC extraction engine. This engine choice provides the highest accuracy. The
engine has several run modes, thereby providing maximum flexibility.
The default value for -effortLevel depends on the value of setDesignMode. The default for nodes above 65nm is low.
TQRC (-effortLevel medium) is the default extraction engine in the postRoute flow for 65nm-and-below designs.
However, TQRC and IQRC do not support the obsoleted 3-corner flow (defineRCCorner flow) and require a QRC
tech file. Therefore, the Native Detailed (-effortLevel low) engine remains the default engine if no QRC tech file
has been defined or if the defineRCCorner command has been used.
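As a sketch, the settings above are typically applied in two stages; the effort choice shown for the postRoute
stage is illustrative only:

```tcl
# Early in the flow: fast preRoute extraction for quick iterations
setExtractRCMode -engine preRoute
extractRC

# After NanoRoute detail routing: postRoute engine with TQRC (medium
# effort); requires a QRC tech file but no QRC license
setExtractRCMode -engine postRoute -effortLevel medium
extractRC
```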
Timing Analysis
Timing analysis is typically run after each step in the timing closure flow using the timeDesign command. If timing
violations exist, we recommend using the Global Timing Debug (GTD) GUI to analyze and debug the results:
GTD is an invaluable tool which provides forms and graphs to help you visually see timing problems.
To learn more about GTD we recommend completing the Workshop provided with EDI 10.1. Download
instructions are available at
http://support.cadence.com/wps/mypoc/cos?uri=deeplinkmin:ViewSolution;solutionNumber=11681606.
Initial timing analysis should not be performed after placement. Instead, we recommend always waiting until after
pre-CTS optimization to report your initial timing:
After placement, there will be some basic buffering and resizing to get cells into their characterized region of
timing as defined by their look-up tables. Until the cells are in that region, any timing analysis results are highly
suspect.
A placement should always be followed by an optimization run to bring the cells into line prior to making a timing
and routability judgment. It is not uncommon for the original "worse" placement to come out better
post-optimization than the "better" placement. This seems non-intuitive, but it is often the case.
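The recommendation above can be expressed as a short command sequence; treat this as a sketch rather than a
complete flow:

```tcl
placeDesign                       ;# placement (deletes buffer trees by default)
optDesign -preCTS                 ;# bring cells into their characterized region
timeDesign -preCTS -outDir RPT    ;# first trustworthy timing snapshot
```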
Pre-Placement Optimization
Data preparation is complete and you are now ready to implement your design. It is important to note that using options
from a previous design might not necessarily apply to your current design. Therefore, we recommend starting with the
default flow (or Foundation Flow) then apply additional options based on your design requirements. Additionally, as you
proceed through the flow you should investigate each flow step and validate it before moving to the next one.
The goals of pre-placement optimization are to optimize the netlist to:
Improve the logic structure
Reduce congestion
Reduce area
Improve timing
In some situations, the input netlist (typically from a poor RTL synthesis) is not a good candidate for placement since it
might contain buffer trees or logic that is poorly structured for timing closure. In most cases, high-fanout nets should be
buffered after placement. It is more reliable to allow buffer insertion algorithms to build and place buffer trees rather than
to rely on the placer to put previously inserted trees in optimal locations. Additionally, having buffer trees in the initial
netlist can adversely affect the initial placement.
Because of these effects, it can be advantageous to run pre-placement optimization or simple buffer and
double-inverter removal (area reclamation) prior to initial placement. This can be accomplished by using the
deleteBufferTree and removeClockTree commands. Note that deleteBufferTree is run by placeDesign by default.
For designs where the logical structure of the critical paths or high congestion are the sources of closure problems,
restructuring or remapping the elements on the paths (or related portions of the design) can provide better timing results.
This is done by running netlist-to-netlist optimization using the runN2NOpt command on the initial netlist prior to place
and route. Additional restructuring can be performed later in the flow using the EDI System optimization commands with
more physical information.
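A minimal sketch of the pre-placement cleanup just described; whether to run the netlist-to-netlist step depends
on the quality of the incoming netlist, and runN2NOpt options are omitted here (see its command reference for
effort settings):

```tcl
# Area reclamation: strip existing buffer trees and clock trees so they
# can be rebuilt later with physical information
deleteBufferTree      ;# also run by placeDesign by default
removeClockTree

# Optional: restructure/remap poorly structured logic before place & route
runN2NOpt
```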
1. Run the initial placement without any regions and guides. It is best to get a baseline placement without
constraining the placer.
Early on use "setPlaceMode -fp true" to run placement in prototyping mode for faster turnaround.
Prototype placement does not produce a legal placement, so make sure you run placement with
"setPlaceMode -fp false" as you converge on a floorplan.
Use Automatic Floorplan Synthesis if your design contains a large number of hard macros.
Automatic Floorplan Synthesis is a set of natively-integrated automatic floorplan capabilities that
can create a quick, prototype floorplan. Given a gate-level netlist and design physical boundary,
Automatic Floorplan Synthesis can analyze the signal flow and generate a floorplan that includes
automatic module and macro placement for large chips. See Automatic Floorplan Synthesis in the
EDI System User Guide for more information.
2. Analyze the placement for timing and routability issues, and make necessary adjustments.
3. Employ module guides, placement blockages, and other techniques to refine the floorplan. The placement engine
automatically detects low-utilized designs and turns on the options required to achieve an optimal placement.
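The iteration loop in steps 1 through 3 might be scripted roughly as follows; this is a sketch of the prototype-then-legalize
sequence described above:

```tcl
# Baseline: fast prototype placement, no regions or guides
setPlaceMode -fp true
placeDesign

# ... analyze timing and routability, adjust the floorplan ...

# As the floorplan converges, switch back to legal placement
setPlaceMode -fp false
placeDesign
```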
Ensuring Routability
Initial prototype iterations should focus on routability as the key to achieving predictable timing closure. You should
attempt to resolve congestion before attempting timing closure. Designs which are congested are more likely to have
timing jumps during timing and Signal Integrity (SI) closure. Tools such as module guides, block placement, block
halos, obstructions, and partial placement blockages (density screens) are used to control the efficient routing of the
design.
Use the following guidelines during floorplanning and placement to avoid congestion.
1. Choose an appropriate floorplan style.
If possible, review a data flow diagram or a high-level design description from the chip designer to determine an
appropriate floorplan style.
Assess different floorplan styles such as hard macro placement in periphery, island, or doughnut (periphery and
island). Keep the macro depth at 1 to 2 for best CTS, optimization, and Design-for-test (DFT) results. If possible,
consider different aspect ratios to accommodate a shallower macro depth. Consider using Automatic Floorplan
Synthesis and relative floorplanning constraints to simplify floorplan iterations.
2. Preplace I/Os and macros.
Review hard macro connectivity and placement based on the minimum distance from a hard macro to its target
connectivity.
Preplace high-speed and analog cores based on their special requirements for noise isolation and power
domains.
3. Review I/O placement to identify I/O anchors and associated logic.
Verify that logic blocks and hard macros which communicate with I/O buffers are properly placed and have
optimal orientation for routability. Push down into module guides to further assess the quality of the floorplan and
resulting placement.
4. Allow enough space between preplaced blocks.
Allow space between I/Os and peripheral macros for critical logic such as JTAG, PCI, or Power management
logic. Use the specifyJtag and placeJtag commands prior to placing blocks.
Use block halos, placement obstructions or fences around blocks prior to optimization or Clock Tree Synthesis
(CTS). Placement generally does not do a good job of placing cells between macros. Reduce or remove halos
and obstructions after placement to make sufficient space available around macros for optimization, CTS, DRV,
or SI fixing to add buffers.
Place other cells such as endcaps, welltaps and decaps prior to placement as required.
5. Use module guides carefully.
Place module guides, regions, or fences only when greater control is required. Be careful not to place too many
module constraints early in the floorplanning process because it is time consuming and greatly constrains the
placement. Module guides should be used for floorplan refinement or hierarchical partitioning.
Review the placement of module guides related to datapath and control logic relative to the associated hard
macros. Datapath logic can be a source of congestion problems due to poor aspect ratios, high fanout, and
large amounts of shifting. Consider tuning the locations of these module guides and lowering the density to
reduce congestion.
Review the placement of module guides related to memories. These modules can typically have higher densities
due to the inclusion of the memories.
6. Reorder scan chains
When evaluating congestion, make sure that scan chains are reordered to eliminate "false" hot spots in the
design. Failing to reorder the scan chain can cause a routable floorplan to appear unroutable. By default,
placeDesign performs scan tracing and scan reordering based on global or user-specified scan reorder settings
and specified or imported scan chain information. If scan chain information is missing, no reordering is
performed.
A congested or unroutable design at this stage will not get better during optimization. With a good floorplan you should
be able to:
Place the design in the floorplan without issues
Create a routeable placement
Make sure to consider the following when finalizing the floorplan:
The power grid should be defined:
Global net connections properly defined using globalNetConnect command.
Nets requiring optimization should be defined as signal (regular) nets. Optimization treats nets in the
SPECIALNETS as dont_touch.
PINS marked with + SPECIAL cannot be optimized.
Followpin routing should align to rows/cells with correct orientation (VDD pin to VDD followpin).
Gaps between standard cells and blocks should be covered with soft or hard blockages. Placement cannot
place cells in the area of soft blockages but optimization and CTS can.
All blocks should be marked fixed (setBlockPlacementStatus -allHardMacros -status fixed).
Tracks should match IO pins and placement grid (rows). Use generateTracks to update the tracks.
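A few of the checklist items above map directly to commands. In this sketch the VDD/VSS net and pin names
are typical placeholders; substitute the names used in your library:

```tcl
# Define the global power connections (net/pin names are examples)
globalNetConnect VDD -type pgpin -pin VDD -inst *
globalNetConnect VSS -type pgpin -pin VSS -inst *

# Fix all hard macros before placement
setBlockPlacementStatus -allHardMacros -status fixed

# Regenerate tracks so they match the rows and IO pins
generateTracks
```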
Timing-Driven Placement
As the routability of the floorplan stabilizes, you should shift your focus to timing-driven placement.
Timing-driven placement (setPlaceMode -timingDriven true) and pre-placement optimization (buffer tree deletion) are
enabled by default:
placeDesign
Following are additional tips for performing placement on high performance, high congestion and high utilization
designs.
Tips for Placing High Performance Designs
By default, optDesign optimizes only the most critical paths. For high performance designs it can help to optimize
all paths. "setOptMode -allEndPoints true" enables optimization of all paths and replaces "setOptMode
-criticalRange 1.0" from previous releases. Note that in EDI 10.1 we no longer recommend running
"placeDesign -inPlaceOpt". Furthermore, the -inPlaceOpt option is ignored if "setOptMode -allEndPoints true" or
"setDesignMode -flowEffort high" is enabled. Pre-CTS optimization has been improved in EDI 10.1, and therefore
-inPlaceOpt is no longer needed. Overall, running "placeDesign + optDesign -preCTS" with
"setOptMode -allEndPoints true" enabled reduces the run time of placement and pre-CTS optimization while
achieving similar results to "placeDesign -inPlaceOpt + optDesign -preCTS".
The following disables buffer tree deletion prior to placement: placeDesign -noPreplaceOpt
Usually, removing the existing buffer tree before placement produces better timing results and wire length for the
majority of designs. However, in some designs, keeping the existing buffer tree generates better timing results.
The following option is useful when clock gating checks are in the critical paths, and helps limit the timing jumps
caused by CTS moving clock gating elements. You must read in the clock tree specification file (using
"specifyClockTree -file specFile") prior to placeDesign for this option to work: setPlaceMode -clkGateAware true
Net weights can be used in specific situations to instruct the placer to minimize the distance between the driver
and sinks. Net weights should be used on a limited basis when the designer determines the placer needs to put
more effort placing specified logic closer together. For example, if there are half cycle paths to or from memories
or latches, these nets may benefit from higher net weights. Or if you need greater control over how the fanout of
clock gating cells are placed. Use specifyNetWeight to assign a greater weight to a net (default is 2). For
example: specifyNetWeight netName 10
Tips for Placing Congested or High Utilization Designs
When optimization contributes to the congestion or the design has high utilization, running in high effort mode
helps (high utilization typically means utilizations > 80%, but depending on the technology, 70% can already be
difficult to handle): setDesignMode -flowEffort high
The following option increases the numerical iterations and makes the instance bloating more aggressive. It also
automatically enables the congRepair command. setPlaceMode -congEffort high
The following optimizes wire length by swapping cells. Optimization can reduce the total wire length up to two
percent without degrading placement. In high-effort mode, it might add up to ten percent to the run time.
setPlaceMode -wireLenOptEffort high
If clock routing is a source of congestion due to NDRs or other rules, setting the clock attributes before placement
and allowing the placer and trialRoute to see the effect can help. Here is some example code that can be called
before placement. The reason timeDesign is called is to make sure CTE does the clock net marking to identify all
the clocks: specifyClockTree -file <cts.spec>
changeClockStatus -all -useClock
timeDesign -prePlace -outDir RPT -prefix preplace
dbForEachCellNet [dbgTopCell] net {
# per-net attribute handling elided in the source
}
There is a standalone command called congRepair which can be called in any part of the pre-route flow to
attempt to relieve congestion. The command uses trialRoute + incremental placement. This can have a
significantly detrimental effect on timing and often will require additional optDesign calls (and may not converge).
So congRepair should be used with caution.
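Given the caution above, congRepair is typically followed by an incremental optimization pass to recover the
timing it disturbs; a sketch (one pass shown, but the text notes this may not converge):

```tcl
congRepair               ;# trialRoute + incremental placement to ease hot spots
timeDesign -preCTS       ;# check how much timing was disturbed
optDesign -preCTS -incr  ;# recover timing; may need additional passes
```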
Pre-CTS Optimization
Guidelines for Pre-CTS Optimization
ckCloneGate (optional step)
# Perform preCTS optimization
setOptMode -clkGateAware true
optDesign -preCTS
# Clone the clock gates in timing-driven mode. This considers the setup timing
# to the clock gating cells.
ckCloneGate -timingDriven
timeDesign -preCTS
# Run preCTS optimization again (optional)
optDesign -preCTS
Over-constraining Clock Gating Paths
You can create separate path groups for the clock gating cells and over-constrain them. Following is a sample
script to do this. Set the clkgate_target_slack to your desired value. Use care when over-constraining designs as
it increases run time:
# Set target overshoot slack for clock gating elements preCTS (in nanoseconds):
set clkgate_target_slack 0.15
# Turn on pathgroup flow
setAnalysisMode -honorClockDomains false
# Create separate reg2reg and clkgate groups
group_path -name reg2reg -from [all_registers] -to [filter_collection [all_registers]
"is_integrated_clock_gating_cell != true"]
group_path -name clkgate -from [all_registers] -to [filter_collection [all_registers]
"is_integrated_clock_gating_cell == true"]
setPathGroupOptions reg2reg -effortLevel high -criticalRange 1
setPathGroupOptions clkgate -effortLevel high -targetSlack $clkgate_target_slack -criticalRange 1
Tips for Optimizing Congested and High Utilization Designs
The risk for timing optimization on designs with high routing congestion mainly comes from unpredictable net
detouring. In addition to the floorplan and placement guidance given previously, it is usually beneficial to
identify the weak cells and set them as dont_use: setDontUse cellName(s)
Note that it is recommended to use setDontUse rather than the SDC set_dont_use. This is because setDontUse
applies to all operating modes, while set_dont_use only affects the operating mode(s) it is applied to.
The following reduces local congestion during timing optimization: setOptMode -congOpt true
The following option will run additional iterations for resolving routing congestion in trialRoute:
setTrialRouteMode -highEffort true
Allowing M1 routing can help to reduce congestion and thus removes some pessimism during RC extraction:
setTrialRouteMode -useM1 true
Monitor the density increase. If needed, reduce/remove extra margins on setup target and DRV. Usually the
density starts blowing up when trying to fix the last tens of picoseconds and the better timing seen pre-CTS will
lead to flow divergence later on. Properly setting the setup target slack will reduce area and improve runtime. For
example, if the target is -0.5ns, it might be helpful to set "setOptMode -setupTargetSlack -0.2". The same applies
to DRV fixing by using -drcMargin option: setOptMode -setupTargetSlack <slack> -drcMargin <value>
Depending on the quality of the initial netlist, some redundant or unused logic may be safely removable to
reduce area: setOptMode -simplifyNetlist true
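The tips above can be combined into a single setup sketch for a congested or high-utilization design; the cell
name and slack/margin values here are placeholders to be tuned per design and library:

```tcl
# Keep weak cells out of optimization (BUFXL is an example cell name)
setDontUse BUFXL true

setOptMode -congOpt true             ;# reduce local congestion during opt
setTrialRouteMode -highEffort true   ;# extra trialRoute congestion iterations
setTrialRouteMode -useM1 true        ;# allow M1 routing to ease congestion

# Relax targets if density starts blowing up on the last picoseconds
setOptMode -setupTargetSlack -0.2 -drcMargin 0.0

optDesign -preCTS
```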
Clock Tree Synthesis
The goal of clock tree synthesis is to build a buffer distribution network that meets the timing requirements among
the leaf pins. It includes the following tasks:
Creating clock tree spec file
Building a buffer distribution network
Routing clock nets using NanoRoute
Setting         clockDesign    Default
RouteClkNet     Yes            No
PostOpt         Yes            Yes
OptAddBuffer    Yes            No
The clockDesign command generates the default clock tree specification file (if not specified), deletes existing clock
trees, builds the clock tree, calls NanoRoute to route the clock nets, and then optimizes the clock tree to improve the
skew including resizing buffers or inverters, adding buffers, refining placement, and correcting routing.
If you performed useful skew optimization (setOptMode -usefulSkew true), clockDesign automatically checks for
any scheduling file in the working directory, or checks the rda_Input(ui_scheduling_file) configuration variable,
and honors the scheduling file while building the clock tree.
If the clockDesign command calls NanoRoute to route the clock nets, direct NanoRoute to follow the route guide by
using the command "setCTSMode -routeGuide true". This is enabled by default. This operation can improve the
correlation between pre-route and post-route clock nets.
Tips for Performing CTS on High Performance Designs
Skew, slew, and latency targets are the primary knobs for getting the highest performance design, but there is a balance
between tightening these constraints and getting the best clock tree (tighter values often increase the area/power of the
tree). Reducing latency often lessens the impact of derating, but can sometimes come at the expense of common paths,
which get filtered via CPPR.
Mode settings to reduce latency:
The following affects the actual tree construction. It can increase runtime considerably and ignores MinDelay
constraints: setCTSMode -synthLatencyEffort high
The following performs optimization after the tree construction mostly by optimizing the location of the tree
elements and can also add significant runtime: setCTSMode -optLatency true
The following reduces the size of the tree by performing optimization after the tree construction to delete and
downsize elements to recover area: setCTSMode -optArea true
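Combined, the latency-oriented mode settings above look like the following sketch; each setting can add significant runtime, so enable only what the design needs.

```tcl
# Latency-focused CTS mode settings
setCTSMode -synthLatencyEffort high   ;# affects tree construction; ignores MinDelay
setCTSMode -optLatency true           ;# optimize element locations after construction
setCTSMode -optArea true              ;# delete/downsize elements to recover area
```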
Confirm the RC scaling factors for the clocks are set properly. See How to Generate Scaling Factors for RC
Correlation.
Constrain the routing to two upper routing layers using a RouteType in the CTS specification file. Constraining the
routing to two layers reduces differences in layer assignment between CTS and NanoRoute.
Use displayClockMinMaxPaths with the -preRoute and -clkRouteOnly options to compare pre-route and clock
route paths.
+ ssfd2s/D rising
PreservePin
+ cgen/mod000043/A
ExcludedPin
+ freg/mod004048/CLK
ExcludedPort
+ DFF_B/CLK
End
Post-CTS Optimization
Post-CTS Optimization Command Sequences
Hold Optimization
Checking Timing
Hold Optimization
At this point run timing analysis to report hold violations:
timeDesign -postCTS -hold -outDir postctsHoldTimingReports
Timing optimization to fix hold violations can be performed at this point. This is recommended if your design has a
significant number of hold violations, because they are easier to fix prior to routing. If the number of hold violations is low,
you can wait until after routing the signal nets.
To perform hold optimization:
optDesign -postCTS -hold -outDir postctsOptHoldTimingReports
Tips for Performing Hold Optimization on High Performance Designs
Try to avoid using very weak buffers for hold fixing, because they are more sensitive to routing detours and SI.
For multi-Vth designs, make sure to run leakage optimization (through optDesign or optLeakagePower) before
running hold fixing, since leakage reduction improves hold timing.
For a design with many hold-violated paths, it is highly recommended to run hold fixing already at the post-CTS
stage (although there is no need to achieve 0ns slack at this stage), then use post-route hold fixing to fix the
remaining violations. You can use a negative hold target slack to focus hold fixing on the paths with large violations
and fix the remaining hold violations after routing. For example, the following sets a hold target slack of -200ps:
setOptMode -holdTargetSlack -0.2
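The staged approach described above can be sketched as follows; the -0.2 value is illustrative.

```tcl
# Stage 1 (post-CTS): fix only the large hold violations
setOptMode -holdTargetSlack -0.2
optDesign -postCTS -hold -outDir postctsOptHoldTimingReports
# Stage 2 (post-route): clean up the remaining violations to 0ns
setOptMode -holdTargetSlack 0.0
optDesign -postRoute -hold
```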
By default hold fixing can degrade setup TNS (but not setup WNS). This can be changed through:
setOptMode -fixHoldAllowSetupTnsDegrade true|false
To exclude some path groups or clock domains from hold fixing you can apply: setOptMode -ignorePathGroupsForHold
Things to watch out for:
Make sure that the clock trees are well balanced (inter- and intra-clock trees).
Avoid over-constraining the clock uncertainty.
Reduce local placement congestion. Adding cell padding may sometimes help.
If you want to allow setup WNS degradation, you should set a negative setup target slack (setOptMode -setupTargetSlack).
In verbose mode (setOptMode -verbose true), hold fixing prints a detailed report in which each net that was not
buffered is sorted into one of several categories. This allows the user to identify the most critical issues to resolve
and to understand why a given net was not buffered. The categories listed in the log file are:
*info: XXX net(s): Could not be fixed because of 'no legal location'
*info: XXX net(s): Could not be fixed because they would "degrade setup reg2reg WNS".
*info: XXX net(s): Could not be fixed because they would "degrade setup TNS"
*info: XXX net(s): Could not be fixed because they would "degrade setup WNS".
*info: XXX net(s): Could not be fixed because they would further "Degrade Hold".
*info: XXX net(s): Could not be fixed because they would "Degrade max_cap violations".
*info: XXX net(s): Could not be fixed because they would "Degrade max_tran violations".
*info: XXX net(s): Could not be fixed because they already have "Violating DRV".
*info: XXX net(s): Could not be fixed because they are set as dont_touch.
*info: XXX net(s): Could not be fixed because they are flagged as badly routed.
*info: XXX net(s): Could not be fixed because they are Multi-Driver nets.
*info: XXX net(s): Could not be fixed because they are clock nets.
*info: XXX net(s): Could not be fixed because no Always-On-Buffers are usable.
*info: XXX net(s): Could not be fixed because "no valid node" were found on net.
*info: XXX net(s): Could not be fixed because of "internal failure".
"setOptMode -fixHoldAllowOverlap" controls if hold fixing is limited to purely legal moves (no overlaps). When
set to "true" hold optimization allows initial cell insertion to overlap cells and then refinePlace legalizes the cells
placement. This provides optimization more opportunity to fix violations. When set to the default value of "auto"
hold optimization is allowed to create overlaps during post-CTS optimization but not during post-route
optimization. To allow overlaps during post-route optimization as well set the following: setOptMode fixHoldAllowOverlap true This can be also used post-route but should not be used with SI optimization because
the refinePlace call may not be made.
Tips for Performing Hold Optimization on Congested and High Utilization Designs
In a multi-Vth library scenario, make sure that leakage optimization was run before hold fixing.
Hold fixing is allowed to overlap cells during post-CTS hold optimization but not during post-route optimization.
To enable overlaps during post-route optimization as well, set the following: setOptMode -fixHoldAllowOverlap true
Control hold time margins: beyond a certain hold margin, the delay buffer addition rate increases exponentially.
Try useful skew optimization for RAMs and register files.
Checking Timing
You can use the following commands to report setup and hold time violations after post-CTS optimization.
timeDesign -postCTS -outDir postctsOptTimingReports
timeDesign -postCTS -hold -outDir postctsOptTimingReports
Detailed Routing
Improving Timing during Routing
Routing Command Sequence
PostRoute Extraction
Checking Timing
After post-CTS optimization, there should be few, if any, timing violations left in the design. The goals of detailed routing
include:
Routing the design without DRC or LVS violations. NanoRoute performs a DRC and cleans up violations.
Routing the design without degrading timing or creating signal integrity violations.
NanoRoute can perform timing-driven and SI-driven routing concurrently. NanoRoute routes the nets that are
critical for signal integrity so as to minimize cross-coupling between them, which would otherwise lead to post-route
signal integrity issues.
PostRoute Extraction
All nets are now routed. It is important to now set the extraction mode to post-route and specify the extractor to use.
Specify the engine to be postRoute: setExtractRCMode -engine postRoute
Set -effortLevel so extractRC uses your desired extractor: setExtractRCMode -effortLevel low|medium|high
low - Invokes the native detailed extraction engine. This is the same as specifying the "-engine postRoute"
setting.
medium - Invokes the Turbo QRC (TQRC) extraction engine. TQRC performance and accuracy fall
between the native detailed extraction engine and the IQRC engine. This engine supports distributed
processing. TQRC is recommended for process nodes below 65nm. Note: This setting does not require a
QRC license.
high - Invokes the Integrated QRC (IQRC) extraction engine.
IQRC provides superior accuracy compared to TQRC. IQRC is recommended for extraction after
ECO. In addition, IQRC supports distributed processing. Note: IQRC requires a QRC license.
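Putting the settings above together, a post-route extraction setup for a sub-65nm design without a QRC license might look like this sketch:

```tcl
# Post-route extraction using the TQRC engine (no QRC license required)
setExtractRCMode -engine postRoute
setExtractRCMode -effortLevel medium
extractRC
```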
Checking Timing
Use the following commands to do a post-route timing check:
timeDesign -postRoute -outDir postrouteTimingReports
timeDesign -postRoute -hold -outDir postrouteTimingReports
If timing jumps at this point compared to timing before routing, check the following:
Are the post-route RC scaling factors set properly?
Is the routing topology similar between trialRoute and NanoRoute? Compare the same paths between the
post-CTS and post-route databases and look for large differences in loads.
Post-Route Optimization
Data Preparation
Post-Route Optimization Command Sequences
Checking Timing
Optimizing With 3rd Party SPEF or SDF
During post-route optimization, there should be very few violations that need correction. The primary sources of these
timing violations include:
Inaccurate prediction of the routing topology during pre-route optimization due to congestion-based detour
routing
Incremental delays due to parasitics coupling
Since the violations at this stage are due to inaccurate modeling of the final route topology and the attendant parasitics,
it is critical at this point not to introduce any additional topology changes beyond those needed to fix the existing
violations. Making unnecessary changes to the routing at this point can lead to a scenario where fixing one violation
leads to the creation of others. This cascading effect creates a situation where it becomes impossible to close on a final
timing solution with no design rule violations.
One of the strengths of post-route optimization is the ability to simultaneously cut a wire and insert buffers, create the new
RC graph at the corresponding point, and modify the graph to estimate the new parasitics for the cut wire without redoing extraction.
In addition to the timing violations caused by inaccurate route topology modeling, the parasitics cross-coupling of
neighboring nets can cause the following problems that need to be addressed in high speed designs:
An increase or decrease in incremental delay on a net due to the coupling of its neighbors and their switching
activity.
Glitches (voltage spikes) that can be caused in one signal route by the switching of a neighbor resulting in a
logic malfunction.
These effects need to be analyzed and corrected before a design is completed. They are magnified in designs with
small geometries and in designs with high clock speeds.
Data Preparation
SI optimization requires certain preparation:
Make sure cdB libraries are specified for each cell for each delay corner.
You must be in On Chip Variation (OCV) mode to see simultaneous clock pushout / pullin. Enable this using
"setAnalysisMode -analysisType onChipVariation"
Confirm your CeltIC settings match your signoff tool
Confirm coupling filter settings match your signoff tool
Enable SI CPPR through "set_global timing_enable_si_cppr true" (this is the default)
Additionally, consider the following techniques if you have difficulty achieving signal integrity closure on your design.
Watch for routing congestion during floorplanning and especially after detailed routing.
Consider running the congOpt command on the design to eliminate local hot spots or adjust your
floorplan
Use NanoRoute advanced timing with SI-driven routing options during detailed routing. For example:
setNanoRouteMode -routeWithTimingDriven true
setNanoRouteMode -routeWithSiDriven true
Fix transition time violations.
Slow transitions introduce a larger delay penalty or incremental delay.
Prepare all of the required noise models.
Highly-accurate CeltIC analysis requires the use of accurate noise models like cdB, ECHO, or xILM. The
use of blackbox (missing) models could lead to a significant number of false violations.
You must recalibrate the noise library with each release of CeltIC. New features may not function
properly when used with an old .cdB file.
Characterize your memory and analog block using the make_cdb utility.
Create XILM models for sub blocks to model their noise sensitivity at the chip level.
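A minimal SI-preparation sequence based on the bullets above (a sketch; CeltIC and coupling-filter settings still need to be matched to your signoff tool separately):

```tcl
# Analysis and routing modes for SI closure
setAnalysisMode -analysisType onChipVariation   ;# OCV for clock pushout/pullin
set_global timing_enable_si_cppr true           ;# SI CPPR (the default)
setNanoRouteMode -routeWithTimingDriven true    ;# timing-driven detailed routing
setNanoRouteMode -routeWithSiDriven true        ;# SI-driven detailed routing
```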
The Advanced Analysis Engine (AAE) is a new unified delay calculation engine that simultaneously computes base and
SI delays. AAE can perform incremental SI delay computation during timing optimization, and hence provides accurate
feedback to the optimization engine. AAE is also capable of multithreading, which gives substantial runtime gains in
timing closure.
Unlike the default flow, AAE-based optimization does not require four optimization steps:
Default:
optDesign -postRoute
optDesign -postRoute -hold
optDesign -postRoute -si -hold
optDesign -postRoute -si
AAE (recommended):
optDesign -postRoute
optDesign -postRoute -hold
To run AAE based SI fixing:
<Load routed data base with MMMC setup>
setAnalysisMode -analysisType onChipVariation -cppr both
setDelayCalMode -engine default -SIAware true
optDesign -postRoute
optDesign -postRoute -hold
For more information on the AAE flow see the application note: PostRoute Optimization Using the Advanced Analysis
Engine (AAE).
Tips for Performing Post-Route Optimization on High Performance Designs
For multi-VT designs, there can be large WNS/TNS jumps after routing due to the sensitivity of HVT cells, and
the runtime of post-route optimization can be significantly impacted by a large TNS. Using
optLeakagePower -fixTimingOnly prior to optDesign -postRoute can massively speed up closure. This only
works for multi-VT libraries and does not have much impact when the design already consists mainly of LVT cells.
The registers are by default marked + FIXED by clockDesign and cannot be resized, even though post-route
optimization can resize flops. To temporarily set them as + PLACED during optDesign, you can apply:
setOptMode -unfixClkInstForOpt true
The local density is by default limited to 98%. If optimization is prematurely exiting due to local placement
hotspots, this can be increased to 100% using:
setOptMode -maxLocalDensity 1.0
Make sure you are in OCV analysis mode (MMMC setup required) to calculate pushout/pullin timing properly:
setAnalysisMode -analysisType onChipVariation. Note that if the design is not in OCV mode, clock SI
pushout/pullin will not be correct.
Make sure your extraction filters correlate to your signoff extraction:
When using IQRC the filters typically can be set to the exact signoff values.
Make sure to use setExtractRCMode -capFilterMode relAndCoup.
StarRC has no total_c_th equivalent, so set this to 0 for these flows.
For post-route optimization, you may want to force the tool to only allow legal resizing to avoid placement
legalization and thus limit routing changes: setOptMode -postRouteAllowOverlap false
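The individual settings above can be collected into one post-route prologue. This is a sketch: apply only the pieces relevant to your design, and note that optLeakagePower -fixTimingOnly helps only in multi-VT flows.

```tcl
# Post-route optimization settings for high performance designs
setAnalysisMode -analysisType onChipVariation  ;# OCV needed for SI pushout/pullin
setOptMode -unfixClkInstForOpt true            ;# let optDesign resize FIXED clock flops
setOptMode -maxLocalDensity 1.0                ;# lift the 98% local density cap
setOptMode -postRouteAllowOverlap false        ;# legal resizing only; limit route changes
optLeakagePower -fixTimingOnly                 ;# multi-VT: recover timing via VT swaps
optDesign -postRoute
```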
Checking Timing
AAE is a fast SI engine: it does not perform circuit simulations to compute SI delays. As a result, it does not
completely correlate with CeltIC (the signoff SI engine). AAE is tuned to work well with the optimization engine in
order to provide better QoR, so it tends to be pessimistic compared to CeltIC.
Because AAE is already pessimistic compared to CeltIC, it does not need the 'setSIMode -analysisType pessimistic'
setting for PrimeTime correlation. Please use 'setSIMode -analysisType default' in AAE and CeltIC flows for better
correlation.
The final optimization summary is an AAE-estimate-based report only. So if you want to check your QoR, you need
to run a SignalStorm+CeltIC based timeDesign between setup and hold optimization, or at the end of hold
optimization. For example:
timeDesign -postRoute -si -outDir final_setup
timeDesign -postRoute -si -hold -outDir final_hold
If you are using Encounter Timing System (ETS) for signoff timing make sure the CeltIC settings are consistent between
EDI System and ETS.
If you are using PrimeTime-SI for signoff set "setSIMode -analysisType pessimistic" prior to timeDesign for better
correlation.
Optimizing With 3rd Party SPEF or SDF
If you are using a 3rd Party extraction tool, make sure the RC scaling factors are set properly within EDI System. If
you are using a 3rd Party timing analysis tool, make sure it correlates with EDI System by verifying SDC constraints
are applied consistently between the tools. The delay calculation and timing analysis results between EDI System
and the 3rd Party tool should also correlate.
If timing violations occur when using SPEF from a 3rd Party extractor, you can import the SPEF into EDI System and
perform optimization. You must import a SPEF for each RC corner. The flow is:
spefIn rc_corner1.spef -rc_corner rc_corner1
spefIn rc_corner2.spef -rc_corner rc_corner2
...
optDesign -postRoute [-si] [-hold] -outDir spefFlowTimingReports
The -si and -hold options are optional in the above command.
Use -hold if you need to perform hold fixing based on the SPEF.
Use -si if you need to perform SI fixing based on the SPEF.
optDesign will use the SPEF for initial timing to determine the best location to optimize the paths.
optDesign can also optimize based on SDF from a 3rd Party timing analyzer. This is useful when the timing analyzer
reports timing violations EDI System did not. The flow to optimize based on SDF is:
read_sdf -view view1 sdf1.sdf
read_sdf -view view2 sdf2.sdf
optDesign -postRoute -useSDF [-hold] -outDir sdfFlowTimingReports
Note in order to use SDF information in MMMC analysis mode, you should provide an SDF file for each active
view in the design. If a view was not given an SDF file, EDI System will run detail extraction and delay calculation
for that specific view.
optDesign will run extraction to calculate slew values because SDF does not contain slews. The delays in the SDF
will be used, but the slews must be generated. You can also use spefIn to avoid the extraction.
Additional Resources
Following are additional application notes, training and documentation related to timing closure. These can be found at
http://support.cadence.com:
NanoRoute Recommended Options with emphasis on 32nm and below advance node/technology (EDI)
Encounter Digital Implementation System Foundation Flows Guide
How to Generate Scaling Factors for RC Correlation
PostRoute Optimization Using the Advanced Analysis Engine (AAE)
Guide to Clock Tree Synthesis (CTS) in a block or flat chip using EDI system
EDI to ETS Correlation - Guidelines on How to Debug and Correlate Timing and SI results