
Institut für Integrierte Systeme

Integrated Systems Laboratory

Department of Information Technology and Electrical Engineering

VLSI II: Entwurf von hochintegrierten Schaltungen


227-0147-00

Training 1

SoC Encounter for Designers II


Prof. Dr. H. Kaeslin
Dr. N. Felber

SVN Rev.: 1016


Last Changed: 2013-10-15

Reminder:
With the execution of this training you declare that you understand and accept the regulations about
using CAE/CAD software installations at the ETH Zurich. These regulations can be read anytime at
http://dz.ee.ethz.ch/regulations/index.en.html.
1 Overview

Unlike other exercises in the VLSI lectures, the back-end design flow requires you to learn how to
use a commercial Electronic Design Automation (EDA) tool, in our case Cadence SoC Encounter
from Cadence Design Systems. These exercises are therefore called ’Trainings’ and will teach you
the basics of Cadence SoC Encounter so that you can use it for your semester projects.
There will be three trainings:

• Training 1
Floorplanning, placement, clock tree synthesis, optimization, routing and timing analysis with
Cadence SoC Encounter.
• Training 2
Determining power consumption, IR drop analysis.
• Training 3
Tape-out preparation, performing Design Rule Check (DRC) and Layout Versus Schematic
(LVS) on your final database.

Students who plan to work on an ASIC semester project should make sure to visit all three trainings.

1.1 About the Style

We will try to use a number of different styles to identify different types of actions. These are summa-
rized below:
Student Task: Parts of the text that have a gray background, like the current paragraph, indicate
steps required to complete the exercise.

Actions that require you to select a specific menu will be shown like the following:
menu→sub-menu→sub-sub-menu
Whenever there is an option or a tab that can be found in the current view/menu we will use a BUTTON
to indicate such an option.
Throughout the exercise you will be asked to enter certain commands using the command line [1]. The
following is an example of the Linux command line.
sh > command to be entered on the linux command line

Other commands will be entered on the command line of the Cadence SoC Encounter
tool, such as:

enc > this command is an encounter command

[1] There are many reasons for using a command line. Some functionality cannot be accessed through GUI commands,
and in some cases, using the command line will be much faster. Most importantly, things you enter on the command line
can be converted into a script and executed repeatedly.

2 Introduction

In this training we will start with a structural Verilog design netlist (from synthesis) and create step by
step a physical layout that can be manufactured. To keep runtimes reasonably low, we will use an
example design with a (slightly) lower complexity than most student design projects.

2.1 Example Design

The example design is based on the FIR filter that we have been using in the past exercises. The filter
has been changed to include several pipelined filter stages as shown in the block diagram below [2].

[Figure: block diagram of the example design. Hierarchy labels: filter, filter_top, filter_chip.
The filter consists of the pipelined stages filter_stage1 to filter_stage8, each with a LUT and
16-, 32- and 48-bit data paths; the input of the first stage is tied to '0'. RAM macro:
SY180_2048X16X1CM8 (further label: r256x72tb300xo) with the signals RamAddrxD, RamWDxD and RamRDxD.
Chip ports: ClkxCI, ResetxRBI, ScanEnxTI, RamTestxTI, DataInxDI, DataInReqxSI, DataInAckxSO,
DataOutxDO, DataOutReqxSO, DataOutAckxSI.]

Each filter stage contains a large multiplier, a look-up table and an accumulator. Note that the input of
the first stage is tied to constants and therefore greatly simplified. The following is a short description
of all pins of the circuit:

[2] The filter is basically useless and has only been engineered as an example circuit suitable for the exercise.

Pin Descriptions
Name            Bits  Dir  Description
ClkxCI          1     In   Clock input
ResetxRBI       1     In   Reset input, active low signal, 0: Reset
ScanEnxTI       1     In   Scan Enable for testing, 1: Scan
RamTestxTI      1     In   Ram bypass control, 1: Test (RAM bypassed)
DataInxDI       16    In   16-bit data input
DataInReqxSI    1     In   Request signal for data input
DataInAckxSO    1     Out  Acknowledge signal for data input
DataOutxDO      16    Out  16-bit data output
DataOutReqxSO   1     Out  Request signal for data output
DataOutAckxSI   1     In   Acknowledge signal for data output

3 Getting Started

You will need a terminal program to type in commands throughout this exercise. On the computers
in room ETZ D61.2 you can get a terminal by opening the menu in the top left corner and selecting
Applications→Accessories→Terminal.

Student Task 1:
• Change to your home directory and install the training files with the script provided:
sh > cd ~
sh > /home/vlsi2/t1/install_t1

• Change to the design directory


sh > cd training_1

The copied files and folders are arranged in a certain structure which is described in the next sec-
tion.

3.1 Directory Structure

The following figure shows the directory structure for a design directory that was created by the
cockpit tool developed by the Design Zentrum (DZ) of ETH Zurich.

design/
  .cockpitrc      Configuration for the cockpit
  calibre/        Final layout, DRC and LVS
  docs/           Links to documents
  encounter/
    out/          Final output files: netlist, layout, timing (Verilog, GDSII, SDF)
    save/         Save files for Encounter (Encounter native format)
    scripts/      Example scripts, run scripts (TCL)
    src/          Input source files: netlist, constraints, io placement
      sample/     Sample input files
    tech/         Links to technology files, etc.
      lef/        Links to abstracts and technology
      lib/        Links to timing libraries
  modelsim/       Simulation tool
  simvectors/     Stimuli and expected responses
  sourcecode/     VHDL sourcecode
  synopsys/       Synthesis environment
  tetramax/       Test vector generation, test coverage

In this structure, there are five subdirectories for Cadence SoC Encounter. It is strongly
recommended to use them in the following way:

out Place all final data to be exported from Cadence SoC Encounter in this directory. This
includes the final netlist (the initial netlist gets modified by clock tree insertion, optimization etc.),
layout and delay files that will be used for postlayout simulation and/or physical verification and
chip finishing. A sample script that generates all these files is provided (scripts/exportall.tcl).
save Put all Cadence SoC Encounter save files, i.e. files in native Cadence SoC Encounter
format, in this directory.
scripts Contains TCL scripts. By default several example scripts for common tasks are provided. It
is highly recommended to develop a run script that contains all the commands used for your
design.
src All user input files should be placed here. These include the initial Verilog netlist, the I/O place-
ment file, timing constraints file and clock tree definition file (all will be explained later in section
3.2).
tech Holds links to technology specific files. Cockpit manages this directory automatically.

3.2 Input Files

The input files required for back-end design with Cadence SoC Encounter can be divided into
two categories:
• Design files that describe (or are closely related with) the circuit, first of all the Verilog netlist of
our synthesized design.
• Technology files that describe the technology itself as well as libraries of standard building
blocks implemented in this technology.

Let’s start with the first category.

3.2.1 Verilog Netlist

The Verilog netlist we obtain from synthesis contains standard cells, functional I/O pads and their
interconnection information. While the functionality including scan circuitry is already complete, some
special cells are still missing:
• Supply pads to provide power and ground to the core (pads ’VCCKD’ and ’GNDKD’) and to the
padframe (pads ’VCC3IOD’ and ’GNDIOD’).
• Corner pads that need to be placed in the corners of the padframe to complete the power lines
running inside the padframe (pad CORNERD).
Due to the arrangement we have with our ASIC manufacturer, student designs are strictly limited in
size. As a consequence at most 56 pads (not including the 4 corner pads) can be placed in the
padframe. Furthermore, to ease chip testing on the ASIC tester two predefined power schemes have
been established:
1. 40 signal pads, 16 supply pads
Take a look at the following web page for an illustration of the power schemes and to obtain further
information on constraints for the semester design projects.
http://www.eda.ee.ethz.ch/index.php/UmcL180#Mini.40sic

With all this information we are now ready to add the missing corner and supply pads to our Verilog
netlist.
A typical Verilog netlist that you will obtain from Synopsys Design Compiler will contain many
levels of hierarchy. Each level of hierarchy is enclosed between the

module name ( pin names separated by comma )


...
endmodule

statements, where ’name’ refers to the name of the module (module is the Verilog equivalent of an
entity in VHDL). In our case we need to add the pads to the top-level module which contains the rest
of the I/O pads. The top-level design is almost always the last module definition in a Verilog file [3].

Student Task 2:
• Copy the Verilog netlist to encounter/src/ in order to have a clean copy of the initial netlist
even if synthesis is rerun.
sh > cd encounter/src/
sh > cp -p ../../synopsys/netlists/filter_chip.v \
filter_chip.v.initial

The file specialpads.v contains four corner pads and 16 supply pads corresponding to
power scheme 1. As our design uses power scheme 1, no changes are required to this
file. For power scheme 2, we would have to comment out the eight additional supply pads
(comments in Verilog start with //).
What remains to be done is to add the contents of specialpads.v to the initial netlist at the
right point, i.e. where the other pads are.
• Using a text editor [a], open filter_chip.v.initial and find the definition of the top-level module
’chip’ by searching for:
module chip
Below this declaration you should see lines that instantiate the pads. Insert the contents of
specialpads.v at this point. As long as you are in the module body, it does not matter where
exactly you insert them.
• Save the file as filter_chip.v and exit the text editor.

[3] The content of a module needs to be defined before it can be instantiated by a different module. Consequently the
top-level module is the last to be defined. However, not all Verilog files need to be hierarchical, and a design can also be
spread over multiple files.
[a] There are many text editors you can use. There are terminal based editors (vi, vim, nvi, joe, jed, pico, nano etc.),
editors that are mainly terminal based but have a simple GUI (emacs, xemacs, gvim etc), and GUI based editors
(mousepad, gedit, nedit, kate etc). Out of these emacs, vi (and derivatives), and nedit are the most advanced
editors.

Remark: In the future you can use a small Perl script to add the specialpads to the initial netlist, i.e.
sh > ./insert_specialpads ../../synopsys/netlists/filter_chip.v \
./specialpads.v > filter_chip.v

inserts the contents of specialpads.v into the last module defined in ../../synopsys/netlists/filter_chip.v
and writes the modified netlist to filter_chip.v.
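
To make the idea behind such a script concrete, the following is a minimal Python sketch of the same operation. It is not the insert_specialpads script itself; it assumes that the last module in the file is the one closed by the last endmodule keyword and simply places the extra lines directly in front of it. The function name, the toy netlist and the instance name corner_tl are made up for illustration.

```python
import re


def insert_into_last_module(netlist: str, extra: str) -> str:
    """Return `netlist` with `extra` placed just before the last `endmodule`."""
    # Every line that starts an "endmodule" keyword; the last one closes
    # the last module defined in the file.
    matches = list(re.finditer(r"^[ \t]*endmodule\b", netlist, re.MULTILINE))
    if not matches:
        raise ValueError("no endmodule found")
    pos = matches[-1].start()
    return netlist[:pos] + extra.rstrip("\n") + "\n" + netlist[pos:]


# Toy netlist (hypothetical, heavily shortened).
netlist = """module filter_top ( a );
  input a;
endmodule

module chip ( ClkxCI );
  input ClkxCI;
endmodule
"""
pads = "  CORNERD corner_tl ();\n"  # hypothetical instance name

result = insert_into_last_module(netlist, pads)
print(result)
```

Inserting at the end of the module body is one possible choice; as noted above, the exact position inside the module body does not matter.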

3.2.2 I/O File

After the last step our Verilog netlist contains all pads. However, there is no information that actually
tells the tool where each pad should be placed. The pad placement is very important as it directly
determines the PCB layout [4]. In our case, we want all designs to share common power and ground
pad locations so that a single test board can be used on our ASIC tester. For practical reasons we
have decided to use a 56-pin package for all designs. So even though the chip has only 48 physical
pins, it will be placed in a package that contains 56 pins [5]. Depending on the power configuration,
a different bonding scheme will be used. These two configurations can be seen on the following
webpage:
http://www.eda.ee.ethz.ch/index.php/UmcL180#Mini.40sic

The cockpit will copy sample I/O files automatically to the src/sample directory [6]. All lines starting
with '#' are comments. The file consists of two main sections: globals and iopad.

(globals
  [global definitions]
)
(iopad
  (topleft
    [pads that are on the top left]
  )
  (left
    [pads that are on the left side]
  )

  [definitions for other sides]

)

[4] A good pinout could simplify the routing on the PCB, allow you to use fewer layers and result in fewer parasitics.
[5] 8 pins will be left unconnected.
[6] For this technology there will be four files. There will be two template files chip.io-template and chip-ep.io-template
for the normal and extended power configuration respectively. These files have all the required power connections in
place, and the data sections are commented out. There are also two example files that have fictional I/O placement
where all pins are defined.

For us the relevant part is the iopad section. This part contains eight subsections that define the
names of the pad instances and their locations on the four sides and in the four corners. We do not have
to touch the corner specifications [7] as they will be the same for all designs. We have to distribute the
pads among the four sides of the chip: top, right, bottom, left. If you look at the sample file you
will see that for each pad there is a single line entry in the following form

(inst name="NAME_OF_PAD" offset=OFFSET_VALUE ) # pin no: PIN_NUMBER

The last part following # is a comment, it is there just for your information. Regardless of the power
scheme you are using, we will use the same 56 pin package as illustrated in the webpage above.
The PIN_NUMBER is just a reminder to show which particular location is being defined. The location
is specified using the OFFSET_VALUE. Cadence SoC Encounter uses a coordinate system that
places the coordinate (0,0) at the bottom-left corner as shown in the figure below:

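[Figure: chip outline with the sides top, right, bottom and left and the corners topleft, topright,
bottomleft and bottomright. The origin (0,0) is at the bottom-left corner. The labels Side and
Offset mark the side a pad belongs to and its offset along that side.]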

On the left and right side the pads will be ordered from bottom-to-top, and on the top and
bottom side the pads will be ordered from left-to-right. This ordering can be quite confusing, as
it is neither clockwise, nor counterclockwise. Therefore the aforementioned comments showing the
actual pin numbers will be very useful.
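
The relation between package pin numbers and the order of the entries in the I/O file can be sketched in a few lines of Python. The sketch assumes the pin numbering read off the pin diagram of this training: pins 1 to 14 on the left side from top to bottom, 15 to 28 on the bottom from left to right, 29 to 42 on the right from bottom to top and 43 to 56 on the top from right to left. The function name is made up for illustration.

```python
def file_order(side: str) -> list[int]:
    """Package pin numbers of one side in the order the I/O file lists them."""
    # Assumed package numbering: counterclockwise, starting at the top of
    # the left side.
    counterclockwise = {
        "left": range(1, 15),     # numbered top to bottom
        "bottom": range(15, 29),  # numbered left to right
        "right": range(29, 43),   # numbered bottom to top
        "top": range(43, 57),     # numbered right to left
    }
    pins = list(counterclockwise[side])
    # The I/O file orders left/right bottom-to-top and top/bottom
    # left-to-right, so two sides run against the package numbering.
    if side in ("left", "top"):
        pins.reverse()
    return pins


for side in ("top", "right", "bottom", "left"):
    print(side, file_order(side))
```

Under this assumption the left and top sides appear in the file with descending pin numbers, while the bottom and right sides appear with ascending pin numbers.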
[7] topleft, topright, bottomleft, bottomright

The OFFSET_VALUEs given in the template represent fixed locations for the given pad. It is very
important that you do not change these values, as the chip-finishing part will rely on the pads being
located exactly at these locations.
You can assign your pads by writing the name of each pad into the corresponding NAME_OF_PAD.
The name of the pad is the name of the instance in the Verilog file. For example, assume that you
are using the standard power scheme and your clock signal is assigned to a pad named ClkxCI_PAD. In
your Verilog file you would have the following entry for this pad:

XMD ClkxCI_PAD ( .I(ClkxCI) [other pin definitions] )

If you now want to place this pad on pin number 48 of your package, you will find the subsection top
in the I/O file and edit the line for pin 48:

...
(iopad
...
(top
...
(inst name="ClkxCI_PAD" offset= 864.28 ) # pin no: 48
...
)
...
)

Be careful not to modify the offset value while you are editing the I/O file. Since we use a fixed
bonding scheme for the power and ground pins, all we need to do is extract the instance names of
all our signal pads and insert each one into the inst name="" statement whose OFFSET_VALUE
corresponds to the desired location. It is also recommended
to put the clock pin on pin number 48 if possible. All new test boards will make sure that pin 48
has the best signal quality.
Preparing the I/O file from scratch can be a lengthy and tedious task. To avoid unnecessary work
during this exercise we will start with an almost complete I/O file, but before doing so we will describe
the full procedure recommended when starting from scratch:

1. Start Cadence SoC Encounter and proceed to design import [8] by selecting
Design→Import Design. In this form make sure that the IO ASSIGNMENT FILE is empty.
2. If everything works well, the design will be loaded. Now we can write out a template file that will
contain all the names of the pads. Use Design→Save→I/O File ... to save an I/O file
src/chip-sequence.io. You can select the SEQUENCE checkbox, however it is not imperative.
What we need is only the names of the pads.
3. Copy the template I/O file src/sample/chip.io-template to src/chip.io. As noted earlier, this file
includes all offset= statements, and all statements for corner and supply pads.
4. Using a text editor open the files src/chip.io and src/chip-sequence.io. You need to move the
pad names from the file src/chip-sequence.io to the correct positions in the file src/chip.io.
5. All entries for data pins in the template file are by default commented out using the '#' character.
Do not forget to remove the comment character for the pads you are using.
[8] Importing the design will be covered in detail in Chapter 4.
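
Steps 4 and 5 can also be scripted. The following Python sketch fills pad names into template lines by their pin number comment, removes the leading comment character and leaves the offset untouched. It assumes that a commented-out template entry looks like `# (inst name="" offset= 864.28 ) # pin no: 48`; the real template may differ in detail. The function name and the second offset value (950.00) are made up for illustration.

```python
import re

# One template entry, optionally commented out with a leading '#'.
ENTRY = re.compile(
    r'^(?P<lead>\s*#?\s*)\(inst\s+name="[^"]*"'
    r'(?P<rest>.*#\s*pin no:\s*(?P<pin>\d+)\s*)$'
)


def assign_pads(template: str, pin_to_pad: dict[int, str]) -> str:
    """Write pad names into the entries of the given pins and uncomment them."""
    out = []
    for line in template.splitlines():
        m = ENTRY.match(line)
        if m and int(m.group("pin")) in pin_to_pad:
            pad = pin_to_pad[int(m.group("pin"))]
            indent = m.group("lead").replace("#", "")  # drop the comment char
            # Everything after the name (offset, pin comment) is kept as is.
            line = f'{indent}(inst name="{pad}"{m.group("rest")}'
        out.append(line)
    return "\n".join(out) + "\n"


template = (
    "(top\n"
    '# (inst name="" offset= 864.28 ) # pin no: 48\n'
    '# (inst name="" offset= 950.00 ) # pin no: 47\n'
    ")\n"
)
result = assign_pads(template, {48: "ClkxCI_PAD"})
print(result)
```

Entries whose pin number is not in the mapping stay commented out, which matches the advice to uncomment only the pads you are using.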

Student Task 3:
• Now, for this exercise you can start with the almost complete I/O file src/chip.io-incomplete
instead of the template file. This file has all the pads placed properly with the exception of
the 16 pads of the input bus DataInxDI which are still missing.
Furthermore the file src/filter_chip.sequence.io mentioned above has already been generated
for you.
The desired I/O assignment is depicted in the figure below and can also be found in the file
src/filter_chip.io.ps [a].
• Create the complete I/O file and save it as src/filter_chip.io.
[a] Postscript viewers were very common in the earlier days. You can use gv, kghostview, or evince to view this file.

You can use the utility src/io2ps.pl to generate a postscript file from your I/O file. This utility will also
verify that you have used the correct offset locations in your I/O file, and will report errors. For best
results, you should also provide the Verilog netlist file, which will enable the script to make even more
checks.
sh > ./io2ps.pl filter_chip.io > filter_chip.pin_diagram.ps

The src/io2ps.pl utility uses a configuration file with the extension .pads. By default the file
src/io2ps.pads will be used. If you are planning to use the extended power scheme, you will have to add the
configuration file src/io2ps-ep.pads to the command as well.
[Figure: pin diagram of the desired I/O assignment for the 56-pin package. Pins 1 to 14 are on the
left side (top to bottom), 15 to 28 on the bottom (left to right), 29 to 42 on the right (bottom to
top) and 43 to 56 on the top (right to left).
Left, pins 1 to 14: pad_vcc_p1, DataInxDI_PAD_9, DataInxDI_PAD_8, DataInxDI_PAD_7, DataInxDI_PAD_6,
DataInxDI_PAD_5, pad_gnd_c1, pad_vcc_c1, DataInxDI_PAD_4, DataInxDI_PAD_3, DataInxDI_PAD_2,
DataInxDI_PAD_1, DataInxDI_PAD_0, pad_gnd_p1.
Right, pins 42 down to 29: pad_gnd_p3, DataOutxDO_PAD_11, DataOutxDO_PAD_10, DataOutxDO_PAD_9,
DataOutxDO_PAD_8, DataOutxDO_PAD_7, pad_vcc_c3, pad_gnd_c3, DataOutxDO_PAD_6, DataOutxDO_PAD_5,
DataOutxDO_PAD_4, DataOutxDO_PAD_3, DataOutxDO_PAD_2, pad_vcc_p3.
Top, pins 43 to 56 (pads on this side): DataOutxDO_PAD_12 to DataOutxDO_PAD_15, DataInAckxSO_PAD,
DataInReqxSI_PAD, RamTestxTI_PAD, ScanEnxTI_PAD, ResetxRBI_PAD, ClkxCI_PAD, pad_gnd_p4, pad_gnd_c4,
pad_vcc_p4, pad_vcc_c4.
Bottom, pins 15 to 28 (pads on this side): DataInxDI_PAD_10 to DataInxDI_PAD_15, DataOutxDO_PAD_0,
DataOutxDO_PAD_1, DataOutReqxSO_PAD, DataOutAckxSI_PAD, pad_gnd_c2, pad_vcc_c2, pad_vcc_p2, pad_gnd_p2.
The exact pin of each top and bottom pad is given in src/filter_chip.io.ps.]

3.2.3 Timing Constraints

Just as for synthesis, we need to specify timing constraints for the backend design with Cadence
SoC Encounter.
With decreasing process geometries the impact of placement and routing on timing, power, etc. is
steadily increasing. Therefore, timing analysis and optimization have become very important in order
to arrive at a layout that (still) satisfies all requirements.
As Cadence SoC Encounter supports most of the more common Synopsys Design Compiler
commands/constraints it should be rather straightforward to create an appropriate timing constraints
file based on the constraints used for synthesis.

Student Task 4:
• There is an example constraint file src/sample/chip.sdc-sample that contains the most
commonly used commands along with many useful and important comments.
Copy this file to src/chip.sdc and modify it so that the following constraints get set (and
nothing else!):
– Define a 125 MHz clock
– Specify 3.5 ns input delay for all inputs
– Specify 5.0 ns output delay for all outputs
– Specify an input transition time of 0.8 ns at all inputs
– Specify a 15 pF output load for all outputs
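
The numbers above translate into a clock period and into the time that remains inside the chip for paths starting at an input or ending at an output. The following Python sketch works this out and prints constraint lines using standard SDC command names. It is only an illustration: the clock name clk, the use of all_inputs/all_outputs and the units (ns and pF) are assumptions, so compare the result with the comments in src/sample/chip.sdc-sample before using anything like it.

```python
def timing_budget(freq_mhz: float, in_delay_ns: float, out_delay_ns: float):
    """Return (period, time left after input delay, time left before output delay)."""
    period_ns = 1000.0 / freq_mhz
    return period_ns, period_ns - in_delay_ns, period_ns - out_delay_ns


def sdc_lines(clock_port, freq_mhz, in_delay_ns, out_delay_ns, in_tran_ns, load_pf):
    """Build SDC-style lines; clock name and units are assumptions."""
    period_ns = 1000.0 / freq_mhz
    return [
        f"create_clock -name clk -period {period_ns:g} [get_ports {clock_port}]",
        f"set_input_delay {in_delay_ns:g} -clock clk [all_inputs]",
        f"set_output_delay {out_delay_ns:g} -clock clk [all_outputs]",
        f"set_input_transition {in_tran_ns:g} [all_inputs]",
        f"set_load {load_pf:g} [all_outputs]",
    ]


period, input_budget, output_budget = timing_budget(125.0, 3.5, 5.0)
print(f"period {period} ns, input paths {input_budget} ns, output paths {output_budget} ns")
lines = sdc_lines("ClkxCI", 125.0, 3.5, 5.0, 0.8, 15.0)
print("\n".join(lines))
```

With a 125 MHz clock the period is 8 ns, which leaves 4.5 ns for logic behind the inputs and 3 ns for logic in front of the outputs.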

3.2.4 Technology Files

The tech directory and its two subdirectories contain technology files that describe the technology
itself as well as libraries of standard building blocks implemented in this technology, i.e. standard
cells, pads, RAM/ROM.

• Technology files (UMCL180)
  lef/header6_V55.lef  Base technology description, defines metal layers, vias, spacing rules,
                       routing
  umcL180.capTbl       Table used to extract parasitic capacitances and resistances for signal and
                       power wires.
  streamout.map        Layer mapping table used when exporting the final layout in GDSII format.
• Library files (standard cells, pads, macro-cells)
  lef/*.lef            Physical description, shape and allowed orientation of cells, layer and shape of pins,
                       blockages, antenna information, ...
  lib/*.lib            Functional description, timing and power information, maximum load/fanout or
                       transition time allowed, ...

3.2.5 Macro-cells

The macro-cells for the umcL180 process are created using dedicated memory compilers. The specific
memory compiler we have access to is able to create five different types of macro-cells with
various capacities:

• SU180 : single-port static RAM
• SJ180 : dual-port [9] static RAM
• SY180 : single-port register-file [10]
• SZ180 : two-port [11] register-file
• SP180 : via programmable ROM

The following parameters are used for the macro-cells:


• words
Number of words in the memory
• sub-word size
Number of bits within a sub-word of the memory. The sub-word is the smallest unit used for
data access in the macro-cell [12].
• number of sub-words per data word
This parameter allows creating multiple sub-words. Each sub-word can be written to separately.
For example, a 32-bit RAM can be configured as having a single 32-bit sub-word, two 16-bit
sub-words, four 8-bit sub-words and so on.
• column or block multiplexer
This parameter affects the geometry of the macro-block. This can have significant influence
on the performance of the macro-block. There is no general rule to determine this parameter.
Once the memory requirements are known, all possible geometries will be considered and the
most suitable one will be determined.
There are several available macro cells, their datasheets can be found under:

/usr/pack/designkits-1.0-ma/umc_L180/faraday/gen/memaker/200901.1.1/datasheet.dz

If none of the available macro-cells suits your needs, more can easily be generated on demand. Please
contact the Microelectronics Design Center for this purpose.
Our example design uses a single-port memory (type SY180) named SY180_2048X16X1CM8. This RAM has 2048
words of 16 bits each (single sub-word) and a block multiplexer of 8. All necessary preparations to
work with this macro-cell have already been done, so you do not need to do anything additional for
this exercise.
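
The macro-cell name appears to encode the parameters listed above. The following Python sketch splits such a name into its fields and computes the capacity. The naming pattern (type, words, sub-word size, number of sub-words, multiplexer after "CM") is an assumption inferred only from the one example SY180_2048X16X1CM8; consult the datasheets for the authoritative naming. The class and function names are made up for illustration.

```python
import re
from typing import NamedTuple


class Macro(NamedTuple):
    family: str        # SU180, SJ180, SY180, SZ180 or SP180
    words: int         # number of words
    subword_bits: int  # bits per sub-word
    subwords: int      # sub-words per data word
    mux: int           # column or block multiplexer

    @property
    def total_bits(self) -> int:
        return self.words * self.subword_bits * self.subwords


# Assumed pattern: <family>_<words>X<sub-word bits>X<sub-words>CM<mux>
PATTERN = re.compile(r"^(S[UJYZP]180)_(\d+)X(\d+)X(\d+)CM(\d+)$")


def parse_macro(name: str) -> Macro:
    m = PATTERN.match(name)
    if m is None:
        raise ValueError(f"unexpected macro-cell name: {name}")
    family, *numbers = m.groups()
    return Macro(family, *(int(n) for n in numbers))


ram = parse_macro("SY180_2048X16X1CM8")
print(ram, ram.total_bits)
```

For the example RAM this gives 2048 words of 16 bits, i.e. 32768 bits in total.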

[9] Dual-port memories have two completely independent access ports. Two separate memory addresses
can be accessed at the same time, for both read and write.
[10] Although the name suggests that the memory is made out of individual registers, it is very similar in design to SRAM.
[11] In two-port memories, the read and write ports are separate, so you can simultaneously read and write. There are
timing constraints for reads and writes to the same address, please refer to the memory compiler manual for details.
[12] In many places this sub-word is referred to as ’byte’. This might be slightly confusing, since a byte is commonly
accepted to be an information unit consisting of 8 bits.

4 Importing the Design

Student Task 5:
• Start Cadence SoC Encounter [a] either from your design directory by using cockpit
sh > cd ~/training_1
sh > icdesign umcL180 &

• or from the encounter directory by issuing the commands
sh > cd ~/training_1/encounter
sh > cds_soc81 encounter

[a] This exercise uses version 8.1 of Cadence SoC Encounter. There are newer versions of this software;
however, the main principles have not changed much, so we will continue to use this version for this exercise.
Newer versions have slightly changed GUI elements and improved capabilities for some functions.

We will now import our design.


Cadence SoC Encounter uses a large configuration file that defines the design and technology
files to be loaded as well as some global settings to be applied.
Cockpit automatically generates an appropriate sample configuration file src/sample/chip.conf
that should be used as a starting point.
Student Task 6:
• Copy the sample file into the src directory.
sh > cp src/sample/chip.conf src/filter_chip.conf

• Select Design→Import Design ... to open the design import form. This form contains
fields for all configuration options. At the bottom of this window, there are buttons
to load and save the configuration from/to a file. Use the LOAD ... button to load the
configuration file we have just copied to the src directory.
• On the BASIC tab make sure that VERILOG NETLIST:, TIMING CONSTRAINT FILE: and IO
ASSIGNMENT FILE: match your design. COMMON TIMING LIBRARIES: and LEF FILES:
should already be correct.
• On the ADVANCED tab the only setting you might want to adapt for your design is the
DEFAULT DELAY PIN LIMIT: in the category DELAY CALCULATION. We will explain this later.
• Once you are happy with the configuration don’t forget to save your changes to the
configuration file.
• Click OK to import your design. Monitor the messages on the console for errors [a].
Pay attention to the messages where the timing constraint file is loaded (“Reading timing
constraint file”) to see if everything was accepted! If there are errors, you need to fix them!
[a] You can ignore warnings (SOCLF-58), (SOCLF-200), (TECHLIB-436), (SOCSYC-2), (EMS-27)

We are now in the floorplan view of Cadence SoC Encounter which displays an empty floorplan
with only the pads placed. All top level module(s) of the netlist are shown as a pink/purple square to
the left and all macro-cells to the right. Note that all standard cells are inside the module(s).

5 Floorplanning

Now we will have to decide how cells and macro-cells will be placed on our chip. This process is
called floorplanning. For a standard design, our main concern would be to find a floorplan that will
result in the smallest possible area, while fulfilling all performance and reliability requirements. This
is purely driven by economical reasons, since chip costs are mainly determined by the area. In some
cases there are additional geometrical constraints. The manufacturing company may impose certain
limits to the aspect ratio of the final layout [13], or even dictate the maximum height or width of the
layout.
Back-end design is not only used for complete chips. Macro-cells that will be part of a larger system-
on-chip design can also be designed in this way. In such cases there might be even more restrictions.
For example, certain metal layers might be reserved for the system level.
So the question is, “How small can my layout be so that I am still able to fulfill all specifications?”. As
a lower bound, you will need enough area to place all your I/O pads and standard cells. Ideally, in
terms of area (and assuming your design is not pad limited, see exercise 2), you will want to place
standard cells without leaving extra space in between, completely filling out the core area. This is
hardly ever possible because:

• The number of interconnections that can pass through a certain area is limited by the number
of metal layers available [14], wire width and minimum spacing requirements. Depending on the
interconnection overhead, the area above the cells [15] may not be sufficient for routing.
• Timing is greatly affected by the placement of your cells. Placing them next to each other with
no space in between does not leave the tool any flexibility in placing cells. This in turn reduces the
optimization options of the tool, like the ability to cluster cells that are closely interconnected.
• All designs require power routing for operation. Some wires of the power connection limit where
the cells can be placed, or restrict signal routing which in turn increases the area requirement.
• The majority of designs require a clock tree to function. This clock tree is added during the back-
end design. This requires additional area for the buffers used in the clock tree. Furthermore,
the clock tree synthesis algorithm can produce better results if it has more freedom to place its
buffers.
• Macro-cells, like the RAM in our example, usually require some extra space along the edges so
that they can properly be connected to power and signal lines.
• Designs that have a high switching activity require a lot of current for a short time which is
called a surge. The power distribution network may need additional decoupling capacitors to
store some charge that can provide some of the current of the standard cells during such a
surge. Additional space for these decoupling cells may be required during placement.

As a consequence, the standard cell rows (which form the core area) cannot be filled completely
with standard cells. In other words, some free space needs to remain between cells.
Utilization indicates to what amount the standard cell rows are filled. 100% utilization is the upper
bound where all cells are abutted and there is no extra space, while a utilization of 50% means that
half of the core area is empty.
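
As a small worked example, utilization is simply the ratio of total standard cell area to the area of the standard cell rows. The Python sketch below also inverts the relation to estimate the row area needed for a target utilization. The function names and the area values are made up for illustration and are not taken from the example design.

```python
def utilization(cell_area_um2: float, row_area_um2: float) -> float:
    """Fraction of the standard cell rows that is covered by cells."""
    return cell_area_um2 / row_area_um2


def required_row_area(cell_area_um2: float, target_utilization: float) -> float:
    """Row area needed so that the cells reach the target utilization."""
    if not 0.0 < target_utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return cell_area_um2 / target_utilization


# Hypothetical numbers: 500000 um^2 of cells in 1000000 um^2 of rows.
half_full = utilization(500_000.0, 1_000_000.0)
needed = required_row_area(600_000.0, 0.8)
print(half_full, needed)
```

A lower target utilization leaves more room for routing, clock tree buffers and decoupling cells, at the price of a larger core area.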
Usually, it is not possible to predict whether or not it is possible to fulfill all requirements with a certain
utilization [16]. You will have to try and find out. This is the main reason why back-end design is an
iterative process [17].

[13] Especially in MPW runs, a lot of silicon area is wasted if all designs have wildly different dimensions.
[14] For our technology there are 6 metal layers.
[15] Cells in our technology use mostly the lowest metal layer Metal-1 and very rarely Metal-2 for internal connections;
all other layers are free for routing.
[16] Both placement and routing are separately NP-complete problems. Without completing the placement and routing
you will not know if it is possible to fulfill the requirements.
[17] Obviously, technology plays an important role, and it is possible to give certain guidelines for a technology. However,
backend design is always highly dependent on the design itself. You will usually see in a few iterations what is possible
and what is not.

5.1 Semester Projects

The MPW provider used for the semester projects offers modules called Mini Asic (mini@sic) with a
size of 1519.62 µm × 1519.62 µm. Therefore, the chip size for the semester project ASICs is fixed.
Please refer to the following web page to learn the details.
http://www.eda.ee.ethz.ch/index.php/UmcL180#Mini.40sic

As a consequence, we only have to make sure that our design fits on this area, and there is no need
to find the smallest possible layout. We may however need to constrain the core area to make it
smaller if the utilization is too low, since a spread out design has longer interconnections that may
adversely affect timing.

5.2 Sketching a Floorplan

Before we go on with Cadence SoC Encounter we need to do some planning and understand
some key concepts. The figure on the following page is an example floorplan (not an ideal one) that
shows the important concepts.
In Cadence SoC Encounter, die area corresponds to the total silicon area available to place pads
(excluding bonding area for this technology) and core cells. For the semester projects this is strictly
limited to 1519.62 µm × 1519.62 µm. All pads (I/O, power and corner) are placed in what is known
as the padframe. The remaining area can be used for the core of the chip. For semester projects
the theoretical maximum for core area is 1239.38 µm × 1239.38 µm = 1.54 mm².
As can be seen from the figure, the core area is surrounded by a core power ring. In its simplest
form this consists of two (one for VCC, one for GND) wide18 metal lines that evenly distribute the power
all around the chip. In order to leave room for the power ring, we need to leave a certain I/O to core
spacing.
The standard cells are designed in such a way that, when placed next to each other, their VCC and
GND pins can be connected with a horizontal power line. These horizontal lines are then extended
to the core power ring. These power connections are relatively narrow (0.76 µm in the technology
that we use) and run over the entire width of the core area. This could be a problem for designs that
consume much power, since the cells towards the middle would not have a good power connection19 .
To improve this, vertical power stripes that connect to the horizontal power lines can be added,
thereby forming a sort of mesh.
The core area is filled with standard cell rows on which later all standard cells will be placed. In the
same area we will usually also need to make room for our macro-cells. Most macro-cells need some
free space around themselves. This free space is required to make signal connections, add a block
power ring around the macro-cell or simply to prevent standard cells from being placed too close to
the macro-cell. We will define a block halo to specify this free space.

18
The width of the metal line depends on the amount of current drawn from the line; you will be able to judge this
better after Training 2, which is dedicated to estimating the power consumption. We will mostly use a width of 20 µm,
since this is the widest metal that can be manufactured without slotting (wider metal lines require slots/holes which
break up the metal shape).
19
The problem is that if much current is drawn, there will be a significant IR drop along the power lines. The cells
in the middle will be supplied with a lower VCC than the ones on the sides. This could dramatically affect the
performance of the system.

When placing a macro-cell, you should also take into account where the power and signal pins of the
block are located and what metal layer they are on. Often signal connections are only on two edges
and you want them to face the core and not the I/O pads.
Now, when we consider all the above, the core area that remains free to place core cells on is much
smaller than the 1.54 mm² that we started with. Our example design has a total cell area (including
RAM) of 0.82 mm² and should therefore comfortably fit into the designated area.
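
The area figures quoted above can be checked with a few lines of arithmetic. The following is only a sketch in Python (not an Encounter script). The padframe width is not stated in the text and is derived here from the die and core dimensions, and the resulting utilization refers to the theoretical maximum core, so the real value will be higher once core margins, rings and halos are taken into account.

```python
# Dimensions quoted in the text (micrometres)
DIE_SIDE_UM = 1519.62
MAX_CORE_SIDE_UM = 1239.38
CELL_AREA_MM2 = 0.82  # total cell area of the example design, including RAM


def square_area_mm2(side_um: float) -> float:
    """Area of a square with the given side length, converted from um^2 to mm^2."""
    return side_um * side_um / 1e6


# Padframe width on each side, derived from the two dimensions above (an inference)
padframe_um = (DIE_SIDE_UM - MAX_CORE_SIDE_UM) / 2

max_core_area_mm2 = square_area_mm2(MAX_CORE_SIDE_UM)

# Utilization relative to the theoretical maximum core area (optimistic figure)
utilization = CELL_AREA_MM2 / max_core_area_mm2

print(f"padframe width per side: {padframe_um:.2f} um")
print(f"max core area: {max_core_area_mm2:.2f} mm^2")
print(f"utilization of max core: {utilization:.1%}")
```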

[Figure: example floorplan. Labels: die edge 1519.62 µm; core edge 1239.38 µm; I/O and corner pads placed on the padframe; I/O to core spacing; core power ring (VDD, GND); power pad connections; power stripe; standard cell rows; standard cell power connections; standard cells; macro cell (RAM); block halo; block power ring; block power connection.]

5.3 Initialize Floorplan

We are now ready to proceed with Cadence SoC Encounter.

Student Task 7:
• From the menu select Floorplan→Specify Floorplan.... A large window will open.
• Select the DIE SIZE BY: WIDTH AND HEIGHT option and make sure that both values are
1519.62.
• Now we need to specify the I/O to core spacing by filling in the four values under the CORE
MARGINS BY: entry. There must be sufficient room for the power ring around the core area.
Larger values will reduce the area available to place the core cells, thereby increasing core
utilization.
As noted earlier, some iterations are usually required to find optimal values for a particular
design.
• In this exercise we will assume that we will use one VCC and one GND line of maximum
width 20 µm. We need some extra space between the lines and, for the moment, we can
start with a distance of 50 µm for all sides and click on OK.

The floorplan should now look as shown in the screenshot below. Note that the pads are all placed
at their proper locations, as the I/O file used during design import specifies absolute locations and we
made sure that the die size stays fixed to the proper size during the initialize floorplan step.

Student Task 8:
• Next we need to place the RAM macro-cell. Change the cursor mode to MOVE/RESIZE/RESHAPE
by selecting the appropriate icon (next to the ruler icon) or use the keyboard shortcut
’SHIFT-R’. Now you can select the RAM macro-cell and drag it to any location you like. The
blue lines displayed are so-called flightlines that show where the signal connections to the
block are.

You can change the orientation of the RAM by either using Floorplan→Edit Floorplan→Flip/Rotate Instances...
(or press ’r’), or with the attribute editor (press ’q’). Note that the
RAM macro will completely block Metal-1, Metal-2, Metal-3 and Metal-4. Only Metal-5 and
Metal-6 will be available for routing over the RAM macro-cell20 .

20
By default, the internal structures within a cell or block are not displayed. You need to make “Cell Blkg” visible to
see the so called blockages within a cell.

5.4 Power Planning

The next step is to create the power distribution network.


The Verilog netlist that we started with does not contain any power connections, therefore we need
to create this connectivity now. We have to connect the power/ground pins of all instances to the
respective global power/ground net that was specified on the DESIGN IMPORT form (category POWER
on the ADVANCED tab)21 .
This can be done using the Floorplan→Connect Global Nets... form or you can use the
globalnet.tcl script provided.

Student Task 9:
• Execute the script provided by typing on the command line of Cadence SoC Encounter
(not GUI):
enc > source scripts/globalnet.tcl

21
There is also a special rule required if there are logic one/zero values 1’b1/1’b0 instead of TIE1/TIE0 cells in your
netlist. You should however not have such logic values in your netlist.

Next we will add the core power rings that distribute power all around the core.

Student Task 10:


• Select the menu Power→Power Planning→Add Rings.... A large window will appear.
The NET(S) field at the top defines for which nets rings will be created. The default
is to create power VCC as well as ground GND rings.
• In the RING CONFIGURATION section you can specify on what layers the ring segments will
be created. Select metal5 H for TOP and BOTTOM and metal6 V for LEFT and RIGHT.
Specify WIDTH as 20 µm, SPACING as 1.5 µm and OFFSET as 4 µm and click OK.

There are many alternative power distribution schemes that can be used. The one that we have
chosen here is a very simple one. We have selected the upper metal layers Metal-5 and Metal-6
for the ring, because in this technology Metal-6 is thicker and consequently has less parasitic
resistance which is desirable for power distribution.
For your own designs, you should perform a power analysis (topic of Training 2) to find out the best
power distribution approach that matches your design.
The width has been chosen as 20 µm for convenience. Basically, the wider the power connection,
the better. But as already mentioned, in this technology metal lines wider than 20 µm
need to be slotted (’stress relief slots’), which requires extra effort. As an alternative to slotting it is
also possible to create several narrower parallel rings, e.g. two VCC and two GND rings.
SPACING determines the distance between the two nets and OFFSET determines the distance
between the core area and the innermost ring.
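
To see whether the 50 µm core margin leaves enough room for the two rings, the ring parameters can simply be added up. The sketch below is plain Python arithmetic, not an Encounter command. It assumes that OFFSET is measured from the core edge to the inner edge of the innermost ring, as described above, and that the rings are stacked outwards with SPACING between them.

```python
# Values from the ADD RINGS form and the core margin chosen earlier (micrometres)
RING_WIDTH_UM = 20.0
RING_SPACING_UM = 1.5
RING_OFFSET_UM = 4.0
NUM_RINGS = 2            # one VCC ring and one GND ring
CORE_MARGIN_UM = 50.0    # I/O to core spacing


def ring_footprint_um(width, spacing, offset, num_rings):
    """Distance from the core edge to the outer edge of the outermost ring."""
    return offset + num_rings * width + (num_rings - 1) * spacing


footprint = ring_footprint_um(RING_WIDTH_UM, RING_SPACING_UM, RING_OFFSET_UM, NUM_RINGS)

# Room left between the outermost ring and the pads
clearance = CORE_MARGIN_UM - footprint

print(f"ring footprint: {footprint} um, clearance to pads: {clearance} um")
```

With the values of this exercise the rings occupy 45.5 µm of the 50 µm margin, so the chosen margin is just sufficient.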
We also need a (partial) ring around the macro-cell; you will see later why this is necessary.

Student Task 11:


• Select the menu Power→Power Planning→Add Rings... just like before. This time
in the RING TYPE box, select BLOCK RING(S) AROUND. You can leave the selection at
EACH BLOCK since we have only one block anyway.
Cadence SoC Encounter is usually smart enough to create wires only on the edges
where there are no power lines yet, i.e. to not create new wires on top of the core ring.
• If this fails you can specify the segments and connections you want on the ADVANCED tab.
• Fill in values/settings similar to those used for the core ring in the ADD RINGS form and click on OK.

At any point, if you wish to delete part of the floorplan you can:
• use the UNDO feature by simply pressing ’u’
• select and remove objects of a specific class (press ’d’)
• use the menu option Floorplan →Edit Floorplan →Clear Floorplan...
• select an object and hit the ‘Del’ key on the keyboard

Student Task 12:


• Also, you can save or load (restore) your floorplan at any time using the menu Design
→Save →Floorplan ... and Design →Load →Floorplan ... respectively.
• Save your floorplan to the save directory.

At this point, power reaches the standard cells only from the sides. Especially in fast designs, the
standard cells in the middle of a standard cell row may not receive sufficient power, so it is important to
add vertical stripes to improve the power distribution.

Student Task 13:


• Select Power→Power Planning→Add Stripes....
The SET CONFIGURATION part of the window defines the properties of one stripe set.
The SET PATTERN part defines how many stripes will be added. We can either choose to
insert a fixed number of sets or only specify the distance between two sets (SET-TO-SET
DISTANCE).

• In the FIRST/LAST STRIPE part, we select RELATIVE FROM CORE OR SELECTED AREA. Enter
values for X FROM LEFT and X FROM RIGHT that place the stripe sets in such a way that the standard cell
rows get divided into three equally long pieces. See the screen shot for width, spacing and
layer. Note: You can fine tune this later by moving the stripe sets.
• By default stripes will continue over macro cells. To prevent this, select the OMIT STRIPES
INSIDE BLOCK RINGS option in the STRIPE BREAKING section of the ADVANCED tab.
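
As a starting point for the X FROM LEFT and X FROM RIGHT values, the core width can be divided into three equal parts. The sketch below is plain Python arithmetic, not an Encounter command. It assumes that the core width equals the theoretical maximum of 1239.38 µm minus the 50 µm margin on each side, and it ignores the width of the stripe set itself, so treat the result as a first guess to be fine tuned in the GUI.

```python
MAX_CORE_SIDE_UM = 1239.38   # theoretical maximum core width from the text
CORE_MARGIN_UM = 50.0        # I/O to core spacing chosen in this exercise
NUM_STRIPE_SETS = 2          # two sets split each row into three pieces

# Assumed core width: maximum core width minus the margin on both sides
core_width_um = MAX_CORE_SIDE_UM - 2 * CORE_MARGIN_UM


def stripe_offsets_um(core_width, num_sets):
    """Positions of evenly spread stripe sets, measured from the left core edge."""
    pitch = core_width / (num_sets + 1)
    return [pitch * (i + 1) for i in range(num_sets)]


offsets = stripe_offsets_um(core_width_um, NUM_STRIPE_SETS)
x_from_left = offsets[0]
x_from_right = core_width_um - offsets[-1]

print(f"core width: {core_width_um:.2f} um")
print(f"X from left: {x_from_left:.2f} um, X from right: {x_from_right:.2f} um")
```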

It is rather easy to move wires in Cadence SoC Encounter. Click on the move wires button (or
press ’m’), select the wires you want to move, and drag them to their new location. Cadence SoC
Encounter will make sure that electrical connections remain intact. If you want, you can use this to
fine tune the stripe placement.
We still need to define a block halo for the RAM macro-cell. This is necessary to keep standard cells
from being placed too close to the RAM and also to avoid problems when routing the power lines of
the standard cell rows.
The figure below illustrates one common problem with the block halo.

[Figure: two standard cell rows next to a macro-block with its power rails and block halo. First row: terminated power line (good). Second row: dangling power line (bad).]

In this figure, only two standard cell rows are shown. The block halo around the first row extends far
enough to cover the two power lines22 . This is as it should be.
For the second row, the block halo does not cover the power rails, and when making the power
connections Cadence SoC Encounter will try to extend the power connection past the power
rails as shown in the figure. This leaves a dangling power line23 . While this will not render your chip
useless, it should be avoided.

Student Task 14:


• From the menu select Floorplan →Edit Floorplan →Edit Halo.... A window will
appear, where you can specify a keep-out zone for routing and/or placement around the
macro-cell.
Usually we only need a Placement Halo. The size will depend on your power routing/floor-
plan.
• Create an appropriate Placement Halo.

Notice that the I/O pads are placed with some distance between them24 . At some point in the design
flow we need to close the gaps between the I/O pads in order to complete the supply rings that run
around the core (within the pad cells) and are required to supply the circuitry within the pad cells.

Student Task 15:


• Instead of using wires, we will place so-called filler cells that completely fill the gaps and
establish the required connectivity.
There is a script that will automatically insert matching filler cells. Type the following in the
Cadence SoC Encounter console window:
enc > source scripts/fillperi.tcl

22
This is just for illustration. It is not possible to draw a block halo that has this (L) shape.
23
Dangling wires of this sort are known as geometry antennas in Cadence SoC Encounter.
24
This is due to the constraints set by the company that bonds the chips. They specify that the distance
between two adjacent pads must be at least 90 µm. Since even a core-limited pad in this technology is roughly 60 µm wide, we
need to place them with gaps in between.

Now we need to finalize the power connections of the chip. The following connections still need to be
made:

• The core ring needs to be connected to the core supply pads (VCC3IOD and GNDIOD).
• All standard cells need to be connected to VCC and GND lines.
• All macro-cells need to be connected to VCC and GND lines.

Student Task 16:


• Select Route→Special Route... from the menu. SRoute is the special net router,
and is only used to make power connections.
The ROUTE: part contains the different connection types we have listed above. BLOCK
PINS are macro-cell power connections, PAD PINS are the connections from the core supply
pads to the core ring. We will not need PAD RINGS since we have already used filler cells to
complete these rings. STANDARD CELL PINS will add power lines to the standard cell rows.
Finally, if you still have stripes that are not connected to power (not very likely) you can use
the STRIPES (UNCONNECTED) option.
• While it is possible to route all connections at the same time, it is strongly recommended to
do it one by one:
1. Start with PAD PINS. If nothing happens you have most likely forgotten to source the
globalnet.tcl script.
2. Route BLOCK PINS. Check the result: did the router connect the macro-cell the way
you wanted? If not, you may need to study the ADVANCED tab of the SRoute window.
If all fails you can edit the connections manually.
3. Route the STANDARD CELL PINS. This should create many horizontal Metal-1 lines
that connect to the rings and stripes. Look for dangling wires around the block halo
(adjust the block halo if necessary).

We are now finished with floorplanning. Your floorplan should look similar to the following screen
shot.

6 Placement

We will now start with the placement of the standard cells in the core area. Placement is a very
computationally intensive problem, and mostly heuristic algorithms are used for this purpose.

Student Task 17:


• Select Place→Standard Cells....
We want to run a full placement and not an incremental or just the quick prototyping one.
INCLUDE PRE-PLACE OPTIMIZATION however is very useful as it removes all buffer/inverter
trees from the netlist, which will help us for timing analysis as you will see later.
• To set advanced options click MODE. Set CONGESTION EFFORT to LOW and deselect RUN
TIMING DRIVEN PLACEMENT, as timing driven placement takes much longer and might not help that
much to improve timing. There are several other options that you can set, but at this time
we will leave them as they are. Apply the changes by pressing OK.
• You will come back to the placement window seen below. Click OK to start placement. This
may take some time.

We have to warn you about the various performance related options such as CONGESTION EFFORT
and RUN TIMING DRIVEN PLACEMENT above. In the exercises we will sometimes advise you to use
certain settings for these options in order to reduce runtime, or because for this particular design
we have found out that a particular option gives better results. When you do your own designs, you
should consider evaluating which options are better suited rather than copying all options from this
exercise.

For each standard cell, the placement algorithm will try to find the optimum location so that there is a
feasible routing solution and the total length of the connections is minimized.
Examine the placement by using the design browser (switch to the physical view). You will notice that
standard cells within the same entity are mostly placed next to each other.
The available space and the placement of macro-cells and I/O pads can have a great influence on
the placement of standard cells. Even though more space seems to be a good idea, too much
space sometimes results in placements where the average distance between standard cells and
consequently the delays caused by wire capacitance/resistance become larger. Only experience and
several iterations will allow you to find a placement for your circuit that is close to optimal.
Note: Visibility of SPECIAL NET is turned off in the next screen shot.

The results for placement (and later routing) are strongly design dependent. For example, structures
with many interconnections such as look-up tables will usually need much more space than synthesis
predicted as the cells need to be spread out in order to have enough space to route all the intercon-
nections. This is why generalizations for back-end design, such as ”During back-end design, your
circuit area will increase by 10%” don’t work very well.

Student Task 18:


• Let us save the entire design with Design→Save Design As→SoCE. This will save the
configuration file, netlist, floorplan, special route, placement and routing files as well as the
current mode, options and preferences. A design saved in this way can be restored using
Design→Restore Design...→SoCE.
The space required is surprisingly small as most files are compressed and the library files
do not get saved along with the design.
• Remember to save under the save directory.
Alternatively you could also just save the placement. Select Design→Save→Place....

During synthesis, Synopsys Design Compiler assigns constant logic values to two special standard
cells named TIE0x and TIE1x, where x is a drive strength modifier. This creates a small
inconvenience, as often one of these cells is assigned to drive many outputs at the same time, creating
relatively long interconnections.
There is sufficient space on the chip to place several of these cells. We will use a script that first
removes all these cells. Then we will set the rules for placing these cells. The example script
scripts/tiehilo.tcl sets the maximum number of connections driven by a single cell to 20, and the maximum
distance between the pin and the tie cell to 250 µm. Finally, we insert the tie cells according to
the rules we have defined.
Student Task 19:
• At the command line type:
enc > source scripts/tiehilo.tcl
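
The fanout rule alone already gives a lower bound on how many tie cells will be inserted. The following Python sketch illustrates this bound. The pin counts are hypothetical and the 250 µm distance rule is ignored, so the tool may well insert more cells than computed here.

```python
import math

MAX_FANOUT_PER_TIE_CELL = 20   # limit set by scripts/tiehilo.tcl according to the text


def min_tie_cells(num_constant_pins: int, max_fanout: int = MAX_FANOUT_PER_TIE_CELL) -> int:
    """Lower bound on the number of tie cells needed for one constant value."""
    return math.ceil(num_constant_pins / max_fanout)


# Hypothetical pin counts, chosen only for illustration
for pins in (5, 20, 21, 130):
    print(pins, "pins ->", min_tie_cells(pins), "tie cell(s)")
```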

7 Timing

The synthesis tools we currently use for HDL synthesis (Synopsys Design Compiler) are not
aware of any instance placement information. Therefore the interconnects can only be estimated
based on a statistical model, i.e. the fanout of a net determines its length, capacitance, resistance and
area. Now that the placement and even trial-routing are available, the timing might differ considerably
from the numbers obtained from Synopsys Design Compiler.

7.1 Analysis

Cadence SoC Encounter has a practical timing analysis function, where you only have to specify
the state of the design (see below) and the ANALYSIS TYPE (Setup or Hold) you want to run.

Pre-Place: design is not placed
Pre-CTS: design is placed but clock tree is not yet inserted
Post-CTS: design is placed and the clock tree is inserted
Post-Route: design is placed and routed
Sign-Off: will use extra tools for even more precise analysis. We will not use this as these tools are
not installed/set up.

Depending on this state, trial route (a very simple, but fast routing) and/or parasitic extraction might
be run automatically prior to the timing analysis. This will improve the accuracy and help to avoid
unnecessary iterations.

Student Task 20:


• Open Timing→Analyze Timing and make sure PRE-CTS and SETUP are selected.
• Start the timing analysis by clicking OK.

Note: You could also do this from the command line with
enc > timeDesign -preCTS

As the design is not routed, Cadence SoC Encounter will perform trial route and parasitic extraction
before doing the timing analysis. A short summary will be displayed on the console (the actual
numbers may differ slightly):

+--------------------+---------+---------+---------+---------+---------+---------+
| Setup mode | all | reg2reg | in2reg | reg2out | in2out | clkgate |
+--------------------+---------+---------+---------+---------+---------+---------+
| WNS (ns):| -9.069 | -6.554 | -9.069 | -0.686 | -7.328 | N/A |
| TNS (ns):| -2684.3 | -1776.9 | -2392.1 | -1.172 | -43.761 | N/A |
| Violating Paths:| 861 | 732 | 454 | 7 | 6 | N/A |
| All Paths:| 1807 | 1342 | 817 | 18 | 6 | N/A |
+--------------------+---------+---------+---------+---------+---------+---------+

+----------------+-------------------------------+------------------+
| | Real | Total |
| DRVs +------------------+------------+------------------|
| | Nr nets(terms) | Worst Vio | Nr nets(terms) |
+----------------+------------------+------------+------------------+
| max_cap | 187 (187) | -3.774 | 188 (188) |
| max_tran | 368 (13826) | -8.333 | 387 (13867) |
| max_fanout | 0 (0) | 0 | 0 (0) |
+----------------+------------------+------------+------------------+

Density: 59.566%
Routing Overflow: 0.00% H and 0.25% V
------------------------------------------------------------

The summary gives a very good overview of the current design timing. Some explanations:
• The analysis was run in setup mode, i.e. setup time checks were performed but no hold time
checks.
• The columns contain numbers for all paths in the design (ALL) or for specific path groups, e.g.
reg2reg for all register to register paths.
• Worst negative slack (WNS) reports the slack for the most critical path. Negative numbers
mean that the constraints are violated by this value.
• Total negative slack (TNS) is the sum of the negative slacks of all violating paths. Together with the number
of violating paths this figure helps to see how severe the violations are.
• Real/Total DRV show (electrical) design rule violations; some libraries have a maximum transition
time for all nets. The report above shows that 368 nets have a real transition violation (the
signal takes too long to change from logic-1 to logic-0 or vice versa). In addition 187 nets have
a maximum capacitance violation (the total amount of capacitance driven by a net exceeds the
limit set by the design library). These violations are mostly related to excessive parasitic capacitance
due to interconnections, and generally cause timing violations as well. However, even if
a DRV does not cause a timing violation it needs to be fixed.
• DENSITY and ROUTING OVERFLOW show the placement utilization and routing resources, i.e.
are a measure for the feasibility of the current floorplan/placement.
Remark: Refer to exercise 4 of VLSI I25 if you have problems with timing concepts.
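
The relation between path slacks, WNS and TNS can be illustrated with a short Python sketch. The slack values are hypothetical and are not taken from the report above, and the function only mirrors the definitions given in the list; it makes no claim about how the tool computes its figures internally.

```python
def summarize_slacks(slacks_ns):
    """Return (WNS, TNS, number of violating paths) for a list of path slacks."""
    violating = [s for s in slacks_ns if s < 0.0]
    wns = min(slacks_ns)          # slack of the most critical path (positive if timing is met)
    tns = sum(violating)          # only negative slacks contribute
    return wns, tns, len(violating)


# Hypothetical slack values in ns, chosen only for illustration
example_slacks = [-6.5, -0.5, 0.25, 1.0, -2.0]
wns, tns, n_viol = summarize_slacks(example_slacks)

print(f"WNS = {wns} ns, TNS = {tns} ns, violating paths = {n_viol}")
```

A large TNS spread over many paths points to a systematic problem (for example wrong constraints), whereas a TNS close to the WNS means that only a few paths are critical.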
The summary looks really terrible. Obviously we have many timing violations that we need to have a
closer look at, before we try to optimize the timing with Cadence SoC Encounter.
Here are some important points to consider when doing so:

• The timing depends entirely on the constraints you have specified in the file src/chip.sdc. The
most common mistake is to have errors in this file. Before you go any further make sure that
your timing constraints are correct.
• Make sure to not accidentally use constraints that were written for the core level (chip without
pads) at the chip level (with pads) and vice versa. The pads affect the I/O timing quite a bit and
the drive capabilities of a standard cell and an output pad are entirely different, i.e. set_load
needs to be very different.
• Inputs and outputs used for test and debugging may cause timing violations. Most of these
signals are not dynamic (they are not toggled during normal operation) and the timing paths
originating from these inputs or ending at these outputs should be ignored, i.e. left uncon-
strained or explicitly disabled.
• To speed up delay calculation Cadence SoC Encounter does not compute the timing of
nets with a fanout above a certain limit but rather swaps in predefined values for delay, capacitance
and transition time. All these numbers are specified on the DESIGN IMPORT form on the
ADVANCED tab in the ”Delay Calculation” category. As a result you will not see the real timing26
of these nets in timing analysis and furthermore optimization will not see (and therefore not fix)
violations27 on these nets. However, this is usually the desired behavior as we give these nets
a special treatment anyway (with CTS).

25
You can access the exercise descriptions, files, and solutions under /home/vlsi1/u4.
26
To see the real timing you can change the limit on-the-fly from 1000 to a very high value in the console with
setUseDefaultDelayLimit 100000. More on this topic later.
27
DRV violations will be fixed, but setup/hold violations will not. Clock nets are even more special: no DRV fixing will
be done there either.

Let’s now examine the detailed reports that were generated by timing analysis and can be found in
the timingReports folder. Each analysis produces multiple files. Among these there are three files
dedicated to design rule violations (max capacitance: *.cap, max fanout: *.fanout, max transition
time: *.tran), and separate *.tarpt timing analysis report files for the different path groups
(in2out, in2reg, reg2reg, reg2out).

Student Task 21:


• Where do the violating paths in the in2out path category start?
• Where do the violating paths in the in2reg path category start?
• Do the paths in reg2out and reg2reg look like normal paths that should be optimized to
meet timing or is there something wrong?
• Why are the reg2reg paths too slow? Look for large numbers in the Delay column and
check the drive strength of the corresponding cell.

There are several different problems in the .sdc file that we have used. First of all, two of our inputs
should not be considered for timing analysis28 . We also have several nets (clock, reset and scan
enable) that we will take care of separately (using the clock tree synthesizer, which we will see later).
These nets will show up in the DRV reports. We do not want to solve timing related problems for
these nets now (they will be solved later anyway); the time and effort required to optimize these nets
could prevent other parts of the design from being optimized.
We can use the DEFAULT PIN LIMIT feature of Cadence SoC Encounter to stop the tool
from extracting timing information (and reporting timing violations) for the nets that we
will be optimizing later on. By default the pin limit of Cadence SoC Encounter is set to 1000. In
our case this number is too high (we have slightly more than 400 flip flops in our design).

Student Task 22:


• Let us see the nets which have a large fanout. Report all nets with e.g. more than 400 pins.
Use the console command:
enc > report_net -min_fanout 400

• Now set a suitable limit with the command


enc > setUseDefaultDelayLimit <number>

so that the high fanout nets will not be considered for timing. Also make the necessary
changes to the timing constraints file src/chip.sdc to disable the offending input
ports. Reload the timing constraints by selecting the menu Timing→Load Timing Constraint....
• Then rerun timing analysis.

If you have done everything correctly, the only setup violations should be in the path groups
register-to-register and register-to-out. There should no longer be pins that belong to the scan enable or reset
network in the transition time violation report.
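
The effect of lowering the limit can be pictured with a small Python sketch. All fanout figures and all net names except ClkxCI are hypothetical, and the strict "greater than" comparison is an assumption about how the limit is applied.

```python
def nets_using_default_delay(fanouts, limit):
    """Names of nets whose fanout exceeds the limit and would get default delay values."""
    return sorted(name for name, fanout in fanouts.items() if fanout > limit)


# Hypothetical fanout figures; only ClkxCI is a net name that appears in this text
example_fanouts = {
    "ClkxCI": 430,
    "reset_net": 425,
    "scan_enable_net": 410,
    "data_bus_0": 12,
}

print(nets_using_default_delay(example_fanouts, 1000))  # default limit: no net is excluded
print(nets_using_default_delay(example_fanouts, 400))   # lowered limit: high fanout nets excluded
```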

28
Cadence SoC Encounter provides a special timing calculation mode that is called Multi-Mode Multi-Corner
Analysis (MMMC). In this mode it is possible to define several scenarios (i.e. separate test and functional modes).
The setup for MMMC is slightly involved and will not be covered as part of this exercise.

7.2 Optimization

In order to (better) meet the constraints, Cadence SoC Encounter can try to optimize the design
at every stage of the design process. In our case, the worst setup time violation is about 5.8 ns (for
an 8 ns period), although the netlist delivered by the synthesis tool had no timing violations. This is
due to differences in interconnect parasitics between the two tools. While the synthesis tool relies on
an estimate (based on a statistical model), Cadence SoC Encounter can use the real placement and
(trial-)routing at hand. Consider the following excerpt from a timing report (broken down over many lines
for readability):

Path 1: VIOLATED Setup Check with Pin i_filter_top/u_filter/u_filter_stage_5/RegxDP_reg_42_/CK
Endpoint: i_filter_top/u_filter/u_filter_stage_5/RegxDP_reg_42_/D (^) checked
with leading edge of ’ClkxCI’
Beginpoint: i_filter_top/u_ram_wrapper/i_ram/DO5 (^)
triggered by leading edge of ’ClkxCI’
Path Groups: {reg2reg}
Other End Arrival Time 0.000
- Setup 0.149
+ Phase Shift 8.000
= Required Time 7.851
- Arrival Time 14.405
= Slack Time -6.554
Clock Rise Edge 0.000
= Beginpoint Arrival Time 0.000
Timing Path:
+----------------------------------------------------------------------------------------------------------+
| Instance | Arc | Cell | Slew | Load | Delay | Arrival |
| | | | | | | Time |
|-----------------------------------+---------------+--------------------+-------+-------+-------+---------|
| | ClkxCI ^ | | 0.000 | 1.828 | | 0.000 |
| ClkxCI_PAD | I ^ -> O ^ | XMD | 0.000 | 0.000 | 0.000 | 0.000 |
| i_filter_top/u_ram_wrapper/i_ram | CK ^ -> DO5 ^ | SY180_2048X16X1CM8 | 0.130 | 0.033 | 1.750 | 1.750 |
| i_filter_top/u_ram_wrapper/i_test_| A ^ -> O ^ | MUX2 | 8.441 | 1.874 | 3.973 | 5.722 |
| bypass_mux5 | | | | | | |

The last line reports a standard cell instance MUX2 with low driving capability (2) that has to drive a
big load on its output (1.874 pF). The propagation delay is therefore huge (3.97 ns).
The timing of the same cell as reported by synthesis is: Delay: 0.15 ns, Slew: 0.09, Load: 0.01.
While this is an extreme case, you can see how wrong synthesis can be without knowing the actual
placement and wire loads.
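
The slack in the report header follows directly from the figures listed there. The Python sketch below reproduces the calculation with the numbers copied from the excerpt; the variable names are our own.

```python
# Numbers copied from the timing report excerpt (all in ns)
other_end_arrival = 0.000
setup_time = 0.149
phase_shift = 8.000     # one clock period
arrival_time = 14.405

# Required time: latest point at which the data may arrive at the endpoint
required_time = other_end_arrival - setup_time + phase_shift

# Negative slack means the data arrives later than required
slack = required_time - arrival_time

print(f"required time: {required_time:.3f} ns")
print(f"slack: {slack:.3f} ns")

# Share of the total path delay contributed by the weak MUX2 alone
mux_delay = 3.973
print(f"MUX2 share of arrival time: {mux_delay / arrival_time:.1%}")
```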

Student Task 23:


• Open the optimization form by selecting Timing→Optimize....
DESIGN STAGE needs to be set to the current design stage. Some options are only available
for certain stages, e.g. hold time optimization cannot be performed during PRE-CTS as it
doesn’t make much sense.
Timing is not the only thing that can be optimized. Most technologies specify design rules
like maximum transition time, maximum capacitance driven by a certain cell or maximum
fanout.
• After pressing the MODE button, within the THRESHOLDS section you can find options that
can be used to tighten the constraints in order to get some margin (a).
• Set the options as shown in the figure below and hit OK. Watch the progress of the optimization
in the console window. Cadence SoC Encounter is very verbose with its
actions.
a
Cadence SoC Encounter will already automatically add a small margin on its own (internally).

During optimization Cadence SoC Encounter can select different drive strengths for cells, add/remove
buffers and inverters, move instances or even restructure part of the logic (just like synthesis
does).
Optimization is done using iterations of timing analysis, optimization, trial-route and parasitic extraction.
As a last step Cadence SoC Encounter performs a timing analysis on the optimized design,
prints the summary to the console and writes the detailed reports to the timingReports directory.

Student Task 24:


• Take a look at the summary and the final reports generated. There should be no violations
left.

But what happens if we cannot fix the violations with optimization? Again, first make sure to understand
what your constraints are and why they are violated. Often there are errors in converting the
design specifications to constraints (is the input delay really 3.5 ns? Also for this pin?) and describing
them properly with the commands available. If you still have problems, there are three levels where
you can reach a solution:

• Optimization during backend design (Cadence SoC Encounter)
Cadence SoC Encounter can optimize the design at every stage of the design process. In
general, the earlier the stage, the more changes can be made, e.g. PRE-CTS optimization has
much more flexibility than POST-ROUTE optimization. At the PRE-CTS stage registers can be
moved and resized; this is no longer possible after clock tree insertion. On the other hand,
the parasitic interconnect information is much more accurate in later stages of the design, so the
timing information (and hence the optimization goals) will be more accurate.
We can (re)run the optimization at various stages, try a new placement or even start with a
new floorplan. It is impossible to give general guidelines; you will have to see what works best
for your design. If you are far from meeting your target (e.g. for a 10 ns clock, if after all
optimizations you still have a timing violation of 2 ns), you may need to go back to synthesis.

• Optimization during synthesis
Once you have tried to place and route a netlist you will get a better idea about the relationship
between synthesis results and back-end results (area and timing wise). You may use this
information to adjust the timing constraints and re-synthesize the circuit.
• Architectural optimizations
If nothing else helps, you will have to modify your architecture. During this iteration you will have
a much better idea about what is critical for your circuit.

If all of the above fails, you will have to see if the specifications could be changed.

Student Task 25:


• Your design has changed considerably as the optimization algorithms have modified the
netlist and placement. Save it by using Design →Save Design As.

8 Clock Tree Insertion

The fan-out of a net refers to the number of inputs driven by a particular output. High fan-out nets
(that drive hundreds or even thousands of inputs) need to be handled differently from standard inter-
connections. Note: For timing analysis we did adjust the pin limit (setUseDefaultDelayLimit) in
order to treat them differently.
Every synchronous circuit has at least one high fan-out net, namely the clock net. For most circuits
reset and scan-enable signals have to be distributed to each and every flip-flop as well.
The main problem with high fan-out nets is the large load capacitance that needs to be driven. Each
driven input adds its own input capacitance to the total load capacitance and in addition, the intercon-
nection required to distribute the signal to all these inputs increases the load capacitance further.
There are three important parameters for such nets:
Transition time This is the time it takes to change the logic level of a node (e.g. 0 → 1). Basically,
the more load an output has to drive, the more time is required to charge this load. CMOS
drivers consume additional short circuit current during the transition, therefore long transition
times are not very welcome. Furthermore, noise on signals with long transition times can result
in glitching. Most libraries set an upper limit for the transition time (for the technology we are
using this is 1.79 ns for typical libraries). To lower the transition time, a tree of buffers can be
inserted so that the total load is shared between the buffers. The lower the desired transition
time, the more buffers are required.
Insertion delay The time required for the signal to travel from the driver to the end-points. This delay
is usually different for each end-point. Each level of buffers in the buffer tree will add a delay to
the signal.
Skew The difference between insertion delays of different end-points. To minimize skew, a balanced
buffer tree has to be built. Generally, the lower the desired skew the more buffers are required.
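The relation between insertion delay and skew can be illustrated with a few lines of Python. This is only a sketch: the per-end-point delays are hypothetical numbers, the function names are made up, and skew is taken here as the difference between the largest and the smallest insertion delay of one tree.

```python
def insertion_delay_range(sink_delays_ns):
    """Return (min, max) insertion delay over all end-points, in ns."""
    if not sink_delays_ns:
        raise ValueError("need at least one end-point")
    return min(sink_delays_ns), max(sink_delays_ns)


def skew(sink_delays_ns):
    """Largest difference between the insertion delays of any two end-points."""
    shortest, longest = insertion_delay_range(sink_delays_ns)
    return longest - shortest


# Hypothetical insertion delays (driver to end-point) of four flip-flops, in ns.
delays = [1.25, 1.5, 1.75, 1.5]
print("insertion delay (min, max):", insertion_delay_range(delays))
print("skew:", skew(delays))
```

A perfectly balanced tree would have identical delays at all end-points and therefore zero skew, regardless of how large the insertion delay itself is.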
What parameters are most important depends on the type of net:

Clock Our main concern is to reduce the skew, since it will affect our timing. The maximum skew
depends on the clock period. As an example, for a 20 MHz clock a clock skew of 0.5 ns is
acceptable. But for a 200 MHz clock, the same skew equals 10% of the clock period and
would be too high.
If you over-constrain your skew, you will need a deep (and large) clock tree and your insertion
delay will rise, which will affect your input and output timing. Therefore you will want to balance
the skew against insertion delay and the number of buffers. Constraining maximum insertion
delay too low will usually degrade results.
Usually, a tree that gives you an acceptable skew will also give you a decent transition time, so
you don’t have to worry about that.
Reset We are interested in propagating the reset within one clock cycle to all flip-flops in our design.
For designs with on-chip reset synchronization this is strictly required. The insertion delay
should therefore be less than the clock period, transition times within the bounds imposed by
the technology and skew doesn’t matter at all.
Scan Enable Very similar to the reset signal. Usually a slower clock is used for scan testing, therefore
we can allow even a larger insertion delay. For transition time and skew the same holds true as
for the reset.
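The clock example above is simple arithmetic and can be reproduced as follows. The Python sketch below only restates the numbers from the text; the function names are made up for illustration.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in ns for a clock frequency given in MHz."""
    return 1000.0 / freq_mhz


def skew_fraction(skew_ns: float, freq_mhz: float) -> float:
    """Skew expressed as a fraction of the clock period."""
    return skew_ns / clock_period_ns(freq_mhz)


for freq_mhz in (20.0, 200.0):
    share = skew_fraction(0.5, freq_mhz)
    print(f"{freq_mhz:5.0f} MHz: period {clock_period_ns(freq_mhz):4.0f} ns, "
          f"0.5 ns skew is {share:.0%} of the period")
```

The same period calculation gives the upper bound for the reset insertion delay, which should stay below one clock period.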

[Figure: parameters of a clock tree specification: AutoCTS Root Pin, Buf Tran, Sink Tran, Min Delay, Max Delay, Max Skew]

In Cadence SoC Encounter, clock tree synthesis (CTS) is used to generate optimized buffer
trees to drive high fan-out nets. It can be configured to satisfy a variety of constraints.

Student Task 26:


• A sample clock tree synthesis configuration file can be found under
src/sample/chip.ctstch-sample. The sample file contains three different configurations for a clock, a reset and a
scan enable signal.
• Copy this file to the src directory and adapt the ’AutoCTSRootPin’ statements to match
your design.
• For educational purposes, change the clock tree specifications as follows: max. skew
0.2 ns, max. insertion delay 4 ns, max. transition time at buffers 0.6 ns and at clock pins
0.4 ns [a].
Take a closer look at the other two trees too.

[a] It is usually not a good idea to specify a small max. insertion delay such that this becomes a limiting factor for
CTS. Results may degrade significantly and for most designs the insertion delay is not very important anyway.

If the design employs a reset synchronization register (the example design has one) the source of
the reset tree must be the output of the synchronization register. Note that there is a special option
named SetASyncSRPinAsSync YES for the reset tree definition. This allows set and reset pins to
be considered as targets for the clock tree optimization.
The scan-enable signal is also a special case. Normally the clock tree synthesis algorithm starts at
the AutoCTSRootPin and traces through the netlist in order to find valid endpoints. By default,
the tracer passes through combinational gates and stops at the clock and asynchronous input pins
of sequential elements (flip-flops).
By specifying the NoGating rising option, we can make the tracer stop at the first gate encountered.
This is necessary since the scan enable signal is often connected to multiplexers and we want
their input pins to be endpoints. When this option is used, you need to specify the internal pin of
the pad driving the scan-enable signal, otherwise tracing will stop prematurely at the pad cell.

Student Task 27:


• Read in the clock tree specification by selecting Clock →Design Clock ... from the
menu. Using the browser select the clock tree specification file you have just modified.
Press LOAD SPEC. DON’T PRESS OK yet [a]. You should now see a summary for all three
clock specifications on the console. Check it.
• Our netlist may have some buffers on the high fan-out nets we want to build trees on. We
need to remove them prior to CTS with the following command:
enc > deleteClockTree -all

[a] Pressing OK will start the clock tree insertion. We need to make sure that the clock tree specification is correct
before we go ahead with this step. If you accidentally pressed OK here, it is advised to restart from the last
saved point.

A large number of errors can be discovered by analyzing the pins connected to these nets, even
before building a clock tree.

Student Task 28:


• Select Clock →Trace Pre-CTS Clock Tree .... To start the trace, click on the icon
on the top left and accept the default trace file name. A summary will be displayed on the
console and the content of the trace file visualized in the GUI.

We can see what the trees currently look like and which pins are connected to them. Also look at the
trace file directly. Things to look for include:

• Clock, reset, or scan-enable connecting to unexpected input pins, e.g. the reset signal should
not connect to pins other than asynchronous set/reset pins of sequential elements.
• Unexpected latches on the clock tree can be discovered this way (G or GB pin).
• Discrepancy between the number of endpoints of clock, reset and scan trees. For our example
numbers are as follows:

– clock tree: 443 with 442 flip-flop CK pins + 1 RAM CK pin


– reset tree: 441 flip-flop RB pins
– scan tree: 447 with 441 flip-flop SEL pins + 6 mux S pins, to choose between the functional
and test (scan chain) output signal.

As we see, 442 flip-flops are clocked but only 441 receive a reset signal. This is due to the reset
synchronization register being connected to the external reset signal rather than the internal
reset tree. As the reset synchronization flip-flop is also not on the scan chain and we use full
scan otherwise, the 441 flip-flops on the scan tree match perfectly. You get the idea...
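This kind of bookkeeping is easy to get wrong by hand, so it can help to write it down explicitly. The Python sketch below uses the end-point counts quoted above for the example design; the data layout and function names are assumptions made for illustration only and are not produced by the tool.

```python
def total(endpoints):
    """Total number of end-points of one tree."""
    return sum(endpoints.values())


def unexplained_flip_flops(clocked, reached, known_exceptions):
    """Clocked flip-flops that the tree does not reach and that nobody can explain."""
    return clocked - reached - known_exceptions


# End-point counts of the example design as listed in the text.
clock_tree = {"flip-flop CK": 442, "RAM CK": 1}
reset_tree = {"flip-flop RB": 441}
scan_tree = {"flip-flop SEL": 441, "mux S": 6}

# The reset synchronization register is neither on the reset tree nor on the scan chain.
SYNC_REGISTERS = 1

clocked_ffs = clock_tree["flip-flop CK"]
missing_reset = unexplained_flip_flops(clocked_ffs, reset_tree["flip-flop RB"], SYNC_REGISTERS)
missing_scan = unexplained_flip_flops(clocked_ffs, scan_tree["flip-flop SEL"], SYNC_REGISTERS)

print("end-points:", total(clock_tree), total(reset_tree), total(scan_tree))
print("unexplained flip-flops (reset, scan):", missing_reset, missing_scan)
```

Any non-zero result for your own design is a hint to go back to the trace file and find out which flip-flops are not where you expect them.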

Student Task 29:


• Open the file chip.cts_trace and search for Clock Tree to examine the leaf pins.
• If everything looks OK we can proceed with clock synthesis. In the SYNTHESIZE CLOCK
TREE form press OK.

After a few minutes clock tree synthesis will be completed. Detailed reports will be generated under
the directory specified on the form (most likely clock_report). This directory includes a simple report
file (clock.report).

A summary report is also displayed on the Cadence SoC Encounter console. The first column
shows the achieved performance while the second column reports the target specified in the
configuration file.
Student Task 30:
• Check your results (summary and detailed reports). How many buffers were added? How
many levels created? What’s the insertion delay? Are all constraints met?
Note 1: You will get a max transition time violation on ClkxCI_PAD/I which can safely be
ignored. As we have specified an input transition time of 800 ps on all primary inputs there
is no way CTS could fulfill the 600 ps requirement at this point.
Note 2: Unless the ’RouteClkNet YES’ option was used (more on this later), the
timing figures reported are only estimates and might change quite a bit with detailed routing.
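Checking the summary boils down to comparing each achieved value against its target. The Python sketch below does this for the clock tree targets set in Student Task 26. The achieved values are hypothetical and would have to be copied by hand from your own report; the dictionary keys are made-up names, not identifiers from the report file.

```python
# Targets from the modified clock tree specification (all in ns).
CLOCK_TARGETS_NS = {
    "max_skew": 0.2,
    "max_insertion_delay": 4.0,
    "max_buffer_transition": 0.6,
    "max_sink_transition": 0.4,
}


def violated_constraints(achieved_ns, targets_ns=CLOCK_TARGETS_NS):
    """Return the sorted names of all targets exceeded by the achieved values."""
    return sorted(
        name for name, limit in targets_ns.items()
        if achieved_ns[name] > limit
    )


# Hypothetical achieved values (ns), e.g. with the 800 ps input transition
# at the clock pad showing up as a buffer transition violation.
achieved = {
    "max_skew": 0.15,
    "max_insertion_delay": 2.1,
    "max_buffer_transition": 0.8,
    "max_sink_transition": 0.35,
}

print("violated:", violated_constraints(achieved))
```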

9 Timing Revisited

At this point we will have to go into some more detail about timing. During different stages of the de-
sign flow, we have slightly different timing constraints (Refer to the following figure for the differences
in the three stages).
a) synthesis Initially the design does not contain any pads. The input delay t_idel and the output delay
t_odel should contain the contribution of the input pads (t_inpad) and output pads (t_outpad).
b) pre-CTS During the placement and routing phase, all required I/O pads and drivers will be present.
At this stage there is no clock tree present. The timing should be adjusted, as at this moment
the input delay t_idel and output delay t_odel no longer include the pad delays.
c) post-CTS Once the clock tree is inserted, the timing will change slightly again. Due to the clock
insertion delay t_di the internal clock will be slightly offset when compared to the external clock.
At the input, the data travelling towards the first flip-flop inside the chip will have more time,
since this flip-flop will be triggered by a clock signal that has been delayed by t_di. At the output,
however, the data that is coming from the chip will be launched with the internal clock, but will
have to be sampled by the external clock. Consequently there will be less time for this signal.
It should now be clear why it might be desirable to set constraints on the clock insertion delay property
by specifying minimum and maximum values in the chip.ctstch file by MinDelay and MaxDelay
parameters. The clock insertion delay can play an important part in the I/O delay. You may want to
keep the insertion delay within certain limits to ensure proper I/O timing.
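The effect of the insertion delay on the I/O paths can be made concrete with a strongly simplified model. The Python sketch below assumes that the time available on an input path is the clock period minus the input delay plus the insertion delay, and that the time available on an output path is the clock period minus the output delay minus the insertion delay. Flip-flop and pad delays are ignored, and all numbers are hypothetical.

```python
def io_budgets(t_clk, t_idel, t_odel, t_di=0.0):
    """Return (input-to-register, register-to-output) time budgets in ns.

    Simplified model: the internal clock is the external clock delayed by t_di,
    so inputs gain t_di while outputs lose t_di.
    """
    in2reg = t_clk - t_idel + t_di
    reg2out = t_clk - t_odel - t_di
    return in2reg, reg2out


# Hypothetical constraints: 10 ns clock, 3 ns input and output delay.
print("pre-CTS  (ideal clock):", io_budgets(10.0, 3.0, 3.0))
print("post-CTS (t_di = 1 ns):", io_budgets(10.0, 3.0, 3.0, t_di=1.0))
```

In this model every nanosecond of insertion delay is taken directly from the output budget, which is why an upper limit on the insertion delay can matter for designs with tight output timing.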
Design tools have different mechanisms to deal with these three different cases. The simple solution
is to use multiple constraint files for different stages. However, both Synopsys Design Compiler
and Cadence SoC Encounter accept several parameters to deal with this problem automatically.
In the following we will discuss how Cadence SoC Encounter calculates delays in the
presence and absence of a clock tree. The following table summarizes the most important settings:

timing analysis mode     clock propagation mode     clock latency
(setAnalysisMode)        (set_propagated_clock)     (set_clock_latency)
-noSkew                  forced ideal               no effect
-skew -noClockTree       forced ideal               SDCs in effect
-skew -clockTree         SDCs in effect [a]         SDCs in effect [b]

[a] still ideal mode unless set_propagated_clock is set
[b] set_clock_latency command is overridden by overlapping set_propagated_clock constraints

The timing analysis mode is automatically updated by Cadence SoC Encounter to match the
design stage, i.e. before clock tree insertion it is set to ’-skew -noClockTree’ and afterwards to
’-skew -clockTree’. The analysis mode can also be changed manually with the setAnalysisMode
command.
The two Synopsys Design Constraints (SDC) set_propagated_clock and set_clock_latency
are usually specified by the designer in the chip.sdc file. Furthermore, CTS tries to add a
set_propagated_clock constraint on-the-fly (in memory), which can cause a number of
problems:
• This constraint will only be added if the AutoCTSRootPin pin/port in chip.ctstch and the clock
waveform source pin/port (from the create_clock command in chip.sdc) are perfectly identi-
cal, i.e. not port vs. instance pin etc.
• This constraint is never written to your chip.sdc file, so if you reload that file the constraint is
lost.
• Before CTS, only a pointer to your constraints file is saved along with the database. Now, if a
constraint was added by CTS, all loaded constraints (including the new one) will be saved along
with the database to a new file (*.pt). Restoring this database will then load this new constraints
file instead of the one in encounter/src/ that you might have expected.
Note: As soon as you manually (re-)load a constraints file, the behavior is reverted to the normal
one.
Now, as can be seen from the table above, to get the actual timing of the buffers/inverters on the
clock tree instead of ideal mode, setting both ’-skew -clockTree’ and set_propagated_clock
is required. Also note that set_propagated_clock gets overridden for all pre-CTS design stages
and could therefore be set right from the start (as already mentioned earlier).
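The clock propagation column of the table can be restated as a small decision function. The Python sketch below only encodes what the table says; it is not how the tool is implemented, and the function name and return strings are made up for illustration.

```python
ANALYSIS_MODES = ("-noSkew", "-skew -noClockTree", "-skew -clockTree")


def clock_mode(analysis_mode: str, propagated_clock_set: bool) -> str:
    """Return 'propagated' or 'ideal' according to the table above."""
    if analysis_mode not in ANALYSIS_MODES:
        raise ValueError(f"unknown analysis mode: {analysis_mode}")
    # Only with a clock tree in the analysis AND set_propagated_clock present
    # are the real delays of the clock buffers used.
    if analysis_mode == "-skew -clockTree" and propagated_clock_set:
        return "propagated"
    return "ideal"


for mode in ANALYSIS_MODES:
    for flag in (False, True):
        print(f"{mode:20s} set_propagated_clock={flag!s:5s} -> {clock_mode(mode, flag)}")
```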
In ideal mode, the clock tree insertion delay is zero unless the set_clock_latency command
is used to specify a different number, preferably close to the delay of the real tree (that is still to
be inserted). While this "placeholder" delay has the advantage that the I/O timing doesn’t change
between pre-CTS and post-CTS phases, it renders timing reports less transparent and is not
handled exactly the same across different tools. Therefore, do not use this command unless you
know what you are doing.
In conclusion, it is recommended to include set_propagated_clock right from the start, not to use
set_clock_latency, and to load modified timing constraints after CTS only if required, i.e. when the
I/O timing numbers (set_input_delay, set_output_delay) need to be adjusted to account for
the actual clock tree [29]. For this training we will modify and reload the constraints [30].

[29] For slower clock speeds and/or uncritical I/O timing this is often not required.
[30] It might be more convenient to keep a separate post-CTS constraint file rather than changing the numbers back and
forth when redoing the flow.

[Figure: timing constraints in the three design stages. Each panel shows three clock periods T_clk covering the path segments t_idel, t_in2reg, t_reg2reg, t_reg2out and t_odel. a) synthesis (Top, without pads), b) pre-CTS (Chip, with pads), c) post-CTS (Chip, with clock insertion delay t_di between external and internal clock: more time for input, less time for output).]

The previous figure illustrates all three stages in some detail. Wherever possible the same naming
conventions as in the textbook have been used [31].

[31] Refer to page 235 “How to formulate timing constraints”, and page 346 “How to achieve friendly input/output timing”
for more on this topic.

Student Task 31:
• Copy your timing constraints file to filter_chip_postCTS.sdc and then modify the I/O
timing constraints to account for the insertion delay of the actual clock tree. Make sure
that the clock is set to PROPAGATED MODE and load the constraints
(Timing →Load Timing Constraint ... [a]).
• Run timing analysis (make sure to select POST-CTS as design stage).
• Examine the reports timingReports/chip_postCTS*. You should now see the real timing on
the clock network.
• If you have violations, run a POST-CTS (!) optimization with default settings. This should
fix all violations.
• Save the entire design.

[a] Currently loaded constraints will be purged before the new ones get loaded.

10 Signal Routing

We will now route the signal nets. What you have seen so far are only trial-route nets that are not
DRC clean and can therefore not be manufactured.
Student Task 32:
• There are two routing engines in Cadence SoC Encounter. WRoute is the older one
and NanoRoute is supposed to be the latest and greatest. Start NanoRoute by selecting
Route →NanoRoute →Route.... A large window will open. Enable the INSERT DIODES
option (you can leave the DIODE CELL NAME field blank) and leave all other settings at their
defaults [a]. Click OK to start routing. You can observe the progress in the console window.

[a] On multi-CPU or multi-core machines you can increase the number of CPUs used by selecting Set Multiple
CPU. This gives an almost linear speedup.

The FIX ANTENNA and INSERT DIODES options will cause the router to change layers and/or insert special
protection diodes in order to avoid damage that can happen during manufacturing due to charges
that accumulate on the wires and stress the gate oxide of input pins. Note that this is usually referred
to as PROCESS ANTENNAS, which are entirely different from geometrical antennas (which are related to
dangling wires).

Our example design should route without problems. This is not always the case and we might get
geometry violations. Geometry violations include shorts between nets and design rule violations (for
example metal lines are drawn too close to be manufactured as separate wires). Needless to say
that we must solve all these violations.
You should always closely examine the violations in order to find out what causes them. Sometimes
there is an unfortunate placement of macro-cells or power lines to blame and sometimes there is just
not enough space to route all connections. Solutions range from re-running routing to completely
reworking the floorplan.

Student Task 33:


• Now that we have the real signal wiring we need to perform a postroute timing analysis to
see if we still meet all constraints. At this point not only a setup time analysis, but also a
hold time analysis needs to be run. Usually it is not necessary to deal with hold time until
this point.
Note that you have to do two separate runs, one for setup and one for hold, as it is not
possible to do this in one single step. Use the GUI (make sure to select POST-ROUTE) or type
the commands below to perform the two analyses.
enc > timeDesign -postRoute
enc > timeDesign -postRoute -hold

• Inspect the two summaries and the report files written to the ’timingReports’ directory. You
will most likely have setup violations.

To fix violations or increase the hold margin we can now perform a postroute optimization. Internal
hold time violations need to be fixed in any case as, unlike internal setup violations, they cannot be
avoided later on (i.e. on the real chip) by lowering the clock speed [32].
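Why lowering the clock speed helps with setup but not with hold violations can be seen from the usual textbook slack equations for a single-clock register-to-register path. The Python sketch below is a simplified model with hypothetical numbers; the parameter names are assumptions and do not correspond to fields in the timing reports.

```python
def setup_slack(t_clk, t_pd_ff, t_logic_max, t_su, skew=0.0):
    """Setup slack in ns; skew is capture clock arrival minus launch clock arrival."""
    return t_clk + skew - (t_pd_ff + t_logic_max + t_su)


def hold_slack(t_pd_ff, t_logic_min, t_hold, skew=0.0):
    """Hold slack in ns; note that the clock period does not appear at all."""
    return t_pd_ff + t_logic_min - t_hold - skew


# Hypothetical path with a setup violation at a 10 ns clock period ...
print("setup slack @ 10 ns:", setup_slack(10.0, 1.0, 10.0, 0.5))
# ... which disappears when the clock is slowed down to 12 ns.
print("setup slack @ 12 ns:", setup_slack(12.0, 1.0, 10.0, 0.5))
# A hold violation on a short path stays the same at any clock period.
print("hold slack:", hold_slack(0.25, 0.0, 0.5))
```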
Further possibilities to improve timing include over-constraining the POST-CTS optimization and
enabling the TIMING DRIVEN option of NanoRoute. Earlier in the flow, TIMING DRIVEN PLACEMENT
might be worth a try. Please note that the biggest improvements are possible with PRE-CTS
optimization as the registers can be moved and resized at that stage. By default, clock tree insertion will
"fix" the registers to preserve the clock tree, i.e. they can no longer be moved or resized.

Student Task 34:


• If you have large "reg2reg" setup violations, this step may take a very long time. During the
initial iterations of the design, it might be a good idea to use a more relaxed timing constraint
(i.e. a longer clock period) so that not much time is spent during the optimization.
Once you are satisfied with all other aspects of the design, you could revert to the original
timing constraints and let the optimizer try to achieve the timing.
• Perform a postroute optimization Timing →Optimize ....
• Optimization will delete and re-route all nets that are affected by the changes and run setup
and hold mode timing analyses at the very end. Once again, inspect the reports.

Student Task 35:


• Now let us have a look at the postroute timing of our clock tree(s)
enc > reportClockTree -postRoute

This will print a summary on the console and write a couple of report files chip.ctsrpt* to
the encounter directory. There should be no (or only minor) violations of our clock tree
constraints.
Please note that the previous postCTS and postRoute setup (and hold) analyses already
consider clock skew as they time every single path from the clock root to the leaf pins
separately. Therefore, even a rather big skew reported here doesn’t really matter as long
as the former analyses passed.

So far, the clock tree has been routed as any other signal net. This is usually good enough, but if you
want, for whatever reason, to further improve clock net timings, you can do the following (in CTS):

[32] This does not necessarily hold true for multi-clock designs.

• In the clock tree constraint file, set RouteClkNet YES. This is a per-tree setting that instructs
CTS to call NanoRoute in order to route this clock net during clock tree insertion. The wires
get a status of FIXED and will therefore not be changed later during signal routing. While this
improves timing on the clock tree, overall routability gets worse.
• To further improve timing, you can tell NanoRoute to route this net not like an ordinary signal
net, but to create a balanced routing (by following the so-called RouteGuide computed by
CTS). To do so, set UseCTSRouteGuide YES in the clock constraint file [33].

11 Timing Debug

To analyze timing violations, Cadence SoC Encounter also offers a graphical interface (Timing
→Debug Timing) that visualizes paths and allows cross-probing with the layout. We will not explain
the tool in detail here, but rather make some important notes:
• This functionality is sort of standalone, it does not use results from the timeDesign command
but runs a new analysis that generates the file top.mtarpt. Then these paths are visualized.
• If the above file already exists, it will usually simply be loaded. This means that whenever your
design has changed you have to regenerate this file in order to get up to date data. This can be
done with the G ENERATE switch on the form that opens when you click the folder icon.
• When generating the top.mtarpt, the current timing mode is relevant, i.e. to analyze hold paths
timing mode has to be set to hold mode.

[33] This will persistently(!) alter the global CTS Mode to ’setCTSMode -useCTSRouteGuide’.

12 Finishing

We are almost done with backend design, there are only a few steps required to finish the layout and
verify that everything is correct.

12.1 Insert Filler Cells

Student Task 36:


• Now that we don’t need the additional space within the standard cell rows anymore, we
have to fill these gaps with filler cells. This is required for fabrication. In addition, some of
them contain capacitors between VCC and GND that filter spikes on the power lines.
enc > source scripts/fillcore.tcl

Note that your row utilization will be 100% after this step. This means that you will have no room
for further optimizations. Make sure to insert filler cells after all optimizations have been completed.

Note: It is also possible to remove the filler cells with Place →Filler →Delete... or by using
the script removefillcore.tcl.

12.2 Checking Connectivity and Geometry Violations

Now that we are completely finished with the layout, we should make sure that we have no connection
errors, i.e. all logic connections from the netlist are also present in the physical layout.

Student Task 37:


• Select Verify →Verify Connectivity ... from the menu. A window will appear.
Run the analysis and check the console for the report summary. There should be no
violations.
• In a similar way let us verify all geometrical shapes. Select Verify →Verify Geometry ...
from the menu. Run the analysis and check the report on the console. You should get
no violations.

There is a script that will perform the last verification steps for you automatically. You can set a
variable DESIGNNAME to assign the base name for all the files generated by this script.
enc > set DESIGNNAME MyBeautifulChip
enc > source scripts/checkdesign.tcl

12.3 Evaluate the Physical Design

Take the time to examine the routing. This is the main feedback you need for a second back-end
iteration. Try to view all metal lines separately to see how congested your routing is. If you see a lot
of Metal-6 (orange) you are probably close to the density limit. In our design you should not notice
any congestion and Metal-6 will barely be used. If your design routed without problems and the
routing was rather sparse then the next time you could assign a smaller core area and increase the
row utilization. On the other hand, if the design barely routed, you have found the limits. In a second
iteration you might consider assigning a little more core area, as timing degrades with congestion.
Check the connections of your macro-cells and pads, this may give you an idea how to place the
macro-cells the next time around. You need to get used to evaluating the result of different back-end
design runs.

12.4 Generate Output Files

Congratulations, you have completed the back-end design. That was not so hard now, was it?

Student Task 38:


• Save your design using Design →Save Design As ... →SoCE to the save directory
and make sure that you use a name that shows this is a finished design (e.g. chip_final.enc).
• Finally we need to export all data needed for post layout simulation and physical verification
(DRC/LVS). There is a script that will write out all relevant files to the out/ directory [a].
enc > source scripts/exportall.tcl

[a] To get complete supply net connectivity in the Verilog netlist for LVS, the missing connections for the power
and ground pins (GNDIO/VCC3IO) of the pads are added and removed on-the-fly. We could also define and
handle these two nets in the same way as VCC/GND, but there are more drawbacks than benefits.

Similar to the checkdesign.tcl file, the variable DESIGNNAME will be used to assign the base name of
the files. If you do not specify a name, final will be used. After you complete this step you will have
the following files:

*.v This is the final netlist. Make sure to use this netlist for post layout simulations.
*.gds.gz The layout in GDSII (Graphic Design System II) format. This is the standard format for
exchanging layout data.
*.sdf.gz The SDF (Standard Delay Format) file to be used for post layout simulation.

Institut für Integrierte Systeme
Integrated Systems Laboratory

Department of Information Technology and Electrical Engineering

VLSI II: Entwurf von hochintegrierten Schaltungen


227-0147-00

Training 2

Energy Efficiency and Power Distribution


Prof. Dr. H. Kaeslin
Dr. N. Felber

SVN Rev.: 1025


Last Changed: 2013-11-05

Reminder:
With the execution of this training you declare that you understand and accept the regulations about
using CAE/CAD software installations at the ETH Zurich. These regulations can be read anytime at
http://dz.ee.ethz.ch/regulations/index.en.html.
1 What you will learn

In previous trainings, you have learned how to carry out a digital circuit design that meets given
timing and area constraints. This exercise will extend your knowledge to power considerations. More
specifically, we will show you:
• How to determine node activity figures of adequate accuracy.
• How to estimate a circuit’s power dissipation from node activities.
• How to locate excessive voltage losses in power and ground networks.
• How to detect excessive current densities in power and ground networks.
• How to improve power and ground distribution networks where necessary.
• A few ideas for improving a circuit’s overall energy efficiency (optional).
You will be assisted by Mentor Graphics ModelSim (for circuit simulation) and by Cadence SoC
Encounter (for place&route, preparation of power and ground nets, IR drop analysis, and current
density estimations).

2 Introduction

2.1 Theoretical background

As explained in section 9.1 of our textbook [1], four phenomena dissipate energy in static CMOS
circuits:

Phenomenon                                      Results in dissipation                         Nature
Charging and discharging of capacitive loads    while node voltages are in transit             dynamic
Crossover currents                              while node voltages are in transit             dynamic
Driving of resistive loads (if any)             at all times, even after circuit has settled   static
Leakage currents                                at all times, even after circuit has settled   static

We will not be concerned with static power in this exercise as we limit ourselves to pure CMOS circuits
with no resistive loads and because leakage is almost negligible due to the conservative fabrication
process being studied. For the needs of EDA tools the dynamic dissipation can be attributed to library
cells as follows.
Internal power P_int is the power dissipated inside a cell for the charging and discharging of internal
capacitances and due to crossover currents.
Switching power P_ext is the power dissipated inside a cell for charging and discharging the load
capacitance connected to the cell’s output. That external load consists of the input capacitances
of all cells being driven plus the parasitic capacitances of the wires (aka interconnect).
The total power dissipation P_tot related to a cell can now be expressed as

\[ P_{tot} = P_{stat} + P_{dyn} \simeq P_{dyn} = P_{int} + P_{ext} \qquad (1) \]

Calculating P_ext is straightforward

\[ P_{ext} = f_{cp} \, \frac{\alpha}{2} \, C_{ext} \, U_{dd}^{2} \qquad (2) \]
[1] Hubert Kaeslin, “Digital Integrated Circuit Design, from VLSI Architectures to CMOS Fabrication”, Cambridge
University Press, 2008.

where α denotes the switching activity of the cell’s output node, Cext the load capacitance attached,
and Udd the supply voltage. fcp stands for the computation rate, i.e. the inverse of the computa-
tion period. 2 Pint gets calculated in much the same way, yet coming up with accurate activity and
capacitance figures requires detailed information about the inner circuitry and layout of each cell.
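As a quick numerical illustration of equation (2), the following Python sketch evaluates the switching power of a single net. The function name and the choice of SI units are ours and not part of any tool. The example values (activity 2, 140 fF, 100 MHz, 1.8 V) are those listed for the clock net in the hand-calculation example of section 2.2.

```python
def switching_power(alpha, f_cp_hz, c_ext_farad, u_dd_volt):
    """Equation (2): P_ext = alpha/2 * f_cp * C_ext * U_dd^2, result in watts."""
    return 0.5 * alpha * f_cp_hz * c_ext_farad * u_dd_volt ** 2

# Clock net of the hand-calculation example: two transitions per cycle.
p_clk_mw = switching_power(alpha=2, f_cp_hz=100e6,
                           c_ext_farad=140e-15, u_dd_volt=1.8) * 1e3
print(f"P_ext(ClkxCI) = {p_clk_mw:.4f} mW")
```

The result is roughly 0.045 mW. The same function can be reused for the other nets once their activities are known.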
A power estimator essentially is a piece of software that sums up the various contributions over an
entire circuit. Provided the same clock and voltage get used everywhere, this amounts to
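$$ P_{ckt} = \sum_{m=1}^{M} P_{int\,m} + \sum_{n=1}^{N} P_{ext\,n} \simeq f_{cp} U_{dd}^2 \left( \sum_{m=1}^{M} \frac{\alpha_m}{2} C_{int\,m} + \sum_{n=1}^{N} \frac{\alpha_n}{2} C_{ext\,n} \right) \qquad (3) $$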

Index m = 1...M refers to the cells instantiated in the circuit and n = 1...N to the nets of interconnect
running in between. For each cell, an internal activity figure αm is estimated from the node activities at
the input(s). Note that Cint m is not meant to correspond to any capacitance physically present in the
circuit. Rather, it is just a numerical parameter adjusted for each cell during library characterization
so as to model its internal dissipation (footnote 3).
Equation (3) tells us a few important things about power dissipation and power estimation:
• Realistic switching activity figures are crucial; they can be obtained from gate-level simulations.
• Realistic capacitance figures are important; they are best extracted from layout data.
• Dynamic power grows with Udd squared. The power vs. speed dilemma is discussed in the
textbook.
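The summation of equation (3) can be sketched in a few lines of Python. This is only an illustration of what a power estimator adds up, under the stated assumption of one clock and one supply voltage everywhere. The function name and the toy cell and net values are made up for the example and do not describe any circuit of this training.

```python
def circuit_power(f_cp_hz, u_dd_volt, cells, nets):
    """Equation (3): cells and nets are lists of (activity, capacitance in F)."""
    weighted_cap = (sum(0.5 * a * c for a, c in cells)
                    + sum(0.5 * a * c for a, c in nets))
    return f_cp_hz * u_dd_volt ** 2 * weighted_cap

# Hypothetical toy circuit: two cells and two nets (values are made up).
cells = [(0.2, 25e-15), (0.4, 10e-15)]      # (alpha_m, C_int m)
nets = [(0.2, 30e-15), (2.0, 140e-15)]      # (alpha_n, C_ext n)
p_ckt_mw = circuit_power(100e6, 1.8, cells, nets) * 1e3
print(f"P_ckt = {p_ckt_mw:.4f} mW")
```

Doubling Udd in this sketch quadruples the result, which is the quadratic dependence noted in the last bullet.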

2.2 Manual activity and power calculations for warm up

To get a feeling for the process, let us estimate the power consumption of the toy example of Figure 1,
a simple arithmetic processing unit that accepts two unsigned numbers of 4 bits each (InputAxDI
and InputBxDI) and that delivers either their sum or their product at the output (OutputxDO) as an
8 bit word.

Figure 1: A small arithmetic unit used for hand calculations.

A signal AddxSI decides which operation result gets assigned to the output according to the following
rule (in pseudo-VHDL):

    if AddxSI = '1' then
        OutputxDO <= InputAxDI + InputBxDI;
    else
        OutputxDO <= InputAxDI * InputBxDI;
    end if;

Footnote 2: For standard single-edge-triggered one-phase clocking, computation period and clock cycle
are the same, so fcp = fclk. Double-edge-triggered circuits, in contrast, offer two computation periods
per clock cycle so that fcp = 2 fclk.
Footnote 3: Incidentally, observe that any attempt to capture the internal dissipation of a cell with a
single quantity is not exactly accurate, as the energy dissipated when one input toggles may also depend
on what is happening at other inputs at the same time. And in the case of a bistable, the current state
is likely to matter too. While industrial standard cell models typically cover all possible situations,
we shall not be concerned with such details here.

The frequency of ClkxCI is 100 MHz and the input waveforms are represented in Figure 2. They are
periodic and the two input values (InputAxDI and InputBxDI) have been chosen to be always the
same. Moreover, suppose that no glitches occur. Supply voltage is 1.8 V.

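[Figure: waveforms of ClkxCI, InputAxDI = InputBxDI (taking the values 0000, 1111, 0000, 1111, 0000,
1111, 0000, 1111, 0000 in sequence), AddxSI and OutputxDO (to be filled in) over a time axis marked at
20, 40, 60 and 80 ns.]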

Figure 2: Input and output waveforms.

Table 1: Power dissipated for driving the various nets of interconnect.


Net                          Capacitive load   Node activity   Switching power
                             Cext [fF]         α [1]           Pext [mW]
ClkxCI                       140               2               ...
AddxSI                       90                ...             ...
InputAxDI (per bit line)     60                ...             ...
InputBxDI (per bit line)     60                ...             ...
OutputxDO (per bit line)     0                 ...             ...
Further nets                 neglected in the context of this exercise

Table 2: Power dissipated by the various circuit blocks (@ 100 MHz).


Block                    Dynamic power [mW]
                         Switching power   Internal power
Adder                    0.04              0.12
Multiplier               0.66              0.56
Output register + mux    0.00              0.54

Student Task 1:
1. Output waveform: Collecting all 8 bits into one signature, draw the waveform and numeric
values of OutputxDO in Figure 2.
2. Switching activities: Assuming single-edge-triggered one-phase clocking, complete the
node activity column in Table 1.
3. Power spent for switching of nets: You now have all the facts required to calculate the
switching powers associated with the various nets according to (2). Enter the numbers in
the last column.
4. Power dissipated within circuit blocks: Now consider Table 2. What is the main sink
of power among the blocks listed there and how much does it dissipate?
5. Consolidated dissipation: Compiling all contributions from Table 1 and Table 2, how
much power does the circuit dissipate internally, that is, with no load attached?
6. Overall dissipation: Suppose each output drives a load of 1 pF. What is the total power
consumption now?

3 The test vehicle used for computerized calculations

3.1 Architectural overview

Figure 3 illustrates the circuit serving as a test case for this exercise. The circuit is entirely digital
and dominated by two finite impulse response (FIR) filters of identical structures that differ in their
coefficients. Each filter is fully parallel. At the output, an adder combines the high-pass and low-
pass responses. A 2-bit selection input signal can be used to only output the high-pass component
or the low-pass component. Additional flags (ModexSI/TestModexTI) enable/disable the filters
completely.

[Figure: block diagram with the control inputs OUTSELECTxSI, MODExSI and TESTMODExTI, the 16-bit data
input DATAINxDI and the 16-bit data output DATAOUTxDO.]

Figure 3: High-level diagram of test vehicle used in this exercise (simplified).

In the exercise, node activity figures will be determined by way of gate-level simulations. For compar-
ison, let us now make a quick back-of-the-envelope calculation from data available without detailed
simulation. The test vehicle is believed to have the characteristics below.

Clocking discipline                     single-edge-triggered one-phase
Clock frequency fclk [MHz]              50
Supply voltage Udd [V]                  1.8
Number of interconnect nets N           5500
Avg. load capacitance Cext n [fF]       30.0
Avg. switching activity αn [1]          0.2
Number of cell instances M              3900
Avg. equiv. capacitance Cint m [fF]     25.0
Avg. internal activity αm [1]           same as αn

Student Task 2: Plug these numbers into (3) and write down the result here: ....

3.2 Install test vehicle and start cockpit

We provide you with a finished test vehicle with final routing completed. To install it, proceed as follows:

Student Task 3:
1. Open a Unix shell window.
2. Install the test vehicle:
sh > /home/vlsi2/t2/install_t2_partA

3. Start the cockpit:


sh > cd training2_partA
sh > icdesign umcL180 &

The design views now available include


1. Source code (available at sourcecode/..)
2. Cadence SoC Encounter database
3. Final netlist
4. .sdf file for back annotation

3.3 Generating stimuli

For running meaningful power simulations we will need the right input stimuli. We provide a set of
stimuli in the simvectors directory (input.stim). During this training, you will need to modify the stimuli
file to estimate power in different operating modes. As seen in Figure 3, the signal OutSelectxSI
controls which filter block is added to the output. Furthermore, the ModexSI and TestModexTI
signals control how the internal registers are enabled. These signals can be
used to configure the test vehicle in a variety of modes. To change the operating mode, you need to
adapt the number in the first line of the stimuli file simvectors/input.stim, since it encodes the
operating mode of the design as an integer value. The subsequent integer values in the stimuli file
correspond to the input data. The following table lists the operating modes we will use in this exercise.

              TestModexTI   ModexSI   OutSelectxSI(1)   OutSelectxSI(0)   int value
Enable all:   1             1         1                 1                 15
Disable HP:   1             1         1                 0                 14
Clock Gate:   0             0         1                 0                 2

Next let us give some technical comments on the process of automated power estimation.
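Before moving on, the mode encoding of the table can be reproduced with a short Python sketch. The bit order (TestModexTI as most significant bit, then ModexSI, then the two OutSelectxSI bits) is inferred from the integer values in the table. The function names are ours, and set_mode only assumes that the mode occupies the first line of the stimuli file, as stated above.

```python
def mode_value(test_mode, mode, out_select):
    """Pack TestModexTI, ModexSI and the 2-bit OutSelectxSI into one integer."""
    return (test_mode << 3) | (mode << 2) | (out_select & 0b11)

def set_mode(stim_text, value):
    """Return the stimuli text with its first line replaced by `value`."""
    _, sep, rest = stim_text.partition("\n")
    return f"{value}{sep}{rest}"

ENABLE_ALL = mode_value(1, 1, 0b11)
DISABLE_HP = mode_value(1, 1, 0b10)
CLOCK_GATE = mode_value(0, 0, 0b10)
print(ENABLE_ALL, DISABLE_HP, CLOCK_GATE)
```

The three constants evaluate to 15, 14 and 2, matching the last column of the table. Editing the first line of simvectors/input.stim by hand achieves the same thing.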

4 Power Estimation Flow

We are going to use the same CAD/CAE tools you are familiar with from previous exercises and/or
from your semester project in VLSI design. During earlier design phases, ModelSim had served to
functionally verify RTL source code. The focus now shifts to collecting the respective toggle counts
of electrical nodes present in a circuit netlist as a prerequisite for power calculations.
In search of accuracy, we are going to do a post-layout simulation that includes the various layout
parasitics that came into existence once placement and routing were completed. For this
purpose, the netlist, previously written out by Cadence SoC Encounter in Verilog format, is
compiled using ‘vlog’ instead of ‘vcom’ (have a look at the file modelsim/compile_gate.csh in order
to observe the compilation of the Verilog netlist). Since ModelSim is able to perform mixed-language
simulations, we can use a VHDL testbench (almost the same as the testbench for RTL simulation,
with only some minor adaptations) to carry out this particular post-layout simulation.
The next point that merits your attention is the selection of the stimuli. As power dissipation is data-
dependent, it is important to make a proper choice of the stimuli vectors to get meaningful results.
The node activities used for power estimation must be statistically representative for the target
application which implies that the stimuli will not necessarily be the same as those employed during
functional verification.
What follows is a brief overview of the file types involved in annotating a netlist.

SDF back annotation: The SDF (Standard Delay Format) file contains the information about the
interconnect and cell delays in a design. It can be exported from Cadence SoC Encounter
to transmit these delay data to a simulator (and/or to a static timing analyzer). This file is
required for any type of post-layout simulation, irrespective of whether you are interested in
calculating power consumption or in gate-level functional verification.
VCD back annotation: The VCD (Value Change Dump) file logs all signal changes (i.e. the “events”
in VHDL terminology) that occur during a simulation run. The information is essentially the same
as in the ModelSim wave window but in textual form. File size thus not only grows with design
complexity but also with the length of a simulation run. A VCD is required for power analysis
with Cadence SoC Encounter. For obvious reasons, it is always possible to extract the
average activity for each circuit node from a VCD file but not the other way round.
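To make the last remark concrete, here is a minimal Python sketch that counts value changes per signal in a VCD text and turns them into an activity figure. It is a deliberately simplified reader (one declaration or value change per line, no scope handling) and not the procedure used by Cadence SoC Encounter. The function name and the tiny sample dump are made up for illustration.

```python
from collections import Counter

def count_toggles(vcd_text, t_start=0, t_stop=None):
    """Count value changes per signal within [t_start, t_stop] (VCD time units)."""
    names = {}            # VCD identifier code -> signal name
    toggles = Counter()
    now, in_header, in_dump = 0, True, False
    for line in vcd_text.splitlines():
        tok = line.split()
        if not tok:
            continue
        if in_header:
            if tok[0] == "$var":              # $var <type> <width> <id> <name> $end
                names[tok[3]] = tok[4]
            elif tok[0] == "$enddefinitions":
                in_header = False
        elif tok[0].startswith("#"):          # time stamp
            now = int(tok[0][1:])
        elif tok[0] == "$dumpvars":           # initial values are not toggles
            in_dump = True
        elif tok[0] == "$end":
            in_dump = False
        elif not in_dump and not tok[0].startswith("$"):
            # vector changes read "b1010 <id>", scalar changes read "1<id>"
            ident = tok[1] if tok[0][0] in "bBrR" else tok[0][1:]
            if now >= t_start and (t_stop is None or now <= t_stop):
                toggles[names.get(ident, ident)] += 1
    return toggles

SAMPLE_VCD = """\
$timescale 1ns $end
$scope module DUT $end
$var wire 1 ! ClkxCI $end
$var wire 1 % AddxSI $end
$upscope $end
$enddefinitions $end
#0
$dumpvars
0!
0%
$end
#5
1!
#10
0!
1%
#15
1!
#20
0!
"""

toggles = count_toggles(SAMPLE_VCD)
cycles = 20 / 10                      # 20 ns of simulation, 10 ns clock period
for name, count in sorted(toggles.items()):
    print(name, count, count / cycles)
```

In this sample the clock toggles four times in two cycles, i.e. an activity of 2, while AddxSI toggles once. The reverse direction is impossible because the counts no longer say when each change happened.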

As a welcome observation, we note that no parasitics exchange file (such as SPEF or RSPF) is
required to transport estimated capacitance values from the place&route tool to the power calculation
tool as both functions are assumed by Cadence SoC Encounter in the current design flow.
Side note: Our experience suggests that, while internal dissipation is well characterized in our
technology (umc L180), leakage power is often by far overestimated.

5 SoC Encounter Power Analysis

In this section we will perform a power analysis of our final chip using different sets of toggle activities.
Cadence SoC Encounter is able to perform a power analysis based on statistical estimates of the
switching activity. For more accuracy it can also process value change dump (VCD) files generated
as a result of post-layout simulations. Throughout the whole power analysis exercise you will have to
update the following table continuously.

Power Analysis Method          Total Power [mW]    Dominating Instances Power [mW]
Global Activity
Input Activity
VCD-Based Activity
  Enable all
  Enable all (zero inputs)
  Disable HP
  Clock Gate

We will first start Cadence SoC Encounter and load the saved test vehicle.
Student Task 4:
• Start Cadence SoC Encounter.
• In the Cadence SoC Encounter GUI, select the menu Design →Restore Design →SoCE...
and choose chip_filter.enc from the save directory. Among the views
on the top right-hand side, select the last one, the Physical View.

5.1 Statistical Power Analysis

As you know, dynamic power consumption directly depends on the switching activity. Cadence SoC
Encounter provides some simple approaches that estimate the switching activity of the circuit
without running costly simulations. These methods are useful to quickly get a first measure of the
chip’s power consumption.

Global activity

Cadence SoC Encounter allows you to assign a default toggle-activity value to all internal
nodes automatically. Throughout the power analysis each internal node of your chip will toggle with this value
during each clock cycle.

Student Task 5: In order to start this analysis, select Power →Power Analysis
→Run Power Analysis... (see note a). In this form, select the folder reports/power as the results
directory (see Figure 4). For the moment leave the clock frequency at 100 MHz. Then step into
the Activity tab and enter 0.2 as global activity (this means that every node will change its
state with a probability of 0.2 per clock cycle). This is a good initial value. At this point, you are
able to start your first statistical power analysis. Press the OK button (or Apply).
Note a: If the menu Run Power Analysis... is not available, first select Set Power Analysis Mode... and press
OK with the default settings. Now the previous menu should be accessible.

The power analysis will then start and write lines similar to the following on the Cadence SoC
Encounter shell window:

CPE found ground net: GND


CPE found power net: VCC voltage: 1.8V
INFO (POWER-1606): Found clock ’ClkxCI’ with frequency 50MHz from SDC file.

CK: assigning clock ClkxCI to net ClkxCI

Propagating signal activity...

Starting Levelizing
2011-Nov-07 10:29:54 (2011-Nov-07 09:29:54 GMT)
2011-Nov-07 10:29:54 (2011-Nov-07 09:29:54 GMT): 5%
..

Among the messages in the console you will find some information about the clock. Notice that the
clock frequency extracted from the SDC file (50 MHz) does not match the frequency specified in the
GUI. The tool will use the SDC version, so the entry in the GUI will be ignored. It is important that
you always check the clock frequency on the console.

Student Task 6: Adjust the clock frequency (dominant frequency value) in the GUI so that
it matches the SDC value, and rerun the analysis.

There will be a warning message on the console about the TIE cells not having a power model. Since
the tie cells do not have any switching activity (they tie the output to either logic-1 or logic-0), this is
not really a problem.
At the end of the analysis Cadence SoC Encounter will write a summary on the console. The
result will also be written to the chip_filter.rpt file in the reports/power directory. Have a look at it and
try to identify the main results of the power dissipation of your chip. How much power does the chip
dissipate? What are the values that contribute most to the total power?

Student Task 7: Talk to an assistant and discuss where most of the power is being dissipated.
Calculate the total power dissipated by these instances. Update the results table at the beginning
of section 5. Use the additional column to enter the power dissipated by the above mentioned
instances.
Once we run the analysis again this report file will be overwritten. For this exercise we would
like to preserve the file, so that we can compare the results later on. Step into the encounter
directory of this exercise and copy the file or move it to a different name, for example:
sh > cd ../encounter
sh > mv reports/power/chip_filter.rpt \
       reports/power/chip_filter_ga.rpt

Figure 4: Run Power Analysis menu in Cadence SoC Encounter.

Input Activity

Setting all internal nodes to a fixed activity is a gross oversimplification. Not all gates will switch with
the same probability (e.g. a 3-input AND gate switches its output much less often than, say, a 2-input XOR
gate). Instead of setting a default switching value for every internal node of the chip, it is also possible
to define only the activity of the input pins. Cadence SoC Encounter is then able to propagate
this activity inside the chip.
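The AND versus XOR remark can be checked with a small Python sketch. It assumes that every gate input is 0 or 1 with equal probability and that consecutive cycles are statistically independent. This is a textbook simplification chosen for illustration and not the propagation algorithm of Cadence SoC Encounter; the function name is ours.

```python
from itertools import product

def output_activity(gate, n_inputs):
    """Toggle probability per cycle of a gate output for random inputs."""
    outputs = [gate(bits) for bits in product((0, 1), repeat=n_inputs)]
    p_one = sum(outputs) / len(outputs)      # probability that the output is 1
    return 2 * p_one * (1 - p_one)           # P(0->1) + P(1->0)

and3 = output_activity(lambda b: int(all(b)), 3)
xor2 = output_activity(lambda b: b[0] ^ b[1], 2)
print(and3, xor2)
```

Under these assumptions the 3-input AND output toggles in about 22% of the cycles, the 2-input XOR output in 50%, so a single global activity value cannot describe both.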

Student Task 8:
• To execute this new power analysis go back into the Run Power Analysis... menu
and deselect the global activity option in the Activity tab. Return to the Basic tab and
put the value 0.2 in the input activity field. As before set the frequency to 50 MHz. Leave
the flop activity and the clock gate activity fields empty (see note a).
• Run the analysis and check the new report. What is the total power dissipation of the chip
now? Can you explain the difference with the previous value? Which of the two results is
more reliable?
• Update the results table with the current results.
• As before, rename the generated report file:
sh > mv reports/power/chip_filter.rpt \
       reports/power/chip_filter_ia.rpt

Note a: The first specifies the activity of outputs of sequential logic, while the latter specifies the average number of
times that a clock-gating cell switches in a clock cycle.

5.2 Stimuli-based Power Analysis

Using a circuit simulator to determine node activity figures

Instead of trying to estimate the switching activity (with varying levels of accuracy), we can use the
Mentor Graphics ModelSim simulator to run the complete simulation and determine the exact
switching activity. We can tell Mentor Graphics ModelSim to write out a ‘Value Change Dump’
(VCD) file from the post-layout netlist, which records for every node when it
has switched and to what value.
Student Task 9: Step into the modelsim directory of this exercise:
sh > cd ../modelsim

Compile the placed & routed netlist of the final design. Also compile the testbench and related
files. All these compilations can be performed by executing a single shell script (see note a):

sh > ./compile_gate.csh

Now start the simulator with a prepared run script:

sh > ./run_gate.csh

Note a: It is a good idea to take a look at it; you should know what you are executing.

To view the input and output of the filter, there is a .do file that will show the relevant signals in the
Wave window. On the console you could type:
vsim > do wave.do

Student Task 10:


• Now we are ready to generate the dump file. We will first simulate the circuit for 100 ns
so that the circuit is properly initialized (we do not want to include the activity during the
initialization phase). Then we have to tell ModelSim where to store the VCD file. The last
thing is to specify the names of the nodes that we would like to monitor, i.e., the scope. The
following three commands are used for this purpose:
vsim > run 100ns
vsim > vcd file ./vcd/chip_filter.vcd
vsim > vcd add -r /chip_filter_tb/DUT/*

• At this stage we can run the gate-level simulation until the end (20,142 ns). Moreover, the
VCD buffer needs to be flushed at the end of the simulation run to make Mentor Graphics
ModelSim write the VCD file.
vsim > run -all
vsim > vcd flush

For a real design, the simulation could take a very long time and, more importantly, could produce
very large VCD files (gigabytes!). For your own designs consider writing the VCD files to the
/scratch directory.
This simulation, however, should not take that long. As you can see from the wave window, the inputs
are rather random, and should produce a lot of activity.

Stimuli-based Activity

At this point, we have a VCD file that contains the toggle activity of the nodes in the design based
on a simulation with actual stimuli. We will now give it to Cadence SoC Encounter to perform a
stimuli-based power analysis:

Student Task 11:


• As before, select the menu Power →Power Analysis →Run Power Analysis....
• In the main tab, select VCD File to perform a simulation-based power analysis. Note
that if you don’t check this option, SoC Encounter uses the values given in the other fields. Take the
generated VCD file and enter as Scope the top-level module chip_filter_tb/DUT. Note that
there is no leading slash ’/’ in the scope. You could also specify a start and stop time for
the power simulation. Here, specify a start time of 100 ns, and a stop time of 20,000 ns
(numbers are taken from the simulation). Leave the block field empty and press Add. Do
not forget to press Add!
• The results directory should be reports/power. See Figure 5 to get an overview of the
window’s setup. Press OK.

Figure 5: Run Power Analysis menu in Cadence SoC Encounter with vcd file.

Once the power analysis starts, Cadence SoC Encounter will write messages to its shell
that look similar to those of the previous runs. But we have to study them carefully. When the clock period specified
in the SDC file and the clock period within the VCD file do not match, you will get a message that
says (for example):

WARNING (POWER-1784): Existing clock frequency 217.391MHz
is being overwritten with 200.034MHz on clock rooted on
net ClkxCI from VCD file.

In this case the VCD clock frequency will be taken. In our exercise, we do not have this problem.
Furthermore, there will be a message similar to the following one

With this vcd command, 4426896 value changes and 1.99e-05 second
simulation time were counted for power consumption calculation.

The line above summarizes how Cadence SoC Encounter has interpreted the VCD file. It is very
important to make sure that the time (expressed in seconds) is equal to what we have simulated (and
have intended). In our case, the time should be 20,000 ns - 100 ns = 19,900 ns, which matches the
above message. Make sure that you have the correct time.

Filename (activity) : ../modelsim/vcd/chip_filter.vcd


Found in design : 24858/26118
Coverage for file : 5473/5473 = 100%

The lines above tell us what Cadence SoC Encounter has extracted from the VCD file. It is
very easy to make mistakes and use the wrong VCD file. The second line shows the total number
of switching activities, and the third line shows what percentage of the internal nodes were
annotated.
If you see that the message looks like the following:

Found in design : 0/0


Coverage for file : 0/5473 = 0%

you have a problem (most probably, it is the wrong file, or the wrong scope has been specified,
for instance because the leading slash has not been omitted). Cadence SoC Encounter will still perform the
analysis regardless of the success of the annotation. Since nothing was back-annotated, the results
will just be wrong.
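The two checks described above (simulated time and annotation coverage) can also be scripted. The following Python sketch parses the two console messages quoted in this section. The regular expressions are tailored to exactly those sample lines and may need adapting to other tool versions. The function name is ours.

```python
import re

def parse_vcd_summary(log_text):
    """Extract the simulated time [s] and the annotation coverage (0..1)."""
    t = re.search(r"(\d[\d.eE+-]*)\s+second\s+simulation time", log_text)
    c = re.search(r"Coverage for file\s*:\s*(\d+)/(\d+)", log_text)
    sim_time = float(t.group(1)) if t else None
    coverage = int(c.group(1)) / int(c.group(2)) if c else None
    return sim_time, coverage

LOG = """\
With this vcd command, 4426896 value changes and 1.99e-05 second
simulation time were counted for power consumption calculation.
Found in design : 24858/26118
Coverage for file : 5473/5473 = 100%
"""

sim_time, coverage = parse_vcd_summary(LOG)
expected = (20_000 - 100) * 1e-9      # stop time minus start time, in seconds
assert abs(sim_time - expected) < 1e-12, "VCD window differs from intention"
assert coverage == 1.0, "not all nodes were annotated"
```

With the failing message shown above (0/5473) the coverage evaluates to 0.0 and the second assertion trips, which is the situation in which the reported power must not be trusted.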

Student Task 12:


• Take a look at the report chip_filter.rpt in the output directory that you have selected. How
much power does the chip dissipate now?
• Update your results table with the latest result. Do not forget to update the power in the
second (mystery) column.
• Compare the results with the older analyses. Does your result make sense?

5.3 Effect of Switching Activity

For the last part we have used a simulation of random input data. The stimuli file was given for the
exercise, and we just used these values. The question that we should now investigate is how much
the stimuli file could affect the overall power consumption.

Student Task 13:


• To do this, we apply the stimuli producing the least activity in the design: an all zero vector.
Generate a stimuli file with an all zero input and record a new VCD file. (You will have to
figure out how)
• Update the estimated power in our table.
• Present the results to an assistant.

5.4 Architectural Changes to Save Power

Architectural decisions can have a significant effect on the power consumption of the circuit. The
test circuit we use in this exercise has been designed to have several different operation modes that
correspond to differing architectural choices. A summary of the options can be found in Section 3.3.
The stimulus file in the previous section used both the high-pass and the low-pass filter component
at the same time (option Enable all). The first thing that we will do is to disable the high-pass-filter
(option Disable HP) and check the resulting power analysis.

Student Task 14:


• Modify the stimulus file, simvectors/input.stim so that the option Disable HP is selected.
You should only change the first number in the stimulus file. (Make sure you are not using
the stimuli file with zero activity!)
• Perform a power analysis using the VCD file generated from the new stimulus file.
• Report your numbers in the table. How does it compare to previous results?

After examining the power reports, and consulting the simplified block diagram in Figure 3, you should
notice that there is a way to reduce the power consumption without losing functionality.

Student Task 15:


Describe a couple of approaches that could reduce the power consumption of the circuit. Discuss
your solutions with an assistant.


We will implement a solution that uses clock gating to disable the unused filter bank. The
test circuit already has the control signals for this solution (see Section 3.3). We will use the option
Clock Gate. This option will a) only enable one block, and b) use clock gating to stop the clock
propagation in the block that is not enabled.

Student Task 16:
• Modify the stimulus file, simvectors/input.stim so that the option Clock Gate is selected.
You should only change the first number in the stimulus file.
• Perform a power analysis using the VCD file generated from the new stimulus file.
• Report your numbers in the table. How does it compare to previous results?

The change in the input file uses the ModexSI signal in connection with the
OutSelectxSI signal to disable the filter blocks. The TestModexTI signal controls the clock gating
circuitry: 1 - clock gate inactive, 0 - clock gate active.
Normally, architectural changes like the one we have just described cannot be performed merely by
changing the input stimuli (this was done in this exercise to save time). Such architectural changes
would require changes to be made to the circuit description, re-synthesis of the circuit, and a fresh
back-end design process. Once the back-end process is complete we would extract the SDF file and
the netlist, use Mentor Graphics ModelSim to generate a new VCD file, import this file back into
Cadence SoC Encounter and perform the power analysis.

Explain the numbers in your final table to your assistant.

Next week, we will study the effects of IR drop and investigate the effects of different power distribution
strategies.

6 Ground bounce, supply droop and Electromigration

In this part of the training we want to determine an adequate power routing strategy for our design.
We can determine the width, layers, and the number of stripes and power rings by evaluating how
much the power distribution is affected.
To perform this analysis, we will use the Rail Analysis of Cadence SoC Encounter. The rail
analyser can show the current density, ground bounce and supply droop across the power lines in
a chip. This allows us to evaluate whether or not the current power distribution is adequate for the
design. In Cadence SoC Encounter the ground bounce and supply droop are called IR drop.
While designing the power nets, it is important to keep in mind two different problems:
• IR drop: Since the metal exhibits a natural resistance (R), current (I) flowing through such a
connection will create a voltage drop. This in turn will reduce the supply voltage of any cell,
which is to the detriment of its performance (e.g. increased propagation time). Additionally,
excessive supply drop and ground bounce may violate noise margins, leading to a malfunction
of the chip (footnote 4). Depending on process, voltage and temperature (PVT) variations, it immediately
influences the correct behavior of the chip.
• Electromigration (footnote 5): Thermally agitated metal ions are washed away by flowing electrons, thereby
reducing the cross section of the metal. As a final result an interruption of a power line can
occur, which destroys the chip. This phenomenon depends on the current density J.
IR drop is a problem that has an immediate effect on the chip’s operation, while electromigration is
a slow process, which may show its negative impact after months or even years during which the IC
has been working correctly. A positive side effect of designing the supply wires sufficiently wide
with respect to electromigration is that fusing due to high current densities is prevented. That is,
constraints for preventing electromigration are much tighter than those for preventing fusing.
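To get a feeling for the orders of magnitude, the Python sketch below estimates the IR drop and the current per unit width for a single power stripe. All numbers (sheet resistance, stripe geometry, current) are hypothetical and are not taken from the umc L180 process. Real values must come from the technology documentation and from the rail analysis itself.

```python
def stripe_resistance(r_sheet_ohm_sq, length_um, width_um):
    """Resistance of a metal stripe: sheet resistance times number of squares."""
    return r_sheet_ohm_sq * length_um / width_um

# Hypothetical stripe: 0.08 ohm/square, 1000 um long, 10 um wide, carrying 10 mA.
width_um = 10.0
current_ma = 10.0
r_stripe = stripe_resistance(0.08, 1000.0, width_um)   # ohm
ir_drop_mv = current_ma * r_stripe                      # mA * ohm = mV
j_ma_per_um = current_ma / width_um                     # current per um of width

print(f"R = {r_stripe:.1f} ohm, IR drop = {ir_drop_mv:.0f} mV, "
      f"J = {j_ma_per_um:.1f} mA/um")
```

Doubling the stripe width halves both the drop and the current per unit width, which is why width and number of stripes are the main tuning knobs in the exercise.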
Fortunately, Cadence SoC Encounter features efficient rail analysis tools that show the IR drop
along the supply lines and the current density therein graphically. Basically, there are two versions of
the Rail Analysis available:
• Early Rail Analysis: A simplified analysis that can be used after floorplanning.
• Rail Analysis: A more accurate analysis that can also take into account the power
distribution within macros such as I/O pads and memory macros.
In this exercise we will use the more precise one, i.e. the Rail Analysis.

7 The Test Vehicle

The design being used throughout this part of the training will already be very familiar to you. It is
the same design you used in Exercise 3. In order to give you an overview of it once again, Figure 6
illustrates the main components.

Footnote 4: See chapter 10.3 of the VLSI textbook.
Footnote 5: See chapter 11.6.1 of the VLSI textbook for a more detailed discussion.

[Figure: block diagram of mbcjr_chip. Labels recoverable from the drawing: pads (PADS) with the signals
DataxDI, LLRSelectxSI, ModexSI, TestModexTI, BistEnxTI, ClkxCI, ResetxRBI, LLRxDO, BistAlphaOkxTO,
BistAlphaDonexTO, BistGammaOkxTO and BistGammaDonexTO; the blocks Input Memory (mem1, mem2, mem3),
FSM (MBJCRFsm), in2Gamma, three gammaAdder instances, alphaConn, betaConn, dummyBetaConn, alphaUnit,
betaUnit, dummyBetaUnit, alphaMem and LLRUnit inside MBCJRUnit.]

Figure 6: The test vehicle being used.

7.1 Installation and Preparation Work

The test vehicle can be installed as follows:


Student Task 17:
1. Open a Unix shell window.
2. Install the test vehicle:
sh > /home/vlsi2/t2/install_t2_partB

3. Start the cockpit:


sh > cd training2_partB
sh > icdesign umcL180 &

Afterwards, load the already prepared design:

Student Task 18:


1. Start Cadence SoC Encounter.
2. Navigate to Design →Restore Design →SoCE... and choose mbcjr_chip.enc from the
save directory.
3. Change to the Physical View of the design.

For the Rail Analysis, some power-specific information is required, which can be gained from the
power analysis as follows:

Student Task 19:


1. Set up the power analysis mode: Power→Power Analysis→Set Power Analysis Mode...
and click OK using the default settings.
2. Switch to the Cadence SoC Encounter shell and execute the following command in
order to perform a power analysis which generates the required power-specific information
for the Rail Analysis:
enc > report_power -rail_analysis_format VS \
        -outfile reports/power/mbcjr_chip_vcdx4.rpt

3. Watch the output within the Cadence SoC Encounter shell and check whether the
coverage of the node activity file reaches 100%.

The output of the power analysis should look similar to the following:

Loading TCF file save/mbcjr_chip.enc.dat/mbcjr_chip.tcf

Filename (activity) : save/mbcjr_chip.enc.dat/mbcjr_chip.tcf


Found in design : 26202/26202
Coverage for file : 26202/26202 = 100%

TCF (Toggle Count File): You will have noticed that for the previous power simulation we
didn’t use a VCD file (as in the first part of the training), but a TCF file instead. As the name
suggests, the TCF file type contains the toggle count information of the nodes, and is an SoC
Encounter-specific file format. In contrast, the VCD file format contains the complete timing information. TCF
files can be generated from VCD files but not the other way round.
Now you are ready to start analysing the design with regard to its power distribution.

8 Rail Analysis

8.1 Rail Analysis Setup

Student Task 20:


• From the menu select Power→Rail Analysis→Set Rail Analysis Mode.... Within
the Basic tab, set the Accuracy to Accurate. For the Power Grid Libraries choose
the .cl files in the directory tech/cl/.
• Select EM Analyse Models and choose the file tech/EM.6.models.
• Compare the settings to Figure 7. If all is correct, save the settings by using Save... and
then press the OK button.

Figure 7: Set Rail Analysis Mode GUI in Cadence SoC Encounter.

8.2 IR Drop Threshold

To perform IR drop analysis, we need to fix a threshold that indicates the worst acceptable voltage
level in the design. The threshold voltage can be extracted from the databook (located in the DOCS
directory):

Student Task 21: Look for the operating conditions in the standard cell databook and report the
following values:
Operating voltage:
Minimal voltage:

At first sight, a good threshold value might be the minimal voltage of the standard cells. However, we
need to take into account that the IR drop analysis is done for VCC and ground separately; that is, the
total voltage loss seen by a cell is the sum of the VCC and ground drops.

Student Task 22: Taking into account the considerations from before, determine an appropriate
threshold level for the IR drop on the power nets:
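The reasoning above can be written down as a small calculation. The sketch below assumes that the total drop budget is split equally between VCC and ground (vcc_share = 0.5) and uses made-up voltages; take the real values from the databook and discuss the split with an assistant.

```python
def vcc_threshold(v_nominal, v_min, vcc_share=0.5):
    """Lowest acceptable VCC level when the drop budget is shared with ground."""
    budget = v_nominal - v_min              # total drop a cell can tolerate
    return v_nominal - vcc_share * budget   # part of the budget spent on VCC

# Assumed example values, NOT taken from the databook.
print(round(vcc_threshold(1.8, 1.62), 3))
```

With vcc_share = 1.0 the function returns the minimal cell voltage itself, which is the naive threshold that ignores the ground drop.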

8.3 Rail Analysis Run

Student Task 23:


• To run the analysis select from the menu Power→Rail Analysis→Run Rail Analysis....
• Set VCC as the Power Net(s), set the Voltage(s) and the appropriate threshold.
• In the Power Data menu choose the Current Files switch and then select the instance
current file that was generated in the previous step, i.e. static_VCC.ptiavg for the net VCC
(located in the reports/power directory).
• Cadence SoC Encounter does not know by itself where the supply enters the chip.
You can specify this by using the POWER PAD definition. The easiest way is to use a Pad File.
To create this file, choose Pad File and click on the CREATE button.
• In the Edit Pad Location window set the net name under Auto Fetch Pad Location
to VCC and press AUTO FETCH. The Pad Location List is updated with all the VCC supplies.
Now you can save this list under the name mbcjr_chip_VCC.pp in the save folder (use
the VS file format). Close the window by pressing CANCEL.
• Back in the Run Rail Analysis... window, load the Pad Location List that you
have just created by selecting it within the Files: option. As the Net Name: use VCC
and press the ADD button.
• After providing the results directory reports/rail, the GUI should look similar to Figure 8.
Press the APPLY button.

If the rail analysis succeeded, the Cadence SoC Encounter shell should display an output similar
to the following:

* Exiting vstorm2 normally.

vstorm2 exited successfully.


Check Reports/main.html generated inside state directory.

8.4 View Rail Analysis Result

Once the rail analysis is completed, you have to open a new window, named Power & Rail Results,
to be able to see the results.

Student Task 24:


• Go to the menu Power→Report→Power & Rail Results....
• This will bring up a new window. In the Basic tab, first select the BROWSE button
and choose the previously generated rail analysis results, which should be located in the
reports/rail directory. The results directory will be called something like VCC_25C_avg_1. Press
the LOAD STATE button to load the results.

Figure 8: Rail Analysis GUI in Cadence SoC Encounter.

Note that the last number of the results directory name ('1' in the example above) is incremented
each time you run a new rail analysis. Thus, when you want to view the results of a new rail
analysis, you need to load the state from the new results directory. The tool allows you to visualize
different features like the IR Drop or the Current Density directly on your chip.

IR Drop

For the first step we will analyze the IR Drop map of the chip.

Student Task 25: Under RAIL ANALYSIS PLOT TYPE select IR - IR DROP. Make sure that the
option AUTO APPLY in the ACTION field is checked. Otherwise you will have to press the APPLY
button in order to show the results. Compare your settings with those from Figure 9. This will give
you a color-coded map of the IR drop of your chip. The highest drop will be colored dark red. You
can dim the rest of the circuit with the F9 key to see the IR drop more clearly.

By default, the tool will automatically determine the color ranges. You can change this if you want in
the AUTO FILTER field (e.g. by pressing the AUTO button).

Resistor Current

In the Power & Rail Results window select RC - RESISTOR CURRENT to show the plot of the current
flowing through the wires. Again you can check AUTO APPLY or press APPLY.

Figure 9: Power & Rail Results GUI in Cadence SoC Encounter.

Resistor Current Density

The resistor current density plot (RJ - RESISTOR CURRENT DENSITY option) shows the ratio
J/Jmax for every wire of the chip. Here, J is the actual current density in the wire and Jmax
is the maximum allowed current density of the respective metal layer. A ratio greater than 1 means
that the current density limit of the segment is violated. This is an important aspect, since for critical
values of J/Jmax your chip could suffer from the problems described at the beginning of Sec. 6.
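To get a feeling for the ratio, the following sketch evaluates J/Jmax for two wire segments. All numbers and segment names are hypothetical, and the current density is expressed as current per unit wire width, which is a common but here assumed convention; the limits that the tool actually applies come from the EM model file.

```python
def density_ratio(current_ma, width_um, jmax_ma_per_um):
    """Return J/Jmax for a wire, with J taken as current per unit width."""
    j = current_ma / width_um
    return j / jmax_ma_per_um

# Hypothetical segments: (name, current [mA], width [um]).
segments = [("ring_left", 12.0, 10.0), ("pad_link", 6.0, 2.0)]
JMAX = 2.0  # mA/um, illustrative value only

for name, i_ma, w_um in segments:
    ratio = density_ratio(i_ma, w_um, JMAX)
    print(name, round(ratio, 2), "VIOLATION" if ratio > 1 else "ok")
```

The narrow segment carries less current than the ring and still violates the limit, which is why thin connections are the usual suspects in this plot.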

Student Task 26:


• Examine the default design, talk to an assistant and discuss some possible solutions in
order to better distribute the power.
• Where is the worst IR drop located?
• Where is the worst resistor current density located? Why?

9 Power Distribution Techniques

Throughout this section we will apply different techniques that allow us to better distribute the
available power within our design. In order to see how each technique affects the power
distribution of our design, Table 3 should be updated continuously with the results obtained.
Note that, in order to make you aware of the different problems of power distribution, we use a design
that is very bad in the beginning, so that you can see the improvement brought by each step. In a typical
chip design flow, however, most of these steps are not necessary.

Table 3: Power Distribution Techniques - Results Table.


                                 Voltage / IR Drop [V]   Nets below Threshold [%]
Default design:
Connected pads:
Connected macro:
Widened power rings:
Doubled power rings:
Power rings @ Metal  /Metal  :
Added power stripes:

Student Task 27: Have a look at the results of the rail analysis of the default design, which you
obtained in Section 8, and fill out the first row of Table 3. The first empty column of the
table should contain the maximum IR Drop within the design, whereas the second column should
contain the percentage of nets that violate the IR Drop threshold.
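The two table entries are simple statistics over the analyzed voltages. The sketch below shows how they relate to each other; the voltage list, the supply value and the threshold are invented for illustration, and in the training you read the values from the Power & Rail Results window instead.

```python
def rail_summary(node_voltages, v_supply, v_threshold):
    """Summarize a rail analysis: worst voltage, worst drop, share of violations."""
    worst = min(node_voltages)
    below = sum(1 for v in node_voltages if v < v_threshold)
    return {
        "min_voltage": worst,
        "max_ir_drop": v_supply - worst,
        "percent_below": 100.0 * below / len(node_voltages),
    }

# Hypothetical node voltages in volts.
voltages = [1.79, 1.75, 1.70, 1.68]
print(rail_summary(voltages, v_supply=1.8, v_threshold=1.71))
```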

9.1 Supply Pads Connectivity

One of the main reasons why our test vehicle has such a bad power distribution is the tiny
connections between the supply pads and the actual core of the design. One way to solve this
problem would be to manually widen those connections. Another way is to use the built-in routing
option of Cadence SoC Encounter, which makes sure that the connections are as wide as
possible for the used supply pads. This can be done in the following way:

Student Task 28:


• Go to the menu Route→Special Route.... Within the BASIC tab have a look at the
ROUTE field and deselect all options except the one for PAD PINS. Your settings should
look similar to those in Figure 10. Close the dialog using the OK button and as soon as
routing has finished, have a look at the newly created connections at the supply pads.
• Run another rail analysis as described in Section 8.3, have a look at the results (see
Section 8.4) and complete the appropriate row in Table 3.

Figure 10: Special Route GUI in Cadence SoC Encounter to improve Pad Connectivity.

9.2 Macro Blocks Connectivity

You should have already noticed that another major problem within our design is the connectivity
of the macro block. Fixing this issue works much like the previous one:

Student Task 29:
• Go to the menu Route→Special Route.... Within the BASIC tab have a look at the
ROUTE field and deselect all options except the one for BLOCK PINS. Close the dialog using
the OK button and as soon as routing has finished, check the newly created connections
at the macro block.
• Run another rail analysis and complete the results table.

9.3 Adjustment of the Power Rings

The current width of the power rings is definitely at a minimum (they are almost as narrow as the cell
library allows them to be). In order to get some information about the different available metal layers
and their electrical characteristics, you will now examine one of the technology specific files provided
by the design kit:

Student Task 30: Navigate to the directory encounter/tech/lef/ and open the file header6_V55.lef
using less. Browse through the file and complete the following table:

           Minimum Wire Width   Maximum Wire Width (a)   Resistance   Thickness
           [µm]                 [µm]                     [Ω/□]        [µm]
Metal 1
Metal 3
Metal 6

a
Watch out for the maximum wire width before slotting occurs.

Now you should be able to set the width of the power rings accordingly:

Student Task 31:


• Use the ruler to determine the width of the power rings. How wide are they currently?
• What would be a more suitable width for the power rings?
• Ask an assistant whether your assumptions are suitable or not. Correct them if necessary.
Afterwards open the menu Power →Power Planning →Add Rings... and insert the
settings illustrated in Figure 11.
• Run another rail analysis and complete the results table.
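The sheet resistance from the table translates into a wire resistance via R = R_sheet · L / W, so widening a ring reduces its resistance and IR drop proportionally. The sketch below uses made-up values for the sheet resistance, ring length and current; insert the numbers you found in the LEF file to judge your own ring width.

```python
def wire_resistance(sheet_ohm_per_sq, length_um, width_um):
    """Resistance of a straight wire: sheet resistance times number of squares."""
    return sheet_ohm_per_sq * length_um / width_um

# Illustrative values only, not taken from the design kit.
R_SHEET = 0.08      # ohm per square
LENGTH = 1000.0     # um
CURRENT = 0.010     # A

for width in (2.0, 20.0):
    r = wire_resistance(R_SHEET, LENGTH, width)
    print(f"width {width:5.1f} um: R = {r:5.2f} ohm, IR drop = {r * CURRENT:.3f} V")
```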

Widening the power rings has definitely improved the power distribution of our design. Nevertheless,
not all of the nets reach the previously defined threshold. Hence, we have to take further steps in order
to achieve a lower IR Drop. One possibility is to double the number of power rings:

Student Task 32: Open the menu Power→Power Planning→Add Rings... and apply
the same settings as in the previous step, except for the NET(S). Here insert GND VCC GND VCC,
which results in doubled power rings. After hitting the OK button, run another rail analysis
and write down the results in Table 3.

Figure 11: Add Rings GUI in Cadence SoC Encounter.

As you should see from your results, the addition of a second power ring does not improve the power
distribution much. Therefore you can delete the second power ring we have just created by simply
removing the appropriate wires within the design. What you can see from the previous step is that
oversized power networks do not always help you to get a better power distribution. Instead, they
only consume die area, which can certainly be put to better use.
In the previous section you have gathered some information about the electrical characteristics of the
different metal layers. Maybe you can already imagine that the choice of the correct metal layer also
plays a major role when designing the power distribution network. Hence, let us now try to change
the metal layers of our power ring in order to reduce the IR Drop.

Student Task 33:


• First, remove the existing power ring within the floorplan (select and delete).
• Open the menu Power→Power Planning→Add Rings... and keep the previously
entered settings (check that you do not insert the unnecessary second power ring this
time), except that you choose a more suitable metal layer.
• Press the OK button and run another rail analysis. Have a look at the results of the rail
analysis and complete the corresponding row in your results table. Which metal layer did
you choose and does the change improve the power distribution?

9.4 Power Stripes

Some of the voltage levels of the nets within our design are still below the initially set threshold. As
we can see from the latest rail analysis results, the highest IR Drop is located right in the middle of
our design. Therefore we will try to correct these violations by inserting power stripes.
As with power rings, there are many different parameters that you can tune when inserting
power stripes. Some of them are listed in the following:

Orientation: Power stripes can, of course, be inserted either horizontally or vertically. Because the
supply wires for the standard-cells are horizontally aligned, vertical power stripes are more
suitable to improve the power distribution.
Width: As with power rings, the width of the power stripes can be defined.
Quantity: Depending on the present design, you may have to adjust the number of the power stripes
being inserted.
Power Grids: Further power distribution techniques like a power grid (i.e. vertical as well as
horizontal stripes) are possible (see footnote 6).

For our design we will only insert a single power stripe:

Student Task 34:


• Open the menu Power→Power Planning→Add Stripes... and navigate to the BASIC
tab. The stripes should be designed for the Net(s) GND VCC. Choose an appropriate
metal layer and a Vertical direction. The Width of the stripes should be 20µm and they
should have a Spacing of 1.5µm.
• In the SET PATTERN field select the NUMBER OF SETS and insert just a single set.
• The stripes should be inserted at a predefined location (see note a). Within the First/Last Stripe
section select Start from: left and for RELATIVE FROM CORE OR SELECTED AREA insert 430µm.
Compare your settings with those from Figure 12 and press the OK button.
• Run your final rail analysis and check the results. Complete the results table. Hopefully,
you no longer have any violating nets.
a
As already mentioned earlier, re-running the whole back-end design flow for each power distribution improvement
would have been too time-consuming for a single afternoon. Therefore the nice guys from the DZ have already
prepared a suitable location for the power stripes.

6
Figure 10.9 of Section 10.4 within our textbook “Digital Integrated Circuit Design, from VLSI Architectures to CMOS
Fabrication” shows some sample layouts.

Figure 12: Add Power Stripes GUI in Cadence SoC Encounter.

9.5 Concluding Remarks

Although we primarily tried to reduce the worst-case IR drop and to stay above the specified
threshold voltage, you should in general also check that the IR drop distribution is consistent with your
expectations. For instance, you would expect the IR drop to increase the farther away you go from
the supply pads and the power rings.
Also note that we only do a rail analysis for the VCC net and thus omit the ground network in this
training.

10 It’s Your Turn

Now that you are more or less an expert (see footnote 7) in power analysis and power distribution
techniques and know how to work around the problems that appear, you can show what you have
learnt on a new sample design.

Student Task 35:


• In order to close the current design use the Cadence SoC Encounter shell and type in:
enc > freeDesign

• This will close the previous sample design. Open the new design as usual by navigating
to Design→Restore Design→SoCE... and choose mbcjr_chip_II.enc from the save
directory. Change to the PHYSICAL VIEW of the design.

Before you can start with the power distribution analysis in Cadence SoC Encounter, you need
to create a VCD file to get node activity information using the technique you learned in the first part
of this training. In the following, we recapitulate the flow:

Student Task 36:


Compilation of the netlist: As a starting point, use the Verilog netlist exported from Cadence SoC
Encounter, which is located at /encounter/out/mbcjr_chip_II.v. This netlist, together with
the testbench- and simulation-specific VHDL files, has to be compiled. The required VHDL
files are listed in the following:
1. /sourcecode/VHDLTools.vhd
2. /sourcecode/LTEPkg.vhd
3. /sourcecode/mbcjr_simulstuff.vhd
4. /sourcecode/mbcjr_chip_TB_pack.vhd
5. /sourcecode/mbcjr_chip_TB.vhd
You may want to have a look at the gate-level compile script we used during the first
part of this training.
Simulation of the netlist: If the netlist and the VHDL files have been compiled successfully, you
can start with the actual simulation of the netlist. The gate-level simulation script from the
first part of the training will help you to design a suitable run script for your current design.
The SDF file you will need for the simulation is located at /encounter/out/mbcjr_chip_II.sdf.fixed.gz.
Because the present design has a RAM macro block in it, you have to specify
the fsa0a_c_memaker_verilog library before you can run the simulation (in addition to the
core- and I/O-specific Verilog libraries).
In order to get a VCD file that contains the node information of the actual
running phase of the design, we recommend generating the VCD file only between 1µs
and 3µs. On the one hand, this avoids recording the toggle activity of the
initialization phase; on the other hand, the end time limits the size of the
resulting VCD file.
Power Simulation: Now that you have the node activity file, you can switch back to Cadence
SoC Encounter and create the power-specific files required for the subsequent rail analysis
by running a VCD-based power simulation. Do not forget to run the power simulation
setup at Power→Power Analysis→Set Power Analysis Mode... first. How
much power does the design consume?
Check the output in the Cadence SoC Encounter shell in order to be sure that the
coverage of the VCD file is OK and hence your power value is correct. After running the
power analysis the files static_GND.ptiavg and static_VCC.ptiavg should be available in the
directory /encounter/reports/power/.

7
Although you should be familiar with all of the tasks required for this part of the training, do not hesitate to ask an
assistant if you get stuck somewhere. The EDA tools can be a little bit confusing at the beginning. Nevertheless,
this part of the training should help you to get a better overview of how power analysis works by going through all
of the different steps on your own, this time without a guided tour provided by the assistants.

Now you are ready to start with your first attempts in order to improve the power distribution of the
new design. Do not forget to do the setup of the rail analysis as described in Section 8.1 before you
start with the actual analysis.

Student Task 37:


• Your first task will be to perform a rail analysis of the initial design and complete the first row
of Table 4. Then, improve the power distribution network step by step using the techniques
you have seen in the guided example in the previous section.
• Complete the results table below by describing the power distribution technique you have
applied in the second column and the resulting maximum IR Drop in the third column. The goal
should be to achieve a minimum supply voltage level of 1.788 V.

Remark (Hints): In the following, we provide some hints and comments that should help you to
improve the power distribution:
1. A well-formed power distribution network cannot be recognized by only considering the worst-case
IR Drop. Rather, try to build your network in such a way that almost all components
(standard cells, macro blocks, etc.) are provided with the same supply voltage. This means
that you should not simply stop your efforts as soon as no net violates the
initially set threshold anymore, but try to achieve a balanced power distribution.
2. As you have seen, the special route option in Cadence SoC Encounter can be used
to route specific nets, such as VCC and GND. However, keep in mind that Cadence SoC
Encounter considers only those nets that are not yet connected, and moreover
considers only those wires that have not been placed yet (i.e. if there are two wires already
placed on two different metal layers and running across each other, Cadence SoC
Encounter will not check whether they should be connected during a special route
process).
3. Some of the problems in the design might be much easier to detect by using further analysis
methods of the rail analysis, which we have not mentioned in this training. Feel free to try
the other analysis methods besides IR Drop and Current Density.

Table 4: New Design Power Distribution Techniques - Results Table.


Step   Power Distribution Improvement   Voltage / IR Drop [V]
0      None (Initial design):
1
2
3
4

Congratulations — That’s it!

Present the results to your assistant and discuss any open questions.

Institut für Integrierte Systeme
Integrated Systems Laboratory

Department of Information Technology and Electrical Engineering


VLSI II: Entwurf von hochintegrierten Schaltungen
227-0147-00

Training 3

Physical Verification
Prof. Dr. H. Kaeslin
Dr. N. Felber

SVN Rev.: 878


Last Changed: 2012-10-10

Reminder:
With the execution of this training you declare that you understand and accept the regulations about
using CAE/CAD software installations at the ETH Zurich. These regulations can be read anytime at
http://dz.ee.ethz.ch/regulations/index.en.html.
1 Overview

In the last two trainings we have learned to use SoC Encounter and have transformed a netlist into
a physical layout. In this exercise we will deal with two important steps in the back end design flow:
Design Rule Checking (DRC) and Layout Versus Schematic (LVS) validation.
We will also investigate the final physical layout and learn more about how standard cells are de-
signed, and what is inside them.
In the first part of the exercise we will outline the design flow for DRC and LVS for the semester
projects. In the second part we will use DRC and LVS to find and correct errors in several example
designs.

1.1 About the Style

We will try to use a number of different styles to identify different types of actions. These are summa-
rized below:

Student Task: Parts of the text that have a gray background, like the current paragraph, indicate
steps required to complete the exercise.

Actions that require you to select a specific menu will be shown like the following:
menu→sub-menu→sub-sub-menu
Whenever there is an option or a tab that can be found in the current view/menu we will use a button
to indicate such an option.
Throughout the exercise you will be asked to enter certain commands using the command line (see
footnote 1). The following is an example of the Linux command line.
sh > command to be entered on the linux command line

2 Physical Verification Design Flow

The goal of the back end design is to convert the structural netlist that we obtained through the
front end design into a physical layout database that essentially contains the geometrical design
information for all design layers. This information (also called the mask data) is used by the IC
foundry site as a blueprint for manufacturing. Essentially the IC foundry gets only a large database
that contains a set of geometric shapes (mostly rectangles) in different layers.
Some of the layers in the database are physical layers. That means these layers will be directly
manufactured. A good example is the set of metal layers used for interconnections. There are also some
non-physical layers that are used as part of the design flow to identify certain structures (umcL180
uses a dedicated layer to identify resistors, for example), or just for informational purposes (various
text layers).

1
There are many reasons for using a command line. Some functionality cannot be accessed through GUI commands,
and in some cases, using the command line will be much faster. Most importantly, things you enter on the command line
can be converted into a script and executed repeatedly.

The IC foundry defines several rules for each of the process layers. In essence the foundry would
say
“If you manage to send me a geometrical database that obeys all the rules I have sent
you, I guarantee that I will be able to manufacture an ASIC that works (with some yield).”
For example, in the umcL180 technology that we use in our exercises, if you want to draw two parallel
interconnections on metal-1 (the lowest metal layer), you have to make sure that the metal lines are
at least 240 nm wide and are separated by at least 240 nm. These rules are known as
design rules.
DRC is a rigorous process where the entire physical design database is checked against these design
rules. Over time, the IC manufacturing technology has evolved tremendously. With each technology
step, the design rules have become more complex and numerous. A modern process like umcL180
has many different manufacturing options (you could choose between high-speed or low-leakage
options for your transistors, manufacture the chip with 4, 5, or 6 metal layers, use a 2000 nm, 1200 nm
or a 800 nm top metal layer just to name a few), making DRC an increasingly complex operation. The
design rules for the process are linked under docs/umcL180 topological layout rule.pdf.
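To make the idea of a design rule check concrete, the following sketch tests the two metal-1 rules quoted above on a handful of rectangles. It is a strongly simplified illustration: the rectangle list is invented, only axis-parallel rectangles are handled, spacing is measured as the plain Euclidean distance, and a real DRC deck defines its measurements much more carefully.

```python
from itertools import combinations
from math import hypot

MIN_WIDTH = 240     # nm, metal-1 value quoted in the text
MIN_SPACING = 240   # nm, metal-1 value quoted in the text

def width(r):
    """Smaller side of a rectangle given as (x1, y1, x2, y2) in nm."""
    x1, y1, x2, y2 = r
    return min(x2 - x1, y2 - y1)

def spacing(a, b):
    """Distance between two rectangles; 0 if they touch or overlap."""
    dx = max(0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0, max(a[1], b[1]) - min(a[3], b[3]))
    return hypot(dx, dy)

def check(rects):
    """Return a list of ('width', i) and ('spacing', i, j) violations."""
    errors = [("width", i) for i, r in enumerate(rects) if width(r) < MIN_WIDTH]
    for (i, a), (j, b) in combinations(enumerate(rects), 2):
        if 0 < spacing(a, b) < MIN_SPACING:   # touching shapes count as merged
            errors.append(("spacing", i, j))
    return errors

# Hypothetical metal-1 wires: the third one is too narrow and too close to the second.
wires = [(0, 0, 1000, 240), (0, 480, 1000, 720), (0, 900, 1000, 1100)]
print(check(wires))
```

Even this toy version compares every pair of shapes, which hints at why a full run over several hundred thousand geometries and hundreds of rules takes minutes.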
As mentioned earlier, the IC foundry receives only geometrical information on the design. Before
submitting this database, it is important to make sure that the bunch of polygons that we are submitting
indeed represents our circuit. This process is known as the Layout (the physical database that
contains polygons) Versus Schematic (the design database that contains a netlist of transistors) check.
During LVS, the tool takes the physical database, identifies transistors and interconnections, and
thereby creates a netlist directly from the physical database. In a second step this netlist is
compared against the original logical netlist (gate-level netlist) that was the basis of our design. If both
netlists match, we can be confident that the physical database that we are sending for manufacturing
contains our design and has no shorts or missing connections (see footnote 2).
We will be using the Calibre tool from Mentor Graphics for both DRC and LVS in our design flow as
illustrated below:

2
Note that this only ensures that the layout corresponds to the final gate-level netlist. Whether or not the gate-level
netlist actually does what we expect it to do is an entirely different matter that is handled by design verification.

[Figure: physical verification flow. Elements shown: the TCL script for export
(encounter/scripts/exportall.tcl) used with Cadence SoC Encounter; the exported physical layout
(encounter/out/final.gds) and Verilog netlist (encounter/out/final.v); verilog2spice with the SPICE
includes (calibre/lvs/spice.inc) and Verilog includes (calibre/lvs/verilog.inc), and the resulting SPICE
netlist (calibre/lvs/final.sp); Calibre DRV; Calibre DRC with its runset file (calibre/drc/runset.drc)
and antenna runset file (calibre/drc/antenna.drc); Calibre LVS with its runset file
(calibre/lvs/runset.lvs) and labels (calibre/lvs/lvs_labels.txt).]

2.1 Getting Started

Student Task 1:
• Start by copying the example files into your directory by issuing the commands
sh > cd ~
sh > /home/vlsi2/t3/install_t3

This will create a cockpit hierarchy in the directory training_3. The design that we will use in the first
part of the exercise is the one we have used in training 1; we have included only the relevant parts of
the design database (see footnote 3). There is a second small design called sbox which contains several
DRC and LVS errors. We will use this design in the second part of the exercise.

3
You can also use your own semester thesis as an example for this exercise.

2.2 Accessing the Layout

[Figure: Calibre DRV main window with the following areas labeled: Hierarchy, Display Layers,
Layout Display, Depth Display.]

We will use the Mentor Graphics Calibre DesignRev (DRV) tool to access the GDSII file produced
by Cadence SoC Encounter. Start the tool either by clicking on the Calibre DRV button of the
cockpit (see footnote 4), or manually change to the directory calibre and use the command line (see footnote 5):
sh > calibre-2009.3 calibredrv -dl .init/L180.layerprops

The -dl option specifies a configuration file that changes the layer information so that the
colors of the layout look similar to what you have seen in SoC Encounter.

Student Task 2:
• Load the final GDSII file. Select File→Open Layouts.. to bring up a requester.
• Change to the directory ../encounter/out where our outputs are located. You will have to
change the File type at the bottom of the requester to ZIP (*.gz, *.Z, *.z) in order
to be able to see our file t2.gds.gz (see note a).
a
Most tools will be able to read compressed (*.gz) files without a problem.

4
You can start cockpit by typing icdesign umcL180.
5
Do not use a '&' to run this program in the background, as it needs to access the console.

You should now see the layout in the main window as shown in the previous figure (see footnote 6).

Student Task 3:
• Familiarize yourself with the editor: try displaying all levels of the hierarchy, zooming,
maneuvering, and displaying only certain layers.
Note that, unlike in SoC Encounter, you have full access to the layouts, so you can examine
individual standard cells as well.
• Take a piece of paper and, looking at the layout, draw the transistor-level schematic of
the following cells including the transistor dimensions: INV1, BUF1CK, ND2, OA12.

In the previous step you have managed to extract the circuit by looking at the geometric information.
This is exactly how Calibre will extract your circuit during the LVS phase.

Student Task 4:
• View the full chip again.
• You can also extract nets. Try to determine the connectivity of some nets. By default
you can only extract on the top level; you can change this behavior by selecting Edit
→Preferences..., choosing the tab Nets and selecting All Levels for the option
Search Depth (see note a).
• Try this feature. Select one net. Click with the right mouse button to get a small context
menu. You can select Extract Net from this menu.

a
Note that if you select the supply nets, the extraction will take very long. There will be an Abort option
visible during the extraction in the bottom left corner of the DRV.

2.3 Running DRC

In this section we will perform a DRC on the layout we have just loaded.

Student Task 5:
• Select Verification→Run nmDRC.. to bring up a requester. You will be asked to
select a runset file (see note a). Click on the ... to bring up a file requester and choose the file
runset.drc from the directory calibre/drc.
a
If you have had previous DRC runs, all previous runset files will be available for you to choose from. Be careful
when selecting a file: since in most cases a cockpit structure is used, the names will generally be very similar.

6
By default Calibre DRV will draw an outline around each instance and will write its name inside the box. While
this is sometimes very useful, it makes the screen very cluttered. You can disable this by selecting
Edit→Preferences... and deselecting the option Draw reference outlines from the requester that comes up.

The main DRC window contains four main configuration steps, which are available on the left hand
side (see footnote 7):

Rules Defines the run file and the run directory. The runset file sets both options here.
Inputs Tells where to find the layout file. The default is to export it from the DRV viewer. This is a
waste of disk space, since we used the viewer just to look at the GDSII file. We will change this
in the next step.
Outputs Specifies where the output will be written (the defaults should be ok).
Run Control Allows you to set some options about the run.

If any of the configurations is not valid, you will see the content of the field in red.
Under the menu Setup→Select Checks you can examine which design rules will be checked
with the current run. Notice that not all checks are selected, as some of the checks are for different
fabrication options.

Student Task 6:
• Select the Inputs tab on the left hand side of the main DRC window. We will change the
default: there is no need to save the layout information once again. Disable the Export from layout viewer
button, and select the GDSII file (encounter/out/t2.gds.gz) using the file requester.

7
There is a fifth one, DRC Options, which is not enabled by default. You will have to go to the menu Setup and
enable the DRC Options. However, the default values should be sufficient for most cases.
Student Task 7:
• Run the DRC by pressing Run DRC on the left hand side of the DRC window. This
operation will take several minutes to complete. A second (RVE) window will open, showing
the results of the current checks. If there are any errors from any of the completed
checks, you will already see them. After all checks are complete, a third window
(DRC Summary Report) will appear. At this point, if you have any errors, they will be
shown in the RVE window.

If you scroll to the end of the DRC Summary Report window, you should see the following summary
(actual numbers may be slightly different).

--- SUMMARY
---
TOTAL CPU Time: 123
TOTAL REAL Time: 74
TOTAL Original Layer Geometries: 274986 (16225226)
TOTAL DRC RuleChecks Executed: 466
TOTAL DRC Results Generated: 0 (0)

The summary tells us that there were 274’986 geometries and that they were checked against 466 different
rules. The important part is the last line, TOTAL DRC Results Generated, which tells
us that there were 0 violations.
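If you want to check this line without scrolling through the report, a few lines of Python are enough. The following is only a minimal sketch: it assumes that the report text has been saved or copied as plain text, and the function names are our own and not part of Calibre. The sample text is the summary shown above.

```python
import re

# Sample text copied from the summary above. In practice the text would be
# read from wherever you saved the summary report (location not assumed here).
SAMPLE_SUMMARY = """\
--- SUMMARY
---
TOTAL CPU Time: 123
TOTAL REAL Time: 74
TOTAL Original Layer Geometries: 274986 (16225226)
TOTAL DRC RuleChecks Executed: 466
TOTAL DRC Results Generated: 0 (0)
"""


def drc_result_count(report_text):
    """Return the first number on the 'TOTAL DRC Results Generated' line."""
    match = re.search(r"TOTAL DRC Results Generated:\s*(\d+)", report_text)
    if match is None:
        raise ValueError("summary line not found in report text")
    return int(match.group(1))


def is_drc_clean(report_text):
    """A run counts as clean only if no results (violations) were generated."""
    return drc_result_count(report_text) == 0


if __name__ == "__main__":
    print("DRC clean:", is_drc_clean(SAMPLE_SUMMARY))
```

Such a check only looks at the total. You still have to inspect the individual results in the RVE window whenever the count is not zero.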
We are almost done. For our technology, the antenna rules [8] have to be checked using a separate
runset.

[8] You can read more about antenna rules on page 527 of your textbook.

Student Task 8:
• Run a second DRC, this time by using the runset calibre/drc/antenna.drc. You can load a
new runset through the menu File →Load Runset.

It is very important to have a completely error free DRC run for your semester project. We will practice
how to fix errors in the second part of the exercise.

2.4 Running LVS

In this section we will run LVS on the same design. Note that both DRC and LVS have to be repeated
whenever there is a change in the layout.
Before we start the LVS, we have to prepare the data. We already have the layout; what we need is
to generate the schematic (the S of LVS). We have the final Verilog netlist that we have saved in SoC
Encounter. This netlist contains all modifications and additions (clock tree, buffers, filler cells) of the
back end design flow, but it does not contain the transistors within the standard cells. We will
use a simple script to generate a transistor level netlist (in SPICE format) that contains all necessary
information for the schematic.

Student Task 9:
• If you still have the DRC window open, close it. We will not need it for a while.
• Using a console, change to the LVS directory calibre/lvs. You will find the script verilog2spice
there. Execute it using:
sh > ./verilog2spice ../../encounter/out/t2.v t2.sp

This will convert the Verilog netlist under ../../encounter/out/t2.v into a SPICE netlist
named t2.sp in the current directory.
You will see several warning messages of the type:
Warning: Module instantiation XXX has pin mismatches with module YYY
These messages come from instantiations where not all defined outputs are connected to
a net. This is not tragic, and can be ignored [a].

[a] For your own designs you should always investigate what these messages are. For this exercise we have made
sure that there are no surprises.
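For your own designs, where these warnings should not be ignored blindly, it can help to group them by module before looking at them one by one. The following Python sketch assumes that you have captured the output of the script as text. The instance and module names in the sample (U12, DFFX1, ...) are hypothetical, and the pattern only follows the message template quoted above.

```python
import re
from collections import Counter

# Pattern built from the message template quoted above.
WARNING_RE = re.compile(
    r"Warning: Module instantiation (\S+) has pin mismatches with module (\S+)"
)

# Hypothetical captured output; the names are made up for illustration.
SAMPLE_LOG = """\
Warning: Module instantiation U12 has pin mismatches with module DFFX1
Warning: Module instantiation U47 has pin mismatches with module DFFX1
Warning: Module instantiation U90 has pin mismatches with module FAX2
some other line that is not a warning
"""


def count_mismatches_by_module(log_text):
    """Count pin-mismatch warnings per module name (the YYY in the template)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


if __name__ == "__main__":
    for module, count in count_mismatches_by_module(SAMPLE_LOG).most_common():
        print(module, count)
```

A module that shows up with many warnings is a good place to start: check in the cell library whether it really has outputs that are commonly left unconnected.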

Now that we have our schematic, we can start the LVS. Unlike the DRC, there are several additional
files that control the LVS flow. The t2.sp file includes the file spice.inc, which in turn includes the
SPICE level definitions of all the libraries and macro-cells.
The runset file runset.lvs, among other things, includes the file lvs labels.txt. This file places two labels,
VCC3O and GNDO, for the power and ground signals of the I/O cells. Throughout the backend design we
have used VCC and GND as the only power signals. In reality, there are several other independent
supply nets, and these have to be taken into account for the LVS. The labels are positioned on the
correct I/O supply pads [9] and help identify the additional supply nets for the purpose of LVS.

[9] Unless you have deviated from the power pin template given.

Student Task 10:
• Select Verification →nmLVS... in the Calibre DRV main window to start the LVS. You
will have to load the runset file located under calibre/lvs/runset.lvs.
• Once again make the changes in the Inputs configuration so that the layout is not exported
from the viewer but is taken directly from encounter/out/t2.gds.gz.
• You will notice that the input configuration is still red. You should also notice that there are
multiple tabs in the configuration area. Currently the Layout tab is selected. Click on
the Netlist tab and bring up the file browser by pressing the ... button. You should be able to
select the SPICE netlist file that we have created, named t2.sp.
• The next issue is to set the Top Cell: name for this design. The default name selected is
chip. In our case this is not correct. You can find the correct name from the Verilog netlist,
from the SPICE netlist, or by browsing the instances within the file by clicking on the ...
button.
• All other options should remain the same. Start the LVS run by clicking on the Run LVS
button on the right hand side.
 
The LVS run will also produce two windows, the RVE and the LVS Report File. This time there
should be two dangerous sounding extraction warnings:

WARNING: Short circuit - Different names on one net

This is a mistake within the RAM macro that we are using. If you examine the warning closely you
will see that the net has two different labels. However, the names are similar; essentially one of the
net names is missing the first character. Text labels placed within the layout can help the LVS tool to
identify the structure of the circuit. In some cases, such mistakes could mislead the LVS tool. In our
case, the exact name is not important and we can ignore these two warnings [10].

In the RVE window you can now select Comparison Results on the left tab. You will see a green
smiley, and the Cell chip Summary will show you the comparison results.
 
Until now we have seen how to run DRC and LVS with example designs. However, in both cases we
did not encounter any problems. In the next section we will briefly talk about what we can do when
we find problems, and how they can be solved.

3 Finding and Fixing Errors

The first question that may come to mind is "Why should there be errors?". After all, we have only
used EDA software to create the final layout, and have also performed both a Verify Geometry
(essentially the equivalent of a DRC) and a Verify Connectivity (basically an LVS) and had no errors
there. There are a couple of explanations:

• While we did not use it in our design flow, it is possible or sometimes even necessary to make
additional changes on the layout after SoC Encounter. This may be required to add logos,
bonding pads, chip corners, additional process control monitors, IPs for which there
are no design views (lef, lib) for SoC Encounter, etc. In some cases these additions will be done
manually, which is a very error prone operation.
• SoC Encounter may produce errors which it cannot detect. This may be a 'bug' in Encounter
(there is no guarantee that there will be none), or (more likely) a mistake in one of the technology
files that customize Encounter for a given technology.
• SoC Encounter works with abstract views of macro-cells and standard cells. These views only
include the relevant information for the cells, such as pin locations and obstructions within the cell,
but not everything, such as the transistors. Some errors may only show up when the real cells with full
layouts are used.
• IC foundries will only accept data if it passes DRC with a certain tool. In our case we have to
make sure that the Calibre DRC run reports no errors, otherwise the foundry will return the chip
to us for fixing.
• And last but not least, ASIC design is a very costly process, and additional verification by a second
tool increases our confidence in the overall design.

[10] In the exercises we frequently tell you that the warnings are not important. This is not always the case. Even in
production design flows, such mistakes and errors are commonplace. You have to be careful to examine each warning
and error to ensure that there is nothing wrong with your design.

The next question might be "So what happens if we actually find an error?". The obvious answer is to
fix it. But before we start we need to understand the error. Modern processes in particular have many
processing options and many conflicting rules, so the first order of business is to identify the problem
and verify that it actually violates a design rule that needs to be fixed.
In our case, most of the time we will be violating various "Metal Coverage" rules. However, we
have an agreement with our IC foundry that stipulates that the IC foundry will perform a "metal filling"
step that will get rid of all problems with "Metal Coverage", so we can safely ignore these errors.
The second step is to try to determine what caused the error. If at all possible, it is better to find a
fix that can be implemented in earlier stages (i.e. in SoC Encounter) rather than fixing the problem
manually [11]. Sometimes this will not be possible: the reason may not be clear at the time, or there
might be significant time pressure to finalize the design as soon as possible, and you might be forced
to correct the error manually.
In this section we will practice finding and fixing errors on a set of crafted examples that contain errors.
We will use a much smaller design to reduce the run times and to make it a little bit easier to debug
the circuits [12].

[11] As mentioned earlier, manually altering the layout is a very error-prone process. In addition, the design might require
another iteration (for a completely different reason), leaving you to fix the same error over and over for subsequent
iterations.
[12] If your semester project designs do indeed have errors, you can use them as examples as well.

3.1 DRC

Calibre opens many windows while running. It might be easier if you quit Calibre and close all windows
before starting the second part of the exercise.

Student Task 11:


• Start Calibre DRV and load the design encounter/out/sbox.gds.gz.
• Run DRC (normal DRC, not antenna) on this design, just like you did in the previous section.
You will get several errors.

The RVE window should look similar to the figure below:

In this DRC run there are ten DRC errors (RVE calls them results) violating six different rules (RVE
calls them checks). In the figure we have selected the second check, named 4.13B.a [13], and at the
bottom there is an explanation of the design rule:

Minimum space between ME1 regions is 0.24um where MET1 width < 10um

For a given check, multiple results can be listed (like the one in the figure). The RVE window is
connected to the DRV window. When you double click on any result, the error location is highlighted
in the DRV window. Alternatively, you can select one result and press H, or use the right
mouse button to show a context menu. The highlighting is drawn on a new layer in the DRV
window. You can treat this layer just like any other layer and hide it if necessary.
We will now locate and correct all the errors. Since this is an exercise, we have made sure that the
errors are easy to locate and fix.

[13] The name corresponds to the section of the design rule manual. In our case it is on page 28 of the design rule manual,
accessible under docs/umcl180 topological layout rule.pdf, which is section 4.13, rule 13B.a.

Student Task 12:
• Let us start with the error 4.13A. Select and highlight the error.
• You will see the error marker, showing the region where the error has been encountered.
However, we do not see the complete layout, only the top level of the hierarchy. You can
change the number of hierarchy levels visible by pressing ">". Press > at least three times
to make the first 4 levels of hierarchy visible.
• Currently many layers will be visible. We have an error that only concerns metal-1. Figure
out how you can disable the layers, and arrange that only metal-1 (and the error marker)
remains visible.

[Figure: Calibre DRV layout view with callouts: select "Move" with only "edge" selected; leave "metal1"
visible and turn all others off; measure the width of the rectangle; select this edge and move it
down by 0.2 µm; make sure "Depth" shows all layers.]

Student Task 13:

• Now you should see the problem much more easily. The design rule says that Metal-1 lines
have to be at least 0.24 µm wide. Determine the width of the offending part by selecting a
Ruler and measuring the width.
• We will now have to modify the rectangle so that its width is at least 0.24 µm. First disable
all the selection options on the top right side of the DRV menu except for edge. This will
allow you to select a single edge of a geometric object. Change the tool to Move and try
to move the bottom edge of the polygon by the required amount.
• You will notice that you cannot move the object by the amount you want. There is a
grid setting of 0.1 µm that restricts movements to multiples of this amount. Change this
preference by selecting Options →Grid Setting from the menu. Make sure that the Spacing is set
to 0.01 um. Now move the edge by the proper amount.

Calibre DRV provides some simple commands to edit the layout. It is important to adjust the layer, grid,
and selection settings continuously to work effectively. As a general rule, never use a fine grid if you
can also work with a coarse grid. In case you make a mistake there is always Edit →Undo.
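The effect of the grid on the edit above can be worked out with a little arithmetic. The sketch below is only an illustration: the measured width of 200 nm is a hypothetical value (use the one you measured with the ruler), all lengths are kept as integers in nanometres to avoid rounding problems, and the function name is our own.

```python
MIN_WIDTH_NM = 240  # 0.24 um minimum Metal-1 width, as stated in the task above


def required_move_nm(measured_width_nm, grid_nm, min_width_nm=MIN_WIDTH_NM):
    """Smallest edge movement on the given grid that reaches the minimum width."""
    shortfall = max(0, min_width_nm - measured_width_nm)
    # Round the shortfall up to the next multiple of the grid (integer ceiling).
    steps = -(-shortfall // grid_nm)
    return steps * grid_nm


if __name__ == "__main__":
    measured = 200  # hypothetical measured width in nm
    for grid in (100, 10):  # 0.1 um default grid and the 0.01 um grid set above
        move = required_move_nm(measured, grid)
        print(f"grid {grid} nm: move edge by {move} nm, new width {measured + move} nm")
```

With the coarse grid the edge can only land on a wider shape than necessary, which may create a spacing problem with a neighbouring structure. With the fine grid the width can be set to exactly the minimum.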

Student Task 14:


• Rerun the DRC after fixing the error. Please note that you have modified the layout, so
this time you want to make sure that the Export from layout viewer option is selected in the
Inputs tab of the DRC window.

If you did everything right, the error should disappear.

Student Task 15:


• Try to locate the 5 violations of the rule 4.13B.a. This rule finds metal-1 structures that are
too close to each other. In the first error you will see that the metal-1 line almost connects,
but falls short. The problem could be solved in two ways: either we connect the structures, or
we leave sufficient space (0.24 µm) to make them two separate structures. DRC does not
care about connectivity information; that is the job of LVS.
• Errors 2, 3 and 4 have the same cause. These are called notch errors. Although the
structures are connected, any U type shape must still satisfy the minimum distance rules.
Error 5 is slightly different: the problem is on a piece of Metal-1 that is within an instance (a
via instance). You can resolve the problem by moving the instance.
• To be able to select the instance, you need to make sure that the selection criterion is ref.
Then you need to make sure that you are displaying the hierarchy level where the instance
is visible (as outline) but its contents are not visible. Use the "<" and ">" keyboard shortcuts
to adjust the hierarchy visibility.
• The via instance contains both Metal-1 and Metal-2. You cannot rely on seeing only Metal-1
in this case; you must make sure to see Metal-2 as well.
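To make the two possible fixes for the first spacing error more concrete, here is a small conceptual sketch of a spacing check between two rectangles. It is not how Calibre implements its checks. It assumes axis-aligned rectangles given as (x1, y1, x2, y2) in nanometres, measures corner-to-corner gaps as a straight-line distance, and ignores the width condition of the rule. The coordinates are made up for illustration.

```python
import math

MIN_SPACE_NM = 240  # 0.24 um minimum Metal-1 spacing from the rule text above


def gap_nm(a, b):
    """Distance between two axis-aligned rectangles (x1, y1, x2, y2) in nm.

    Returns 0.0 when the rectangles touch or overlap.
    """
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return math.hypot(dx, dy)


def violates_spacing(a, b, min_space_nm=MIN_SPACE_NM):
    """Touching shapes form one structure; only a small non-zero gap is a violation."""
    gap = gap_nm(a, b)
    return 0 < gap < min_space_nm


if __name__ == "__main__":
    wire = (0, 0, 1000, 300)
    almost_connected = (1100, 0, 2000, 300)  # 100 nm gap: too close
    connected = (1000, 0, 2000, 300)         # fix 1: close the gap
    separated = (1240, 0, 2000, 300)         # fix 2: leave 240 nm
    for name, other in (("almost connected", almost_connected),
                        ("connected", connected),
                        ("separated", separated)):
        print(name, "violation:", violates_spacing(wire, other))
```

Both fixes pass this check, which is exactly why DRC alone cannot tell you which of them is the right one. Whether the two structures belong to the same net is decided by LVS.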

In a lot of cases, one DRC or LVS error may cause a series of others. Furthermore, while fixing one
error, it is possible to introduce more errors. Depending on the run time of DRC, you may decide to
check and correct other errors first or go one by one. No matter which way you choose, fixing DRC
errors requires patience and concentration.

Student Task 16:


• Correct all errors until you have a clean DRC.
• Make sure that you save the end result, since we will use this for the LVS.

3.2 LVS

We will continue with the layout that has all DRC errors corrected.

Student Task 17:
• Prepare the SPICE netlist for the sbox by using the Verilog netlist encounter/out/sbox.v.

• Now run the LVS check as in the first part of the tutorial. We need to make a small
change in the runset file. For the previous exercise we used a small file (lvs labels.txt) to
add two labels for the pad power and pad ground signals. This file should not be used
for this second part. You can either comment out the contents of this file, or use the runset
calibre/lvs/runset sbox.lvs.

If you have not introduced additional errors while correcting the DRC errors, you should have 10
Discrepancies as a result.

Note that it is not easy for the tool to determine which one of the two inputs, the layout or the
schematic, is correct. It can just report that there is a discrepancy. LVS errors are more difficult to
locate than DRC errors, since one error can cause multiple discrepancies. In the figure above, we
have chosen Discrepancy #4. At the bottom you can see the problem. The left hand side is the
Layout side and the right hand side is the Schematic (called Source in DRV). The report can be
interpreted as follows:

In the Layout, the net 291 connects to the gates of two transistors named X792/M6 and
X792/M0. The LVS tool matches this net to the net n878 in the Source netlist. However, for
these two transistor connections there is no comparable connection in the Source. Similarly,
there are two connections to the gates of transistors XU892/MMI 2 1 and
XU892/MMI_3__1 in the Source netlist that are not present in the Layout netlist.

It is important to go through all discrepancies, and try to find one that is more obvious [14].

[14] Some of the discrepancies may be misleading. Keep an open mind, and look at them one by one.

The RVE tool has many interesting features that will allow you to compare the two netlists. Try clicking
on the net name. You will see two schematic windows appear. In some cases this can make it much
easier to understand what is wrong.

Student Task 18:


• Start with the discrepancy where you have one net in the Layout, but no corresponding net
in the Source. There should be two such nets. Highlight the net in the DRV by double
clicking on the net number in the RVE.
• One of the problems is in a single cell. This is not so easy to solve. The other one should
be much easier. Correct the problem, run DRC first to make sure that you have not
introduced errors, and rerun LVS.

If you have done it correctly you will only have four discrepancies remaining. Problems with supply
nets usually bring many other errors with them.

Student Task 19:


• Find the discrepancy where you have one net in the Layout but two nets in the Source
netlist. This is a clear case of a short in the layout. Two nets that were separate in the
Source have become a single net because of an error in the layout.
• Highlight the net in the layout. What you see should have been two separate nets. Can
you find the short?
• There are several small tools that can help you locate the short. Use the aforementioned
trick of clicking on the net name in the RVE. This will launch a schematic window in which
you can see all the instances involved. You should see that four instances are interconnected
in the layout, whereas in the source there are two separate nets with two instances
each. At some point in the layout these two nets have merged. Find the short, remove it,
and run DRC and LVS again.
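The reasoning behind "one layout net, two source nets" can be sketched in a few lines of Python. This is a strong simplification and not how the LVS tool works: it assumes that the pins of layout and source have already been matched by name, whereas a real LVS tool matches them by circuit topology. All instance, pin and net names below are hypothetical.

```python
from collections import defaultdict


def find_shorts(layout_net_of, source_net_of):
    """Return layout nets whose pins belong to more than one source net.

    Both arguments map a pin name to the net it is connected to.
    """
    source_nets_per_layout_net = defaultdict(set)
    for pin, layout_net in layout_net_of.items():
        if pin in source_net_of:
            source_nets_per_layout_net[layout_net].add(source_net_of[pin])
    return {
        layout_net: sorted(source_nets)
        for layout_net, source_nets in source_nets_per_layout_net.items()
        if len(source_nets) > 1
    }


if __name__ == "__main__":
    # Hypothetical example: four instances end up on one layout net ...
    layout = {"U1/Z": "17", "U2/A": "17", "U3/Z": "17", "U4/A": "17", "U5/A": "18"}
    # ... although the source has two separate nets with two instances each.
    source = {"U1/Z": "n1", "U2/A": "n1", "U3/Z": "n2", "U4/A": "n2", "U5/A": "n3"}
    print(find_shorts(layout, source))
```

The same idea with the two arguments swapped finds the opposite case, a source net that is split into several layout nets (an open).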

If all is well, you should have three discrepancies remaining. Essentially the remaining three are
interconnected. You will not be able to solve them independently.

Student Task 20:


• Once again use the capabilities of the RVE. Before you start, close both schematic windows.
Make sure that Highlight →Clear Existing Highlights is not selected.
Highlight all three errors. There are multiple problems here. Look for the mix-up in the area
where all three errors intersect.
• The solution will require you to change some connections. It does not really matter how you
choose to correct them; the solution could use any valid layer as long as DRC rules are
not violated. Small changes in the layout will also result in small changes in the parasitic
capacitances. However, the changes you make will not have an impact on the overall
performance.

Congratulations, you have successfully corrected all errors.

4 Conclusion

DRC and LVS are both an essential part of the design flow. Ideally, the EDA tools should produce
error free designs. However, this is not always the case, and in any case having a second opinion
from a different tool is always good.
LVS errors in particular can be tricky to locate and to correct. These exercises are not meant to be a
thorough manual on how to find and correct all types of errors, but to give you an idea of what can
be done.
It is important to run a DRC after you modify the layout, as any edit could potentially also result in
new design rule violations. Similarly, if you modify a layout that was LVS-clean, you should re-run
LVS.
