CAD4Soc - 2024 - Singlepage - 115 - End

Chapter 4
ANALOG AND TRANSISTOR LEVEL SIMULATION
Summer Term | TU Darmstadt | Integrated Electronic Systems Lab

Introduction to SPICE
SPICE = Simulation Program with IC Emphasis
History
• The history of SPICE started in 1968 at the University of California, Berkeley
• 1968: Predecessor CANCER (Computer Analysis of Nonlinear Circuits, Excluding Radiation)
• 1971: Release of SPICE (improved CANCER) into the public domain (responsible: Prof. D. Pederson)
• 1975: SPICE2
• 1983: SPICE2G.6: Up to that point, SPICE was written in FORTRAN
• 1993: SPICE3f: First mature release written in C
• Today: SPICE3f5 is the newest version from Berkeley, also available as source code
  http://bwrc.eecs.berkeley.edu/Classes/IcBook/SPICE/

Introduction to SPICE
• In June 1998, the father of SPICE and driving force behind IC simulation received the 1998 IEEE Medal of Honor
• Donald Pederson (72) holds no patents, but always believed in the idea of public domain software
• He died in 2004.

PSPICE (= Personal Computer SPICE)
• Because the source code of SPICE was available to everyone, many commercial versions were developed by the industry
• PSPICE by MicroSim was the first and became the most popular circuit simulator for several reasons:
  • A version of PSPICE with limited functions is distributed freely
  • MicroSim introduced a very convenient graphical postprocessor called PROBE
  • Powerful input file error checking routines were introduced
• Today, LTSpice is the choice for free SPICE simulation. The software is hosted by Analog Devices:
  https://www.analog.com/en/resources/design-tools-and-calculators/ltspice-simulator.html

What is SPICE ?
• SPICE is a general-purpose circuit simulation program for nonlinear DC, nonlinear transient and linear AC analyses
• Circuits may contain
  • resistors, capacitors, inductors, mutual inductors,
  • independent voltage and current sources,
  • four types of dependent sources,
  • transmission lines,
  • diodes, BJTs, MOSFETs
  • ...

Example: Voltage Divider
VOLTAGE DIVIDER
VIN 1 0 DC 12
R1 1 2 3000
R2 2 0 1K
C1 2 0 10N
.OP
.END
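
As a cross-check of what .OP should report for this netlist, the operating point can be computed by hand. The following is a minimal sketch; the function name divider_op is made up for illustration. It assumes that the capacitor C1 carries no current in the DC operating point and can therefore be left out.

```python
def divider_op(vin, r1, r2):
    """DC operating point of a resistive divider: VIN -> R1 -> node 2 -> R2 -> ground.

    A capacitor from node 2 to ground is an open circuit at DC, so it is ignored.
    Returns (voltage at node 2, current drawn from the source).
    """
    current = vin / (r1 + r2)  # the same current flows through R1 and R2
    v_node2 = current * r2     # voltage drop across R2
    return v_node2, current


# Values from the netlist: VIN = 12 V, R1 = 3000 Ohm, R2 = 1K = 1000 Ohm
v2, i_src = divider_op(12.0, 3000.0, 1000.0)
print(f"V(2) = {v2:.3f} V, I = {i_src * 1e3:.3f} mA")
```

Node 1 is fixed at 12 V by the source, so node 2 is the only unknown of this operating point analysis.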

Scale Factors
T=1E12  G=1E9  MEG=1E6  K=1E3  MIL=25.4E-6
M=1E-3  U=1E-6  N=1E-9  P=1E-12  F=1E-15
• Letters immediately following a number that are not scale factors are ignored, as are letters immediately following a scale factor (SPICE is not case-sensitive)
• 1000, 1000.0, 1000HZ, 1E3, 1.0E3, 1KHZ, 1K all represent the same number
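
The scale factor rules can be sketched as a small parser. This is an illustration only, not the parser of any particular SPICE implementation; the function name parse_spice_number and the regular expression for the numeric part are assumptions.

```python
import re

# Scale factors from the table above (SPICE is not case-sensitive)
SCALE = {
    "T": 1e12, "G": 1e9, "MEG": 1e6, "K": 1e3, "MIL": 25.4e-6,
    "M": 1e-3, "U": 1e-6, "N": 1e-9, "P": 1e-12, "F": 1e-15,
}

# Numeric part (optionally with exponent), followed by any trailing letters
_NUMBER = re.compile(r"^([+-]?(?:\d+\.?\d*|\.\d+)(?:E[+-]?\d+)?)([A-Z]*)")


def parse_spice_number(text):
    """Convert a SPICE-style value such as '1KHZ' or '10N' to a float."""
    match = _NUMBER.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not a SPICE number: {text!r}")
    value = float(match.group(1))
    suffix = match.group(2)
    # Check the three-letter factors first so MEG and MIL are not read as M
    for key in ("MEG", "MIL"):
        if suffix.startswith(key):
            return value * SCALE[key]
    if suffix[:1] in SCALE:
        return value * SCALE[suffix[:1]]
    return value  # letters that are not scale factors are ignored


for token in ("1000", "1000.0", "1000HZ", "1E3", "1.0E3", "1KHZ", "1K"):
    print(token, "->", parse_spice_number(token))
```

Checking MEG and MIL before the single-letter factors is the one ordering detail that matters here, since both start with M.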

Built-in Models
• Passive devices: R, L, C, G
• Independent sources (including waveform generators)
• Dependent sources
• Active devices: BJT transistors, MOSFETs, diodes, ...

MOSFET Models
• Some SPICE implementations have up to 20 different levels of MOS models (distinguished by the keyword LEVEL and a number).
• The basic three are implemented in all SPICE versions.

LEVEL     Model                                         Implemented in
LEVEL=1   Shichman-Hodges model                         All
LEVEL=2   Geometric-based analytical Meyer model        All
LEVEL=3   Semi-empirical short channel Dang model       All
LEVEL=4   BSIM1 (Berkeley Short Channel Igfet Model)    SPICE3, PSPICE
LEVEL=5   BSIM2 Jeng model                              SPICE3
LEVEL=5   BSIM3 (version 1)                             PSPICE
LEVEL=6   BSIM3 (version 2)                             PSPICE
LEVEL=6   MOS6 Sakurai-Newton model                     SPICE3

• BSIM4.6.5 (newest version for < 65nm transistors) has > 400 parameters, 2/3 non-physical

Analog Circuit Simulation (1/11)
• Typical questions
  • What does the detailed timing look like?
  • Are the setup/hold time requirements met?
  • What's the power consumption in a block?
  • How fast can the load capacitance be charged?
  • What's the threshold voltage of a certain gate?
• Simulation types and domains
  • Time domain: transient analysis (.tran), Fourier analysis (.four)
  • Frequency domain: small signal analysis (.ac), noise analysis (.noise)
  • DC domain: operating point analysis (.op), transfer function analysis (.tf, .dc), two-port network parameters
  • Additional analysis types: temperature analysis (.temp), parameter optimization (.sens), yield simulation, statistical (Monte Carlo) simulation (.mc)

• But how do we get the results, for example, in a simulation of transient signals?

• But how do we get the results? (continued)
Circuit simulation: A mathematician's point of view

[Figure: Kirchhoff's laws, the topology of the circuit, the device models, the model parameters and the design parameters enter the modified nodal analysis, which yields the circuit equation]

$$A \cdot \frac{d}{dt}\, q(x(t)) + f(x(t)) + s(t) = 0$$

Whereby:
• x(t): node voltages, some branch currents
• q(x(t)): terminal charges, branch fluxes
• A: incidence matrix (circuit topology)
• f(x(t)): static part of element equations
• s(t): independent sources

• But how do we get the results? (example)
Set of nonlinear ADEs (algebraic differential equations)

• Algorithms usually solve this problem in three steps
• Take out the “differential”: Reduction of nonlinear ADEs into a series of nonlinear algebraic equations at a series of discrete time points by using numerical integration methods (Euler, Trapezoid, Gear Shichman, ... method)
  • Timestep (delta between time points) is crucial for accuracy!
• Take out the “nonlinear”: Reduction of the nonlinear algebraic equations into a series of linear equations using linearization methods (Newton-Raphson, ... method)
  • Accuracy is determined by number of linearization iterations
• Solve the linear equation system: Using methods like LU factorization
  • Pivot methods (reordering of matrix elements) may help to increase numerical accuracy

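• Solving a linear equation system (Gauss)

• Example: three dimensional system (three variables, N=3)

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \cdot \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = 0$$

• Multiply line 1 with a21/a11 and subtract the result from line 2. Multiply line 1 with a31/a11 and subtract the result from line 3

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22}^{(1)} & a_{23}^{(1)} \\ 0 & a_{32}^{(1)} & a_{33}^{(1)} \end{pmatrix} \cdot \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2^{(1)} \\ y_3^{(1)} \end{pmatrix} = 0$$

• Multiply line 2 with a32(1)/a22(1) and subtract the result from line 3

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22}^{(1)} & a_{23}^{(1)} \\ 0 & 0 & a_{33}^{(2)} \end{pmatrix} \cdot \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2^{(1)} \\ y_3^{(2)} \end{pmatrix} = 0$$

• Solve equation system backwards starting from line 3

$$x_3 = -\frac{y_3^{(2)}}{a_{33}^{(2)}}, \qquad x_2 = -\frac{y_2^{(1)} + x_3\, a_{23}^{(1)}}{a_{22}^{(1)}}, \qquad x_1 = -\frac{y_1 + x_2\, a_{12} + x_3\, a_{13}}{a_{11}}$$

• Disadvantages:
  • Matrix may need re-ordering in order to get diagonal elements != 0
  • The number of multiplications/divisions is (N^3 + 3 N^2 - N)/3, that is, the computation time is in the order of O(N^3). [N is the number of variables]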
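
The elimination and back-substitution steps can be written down directly. The sketch below keeps the sign convention of this slide (A · x + y = 0) and adds row re-ordering by largest pivot, which is one common way to deal with zero diagonal elements; the function name gauss_solve is made up for illustration.

```python
def gauss_solve(a, y):
    """Solve A x + y = 0 by Gaussian elimination with row re-ordering."""
    n = len(a)
    a = [row[:] for row in a]  # work on copies, keep the caller's data intact
    y = y[:]
    for k in range(n):
        # Re-order rows so that the diagonal element is non-zero (largest pivot)
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[pivot][k] == 0:
            raise ValueError("matrix is singular")
        a[k], a[pivot] = a[pivot], a[k]
        y[k], y[pivot] = y[pivot], y[k]
        # Multiply line k with a[r][k]/a[k][k] and subtract it from line r
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
            y[r] -= factor * y[k]
    # Solve backwards, starting from the last line
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = y[r] + sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = -acc / a[r][r]
    return x


A = [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]]
Y = [-8.0, 11.0, 3.0]
print(gauss_solve(A, Y))
```

The three nested loops over k, r and c are where the O(N^3) operation count comes from.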

• Solving a linear equation system (LU Method)

• Split matrix A into two triangular matrices, so that A = L · U

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{pmatrix} \cdot \begin{pmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} l_{11} & l_{11} u_{12} & l_{11} u_{13} \\ l_{21} & l_{21} u_{12} + l_{22} & l_{21} u_{13} + l_{22} u_{23} \\ l_{31} & l_{31} u_{12} + l_{32} & l_{31} u_{13} + l_{32} u_{23} + l_{33} \end{pmatrix}$$

  (the l and u elements can be calculated iteratively)
• With A · x = L · U · x = L · z = -y, you can solve the system in 2 easy steps, since both matrices have a rather easy form:
  • Solve L · z = -y for z; then solve U · x = z for x
• Sparse Matrix Techniques
  • Typical systems lead to very sparsely populated matrices: 90% or more of the matrix elements are 0, 1 or -1. Save unneeded calculations by applying simple rules (for example, no additions, multiplications with 0, …)
  • This reduces the number of calculations needed to solve the equation system to the order of O(N^1.2)
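
A dense-matrix sketch of the two-step LU solve follows. It uses the form shown above, with a unit diagonal in U (often called the Crout variant), and keeps the convention A · x + y = 0. It does no pivoting and no sparse-matrix handling, so it is an illustration of the principle only; the names crout_lu and lu_solve are made up for illustration.

```python
def crout_lu(a):
    """Factor A = L * U with L lower triangular and U unit upper triangular."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        # Column j of L
        for i in range(j, n):
            lower[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(j))
        if lower[j][j] == 0:
            raise ZeroDivisionError("zero pivot: the matrix needs re-ordering")
        # Row j of U (right of the diagonal)
        for i in range(j + 1, n):
            upper[j][i] = (a[j][i] - sum(lower[j][k] * upper[k][i] for k in range(j))) / lower[j][j]
    return lower, upper


def lu_solve(a, y):
    """Solve A x + y = 0 via L z = -y (forward) and U x = z (backward)."""
    lower, upper = crout_lu(a)
    n = len(a)
    z = [0.0] * n
    for i in range(n):
        z[i] = (-y[i] - sum(lower[i][k] * z[k] for k in range(i))) / lower[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = z[i] - sum(upper[i][k] * x[k] for k in range(i + 1, n))
    return x


A = [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]]
print(lu_solve(A, [-8.0, 11.0, 3.0]))
```

Once L and U are known, every further right-hand side y costs only the two triangular solves. This is useful in a simulator when the same matrix is reused.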

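• Solving a nonlinear equation system

[Figure: voltage u is connected through conductance G to node u1; a diode carrying the current ID is connected to node u1]

• One-dimensional example:
  (u1-u) G + ID = 0
  f(u1) = u1 G + IS (exp (u1/uT) - 1) - u G = 0
  i.e.: find solution u1 for nonlinear function f(u1) = 0
• Linearize the equation around the starting value u1^(0) using the derivative f'(u1^(0)), and find the solution u1^(1) of the linearized equation:
  f(u1^(0)) + f'(u1^(0)) (u1^(1) - u1^(0)) = 0
• Check if the accuracy criterion is met: | u1^(0) - u1^(1) | < ε
• If not, continue iteration: linearize, solve, check
• Things look more complex in the matrix case, but the method is the same!

[Figure: plot of f(u1) over u1 with the tangents f'(u1^(0)) and f'(u1^(1)); the iterates u1^(0), u1^(1), u1^(2) approach the solution u1*]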
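
The linearize, solve, check loop for this one-dimensional example can be sketched as follows. The component values (u = 5 V, G = 1 mS, IS = 1e-14 A, uT = 25 mV), the starting value and the function names are assumptions chosen for illustration.

```python
import math


def diode_residual(u1, u, g, i_s, u_t):
    """f(u1) = u1*G + IS*(exp(u1/uT) - 1) - u*G"""
    return u1 * g + i_s * (math.exp(u1 / u_t) - 1.0) - u * g


def diode_derivative(u1, g, i_s, u_t):
    """f'(u1) = G + (IS/uT)*exp(u1/uT)"""
    return g + (i_s / u_t) * math.exp(u1 / u_t)


def newton_raphson(u, g, i_s, u_t, u1_start=0.6, eps=1e-9, max_iter=200):
    """Return (u1, iterations) with f(u1) = 0 up to the step criterion eps."""
    u1 = u1_start
    for iteration in range(1, max_iter + 1):
        # Solve the linearized equation f(u1) + f'(u1) * (u1_next - u1) = 0
        u1_next = u1 - diode_residual(u1, u, g, i_s, u_t) / diode_derivative(u1, g, i_s, u_t)
        if abs(u1_next - u1) < eps:  # accuracy criterion met
            return u1_next, iteration
        u1 = u1_next
    raise RuntimeError("Newton-Raphson did not converge")


u1_solution, n_iter = newton_raphson(u=5.0, g=1e-3, i_s=1e-14, u_t=0.025)
print(f"u1 = {u1_solution:.6f} V after {n_iter} iterations")
```

With a poor starting value the exponential can make the first steps overshoot badly. This is one source of the convergence problems mentioned later in this chapter.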


• Solving a differential equation system

• Replace the diode in the example above with a capacitor C and the equation for u1 looks as follows:
  (u1-u) G + IC = 0, with IC = C (du1/dt)
  u1 = u – (C/G) (du1/dt)
• This problem is solved at certain time points t_n := t_(n-1) + Δt with a starting value (e.g. u1,0 = 0), while the unknown derivative at the next time point is expressed through values of the previous time point using integration methods:
  • Implicit Euler: u1,n+1 = u1,n + (du1,n+1/dt) Δt
    => du1,n+1/dt = (1/Δt) (u1,n+1 - u1,n)
  • Trapezoid: u1,n+1 = u1,n + (du1,n/dt + du1,n+1/dt) Δt/2
    => du1,n+1/dt = (2/Δt) (u1,n+1 - u1,n) - (du1,n/dt)
• Using e.g. implicit Euler makes the example equation look like:
  u1,n+1 = u – (C/G) (1/Δt) (u1,n+1 - u1,n)
  which can then be solved for the only unknown u1,n+1 because the value for the previous time point u1,n is known.
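
Both integration formulas can be tried on the RC example. The following is a sketch under assumed values (u = 1 V, G = 1 mS, C = 1 µF, so C/G = 1 ms); the function name simulate_rc is made up for illustration. Because the circuit is linear, each time step can be solved in closed form and needs no Newton iteration.

```python
import math


def simulate_rc(u, g, c, dt, t_end, method="euler"):
    """Integrate u1 = u - (C/G) du1/dt from u1(0) = 0 and return u1(t_end)."""
    k = c / (g * dt)
    u1 = 0.0
    du = (u - u1) * g / c  # derivative at t = 0, taken from the circuit equation
    for _ in range(round(t_end / dt)):
        if method == "euler":
            # u1_new = u - k*(u1_new - u1), solved for u1_new
            u1_new = (u + k * u1) / (1.0 + k)
            du = (u1_new - u1) / dt
        elif method == "trapezoid":
            # du_new = (2/dt)*(u1_new - u1) - du, inserted into u1_new = u - (C/G)*du_new
            u1_new = (u + 2.0 * k * u1 + (c / g) * du) / (1.0 + 2.0 * k)
            du = 2.0 * (u1_new - u1) / dt - du
        else:
            raise ValueError(f"unknown method: {method}")
        u1 = u1_new
    return u1


U, G, C = 1.0, 1e-3, 1e-6
exact = U * (1.0 - math.exp(-1.0))  # analytic value after one time constant
euler = simulate_rc(U, G, C, dt=1e-5, t_end=1e-3, method="euler")
trapezoid = simulate_rc(U, G, C, dt=1e-5, t_end=1e-3, method="trapezoid")
print(f"exact={exact:.6f} euler={euler:.6f} trapezoid={trapezoid:.6f}")
```

Comparing both results with the analytic charging curve shows how strongly the chosen method and time step affect accuracy.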

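[Flowchart: OP analysis: initial trial operating point -> linearize equations around trial point -> solve linear equations -> convergence? If not, define a new trial point and linearize again. Transient analysis: once converged, increment time and take the next discrete time step of the differential equations; repeat until the end of the time interval is reached]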

Many loops and iterations to solve a huge matrix numerically
• Resulting in rather long run-times even for small circuits
• Limiting the size of circuit to be simulated (memory consumption)
• “Numerically” means
  • Solution is always an approximation only
  • Algorithms may sometimes fail due to convergence problems
• Accuracy is controlled by the methods applied (integration and linearization) and certain error criteria
• Higher accuracy requirements (that is, smaller time-steps, more linearization iterations by using lower error limits) result in higher run-times
• Simulators offer a wide set of parameters and options to control the algorithms described above

How Accurate Are SPICE Simulations?
• Well, did you think about…
  • the algorithms used for solving circuit equations
  • numerical inaccuracies (in part controllable by parameters)
  • transistor, resistor, ... model parameter quality
  • parasitic resistors, capacitors (coupling), inductances,...
  • parasitics in power supply network
  • substrate effects
  • process variations over time
  • parameter variations across a chip or across a wafer
  • …

Difference Between Accuracy and Precision
The difference becomes clear by looking at multiple simulations:
• Accuracy: the average is on target
• Precision: the standard deviation is small
[Figure: targets showing lower/higher accuracy and less/more precision. Image by Frantisek Krejci on Pixabay]

Chapter 5
DESIGN STRATEGIES

Design Methodologies
• Ad hoc design
  • Not really professional, but works for small examples (e.g. PROM, PLA,...)
  • Can start from a simple block diagram
  • Place gates / transistors, wire them
  • Simulate
  • Pray that it works ....
• Structured design
  • Top-down
  • Bottom-up
  • Mixed / Meet-in-the-middle

Structured Design Methodologies
• Structured design: Top-down
  • Understand the top-level specification, which must be complete
  • Partition your design
  • Identify & specify subsystems/subblocks
  • Develop the next level by again decomposing it ...
  • Until you are on the lowest level of abstraction / primitive level
  • Construct your subsystems, assemble them to the system - done
• There is hardly any pure "100%" project out there done exclusively in a "top-down manner", since it is impossible to 100% specify complex systems

• Structured design: Bottom-up
  • Construct your system/sub-system from primitives
  • Assemble your sub-systems to form the system
  • Verify that the system fulfills the requirements
• This will only work for small examples, since you will simply not be able to master complex systems with this methodology ....

• Structured design: Mixed / Meet-in-the-middle
  • Partition your design
  • Identify & specify subsystems/subblocks
  • Refine your specification successively according to what you know/learned from an early implementation
  • Early implementation can also be "behavioral models"
  • Develop the next level by again decomposing it ...
  • Until you are on the lowest level of abstraction / primitive level
  • Develop your subblock, simulate/verify it
  • Develop an abstraction / abstractions to be back-inserted on top-level (you may also call this "library" or "library elements")
  • Verify your top-level design (top-level, either complete, or cross-sections, or with abstracts)
• In reality, all complex design tasks are done this way.
• Verification between all levels is essential

Divide and Conquer
• Strategy 1: Divide and Conquer (D & C)
  • Think of the design problem as a computation
  • Divide the main problem into 2 or more subproblems which when solved will lead to the solution to the main problem
  • Some “stitching up” of subprob solns may be required
  • Do this recursively, until each small problem can be solved in an obvious way, e.g. using truth tables (TTs)
[Figure: tree with the main problem at the root, two subproblems below it and four sub-subproblems at the lowest level]

Tree Design
• Strategy 2: Fast Tree Design for Associative Operations
  • An associative operation op is defined as one for which:
    A op B op C = (A op B) op C = A op (B op C)
    Thus, A op B op C op D = (A op B) op (C op D).
  • This means that (A op B) and (C op D) can be done simultaneously to speed up the operation and the results op’ed to get the final result.
  • Thus associative operations can be performed using tree-like designs to get the result in Theta(log n) time
  • At each level of the tree the op operations are performed simultaneously and their results are op’ed at the next higher level, and so forth
  • E.g. of assoc. oper: +, *, and, or, xor
  • E.g. of non-assoc. oper: -, /
  • E.g. designs: AND-tree, Wallace-tree multiplier
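
The tree evaluation can be mimicked in software to count the number of levels. This is a sketch only: in hardware all operations of one level run simultaneously, whereas the list comprehension below merely models that. The function name tree_reduce is made up for illustration.

```python
import operator


def tree_reduce(op, values):
    """Combine values pairwise, level by level; return (result, number of levels)."""
    level = list(values)
    if not level:
        raise ValueError("need at least one operand")
    depth = 0
    while len(level) > 1:
        # All pairs of one level are independent of each other
        next_level = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            next_level.append(level[-1])  # odd operand passes through to the next level
        level = next_level
        depth += 1
    return level[0], depth


bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
parity, levels = tree_reduce(operator.xor, bits)
print(f"parity={parity}, levels={levels}")
```

For 16 operands the loop runs 4 times, i.e. log2(16) levels, while a linear chain would need 15 sequential operations.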

Speculative Computation
• Strategy 3: Speculative Computation
  • If there is a data dependency between two or more portions of a computation (which may be obtained using D&C), don’t wait for the “previous” computation to finish before starting the next one
  • Assume all possible input values for the next computation/stage B (e.g., if it has 2 inputs from the prev. stage there will be 4 possible input value combinations) and perform it using a copy of the design for each possible input value.
  • All the different o/p’s of the different copies of B are mux’ed using the prev. stage A’s o/p
  • E.g. design: Carry-Select Adder (at each stage performs two additions, one for carry-in of 0 and another for carry-in of 1 from the previous stage)

Speculative Computation
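[Figure (a): Original design: inputs x and y feed stage A, whose outputs feed stage B, which produces z. Time = T(A)+T(B)]
[Figure (b): Speculative computation: four copies B(0,0), B(0,1), B(1,0), B(1,1) work in parallel with A; a 4:1 Mux controlled by the outputs of A selects z. Time = max(T(A),T(B)) + T(Mux).
Works well when T(A) approx = T(B) and T(A) >> T(Mux)]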
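
The carry-select adder mentioned as an example design can be modeled at bit level. In this sketch the block size of 4 bits, the width of 16 bits and the function names are assumptions; each block computes both sums up front and the incoming carry acts as the mux select.

```python
def ripple_add(a_bits, b_bits, carry_in):
    """Ripple-carry addition of two LSB-first bit lists; returns (sum bits, carry out)."""
    sum_bits = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry


def carry_select_add(a, b, width=16, block=4):
    """Add two unsigned integers block-wise with speculative carry-in 0 and 1."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result_bits = []
    carry = 0
    for low in range(0, width, block):
        blk_a = a_bits[low:low + block]
        blk_b = b_bits[low:low + block]
        # Speculation: both candidates do not depend on the previous block
        sum0, carry0 = ripple_add(blk_a, blk_b, 0)
        sum1, carry1 = ripple_add(blk_a, blk_b, 1)
        # Mux: the carry of the previous block selects the valid candidate
        chosen_sum, carry = (sum1, carry1) if carry else (sum0, carry0)
        result_bits.extend(chosen_sum)
    value = sum(bit << i for i, bit in enumerate(result_bits))
    return value, carry


print(carry_select_add(0x1234, 0x0FCD))
```

In hardware all blocks would compute their two candidates at the same time, so only the chain of muxes remains on the carry path.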

Best of Both Worlds
• Strategy 4: Best of both worlds: Average and worst case delay
  • Use 2 circuits with different worst-case and average-case behaviors
  • Use the first available output
  • Get the best of both (ave-case, worst-case) worlds
  • In the above schematic, we get the good ave case performance of unary division (assuming uniformly distributed inputs) w/o the disadvantage of its bad worst-case performance

Pipelining
• Strategy 5: Pipelining
[Figure: Conversion of the original ckt or datapath to a simple level-partitioned pipeline with Stage 1, Stage 2, ..., Stage k (level partition may not always be possible but other pipelineable partitions may be)]

The Design Problem: Examples
• Design an 8-bit comparator that compares two 8-bit #s available in two registers A[7..0] and B[7..0] and that o/ps F = 1 if A > B and F = 0 if A <= B.
• Approach 1: The Truth-Table (TT) approach -- Write down a TT with 16 input bits (2^16 rows), derive logic expression from it, minimize it, obtain gate-based realization, etc.!

  A         B         F
  00000000  00000000  0
  00000000  00000001  0
  ...
  00000001  00000000  1
  ...
  11111111  11111111  0

  • Too cumbersome and time-consuming
  • Fraught with possibility of human error
  • Difficult to formally prove correctness (i.e., proof w/o exhaustive testing)
  • Will generally have high hardware cost and delay

The Design Problem: Example
• Approach 2: Think computationally/algorithmically about what the circuit is supposed to compute:
• Approach 2(a): Flat algorithmic approach:
  • Note: A TT can be expressed as a sequence of “if-then-else’s”
  • If A = 00000000 and B = 00000000 then F = 0
    else if A = 00000000 and B = 00000001 then F=0
    ……….
    else if A = 00000001 and B = 00000000 then F=1
    ……….
  • Essentially a re-hashing of the TT – same problems as the TT approach

The Design Problem: Example
• Approach 2(b): Structural algorithmic approach:
  • Be more innovative, think of the structure/properties of the computational problem
  • E.g., think if the problem can be solved in a hierarchical or divide-&-conquer (D&C) manner:
• D&C approach: See if the problem can be “broken up” into 2 or more smaller subproblems that can be “stitched-up” to give a sol. to the parent problem
  • Do this recursively for each large subprob until subprobs are small enough for TT-based solution
  • If the subprobs are of a similar kind (but of smaller size) to the root prob then the breakup and stitching will also be similar
[Figure: root problem A is split into subprob. A1 and subprob. A2, and further into A1,1, A1,2, A2,1, A2,2; do this recursively until the subprob-size is s.t. TT-based design is doable; the solns to A1 and A2 are stitched up to form the complete soln to A]
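
One possible D&C decomposition of the 8-bit comparator is sketched below. The source does not spell out the breakup at this point, so the scheme is an assumption: each subproblem returns a "greater" and an "equal" flag, and the stitch-up is gt = gt_hi OR (eq_hi AND gt_lo), eq = eq_hi AND eq_lo. The 1-bit base case is small enough for a truth table.

```python
def compare(a_bits, b_bits):
    """Compare two MSB-first bit lists; returns (gt, eq) as 0/1 flags."""
    n = len(a_bits)
    if n == 1:
        # Base case: 1-bit truth table. gt = a AND (NOT b), eq = a XNOR b
        a, b = a_bits[0], b_bits[0]
        return a & (1 - b), 1 - (a ^ b)
    half = n // 2
    gt_hi, eq_hi = compare(a_bits[:half], b_bits[:half])  # upper half (subproblem 1)
    gt_lo, eq_lo = compare(a_bits[half:], b_bits[half:])  # lower half (subproblem 2)
    # Stitch-up: the lower half only decides if the upper halves are equal
    return gt_hi | (eq_hi & gt_lo), eq_hi & eq_lo


def greater_than(a, b, width=8):
    """F = 1 if A > B, F = 0 if A <= B, for unsigned integers of the given width."""
    a_bits = [(a >> i) & 1 for i in range(width - 1, -1, -1)]
    b_bits = [(b >> i) & 1 for i in range(width - 1, -1, -1)]
    f, _ = compare(a_bits, b_bits)
    return f


print(greater_than(0b00000001, 0b00000000), greater_than(0b00000000, 0b00000001))
```

Both halves are subproblems of the same kind as the root problem, so the same stitch-up logic repeats on every level of the recursion.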

The Design Problem: Example 2
Example 2: Design of a Parity Detection Circuit (16 bit)

f = (((x(15) xor x(14)) xor (x(13) xor x(12))) xor ((x(11) xor x(10)) xor (x(9) xor x(8))))
    xor (((x(7) xor x(6)) xor (x(5) xor x(4))) xor ((x(3) xor x(2)) xor (x(1) xor x(0))))

Solution 1: A linearly connected chain of XORs:
[Figure (a): A linearly-connected circuit: the inputs x(0), x(1), x(2), x(3), ..., x(15) enter a chain of XOR gates one after the other; the last gate outputs f]

The Design Problem: Example 2
Example 2: Design of a Parity Detection Circuit (16 bit)

f = (((x(15) xor x(14)) xor (x(13) xor x(12))) xor ((x(11) xor x(10)) xor (x(9) xor x(8))))
    xor (((x(7) xor x(6)) xor (x(5) xor x(4))) xor ((x(3) xor x(2)) xor (x(1) xor x(0))))

Solution 2: A parity tree
[Figure: the inputs x(15), x(14), ..., x(1), x(0) feed the first gate level w(3,7) ... w(3,0), followed by the levels w(2,3) ... w(2,0) and w(1,1), w(1,0); the root gate is w(0,0) = f]
Delay = (# of levels in AND-OR tree) * td = log2 (n) * td

The Design Problem
• No concurrency in Solution 1 --- the actual problem has available concurrency, though, and it is not exploited well in the above “linear” design
• Complete sequentialization leading to a delay that is linear in the # of bits n (delay = n*td), td = delay of 1 gate
• All the available concurrency is exploited in Solution 2: a parity tree.
• Question: When can we have a tree-structured circuit for a chain of the same operation on multiple operands?
• Answer: (1) First of all when the operation makes sense for any # of operands. (2) It should be possible to break it down into smaller operations. (3) Finally, when the operation is associative. An operation “x” is said to be associative if:
  a x b x c = (a x b) x c = a x (b x c).
• Thus if we have 4 operands a x b x c x d, we can either perform this as a x (b x (c x d)) [getting a linear delay of 3 units] or as (a x b) x (c x d) [getting a logarithmic (base 2) delay of 2 units and exploiting the available concurrency due to the fact that “x” is associative].

The Design Problem
• We can extend this idea to n operands (& n-1 operations) to perform as many of the pairwise operations as possible in parallel (& do this recursively for every level of remaining operations), similar to design (b) for the parity detector [xor is an associative operation!] and thus get a (log2 n) delay.

The Design Problem
• Let f(x_(n-1), ….., x_0) be an associative function.
• What is the D&C principle involved in the design of an n-bit xor/parity function? Can it also lead automatically to a tree-based ckt?
[Figure: f(x_(n-1), .., x_0) is split into a = f(x_(n-1), .., x_(n/2)) and b = f(x_(n/2-1), .., x_0); the stitch-up function f(a,b) is the same as the original function for 2 inputs]
• Using the D&C approach for an associative operation results in the stitch up function being the same as the original function (not the case for non-assoc. operations), but w/ a constant # of operands (2, if the orig problem is broken into 2 subproblems)
• If the two sub-problems of the D&C approach are balanced (of the same size or as close as possible), then unfolding the D&C results in a balanced operation tree of the type for the xor/parity function seen earlier

Chapter 6
ROBUST DESIGN OF ANALOG AND DIGITAL CIRCUITS

Why Robust Design?
Remember Semester 3 of your Bachelor class, when everything was smooth & easy? All values were constant values …
• Have a look at the circuit on the top-right. Calculate the current I and the voltage VO (Q-Point). You may assume the following values for the power supply and transistor parameters:
  • K'n = 25 µA/V², K'p = 10 µA/V²
  • VDD=10V
  • W/L = 20/1 for both transistors
  • VTN=0.75V, VTP=-0.75V
• Determine the Q-point of the transistor circuit shown on the right-bottom side. Assume the following values:
  • V=-15V
  • R=75 Ω
  • W/L=1/1
  • VTN=0.75V, VTP=-0.75V
  • K'n = 25 µA/V², K'p = 10 µA/V²

Why Robust Design?
And this is what a fab will produce over time …
As a designer, you have to live with these deviations…
Moreover, you have to design your circuit to work within a large range of parameter changes …
[Figure: XY scatter plot from data_int: VTS_0H_N10x016_P05 (nf 10x0.16µm², axis from 140 to 440) over date-TVM from 15-Oct-2003 to 10-Aug-2004]

Why Robust Design?
• What we have seen on the previous slide is called “global variation”, since it varies from wafer to wafer
[Figure: two wafers. Transistor M216 of Chip401 on this wafer has a different VT than transistor M216 of Chip401 on the previous wafer, and than M216 of Chip588 on this and the previous wafer]

Why Robust Design?
• But even on the same chip, parameters will not be the same; this is then called “local variation”
[Figure: Transistor M8 of Chip23 has a different VT than transistor M7 of Chip23, which is the 2nd transistor to form a current mirror]

How to Cope with Global Variations?
• The fab engineers and the design engineers “sign” a contract
  • Fab engineers promise a nominal value for essential design parameters (e.g. VTN=0.380V)
  • If this is an established technology, then the fab engineers are able to give upper and lower bounds, including some statistics (e.g. my 3σ lower and upper bounds are VTN=0.295V and VTN=0.476V, whereas my 4σ lower and upper bounds are VTN=0.275V and VTN=0.500V)
• Anyhow, the fab engineers will try their best to run their fab as promised to the design engineers
• The design engineers will do their best to design circuits that will work within the specified range, if the parameters vary within the specified range

Digital Design
• Let's assume you want to design a logic circuit. What do you care about?
• Essentially all you care about is functionality and timing …

Digital Design
• Timing is provided by so-called library files (e.g. standard cells have .tlf files for Verilog backannotation). Somebody has taken the nominal transistor parameters and performed an analog simulation for you
[Figure: inverter with input a and output y. Nominal case: a 0->1 change on a brings a 1->0 transition on y after 5ns]
Will your circuit work for all cases? You don't know ...

Quick Recap on Semiconductor Physics
• What effects slow down and what speeds up silicon based semiconductors?
  • Voltage: The higher the voltage, the faster a transistor is switching …
  • Temperature: transistors can switch faster if they are cold, slower if they are hot
  • Process: this is how the transistor is made out of implants, misalignment of different layers, ….

How This Was Mapped to CAD
• In the early times of digital simulation, simply the nominal time value was taken, and the so-called PVT factors were multiplied upon
• Example: If for the inverter the 0->1 switching delay was 5ns for the nominal case, and KPmin=0.9, KPmax=1.1, KVmin=0.85 (e.g. for VDD=1.5V+10%), KVmax=1.15 (VDD=1.5V-10%), KTmin=0.85 (e.g. for T=0°C), KTmax=1.15 (e.g. for T=85°C), then
  • For a great process, high voltage and low temperature the delay will only be 5ns*0.9*0.85*0.85=3.25ns
  • For a bad process, low voltage and high temperature, the delay can be up to 5ns*1.1*1.15*1.15=7.27ns
• This is a factor of more than 2 (7.27ns / 3.25ns is about 2.2), for which your circuit has to work
• This multiplication is also called derating
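
The derating arithmetic of this example can be reproduced in a few lines. This is a minimal sketch; the function name derate and the variable names are made up for illustration, while the factors are those given on this slide.

```python
def derate(t_nominal_ns, k_process, k_voltage, k_temperature):
    """Scale a nominal delay with the three PVT factors."""
    return t_nominal_ns * k_process * k_voltage * k_temperature


T_NOMINAL_NS = 5.0  # nominal 0->1 switching delay of the inverter

t_min = derate(T_NOMINAL_NS, 0.9, 0.85, 0.85)  # great process, high voltage, low temperature
t_max = derate(T_NOMINAL_NS, 1.1, 1.15, 1.15)  # bad process, low voltage, high temperature

print(f"min {t_min:.2f} ns, max {t_max:.2f} ns, ratio {t_max / t_min:.2f}")
```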

Case Study
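• Assume this synchronous circuit: fclk=25MHz (tclk=40ns)
• Assume further for the DFF:
  • tsetup=6ns, thold=9ns, td(elay)=5ns
• For the AND/OR: td=8ns. For the inverter: td=5ns. All values nominal case. Will the circuit work?
[Figure: synchronous circuit with D-FF1 and D-FF2 clocked by clk; the logic between them consists of an inverter (s to s_bar), AND gates (outputs a_sel and b_sel) and an OR gate (output w); further signals a, b and c]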

Case Study
• You have to look for the longest path first. For the nominal case:
  • t=0ns: clk: 0->1
  • t=5ns: D is transported to the output Q of D-FF1
  • t=10ns: The inverter changes its output
  • t=18ns: The AND gives a 0 at the output
  • t=26ns: The OR gives a 0 or 1 at the output
  • Next tclk at 40ns
• What was the latest time the signal was allowed to arrive at D of D-FF2?
  40ns – tsetup = 40ns – 6ns = 34ns
• This is called a positive slack of 8ns
[Figure: the circuit from the previous slide, annotated step by step with the signal values along the path from D-FF1 to D-FF2]
Case Study
• But wait, the circuit also needs to work for the max case. So let's assume the PVT factors we used a couple of slides ago. Then we have to multiply the critical path with the max. PVT factor: 26ns*1.1*1.15*1.15=37.82ns
• This is then called a negative slack of 3.82ns
• Does this mean the circuit will not work?
• Not exactly, but you will experience certain fails in the application, or a lower yield. But overall, most companies do not accept this. For high volume products, this is not acceptable at all.
• What to do?
• Upsize the standard cells with larger output drivers (e.g. an inverter with only td=4ns, an OR & AND with only td=6.5ns will do the job)
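
The setup check of the case study can be recomputed for the nominal case, the max case and the upsized cells. In this sketch the function name setup_slack is made up for illustration, and the whole path including the D-FF delay is derated with the max. PVT factor, as done on this slide.

```python
K_MAX = 1.1 * 1.15 * 1.15  # worst-case (max) PVT derating factor


def setup_slack(path_delays_ns, t_clk_ns, t_setup_ns, derating=1.0):
    """Slack = latest allowed arrival (tclk - tsetup) minus derated path delay."""
    arrival = sum(path_delays_ns) * derating
    return (t_clk_ns - t_setup_ns) - arrival


NOMINAL_PATH = [5.0, 5.0, 8.0, 8.0]  # D-FF1, inverter, AND, OR
UPSIZED_PATH = [5.0, 4.0, 6.5, 6.5]  # same path with the faster cells

slack_nominal = setup_slack(NOMINAL_PATH, 40.0, 6.0)
slack_max = setup_slack(NOMINAL_PATH, 40.0, 6.0, K_MAX)
slack_upsized = setup_slack(UPSIZED_PATH, 40.0, 6.0, K_MAX)

print(f"nominal {slack_nominal:+.2f} ns, max {slack_max:+.2f} ns, upsized {slack_upsized:+.2f} ns")
```

With the upsized cells the derated path delay is 22ns * 1.1 * 1.15 * 1.15, which stays below the 34ns limit, so the setup slack becomes positive again.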

Case Study
• Are we done?
• No, we have to check the shortest path for timing violations
• Assume that this is a perfect synchronous design, i.e. all logical inputs are connected to registers
[Figure: the same circuit, now with D-FF3 as the register at the start of the shortest path to D-FF2]

Case Study
• Again, first for the nom case:
  • t=0ns: clk: 0->1
  • t=5ns: D is transported to the output Q of D-FF3
  • t=13ns: The OR changes its output to 1, which arrives at D of D-FF2
• What was the earliest time the signal was allowed to arrive?
• t=thold=9ns, therefore it works!
[Figure: the shortest path from D-FF3 to D-FF2, annotated step by step with the signal values]

Case Study
• But wait, the circuit also needs to work for the min case. So let's assume the PVT factors we used a couple of slides ago. Then we have to multiply the critical path with the min. PVT factor: 13ns*0.9*0.85*0.85=8.45ns
• But the hold time of a D-FF tells you that the input must remain there for thold after the rising edge of the clock, otherwise the output is not safely switched. In our case thold=9ns (ok, not derated, so let's assume this is the case for min/nom/max ….)
• This is again called a negative slack (or hold-violation) of 0.55ns
• What to do?

Case Study
• Insert some buffers in the min-path, until the hold-time violation disappears
[Figure: the shortest path from D-FF3 to D-FF2 (nominal arrival at t=13ns)]
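
The hold check and the buffer insertion can be sketched in the same way as the setup check. The buffer delay of 2ns (nominal) is a hypothetical value, since the source does not give one; the function names are also made up for illustration.

```python
K_MIN = 0.9 * 0.85 * 0.85  # best-case (min) PVT derating factor


def hold_slack(path_delay_ns, t_hold_ns, derating=K_MIN):
    """Slack = derated arrival time of the shortest path minus the hold time."""
    return path_delay_ns * derating - t_hold_ns


def buffers_needed(path_delay_ns, t_hold_ns, t_buffer_ns, derating=K_MIN):
    """Insert buffers one by one until the hold violation disappears."""
    if t_buffer_ns <= 0:
        raise ValueError("buffer delay must be positive")
    count = 0
    while hold_slack(path_delay_ns + count * t_buffer_ns, t_hold_ns, derating) < 0:
        count += 1
    return count


slack_min = hold_slack(13.0, 9.0)
n_buffers = buffers_needed(13.0, 9.0, t_buffer_ns=2.0)  # 2 ns buffer delay is an assumption

print(f"hold slack {slack_min:+.2f} ns, buffers needed: {n_buffers}")
```

Every inserted buffer also lengthens the path for the max case, so the setup check has to be repeated after a hold fix.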

Case Study
• So let's summarize: A circuit works from a timing point of view if
  • The longest path in the circuit does not cause any setup violation, even if the worst (=max) PVT derating is assumed
  • The shortest path in the circuit does not cause any hold violation, even if the best (=min) PVT derating is assumed
• In case of a setup violation, simply upsize cells in the max-path to be faster
• In case of a hold violation, simply insert buffers in the min-path to be slower
• This is exactly what Place & Route CAD software is doing

Evaluating Circuits Using Simulation
• Imagine you have to verify the circuit using a digital simulator
• It can be quite cumbersome to
  • Find out the critical (min / max) paths
  • Establish a test case with all input stimuli
  • Include the timing of the wires
• Is there a better way?

Evaluating Circuits Using STA
• Yes, there is. It is called Static Timing Analysis (=STA)
• It is a piece of CAD software that simply traverses all possible paths, finds out the min/max paths and checks for violations
• You do not need any stimuli (that saves a lot of time)
• The software simply reports a setup/hold margin / slack, for min/nom/max. In case of a violation, it provides an example path …
• Sounds too good to be true, huh? Indeed, there are a few drawbacks …

• Not all paths in a circuit are meant to transmit data signals. Typical examples are reset wires … Usually they belong to a long net, are very slow and asynchronous, but since the timing model of the D-FF models this path, STA cannot distinguish between data paths and control paths.
[Figure: the case study circuit with D-FF3 and D-FF2 and an additional reset wire]
• With STA, you save a lot of time since you do not have to create test cases
• But you have to spend some time to eliminate so-called “false paths”
• Another drawback: suppose your logic circuit is designed so smartly that the cases for which a timing violation would occur will never happen … This cannot be detected by STA …
• Lessons learned from practice: You have to create stimuli anyhow to check the functional behaviour, so STA will help you to detect the critical cases – then simulate them with a simulator
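
The core of STA, propagating earliest and latest arrival times through the gate network without any stimuli, can be sketched as follows. The netlist is an assumed reading of the case study circuit (a 2:1 multiplexer built from an inverter, two AND gates and an OR gate), and all register outputs are assumed to switch 5ns after the clock edge; real tools additionally handle wire delays, derating and false-path exclusion.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def arrival_times(gates, sources):
    """Propagate (earliest, latest) arrival times through a combinational network.

    gates:   {output: (gate delay, [input signals])}
    sources: {signal: arrival time} for the register outputs
    """
    order = TopologicalSorter({out: ins for out, (_, ins) in gates.items()}).static_order()
    earliest = dict(sources)
    latest = dict(sources)
    for node in order:
        if node in sources:
            continue
        delay, inputs = gates[node]
        earliest[node] = min(earliest[i] for i in inputs) + delay  # shortest path so far
        latest[node] = max(latest[i] for i in inputs) + delay      # longest path so far
    return earliest, latest


# Assumed netlist: delays in ns, nominal case
GATES = {
    "s_bar": (5.0, ["s"]),
    "a_sel": (8.0, ["a", "s_bar"]),
    "b_sel": (8.0, ["b", "s"]),
    "w": (8.0, ["a_sel", "b_sel"]),
}
SOURCES = {"s": 5.0, "a": 5.0, "b": 5.0}  # D-FF delay after the clock edge (assumption)

earliest, latest = arrival_times(GATES, SOURCES)
setup_slack = (40.0 - 6.0) - latest["w"]  # tclk - tsetup - longest arrival
hold_slack = earliest["w"] - 9.0          # shortest arrival - thold
print(f"w arrives between {earliest['w']} ns and {latest['w']} ns")
print(f"setup slack {setup_slack} ns, hold slack {hold_slack} ns")
```

The longest arrival time at w reproduces the 26ns of the case study. The shortest one depends on the assumed wiring and therefore differs from the 13ns path discussed earlier.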

Current Research Topics
• Using our case study you have seen that a rather slow design with fclk=25MHz can have problems to work. What about fclk=2GHz? Do we have standard cells with only 40ps delay?
• Answer: No. The pessimism introduced by the PVT derating factors is too large and was even increasing for modern technologies below 90nm.
• But good news: It is quite unlikely that all stdcells in the min or max path follow the same pessimism.
• What you need is statistics from the fab, how likely it is that neighbouring cells behave statistically similar or not
• This is then called Statistical Static Timing Analysis (SSTA)

Robust Analog Design
• So much for digital design. What about analog design?
• For digital design, only function & timing matter. For analog design many more parameters have to be taken into account.

Major Ingredients for Robust Analog Design
[Figure: the designer in the center, surrounded by process, circuits, models, analysis, tools and methods, aiming at 100% parametric yield]

Major Obstacles for Robust Analog Design
How can this fuzzy setup work at all?
[Figure: the designer (with awareness ???) is surrounded by a process with variations, circuits with imperfections, models with errors, analysis that is often not done, tools with bugs and methods with limitations]
A responsible-minded designer is the only one who can cope with all the obstacles and uncertainties

Process Variation and Statistical Process Control
• Global parameter variations
  • Fab to fab
  • Process change (yield improvement, new tools)
  • Lot to lot
  • Across wafer
  • Across reticle
  • Across chip (in case of process gradients)
• Process monitoring
• Statistical process control within fab
• Fab synchronization
• Local parameter variations, i.e. device mismatch
  • Across chip (plain statistical fluctuations)
  • Oxide thickness variation
  • Line edge roughness
  • Dopant fluctuations
• Characterization of device mismatch
[Figure: the results are collected in an Extended Parameter Data Base]

Process Capability Index Cp (Juran, 1974)
[Figure: two panels with Gaussian distributions P1(x), P2(x), P3(x) over x from -6 to 6, with LSL = -6 and USL = 6; left panel Cp = 1, right panel Cp = 2]

$$C_p = \frac{USL - LSL}{6\sigma}$$

• Cp compares the specified range with the standard deviation
• Cp does not regard centering of the mean value in the range

Process Capability Index Cpk (Kane, 1986)
[Figure: two plots of Gaussian densities P1(x), P2(x), P3(x) over x between LSL = -6 and USL = 6, one for Cpk = 1 and one for Cpk = 2]
$C_{pk} = \frac{\min(USL - \mu,\ \mu - LSL)}{3\sigma} = \frac{USL - LSL - |2\mu - USL - LSL|}{6\sigma}$
• Statistical process control in the wafer fab aims at Cpk ≥ 1.5
• This holds for PCM parameters and for product parameters
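The two indices can be compared numerically. The following is a minimal sketch; the function names and the example values for σ and μ are illustrative assumptions, only the spec limits LSL = -6 and USL = 6 are taken from the slides.

```python
def cp(lsl, usl, sigma):
    """Process capability index: spec width over six standard deviations."""
    return (usl - lsl) / (6.0 * sigma)


def cpk(lsl, usl, mu, sigma):
    """Capability index that also penalises an off-centre mean value."""
    return min(usl - mu, mu - lsl) / (3.0 * sigma)


LSL, USL = -6.0, 6.0  # spec limits as on the slides

# Centred process: both indices agree.
centred = (cp(LSL, USL, 1.0), cpk(LSL, USL, 0.0, 1.0))
# Mean shifted by 1.5 sigma (illustrative): Cp is unchanged, Cpk drops.
shifted = (cp(LSL, USL, 1.0), cpk(LSL, USL, 1.5, 1.0))

print("centred:", centred)
print("shifted:", shifted)
```

The shifted case shows why Cp alone is not sufficient: it ignores the centering of the mean value.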

Statistical Process Control aims at Cpk ≥ 1.5
[Figure: Gaussian densities P1(x), P2(x), P3(x) over x from -6 to 6 for Cpk = 1, Cpk = 1.5 and Cpk = 2]
• Relation of standard deviation σ and the capability index Cpk
• min|Spec Limit - Mean| = 3σ: Cpk = 1, is not manufacturable
• min|Spec Limit - Mean| = 4.5σ: Cpk = 1.5, aim of process control
• min|Spec Limit - Mean| = 6σ: Cpk = 2, very low risk for yield

Statistical Process Control aims at Cpk ≥ 1.5
• One might argue Cpk = 1 would be manufacturable as well
• Just with a little yield loss of 0.26% for a Gaussian distribution
• But Cpk = 1 (according to ±3σ) is not robust at all
• Reconsider the high sensitivity of the Gaussian distribution
• 1.24% yield loss by 20% increase of standard deviation σ
• 3.59% yield loss by 20% relative shift of the mean value μ
• Hence a reasonable margin is required
• And there are more parameters than only this one
• Y = 0.9974, 1 - Y = 0.0026 = 0.26%
• Y^10 = 0.9743, 1 - Y^10 = 0.0257 = 2.57%
• Y^100 = 0.7708, 1 - Y^100 = 0.2292 = 22.92%
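The sensitivity figures can be reproduced with the Gaussian cumulative distribution. This is a sketch under two assumptions: the helper name `yield_two_sided` is made up, and the "20% relative shift of the mean" is read as 20% of the 6σ spec width (1.2σ), which is the reading that reproduces the 3.59% figure. The exact ±3σ yield is about 0.9973, so compound values differ slightly from the rounded 0.9974 used on the slide.

```python
import math


def yield_two_sided(lower_sigmas, upper_sigmas):
    """Fraction of a Gaussian population between mean - lower*sigma and mean + upper*sigma."""
    def phi(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return phi(upper_sigmas) - phi(-lower_sigmas)


y1 = yield_two_sided(3.0, 3.0)                 # Cpk = 1, centred process
loss_sigma = 1.0 - yield_two_sided(2.5, 2.5)   # sigma +20 %: limits move to 3/1.2 = 2.5 sigma
loss_shift = 1.0 - yield_two_sided(4.2, 1.8)   # mean shifted by 1.2 sigma (assumed reading)

print(f"Y = {y1:.4f}, Y^10 = {y1**10:.4f}, Y^100 = {y1**100:.4f}")
print(f"loss for sigma +20%: {100 * loss_sigma:.2f}%")
print(f"loss for mean shift: {100 * loss_shift:.2f}%")
```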

The Major Difference of DFM Compared to SPC
• Statistical process control is a closed loop procedure
• A wafer fab can and will react when things go wrong
• Design for manufacturing is an open loop procedure
• Prediction of Cpk based on models and assumptions
• Hence design has to cope with various uncertainties
• Please reconsider: just one critically designed parameter can noticeably pull down the yield of a whole product → Redesign?
• Design verification can be managed either closed loop or open loop
• Each update of model parameters starts a new verification cycle, or
• The design is made robust enough once to tolerate a parameter change
What's your preference? It's up to you to make a reasonable choice!

Design for Manufacturability with Uncertainties
• Device models and nominal model parameters
• Model equations which can fit to the measured characteristics
• Are all known effects implemented (halo, well proximity, STI)?
• Best possible fitting of model parameters, with no or only small tradeoffs
But if the model doesn't fit then the parameters can't fit either
• Models always have some errors (known and unknown)
• Process statistics and statistical models (Corner and MC)
• Maturity and sufficient stability of the manufacturing process
• Complete monitoring of all relevant process parameters
• Statistical models based on these process monitoring results

Example for Modeling of the Output Resistance Gds of a Long
Channel Device
[Figure: PF plots of Ids vs. Vds and of gds, each at Vbs = 0V and at Vbs = VDD]
Output resistance accuracy significantly improved! But still a trade-off between modeling at Vbs= 0V and Vbs= VDD necessary, since the
DITS-output resistance model has no parameter for body bias effect.

Design for Manufacturability with Uncertainties
• Algorithms and simulation methodology
• Limitations of Worst Case method
• Limitations of Monte Carlo method
How many simulations do I need to get a confident yield estimate?
• Suitability of the simulation setup
(e.g. simulation of PSRR w/o mismatch of Vth, K', gds and γ)
• Limited accuracy of simulation algorithms depending on settings
• Design has to live with manifold uncertainties
• Sufficient design margin is the only way to cope with uncertainties
• The model parameter window needs to be reasonably wider than the assumed process parameter window

Verification of Analog Circuits W/R to PVT
 | Process Variations | Operation Conditions
Global Variations | Process variation (SPICE calls this "model parameters") | Supply voltage, temperature. Rule: sweep it!
Local Variations | Mismatch (SPICE calls this "device parameters") | Input (e.g. VIN, VCM), output (RL, CL, IL), bias (VREF, IBIAS)
• Operation conditions are deterministic – just to be altered with SPICE
• In the following we will focus on statistical methods for process variation

Methods to Verify Tolerance W/R to Process Variation
Verification of Robustness and Manufacturability / Yield Estimation
Method | Runs
Nominal case | 1
Digital corners | 4
Analog corners | 16…64
Uniform MC | 100…1000
Alternative MC | 1,000…10,000
Standard MC | 10,000…100,000
Process Corner Analysis
• Digital corners
• To skew nmos and pmos only
• Analog corners
• To skew also low VT / reg VT
• To skew also thin / thick GOX
• To skew also Passives R & C
Monte-Carlo Analysis
• MC with a uniform distribution
• A kind of random corner analysis
• Alternative MC analysis
• Stratified / Importance Sampling
• Standard MC analysis
• Requires huge number of runs

The Basic Concept of Corner Simulation
[Figure: x-y parameter plane with the parameter range enclosed by the corner polygon]
Specification | Test
x > -5 | pass
x + y < 9 | pass
x > -3 | fail
x + y < 4 | fail
x² + y > -2 | "pass"
• Corners define a polyhedron which encloses the parameter range
• This polyhedron overestimates the parameter range → worst case
• A specification which depends monotonically (e.g. linearly) on the parameters can be tested
• The corner test can give a wrong result for nonlinear functions, and it is not possible to estimate the yield
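The quoted "pass" of the last specification can be illustrated with a small experiment. This is only a sketch: the parameter box x, y ∈ [-4, 4] is an assumption (the slide does not state the range) chosen because it reproduces the pass/fail column, and the dense grid scan merely stands in for the true behaviour inside the box.

```python
from itertools import product

# Assumed parameter box, not given on the slide.
X_RANGE, Y_RANGE = (-4.0, 4.0), (-4.0, 4.0)

SPECS = {
    "x > -5":       lambda x, y: x > -5,
    "x + y < 9":    lambda x, y: x + y < 9,
    "x > -3":       lambda x, y: x > -3,
    "x + y < 4":    lambda x, y: x + y < 4,
    "x^2 + y > -2": lambda x, y: x * x + y > -2,
}

# All 2^n corners of the box (n = 2 parameters here).
corners = list(product(X_RANGE, Y_RANGE))


def corner_test(spec):
    """Spec is checked at the corners only."""
    return all(spec(x, y) for x, y in corners)


def grid_test(spec, steps=81):
    """Spec is checked on a dense grid covering the whole box."""
    xs = [X_RANGE[0] + i * (X_RANGE[1] - X_RANGE[0]) / (steps - 1) for i in range(steps)]
    ys = [Y_RANGE[0] + i * (Y_RANGE[1] - Y_RANGE[0]) / (steps - 1) for i in range(steps)]
    return all(spec(x, y) for x in xs for y in ys)


results = {name: (corner_test(f), grid_test(f)) for name, f in SPECS.items()}
for name, (at_corners, everywhere) in results.items():
    print(f"{name:14s} corners: {at_corners!s:5s} whole box: {everywhere}")
```

The linear specifications give the same verdict at the corners and inside the box, whereas the quadratic one passes at all four corners but is violated in the interior (for example at x = 0, y = -4).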

Corners in an n-Dimensional Parameter Range
n | 2^n
1 | 2
2 | 4
3 | 8
4 | 16
5 | 32
10 | 1024
• The good news
• Performance typically depends on a small subset of a few parameters only
(But the subset is generally unknown w/o detailed knowledge of the circuit)
• Hence it is not necessary to simulate all 2^n corners, but only 16…64 well-selected ones

A 1st Generic Device Model for Process Variation

A 2nd Generic Device Model for Process Variation

How to Select the Corner Distance for Verification
A) Based on a 2-Sigma, 3-Sigma, 4.5-Sigma Corner Set
Class of circuits | Before T7 Process Freeze: fully functional | Before T7 Process Freeze: all parameters in specification | After T7 Process Freeze: fully functional | After T7 Process Freeze: all parameters in specification
Power and speed critical circuits | 4.5 Sigma | 3 Sigma | 3 Sigma | 2 Sigma
All other circuits & standard blocks | 4.5 Sigma | 4.5 Sigma | 4.5 Sigma | 3 Sigma
B) Based on a 2-Sigma, 3-Sigma, 4-Sigma, 5-Sigma Corner Set
Class of circuits | Before T7 Process Freeze: fully functional | Before T7 Process Freeze: all parameters in specification | After T7 Process Freeze: fully functional | After T7 Process Freeze: all parameters in specification
Power and speed critical circuits | 4 Sigma | 3 Sigma | 3 Sigma | 2 Sigma
All other circuits & standard blocks | 5 Sigma | 4 Sigma | 4 Sigma | 3 Sigma

Monte-Carlo Simulation to Estimate the Yield
• General method
• needs a statistical model for the distribution function of the (BSIM) model parameters
• a random generator derives the actual simulation parameters out of this model
• perform N statistically randomized simulations or measurements respectively
• derive the distribution of the performance (i.e. a histogram of density or probability)
• test for each run whether it passes or fails the specification
• estimate the yield
Y = 1/N * Number_of_Pass * 100%
[Figure: random generator feeding SPICE; resulting performance histograms (percent vs. value) with pass and fail regions; example result YP = 98%]
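The flow above can be sketched in a few lines. Everything circuit-related here is hypothetical: `toy_offset` is a stand-in for a real SPICE run, and its parameter distributions and the ±12.5 mV specification are invented for illustration only.

```python
import random


def monte_carlo_yield(n_runs, simulate, spec, seed=1):
    """Estimate the yield as Number_of_Pass / N over n_runs randomized simulations."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    passes = sum(1 for _ in range(n_runs) if spec(simulate(rng)))
    return passes / n_runs


def toy_offset(rng):
    """Hypothetical performance model replacing a SPICE simulation."""
    vth_mismatch = rng.gauss(0.0, 0.005)   # assumed statistical model, in volts
    gain_error = rng.gauss(0.0, 0.02)      # assumed statistical model, relative
    return vth_mismatch * (1.0 + gain_error)


def in_spec(offset):
    """Assumed specification: |offset| below 12.5 mV."""
    return abs(offset) < 0.0125


y_est = monte_carlo_yield(10_000, toy_offset, in_spec)
print(f"estimated yield: {100 * y_est:.2f}%")
```

In a real flow `simulate` would draw model parameters from the foundry's statistical model and launch the simulator; the counting of passes stays the same.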

• Very important
• the yield has to be calculated / estimated directly
• it is not possible in general to extrapolate the yield from the σ
• exception: the performance distribution is known in advance, but then brute-force Monte-Carlo is actually not needed at all
• Required number of MC runs is usually very high
• Alternative application for MC to check the robustness
• Apply a uniform parameter distribution rather than Gaussian
• Results in an equal screening of the whole parameter range
• Helps to identify design weakness w/r to PVT
• Very powerful method, but it doesn't allow estimating the yield

• How confident (reliable) is the yield estimation?
• Depends on the yield itself
• Depends on the number of runs
• Depends on the confidence interval, i.e. a range k = [Ymin ≤ Yestimate ≤ Ymax]
• Depends on the confidence level (95%, 99%), i.e. the likelihood of correctness
• But does not depend on the simulated circuit or the number of variables
• For exact formulas of the confidence interval see a textbook on statistics.
• Rule of thumb for a confident yield estimate
• There should be about 9...10 fails inside the simulated population
• Rule of thumb: about 10/(1 - Y), i.e. 1000 MC runs for Y = 99%
• 100 runs for 90%, 1000 runs for 99%, 10,000 runs for 99.9% yield
• We can simulate single blocks / specifications only → N ≤ 10,000
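The rule of thumb and one possible confidence interval can be written down as follows. This is a sketch: `runs_needed` implements the 10/(1 - Y) rule from the slide, while the Wilson score interval is a standard textbook formula chosen here as an assumption, since the slides do not say which interval formula they have in mind.

```python
import math


def runs_needed(target_yield, fails=10):
    """Rule of thumb: choose N so that about `fails` failing runs are expected."""
    return round(fails / (1.0 - target_yield))


def wilson_interval(n_pass, n_runs, z=1.96):
    """Wilson score interval for a pass fraction (z = 1.96 for about 95% confidence)."""
    p = n_pass / n_runs
    denom = 1.0 + z * z / n_runs
    centre = (p + z * z / (2.0 * n_runs)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n_runs + z * z / (4.0 * n_runs ** 2)) / denom
    return centre - half, centre + half


for y in (0.90, 0.99, 0.999):
    print(f"Y = {y}: about {runs_needed(y)} runs")

lo, hi = wilson_interval(990, 1000)   # 10 fails in 1000 runs
print(f"95% interval for 990/1000 passes: [{lo:.4f}, {hi:.4f}]")
```

Note that the interval depends only on the pass count and the number of runs, not on the circuit or the number of statistical variables.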

Low-Voltage PTAT Current Source and Bandgap Current Source

Low-Voltage Bandgap Reference Voltage Generator 1
Bandgap Current Output

Operational Amplifier Self Bias Cell Stage

Imperfection of the output current mirror
- Extremely dependent on process (halo)
- Since C070 monitored on a regular basis
- But not yet covered by the corner models!
Anyhow, an analog designer has to know!

This just makes the difference…

VDiode < VREF
Operational Amplifier Self Bias Bandgap Current Cell Output Stage

“Matching” of VREF to VDiode improves it.
- But temperature dependency remains!
Temperature drift might be tuned to zero.

- But process dependency remains!
This is not robust wrt process variation!

Cascode Bias Cascoded Current Mirror

Now it looks very close to an ideal mirror.
Especially it is not dependent on process.
But what about minimum supply voltage?

VDDmin @ fnsp case
VDDmin @ nom case

Cascode Bias Cascoded Current Mirror

VDD > VDiode – VTN + |VTP| + |VSATP|

Folded Cascode Operational Amplifier

Start-up Circuit and Frequency Compensation not shown!

Folded cascode amplifier yields the
margin for operation at VDD = 1.14V
(Could be further optimized).
VDDmin @ fnsp case

Chapter 7
DESIGN FOR TEST

After Taping out ... Test
• Suppose the fab finally fabricates your chip. How do you know your chip works perfectly fine and you can sell it?
• The fab has to run a so-called "Test" procedure, but exhaustively testing a chip is out of the question for cost reasons
• Rule of thumb: without a special repair step, no fabricated DRAM or CPU initially works! Malfunctions have to be identified and repaired
• You have to find a smart way of checking that all flip-flops/registers are working: DfT (Design for Test, already to be applied at design level)

Design for Test: Scan Design
• General idea: make all flip-flops directly controllable and observable.
• This turns the circuit under test into a combinational one

Design for Test
• Circuit is designed using pre-specified design rules.
• Test structure (hardware) is added to the verified design:
• Add a test control (TC) primary input.
• Replace flip-flops by scan flip-flops (SFF) and connect to form one or more shift registers in the test mode.
• Make input/output of each scan shift register controllable/observable from PI/PO.
• Use combinational ATPG to obtain tests for all testable faults in the combinational logic.
• Add shift register tests and convert ATPG tests into scan sequences for use in manufacturing test

Scan Design: Concept
• Memory elements (latches and flip-flops) are designed so that they can be reconfigured dynamically to form a shift register R during testing
• Test data is transferred serially to and from R, making the memory state completely controllable and observable

Scan Cell Design
4-bit scan register

Scan Design

Steps in Scan Testing
• N/T = 1: Scan in test pattern, hold appropriate bit pattern on controllable primary inputs
• N/T = 0: Apply test pattern to combinational circuit
• N/T = 1: Scan out test responses
• Scan provides complete controllability and observability
• Questions: Testing time? How many cycles? How to test scan registers?
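A rough answer to the cycle-count question can be sketched under simplifying assumptions that are not stated on the slides: one capture cycle per pattern, scan-out of one response overlapped with scan-in of the next pattern, and all chains shifted in parallel. The function name and the example numbers are illustrative.

```python
def scan_test_cycles(n_patterns, chain_length):
    """Clock cycles for a scan test with overlapped scan-out / scan-in.

    Each pattern needs chain_length shift cycles plus one capture cycle;
    the last response needs chain_length additional cycles to be shifted out.
    """
    shift_in_and_capture = n_patterns * (chain_length + 1)
    final_scan_out = chain_length
    return shift_in_and_capture + final_scan_out


one_chain = scan_test_cycles(n_patterns=1000, chain_length=500)
ten_chains = scan_test_cycles(n_patterns=1000, chain_length=50)  # 500 flip-flops in 10 parallel chains
print(one_chain, ten_chains)
```

The comparison indicates why long chains make test application slow and why designs are usually split into several parallel scan chains.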

Scan Chains: Summary
• Allows complete controllability and observability
• Test pattern must be generated primarily for the combinational circuit
• Hardware overhead is small: a few extra pins and some (5 to 20%) extra logic for the latches and flip-flops
• Test application is slow
• Limited to a few hundred latches

Scan Chains: Basic rules
• Use only clocked D-type flip-flops for all state variables.
• At least one PI pin must be available for test; more pins, if available, can be used.
• All clocks must be controlled from PIs.
• Clocks must not feed data inputs of flip-flops.

Scan Chains: Summary
• Scan is the most popular DFT technique:
• Rule-based design
• Automated DFT hardware insertion
• Combinational ATPG
• Advantages:
• Design automation
• High fault coverage; helpful in diagnosis
• Hierarchical: scan-testable modules are easily combined into large scan-testable systems
• Moderate area (~10%) and speed (~5%) overheads
• Disadvantages:
• Large test data volume and long test time
• Basically a slow speed (DC) test

Chapter 8
STDCELLS, FLOORPLANNING,
PLACEMENT, ROUTING

Repetition: Standard Cell Design
• In a library several primitive cells are available
• Predefined Input/Output blocks (Pad-Cells)
• FlipFlops/Registers and Latches
• Different logical blocks (inverter, NAND/NOR, 2-4 inputs, various driving strengths)
• If available, more complex macros can also be used
• SRAM/eDRAM
• PLL, DLL
• ...
• Tracks form a grid for routing

Standard Cell Design
• Tracks form a grid for routing.
• Spacing between tracks is center-to-center distance between wires.
• Track spacing depends on wire layer used.
• Different layers are (generally) used for horizontal and vertical wires.
• Horizontal and vertical can be routed relatively independently.

Standard Cell Structure

Single Row Layout Design
• Pitch: height of cell.
• All cells have same pitch, may have different widths.
• VDD, VSS connections are designed to run through cells.
• A feedthrough area may allow wires to be routed over the cell.

Routing Channel
• Tracks form a grid for routing.
• Spacing between tracks is center-to-center distance between wires.
• Track spacing depends on wire layer used.
• Different layers are (generally) used for horizontal and vertical wires.
• Horizontal and vertical can be routed relatively independently.
• Placement of cells determines placement of pins.
• Pin placement determines difficulty of routing problem.
• Density: lower bound on number of horizontal tracks needed to route the channel.
• Maximum number of nets crossing from one end of channel to the other.
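Channel density can be computed directly from the horizontal extent of each net. The sketch below assumes that every net is described only by its leftmost and rightmost pin column; the net names and column numbers are invented for illustration.

```python
def channel_density(nets):
    """Maximum number of nets whose horizontal span covers the same column.

    nets: dict mapping net name -> (leftmost_column, rightmost_column).
    It is sufficient to check the span end points, since the overlap count
    can only change there.
    """
    columns = sorted({c for span in nets.values() for c in span})
    return max(sum(1 for lo, hi in nets.values() if lo <= c <= hi) for c in columns)


example_nets = {"A": (0, 5), "B": (1, 3), "C": (2, 6), "D": (4, 7)}  # hypothetical channel
density = channel_density(example_nets)
print("lower bound on horizontal tracks:", density)
```

Re-ordering the cells changes the pin columns and therefore the spans, which is how a placement change can lower the density.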

Standard Cell Design: Cells

Standard Cell Design: Example
• Assume the following simple circuit

• Placement and routing of the individual std cells

• Placement and routing of the individual std cells (now more layers visible)

Standard Cell Design: Example 2
• Edge triggered D-FF

• Edge triggered D-FF

• Full adder: two outputs, sum and carry

• Generate candidates, evaluate area and speed.
• Improve a candidate without starting from scratch.
• To generate a candidate:
• place gates in a row;
• draw wires between gates and primary inputs/outputs;
• measure channel density.

Starting point:

• How to improve this starting version:
• Swap pairs of gates.
• Doesn't help here.
• Exchange larger groups of cells.
• Swapping order of sum and carry groups doesn't help either.
• This seems to be the placement that gives the lowest channel density
• Cell sizes are fixed, so channel height determines area.

Physical Design: Floorplanning
• Goal of Floorplanning:
• Early in design:
• Prepare a floorplan to budget area, wire area/delay.
• Tradeoffs between blocks can be negotiated.
• Late in design:
• Make sure the pieces fit together as planned.
• Implement the global layout.

Physical Design: Placement
• Blocks have:
• area;
• aspect ratio.
• Blocks may be placed at different rotations and reflections.
• Uniform size blocks are easier to interchange.
• Blocks divide wiring area into routing channels.
• Cannot ignore wiring during block placement: large wiring areas may force rearrangement of blocks.
• Wiring plan must consider area and delay of critical signals.

Physical Design: Routing
• Channels end at block boundaries.
• Several alternate channel definitions are possible:


• The channel and block boundaries can be represented by a graph:
• Nodes are channels, edges placed between two channels that touch.

• Wire out of end of one channel creates pin on side of next channel:

• Some problems: windmill constellation
• Can create an unroutable combination of channels with circular constraints

Slicability
• A slicable floorplan can be recursively cut in two without cutting any blocks.
• A slicable floorplan is guaranteed to have no windmills, therefore guaranteed to have a feasible order of routing for the channels.
• Slicability is a desirable property for floorplans.
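The recursive definition translates almost literally into code. This is a sketch under the assumption that the floorplan is given as axis-aligned rectangles (x0, y0, x1, y1) that tile a rectangular area; the two example floorplans are invented, the second one being the classic five-block windmill.

```python
def is_slicable(blocks):
    """Return True if the floorplan can be recursively cut in two without cutting a block.

    blocks: list of (x0, y0, x1, y1) rectangles tiling a rectangular region.
    """
    if len(blocks) <= 1:
        return True
    for axis in (0, 1):                  # 0: vertical cut lines, 1: horizontal cut lines
        lo, hi = axis, axis + 2          # tuple indices of the low / high edge on this axis
        far_edge = max(b[hi] for b in blocks)
        for cut in sorted({b[hi] for b in blocks} - {far_edge}):
            if any(b[lo] < cut < b[hi] for b in blocks):
                continue                 # this line would cut through a block
            side_a = [b for b in blocks if b[hi] <= cut]
            side_b = [b for b in blocks if b[lo] >= cut]
            return is_slicable(side_a) and is_slicable(side_b)
    return False                         # no cut line spans the region: windmill-like


sliced = [(0, 0, 1, 3), (1, 0, 3, 1), (1, 1, 3, 3)]
windmill = [(0, 0, 2, 1), (2, 0, 3, 2), (1, 2, 3, 3), (0, 1, 1, 3), (1, 1, 2, 2)]
print(is_slicable(sliced), is_slicable(windmill))
```

In the windmill every candidate cut line is blocked by one of the four outer blocks, so no first cut exists.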

Global Routing / Prerouting
• Goal: assign wires to paths through channels.
• Don't worry about exact routing of wires within channel.
• Can estimate channel height from global routing using congestion.

Line Probe Routing
• Heuristic method for finding a short route.
• Works with arbitrary combination of obstacles.
• Does not explore all possible paths—not optimal.

Channel Utilization & Switchbox Routing
• Want to keep all channels about equally full to minimize wasted area.
• Important to route time-critical signals first.
• Shortest path may not be best for global wiring.
• In general, may need to rip-up wires and reroute to improve the global routing.
• Switchbox: Can't expand a switchbox to make room for more wiring.
• Switchbox may be defined by intersection of channels.

Routing Order and Switchboxes
• Switchboxes frequently need more experimentation with wiring order because nets may block other nets:

Interconnect
• Even assuming logic structure is fixed, we can:
• change wire topology;
• resize wires;
• add buffers;
• size transistors.
• Multipoint nets
• Two-point nets are easy to design.
• Multipoint nets are harder:
• How do we connect all the pins using two-point connections?

Steiner Tree
• Styles of a multipoint wire

Sized Steiner Tree
• The individual wires of a multipoint net can be sized differently

Reality is Even More Complex
• Buffer insertion in a sized Steiner Tree: More complex than placing buffers along a transmission line:
• complex topology;
• unbalanced trees;
• differing timing requirements at the leaves.
• Neighboring wires influence each other: Crosstalk (XTalk)
• Capacitive coupling introduces crosstalk; capacitance and slew-rate determine impact
• Crosstalk slows down signals to static gates
• Crosstalk can be controlled by methodological and optimization techniques, but at the cost of larger area

Crosstalk Analysis
• Assume worst-case voltage swings, signal slopes.
• Measure coupling capacitance based on geometrical alignment/overlap.
• Coupling situations:
• Long running wires on adjacent layers are also bad:

Crosstalk Analysis
• Xtalk has a huge impact also on delay!

Minimize the Impact of Xtalk
• Add ground wires between signal wires:
• coupling to VSS, a stable signal, dominates;
• can use VSS to distribute power, so long as power line is relatively stable.
• Extreme case: add ground plane. Costs an entire layer.

Xtalk Aware Routing
• Xtalk and routing:
• Can route wires to minimize required adjacency regions.
• Take advantage of natural holes in routing areas to decouple signals.
• Minimizes need for ground signals.

Xtalk Aware Routing: Example
• Example for channel routing:
• Take into account coupling only to wires in adjacent tracks.
• Ignore coupling of vertical wires.
• Assume that coupling/crosstalk is proportional to adjacency length.
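Under these three simplifications the crosstalk cost of a track assignment reduces to summing overlap lengths between neighbouring tracks. The sketch below uses an invented three-net channel; the data layout (a list of tracks, each holding `(net, x_start, x_end)` segments) is an assumption made for illustration.

```python
def coupling_length(tracks):
    """Total adjacency length between horizontal segments in neighbouring tracks.

    tracks: list of tracks from top to bottom; each track is a list of
    (net, x_start, x_end) segments. Vertical wires are ignored.
    """
    total = 0
    for upper, lower in zip(tracks, tracks[1:]):
        for net_a, a0, a1 in upper:
            for net_b, b0, b1 in lower:
                if net_a != net_b:
                    total += max(0, min(a1, b1) - max(a0, b0))
    return total


# Two long nets next to each other ...
bad = [[("a", 0, 10)], [("b", 0, 10)], [("c", 0, 4)]]
# ... versus the short net placed between them.
better = [[("a", 0, 10)], [("c", 0, 4)], [("b", 0, 10)]]
print(coupling_length(bad), coupling_length(better))
```

Both assignments use three tracks, but separating the two long nets by the short one lowers the adjacency length.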

Bad Solution
• Example for channel routing:

Better Solution
• Example for better channel routing:

Xtalk Analysis
• How to estimate delays induced by crosstalk?
• Must use iterative algorithm to solve for coupling capacitances and delays.
• Coupling effects depend on relative switching time of nets.
• Effect of coupling capacitance Cc depends on relative transitions.
• Aggressor changes, victim does not: Cc.
• Aggressor, victim move in opposite directions: 2Cc.
• Aggressor, victim move in same direction: 0.
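The three cases can be captured by a single switching factor. This is a sketch: encoding a transition as +1 (rising), -1 (falling) or 0 (quiet) and adding the scaled Cc to a ground capacitance are modelling assumptions, and the capacitance values are arbitrary.

```python
def coupling_factor(victim, aggressor):
    """Multiplier for Cc; transitions are +1 (rising), -1 (falling) or 0 (quiet)."""
    return abs(victim - aggressor)


def effective_capacitance(c_ground, c_couple, victim, aggressor):
    """Capacitance seen by the victim driver for the given pair of transitions."""
    return c_ground + coupling_factor(victim, aggressor) * c_couple


C_GND, C_C = 10.0, 4.0  # fF, illustrative values
print(effective_capacitance(C_GND, C_C, 0, +1))    # aggressor switches, victim quiet: Cc
print(effective_capacitance(C_GND, C_C, +1, -1))   # opposite directions: 2*Cc
print(effective_capacitance(C_GND, C_C, +1, +1))   # same direction: 0
```

Because the factor depends on whether the two nets actually switch at the same time, and the switching times depend on the delays, the analysis has to iterate.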

Chapter 9
VERILOG

History of Verilog
• 1982: HiLo is a popular hardware description language (by GenRad)
• 1984: Phil Moorby (who was co-developing HiLo) invents Verilog
• 1986: The Verilog-XL simulator (by Gateway) is the most powerful simulator for digital circuits
• 1990: Cadence acquires Gateway and now owns Verilog and the Verilog-XL simulator. At the same time, Synopsys is pushing towards top-down logic synthesis.
• 1991: Cadence "opens" the Verilog language by founding the OVI (Open Verilog International) initiative for developing and standardizing the HDL.
• 1992: Many companies offer Verilog simulators
• 1995: The Verilog LRM provided by OVI becomes the IEEE standard 1364.
• 2001: Latest version of Verilog is called Verilog-2001 (IEEE1364-2001)
• Several extensions are available towards verification and system description

History of VHDL
• 1981: The United States Department of Defense recognizes the need for an HDL to "overcome the hardware life cycle crisis". Sponsored with more than 30 million US$
• 1983-85: Development of VHDL by IBM, Intermetrics and TI
• 1986: The DoD transfers all rights of VHDL to the IEEE
• 1987: Publication of an IEEE standard
• 1994: Publication of the VHDL-1993 standard
• 2000: Publication of the VHDL-2000 standard
• 2002: Publication of the VHDL-2002 standard
• 2007: Publication of the VHDL Procedural Language Application Interface standard (VHDL 1076c-2007)
• 2009: Revised standard (VHDL 1076-2008)

Comparison of HD Languages
[Figure: SystemVerilog, SystemC, VHDL, Verilog and VITAL placed along the abstraction levels System, Algorithm, Logic, Gate, Layout, Device]
• VHDL: Ideal for coding large systems on RT level. Preferred by European & Canadian companies
• SystemC: Ideal for description of ultra-large or high data-stream oriented systems. Synthesis possible (e.g. CatapultC extension by Mentor Graphics)
• SystemVerilog: Ideal for coding testbenches & verification. Not accepted for RT synthesis by industry
• Verilog: Ideal for coding hardware. The standard for synthesis results on GL. Preferred by US&AP companies

Free Software for HDL Simulation
There's plenty of free software available for trying out Verilog at home:
• EDAPlayground (VHDL, Verilog, UVM, Python, ...): https://www.edaplayground.com/
• Modelsim (various download sites, also for FPGA, digital logic etc.)
• Icarus Verilog (on GitHub)

Gate-Level Verilog
• Verilog is perfectly suited for the description of gate-level netlists. Therefore de-facto all tool-written netlists on gate-level are in the Verilog language. But it is also possible to design on gate-level in Verilog:
    module mux2 (in1, in2, sel, out);
        output out;
        input in1, in2, sel;

        and a1(a1_o, in1, sel);
        not n1(n1_o, sel);
        and a2(a2_o, in2, n1_o);
        or  o1(out, a1_o, a2_o);
    endmodule
[Figure: schematic of mux2 with gates a1, n1, a2, o1, inputs in1, in2, sel, internal nets a1_o, n1_o, a2_o and output out]

Structural Verilog
• You may also combine gate-level primitives with unary operators:
    module mux2 (in1, in2, sel, out);
        output out;
        input in1, in2, sel;

        and a1(a1_o, in1, sel);
        and a2(a2_o, in2, ~sel);
        or  o1(out, a1_o, a2_o);
    endmodule
[Figure: schematic of mux2 with gates a1, a2, o1, inputs in1, in2, sel, ~sel, internal nets a1_o, a2_o and output out]

How to Test ...
• You may add lines of code inside the module to test the behavior....
    module mux2_with_test ();
        wire out;
        reg in1, in2, sel;

        and #1 a1(a1_o, in1, sel);
        and #1 a2(a2_o, in2, ~sel);
        or  #2 o1(out, a1_o, a2_o);

        initial
        begin
            $monitor($time,,,
                     "in1 = %b in2 = %b sel = %b out = %b",
                     in1, in2, sel, out);
            #10 in1 = 0; in2 = 1; sel = 0;
            #10 sel = 1;
            #5 in1 = 1; in2 = 0;
            #8 $finish;
        end
    endmodule

The Result ....
• Results of simulation
    0 in1 = x in2 = x sel = x out = x
    10 in1 = 0 in2 = 1 sel = 0 out = x
    13 in1 = 0 in2 = 1 sel = 0 out = 1
    20 in1 = 0 in2 = 1 sel = 1 out = 1
    23 in1 = 0 in2 = 1 sel = 1 out = 0
    25 in1 = 1 in2 = 0 sel = 1 out = 0
    28 in1 = 1 in2 = 0 sel = 1 out = 1
... But mixing the stimulus and the structure is not a good idea ...

Creating a Testbench
• Splitting the description and the test code is much better:
[Figure: a Testbench containing a Test Generator & Monitor block connected to the Device under Test (DUT)]

How to Test Better ...
• Simply use the module mux2 defined some pages ago ... Plus:
    module testbench;
        wire w1, w2, ws, wo;
        mux2 dut (w1, w2, ws, wo);
        testgen_and_monitor tgam (w1, w2, ws, wo);
    endmodule

    module testgen_and_monitor (output reg in1, in2, sel,
                                input out);
        initial
        begin
            $monitor($time,,,
                     "in1 = %b in2 = %b sel = %b out = %b",
                     in1, in2, sel, out);
            #10 in1 = 0; in2 = 1; sel = 0;
            #10 sel = 1;
            #5 in1 = 1; in2 = 0;
            #8 $finish;
        end
    endmodule

Graphically ...
[Figure: module testbench containing tgam (testgen_and_monitor) and dut (mux2), connected through the wires w1, w2, ws and wo]

Behavioral Verilog
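• The behavior of a block can also be expressed in terms of procedural statements, rather than gate-level primitives.
in1 | in2 | sel | out
0 | 0 | 0 | 0
0 | 1 | 0 | 1
1 | 0 | 0 | 0
1 | 1 | 0 | 1
0 | 0 | 1 | 0
0 | 1 | 1 | 0
1 | 0 | 1 | 1
1 | 1 | 1 | 1
[Figure: schematic of mux2 with gates a1, a2, o1, inputs in1, in2, sel, ~sel, internal nets a1_o, a2_o and output out]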

Behavioral Verilog
• The behavior of a block can also be expressed in terms of procedural statements, rather than gate-level primitives
    module mux2_behav (in1, in2, sel, out);
        output reg out;
        input in1, in2, sel;

        always @(in1, in2, sel) begin
            if (sel)
                out = in1;
            else
                out = in2;
        end
    endmodule
• The "=" inside a procedural statement must be made to a register
• The "@" is an event control statement, followed by a sensitivity list.
• If a change occurs in one of the signals in the sensitivity list, the part between begin…end is executed
• Don't get confused: if the synthesis tool finds out we do not need a register, it is not created!

Processes
• The statement always @(in1, in2, sel) tells the simulator to suspend execution until at least one of the three inputs changes
• The always continuously repeats its statement, never exiting or stopping.
• Don't get confused that "out" is now a reg. Looking from the outside, the circuit is clearly still a combinational block.
• The basic essence of a behavioral model is the process:
• Independent thread of control
• Think of a complex system as a large set of independent, but communicating processes
• In contrast to always: The initial construct is executed only once, otherwise similar
Processes are a group of sequential statements; indicated either by initial, always statements or of "continuous" nature

Concurrency
• Concurrent processes "live"/"happen" at the same time: not one after the other.
• One model waits for an event that happens (concurrently) in another model
• What stops a process? Only a delay (#10), wait, or @ statement. If this is missing, the process will run forever.

A Few Reflections on Time
• In a behavioral model, time does not exist! Behavioral statements (if, loop, while, ....) take zero time to execute
• Time only advances in a process if a wait, @ or delay is executed
• An initial block will only execute once. It will always start at time=0 (even if there is more than one initial block in the system).
• An always process will execute forever

Continuous Assignments
• Continuous assignments are always active:
    module oneBitFullAdder (output cOut, sum,
                            input in1, in2, in3);
        assign sum  = in1 ^ in2 ^ in3,                           // XOR of the three inputs
               cOut = (in1 & in2) | (in1 & in3) | (in2 & in3);   // majority of the three inputs
    endmodule

Blocking vs. Nonblocking Assignment
• Blocking assignment done with "="
• Only inside a process
• Only to reg
• Value of the left-hand-side (lhs) changes immediately
    reg x1, qbar;
    initial
    begin
        qbar = 0; // blocking assignment
        #100;     // wait for 100ns
        qbar = 1;
    end

Blocking vs. Nonblocking Assignment
• Nonblocking assignment done with "<="
• Only inside a process
• Only to reg
• Value of the left-hand-side (lhs) changes only after rhs has been evaluated
• A time may be given additionally as the assignment delay
    reg q;
    always @(posedge clk)
        q <= #10 d; // non-blocking
• Doing synthesis, this will always create a flip-flop (register)
• @ is perfectly suited for synthesis of synchronous systems!

Continuous Assignment (Rep'd)
• Continuous assignment done with "="
• Only to wire
• Always sensitive to changes on the rhs
assign x1 = a & b; // continuous assignment

Blocking a Process
• Processes can be blocked the following way:
• Wait for time to elapse:
    #300; // waits for 300ns
• Wait for a logical value
    wait(sel_bar);
• Wait for a signal to change
    @(sel_bar);
• Wait for a signal edge
    @(posedge clk)

Behavioral Verilog
• Some nice & useful procedural Verilog statements:
    ...
    input go;

    always begin
        // initialize your block
        wait (go);    // The execution stops here, unless go=1.
                      // Then, the following code is executed once.
        // do some computations
        ...
        wait (~go);   // The execution stops here, unless go=0.
                      // Then, the module becomes inactive again.
    end
    endmodule
Wait is not used for synthesis!

Behavioral Verilog
• The conditional statement: if ... else
    ...
    if ((a > b) && (c < b))
        // then-statement goes here
    else if (a > d)
        // else-if statement goes here
    else
        // else clause goes here
• Else / else if is
a) Optional
b) Always matched to the nearest "if"

Control Structures
• Control structures are very much like C

• if, while, for, forever, repeat, case, fork/join
    i = 16;
    while (i)   // a "zero" would be interpreted as "false"
    begin
        // do something useful here
        i = i - 1;   // condition must change inside your process!
    end

Control Structures
• if, while, for, forever, repeat, case, fork/join
    module a_very_abstract_dram;
        always
        begin
            read_spd; // read timing/latency settings
            forever
            begin
                get_commands_from_mem_ctrl;
            end
        end
    endmodule

Control Structures
• if, while, for, forever, repeat, case, fork/join
    begin: break
        for (i = 0; i < n; i = i + 1)
        begin: continue
            if (a == 0)
                disable continue;   // proceed with i=i+1, but stay in loop
            ... // other statements
            if (a == b)
                disable break;      // exit the loop
            ... // other statements
        end
    end

Control Structures
• Comparison of if and case

    reg signed [15:0] dram[0:8191]; // signed 8192 x 16 bit memory
    reg [15:0] ir, pc, acc;         // 16 bit instruction register, program counter, accumulator
    always
    begin
        @(posedge clk)
        ir <= dram[pc];                 // get the ir from a dram address
        @(posedge clk)
        if (ir[15:13] == 3'b000)        // begin decoding
            pc <= dram[ir[12:0]];       // and executing
        else if (ir[15:13] == 3'b001)
            pc <= pc + dram[ir[12:0]];
        else if (ir[15:13] == 3'b010)
            acc <= -dram[ir[12:0]];
        ...
        pc <= pc + 1;
    end
• Very powerful and general comparisons possible, but exact evaluations needed
• But readability suffers a bit

Control Structures
• Comparison of if and case

    reg [15:0] ir; // 16 bit instruction register
    always
    begin
        @(posedge clk)
        ir <= dram[pc]; // get the ir from a dram address
        @(posedge clk)
        case (ir[15:13])
            3'b000: pc <= dram[ir[12:0]];
            3'b001: pc <= pc + dram[ir[12:0]];
            3'b010: acc <= -dram[ir[12:0]];
            ...
        endcase
        pc <= pc + 1;
    end
Much more readable, isn't it?

Control Structures
• The case statement may even include unknown and high-impedance values
    reg reg1; // a one bit register

    case (reg1)
        1'bz: $display("reg1 is high impedance");
        1'bx: $display("reg1 is unknown");
        default: $display("reg1 has the following value: %b", reg1);
    endcase

Control Structures
• casex and casez even treat high-impedance or unknown values as don't cares in case statements:
    module decoder;
        reg [7:0] reg1;
        always   // no timing control: this block repeats endlessly at time 0
        begin
            reg1 = 8'bx1x0x1x0;
            casex (reg1)
                8'b001100xx: $display("Case 1");
                8'b1100xx00: $display("Case 2");
                8'b00xx0011: $display("Case 3");
                8'bxx001100: $display("Case 4");
            endcase
        end
    endmodule
Output: Case 2 …
• Note that casex treats both z and x as don't care values, whereas casez treats only z (and ?) as don't care
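The matching rules can be mimicked bit by bit to see why the example prints "Case 2". This is a simplified model written for illustration, not simulator code: values are plain strings over 0, 1, x, z and ?, and the helper names are made up.

```python
DONT_CARE = {"case": "", "casez": "z?", "casex": "xz?"}


def case_match(expr, item, kind="case"):
    """Bitwise match of a case expression against one case item.

    A bit position matches if both bits are identical, or if either of
    them is a don't-care symbol for the given kind of case statement.
    """
    dc = DONT_CARE[kind]
    return all(e == i or e in dc or i in dc for e, i in zip(expr, item))


def first_match(expr, items, kind):
    """Index of the first matching case item, or None if nothing matches."""
    for index, item in enumerate(items):
        if case_match(expr, item, kind):
            return index
    return None


reg1 = "x1x0x1x0"
items = ["001100xx", "1100xx00", "00xx0011", "xx001100"]
print("casex:", first_match(reg1, items, "casex"))   # second item, i.e. "Case 2"
print("casez:", first_match(reg1, items, "casez"))   # no item matches
```

With casez the x bits of reg1 are compared literally, so none of the four items matches.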

Functions and Tasks
• Using modules you are able to partition large pieces of code. But modules imply structural boundaries. If no structural boundary is intended, you may use functions and tasks.
• Example (see next page):
module advanced_decoder_with_function;
reg signed [15:0] m [0:8191];// signed 8192 x 16 bit memory
reg signed [12:0] pc; // signed 13 bit program counter
reg signed [12:0] acc; // signed 13 bit accumulator
reg ck; // a clock signal
always
begin: executeInstructions
reg [15:0] ir; // 16 bit instruction register
@(posedge ck)
ir <= m [pc];
@(posedge ck)
case (ir [15:13]) // as before
3'b111: acc <= multiply(acc, m [ir [12:0]]);
endcase
pc <= pc + 1;
end

Functions and Tasks
function signed [12:0] multiply

(input signed [12:0] a,
input signed [15:0] b);
begin: serialMult
reg [5:0] mcnd, mpy;
mpy = b[5:0];
mcnd = a[5:0];
multiply = 0;
repeat (6)
begin
if (mpy[0])
multiply = multiply + {mcnd, 6'b000000};
multiply = multiply >> 1;
mpy = mpy >> 1;
end
end
endfunction
endmodule

Functions and Tasks
module advanced_decoder_with_task;
reg signed [15:0] m [0:8191];// signed 8192 x 16 bit memory
reg signed[12:0] pc; // signed 13 bit program counter
reg signed[12:0] acc; // signed 13 bit accumulator
reg ck; // a clock signal
always
begin: executeInstructions
reg [15:0] ir; // 16 bit instruction register
@(posedge ck)
ir <= m [pc];
@(posedge ck)
case (ir [15:13]) // as before
3'b111 : multiply (acc, m [ir [12:0]]);
endcase
pc <= pc + 1;
end

Functions and Tasks
task multiply
(inout [12:0] a,
input [15:0] b);
begin: serialMult
reg [5:0] mcnd, mpy;//multiplicand and multiplier
reg [12:0] prod; //product
mpy = b[5:0];
mcnd = a[5:0];
prod = 0;
repeat (6)
begin
if (mpy[0])
prod = prod + {mcnd, 6'b000000};
prod = prod >> 1;
mpy = mpy >> 1;
end
a = prod;
end
endtask
endmodule

Functions and Tasks: Comparison
• We learned that functions and tasks are similar to software functions and procedures. Their main goal is to make code more readable.
• However, there are differences you must know as a programmer:

Category: Calling
  Task: A task call is a separate procedural statement. It cannot be called from a continuous assignment.
  Function: A function call is an operand in an expression. It is called from within the expression and returns a value used in the expression. Functions may be called within procedural and continuous assignment statements.
Category: I/O
  Task: Can have zero or more arguments of any kind.
  Function: Has at least one input, but no inouts or outputs. A value is always returned.
Category: Timing and event control
  Task: A task can contain timing and event control (#, @ and wait). It can therefore be concurrently active if called from concurrent always/initial blocks.
  Function: Functions do not contain these statements. They are executed in zero time.
Category: Calling other tasks or functions
  Task: A task may enable other tasks and functions.
  Function: A function can enable other functions, but not other tasks.
Category: Storage
  Task: Storage of the inputs, outputs and internally declared variables is static; concurrent calls share the storage. Exception: if the task is declared automatic, then the storage is dynamic and each call gets its own copy.
  Function: Storage of the inputs and internally declared variables is static. If the function is declared automatic, then the storage is dynamic and recursive calls get their own copies.
Category: Returned values
  Task: A task does not return values. If inout or output ports are changed, then this is copied back at the end of the task execution.
  Function: A function returns a single value to the expression that called it. The value to be returned is assigned to the function identifier within the function.
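The storage row can be illustrated with a recursive function. This is a minimal sketch with a hypothetical module name, assuming a Verilog-2001 simulator; it is not taken from the lecture.

```verilog
// Hypothetical demo (not from the lecture): an automatic, recursive function.
module automatic_function_demo;
  // 'automatic' gives every call its own copy of n and of the return
  // variable, which is what makes the recursion below well defined.
  function automatic [31:0] factorial(input [31:0] n);
    begin
      if (n <= 1)
        factorial = 1;
      else
        factorial = n * factorial(n - 1);
    end
  endfunction

  // A function call is an operand in an expression.
  initial $display("5! = %0d", factorial(5));
endmodule
```

Without the automatic keyword all calls would share one static copy of n, so the nested calls would overwrite each other's argument.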

Scope and Hierarchy
• Building up a system hierarchically is essential for mastering complexity. We need to understand how to address variables and where data are known across hierarchies.
[Figure: hierarchy tree rooted at top (reg r, w) with the named block (begin ... end) y, task t, Block1 (instantiated as b, reg s) and Block2 (instantiated as d, reg s)]

Scope and Hierarchy
• Look at the following code:

module top;
  reg r; // hier. name is top.r
  wire w; // hier. name is top.w

  b instance1();

  always
  begin: y
    reg q; // hier. name is top.y.q
  end

  task t;
    begin: c // hier. name is top.t.c
      reg q; // hier. name is top.t.c.q
      disable y; // OK
    end
  endtask
endmodule

module b;
  reg s; // hierarchical name is top.instance1.s
  always
  begin
    t; // OK
    disable y; // OK
    disable c; // Nope, c is not known
    disable t.c; // OK
    s = 1; // OK
    r = 1; // Nope, r is not known
    top.r = 1; // OK
    t.c.q = 1; // OK
    y.q = 1; // OK, a different q than t.c.q
  end
endmodule

Scope and Hierarchy
• Registers and wires are not found by searching upward through the hierarchy; without a hierarchical name they can only be accessed in the local scope!
• Since module b is instantiated in module top, a procedural statement in b can enable tasks, functions and named blocks defined in the local scope of module top.
• Consequence:
• Functions and tasks used in many parts of your design should be defined in the top module.
• Be aware: in principle anything can access anything else through hierarchical naming, but this is not good design style.

Finite State Machines
• Combinational logic and sequential elements can be combined to specify a Finite State Machine. In general, Finite State Machines (FSMs) can be divided into 2 classes:
• Moore Machines
• The outputs depend only on the current state
• The next state depends on current state and inputs
• Mealy Machines
• The outputs depend on current state and inputs
• The next state depends on current state and inputs

FSM: Moore Machine
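[Figure: block diagram of a Moore machine. The next state logic computes the next state from the inputs and the current state, the state register stores the state, and the output logic derives the outputs from the state only]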
• Characteristics of a Moore Machine:
• Outputs depend only on the current state
• Next state depends on current state and inputs

FSM: Mealy Machine
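[Figure: block diagram of a Mealy machine. One logic block computes both the next state and the outputs from the inputs and the current state held in the state register]
• Characteristics of a Mealy Machine:
• Outputs and next state both depend on current state and inputs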
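To make the two classes concrete, here is a minimal sketch of the same task, flagging a 0 to 1 change on an input a, written once as a Moore and once as a Mealy machine. The module names, the state names and the active-high asynchronous reset are assumptions chosen for illustration; they do not come from the lecture.

```verilog
// Moore version: the output is a function of the state register only.
module moore_rise (input a, clock, reset, output x);
  localparam [1:0] LOW = 2'b00, RISE = 2'b01, HIGH = 2'b10;
  reg [1:0] state, nextState;

  always @(*)                      // next state logic
    case (state)
      LOW:     nextState = a ? RISE : LOW;
      RISE:    nextState = a ? HIGH : LOW;
      HIGH:    nextState = a ? HIGH : LOW;
      default: nextState = LOW;
    endcase

  always @(posedge clock or posedge reset)   // state register
    if (reset) state <= LOW;
    else       state <= nextState;

  assign x = (state == RISE);      // output logic sees the state only
endmodule

// Mealy version: the output is a function of state and input.
module mealy_rise (input a, clock, reset, output x);
  reg wasHigh;                     // single state bit: a in the previous cycle

  always @(posedge clock or posedge reset)
    if (reset) wasHigh <= 1'b0;
    else       wasHigh <= a;

  assign x = a & ~wasHigh;         // output logic sees state and input
endmodule
```

The Mealy version needs fewer states and reacts in the same cycle in which a rises, whereas the Moore output appears one clock cycle later but is free of input glitches.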

Table Notation
• FSMs can be represented as a State Transition Table
• The table exactly defines the values for the next state and all outputs (right side of the table) depending on the current state and the inputs (left side)
• Logic functions can be easily derived from the table, e.g.
$S_0' = \bar{S}_2 \bar{S}_1 \bar{S}_0 \bar{a} b \lor \bar{S}_2 \bar{S}_1 \bar{S}_0 a \lor \dots$
• Current state and next state are encoded binary (in the example: 3 bits)
• "Don't cares" in the input conditions are indicated by an 'x'
• In each state, every possible combination of input values should be covered by exactly one line in the table (not more, not less)

current state   inputs   next state   outputs
S2S1S0          a b      S2'S1'S0'    x y
000             0 0      000          0 0
000             0 1      001          0 0
000             1 x      101          0 0
001             1 0      010          0 1
...             ...      ...          ...

Graph Notation
• FSMs can also be represented as a graph

• Every state is a node in the graph
• Every state transition is an edge (arrow)
• The arrows indicate which state is taken in the next cycle, depending on the inputs and
the current state
• State encoding is displayed inside the nodes
[Figure: two nodes with the binary state encodings 001 (initial state) and 010 (some other state), connected by a state transition arrow that is labeled with its input condition (a boolean expression)]

Example for a Moore Machine
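[Figure: state graph with the nodes 00 (x = 0), 01 (x = 0) and 11 (x = 1) and edges labeled a=0, a=1 and "always". Notation: each node shows the current state S1S0 and the assigned output value x]

current state   inputs   next state   outputs
S1S0            a        S1'S0'       x
00              0        00           0
00              1        01           0
01              0        00           0
01              1        11           1
11              x        00           0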

Example for a Mealy Machine
[Figure: state graph with the nodes 00, 01 and 11; the edges are labeled a=0 / x=0, a=1 / x=1 and always / x=1]

current state   inputs   next state   outputs
S1S0            a        S1'S0'       x
00              0        00           0
00              1        01           1
01              0        00           0
01              1        11           1
11              x        00           0
• Because the outputs of a Mealy Machine also depend on the inputs, the values assigned
to them are annotated at the transitions
• The notation is: input condition / output assignment

State Encoding
• The encoding of the states plays a key role for the implementation of an FSM
• It influences the complexity of the logic functions, the hardware costs of the circuits, timing issues, power, etc.
• Therefore, several common coding styles with different features exist
• regular encoding
• "one hot" encoding
• ...
• The optimum choice depends on the technology used (ASIC, PLA, FPGA, etc.) as well as on the given design goals

State Encoding
• Regular Encoding
• The minimum number of bits is used to encode the states
• At least N bits are required to encode up to $2^N$ states
• Codes can be assigned to states arbitrarily or according to certain rules (e.g., in order to minimize complexity of the logic)
• Advantages:
• Minimum number of flipflops required
• Disadvantages:
• Due to the compactness of the state encoding, the logic functions for calculating the next state and the outputs can become more complex
• On average, many bits switch when the state changes
→ Higher power consumption
→ Glitches can occur

State Encoding
• One Hot Encoding
• N bits are used to encode N states
• In each state, exactly one bit is '1', all others are '0'
→ therefore the name "one hot" encoding
• Advantages:
• In many cases, less logic is required
• many small logic functions are used instead of few complex functions
• particularly advantageous for FPGA implementations
• Low switching activity, resulting in ...
→ lower power consumption
→ fewer glitches
• Disadvantages:
• The number of required flipflops grows linearly with the number of states
→ High hardware costs for large FSMs

State Encoding
• One Hot Encoding – Implementation Aspects

• Best suited for distributed implementation
• One flipflop for each state
• One small transition logic for each flipflop
• Each flipflop can be used to directly activate some other hardware block or logic function that is only needed in this state
• From an abstract point of view, all N flipflops together can also be seen as one single state register of size N
[Figure: four Logic/FF pairs that together hold the current state; one flipflop output drives the enable input of some specific functional block]
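A minimal sketch of a one hot encoded FSM shows the "one flipflop plus one small transition logic per state" structure. The three states, the go input and the busy output are hypothetical and were chosen only for illustration.

```verilog
// Hypothetical three-state machine (IDLE -> RUN -> DONE -> IDLE), one hot encoded.
module onehot_fsm (input go, clock, reset, output busy);
  localparam [2:0] IDLE = 3'b001;  // state[0] hot; RUN is state[1], DONE is state[2]
  reg [2:0] state;

  always @(posedge clock or posedge reset)
    if (reset)
      state <= IDLE;
    else begin
      // one small next-state equation per flipflop
      state[0] <= (state[0] & ~go) | state[2];  // stay in IDLE, or return from DONE
      state[1] <= state[0] & go;                // IDLE -> RUN
      state[2] <= state[1];                     // RUN -> DONE
    end

  // a state bit can directly enable logic that is only needed in that state
  assign busy = state[1];
endmodule
```

Each next-state equation depends on at most two state bits and one input, which is the kind of small function that maps well onto FPGA lookup tables; the price is three flipflops where a regular encoding would need two.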

Verilog Coding of a FSM
• There is more than one way to model an FSM. We will show one example of explicit style modeling of an FSM using case statements.
• The following example uses 6 states, one input and three output bits.
[Figure: state graph with the states A to F, entered at A on Reset; the edges are labeled input/output, e.g. 1/100, 0/000, 0/010 and ?/101]

module fsm
  (input i, clock, reset,
   output reg [2:0] out);

  reg [2:0] currentState, nextState;

  // The state labels and their assignments
  localparam [2:0]
    A = 3'b000,
    B = 3'b001,
    C = 3'b010,
    D = 3'b011,
    E = 3'b100,
    F = 3'b101;

  always @(*) // The combinational logic
    case (currentState)
      A: begin
        nextState = (i == 0) ? A : B;
        out = (i == 0) ? 3'b000 : 3'b100;
      end
      B: begin
        nextState = (i == 0) ? A : C;
        out = (i == 0) ? 3'b000 : 3'b100;
      end
      C: begin
        nextState = (i == 0) ? A : D;
        out = (i == 0) ? 3'b000 : 3'b101;
      end
      D: begin
        nextState = (i == 0) ? D : E;
        out = (i == 0) ? 3'b010 : 3'b110;
      end
      E: begin
        nextState = (i == 0) ? D : F;
        out = (i == 0) ? 3'b010 : 3'b110;
      end
      F: begin
        nextState = D;
        out = 3'b101;
      end
      default: begin // oops, undefined states. Go to state A
        nextState = A;
        out = (i == 0) ? 3'bxxx : 3'bxxx;
      end
    endcase

  always @(posedge clock or negedge reset) // The state register
    if (~reset)
      currentState <= A; // the reset state
    else
      currentState <= nextState;
endmodule

Concurrent Processes
• Processes should interact with each other. You could formulate every system as one process, but readability and modularity would be very poor.
• We need a mechanism for interaction between processes: events and wait statements
module dEdgeFF (output reg q, input clock, data);
always @(negedge clock) q <= data;
endmodule

Event Control Statements
• General form of event control statements (see also the BNF for Verilog):
<event_control>
::= @ <identifier>
||= @ ( <event_expression> )
<event_expression>
::= <expression>
||= posedge <scalar_event_expression>
||= negedge <scalar_event_expression>
||= <event_expression> or <event_expression>

Event control statements
• Example of two flip-flops
module toplevel(input clock, reset, output reg flop1, flop2);
always @ (posedge reset or posedge clock)

if (reset)
begin
flop1 <= 0;
flop2 <= 1;
end
else
begin
flop1 <= flop2;
flop2 <= flop1;
end
endmodule

Fibonacci Numbers
• A more abstract form of event control is the named event statement. It allows a trigger to be sent to another part of the design. We will use a Fibonacci number generator as one example.
• Quick recap: the Fibonacci sequence is a series of numbers which follow the recurrence relation
$F_n = F_{n-1} + F_{n-2}$
with $F_0 = 0$ and $F_1 = 1$.
• The first numbers of the Fibonacci sequence are: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...

Fibonacci Number Generators: Named Events
module topFib;
wire [15:0] number, numberOut;
numberGen ng (number);
fibNumCalc fnc (number, numberOut);
endmodule
module numberGen(number);
output reg [15:0] number = 0;
event ready; //declare the event

always
begin
#50 number = number + 1;
#50 -> ready; //generate event signal
end
endmodule

Fibonacci Number Generators: Named Events
module fibNumCalc(startingValue, fibNum);

input [15:0] startingValue;
output [15:0] fibNum;
reg [15:0] count, fibNum, oldNum, temp;
always
begin
@ng.ready //wait for event signal
count = startingValue;
oldNum = 1;
for (fibNum = 0; count != 0; count = count - 1)
begin
temp = fibNum;
fibNum = fibNum + oldNum;
oldNum = temp;
end
$display ("%d, fibNum=%d", $time, fibNum);
end
endmodule

VHDL-Fibonacci Number Generators
-- Fib.vhd
--
-- Fibonacci number sequence generator
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
entity Fibonacci is port (

Reset : in std_logic;
Clock : in std_logic;
Number : out unsigned(31 downto 0) );
end entity Fibonacci;

VHDL-Fibonacci Number Generators
architecture Rcingham of Fibonacci is
  signal Previous : natural;
  signal Current : natural;
  signal Next_Fib : natural;
begin
  Adder:
  Next_Fib <= Current + Previous;

  Registers:
  process (Clock, Reset) is
  begin
    if Reset = '1' then
      Previous <= 1;
      Current <= 1;
    elsif rising_edge(Clock) then
      Previous <= Current;
      Current <= Next_Fib;
    end if;
  end process Registers;

  Number <= to_unsigned(Previous, 32);
end architecture Rcingham;

Wait a Moment ....
• The wait statement is level sensitive. Execution of a process stops while the condition evaluates to FALSE and continues as soon as it evaluates to TRUE:
module consumer (input [7:0] dataIn, input ready);

reg [7:0] in;
always
begin
wait(ready);
in = dataIn;
// do something smart here ...
end
endmodule

Consumer/Producer Handshake
• The previous example may lead to problems, since the consumer never signals to the producer that it has successfully accepted the data. A consumer/producer handshake mechanism overcomes this deficiency:
[Figure: timing diagram of prodRdy and consRdy, annotated with the following dialogue]
• Producer: I am ready to send you data. Since consRdy is TRUE, I put the data on the bus.
• Producer: Data are on the bus, signal prodRdy = TRUE.
• Consumer: Thanks, I successfully received the data ...
• Producer: You're welcome!
• Consumer: And now I am ready to receive more ...

Consumer/ Producer Handshake
• Here we go with the code:
module ProducerConsumer;
  reg consReady, prodReady;
  reg [7:0] dataInCopy, dataOut;

  always // The consumer process
  begin
    consReady = 1; /* indicate consumer ready */
    forever
    begin
      wait (prodReady)
        dataInCopy = dataOut;
      consReady = 0; /* indicate value consumed */
      //... Do some smart stuff here ...
      wait (!prodReady) // complete handshake
        consReady = 1;
    end
  end

  always // The producer process
  begin
    prodReady = 0; /* indicate nothing to transfer */
    forever
    begin
      /* ... produce data and put into "dataOut" */
      wait (consReady) /* wait for consumer ready */
        dataOut = $random;
      prodReady = 1; /* indicate ready to transfer */
      wait (!consReady) /* finish handshake */
        prodReady = 0;
    end
  end
endmodule

A Concurrent Process Example
(Memory Controller)
• Let's continue with a more complex example, a synchronous bus protocol:
[Figure: timing diagrams of the signals clock, rwLine, addrLines and dataLines]
• A read cycle takes two clock periods to complete:
• First period: the bus master drives rwLine=0 and puts the address on addrLines
• Second period: the bus slave has read addrLines and puts data on dataLines; the bus master loads the data from dataLines into internal registers
• A write cycle takes one clock period to complete:
• The bus master puts data on dataLines and an address on addrLines
• The bus slave writes the data on dataLines to the specified address

Memory Controller
`define READ 0
`define WRITE 1

module sbus;
  parameter tClock = 20;
  reg clock;
  reg [15:0] m[0:31]; // 32 16-bit words
  reg [15:0] data;
  // registers named xLine model the bus lines using global registers
  reg rwLine; // write = 1, read = 0
  reg [4:0] addressLines;
  reg [15:0] dataLines;

  initial
  begin
    $readmemh ("memory.data", m);
    clock = 0;
    $monitor ("rw=%d, data=%d, addr=%d at time %d",
              rwLine, dataLines, addressLines, $time);
  end

Memory Controller
  always
    #tClock clock = !clock;

  initial // bus master end
  begin
    #1
    wiggleBusLines (`READ, 2, data);
    wiggleBusLines (`READ, 3, data);
    data = 5;
    wiggleBusLines (`WRITE, 2, data);
    data = 7;
    wiggleBusLines (`WRITE, 3, data);
    wiggleBusLines (`READ, 2, data);
    wiggleBusLines (`READ, 3, data);
    $finish;
  end

  task wiggleBusLines
    (input readWrite,
     input [5:0] addr,
     inout [15:0] data);
    begin
      rwLine <= readWrite;
      if (readWrite) begin // write value
        addressLines <= addr;
        dataLines <= data;
      end
      else begin // read value
        addressLines <= addr;
        @ (negedge clock);
      end
      @(negedge clock);
      if (~readWrite)
        data <= dataLines; // value returned during read cycle
    end
  endtask

Memory Controller
always // bus slave end
begin
@(negedge clock);
if (~rwLine) begin //read
dataLines <= m[addressLines];
@(negedge clock);
end
else //write
m[addressLines] <= dataLines;
end
endmodule

Another Deep Look into our Mux2
• We have already seen examples of hardware description using primitive logic gates:
module mux2 (in1, in2, sel, out);
  output out;
  input in1, in2, sel;

  and a1(a1_o, in1, sel);
  not n1(n1_o, sel);
  and a2(a2_o, in2, n1_o);
  or o1(out, a1_o, a2_o);
endmodule
[Figure: schematic of mux2 with the gates a1, n1, a2, o1 and the internal nets a1_o, n1_o, a2_o]
• The question is now: which gate and switch level primitives are built into Verilog?

Gate and Switch Level Primitives
N-input gates   N-output gates   Tristate gates   Pull gates   MOS switches   Bi-directional switches
and             buf              bufif0           pullup       nmos           tran
nand            not              bufif1           pulldown     pmos           tranif0
nor                              notif0                        cmos           tranif1
or                               notif1                        rnmos          rtran
xor                                                            rpmos          rtranif0
xnor                                                           rcmos          rtranif1

User Defined Primitives
• You may define your own primitives, such as:

primitive carry
(output carryOut,
input carryIn, aIn, bIn);
table
0 00 : 0;
0 01 : 0;
0 10 : 0;
0 11 : 1;
1 00 : 0;
1 01 : 1;
1 10 : 1;
1 11 : 1;
endtable
endprimitive

• Also possible in a shorter way ...

primitive smart_carry
(output carryOut,
input carryIn, aIn, bIn);
table
0 0? : 0;
0 ?0 : 0;
? 00 : 0;
? 11 : 1;
1 ?1 : 1;
1 1? : 1;
endtable
endprimitive

• UDPs are not only limited to logical primitives:
primitive latch
(output reg q,
input clock, data);
table
// clock data state output
0 1 : ? : 1;
0 0 : ? : 0;
1 ? : ? : -;
endtable
endprimitive

• Edge sensitive UDPs are also possible:
primitive dEdgeFF
(output reg q,
input clock, data);
table
// clock data state output
(01) 0 : ? : 0;
(01) 1 : ? : 1;
(0x) 1 : 1 : 1;
(0x) 0 : 0 : 0;
(?0) ? : ? : -;
? (??) : ? : -;
endtable
endprimitive
User defined primitives (UDP) have strict rules:
• Exactly one output port is allowed, but multiple inputs
• The output port must be the first port to be listed
• All primitive ports are scalar; sorry, no vectors please
• Only the logic values 0, 1 and x can be handled on inputs and output. The z value cannot be specified, but if it is put on an input, it is treated as x

Shorthand Notations for UDPs
Symbol   Interpretation       Comments
0        Logic 0
1        Logic 1
x        unknown
?        Can be 0, 1, and x   Cannot be used in the output field
b        Can be 0 and 1       Cannot be used in the output field
-        No change            May only be given in the output field of a sequential primitive

Shorthand Notations for UDPs
Symbol   Interpretation                        Comments
(vw)     Change from a value v to w            v and w can be one of 0, 1, x or b
*        Same as (??)                          Any value change on input
r        Same as (01)
f        Same as (10)
p        Iteration of (01), (0x) and (x1)      Positive edge including x
n        Iteration of (10), (1x) and (x0)      Negative edge including x
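As an illustration of the shorthand symbols, the following sketch writes a reduced edge-triggered flip-flop UDP with r, f and *. The primitive name, the testbench and the reduced set of table rows are assumptions made for this example; in particular, transitions through x are deliberately left out, so they produce x on the output.

```verilog
// Hypothetical reduced flip-flop UDP using the shorthand symbols.
primitive dEdgeFF_short (output reg q, input clock, data);
  table
  // clock data : state : output
     r     0    :   ?   :   0;   // r = (01): rising edge loads a 0
     r     1    :   ?   :   1;   // r = (01): rising edge loads a 1
     f     ?    :   ?   :   -;   // f = (10): falling edge, no change
     ?     *    :   ?   :   -;   // * = (??): data changes alone, no change
  endtable
endprimitive

// Small testbench sketch: applies one rising clock edge with data = 1.
module dEdgeFF_short_tb;
  reg clock, data;
  wire q;

  dEdgeFF_short ff (q, clock, data);

  initial begin
    clock = 0; data = 1;
    #10 clock = 1;               // rising edge, matches the second table row
    #10 $display("q = %b", q);
    $finish;
  end
endmodule
```

Every input transition that is not covered by a table row drives the output of a sequential UDP to x, which is why the original dEdgeFF lists the (0x) and (?0) cases explicitly.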

Modeling Interconnect
• Wires (or nets or interconnect) are simply used to connect ports or signals. Usually they do not store values, but transmit values that are driven on them by structural elements such as gate outputs, assign statements and registers in a behavioral model.
• In Verilog, you can declare wires using the keyword wire, and also assign a delay to this interconnect:
wire #3 x1; // a wire named x1 with a delay of 3
wire #(3,5) x2; // a wire named x2 with a rising delay of 3, and a falling delay of 5

Oh No, Again Mux2 ...
• Have a look again at mux2: why didn't we have to declare a1_o, a2_o, n1_o?
module mux2 (in1, in2, sel, out);
  output out;
  input in1, in2, sel;

  and a1(a1_o, in1, sel);
  not n1(n1_o, sel);
  and a2(a2_o, in2, n1_o);
  or o1(out, a1_o, a2_o);
endmodule
[Figure: schematic of mux2 with the gates a1, n1, a2, o1 and the internal nets a1_o, n1_o, a2_o]
• Wires can be declared implicitly. If an identifier appears in the port list of an instance of a gate primitive, in a module instantiation, or on the LHS of a continuous assignment, it will be implicitly declared as a net.

More on Nets ...
• By default, an implicitly declared net will be of type "wire", and will have the default width of the connected module (or gate primitive port).
• You may change or even turn off this default by setting the compiler directive
`default_nettype none // now an undeclared net will be flagged as an error
• Note that this is a fundamental difference between Verilog and VHDL: in VHDL you must declare everything that you use, while Verilog follows a "get the work done" philosophy. Be aware that typing errors in port names (e.g. 0 instead of o) do not automatically lead to an error in Verilog

Net Types
Net Type                    Modeling Usage
wire and tri                Used to model connections with no logic function. Only difference is the name. Use the appropriate name for readability
wand, wor, triand, trior    Used to model the wired logic function. Only difference between the wire and tri version of the same logic function is the name
tri0, tri1                  Used to model connections with a resistive pull to the given supply
supply0, supply1            Used to model the connection to the power supply
trireg                      Used to model charge storage on a net

Examples for Wired Logic Functions
• A net declared as wand is 0 if any of its drivers is 0. In the example, output c is therefore 1 only if both a and b are 0. Output d is a plain wire: it is unknown if any input is unknown or z, or if a = !b
module wired_and_example
  (input a, b, output wand c, output d);
  not (c,a);
  not (c,b);
  not (d,a);
  not (d,b);
endmodule
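The resolution of multiple drivers can be observed with a short simulation. This is a sketch with a hypothetical module name that uses continuous assignments as drivers instead of the not gates of the slide.

```verilog
// Hypothetical demo: two drivers on a wand net and on a plain wire.
module wand_demo;
  reg a, b;
  wand c;            // wired-AND resolution of all drivers
  wire d;            // plain wire: conflicting drivers resolve to x

  assign c = a;
  assign c = b;
  assign d = a;
  assign d = b;

  initial begin
    a = 1; b = 0;    // drivers disagree
    #1 $display("a=%b b=%b -> wand c=%b, wire d=%b", a, b, c, d);
    a = 1; b = 1;    // drivers agree
    #1 $display("a=%b b=%b -> wand c=%b, wire d=%b", a, b, c, d);
  end
endmodule
```

With disagreeing drivers the wand net resolves to 0 while the plain wire becomes x; when both drivers are 1, both nets are 1.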

Switch Level Modeling
• Verilog even enables you to model hardware at the transistor level, but of course still only for digital signals. Have a look at the following MOS shift register:
[Figure: three-stage MOS shift register between VDD and GND with input in, output out, the storage nodes wa1, wa2, wa3, the pulled-up nodes wb1, wb2 and the clocks phase1, phase2]

module shreg
/* IO port declarations, where 'out' is the inverse
of 'in' controlled by the dual-phased clock */
(output tri out, //shift register output
input in, //shift register input
phase1, //clocks
phase2);
tri wb1, wb2; //tri nets pulled up to VDD

pullup (wb1), (wb2), (out);//depletion mode pullup devices
trireg (medium) wa1, wa2, wa3; //charge storage nodes
supply0 gnd; //ground supply
nmos #3 //pass devices and their interconnections

a1(wa1, in, phase1), b1(wb1, gnd, wa1),
a2(wa2, wb1, phase2), b2(wb2, gnd, wa2),
a3(wa3, wb2, phase1), gout(out, gnd, wa3);
endmodule

module waveShReg;
wire shiftout; //net to receive circuit output value
reg shiftin; //register to drive value into circuit
reg phase1, phase2; //clock driving values
parameter d = 100; //define the waveform time step
shreg cct (shiftout, shiftin, phase1, phase2);
initial
begin :main
shiftin = 0; //initialize waveform input stimulus
phase1 = 0;
phase2 = 0;
setmon; // setup the monitoring information

repeat(2) //shift data in
clockcct;
end
task setmon; //display header and setup monitoring

begin
$display(" time clks in out wa1-3 wb1-2");
$monitor ($time,,,,phase1, phase2,,,,,,shiftin,,,, shiftout,,,,,
cct.wa1, cct.wa2, cct.wa3,,,,,cct.wb1, cct.wb2);
end
endtask
task clockcct; //produce dual-phased clock pulse

begin
#d phase1 = 1; //time step defined by parameter d
#d phase1 = 0;
#d phase2 = 1;
#d phase2 = 0;
end
endtask
endmodule

Result of Simulation
time clks in out wa1-3 wb1-2
0 00 0 x xxx xx
100 10 0 x xxx xx
103 10 0 x 0xx xx
106 10 0 x 0xx 1x
200 00 0 x 0xx 1x
300 01 0 x 0xx 1x
303 01 0 x 01x 1x
306 01 0 x 01x 10
400 00 0 x 01x 10
500 10 0 x 01x 10
503 10 0 x 010 10
506 10 0 1 010 10
600 00 0 1 010 10
700 01 0 1 010 10

Strength Definitions
• Sometimes hardware is designed so that stronger drivers override signals. Verilog enables you to model this. Have a look at the following SRAM example:
[Figure: 4-T SRAM cell with the gates g1 to g5, the nets w1, w3, w4, the inputs dataIn, write and address (wordline), the bitline and the output dataOut]

SRAM Cell: Here's the Code
module sram
(output dataOut,
input address, dataIn, write);
tri w1, w3, w4;
bufif1 g1(w1, dataIn, write);

tranif1 g2(w4, w1, address);
not (pull0, pull1) g3(w3, w4), g4(w4, w3);
buf g5(dataOut, w1);
endmodule

module wave_sram //waveform for testing the static RAM cell
#(parameter d = 100);
wire dataOut;
reg address, dataIn, write;
sram cell (dataOut, address, dataIn, write);
initial begin
#d dis;
#d address = 1; #d dis;
#d dataIn = 1; #d dis;
#d write = 1; #d dis;
#d write = 'bx; #d dis;
#d address = 'bx; #d dis;
#d address = 1; #d dis;
end

task dis; //display the circuit state
$display($time,, "addr=%v d_In=%v write=%v d_out=%v",
address, dataIn, write, dataOut,
" (134)=%b%b%b", cell.w1, cell.w3, cell.w4,
" w134=%v %v %v", cell.w1, cell.w3, cell.w4);
endtask
endmodule

Result of Simulation
time addr d_in wr d_out 134 Comment

100 x x x x xxx
300 1 x x x xxx
500 1 1 x x xxx
700 1 1 1 1 101 Write function
900 1 1 0 1 101 Read function
1100 1 1 x 1 101
1300 x 1 x x x01 SRAM holds value
1500 1 1 x 1 101
1700 1 1 0 1 101 Read function

Strength Definition
Strength name      Strength level   Element modeled                           Declaration abbreviation   Printed abbreviation
Supply Drive       7                Power supply                              supply                     Su
Strong Drive       6                Default gate and assign output strength   strong                     St
Pull Drive         5                Gate and assign output strength           pull                       Pu
Large Capacitor    4                Size of trireg net capacitor              large                      La
Weak Drive         3                Gate and assign output strength           weak                       We
Medium Capacitor   2                Size of trireg net capacitor              medium                     Me
Small Capacitor    1                Size of trireg net capacitor              small                      Sm
High Impedance     0                Not applicable                            highz                      Hi

Logic Delay Modeling
• The realistic delay of a circuit can only be modeled if each component can be assigned a realistic logic delay. Verilog offers a wide variety of options to model hardware up to a so-called "sign-off" point.
module triStateLatch
  (output qOut, nQOut,
   input clock, data, enable);
  tri qOut, nQOut;

  not #5 (ndata, data);
  nand #(3,5) d(wa, data, clock),
              nd(wb, ndata, clock);
  nand #(12, 15) qQ(q, nq, wa),
                 nQ(nq, q, wb);
  bufif1 #(3, 7, 13) qDrive (qOut, q, enable),
                     nQDrive(nQOut, nq, enable);
endmodule
[Figure: gate-level schematic of the latch with the gates d, nd, qQ, nQ, qDrive and nQDrive, the inputs clock, data and enable, and the outputs qOut and nQOut]

Specifying Time Units
• So far we have used delays without defining their unit. The compiler directive `timescale specifies the time unit and the time precision:
`timescale 10ns / 1ns
#7 // The delay is now 7*time_unit = 70ns
// Time is maintained to the scale of 1ns
#7.748 // The delay is 77ns
`timescale 10ns / 10ns
#7.5 // The delay is rounded to 80ns
• You may use s for seconds, ms for milliseconds, us for microseconds, ns for nanoseconds, ps for picoseconds and fs for femtoseconds
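The rounding can be checked in a simulator with a short sketch. The module name is hypothetical; the example relies on $realtime returning the simulation time as a real number scaled to the time unit of the module.

```verilog
// Hypothetical demo: delays are scaled by the unit and rounded to the precision.
`timescale 10ns / 1ns
module timescale_demo;
  // $realtime is expressed in time units (10 ns), so multiply by 10 to get ns.
  initial #7     $display("#7     ends at %0.0f ns", $realtime * 10.0);
  initial #7.748 $display("#7.748 ends at %0.0f ns", $realtime * 10.0);
endmodule
```

Under these assumptions the two messages should report the 70 ns and 77 ns stated on the slide: 7.748 time units are 77.48 ns, which is rounded to the 1 ns precision.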

Arrays of Instances
• The previous way of instantiating eight xors was quite cumbersome.... A nicer way is:
module xor8 (output [1:8] xout, input [1:8] xin1, xin2);
xor a[1:8](xout, xin1, xin2);
endmodule

Arrays of Instances
• Defining register banks is now quite easy, even using scalar inputs for all dff:
module register_bank
(output [7:0] q,
input [7:0] d,
input clock, clear);
dff r[7:0](q, d, clock, clear);
endmodule

Generate Blocks
• Arrays of instances are limited to quite simple repetitive structures. The generate construct is much more powerful:
module xorGen
#(parameter width = 4,
delay = 10)
(output [1:width] xout,
input [1:width] xin1, xin2);
generate
genvar i;
for(i=1; i<=width; i=i+1)begin :xi
assign #delay xout[i] = xin1[i] ^ xin2[i];
end
endgenerate
endmodule

A Complex Example: N-Bit Adder
• Consider modeling an n-bit adder (with n>1) that also has condition code outputs to indicate if the result was negative, produced a carry, or produced a 2's complement overflow.
• In this case, not all generated instances of the adder are connected in the same way
• You may then use if-then-else and case statements in the for-loop to generate these differences.

Backtrack: Theory of Adders
Basic Adder Cells
• Half Adder:
• Can be used to calculate the sum of two bits A1 and A2.
$C = A_1 A_2$
$S = A_1 \oplus A_2$
• Full Adder:
• For adding binary numbers having a bit width of more than one single bit.
$C_{out} = C_{in} (A_1 \oplus A_2) \lor A_1 A_2$
$S_{out} = A_1 \oplus A_2 \oplus C_{in}$
• These equations can be realized either by logic gates (AND, OR, XOR) or by two half-adders and an OR gate.
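The second realization, two half adders and an OR gate, can be sketched structurally as follows. The module, instance and port names are chosen for illustration and are not taken from the lecture.

```verilog
// Half adder: C = A1 & A2, S = A1 ^ A2
module halfAdder (output c, s, input a1, a2);
  and (c, a1, a2);
  xor (s, a1, a2);
endmodule

// Full adder built from two half adders and an OR gate
module fullAdder (output cOut, sOut, input a1, a2, cIn);
  wire c1, s1, c2;
  halfAdder ha1 (c1, s1, a1, a2);      // s1 = a1 ^ a2, c1 = a1 & a2
  halfAdder ha2 (c2, sOut, s1, cIn);   // sOut = s1 ^ cIn, c2 = s1 & cIn
  or (cOut, c1, c2);                   // cOut = a1 & a2 | cIn & (a1 ^ a2)
endmodule

// Exhaustive check of all eight input combinations
module fullAdder_tb;
  reg [2:0] v;
  wire cOut, sOut;
  integer i;

  fullAdder fa (cOut, sOut, v[2], v[1], v[0]);

  initial begin
    for (i = 0; i < 8; i = i + 1) begin
      v = i;
      #1 if ({cOut, sOut} !== v[2] + v[1] + v[0])
        $display("mismatch for inputs %b", v);
    end
    $display("check finished");
  end
endmodule
```

The testbench compares the two-bit result {cOut, sOut} with the arithmetic sum of the three input bits and prints a message only when they differ.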

Adders / Subtracters for Binary Coded Integers
Parallel Adders
• Ripple Carry Adder:
• Chained full-adders where the carry „ripples“ through the whole chain from the LSB to the MSB.
• The addition time depends on the wordlength of the operands.

module adderWithConditionCodes
#(parameter width = 1)
(output reg [width-1:0] sum,
output reg cOut, neg, overFlow,
input [width-1:0] a, b,
input cIn);
reg [width -1:0] c;

generate
genvar i;
for (i = 0; i <= width-1; i=i+1) begin: stage
case(i)
0: begin
always @(*) begin
sum[i] = a[i] ^ b[i] ^ cIn;
c[i] = a[i] & b[i] | b[i]& cIn | a[i] & cIn;
end
end

width-1: begin
always @(*) begin
sum[i] = a[i] ^ b[i] ^ c[i-1];
cOut = a[i]&b[i] | b[i]&c[i-1] | a[i] & c[i-1];
neg = sum[i];
overFlow = cOut ^ c[i-1];
end
end
default: begin
always @(*) begin
sum[i] = a[i] ^ b[i] ^ c[i-1];
c[i] = a[i]&b[i] | b[i] & c[i-1] | a[i] & c[i-1];
end
end
endcase
end
endgenerate
endmodule

Verilog Timing Model
• Welcome to the black-belt section of this Verilog lecture!

• Let's have a look at how the Verilog timing model really works.
• Using Gate level primitives, the complete model is always sensitive to input changes. In other words:
inputs are always evaluated and determine the output to change.
• Any input change at any time will cause the gate instance to execute the evaluation of its output.
module nandLatch (output q, qBar,

input set, reset );
nand #2 (q, qBar, set),

(qBar, q, reset);
endmodule

• Let's assume that an event scheduled in our gate level timing example has not yet been executed (for example, because of the delay of 2 time units). If a new event is now generated for the output of that element, the previously scheduled event is cancelled and the new one is put in the event queue instead.
[Figure: waveforms of set and reset; a pulse shorter than the 2 ns gate delay has no impact on the output]
• If a pulse at a gate's input is shorter than the propagation time, the output of the gate will not change
• Inertial delay is the minimum time a set of inputs must be present for a change in the output to be seen
• Verilog gate models have, by definition, inertial delays just greater than their propagation delay, so watch out!
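The pulse filtering can be observed with a single buffer. This is a minimal sketch with a hypothetical module name and arbitrarily chosen delays, not an example from the lecture.

```verilog
// Hypothetical demo: a gate with delay 5 swallows a pulse of width 2.
module inertial_demo;
  reg in;
  wire out;

  buf #5 (out, in);

  initial begin
    $monitor("%0t: in=%b out=%b", $time, in, out);
    in = 0;
    #20 in = 1;      // 2-unit pulse: shorter than the gate delay
    #2  in = 0;      // the pending output event is cancelled
    #20 in = 1;      // 10-unit pulse: longer than the gate delay
    #10 in = 0;
    #20 $finish;
  end
endmodule
```

The first pulse never shows up on out, because the event scheduled for the rising edge is cancelled when the input falls again before the delay has elapsed. The second pulse appears on out, shifted by 5 time units.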

• Using behavioral models, you have to specify the sensitivity list yourself
• The sensitivities are context dependent: you decide which inputs your model is sensitive to!
• The model below is only sensitive to clock edges which are more than 5 time units apart
module dff (output reg q,

input d, clock );
always @ (posedge clock)

#5 q <= d;
endmodule

• The procedural timing model of Verilog does not cancel events in the event queue.
• If there are multiple events scheduled for the same time, the execution order is indeterminate. Therefore you must avoid writing such bad code:
module dff (output reg q,
  input d1, d2, clock );
  always @ (posedge clock)
  begin
    q <= d1;
    q <= d2;
  end
endmodule

Event Queue
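• Event Queue Example
[Figure: multiplexer circuit with the inputs i1, i2 and sel. sel_inverter (10 ns) produces selbar, in_gate1 (15 ns) produces s1, in_gate2 (12 ns) produces s2, and out_gate (8 ns) produces result. Each event queue entry lists time, signal and value. The input waveforms of i1, i2 and sel are shown from 0 ns to 140 ns]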

Event Queue
• Event Queue Example (cont.)

• Event queue before initialization:
  Time:    0 ns   0 ns   0 ns   10 ns   30 ns   70 ns   100 ns
  Signal:  i1     i2     sel    i1      i2      sel     i1
  Value:   0      1      0      1       0       1       0
• Event queue for t = 0 ns:
  Time:    10 ns  10 ns   12 ns  15 ns  30 ns  70 ns  100 ns
  Signal:  i1     selbar  s2     s1     i2     sel    i1
  Value:   1      1       0      0      0      1      0
• Event queue for t = 10 ns:
  Time:    12 ns  15 ns  22 ns  30 ns  70 ns  100 ns
  Signal:  s2     s1     s2     i2     sel    i1
  Value:   0      0      1      0      1      0
...
[Figure: waveforms of selbar, s1, s2 and result from 0 ns to 140 ns; all four signals start as unknown (U)]

Event Driven Simulator
• The following figure shows the basic elements of an event driven simulator
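[Figure: block diagram of an event-driven simulator]
• The scheduler removes all current-time events from the time-ordered event list
• The scheduler updates the gate outputs
• The scheduler looks at the network connections (the inputs driven by the changed outputs)
• The scheduler evaluates the gate models
• The gate models schedule new events in the time-ordered event list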
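The loop in the figure can be sketched as a small Python program. It is a minimal model under stated assumptions, not the implementation of any Verilog simulator: the two-gate circuit (an inverter and an AND gate), the signal names and the helper `simulate` are hypothetical, and events are never cancelled, so there is no inertial-delay filtering here.

```python
import heapq


def simulate(gates, stimuli, initial):
    """gates maps output name -> (function, input names, delay)."""
    values = dict(initial)
    queue = list(stimuli)                 # time-ordered list of (time, signal, value)
    heapq.heapify(queue)
    fanout = {}
    for out, (_, inputs, _) in gates.items():
        for name in inputs:
            fanout.setdefault(name, []).append(out)
    trace = []
    while queue:
        now = queue[0][0]
        changed = []
        # Remove all events for the current time and update the signals.
        while queue and queue[0][0] == now:
            _, signal, value = heapq.heappop(queue)
            if values.get(signal) != value:
                values[signal] = value
                trace.append((now, signal, value))
                changed.append(signal)
        # Look at the fanout of each changed signal, evaluate those gates
        # and schedule the resulting output events.
        for out in sorted({g for s in changed for g in fanout.get(s, [])}):
            func, inputs, delay = gates[out]
            new_value = func(*(values[i] for i in inputs))
            heapq.heappush(queue, (now + delay, out, new_value))
    return trace


gates = {
    "nsel": (lambda sel: 1 - sel, ["sel"], 10),       # inverter, 10 time units
    "y": (lambda a, nsel: a & nsel, ["a", "nsel"], 15),  # AND gate, 15 time units
}
initial = {"a": 0, "sel": 1, "nsel": 0, "y": 0}
trace = simulate(gates, [(0, "a", 1), (20, "sel", 0)], initial)
for event in trace:
    print(event)
```

The heap plays the role of the time-ordered event list, and the `fanout` dictionary stands in for the network connections.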

A View to a Black Hole: <= and =
• Consider again blocking and non-blocking assignments:
• In isolation, a = b and a <= b perform the same function: they assign the value currently in b to
the register a
• This even holds true for the statements #3 a = b; and #3 a <= b;
• The difference is how the assignment is made, and what consequences it has on other variable
assignments
• Look at a = #4 b; and a <= #4 b;
• The first statement is identical to
bTemp = b;
#4 a = bTemp;
• The second statement calculates the value of b, schedules an update event 4 time units in the
future, and continues executing the process in the current time

A View to a Black Hole
• Another example of the difference (assume initially b=1, a=0):
begin                begin
  a = #2 b;            a <= #2 b;
  c = #2 a;            c <= #2 a;
end                  end
→ In the left example, b is read and stored in a temporary variable, and the update event for a is delayed
by 2 timeunits. When this update event is executed (a = 1), the process continues. The same happens for c,
so c gets the value 1 after 4 timeunits.
→ In the right example, b is evaluated in the first line, an event is scheduled 2 timeunits in the future, and
the process continues. Another event (for c) is also scheduled 2 timeunits in the future, using the current
value of a. Therefore, c is 0 after 2 timeunits.
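The two begin/end blocks can be traced with a small Python model. This is only a sketch of the scheduling order described above, not Verilog semantics in full; the function names and the `(time, name, value)` tuples are assumptions made for illustration.

```python
def blocking(a, b, c, delay=2):
    """Model of:  a = #2 b;  c = #2 a;"""
    t = 0
    updates = []
    tmp = b              # right-hand side is sampled now
    t += delay           # the process is suspended for the delay
    a = tmp
    updates.append((t, "a", a))
    tmp = a              # sampled only after a has been updated
    t += delay
    c = tmp
    updates.append((t, "c", c))
    return updates


def nonblocking(a, b, c, delay=2):
    """Model of:  a <= #2 b;  c <= #2 a;"""
    t = 0
    # Both right-hand sides are sampled at time 0; the process does not wait.
    scheduled = [(t + delay, "a", b), (t + delay, "c", a)]
    return sorted(scheduled)


left = blocking(a=0, b=1, c=0)
right = nonblocking(a=0, b=1, c=0)
print(left)
print(right)
```

In the blocking case c receives the new value of a after 4 time units; in the non-blocking case c receives the old value of a after 2 time units.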

How the Event Scheduler Deals with <= and =
• In fact, the event scheduler deals differently with "regular" and non-blocking events. The following
table tries to shed some light onto this:

a = b;
    b is calculated and used immediately to update a. Note that the next statement in the behavioral
    process that uses a will use this new value. If a is an output of the process, elements on a's fanout
    list are scheduled in the current time as a regular evaluation event.
a <= b;
    b is calculated and a non-blocking update event is scheduled for a during the current time. Execution
    of the process continues. This new value for a will not be seen by other elements until the
    non-blocking update event is executed.
a = #0 b;
    b is calculated and an update event is scheduled as a regular event in the current time. The current
    process will be blocked until the next simulation cycle, when the update of a will occur and the
    process will continue executing.
a <= #0 b;
    This is identical to a <= b;
a = #4 b;
    This is like a = #0 b; except that the update event and the continuation of the process are scheduled
    4 time units in the future.
a <= #4 b;
    This is like a <= #0 b; except that a will not be updated (using a non-blocking update event) until
    4 time units in the future.
#4 a = b;
    Wait 4 time units before doing the action for a = b; The value assigned to a will be the value of b
    4 time units later.
#4 a <= b;
    Wait 4 time units before doing the action for a <= b; The value assigned to a will be the value of b
    4 time units later.

A View to Another Black Hole
• Mixing gate-level instances and behavioral code might make you lose your head ...
module blackhole
  (output reg f,
   input a, b);
  reg q;
  initial
    f = 0;
  always
    @ (posedge a)
      #10 q = b;
  not (qbar, q);
  always
    @ q
      f = qbar;
endmodule

Synthesizable Verilog
• Most Verilog language constructs are synthesizable (i.e. can be implemented as Hardware gates). This may
even be dependent on the Synthesis tool used (e.g. Synopsys DC, or FPGAExpress)
• In general, the following constructs are not supported by a synthesis tool:
• Wait construct
• Repeat, fork, join
• Data types: time, real, realtime
• User defined primitives
• Initial (a one-time sequential active flow)
• Delay operator (#)
• Switch level primitives: *mos where * is n, p, c, rn, rp, rc; pullup, pulldown; *tran+ where * is (null), r and + (null),
if0, if1 with both * and + not (null)
• 7-signal strength (or higher) logic values
• Tri-state net definitions (such as triand, trior, tri0, tri1, trireg; but wand, wor, supply0, supply1 are supported!)
• Some operators (/, %, ===, !==)

Coding for Synthesis
When writing HDL code keep in mind the hardware intent.
• When you start writing HDL, think about the hardware you wish to produce.
• Draw a picture and sketch out a timing diagram so you'll know exactly what the hardware should
look like when the HDL is synthesized.

Latches:
1. If possible, avoid using latches in your design. Latches are more difficult to design correctly and to
verify.
2. If latches are used, partition the logic in a separate module.
3. You can avoid inferred latches by using any of the following coding techniques.
• Assign default values at the beginning of a process
• Assign outputs for all input conditions
• Use else (instead of elseif) for the final priority branch

Latches:
In Verilog, latches are synthesized for a variable when all the following statements are true.
1. Assignment to the variable occurs in at least one but not all of the branches of a Verilog control statement.
2. Assignment to the variable does not occur on a clock's edge.

Latches:
Verilog - latch inferred example
always @ (enable or data)
begin
  if (enable)
  begin
    Q = data;
  end
end

Latches:
Verilog - latch avoidance example
always @ (enable or ina or inb)
begin
  if (enable)
  begin
    data_out = ina;
  end
  else
  begin
    data_out = inb;
  end
end
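Why the missing else branch forces a storage element can be seen by mimicking the two if statements above in plain Python. This is a behavioral analogy only, not what a synthesis tool executes; the function names and the explicit `q_previous` argument are assumptions made for illustration.

```python
def incomplete_if(enable, data, q_previous):
    """Mirrors:  if (enable) Q = data;   (no else branch)"""
    q = q_previous        # nothing assigns q when enable is 0, so the old value must be kept
    if enable:
        q = data
    return q


def complete_if(enable, ina, inb):
    """Mirrors:  if (enable) data_out = ina; else data_out = inb;"""
    return ina if enable else inb


# Same inputs, different outputs: the result depends on stored state.
held_low = incomplete_if(enable=0, data=1, q_previous=0)
held_high = incomplete_if(enable=0, data=1, q_previous=1)
# The complete version is a pure function of its inputs (a multiplexer).
mux_out = complete_if(enable=0, ina=1, inb=0)
print(held_low, held_high, mux_out)
```

The first function cannot be written without the previous value of the output, which is exactly the memory a latch provides; the second needs no history at all.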

Latches:
Verilog - latch inferred example
input [3:0] data_in;
always @ (data_in)
begin
  case (data_in)
    0         : out1 = 1'b1;
    1,3       : out2 = 1'b1;
    2,4,5,6,7 : out3 = 1'b1;
    default   : out4 = 1'b1;
  endcase
end

Latches:
Verilog - latch avoidance example
input [3:0] data_in;
always @ (data_in)
begin
  out1 = 1'b0;
  out2 = 1'b0;
  out3 = 1'b0;
  out4 = 1'b0;
  case (data_in)
    0         : out1 = 1'b1;
    1,3       : out2 = 1'b1;
    2,4,5,6,7 : out3 = 1'b1;
    default   : out4 = 1'b1;
  endcase
end

Flip-Flops:
For Verilog, flip-flops are inferred when the event list of an always block contains an edge, i.e.
posedge clock or negedge clock.
For Verilog, non-blocking assignments should be used to model synchronous circuits.
Verilog - flip flop inferred example
always @ (posedge clock)
begin
  data_out <= data_in;
end

Synchronous Reset:
1. Is easy to synthesize.
2. Requires a free-running clock for reset to occur.
Verilog - Flip Flop with synchronous Reset
always @ (posedge clock)
begin
  if (reset)
    data_out <= 1'b0;
  else
    data_out <= data_in;
end

Asynchronous Reset:
1. Does not require a free-running clock for a reset to occur
2. An asynchronous reset is harder to implement because it is a special
signal like a clock. Usually, a tree of buffers is inserted at place and route.
3. Must be synchronously de-asserted in order to ensure that all flops exit the reset condition on the
same clock. Otherwise, state machines can reset into invalid states.
4. For both VHDL and Verilog, the asynchronous reset signal must appear in the sensitivity list of the process (VHDL) or the always block (Verilog).

Asynchronous Reset:
Verilog - Flip Flop with asynchronous Reset
always @ (posedge clock or negedge reset_n)
begin
  if (!reset_n)
    data_out <= 1'b0;
  else
    data_out <= data_in;
end

Combinatorial Logic:
For both Verilog and VHDL,
1. envision the combinational circuit that will be synthesized.
2. avoid combinational feedback, that is, the looping of combinational processes.
3. when modeling purely combinational logic, ensure signals are assigned in every branch of conditional
signal assignments.
4. ensure the sensitivity list of process statements in VHDL and the event list of always statements in
Verilog are complete.

5. For VHDL, do not include the after clause in a signal assignment. This clutters the code and makes it harder to read.
VHDL - after clause used
C <= a and b after 10 ns;
VHDL - after clause removed
C <= a and b;

6. For Verilog, do not include delays in assignment statements.
Verilog - Delay used
assign #10 c = a & b;
Verilog - Delay not included
assign c = a & b;

7. For Verilog, the always statement is supported by synthesis. The initial statement is not.

Sensitivity Lists:
1. Specify complete sensitivity lists in each of your VHDL process statements and Verilog always blocks.
If you don't use a complete sensitivity list, the behavior of the pre-synthesis design may differ from the
post-synthesis netlist.
2. For sequential blocks, the sensitivity list must include the clock signal that is read by the block. If the
sequential block uses an asynchronous reset signal, include the reset signal in the sensitivity list.

Verilog example
always @ (posedge clk)
begin
  q <= d;
end

VHDL example
seq_example: process (clk)
begin
  if (clk'event and clk = '1') then
    a <= d;
  end if;
end process;

Sensitivity Lists:
3. For combinational blocks, the sensitivity list must include every signal that is read by the process.

VHDL example
comb_ex: process (a, b)
begin
  if (b = '0') then
    sum <= a + 1;
  else
    sum <= a - 1;
  end if;
end process;

Verilog example
always @ (a or b)
begin
  if (b == 0)
    sum = a + 1;
  else
    sum = a - 1;
end

Global Variables:
1. Using non-local variables in tasks or functions may cause a simulation and synthesis mismatch.
In the following example, simulation does not re-evaluate func when the value of input c changes:
module mod(a, b, c, d);
  input a, b, c;
  output [2:0] d;
  function [2:0] func;
    input in1, in2;
    begin
      func = {in1, in2, c};
    end
  endfunction
  assign d = func(a, b);
endmodule
2. Modify the Verilog RTL source to pass all needed variables as inputs to the task or function.

Blocking vs. Non-blocking:
Verilog only.
1. Blocking assignments execute in sequential order, non-blocking assignments execute concurrently.
2. When writing synthesizable code, use non-blocking assignments in sequential blocks
(-> always @ (posedge clock) blocks).
3. Use blocking assignments in pure combinational blocks.
Otherwise, the simulation behavior of the HDL and gate-level designs may differ.

Signal vs. variable:
VHDL only.
• During simulations, signal assignments are scheduled for execution in the next simulation cycle.
• Variable assignments take effect immediately, and they take place in the order in which they appear in code.
• When writing synthesizable code, use signals instead of variables to ensure that the simulation
behavior of the pre-synthesis design matches the post-synthesis netlist.

Case vs. if-then-else:
For VHDL and Verilog
• A case statement infers a single-level multiplexer.
• An if-then-else statement infers a priority-encoded, cascaded combination of multiplexers.
• A mux is a faster circuit. If the priority-encoding structure is not required, use the case statement
rather than the if-then-else statement.

Case vs. if-then-else:
For VHDL and Verilog
For combinational logic from a case statement, ensure that
1. Default outputs are assigned immediately before the case statement
OR
2. The outputs are always assigned regardless of which branch is taken through the case statement.
This will avoid latches being inferred.
3. The default branch in a Verilog case statement is essential to ensure all branch values are covered and
to avoid inferring latches.
4. The others branch (the VHDL default case branch) is optional; it ensures that all branch values are covered.

Constraining Synthesis
Overview
[Figure: design flow]
• HDL coding (Verilog, SystemVerilog, …)
• Synthesis (e.g. Synopsys Design Vision)
  Inputs: Synopsys Design Constraints (SDC), Common Power Format (CPF)
  Outputs: SDF file (contains delay information), netlist (the actual circuit as Verilog netlist)
• Place and Route (e.g. Cadence Innovus)
  Input: PnR constraints
  Output: GDSII (the finished layout)

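Overall debugging workflow
[Figure: flow chart; after each step, the listed check may send you back to an earlier step]
1. HDL coding: compile errors?
2. Testing & simulation: behavioral errors?
3. Implement SDC & CPF files
4. Synthesis: synthesis errors?
5. SDF back-annotated simulation: behavioral errors?
6. Adjust PnR scripts
7. Place and Route: PnR issues?
8. Post-layout simulation: behavioral errors?
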
Focus of this lecture
[Figure: the design flow from the overview (HDL coding, synthesis with SDC and CPF inputs, SDF file and netlist outputs, place and route, GDSII), with the focus of this lecture highlighted]

Constraining
• Format for Constraints is in SDC (Synopsys Design Constraints)

• Used to add real world behavior to simulation / circuit
• E.g. Delay
• Output:
• Netlist
• Physical Netlist
• SDF-File
• Timing Reports
• First Area estimation
• Etc.

Overview
Too many constraints defined:
→ Too tight timing
→ Use of higher-drive cells
→ Higher power demand
→ Larger area
→ Unnecessary effort for the engineer to meet unnecessary timings
→ Wrong calculation of critical path
→ Sub-optimal routing
→ Etc.
Too few constraints defined:
→ Probably no functional netlist output
→ Does not work for desired clocks
→ Setup time violation
→ Hold time violation
→ Reset pin is not connected correctly
→ Etc.

What can be constrained?
• Reset pin → exclude from "critical path" calculation

• Setup Time & Hold Time
• Clock Frequency
• Jitter
• Data Input Characteristics
• Asynchronous Clocks
• Which Cells should be used for certain modules
• Etc.

False path constraining
Examples of "false paths"
• Clock domain crossings in which double synchronizer logic has been added
• Registers that might be written once at power up
• False paths for static signals due to mode merging
• Reset or test logic
• Ignore paths between the write and asynchronous read clocks of an
asynchronous distributed RAM (when applicable)

Syntax
set_false_path
  [-setup]                   Command applies only to setup paths
  [-hold]                    Command applies only to hold paths
  [-rise]                    Command applies only to rising edges
  [-fall]                    Command applies only to falling edges
  [-from object_list]        False path originates at this object
  [-to object_list]          False path ends at this object
  [-through object_list]     False path passes through this object
  [-fall_from object_list]   Only falling edges originating at this object
  …
  [-comment string]          Add a comment to your false path

Examples:
• Example 1: Non-functional path: The following picture shows an example of a non-functional path.
Because both multiplexers are driven by the same select signal, the path from Q to D does not exist
and should be defined as a false path.
e.g.: set_false_path -through [get_pins MUX1/a0] -through [get_pins MUX2/a1]

Examples:
• Example 2: False paths for static signals arising from the merging of modes: Suppose you have a
structure as shown in the picture below. You have two modes, and the path to the multiplexer output is
different depending on the mode. However, in order to cover timing for both modes, you have
to keep the "mode select bit" unconstrained. This results in paths also being formed through the multiplexer
select. You can specify a false path through the select of the multiplexer, as it will be static in both
modes, provided there are no special timing requirements related to mode transitions on this signal.
Specifically, for the scenario shown:
• Mode 1 : set_case_analysis 0 MUX/SEL
• Mode 2 : set_case_analysis 1 MUX/SEL
• Mode with Mode 1 and Mode 2 merged together : set_false_path -through MUX/SEL

Examples:
• Example 3: Synchronized signals: Let us say we have a two-flop synchronizer placed between a
sending and a receiving flop (the sending and receiving flops may be working on different clocks or on
the same clock). In this scenario, it is not required to meet timing from the launching flop to the first
stage of the synchronizer. We can consider the signal coming to flop1 as false since, even if the signal
causes flop1 to be metastable, it will get resolved before the next clock edge arrives, with the success
rate governed by the MTBF of the synchronizer. This kind of false path is also known as clock domain
crossing (CDC).

Examples:
set_ideal_network [get_ports rst_n]
set_false_path -fall_from [get_ports rst_n]
→ In this example, the registers are reset by the falling edge of rst_n: all registers can be
reset asynchronously (no timing checks during reset assertion).
However, the de-assertion of the reset must occur synchronously,
because the command only applies to the falling edge.

Clock constraining:
create_clock
  -period period_value      Describes the clock period
  [source_objects]          The source of the clock, e.g. [get_ports X] or [get_nets X]
  [-name clock_name]        Sets the name of the clock signal
  [-waveform edge_list]     Used to define a custom clock waveform
  [-add]                    Generate more than one clock from the same source
  [-comment string]         Add a comment to your clock

Clock constraining:
• Creating a Clock Example:

create_clock -period 10.0 [get_ports clk] -name "CLOCK" -waveform {5 10}
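What the edge list of the -waveform option means can be checked with a few lines of Python. This is only an arithmetic sketch, assuming the usual SDC convention that the list alternates rising and falling edge times within one period; the helper names are hypothetical and no SDC tool is involved.

```python
def clock_edges(period, waveform, cycles=2):
    """Return (rise_time, fall_time) pairs for the first few clock cycles."""
    rise, fall = waveform
    return [(k * period + rise, k * period + fall) for k in range(cycles)]


def duty_cycle(period, waveform):
    """Fraction of the period during which the clock is high."""
    rise, fall = waveform
    return (fall - rise) / period


edges = clock_edges(10.0, (5, 10))   # values from the create_clock example
duty = duty_cycle(10.0, (5, 10))
print(edges, duty)
```

For the example above the clock is low for the first 5 time units of each period and high for the remaining 5, i.e. a 50 % duty cycle shifted by half a period.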

Clock constraining:
• Multiple clocks from same source

create_clock -period 10.0 [get_ports clk] -name "CLOCK"
-waveform {5 10} -comment "Standard Speed"
create_clock -period 8.0 [get_ports clk] -name "FASTER_CLOCK"
-waveform {4 8} -comment "Maximum Speed" -add

Clock constraining:
• Setting Transition times for a clock

set_clock_transition
[-rise] Transition time for the rising clock edge
[-fall] Transition time for the falling clock edge
[-max] Used to set a maximum transition time
[-min] Used to set a minimum transition time
transition_time The value of the transition time
clock_list List of all clocks, this setting is applied to
e.g. [get_clocks …] or [all_clocks]

Clock constraining:
• Setting Transition times for a clock Example:

set_clock_transition -rise -max 0.2 [get_clocks CLOCK]
set_clock_transition -rise -min 0.1 [get_clocks CLOCK]
set_clock_transition -fall -max 0.2 [get_clocks CLOCK]
set_clock_transition -fall -min 0.1 [get_clocks CLOCK]

Clock constraining:
• Setting jitter of clocks
set_clock_uncertainty
  [-from | -rise_from | -fall_from clock]   Start point of uncertainty
  [-to | -rise_to | -fall_to clock]         End point of uncertainty
  [-setup]                                  Command applies to setup time
  [-hold]                                   Command applies to hold time
  [-rise]                                   Command applies to rise of clk
  [-fall]                                   Command applies to fall of clk
  uncertainty_value                         The actual value
  [object_list]                             Clocks the command applies to
Remark: -from & -to are obsolete

Clock constraining:
• Setting jitter of clocks Example:

set_clock_uncertainty -setup 0.5 -rise [get_clocks CLOCK]
set_clock_uncertainty -hold 0.2 -rise [get_clocks CLOCK]
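How such uncertainty values eat into the timing budget can be illustrated with a small slack calculation in Python. This is a textbook-style sketch for a single register-to-register path with an ideal clock, not the algorithm of any particular timing tool; the period and the two uncertainties are taken from the examples above, while the cell and logic delays are hypothetical numbers.

```python
def setup_slack(period, uncertainty, clk_to_q, logic_max, t_setup):
    """Time left before the capturing edge, reduced by the setup uncertainty."""
    required = period - uncertainty - t_setup
    arrival = clk_to_q + logic_max
    return required - arrival


def hold_slack(uncertainty, clk_to_q, logic_min, t_hold):
    """Margin of the fastest path against the hold requirement plus uncertainty."""
    return (clk_to_q + logic_min) - (t_hold + uncertainty)


# period = 10.0, setup uncertainty = 0.5, hold uncertainty = 0.2 (from the examples);
# clk_to_q, logic delays, t_setup and t_hold are assumed values.
s_slack = setup_slack(period=10.0, uncertainty=0.5, clk_to_q=0.3, logic_max=8.0, t_setup=0.4)
h_slack = hold_slack(uncertainty=0.2, clk_to_q=0.3, logic_min=0.2, t_hold=0.1)
print(round(s_slack, 3), round(h_slack, 3))
```

A positive slack means the check is met; increasing either uncertainty value reduces the corresponding slack by the same amount.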

Clock constraining:
• Setting latency from clock source (e.g. PLL) to clocked devices
set_clock_latency
[-rise] Command applies to rising edge
[-fall] Command applies to falling edge
[-min] Used to set a min value
[-max] Used to set a max value
[-source] Set the delay of (off-chip) source to clock
[-late] Set the option for the late edge of the clk
[-early] Set the option for the early edge of the clk
[-clock clock_list] List of the clocks this option applies to
delay The actual delay
object_list The source of the clock
Clock constraining:
• Setting latency from clock source to clocked device Example:

set_clock_latency -source -early 0.5 [get_clocks CLOCK]
set_clock_latency -source -late 1.0 [get_clocks CLOCK]

Clock constraining:
• Working with multiple clocks

set_clock_groups
  [-group clock_list]       Clocks that form one group (used in the example)
  [-logically_exclusive]    Set clocks as mutually exclusive. Coupling possible
  [-physically_exclusive]   Set if clocks do not coexist in the real design
  [-asynchronous]           Define clock relation as asynchronous
  [-allow_paths]            (Use only with -asynchronous) Allows cross-talk
                            analysis if the asynchronous option is selected
  [-name group_name]        Set a name for the clock group
  [-comment string]         Add a comment to the clock group

Clock constraining:
• Working with multiple clocks Example

set_clock_groups -asynchronous -group [get_clocks CLK1]
-group [get_clocks {CLK2 CLK3}] -allow_paths

Port constraining:
set_input_delay
[-clock clock_name] Specify the reference Clock
[-clock_fall] The delay is relative to the falling edge of the clock
[-level_sensitive]
[-rise]
[-fall]
[-max]
[-min]
[-add_delay]
delay_value
port_pin_list

Chapter 10
VERILOG-AMS

Verilog, Verilog-A and Verilog-AMS
• Analog and mixed-signal modules should be interfaceable with Verilog models
• Analog signals are continuous: the value of a signal at any point may be any value from a continuous range of values
• Verilog-A is designed to allow modeling of systems that process continuous-time signals
• Verilog-AMS combines Verilog-HDL and Verilog-A!
[Figure: Verilog-AMS as the superset of Verilog-HDL and Verilog-A, covering digital, analog discrete-event and analog time-continuous signals]

Applications of Verilog-AMS
• There are five main reasons why engineers use Verilog-AMS
1. To model components
2. To create testbenches
3. To accelerate simulation
4. To verify mixed-signal systems
5. To support the top-down design process

Examples:
• Modeling basic components: R, C, L, …
• Modeling/using semiconductor components: BJTs, MOSFETs, …
• Functional blocks: Filters, S&H, D/A or A/D converters
• Multi-disciplinary components: Sensors, Actuators, …
• Logic components (Gates, latches, registers, …)

Applications of Verilog-AMS
• Top-down design flow has been discussed before
• With heterogeneous systems and/or multidisciplinary components, Verilog-AMS offers
analog, mixed-signal and digital language constructs that allow the creation of an abstract
system-level model, and its simulation in de facto one simulator
• For efficiency reasons, every commercial simulator partitions the design into an analog
(time-continuous) and a digital (time-discrete) part, but this is quite well hidden from the user

The Resistor in Verilog-A
• The known equation for a linear resistor is v=r*i

In Verilog-A the description of the resistor looks like this:
`include "disciplines.vams"       // collection of physical signal types
module resistor (p,n);
  parameter real r=0;             // resistance: a real-valued parameter r
  inout p, n;                     // ports make the connection to the components:
  electrical p, n;                // direction and type need to be specified
  analog
    V(p,n) <+ r*I(p,n);           // across (V) and through (I) quantities
endmodule

The Capacitor in Verilog-A
module capacitor (p,n);
  parameter real c=0;             // capacitance (F)
  inout p, n;
  electrical p, n;
  analog
    I(p,n) <+ c*ddt(V(p,n));      // ddt is the time derivative of its argument
endmodule

The Inductor in Verilog-A
module inductor (p,n);
  parameter real l=0;             // inductance (H)
  inout p, n;
  electrical p, n;
  analog
    V(p,n) <+ l*ddt(I(p,n));      // ddt is the time derivative of its argument
endmodule

Constant-Valued Sources
In Verilog-A the descriptions of a constant-valued voltage source and current source look like this:
Note: direction is only output here!
module vsrc (p,n);
  parameter real dc=0;            // DC voltage (V)
  output p, n;
  electrical p, n;
  analog
    V(p,n) <+ dc;
endmodule

module isrc (p,n);
  parameter real dc=0;            // DC current (A)
  output p, n;
  electrical p, n;
  analog
    I(p,n) <+ dc;
endmodule

Controlled Sources
Voltage-controlled voltage source:
module vcvs (p,n, ps, ns);
  parameter real gain=1;          // Voltage gain
  output p, n;
  input ps, ns;
  electrical p, n, ps, ns;
  analog
    V(p,n) <+ gain*V(ps, ns);
endmodule

Controlled Sources
Current-controlled current source:
module cccs (p,n, ps, ns);
  parameter real gain=1;          // Current gain
  output p, n;
  input ps, ns;
  electrical p, n, ps, ns;
  analog
    I(p,n) <+ gain*I(ps, ns);
endmodule
The other controlled sources (current-controlled voltage source and voltage-controlled
current source) can be modeled in the same way:
V(p,n) <+ gain*I(ps, ns);
I(p,n) <+ gain*V(ps, ns);

Structural Models
In Verilog-A the description of a structural model (voltage source and resistor) looks like this:
`include "vsrc.vams"
`include "resistor.vams"
module smpl_ckt;
  electrical n;
  ground gnd;
  vsrc #(.dc(1)) V1(n,gnd);
  resistor #(.r(1k)) R1(n,gnd);
endmodule
Note that every circuit has one node designated as ground or reference node. This node defines zero potential for all
disciplines (and has no discipline itself).

Ideal Diode
Here we go with the listing for an ideal diode:
module diode (a, c);
  inout a, c;
  electrical a, c;
  analog begin
    @(cross(V(a,c) + I(a,c), 0));
    if ((V(a,c) + I(a,c)) > 0)
      V(a,c) <+ 0;
    else
      I(a,c) <+ 0;
  end
endmodule
Adding voltage and current looks strange, but this is a very robust way to check if the
diode is in quadrant one!
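Why the sign of V + I identifies the state of an ideal diode can be checked numerically. The Python sketch below assumes a hypothetical test circuit (a voltage source in series with a resistor and the ideal diode) and is not derived from the Verilog-A listing; it only evaluates the two possible operating points.

```python
def ideal_diode_in_series(v_source, r):
    """Operating point (v_diode, i_diode) of source, resistor and ideal diode in series."""
    if v_source > 0:
        return 0.0, v_source / r     # conducting: zero voltage, positive current
    return v_source, 0.0             # blocking: zero current, negative voltage


on_v, on_i = ideal_diode_in_series(5.0, 1e3)
off_v, off_i = ideal_diode_in_series(-5.0, 1e3)
print(on_v + on_i > 0, off_v + off_i > 0)
```

In the conducting state only the current is non-zero and positive, in the blocking state only the voltage is non-zero and negative, so the sum changes sign exactly when the diode switches.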

RC chain Example
RC chain:

SPICE netlist
*RC Circuit
R1 in out 10k
C1 out gnd 10u

Verilog-A netlist
// RC Circuit
module RC(in, out);
  inout in;
  inout out;
  electrical in;
  electrical out;
  ground gnd;
  resistor #(.r(10k)) r1 (in, out);
  capacitor #(.c(10u)) c1(out,gnd);
endmodule
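What an analog solver does with this netlist during a transient analysis can be sketched in Python. The example below is an assumption-laden illustration, not the algorithm of a specific simulator: it applies a 1 V step at the input, uses a fixed time step and the backward Euler method, and compares the result after one time constant with the analytic solution.

```python
import math


def rc_step_response(r=10e3, c=10e-6, v_in=1.0, t_end=0.1, dt=1e-4):
    """Backward-Euler solution of C*dv/dt = (v_in - v)/R with v(0) = 0."""
    steps = round(t_end / dt)
    v = 0.0
    for _ in range(steps):
        # (v_new - v)/dt = (v_in - v_new)/(R*C), solved for v_new
        v = (v + dt / (r * c) * v_in) / (1 + dt / (r * c))
    return v


tau = 10e3 * 10e-6                    # R*C = 0.1 s for the values in the netlist
v_numeric = rc_step_response(t_end=tau)
v_exact = 1.0 - math.exp(-1.0)        # analytic value after one time constant
print(v_numeric, v_exact)
```

With the component values from the netlist the time constant is 0.1 s, and the numerical result approaches the analytic 63.2 % level as the time step shrinks.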

Inverter Example
Comparison of inverter implementation based on two MOS-transistors:

An inverter SPICE netlist
*CMOS Inverter
MP1 out in vdd vdd
+ pch L=1u W=32u
MN1 out in gnd gnd
+ nch L=1u W=16u
V1 in gnd
+ pwl( 0, 0, 10e-6, 5 )
.model nch NMOS
+ level=49
.model pch PMOS
+ level=49
.tran 1n 100u
.save v(in) v(out)
.end

Verilog-A netlist
//CMOS Inverter
module INVERTER(in, out);
  input in;
  output out;
  electrical in;
  electrical out;
  ground gnd;
  PM #(.l(1e-6),.w(32e-6)) mp1(out,in,vdd,vdd);
  NM #(.l(1e-6),.w(16e-6)) mn1(out,in,gnd,gnd);
endmodule

Pulse generator using Verilog-A netlist
`include "discipline.h"
module pwlgen (pwlOut);
  inout pwlOut;
  electrical pwlOut;
  ground gnd;
  vsource #(.pwl([0, 0, 10e-6, 5])) vpwlgen (pwlOut, gnd);
endmodule

Toplevel testbench
`include "discipline.h"
`timescale 1ns/1ns
module testbench();
  electrical in;
  electrical out;
  pwlgen pwlgen1 (in);
  rc rc1 (in, out);
endmodule

Ideal Opamp
Contribution statements are not the only way to assign values to analog signals. Sometimes a method called indirect
branch assignment is helpful. One example is the ideal opamp (infinite gain, infinite input resistance, zero output
resistance, …). We know that Vin = 0 (Vin is the difference voltage of the two input pins), and the output adjusts to a
voltage according to the feedback. Here is the code:
module ideal_opamp(pout, nout, pin, nin);
  output pout, nout;
  input pin, nin;
  electrical pin, nin, pout, nout;
  branch (pout,nout) out;
  branch (pin,nin) in;
  analog begin
    V(out): V(in) == 0;           // What is this?
  end
endmodule

Ideal Opamp
analog begin
V(out): V(in) == 0;
This is an indirect branch assignment which reads "drive V(out) so that V(in) == 0", meaning: V(out) is driven by a
voltage source, and the source voltage needs to be assigned in such a way that the given equation is satisfied. Any
branches in the equation, such as V(in), are only probed, but not driven!
The left-hand side of this indirect branch assignment must be either an access function (such as V(out)), or ddt or idt
applied to an access function. The tolerance for the equation is taken from the argument on the left side of the
equality operator.

Real Opamp
Modeling a real opamp with imperfections (offset, finite gain, finite slew rate, frequency behaviour, limits, …) is much more difficult.
Many examples of this are available online.
The key is to add/delete/limit the currents or voltages at the boundary of the opamp. Example:
//
// Input bias currents
//
I(n7) <+ IB;
I(n9) <+ IB;
//
// Input current offset
//
I(n7,n9) <+ IOFF/2;

A Simple A/D Converter
Two basic operations:
• Quantization:
Mapping of a continuous signal into a set of discrete ranges
• Coding:
Assignment of a binary code to each discrete range
V_FS = K V_REF
Change V_REF until the unknown v_x is determined within ± 0.5 LSB error:
$$ v_x = V_{FS} \sum_{i=1}^{n} b_i \, 2^{-i} \pm \frac{V_{FS}}{2^{n+1}} $$
Source: Sedra&Smith: Microelectronic Circuits.

A/D Converter Techniques
1. Linear quantization
2. Nonlinear quantization: µ-law and A-law methods
- Combination of linear quantization and nonlinear compression/expansion
e.g. Compander chip used to perform both compression and expansion for speech samples
3. Delta modulation (Oversampling)
4. Sigma-delta modulation (Oversampling)
5. Adaptive differential quantization (e.g. ADPCM), ...
[Figure: signal chain Analog → Anti-aliasing Filter (LPF) → Sample&Hold Circuit (clocked with fT) → ADC (Quantization + Coding) → Digital]

A/D Converter - Errors
• Differential linearity error (DLE) = Actual code step width - 1 LSB
• Integral linearity error = $\sum_{i=0}^{k} \mathrm{DLE}(i)$
• Offset error
• Missing code
DLE > 1 LSB → missing code
• Nonmonotonicity
Input voltage increases, output code decreases
Good ADC:
DLE < 0.5 LSB, no missing code
Source: Sedra&Smith: Microelectronic Circuits.
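The two linearity definitions above can be applied to measured code transition levels with a few lines of Python. The sketch below uses made-up transition voltages and a hypothetical helper name; it only implements the step-width definition of the DLE and the running sum for the integral linearity error.

```python
from itertools import accumulate


def linearity_errors(transitions, lsb):
    """Return (DLE list, ILE list) in LSB from consecutive code transition levels."""
    widths = [hi - lo for lo, hi in zip(transitions, transitions[1:])]
    dle = [w / lsb - 1.0 for w in widths]      # actual step width minus 1 LSB
    ile = list(accumulate(dle))                # running sum of the DLE values
    return dle, ile


# Hypothetical transition levels in volts for an ADC with 1 V per LSB.
dle, ile = linearity_errors([0.0, 1.0, 2.2, 3.0, 4.0], lsb=1.0)
print([round(x, 3) for x in dle])
print([round(x, 3) for x in ile])
```

In this made-up data one code is 0.2 LSB too wide and the next one 0.2 LSB too narrow, so the integral error returns to zero after the second of the two.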

In Verilog-A the description of an ideal A/D converter model looks like this:
`include "constants.vams"
`include "disciplines.vams"       // needed for the voltage discipline
module ideal_adc(in,clk,out);
  parameter integer adc_size = 8 from [1:inf);   // declared before it is used in the port range
  parameter real fullscale = 1.0;
  parameter real delay_ = 0, trise = 10n, tfall = 10n;
  parameter real clk_vth = 2.5;
  parameter real out_high = 1, out_low = 0 from (-inf:out_high);
  input in,clk;
  output [0:adc_size-1] out;
  voltage in,clk,out;
  real sample,thresh;
  real result[0:adc_size-1];
  integer i;

  analog
  begin
    @(cross(V(clk)-clk_vth, +1))  // sample on the rising clock edge
    begin
      sample = V(in);
      thresh = fullscale/2;
      for(i=adc_size-1;i>=0;i=i-1)
      begin
        if (sample > thresh)
        begin
          result[i] = out_high;
          sample = sample - thresh;
        end
        else result[i] = out_low;
        sample = 2*sample;
      end
    end
    V(out) <+ transition(result,delay_,trise,tfall);
  end
endmodule
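The conversion loop of this model can be replayed in Python to see which code a given input voltage produces. The sketch below ports only the for loop (compare with half of full scale, subtract, double); the function name and the MSB-first list it returns are assumptions made for illustration, and clocking and output transitions are left out.

```python
def ideal_adc(v_in, adc_size=8, fullscale=1.0):
    """Return the output bits, most significant bit first."""
    sample = v_in
    thresh = fullscale / 2
    bits = []
    for _ in range(adc_size):          # the Verilog-A loop runs from the MSB down
        if sample > thresh:
            bits.append(1)
            sample = sample - thresh
        else:
            bits.append(0)
        sample = 2 * sample            # rescale the residue for the next bit
    return bits


bits = ideal_adc(0.3)
code = int("".join(str(b) for b in bits), 2)
print(bits, code)
```

For an input of 0.3 with a full scale of 1.0, the loop yields the 8-bit code 76, which matches 0.3 * 256 rounded down.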

Filters
In Verilog-A the description of a low-pass filter model looks like this:

module lowpass1(in,out);
input in;
output out;
voltage in,out;
parameter real freq_p1 = 1M from (0:inf);
analog V(out) <+ laplace_nd(V(in), [1] , [1,1/(`M_TWO_PI*freq_p1)] );
endmodule
laplace_nd implements the numerator-denominator form of the Laplace transform filter.

laplace_nd(expr, n, d [ , ε ])
Where n is a vector of M real numbers that contains the coefficients of the numerator, and d is a vector of N real
numbers that contains the coefficients of the denominator.
The transfer function is
$$ H(s) = \frac{\sum_{k=0}^{M-1} n_k \, s^k}{\sum_{k=0}^{N-1} d_k \, s^k} $$
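The coefficient vectors of the lowpass1 module can be checked numerically in Python. This sketch assumes the numerator-denominator polynomial form with coefficients in ascending powers of s; the helper name is hypothetical and the code evaluates the frequency response only, it does not emulate the simulator's time-domain filter.

```python
import math


def laplace_nd_response(n, d, freq):
    """Evaluate H(s) = sum(n_k s^k) / sum(d_k s^k) at s = j*2*pi*freq."""
    s = 2j * math.pi * freq
    num = sum(nk * s**k for k, nk in enumerate(n))
    den = sum(dk * s**k for k, dk in enumerate(d))
    return num / den


freq_p1 = 1e6                                            # default of the lowpass1 module
n = [1]
d = [1, 1 / (2 * math.pi * freq_p1)]
h = laplace_nd_response(n, d, freq_p1)
print(abs(h), math.degrees(math.atan2(h.imag, h.real)))
```

At the pole frequency the magnitude is 1/sqrt(2) (about -3 dB) and the phase is -45 degrees, as expected for a first-order low-pass.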

Energy Domains

Domain                     Potential                      Flow                        Conserved
Electrical                 Electromotive Force (V)        Charge (C)                  Energy (J)
                           Electromotive Force (V)        Current (A)                 Power (W)
Magnetic                   Magnetomotive Force (A-turn)   Magnetic Flux (Wb)          Energy (J)
                           Magnetomotive Force (A-turn)   Magnetic Flux Rate (Wb/s)   Power (W)
Translational Kinematic    Displacement (m)               Force (N)                   Energy (J)
                           Velocity (m/s)                 Force (N)                   Power (W)
Rotational Kinematic       Angle                          Torque (Nm)                 Energy (J)
                           Angular Velocity (rad/s)       Torque (Nm)                 Power (W)
Thermal                    Temperature                    Entropy Flow (W/K)          Power (W)
                           Temperature                    Heat (J)                    Energy-Temperature (JK)
                           Temperature                    Heat Flow (J/s)             Power-Temperature (WK)
Fluidic                    Pressure (N/m2)                Flow (m3)                   Energy (J)
                           Pressure (N/m2)                Flow Rate (m3/s)            Power (W)
Radiant                    Luminous Intensity (cd)        Optical Flux (lm)           (cd2 sr)

Motor Model
`include "vsrc.vams"
module test;
  electrical drive;
  rotational_omega shaft;
  ground gnd;
  vsrc #(.dc(1)) V1(drive,gnd);
  motor M1(shaft, drive, gnd);    // port order follows the motor module: (shaft, p, n)
endmodule
… // continued next page

Motor Model
// motor
module motor (shaft, p, n);
  parameter real km = 4.5;        // motor constant (Vs/rad)
  parameter real kf = 6.2;        // flux constant (Nm/A)
  parameter real j = 0.004;       // inertia of shaft (Nms2/rad)
  parameter real d = 0.1;         // drag (friction) (Nms/rad)
  parameter real r = 5.0;         // motor winding resistance (Ohms)
  parameter real l = 0.02;        // motor winding inductance (H)
  inout shaft, p, n;
  rotational_omega shaft;
  electrical p, n;
  analog begin
    V(p,n) <+ km*Omega(shaft) + r*I(p,n) + l*ddt(I(p,n));
    Tau(shaft) <+ kf*I(p,n) - d*Omega(shaft) - j*ddt(Omega(shaft));
  end
endmodule
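The steady state of the motor equations can be worked out by hand and checked in Python. The sketch rests on several assumptions that are not stated in the listing: the torque term is read as kf*I(p,n), the shaft is unloaded so the net torque is zero, all time derivatives vanish, and the supply is the 1 V source of the test module.

```python
def motor_steady_state(v, km=4.5, kf=6.2, d=0.1, r=5.0):
    """Solve  v = km*omega + r*i  and  kf*i = d*omega  for (omega, i)."""
    omega = v / (km + r * d / kf)     # substitute i = d*omega/kf into the voltage equation
    i = d * omega / kf
    return omega, i


omega, i = motor_steady_state(1.0)
print(round(omega, 5), round(i, 6))
```

With the parameter values of the listing, the speed is dominated by the back-EMF term km*omega, while the winding current only has to balance the friction torque.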
