Aging Analysis of Digital Integrated Circuits
Dominik Lorenz
Dissertation approved for the degree of Doktor-Ingenieur
Contents
1. Introduction
1.1. Objective of this thesis
1.2. Semi-custom design flow
1.3. Structure of the thesis
2. Fundamentals
2.1. (Static) timing analysis
2.1.1. Gate models
2.1.2. Timing graph
2.1.3. Incremental timing analysis
2.1.4. Sequential circuits
2.1.5. Path enumeration
2.2. State of the art of aging analysis
2.2.1. Circuit level
2.2.2. Gate level
6. Conclusion
Bibliography
Acronyms
1. Introduction
In biology, aging of an organism is defined as a progressive, irreversible process that inevitably ends with death. The maximal lifetime of an individual is significantly affected by aging [Wikipedia, 2011].
The same is true for integrated circuits (ICs). Aging effects cause the circuit performance to degrade, and they have a significant impact on the specified lifetime of a circuit.
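Circuit aging can be regarded as a time-dependent variation. However, aging is not the only variability the IC industry must cope with; in fact, variability has always been a fact of life in the IC industry. The reasons for variability can be classified into three categories: variations of the operating conditions, process variations, and time-dependent variations such as aging. The first two are characterized as follows: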
Variations of the operating conditions: Primarily changes in supply voltage and oper-
ating temperature.
Process variations: These denote deviations in process parameters from their nominal
values that are present in an IC after it has been manufactured. Examples are
variations in the concentration of dopants or the oxide thickness. In contrast to
aging, manufacturing variations do not change over time once the IC has been
manufactured.
Variations of the operating conditions are handled during the design process by specifying a range (e.g., VDD,min and VDD,max) within which the IC has to meet the specified properties (e.g., frequency or power consumption). Process variations have traditionally been considered by specifying so-called process corners, which describe the best-case or worst-case realistic combinations of process parameters (e.g., with respect to delay), thus establishing generous guardbands against parameter variations. This modeling is increasingly considered problematic, and statistical design methodologies have therefore been proposed as a remedy for dealing with manufacturing variations. A detailed overview of this field is given in Blaauw et al. [2008].
Time-dependent variation caused by aging effects, on the other hand, has received far less attention. Aging effects change device parameters over time, depending on the operating conditions over the lifetime and on the workload. The workload defines the portion of the lifetime a device spends at a particular operating point. Negative bias temperature instability (NBTI), for instance, is regarded as the most severe aging effect nowadays. NBTI increases the threshold voltage (Vth) of PMOS transistors whenever the transistor is in inversion. The threshold voltage drift (∆Vth) is accelerated by elevated temperature or supply voltage.
The impact of variations on the circuit performance increases due to continued technology scaling [Nassif, 2000]. The same absolute variation of the gate length, for instance, results in a larger relative variation, since the nominal gate length is scaled by a factor of 0.7× every two years according to Moore's law [Moore, 1965]. The supply voltage is scaled as well; therefore, a given supply voltage variation or threshold voltage variation has a larger impact on circuit performance. This holds even if a constant absolute variation is assumed for the different variability mechanisms. The variation caused by aging effects, however, is going to increase in absolute terms as well, since these effects strongly depend on the strength of the electric fields. The electric fields continue to increase with scaling, because for several technology generations the transistor dimensions have been scaled more aggressively than the supply voltage.
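To make the scaling argument concrete, the following sketch computes the relative gate-length variation over a few technology generations under the text's assumption of a constant absolute variation. The 3 nm variation and the 100 nm starting length are hypothetical illustration values, not data from any process.

```python
def relative_variation(abs_variation_nm, initial_length_nm, generations, shrink=0.7):
    """Relative gate-length variation per generation for a constant absolute variation.

    All inputs are hypothetical illustration values, not process data.
    """
    return [abs_variation_nm / (initial_length_nm * shrink ** g)
            for g in range(generations + 1)]


for g, rel in enumerate(relative_variation(3.0, 100.0, 3)):
    print(f"generation {g}: {rel:.1%}")
```

With these assumed numbers the relative variation grows from 3 % to roughly 8.7 % after three 0.7× shrinks.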
Variability is the reason why performance and power consumption vary from chip to chip and over time. To still manufacture working and reliable products despite increasing variability, the performance guardbands must be increased or other techniques must be applied to make a product robust against variations. Examples of such techniques are dynamic voltage frequency scaling (DVFS) [Semeraro et al., 2002; Talpes and Marculescu, 2005; Herbert and Marculescu, 2009] or the use of redundant circuitry [Lyons and Vanderkulk, 1962]. As a consequence, the operating frequency is lower than it could be, chip area is wasted, and the power consumption is higher than necessary. Hence, conservative safety margins and variation-aware design techniques make the design of competitive products more difficult and reduce or even eliminate the advantages of moving to the next technology node. According to Austin et al. [2008], one way out of this dilemma consists of innovative design techniques that reduce the reliability costs again.
In the course of this thesis, seven pre-publications [Lorenz et al., 2009a,b, 2010a,b,c,d, 2012] have been contributed to the scientific community. Furthermore, a patent for a time margin monitor for the assessment of aging and process variation was filed and granted [Henzler et al., 2009].
1.2. Semi-custom design flow
"The RTL to GDSII flow has undergone significant changes in the last 25 years. The continued scaling of CMOS technologies significantly changed the objectives of the various design steps. The lack of good predictors for delay has led to significant changes in recent design flows. Challenges like leakage power, variability, and reliability will continue to require significant changes to the design-closure process in the future".
Everything starts with a product specification, which includes constraints for performance, area, and power. Further constraints, especially in advanced technologies, are reliability and yield. The next step is to write a synthesizable description in a hardware description language (HDL) such as VHDL or Verilog. This register-transfer level (RTL) representation is then transformed into a logic representation by logic synthesis [Sentovich et al., 1992]. A netlist of generic cells (e.g., NAND and NOT cells) that represents the logic function is obtained and mapped to cells from a standard cell library. Next, the cells of the netlist are placed and the nets are routed. Before the chip can be manufactured, tested, and packaged, sign-off is performed by thoroughly verifying that the timing and the other electrical performance metrics meet the specification.
² The first microprocessor, Intel's 4004, was fabricated in 1971.
Figure 1.2.: Aging-aware timing analysis of a circuit. Aging effects degrade transistor parameters, which results in increased gate delays over time. The critical path delay increases as well, and the timing specification might be violated during the specified lifetime.
increases the accuracy of the TA, since the parasitic capacitance and resistance of the nets are known. Finally, the coupling capacitances are available for timing sign-off, which again increases the accuracy of the TA. Hence, an aging-aware TA is beneficial at all synthesis steps from technology mapping onward.
2. Fundamentals
Figure 2.1.: (a) NAND gate with inputs A and B (B = "1"), output Z, and load CL, with a transition at input A. Timing arcs are depicted as lines. (b) Corresponding waveforms.
assumed that the output transition is caused by the switching of just one input signal
(single input switching assumption). A simultaneous transition at two or more inputs
can significantly increase the gate delay. Hence, gate models that take simultaneous
input switching into account are more accurate [Chen et al., 2001].
To obtain the gate delays, the gates of a standard cell library are pre-characterized by SPICE simulations, which are used to create a gate model. During STA, only the gate model is evaluated. This is why an STA is much faster than a SPICE simulation of the entire circuit.
There are several techniques to model the gate delay. One of the first was to use the following equation [Sapatnekar, 2004, chap. 4]:
$$d = k_1 \cdot C_L + k_2 \tag{2.1}$$
The gate delay is split into two parts: k1 describes the dependence of the gate delay on the output load (CL), and k2 is the intrinsic gate delay. CL is the sum of the input capacitances of the succeeding gates and the interconnect capacitance. This rather simple model neglects the impact of the input slope (sIN) on the gate delay.
To consider the impact of the slope, signals are modeled as ramps for STA (see Figure 2.1(b)). A signal is defined by two values: the arrival time (AT) and the corresponding slope. The slope (s) is given by the transition time, i.e., the time a signal takes to change from logic "0" to logic "1" (or vice versa). Hence, voltage thresholds have to be defined (e.g., 50 % of VDD for the signal crossing and 20 % and 80 % of VDD for the transition time).
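The ramp abstraction can be illustrated by extracting the two values that define a signal from a sampled waveform. This is a minimal sketch assuming a rising transition, linear interpolation between samples, and the 50 %/20 %/80 % thresholds given above; a falling transition would be handled symmetrically.

```python
def crossing_time(times, volts, level):
    """First time a rising, sampled waveform crosses `level` (linear interpolation)."""
    for t0, v0, t1, v1 in zip(times, volts, times[1:], volts[1:]):
        if v0 < level <= v1:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the given level")


def arrival_time_and_slope(times, volts, vdd, mid=0.5, low=0.2, high=0.8):
    """Ramp abstraction of a rising signal: AT at mid*VDD, slope = low-to-high transition time."""
    at = crossing_time(times, volts, mid * vdd)
    slope = crossing_time(times, volts, high * vdd) - crossing_time(times, volts, low * vdd)
    return at, slope


# Hypothetical samples of a rising signal (time in ps, voltage in V, VDD = 1 V)
print(arrival_time_and_slope([0.0, 10.0, 20.0, 30.0], [0.0, 0.2, 0.9, 1.0], vdd=1.0))
```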
A commonly used gate model is based on look-up tables (LUTs). The industry quasi-standard, the Liberty file format from Synopsys, is such a LUT-based gate model. It stores the gate delays in two-dimensional LUTs indexed by input slope and output load (see Figure 2.1):
$$d = f(s_{IN}, C_L) \tag{2.2}$$
Values between the stored entries of the LUTs are obtained by interpolation. The input slope is now required in addition to the output load to compute the gate delay. For this reason, the output slope (sOUT) is also stored in LUTs as a function of sIN and CL. The input slope of a gate can then be calculated from the output slope of its predecessor gate. An advantage of LUT-based gate models is that their accuracy can easily be increased by characterizing the gate at additional supporting points.
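Since the gate delay is stored only at discrete supporting points, every lookup has to interpolate. The sketch below uses bilinear interpolation with linear extrapolation outside the table, a common but not the only choice (commercial tools may use other schemes). The axis values and delays are hypothetical, and the same function would serve for the output-slope table.

```python
from bisect import bisect_right


def _cell(axis, x):
    """Index of the lower grid point and fractional position of x within that cell."""
    i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
    return i, (x - axis[i]) / (axis[i + 1] - axis[i])


def lut_lookup(slope_axis, load_axis, table, s_in, c_load):
    """Bilinear interpolation in a 2-D table[slope_index][load_index].

    Queries outside the grid are linearly extrapolated from the border cell.
    """
    i, fs = _cell(slope_axis, s_in)
    j, fc = _cell(load_axis, c_load)
    return (table[i][j] * (1 - fs) * (1 - fc)
            + table[i][j + 1] * (1 - fs) * fc
            + table[i + 1][j] * fs * (1 - fc)
            + table[i + 1][j + 1] * fs * fc)


# Hypothetical characterization data: slopes in ps, loads in fF, delays in ps.
SLOPES = [10.0, 20.0, 40.0]
LOADS = [1.0, 3.0]
DELAY = [[5.0, 9.0], [7.0, 11.0], [10.0, 15.0]]
print(lut_lookup(SLOPES, LOADS, DELAY, 15.0, 2.0))
```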
Due to ongoing miniaturization, the input capacitance of the gates decreases and the resistance of the interconnect network increases. This makes the assumption of purely capacitive loads increasingly inaccurate. For this reason, Qian et al. [1994] introduced the effective capacitance, which represents the complex interconnect network by a single value. This enabled the continued use of the existing models.
However, the signal waveform in advanced technologies differs significantly from a simple ramp (signals now have a long "tail"), which also leads to inaccuracies. This is why current source models (CSMs) were developed. The goal of CSMs is to model the signal waveform more accurately by modeling gates as voltage-controlled current sources that charge the complex interconnect network and the fan-out gates. Several approaches have been published. The composite current source model (CCSM) [Synopsys, 2006] stores time-current waveforms in the LUTs. The effective current source model (ECSM) [Cadence, 2007] differs only slightly from the CCSM: it stores time-voltage waveforms, which are in turn converted to current waveforms and applied to the interconnect network. CCSM and ECSM have the advantage that they are compatible with existing timing analysis tools, and they were adopted quite quickly by the industry.
Another CSM approach, by Croix and Wong [2003], stores the static output current as a function of the gate input voltage and the gate output voltage in LUTs. By solving differential equations, the voltage waveform at the input of the succeeding gate can be computed.
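To illustrate how a CSM yields a waveform by solving a differential equation, the following sketch integrates C_L · dV_out/dt = I(V_in, V_out) with the forward Euler method. It assumes a single lumped load capacitance instead of an interconnect network, and toy_inverter_current is a hypothetical analytic stand-in for the characterized current table; all numeric values are illustrative.

```python
VDD, VTH = 1.0, 0.3  # hypothetical supply and threshold voltage in volts


def toy_inverter_current(v_in, v_out, k=1e-4):
    """Hypothetical static output current I(V_in, V_out) of an inverter, in amperes.

    Stands in for a characterized table; a positive current charges the output node.
    """
    pull_up = k * max(VDD - v_in - VTH, 0.0) * (VDD - v_out)
    pull_down = k * max(v_in - VTH, 0.0) * v_out
    return pull_up - pull_down


def simulate_output(current, v_in, c_load, v_out0, t_end, dt):
    """Forward-Euler solution of C_L * dV_out/dt = I(V_in(t), V_out)."""
    t, v_out = 0.0, v_out0
    trace = [(t, v_out)]
    while t < t_end:
        v_out += dt * current(v_in(t), v_out) / c_load
        t += dt
        trace.append((t, v_out))
    return trace


def falling_input(t, t_fall=20e-12):
    """Falling input ramp from VDD to 0 V within t_fall seconds."""
    return VDD * max(0.0, 1.0 - t / t_fall)


waveform = simulate_output(toy_inverter_current, falling_input,
                           c_load=1e-15, v_out0=0.0, t_end=200e-12, dt=1e-13)
print(f"V_out after {waveform[-1][0] * 1e12:.0f} ps: {waveform[-1][1]:.4f} V")
```

The time step is chosen well below the RC-like time constant of the toy model, which keeps the explicit integration stable.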
The aging-aware gate model introduced in Chapter 4 is LUT-based. However, Knoth et al. [2011] show that the approach can be combined with a CSM [Knoth et al., 2010] to form an aging-aware CSM.
Figure 2.2.: (a) Gate-level netlist for ISCAS'85 circuit c17. (b) Simplified timing graph for c17 (only one node per net is added instead of the two described in the text).
The gate model provides a delay for a rising and a falling input transition. Hence,
every TG edge has two edge weights. To be able to use unmodified standard graph
algorithms, this should be avoided. A very clean and elegant way is described by Ju and
Saleh [1991]: For every net two nodes are added to the timing graph, one for a rising
transition, and another one for a falling transition. If two nets, u and v, are connected
by an inverting gate, the node u for a rising (falling) transition is connected to the node
v for a falling (rising) transition. If it is a non-inverting gate, the node u for a rising
(falling) transition is connected to v for a rising (falling) transition. That way every
edge in the timing graph has just one edge weight.
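The rise/fall node splitting of Ju and Saleh [1991] can be sketched as a small graph-construction routine. The arc tuple format and the zero-weight S and T edges are assumptions for illustration (the S edges could carry PI arrival times, as described below); gates that are neither inverting nor non-inverting (e.g., XOR) would need edges to both transition nodes and are not covered here.

```python
def rise_fall_timing_graph(arcs, primary_inputs=(), primary_outputs=()):
    """Split every net into a rising and a falling node so each edge has one weight.

    arcs: iterable of (u, v, inverting, d_rise, d_fall), where d_rise (d_fall) is the
    gate delay for a rising (falling) transition at input net u of the gate driving v.
    """
    flip = {"rise": "fall", "fall": "rise"}
    edges = {}
    for u, v, inverting, d_rise, d_fall in arcs:
        for edge_in, delay in (("rise", d_rise), ("fall", d_fall)):
            edge_out = flip[edge_in] if inverting else edge_in
            edges[((u, edge_in), (v, edge_out))] = delay
    # Source S feeds both transition nodes of every PI; both nodes of every PO feed sink T.
    for pi in primary_inputs:
        for tr in ("rise", "fall"):
            edges[("S", (pi, tr))] = 0.0
    for po in primary_outputs:
        for tr in ("rise", "fall"):
            edges[((po, tr), "T")] = 0.0
    return edges


# Hypothetical inverting arc a -> z (e.g., through a NAND gate), delays in ps
for edge, delay in rise_fall_timing_graph([("a", "z", True, 10.0, 12.0)], ["a"], ["z"]).items():
    print(edge, delay)
```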
Two additional nodes are added to the TG: a source node (S), which is connected to all primary input (PI) nodes, and a sink node (T), to which all primary output (PO) nodes are connected (see Figure 2.2). To model unequal arrival times at the primary inputs, delays can be assigned to the edges from S to the PIs.
When the TG is annotated with gate delays as edge weights, the circuit delay can be determined. The circuit delay is defined by the path (P) with the longest path delay (D(P)). This path is called the critical path (Pcrit), and its path delay is the critical path delay (D(Pcrit) or just Dcrit).
The circuit delay can be determined by path-based or block-based methods. The
path-based method enumerates all paths in the TG and computes their path delays by
adding up the gate delays along the path. The critical path with the longest path delay
determines the circuit delay. The path-based method has an exponential worst-case
time-complexity because the number of paths in a circuit increases (in the worst case)
exponentially with the number of nodes.
The block-based method propagates the arrival times (ATs) through the circuit, starting at S until T is reached. For a given node n, AT(n) is the latest point in time at which the signal at n can change¹. The arrival time of a node n can be calculated once the arrival times of all predecessor nodes i and the gate delays d of all incoming edges are known (see Figure 2.3):
$$\mathrm{AT}(n) = \max_{i \in \mathrm{predecessors}(n)} \left[ \mathrm{AT}(i) + d((i, n)) \right] \tag{2.3}$$
AT(T) corresponds to the circuit delay. In contrast to the path-based method, each node is visited only once; hence, the time complexity is linear in the size of the TG, i.e., O(|N| + |E|), which is O(|N|) for gates with bounded fan-in.
In summary, the difference between the block-based and the path-based method is that the former calculates maximal arrival times for each node, whereas the latter first computes all path delays and then takes their maximum.
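The two methods can be compared on a small hypothetical timing graph. The block-based sketch propagates Equation 2.3 in topological order (Kahn's algorithm), touching each edge once, while the path-based sketch enumerates every path explicitly; both should report the same circuit delay.

```python
from collections import defaultdict, deque


def block_based_delay(edges, source, sink):
    """Propagate arrival times (Eq. 2.3) in topological order; each edge is visited once."""
    succs, indegree, nodes = defaultdict(list), defaultdict(int), set()
    for u, v in edges:
        succs[u].append(v)
        indegree[v] += 1
        nodes.update((u, v))
    at = {n: float("-inf") for n in nodes}
    at[source] = 0.0
    ready = deque(n for n in nodes if indegree[n] == 0)
    while ready:
        u = ready.popleft()
        for v in succs[u]:
            at[v] = max(at[v], at[u] + edges[(u, v)])
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return at[sink]


def path_based_delay(edges, source, sink):
    """Enumerate every source-to-sink path explicitly (exponential in the worst case)."""
    succs = defaultdict(list)
    for u, v in edges:
        succs[u].append(v)
    longest, paths = float("-inf"), 0
    stack = [(source, 0.0)]
    while stack:
        node, delay = stack.pop()
        if node == sink:
            longest, paths = max(longest, delay), paths + 1
            continue
        for v in succs[node]:
            stack.append((v, delay + edges[(node, v)]))
    return longest, paths


# Hypothetical timing graph: edge (u, v) -> gate delay
EDGES = {("S", "a"): 1, ("S", "b"): 2, ("a", "c"): 3, ("b", "c"): 1,
         ("a", "T"): 2, ("c", "T"): 2, ("b", "T"): 4}
print(block_based_delay(EDGES, "S", "T"), path_based_delay(EDGES, "S", "T"))
```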
Both methods add up the gate delays without considering the logic function of the gates. Hence, the critical path may not be sensitizable. A path is not sensitizable if there is no input assignment that enables a signal to propagate along the path (see Section 5.3.3). A path that is not sensitizable is called a false path. If the critical path is a false path, the circuit delay is overestimated. The path-based method can easily recognize false paths by checking each path for sensitizability. For the block-based method, this is more difficult, since one cannot easily determine the path with the next longest path delay if the critical path is a false path. An efficient method to enumerate the paths in order of their path delay is discussed in Section 2.1.5.
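For small circuits, sensitizability can be checked by brute force. The sketch below uses the static sensitization criterion, which requires every side input along the path to be at its non-controlling value for some input vector; this is one common criterion and may differ from the definition used in Section 5.3.3. The netlist format is hypothetical, and the exhaustive search is only practical for a handful of primary inputs.

```python
from itertools import product

CONTROLLING = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}


def simulate(netlist, order, assignment):
    """Evaluate a netlist {net: (gate_type, [input nets])} in topological `order`."""
    values = dict(assignment)
    for net in order:
        gate, ins = netlist[net]
        bits = [values[i] for i in ins]
        if gate == "NOT":
            values[net] = 1 - bits[0]
        elif gate in ("AND", "NAND"):
            values[net] = int(all(bits)) ^ (gate == "NAND")
        else:  # OR, NOR
            values[net] = int(any(bits)) ^ (gate == "NOR")
    return values


def statically_sensitizable(netlist, order, primary_inputs, path):
    """True if some input vector puts every side input along `path` at its
    non-controlling value (inverters have no side inputs)."""
    for bits in product((0, 1), repeat=len(primary_inputs)):
        values = simulate(netlist, order, zip(primary_inputs, bits))
        if all(values[side] == 1 - CONTROLLING[netlist[net][0]]
               for prev, net in zip(path, path[1:])
               if netlist[net][0] != "NOT"
               for side in netlist[net][1] if side != prev):
            return True
    return False


# Hypothetical circuit: g1 = a AND s, g2 = g1 OR s
NETLIST = {"g1": ("AND", ["a", "s"]), "g2": ("OR", ["g1", "s"])}
ORDER = ["g1", "g2"]
print(statically_sensitizable(NETLIST, ORDER, ["a", "s"], ["a", "g1", "g2"]))
```

In this example the path a, g1, g2 is a false path: g1 needs s = 1 while g2 needs s = 0.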
When the static timing analyzer is used in the inner optimization loop, the design is often modified only slightly before the timing must be reevaluated. It would be very inefficient to analyze the complete design again in this case. Incremental timing analysis instead analyzes only the part of the timing graph that is affected by the change.
The foundation of an incremental timing analysis is that every timing quantity (e.g., arrival time or gate delay) has a valid flag (e.g., ATvalid or dvalid). It is crucial that, whenever the circuit and therefore the timing graph changes, the valid flags of the affected timing quantities are reset. This is done by two recursive functions, reset_node and reset_edge. In reset_edge, the controlling node of the arrival time (ATctrl) is needed. The controlling node is the predecessor node that defines the arrival time (i.e., the node i in Equation 2.3 that is responsible for the maximal arrival time at n).
¹ Or the earliest point in time at which the signal can change, if hold time constraints are to be checked.
Function reset_node(node)
  /* Function to set the arrival time of a node to invalid */
  ATvalid(node) ← False;
  foreach successor suc of node do
    /* Delays of outgoing edges are invalid because the edge input slope is invalid */
    reset_edge(node, suc);
  end
Function reset_edge(u, v)
  /* Recursive function to set the delay of an edge (u, v) to invalid */
  dvalid((u, v)) ← False;
  if ATctrl(v) == u then
    /* Arrival time at node v is invalid because it was controlled by edge (u, v) */
    reset_node(v);
  end
Figure 2.4 shows an example of the incremental timing analysis. Due to a design change, the arrival time at node 6 is invalid, which makes the other nodes marked red (or dark gray) invalid as well. The circuit delay is then reevaluated by calling update_node(T). This recursively calls update_node for all invalid nodes down to node 6.
Function update_node(node)
  /* Recursive function to update the arrival time of a node */
  if ATvalid(node) == True then
    return AT(node)
  else
    AT(node) ← max_{i∈predecessors(node)} update_node(i) + update_edge((i, node))
    ATctrl(node) ← arg max_{i∈predecessors(node)} (AT(i) + d((i, node)))
    ATvalid(node) ← True
    return AT(node)
  end
Function update_edge(u, v)
  /* Function to update the gate delay of an edge (u, v) */
  if dvalid((u, v)) == False then
    /* Update gate delay based on input slope and output load */
    slope ← get_slope_from_node(u);
    load ← get_load_from_node(v);
    d((u, v)) ← get_delay_from_LUT(slope, load);
    dvalid((u, v)) ← True
  end
  return d((u, v))
The methods discussed in Chapter 5 to identify possible critical paths in an aged circuit continuously modify the TG by removing nodes and edges. Without an incremental TA, the complete STA would therefore have to be repeated whenever the TG is modified.
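The four functions can be combined into a runnable sketch. Here delay_fn is a hypothetical stand-in for get_delay_from_LUT, slopes and loads are not modeled, and remove_edge is an illustrative design change of the kind mentioned for Chapter 5. As in the listings, an edge reset invalidates AT(v) only if u controls it; this is exact when edges are removed or delays decrease, whereas an increased delay on a non-controlling edge would require invalidating v as well.

```python
class IncrementalTimer:
    """Arrival-time analysis with valid flags, mirroring reset_node, reset_edge,
    update_node and update_edge. delay_fn(u, v) is a hypothetical delay model."""

    def __init__(self, preds, delay_fn):
        self.preds = {v: list(ps) for v, ps in preds.items()}
        self.succs = {}
        for v, ps in self.preds.items():
            for u in ps:
                self.succs.setdefault(u, []).append(v)
        self.delay_fn = delay_fn
        self.at, self.at_valid, self.at_ctrl = {}, {}, {}
        self.d, self.d_valid = {}, {}
        self.node_updates = 0  # number of arrival times actually recomputed

    def reset_node(self, node):
        self.at_valid[node] = False
        for suc in self.succs.get(node, []):
            self.reset_edge(node, suc)

    def reset_edge(self, u, v):
        self.d_valid[(u, v)] = False
        if self.at_ctrl.get(v) == u:
            self.reset_node(v)

    def update_edge(self, u, v):
        if not self.d_valid.get((u, v), False):
            self.d[(u, v)] = self.delay_fn(u, v)
            self.d_valid[(u, v)] = True
        return self.d[(u, v)]

    def update_node(self, node):
        if self.at_valid.get(node, False):
            return self.at[node]
        preds = self.preds.get(node, [])
        if preds:
            arrivals = {i: self.update_node(i) + self.update_edge(i, node) for i in preds}
            ctrl = max(arrivals, key=arrivals.get)
            self.at[node], self.at_ctrl[node] = arrivals[ctrl], ctrl
        else:  # nodes without predecessors (e.g. the source S) start at time 0
            self.at[node], self.at_ctrl[node] = 0.0, None
        self.at_valid[node] = True
        self.node_updates += 1
        return self.at[node]

    def remove_edge(self, u, v):
        """Delete edge (u, v); AT(v) becomes stale only if u was its controlling node."""
        self.preds[v].remove(u)
        self.succs[u].remove(v)
        self.d_valid.pop((u, v), None)
        if self.at_ctrl.get(v) == u:
            self.reset_node(v)


DELAYS = {("S", "a"): 2.0, ("S", "b"): 1.0, ("a", "c"): 3.0, ("b", "c"): 3.0, ("c", "T"): 1.0}
timer = IncrementalTimer({"a": ["S"], "b": ["S"], "c": ["a", "b"], "T": ["c"]},
                         lambda u, v: DELAYS[(u, v)])
print(timer.update_node("T"), timer.node_updates)
timer.remove_edge("a", "c")
print(timer.update_node("T"), timer.node_updates)
```

After removing the controlling edge (a, c), only c and T are recomputed, so node_updates grows from 5 to 7.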
There are several other timing quantities of interest. AT gives the maximal time a signal takes from the source node to a given node. The delay to sink (D2S), on the other hand, is the maximal time a signal takes from a given node until it reaches the sink node. D2S is calculated as follows:
$$\mathrm{D2S}(n) = \max_{i \in \mathrm{successors}(n)} \left[ \mathrm{D2S}(i) + d((n, i)) \right] \tag{2.4}$$
To calculate D2S for all nodes, one starts at T and computes D2S for the predecessor nodes until S is reached.
The required time (REQT(n)) is the time at which a signal must be at a node n such that it arrives at T in time. Therefore, REQT at T must be specified first. REQT at a node n is the difference between REQT(T) and the D2S at n:
$$\mathrm{REQT}(n) = \mathrm{REQT}(T) - \mathrm{D2S}(n) \tag{2.5}$$
The difference between required time and arrival time is called slack (SLACK):
$$\mathrm{SLACK}(n) = \mathrm{REQT}(n) - \mathrm{AT}(n) \tag{2.6}$$
Figure 2.4.: Example of the incremental timing algorithm. Arrival times at red (dark grey) nodes are not valid. To update the arrival time at node T, all invalid arrival times are recursively updated (dashed arrows).
A negative slack implies that the signal arrives at a node later than allowed for meeting the required time at the sink node. The slack of a node is important information for circuit optimization.
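D2S, REQT, and SLACK follow directly from Equations 2.3 and 2.4 and the two differences described above. The sketch below computes all four quantities on a hypothetical DAG with memoized recursion, which is adequate for small graphs.

```python
from collections import defaultdict


def timing_quantities(edges, req_sink):
    """AT (Eq. 2.3), D2S (Eq. 2.4), REQT and SLACK for every node of a DAG timing graph.

    edges maps (u, v) -> delay; req_sink is the required time specified at the sink T.
    """
    preds, succs = defaultdict(list), defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
    nodes = set(preds) | set(succs)
    at, d2s = {}, {}

    def arrival(n):  # nodes without predecessors (S) start at 0
        if n not in at:
            at[n] = max((arrival(i) + edges[(i, n)] for i in preds[n]), default=0.0)
        return at[n]

    def delay_to_sink(n):  # nodes without successors (T) have D2S = 0
        if n not in d2s:
            d2s[n] = max((delay_to_sink(i) + edges[(n, i)] for i in succs[n]), default=0.0)
        return d2s[n]

    reqt = {n: req_sink - delay_to_sink(n) for n in nodes}
    slack = {n: reqt[n] - arrival(n) for n in nodes}
    return at, d2s, reqt, slack


EDGES = {("S", "a"): 2.0, ("S", "b"): 1.0, ("a", "T"): 3.0, ("b", "T"): 1.0}
at, d2s, reqt, slack = timing_quantities(EDGES, req_sink=6.0)
print(slack)
```

All nodes on the critical path S, a, T share the same slack, REQT(T) − Dcrit = 6 − 5 = 1, while node b has a slack of 4.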
• setup time (tSUP) is the time interval during which the data signal has to be stable before the active clock edge to sample the data correctly. This can be verified during STA by the following inequality:
$$d_{CLK\text{-}to\text{-}Q} + D_{max} + t_{SUP} < t_{CLK} \tag{2.7}$$
The clock-to-Q delay (dCLK−to−Q) is the delay from an active clock edge until the output of the sending FF changes. Dmax is the maximal delay of the combinational circuit to the receiving FF input, and tCLK is the clock period.
• hold time (tHLD) is the time interval during which the data signal has to remain stable after the active clock edge to sample the data correctly. This can be checked by the following inequality:
$$d_{CLK\text{-}to\text{-}Q} + D_{min} > t_{HLD} \tag{2.8}$$
Dmin is the minimal delay of the combinational circuit to the receiving FF input. Dmin is obtained by the STA tool in early mode.
Figure 2.5.: Diagram of a sequential logic circuit. The timing constraints (setup and hold time) of a flip-flop are given as well.
The STA algorithm must be modified slightly to analyze sequential circuits. The flip-flops are removed from the netlist. Every signal connected to an FF input becomes a PO, and every signal connected to an FF output becomes a PI. The remaining circuit is purely combinational, and the TG can be set up. The timing constraints of the flip-flops are modeled by the weights of the edges to the sink node and from the source node. The weights of the edges from S to former FF outputs are set to dCLK−to−Q.
To check the setup time constraints, the weights of the edges from former FF inputs to T are set to tSUP. If the maximal arrival time at the sink node is less than tCLK, then all setup time constraints are met.
To check the hold time constraints, the weights of the edges from former FF inputs to T are set to −tHLD. Now, if the minimal arrival time at the sink node is greater than zero, then all hold time constraints are met (Equation 2.8). The minimal arrival time at a node is calculated by simply replacing the max-operation in Equation 2.3 with the min-operation.
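The transformation for sequential circuits can be sketched as follows. The setup check uses the source and sink edge weights described above (dCLK−to−Q from S, tSUP to T, compared against tCLK); the hold check evaluates Equation 2.8 directly at every former FF input using min-propagation. The net names and values are hypothetical, and the sketch assumes every combinational path starts at a flip-flop output.

```python
from collections import defaultdict


def propagate(edges, combine):
    """Block-based propagation (Eq. 2.3) with combine=max (late) or combine=min (early).

    Nodes without predecessors start at time 0.
    """
    preds, nodes = defaultdict(list), set()
    for u, v in edges:
        preds[v].append(u)
        nodes.update((u, v))
    at = {}

    def visit(n):
        if n not in at:
            at[n] = combine((visit(i) + edges[(i, n)] for i in preds[n]), default=0.0)
        return at[n]

    for n in nodes:
        visit(n)
    return at


def check_ff_timing(comb_edges, q_nets, d_nets, t_clk_to_q, t_sup, t_hld, t_clk):
    """Setup and hold check of a circuit whose flip-flops were cut out of the netlist.

    q_nets: nets formerly driven by FF outputs (new PIs); d_nets: nets formerly
    feeding FF inputs (new POs).
    """
    edges = dict(comb_edges)
    for q in q_nets:
        edges[("S", q)] = t_clk_to_q  # S -> former FF outputs
    setup_edges = dict(edges)
    for d in d_nets:
        setup_edges[(d, "T")] = t_sup  # former FF inputs -> T
    setup_ok = propagate(setup_edges, max)["T"] < t_clk
    early = propagate(edges, min)  # early mode: clock-to-Q + Dmin at each net
    hold_ok = all(early[d] > t_hld for d in d_nets)  # Eq. (2.8) per FF input
    return setup_ok, hold_ok


COMB = {("q1", "n1"): 2.0, ("q2", "n1"): 1.0, ("n1", "d1"): 3.0, ("q2", "d2"): 0.5}
print(check_ff_timing(COMB, ["q1", "q2"], ["d1", "d2"],
                      t_clk_to_q=1.0, t_sup=0.5, t_hld=1.0, t_clk=7.0))
```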
controlling nodes are stored for the delays to sink. The controlling node of a node n is the successor node that is responsible for the maximal D2S at n. Starting at S and repeatedly following the controlling nodes, the critical path is determined.
However, often not only the critical path itself is of interest, but also the paths with the next longest path delays. These paths are required, for instance, to simulate their delay again at circuit level. This problem is referred to as the k most critical paths problem. Determining the next longest paths is not as easy as determining Pcrit in a block-based STA approach.
Ju and Saleh [1991] propose an efficient way to compute the k most critical paths.
One advantage of their algorithm is that k does not have to be specified in advance, but
the path enumeration can be suspended and continued as required. The key idea of the
algorithm is the introduction of branch slacks (BSs).
In an initialization phase, the BSs are calculated for every edge in the TG. To this end, the successor nodes vi of a node u are sorted in non-increasing order according to the following cost function fcost:
$$f_{cost}(u, v_i) = d((u, v_i)) + \mathrm{D2S}(v_i) \tag{2.9}$$
This is the maximal delay from node u to T over the edge (u, vi). The branch slack is then the difference between the cost functions of two nodes vi and vi+1 that are adjacent in the sorted successor list of u:
$$\mathrm{BS}(u, v_i) = f_{cost}(u, v_i) - f_{cost}(u, v_{i+1}) \tag{2.10}$$
The branch slack of an edge (u, vi) indicates that the path with the next longest path delay that branches out from node u goes over the edge (u, vi+1), and that its path delay is shorter by BS(u, vi) than that of the corresponding path over (u, vi). Figure 2.6 shows the calculation of the branch slacks.
In the path enumeration phase, the next longest paths are determined by means of
the branch slacks. First, Pcrit is determined as discussed before. The path with the next
longest path delay branches out of Pcrit at the edge (u, vi ) with the smallest branch slack.
This path can be determined by branching off at u to vi+1 and following the controlling
nodes of vi+1 recursively until the sink node is reached.
Additional paths can be computed as follows. Suppose the path Pk+1 with the next longest path delay is to be determined. Pk+1 can be generated by branching out at a branch point from one of the k already determined paths. To this end, a data structure list[i] is required that keeps a list of the branch points of every already determined path Pi. This list is sorted according to the branch slacks. Hence, the branch point resulting in the path with the next longest path delay that branches out from Pi comes first in the list. The data structure next_delay is another sorted list, which contains, for every already determined path Pi, the delay of the next longest path branching out from it. The next longest path delay for Pi can be calculated as follows:
$$\mathrm{next\_delay}(P_i) = D(P_i) - \mathrm{BS}(b_i) \tag{2.11}$$
where bi denotes the first branch point in list[i] that has not been used yet.
When the next longest path should be determined one takes the first path from
next_delay and looks in list[i] for the first branch point for this path (see Algorithm 2).
In Figure 2.7 an annotated TG with branch slacks and delays to sink is given. Table 2.1
shows the corresponding execution trace of the k most critical path algorithm for the
first five iterations. Given are the determined path and its delay, the branch points with
corresponding branch slacks and the next longest path delay of a path branching out
from this path. The first path is Pcrit with a path delay of 12. Pcrit has two branch points
S with BS = 1 and node 6 with BS = 2. The branch points are ordered in non-decreasing
order with respect to the branch slack. Hence, next_delay is 11 (= D(Pcrit )−BS((S, 2)))
and the corresponding path is branching out from Pcrit at S. To determine the path in
the second iteration the path with the largest next_delay is taken. In this case there is
only one next_delay, hence, the path in the second iteration is branching out from Pcrit
at S. The used branch point is crossed out (indicated by the arrow with the 2 on top
standing for the iteration in which it is crossed out). The next_delay = 9 is computed
for the second path and a new next_delay for the first path must be calculated as well
(indicated by the arrow with the 2 on top). The execution trace shows how the algorithm
continues to determine the next three longest paths.
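The enumeration can be sketched compactly. This sketch keeps, for every determined path, a list of its own branch points sorted by branch slack (list[i] in the text) and a priority queue holding each path's next_delay; it is an illustrative reimplementation under these assumptions, not the author's code, and the graph is hypothetical (it is not the TG of Figure 2.7).

```python
import heapq
from collections import defaultdict


def k_most_critical_paths(edges, source, sink, k):
    """Return up to k (delay, nodes) pairs, longest path first, using branch slacks.

    edges maps (u, v) -> delay of an acyclic timing graph in which every node
    except the sink has at least one successor.
    """
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    d2s = {sink: 0.0}

    def get_d2s(n):  # delay to sink, Eq. (2.4), memoized
        if n not in d2s:
            d2s[n] = max(edges[(n, v)] + get_d2s(v) for v in succ[n])
        return d2s[n]

    def fcost(u, v):  # Eq. (2.9)
        return edges[(u, v)] + get_d2s(v)

    # Successors in non-increasing cost order; order[u][0] is the controlling node.
    order = {u: sorted(vs, key=lambda v: fcost(u, v), reverse=True)
             for u, vs in succ.items()}

    def bs(u, r):  # branch slack of edge (u, order[u][r])
        return fcost(u, order[u][r]) - fcost(u, order[u][r + 1])

    def follow(n):  # follow controlling nodes from n to the sink
        nodes = [n]
        while n != sink:
            n = order[n][0]
            nodes.append(n)
        return nodes

    def branch_points(nodes, first_pos, first_rank):
        """list[i]: sorted (slack, position, rank taken) of the path's own branch points."""
        points = []
        for pos in range(first_pos, len(nodes) - 1):
            rank = first_rank if pos == first_pos else 0
            if rank + 1 < len(order[nodes[pos]]):
                points.append((bs(nodes[pos], rank), pos, rank))
        return sorted(points)

    crit = follow(source)
    paths = [(get_d2s(source), crit)]
    lists, ptr, heap = [branch_points(crit, 0, 0)], [0], []
    if lists[0]:  # next_delay of P_crit
        heapq.heappush(heap, (-(paths[0][0] - lists[0][0][0]), 0))
    while heap and len(paths) < k:
        _, i = heapq.heappop(heap)  # path with the largest next_delay
        delay_i, nodes_i = paths[i]
        slack, pos, rank = lists[i][ptr[i]]
        branch = order[nodes_i[pos]][rank + 1]
        new_nodes = nodes_i[:pos + 1] + follow(branch)
        paths.append((delay_i - slack, new_nodes))
        lists.append(branch_points(new_nodes, pos, rank + 1))
        ptr.append(0)
        j = len(paths) - 1
        if lists[j]:
            heapq.heappush(heap, (-(paths[j][0] - lists[j][0][0]), j))
        ptr[i] += 1  # the used branch point of P_i is crossed out
        if ptr[i] < len(lists[i]):
            heapq.heappush(heap, (-(delay_i - lists[i][ptr[i]][0]), i))
    return paths[:k]


EDGES = {("S", "a"): 1, ("S", "b"): 2, ("a", "c"): 3, ("b", "c"): 1,
         ("a", "T"): 2, ("c", "T"): 2, ("b", "T"): 4}
for delay, nodes in k_most_critical_paths(EDGES, "S", "T", k=3):
    print(delay, "-".join(nodes))
```

With k = 3 this reports the two paths of delay 6 (S-a-c-T first, then S-b-T) followed by S-b-c-T with delay 5. Because each path only owns the branch points after its own branching position, every path is generated exactly once, and the enumeration can be resumed by simply raising k.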
Figure 2.7.: TG with branch slacks (arcs between two edges) and delays to sink (numbers next to the nodes)
Table 2.1.: Execution trace of the k most critical paths algorithm for the five slowest
paths.
The algorithm discussed so far can enumerate not only all paths from S to T but also all paths from an arbitrary node to T. To enumerate all paths from the source node to an arbitrary node, the algorithm must be changed slightly. Most importantly, join slacks (JSs) are introduced. Join slacks are quite similar to branch slacks: the join slack is the delay difference between two path segments from S to a given node.
In this thesis, the k most critical paths algorithm is required in Chapter 5. It is used to consider common edges when the possible critical paths of a circuit are identified and to determine whether a possible critical path of an aged circuit is sensitizable.
2.2. State of the art of aging analysis
After generating the degraded device models, the degraded circuit performance can be
simulated in the third step.
Commercial reliability simulators, like RelXpert [Cadence, 2003], are already available, and the latest versions of HSPICE [Synopsys, 2008] and ELDO [Karam et al., 2001] come with an integrated reliability analysis. RelXpert can consider the impact of HCI and NBTI. ELDO is capable of determining the degraded device parameters iteratively. To this end, the specified lifetime is divided into n time intervals (of equal length), and steps one and two are performed in every time interval. That way, the impact of the degraded waveforms on the parameter drift can be considered.
Maricau and Gielen [2010] analyze the combined impact of aging and process variation on circuit behavior. Like ELDO, their approach is iterative, but the length of the time intervals is variable. In Section 4.5.1, a simple experiment shows that such an iterative approach is not necessary (at least for digital circuits).
A drawback of commercial tools like RelXpert and ELDO is that the degradation
equations are proprietary. Hence, the user has to trust the tool and cannot verify how
the degradation is calculated. Kufluoglu et al. [2010] show that RelXpert only reaches
an acceptable accuracy when the proprietary degradation equations are replaced by
improved user defined equations.
Reliability simulators at circuit level can be very accurate. However, a reliability simulation at circuit level is quite time-consuming, and realistic input vectors are required. For the first step of the aging analysis, input vectors are needed that cause a realistic/worst-case degradation of the circuit. The third step requires input vectors to measure the degraded circuit performance. In general, the input vectors in the first and third steps are not equal.
Like SPICE simulators for timing analysis (see Section 2.1), these tools are not capable of simulating complex digital circuits. Nevertheless, they can be used to verify the critical aged path determined by an aging-aware timing analysis at gate level.
Although reliability simulators on circuit level are not applicable for timing analysis of
complex digital circuits, they can be used to characterize aged gate models.
Figure 2.8.: Aged LUT-based gate model as proposed in [Chen et al., 2011].
Chen et al. [2011] propose a path-based analysis flow, although the gate model can also be used for a block-based approach. HSPICE [Synopsys, 2008] is used to generate several aged LUTs for different conditions such as lifetime, temperature, or signal probability. This approach results in a large number of LUTs, especially when the workload at the gate inputs is to be considered. If, for instance, LUTs are generated for five different signal probabilities, 5 LUTs suffice for a gate with one input (see Figure 2.8). A gate with three inputs already needs 125 (= 5 · 5 · 5) LUTs, and there are gates in a standard cell library that have even more inputs.
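The growth in the number of LUTs can be checked quickly. The sketch assumes, as in the example above, that every input is characterized independently at the same number of signal-probability levels, and optionally multiplies by a hypothetical number of other conditions (e.g., lifetime and temperature combinations).

```python
def aged_lut_count(levels_per_input, num_inputs, other_conditions=1):
    """Number of aged LUTs if every input is characterized at `levels_per_input`
    workload levels and each combination is repeated for `other_conditions`."""
    return other_conditions * levels_per_input ** num_inputs


for n in range(1, 5):
    print(n, aged_lut_count(5, n))
```

This prints 5, 25, 125, and 625 LUTs for gates with one to four inputs, before any multiplication by further conditions.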
The aging-aware gate model GLACIER [Wu et al., 2000] considers HCI and defines a factor α as follows:
$$\alpha(s_{IN}, C_L, TD) = \frac{d_{aged}}{d_{fresh}} \tag{2.15}$$
The aged gate delay daged and the fresh gate delay dfresh have to be simulated. dfresh depends on the input slope sIN and the output load CL. daged additionally depends on the transition density TD at the input. For a multiple-input gate, daged depends on TD at every input. To reduce the complexity, it is assumed that the contributions of the switching of the individual gate inputs can be determined separately from one another and combined as follows:
$$\alpha = \left( \sum_{i=1}^{n} \alpha_i \right) - (n - 1) \tag{2.16}$$
where n is the number of transistors connected in series and αi is the contribution of input pin i when only this input switches. However, this approach neglects the impact of the workload at the other inputs and of the internal gate structure on the parameter drift (see Section 4.3.3).
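Equation 2.16 is easy to evaluate. The following sketch uses hypothetical per-input factors; note that the expression can be rewritten as α = 1 + Σ(αi − 1), i.e., the relative delay increases of the individual inputs are added.

```python
def combined_alpha(alphas):
    """Eq. (2.16): combine per-input aging factors alpha_i = d_aged / d_fresh."""
    n = len(alphas)
    return sum(alphas) - (n - 1)


def aged_delay(d_fresh, alphas):
    """Aged delay from the fresh delay and the combined factor, using Eq. (2.15)."""
    return d_fresh * combined_alpha(alphas)


# Hypothetical per-input factors and a hypothetical 20 ps fresh delay
print(combined_alpha([1.05, 1.03]), aged_delay(20.0, [1.05, 1.03]))
```

With the assumed factors 1.05 and 1.03, the combined factor is 1.08, so the 20 ps fresh delay becomes 21.6 ps.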
When a reliability simulator at circuit level is used to characterize a gate library, the gate models are valid for just one specific use profile, i.e., they depend on the use profile. If, for example, the specified lifetime changes, the entire library has to be re-characterized.
The advantage of such a gate model is that it is independent of the use profile and the workload, because these only affect the parameter drift, and the drift is computed during the analysis rather than in advance during gate model characterization.
As long as the parameter drift caused by aging is small enough, a linear approximation for the dependence of ∆d on ∆Vth can be used (see Figure 2.9):
$$ d_{aged} = d_{fresh} + \frac{\partial d}{\partial V_{th}} \cdot \Delta p \tag{2.18} $$
Paul et al. [2006] use the α-power law [Sakurai and Newton, 1990] to obtain the sensitivity ∂d/∂Vth:
$$ I_d \propto (V_{gs} - V_{th})^{\alpha} \tag{2.19} $$
It is assumed that the gate delay is solely determined by recharging the output load (no intrinsic gate delay):
$$ d = \frac{C_L \cdot V_{DD}}{I_d} = \frac{\text{const.}}{(V_{gs} - V_{th})^{\alpha}} \tag{2.20} $$
Differentiating the expression with respect to Vth results in:
$$ \frac{\partial d}{\partial V_{th}} = \frac{\alpha \cdot d}{V_{gs} - V_{th}} \tag{2.21} $$
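Equations 2.18 to 2.21 can be checked numerically. The sketch below assumes magnitudes for the PMOS quantities (so that a growing |Vth| increases the delay), lumps CL · VDD into a constant k, and uses hypothetical values for Vgs, Vth, α and the drift; it compares the linearized aged delay with the delay re-evaluated at the drifted threshold voltage. The function names are illustrative only.

```python
def delay_alpha_power(vgs: float, vth: float, alpha: float, k: float = 1.0) -> float:
    """Delay per Eq. (2.20): d = k / (Vgs - Vth)**alpha.

    k lumps CL * VDD and the proportionality constant of Eq. (2.19).
    Voltages are magnitudes, so a growing |Vth| (NBTI) increases d."""
    overdrive = vgs - vth
    if overdrive <= 0:
        raise ValueError("Vgs must exceed Vth")
    return k / overdrive ** alpha


def delay_sensitivity(vgs: float, vth: float, alpha: float, k: float = 1.0) -> float:
    """Analytic sensitivity per Eq. (2.21): dd/dVth = alpha * d / (Vgs - Vth)."""
    return alpha * delay_alpha_power(vgs, vth, alpha, k) / (vgs - vth)


def aged_delay_linear(vgs: float, vth: float, alpha: float, dvth: float,
                      k: float = 1.0) -> float:
    """First-order aged delay per Eq. (2.18), using the drift dvth = |dVth|."""
    d_fresh = delay_alpha_power(vgs, vth, alpha, k)
    return d_fresh + delay_sensitivity(vgs, vth, alpha, k) * dvth


# Hypothetical operating point (magnitudes in volts) and drift.
vgs, vth, alpha, dvth = 1.2, 0.4, 1.3, 0.03
exact = delay_alpha_power(vgs, vth + dvth, alpha)   # re-evaluated model
approx = aged_delay_linear(vgs, vth, alpha, dvth)   # linearized model
print(f"exact: {exact:.4f}  linearized: {approx:.4f}  "
      f"rel. error: {abs(approx - exact) / exact:.2%}")
```

Because d is a convex function of Vth in this model, the linearized value slightly underestimates the re-evaluated delay; the error grows with the drift, which is why Equation 2.18 is restricted to small drifts.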
In contrast to that, Kumar et al. [2006] determine the dependence ∆d(∆p) by simulation and store the results in LUTs. Kumar et al. [2006] also describe how to calculate the threshold voltage drift iteratively based on the reaction-diffusion (RD) equations for NBTI (see Section 3.1.1). However, this involves solving an equation for every stress and recovery phase during the lifetime, which makes the calculation of the drift very inefficient, especially for long lifetimes. A third contribution is the observation that arbitrary signals result in the same drift as periodic signals with the same signal probability and transition density. Hence, it is not necessary to know the exact waveforms of the gate input signals; it is enough to know their signal probabilities and transition densities (see Figure 2.10). Otherwise, aging analysis would not be feasible whenever the exact input signals are unknown during circuit development.
Figure 2.10.: Transformation of arbitrary signals into periodic signals with same signal probability and transition density.
Figure 2.11.: Drawing of an NBTI threshold voltage drift caused by consecutive stress and relaxation phases (thin black line) and the ∆Vth drift given by the long term prediction model (thick orange line).
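The two signal statistics mentioned above can be extracted from any waveform. The sketch below assumes a uniformly sampled binary waveform with time step dt and defines SP as the fraction of time at logic “1” and TD as the number of transitions per unit time; the function name and the two example waveforms are hypothetical. Both waveforms yield the same SP and TD and would therefore, following the argument of Kumar et al. [2006], be treated as equivalent stress.

```python
from typing import Sequence, Tuple


def signal_statistics(samples: Sequence[int], dt: float) -> Tuple[float, float]:
    """Return (SP, TD) of a uniformly sampled logic waveform.

    SP: fraction of time the signal is at logic 1 (static signal probability).
    TD: number of 0->1 and 1->0 transitions per unit time (transition density).
    """
    if len(samples) == 0 or dt <= 0:
        raise ValueError("need a non-empty waveform and a positive time step")
    sp = sum(samples) / len(samples)
    transitions = sum(1 for a, b in zip(samples, samples[1:]) if a != b)
    return sp, transitions / (len(samples) * dt)


# Hypothetical waveforms sampled every 1 ns: an irregular one and a periodic one.
irregular = [1, 0, 0, 1, 1, 1, 0, 0, 0, 0]
periodic = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0]
print(signal_statistics(irregular, 1.0))  # (0.4, 0.3)
print(signal_statistics(periodic, 1.0))   # (0.4, 0.3)
```

Here TD is expressed in transitions per ns because dt is given in ns; any other time unit works as long as it is used consistently.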
Wang et al. [2007b] derive a closed-form equation to calculate the upper bound of the parameter drift caused by NBTI (see the long term prediction model in Figure 2.11). Hence, the drift does not have to be calculated iteratively. It is also shown there that NBTI has a negligible impact on the clock distribution network of a sequential circuit. For sequential circuits, it is important that the clock distribution network delays the clock signals to the sending and the receiving FFs equally. Only then is it assured that the signals in the combinational logic have one full clock period to propagate from sending to receiving FFs. Wang et al. [2007b] argue that the clock period is unaffected by aging, because the clock signals to the sending and receiving FFs are delayed equally. However, clock gating is not considered. If the sending and receiving FFs are in separate clock domains, both clock signals can degrade differently. This would have to be considered during the analysis of sequential circuits.
The gate model by Luo et al. [2007b] is based on the α-power law as well. It considers
different temperatures in active and standby mode. In standby mode the transistors
degrade as well, but due to the lower temperature and the exponential dependence of
parameter drift on temperature, the parameter drift is much smaller. In Section 4.3.3 it
is shown how different temperatures can be considered for the gate model introduced in
this thesis.
Luo et al. [2007a] introduce a model that takes the stacking effect into account. The stacking effect describes the effect that not all transistors in a transistor stack have VDD as
Table 2.2.: Comparison of state-of-the-art gate models with the proposed aging-aware gate model AgeGate.
Gate model | Description | NBTI | HCI | Individual transistor drifts | Aged output slope model | Use profile independent
[Chen et al., 2011] | aged LUT | ✓ | ✓ | ✓ | ✗ | ✗
[Wu et al., 2000] | aged LUT | ✗ | ✓ | ✓(a) | ✓ | ✗
[Paul et al., 2006] | α-power law | ✓ | ✗ | ✗ | ✗ | ✓
[Kumar et al., 2006] | simulated sensitivities | ✓ | ✗ | ✗ | ✗ | ✓
[Wang et al., 2007b] | closed-form expression for parameter drift | ✓ | ✗ | ✗ | ✗ | ✓
[Luo et al., 2007b] | different temperature in active and standby mode | ✓ | ✗ | ✗ | ✗ | ✓
[Luo et al., 2007a] | considers stacking effect | ✓ | ✗ | ✗ | ✗ | ✓
[Kumar et al., 2007a] | individual transistor drifts considered | ✓ | ✗ | ✓(b) | ✗ | ✓
[Lu et al., 2009] | jointly considers aging effects and process variation | ✓ | ✗ | ✗ | ✗ | ✓
AgeGate | based on canonical gate model | ✓ | ✓ | ✓ | ✓ | ✓
(a) Neglects the impact of the workload at other inputs and of the internal gate structure on the parameter drift.
(b) Does not describe a formal way to calculate individual transistor drifts.
3. Aging effects and their impact on standard cells
The objective of this thesis is to develop methods for analyzing the degradation of complex digital circuits due to aging. Before that, the aging effects and their impact on the performance of single gates are investigated.
Aging effects can be classified into effects that cause a catastrophic failure of a device and effects that cause a drift of device parameters with time. For the analysis of circuit degradation, the drift-related aging effects have to be taken into account. In addition, the extent of the gate performance degradation caused by each aging effect and the factors it depends on¹ are investigated. This helps to decide which dependencies have to be modeled by the aging-aware gate model developed in Chapter 4.
To determine the impact of aging effects on the degradation of the gate performance, the following procedure is used (see Figure 3.1): First, the parameter drifts caused by aging effects and the sensitivity of the gate performance with respect to a parameter drift are obtained. Combining both provides the degradation of the gate performance.
Finally, it is investigated how the degradation due to aging evolves over different process technologies. The parameter drifts due to HCI do not show a consistent trend, but it is shown that circuits are becoming increasingly sensitive to parameter drifts because of the reduced supply voltage.
Figure 3.1.: (a) 36 mV Vth drift due to NBTI at VDD = 1.2 V (x-axis: supply voltage VDD [V]). (b) Sensitivity of the gate delay degradation to a threshold voltage drift (x-axis: |∆Vth| [V]). Hence, NBTI causes about 10 % degradation of the output delay for a rising input transition.
Aging effects that cause a parameter drift, on the other hand, can be treated deterministically. They cause a degradation of the transistor characteristics, which, in turn, leads to a degradation of the gate performance. This is the reason why drift-related aging effects have to be considered for an aging-aware timing analysis. The two dominant effects that cause a parameter drift are negative bias temperature instability (NBTI) and hot carrier injection (HCI). Both effects are described in detail in the following subsections.
Unfortunately, the classification into drift-related aging effects and aging effects that cause a catastrophic failure is not as unambiguous as described so far. For the latter, a parameter drift can be observed as well before the catastrophic failure takes place. Due to electromigration, the resistance of a wire first increases before an open is generated. For TDDB, conducting paths lead to a gradual increase of the gate current during the SBD phase before the transistor actually fails. If the time interval in which a parameter drift can be observed is short, the effect need not be considered for an aging-aware TA: the device is going to fail anyway within a short period of time. Lee et al. [2006] show that the time between an SBD and an HBD is significant in advanced technologies. A gate model for the SBD phase of TDDB has already been proposed in [Choudhury et al., 2010]. The equivalent circuit used there to model the impact of SBD on a transistor could also be used to incorporate SBD into the aging-aware gate model proposed in Chapter 4. EM does not affect the gate itself, but the delay of signal lines and the voltage drop across supply lines. Hence, if EM becomes relevant, it must be considered in the wire load model for timing analysis.
for the degradation of the transistor parameters. There are contradictory opinions about what happens to the released H atoms. It is still under discussion whether neutral H atoms diffuse, H2 molecules diffuse, or H+ ions drift in the direction of the gate. Alam et al. [2007] argue that H atoms react to form H2, which then diffuses.
The generation of the interface states and the diffusion of the hydrogen can be modeled by an RD system. An RD system involves two processes: a local reaction and the diffusion (or drift) of the reaction products.
The rate of interface state generation due to NBTI is given by the following equation:
$$ \frac{dN_{it}}{dt} = \underbrace{k_F (N_0 - N_{it})}_{\text{generation}} - \underbrace{k_R N_H(0) N_{it}}_{\text{annealing}} \tag{3.1} $$
N0 is the initial number of Si-H bonds, Nit is the number of interface states, and kF is the rate constant of broken bond creation (dissociation rate constant). NH(0) is the number of hydrogen atoms at the Si/SiO2 interface. The breaking of Si-H bonds can also be reversed, which is described by the second term: kR is the rate constant of the reverse annealing of a dangling bond and an H atom to a Si-H bond. This annealing or recovery effect is a special property of NBTI: the number of interface states decreases again when the stress is removed.
The creation of interface states is limited by the diffusion (or drift) of hydrogen. This is modeled by a second rate equation:
$$ \frac{dN_{it}}{dt} = -D_H \frac{dN_H}{dx} + N_H \cdot \mu_H \cdot E_{ox} \tag{3.2} $$
DH is the diffusion coefficient, µH is the mobility and Eox the electric field across the oxide. The second term can be neglected for neutral atoms or molecules. kF, kR and DH are temperature dependent; kF depends on the electric field as well. This means that an electric field is required for the generation of interface states, but not for the annealing and the diffusion. Equations 3.1 and 3.2 form a system of partial differential equations. This system can either be solved numerically, or a closed-form equation can be derived if some justified assumptions are made:
$$ N_{it}(t) = \sqrt{\frac{k_F N_0}{2 k_R}} \, (D_H t)^{1/4} \tag{3.3} $$
The assumptions are that the rate of interface state generation is small and that Nit is much smaller than N0. The time exponent is 1/4 for H diffusion and 1/6 for H2 diffusion. The dependence of Vth on Nit is given by [Schroder and Babcock, 2003]:
$$ V_{th} \propto -\frac{q N_{it}(\Phi_S)}{C_{ox}} \tag{3.4} $$
Cox is the oxide capacitance and ΦS is the surface potential. Increasing Nit increases the absolute value of Vth. Other device parameters also change as a consequence of the Vth shift:
Figure 3.3.: Output characteristic (Id [A] over Vds [V]) of a PMOS transistor for altered values of ∆Vth (|∆Vth| = 0, 33, 66 and 100 mV).
The drain current Id is important for the performance of digital circuits and the
transconductance gm is relevant for analog circuits. Figure 3.3 shows the output char-
acteristic of a PMOS transistor for altered values of ∆Vth .
Unfortunately, the reaction diffusion theory is not able to explain all properties of
NBTI. The RD theory cannot model the temporal behavior of the recovery effect, the
bias dependence of the recovery effect, and the dependency of the parameter drift on
the duty cycle of the signal at the gate terminal [Grasser et al., 2009].
One attempt to explain this extends the RD model by a second component [Islam et al., 2007]: besides the creation of interface states, hole trapping might be responsible for the threshold voltage drift as well. The holes are trapped by already existing traps in the oxide. Another explanation is a two-stage model based on E’ centers [Grasser et al., 2009]. E’ centers are a well-known defect in SiO2 oxides. In the first stage, the E’ centers are charged and discharged, which explains the recovery effect. In the second stage, a dangling bond can be created at the Si/SiO2 interface by a positively charged E’ center.
Modeling of NBTI
To compute the threshold voltage drift for NBTI, degradation equations from an industry partner are used:
$$ \Delta V_{th} = A \cdot \exp\left(\frac{E_a}{k_B \cdot T}\right) \cdot V_{gs}^{\,b} \cdot t_{stress}^{\,n} \cdot \frac{1+C}{W} \tag{3.7} $$
The drift depends on the temperature T, the gate-source voltage Vgs, the time tstress during which the transistor is in NBTI stress mode, and the transistor width W. A, Ea, kB, b, n and C are constants. The time dependence (n) is shown in Figure 3.4. Reported values for n in the literature are between 0.15 and 0.30 [Massey, 2004]. This range is consistent with H diffusion (n = 1/4) as well as with H2 diffusion (n = 1/6). ∆Vth increases monotonically with time (without taking recovery into account). For an aging-aware timing analysis, this means that it is sufficient to verify that a circuit is fast enough at the end of the specified lifetime. Due to the power law, the drift increases very fast at the beginning and then slows down. Suppose n is 0.25: if a certain threshold voltage drift is reached after a time t1, it takes 16 · t1 (= 2^(1/n) · t1) to reach a threshold voltage drift twice as high.
[Figure 3.4: ∆Vth [mV] over lifetime [y]]
[Figure 3.5: ∆Vth over temperature T [°C]]
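The argument about the power law can be checked with a short sketch: if ∆Vth is proportional to tstress^n and all other stress conditions stay fixed, a drift k times as large is reached after k^(1/n) times the stress time. The helper below is hypothetical; it evaluates this for n = 0.25 and for the ends of the range reported by Massey [2004].

```python
def time_for_drift_ratio(t1: float, ratio: float, n: float) -> float:
    """Stress time after which a power-law drift dVth ~ t**n reaches `ratio`
    times the drift observed at t1 (all other stress conditions unchanged)."""
    if t1 <= 0 or ratio <= 0 or n <= 0:
        raise ValueError("t1, ratio and n must be positive")
    return t1 * ratio ** (1.0 / n)


print(time_for_drift_ratio(1.0, 2.0, 0.25))  # 16.0
# Ends of the range 0.15 to 0.30 reported by Massey [2004]:
for n in (0.15, 0.30):
    print(n, round(time_for_drift_ratio(1.0, 2.0, n), 1))
```

For n = 0.15, doubling the drift takes roughly a hundred times as long; for n = 0.30, about ten times as long.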
The temperature dependence (see Figure 3.5) is modeled by the Arrhenius equation. The reported values for the activation energy Ea vary between 0.1 and 0.36 eV [Massey, 2004]. The voltage dependence is given by a power law: the higher the gate-source voltage, the higher the electric field across the gate oxide and the resulting drift. For the drift, the temperature and voltage over the lifetime are decisive. From now on, they are referred to as effective temperature (Teff) and effective supply voltage (Veff), to distinguish them from the current temperature Tcurr and voltage Vcurr at the moment the circuit is analyzed. The current values of temperature and voltage define the sensitivities, as shown in Section 3.2.1.
Figure 3.6.: Transistor width dependence (∆Vth [V] over width [µm]). Marked are the minimal transistor widths used in the standard cell libraries.
During the homogeneous stress mode for NBTI, only a vertical electric field exists and no lateral field. The creation of interface states is uniformly distributed over the whole gate area, so no dependence on transistor sizes should be observable. A dependence on transistor length for very short transistors is reported in the literature [Massey, 2004], but it is not modeled in the degradation equations. However, the degradation equations do model a transistor width dependence for small transistors. Edge effects are assumed to be responsible for this dependence on transistor sizes. Figure 3.6 shows the transistor width dependence for different technologies; marked are the minimal transistor widths used in the standard cell libraries. For some technologies (65 nm LP), the transistor width actually affects the drift, whereas for other technologies (120 nm, 90 nm) the minimal transistor width used in the standard cell library is too large to have a significant effect on the transistor drift.
NBTI strongly depends on the process technology as well. Manufacturing steps that
have an impact on NBTI drift are, for instance, concentration of hydrogen, deuterium
and nitrogen in the oxide, the gate material, and initial quality of the Si/SiO2 interface
[Schroder and Babcock, 2003].
NBTI is a statistical process [Schlünder et al., 2011]. A Si-H bond is broken with a
certain probability. Hence, the threshold voltage drift for defined stress parameters is
a probability distribution. However, the degradation equations just provide the mean
value for the drift. Rauch III [2002] shows that the sigma of the threshold voltage drift
is dependent on the transistor area:
$$ \sigma(\Delta V_{th}) \propto \frac{1}{\sqrt{W \cdot L}} \tag{3.8} $$
It has also been shown that ∆Vth due to aging and ∆Vth due to process variation are uncorrelated [Fischer et al., 2008].
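Equation 3.8 and the missing correlation between aging-induced and process-induced ∆Vth can be combined in a small sketch: the aging-induced spread is scaled from a reference device with the inverse square root of the gate area, and, because the two contributions are uncorrelated, their variances add. The reference spread, the dimensions and the function names are hypothetical, and the combination assumes that only these two uncorrelated contributions are present.

```python
import math


def sigma_aging(sigma_ref: float, area_ref: float, w: float, l: float) -> float:
    """Std. dev. of the aging-induced dVth scaled with gate area per Eq. (3.8),
    sigma ~ 1/sqrt(W*L), relative to a reference device (sigma_ref, area_ref)."""
    if min(area_ref, w, l) <= 0:
        raise ValueError("areas and dimensions must be positive")
    return sigma_ref * math.sqrt(area_ref / (w * l))


def sigma_total(sigma_aging_mv: float, sigma_process_mv: float) -> float:
    """Combined spread of two uncorrelated dVth contributions: variances add."""
    return math.hypot(sigma_aging_mv, sigma_process_mv)


# Hypothetical numbers: 4 mV spread for a 1 um x 1 um reference device.
print(sigma_aging(4.0, 1.0, 2.0, 2.0))  # 2.0 (four times the area, half the sigma)
print(sigma_total(3.0, 4.0))            # 5.0
```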
NBTI² is the only aging effect that shows a recovery effect. In the RD model, recovery can be explained by the second term in Equation 3.1, which describes the reverse annealing of Si-H bonds. There is no consensus about whether the complete drift recovers or a permanent part remains [Massey, 2004]. What is understood is that recovering a certain amount of drift takes substantially longer than generating it. Grasser et al. [2009] report a ratio of degradation to recovery of 2.5:1 on a logarithmic timescale. This means, for instance, that when a threshold voltage drift is generated at 25 mV/decade, the recovery has a slope of 10 mV/decade.
The recovery effect makes it more difficult to characterize NBTI and complicates the analysis of a circuit as well. To extract the constants for the degradation equation, single transistors are stressed under defined conditions and the resulting drifts are measured. Before the drift can be measured, the stress has to be removed. Reisinger et al. [2007] argue that a conventional measurement setup takes up to 1 s to obtain the threshold voltage drift. Hence, the transistor has up to 1 s to recover before the drift is measured. Reisinger’s proposed on-the-fly measurement takes only 1 µs, and it is shown that 50 % of the drift already recovers in the interval between 1 µs and 1 s. How much of the drift recovers before 1 µs is unknown. Although 1 µs already seems sufficiently fast, the recovery time in a circuit operated at 1 GHz might be only 1 ps. Hence, the error between the real drift value and the measured, already recovered value might be larger than 50 %.
The degradation due to NBTI is frequency independent, but it strongly depends on the duty cycle of the signal at the transistor gate. NBTI is a static aging effect: the drift is determined by the portion of the lifetime during which the gate voltage is negative with respect to source and drain, and not by the number of signal transitions (frequency). Although the degradation is frequency independent, a substantial difference between DC and AC stress is observed [Massey, 2004]. This is due to the recovery effect. Under DC stress, the drift cannot recover and increases monotonically. Under AC stress, the drift can recover in between the stress phases. This results in a sawtooth curve for the drift over time, as depicted in Figure 3.7. Because the drift builds up faster than it recovers, the mean drift increases monotonically.
Figure 3.8(a) shows the dependence of the drift on the stress-duty-cycle as modeled
by the degradation equations. For a stress-duty-cycle of 100 %, the transistor is con-
stantly stressed (DC stress) and the drift is maximal. For a stress-duty-cycle of 0 %, the
² Except for its counterpart, positive bias temperature instability (PBTI).
[Figure 3.8: ∆Vth over stress duty cycle [%]]
³ If the workload is taken into account.
DAHC and CHC are the two major mechanisms and are discussed further in the following.
again cause impact ionization (avalanche multiplication). Some generated carriers are
injected into the oxide or damage the interface. DAHC is maximal for Vds = 2 · Vgs .
Modeling of HCI
HCI damage can be modeled by an increase of the absolute value of the threshold voltage Vth and a decrease of the mobility µ0 [Strong et al., 2009]. The degradation equations used in this thesis provide the reduction of the drain saturation current Ion as a percentage:
$$ \Delta I_{on} = \frac{I_{on,fresh} - I_{on,aged}}{I_{on,fresh}} = A \, e^{\frac{E_a}{k_B \cdot T}} \cdot V_{ds}^{\,b} \cdot t_{stress}^{\,n} \cdot L^{-m} \tag{3.9} $$
∆Ion depends on Teff, the effective drain-source voltage (Vds), the stress time (tstress) and the transistor length (L). Figure 3.11 shows the supply voltage, temperature, and lifetime dependence of HCI. The dependence on supply voltage and lifetime follows a power law. To determine the time the transistor is stressed, a duty factor DF is given: the stress time tstress is tlife/DF, so a DF of 100 means that the transistor is stressed for 1/100 of its lifetime. Reported values for n in the literature are 0.25 for PMOS and 0.5 for NMOS transistors. Furthermore, a negative temperature dependence is reported; hence, HCI is the only aging effect that gets worse when the temperature is decreased. This is explained by an increase of the mean free path of the hot carriers. However, the degradation equations used in this thesis show almost no temperature dependence for NMOS transistors and a positive one for PMOS transistors.
[Figure 3.11: ∆ION [%] over supply voltage [V], temperature T [°C] and lifetime [y]]
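The duty factor and the time exponent can be illustrated with a sketch based on Equation 3.9. It assumes that temperature, Vds and L are identical for the compared cases, so that A, the exponential term and the voltage and length factors cancel in a ratio; the exponents 0.5 (NMOS) and 0.25 (PMOS) are the literature values quoted above, not necessarily those of the degradation equations used in this thesis, and the function names are hypothetical.

```python
def hci_stress_time(t_life: float, duty_factor: float) -> float:
    """HCI stress time t_stress = t_life / DF; DF = 100 means 1/100 of the lifetime."""
    if duty_factor <= 0:
        raise ValueError("duty factor must be positive")
    return t_life / duty_factor


def hci_lifetime_scaling(t_life_new: float, t_life_old: float, n: float) -> float:
    """Ratio of dIon after two lifetimes for the power law dIon ~ t_stress**n.
    At equal DF, temperature, Vds and L, the DF and all prefactors cancel."""
    return (t_life_new / t_life_old) ** n


ten_years_h = 10 * 365 * 24               # 87600 h, ignoring leap days
print(hci_stress_time(ten_years_h, 100))  # 876.0 (hours of HCI stress)
# Doubling the lifetime with the literature exponents for NMOS and PMOS:
print(round(hci_lifetime_scaling(20, 10, 0.5), 3))   # 1.414
print(round(hci_lifetime_scaling(20, 10, 0.25), 3))  # 1.189
```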
Unlike ∆Vth, ∆Ion is not a parameter of the transistor model. Hence, ∆Ion cannot be used directly to simulate a degraded transistor. However, there is an equivalent circuit for a transistor degraded due to HCI (see Figure 3.12(a)), which is used to simulate an aged transistor on circuit level. It maps ∆Ion onto a threshold voltage drift ∆Vth and a mobility degradation ∆µ0. ∆Vth is realized by a voltage source VDeg, and a current-controlled current source IDeg is responsible for the mobility degradation. The values of VDeg and IDeg depend on ∆Ion.
Figure 3.12.: (a) HCI equivalent circuit for a degraded transistor. VDeg = f(∆Ion) and IDeg = f(id, ∆Ion) depend on ∆Ion. (b) Output characteristic (Id [A] over Vds [V]) of an NMOS transistor for altered values of ∆Ion (0 %, 5 %, 10 % and 20 %).
The simplest logic gate is the inverter. Its pull-up and pull-down networks each consist of a single transistor (see Figure 3.13).
For NBTI, the gate terminal of the PMOS transistor has to be negatively biased with respect to source and drain. Therefore, a logic “0” is applied to the gate input. In this case, the gate-source voltage Vgs is −VDD, the transistor is in inversion, and the channel is conducting. Hence, the drain of the transistor is charged to VDD as well (Vds = 0 V). Whenever a logic “0” is applied, the PMOS transistor degrades due to NBTI. NBTI is frequency independent, and Kumar et al. [2006] have shown that any arbitrary signal can be converted into a periodic signal that causes the same NBTI drift as the original signal. Hence, it is enough to know the portion of the lifetime a signal is at logic “0”. This can be expressed by 1 − SP. The static signal probability (SP) is a statistical signal property defined as the average fraction of time a signal is at logic “1”.
For more complex pull-up networks, it is more difficult to determine the time a transistor is stressed due to NBTI. The NOR gate in Figure 3.14 has two PMOS transistors connected in series, called a stack. For transistor MPB, the condition is the same as for the single transistor of an inverter (logic “0” at input B). For transistor MPA, again a logic “0” at the gate terminal is required, but that is not sufficient. To have the source of this transistor connected to VDD, a logic “0” has to be applied to input B as well [Kumar et al., 2007b]. Hence, whenever MPA is stressed due to NBTI, MPB is stressed as well, and the Vth drift of MPB is always equal to or larger than the Vth drift of MPA. A formal method to calculate the portion of the lifetime a transistor is stressed, depending on the signal probabilities at the inputs and the internal gate structure, is derived in Section 4.3.3.
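For the NOR2 stack, the stress conditions described above can be turned into stress probabilities. The following is only a sketch: it assumes statistically independent inputs, which is a simplification, and it is not the formal method of Section 4.3.3; the function name and the signal probabilities are hypothetical.

```python
def nor2_nbti_stress(sp_a: float, sp_b: float) -> dict:
    """Fraction of lifetime each PMOS of a NOR2 pull-up stack is in NBTI stress.

    Stack as in Figure 3.14: VDD -> MPB -> MPA -> output Z.
    MPB is stressed whenever B = 0; MPA only when A = 0 and B = 0, because
    only then is its source connected to VDD through MPB.
    Illustrative simplification: inputs A and B are statistically independent.
    """
    for sp in (sp_a, sp_b):
        if not 0.0 <= sp <= 1.0:
            raise ValueError("signal probabilities must lie in [0, 1]")
    p_b_low = 1.0 - sp_b
    return {"MPB": p_b_low, "MPA": (1.0 - sp_a) * p_b_low}


stress = nor2_nbti_stress(sp_a=0.5, sp_b=0.2)  # hypothetical signal probabilities
print({name: round(p, 3) for name, p in stress.items()})  # {'MPB': 0.8, 'MPA': 0.4}
```

The result reproduces the qualitative statement above: the stress probability of MPB is never smaller than that of MPA.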
NBTI only affects PMOS transistors; hence, only the pull-up network is degraded. This increases the gate delay only for a falling input transition. The gate delay for a rising input transition degrades only indirectly: NBTI degrades the output slope as well, and the output slope serves as the input slope for succeeding gates. If the input slope degrades, the gate delay increases as well. Due to this, the delay of a succeeding gate for a rising input signal can increase as well.
For HCI, a strong lateral electric field is needed that accelerates the carriers in the channel. This is the case for the NMOS transistor of the inverter (see Figure 3.13) when a rising transition is applied to the inverter input. While the input signal is still at logic “0”, the NMOS transistor is non-conducting and the PMOS transistor is conducting. The output node (the common drain of both transistors) is at VDD, so the voltage drop and the electric field across the NMOS transistor are maximal. As soon as the NMOS transistor begins to conduct, hot electrons are generated, which damage the transistor.
Vgs of the NMOS transistor is equal to the input voltage and Vds is equal to the output voltage of the inverter. Hot carrier generation is maximal for a high Vds with Vgs = Vds for CHC and Vgs = 1/2 · Vds for DAHC. However, at least the conditions for CHC are never met for an inverter (or any other logic gate), because Vds has already started to decrease when Vgs is maximal (see waveforms in Figure 3.13). To account for the fact that the HCI stress in a logic gate differs from the DC stress of a single transistor, an empirical correction factor for the degradation equation is given. This correction factor is multiplied by the time the transistor is in stress; it reduces the stress time depending on the signal slopes of Vds and Vgs. The considerations above are valid for a PMOS transistor as well: a PMOS transistor degrades due to HCI for a falling input transition.
Consider again the PMOS transistor stack in Figure 3.14. For transistor MPA to degrade due to HCI, a falling transition at input A is required but not sufficient. A strong lateral electric field only exists if transistor MPB is in its conducting state as well, so that a conducting path from VDD to the output capacitance exists. Current flows through transistor MPA until the output capacitance is recharged, and the number of hot carriers is proportional to the drain current. The same considerations are true for transistor MPB in the stack. A formal method to calculate the portion of lifetime a transistor is stressed due to HCI, depending on the signal probabilities and transition densities at the gate inputs and the internal gate structure, is derived in Section 4.3.3.
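A similar sketch for HCI counts how often each PMOS of the NOR2 stack sees a stress event. It assumes independent inputs, uses the fact that rising and falling transitions of a binary signal alternate (so half of the transition density consists of falling transitions), and counts events only, ignoring how long each recharge lasts and the empirical slope correction factor. It is a hypothetical approximation with a hypothetical function name, not the method of Section 4.3.3.

```python
def nor2_hci_event_rates(sp_a: float, td_a: float, sp_b: float, td_b: float) -> dict:
    """Approximate rate of HCI stress events for the PMOS stack of a NOR2.

    A falling transition at one input stresses that input's PMOS only if the
    other PMOS conducts, i.e. if the other input is 0. For a binary signal,
    rising and falling transitions alternate, so half of TD is falling.
    Assumes independent inputs and counts events only (not their duration).
    """
    return {
        "MPA": 0.5 * td_a * (1.0 - sp_b),  # falling A while B = 0
        "MPB": 0.5 * td_b * (1.0 - sp_a),  # falling B while A = 0
    }


rates = nor2_hci_event_rates(sp_a=0.5, td_a=0.2, sp_b=0.25, td_b=0.1)  # TD per ns
print({name: round(r, 4) for name, r in rates.items()})  # {'MPA': 0.075, 'MPB': 0.025}
```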
Now the waveform in Figure 3.13 can be divided into regions in which a transistor is stressed due to NBTI or HCI. For a rising input transition (region 1), the NMOS transistor is degraded due to HCI. When the input signal is at logic “1” (region 2), the NMOS transistor would be stressed due to PBTI (PBTI is not considered here). For a falling input transition (region 3), the PMOS transistor is degraded due to HCI, and when the input signal is at logic “0” (region 4), the PMOS transistor is under NBTI stress. NBTI increases the gate delay and output slope for a falling input transition; HCI increases delay and output slope for both transitions.
Figure 3.15.: Fan-out-3 structure: All gates in the test structure are identical to the DUT. The voltage source generates a step function. To obtain a realistic input signal at the DUT, the step function has to propagate through two gates before reaching the DUT. These two gates and the DUT each drive three gates.
[Figure 3.16: (a) gate delay degradation over |∆Vth| [V] for VDD = 0.7, 0.9, 1.2 and 1.5 V; (b) gate delay degradation over VDD [V]]
degradation of the gate delay ∆delay is the change of the gate delay normalized to the gate delay without a parameter drift. Figure 3.16(b) depicts the degradation of the gate delay over the supply voltage for a ∆Vth of 100 mV. The lower the supply voltage, the larger the sensitivity.
The impact of temperature is much smaller (see Figure 3.17). Again, a lower temperature increases the sensitivity. As described in Section 3.1.1, it is important to distinguish between the current and the effective temperature and supply voltage. The current value defines the sensitivity, and the effective value, which is the value over the lifetime, defines the parameter drift. The worst case is a combination of high effective and low current temperature and supply voltage. This is, for instance, the case for a circuit with a high performance mode and a low power mode. If the circuit is operated for a long time in the high performance mode, the transistors experience a large parameter drift. If the circuit is then switched into the low power mode, it becomes very sensitive, and a large degradation of the circuit performance can be observed.
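The separation of effective and current conditions can be illustrated with the α-power law of Equation 2.21. With Vgs = VDD, the relative delay degradation is approximately α · ∆Vth/(VDD − Vth): the drift ∆Vth is fixed by the effective conditions, while the current supply voltage sets the sensitivity. This first-order model is not the simulation-based characterization used in this thesis, it ignores the temperature dependence, and the values for Vth, α and ∆Vth as well as the function name are hypothetical.

```python
def relative_delay_degradation(dvth: float, vdd_curr: float, vth: float,
                               alpha: float) -> float:
    """First-order relative delay degradation from Eq. (2.21) with Vgs = VDD:
    dd/d ~= alpha * |dVth| / (VDD - |Vth|).

    dvth follows from the effective (lifetime) conditions Teff/Veff, while
    vdd_curr is the current supply voltage that sets the sensitivity."""
    overdrive = vdd_curr - vth
    if overdrive <= 0:
        raise ValueError("current supply voltage must exceed |Vth|")
    return alpha * dvth / overdrive


# Same hypothetical drift, evaluated in a high-performance and a low-power mode.
for vdd in (1.2, 0.9):
    pct = 100 * relative_delay_degradation(0.05, vdd, vth=0.35, alpha=1.3)
    print(f"VDD = {vdd} V: {pct:.1f} % delay degradation")
```

With these numbers the same drift causes a noticeably larger relative degradation at 0.9 V than at 1.2 V, which qualitatively reproduces the trend of Figure 3.16(b).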
[Figure 3.17: (a) gate delay degradation over |∆Vth| [V] for T = −40 °C, 27 °C, 85 °C and 125 °C; (b) gate delay degradation over T [°C]]
[Figure 3.18: (a) INV, 90 nm, 1.2 V, 27 °C: ∆delay (falling input) [%] over |∆Vth| [V] for curves A, B, C and D; (b) 90 nm, 27 °C, 1.2 V, all PMOS with identical ∆Vth: INV, NAND2, NOR2 and NOR3]
Figure 3.18(a) shows that the driving strength of a cell has almost no effect on the sensitivity. The gate type, on the other hand, has an impact on the sensitivity (see Figure 3.18(b)). For the NAND and NOR gates, it is assumed that all PMOS transistors have the same threshold voltage drift. The NOR gates degrade much more strongly than the NAND gate and the inverter for the same ∆Vth. This is caused by the stacked PMOS transistors in a NOR gate: for a falling input signal, the output load is recharged over two (NOR2) or three (NOR3) degraded PMOS transistors.
In Figures 3.19(a) and 3.19(b), the process corner (fast, nominal, slow) and the transistor type (low Vth, regular Vth, or high Vth) are altered. Both have only a minor impact on the sensitivity.
To determine the impact of input slope and output load, a single gate is simulated while the input slope and output load are altered. Figure 3.20(a) gives the sensitivity for four different slope/load combinations (slow/fast input slope and small/large output load). The sensitivity stays almost constant except for the case with a slow input slope and a small output load, which shows a much higher sensitivity. Figure 3.20(b) depicts the degradation of the gate delay for a ∆Vth of 100 mV over the range of characterized input slope and output load pairs.
Besides the gate delay, the impact on the output slope is investigated as well. Figures 3.21(a) and 3.21(b) show the impact of supply voltage and temperature, respectively. The degradation of the output slope and the degradation of the gate delay (in percent) are about the same.
[Figure 3.21: output slope degradation over |∆Vth| [V] for (a) VDD = 0.7, 0.9, 1.2 and 1.5 V and (b) T = −40 °C, 27 °C, 85 °C and 125 °C]
[Figure 3.22: sensitivity to HCI over ∆ION [%] for (a) VDD = 0.9, 1.2 and 1.5 V and (b) T = −40 °C, 27 °C, 85 °C and 100 °C]
is performed. This time, ∆Ion is varied, which changes the values of the voltage source and the current-controlled current source of the equivalent circuit.
Figure 3.22 shows the dependence of the sensitivity on supply voltage and temperature. It is similar to the corresponding dependence for NBTI. Therefore, no further dependencies are given for HCI. Comparing the degraded transistor characteristics for NBTI and HCI (see Figures 3.3 and 3.12(b)) shows that both aging effects have a similar impact on the transistor characteristics. This explains their similar impact on the sensitivities.
[Figure: master-slave flip-flop (MSFF) with data input D, output Q, inverters IV1 to IV8 and transmission gates TG1, TG2, driven by the clock CLK and its derived phases CN and CP]
[Figure 3.24: (a) comparison of setup time and delay degradation, (b) comparison of hold time and delay degradation, both in % over |∆Vth| [V]; curves for individually degraded PMOS transistors (PMOS_IV1, PMOS_IV2, PMOS_IV8, PMOS_TG1, PMOS_TG2), for all PMOS transistors degraded, and for an inverter]
[Figure 3.25: a long path ending at D1 and a short path ending at D2 between flip-flops clocked by Clk; setup time tSUP and hold time tHLD relative to the clock edge]
From this study it can be seen that the sensitivities can be linearized well and that the degradation of tSUP and tHLD is of the same order as the degradation of gate delays. This has the following implications for the timing behavior of a sequential circuit:
• For a long timing path (e.g., the path ending at D1 in Figure 3.25), the setup time constraint is relevant. It is violated if the data signal arrives at the receiving FF later than tSUP before the capturing clock edge. Due to aging, the gate delays along the data path degrade and, therefore, the path delay increases. Whether tSUP increases or decreases depends on which transistors degrade the most. If tSUP decreases, this partially compensates for the slower data path. If tSUP increases, the timing problem due to the slower data path is amplified. However, a long data path consists of many gates, and the degradation of a single gate delay is approximately as large as the setup time degradation. For the investigated MSFF, tSUP is of the same order of magnitude as the delay of combinational gates (several tens of picoseconds). Hence, the degradation of tSUP plays a minor role compared to the accumulated degradation of the gate delays along the path.
• For a short timing path (e.g., the path ending at D2 in Figure 3.25), the hold time constraint is relevant. It has to be ensured that the data signal at the MSFF does not change within the hold time after the clock edge. This time, the data path consists of only a few gates or even none at all. If the path consists of a few gates, the degradation of the path delay and the hold time degradation can cancel each other out. This is not the case when there are no gates along the path. For the investigated MSFF, however, the nominal hold time is only a few picoseconds. This means that even a degradation by 150 % (as seen in Figure 3.24(b)) does not change the absolute value of the hold time much.
This argument shows that, for timing verification, modeling the gate delay degradation is more important than modeling the degraded setup and hold times. A long timing path consists of many gates, and the degradation of a single gate is comparable to the setup time degradation. For a short timing path without any gates, the degradation of the hold time can be relevant, but not for the investigated MSFF, because its hold time is only a few picoseconds.
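The setup-time argument above can be made concrete with a few lines of arithmetic. The sketch below uses hypothetical numbers (20 gates of 30 ps, a 30 ps setup time, a 700 ps clock period and a uniform 10 % degradation of every delay and of tSUP); only the orders of magnitude follow the text, and clock skew and clock-to-Q delay are ignored.

```python
# Illustrative numbers only (hypothetical, not taken from the thesis).

def setup_slack(t_clk, gate_delays, t_setup):
    """Setup slack of a flip-flop-to-flip-flop path.

    Clock skew and clock-to-Q delay are ignored to keep the example small.
    """
    return t_clk - (sum(gate_delays) + t_setup)


T_CLK = 700e-12          # clock period [s]
N_GATES = 20             # gates on the long data path
GATE_DELAY = 30e-12      # fresh delay per gate [s]
T_SETUP = 30e-12         # fresh setup time [s], same order as a gate delay
REL_DEGRADATION = 0.10   # assumed uniform aging degradation

fresh_slack = setup_slack(T_CLK, [GATE_DELAY] * N_GATES, T_SETUP)
aged_slack = setup_slack(
    T_CLK,
    [GATE_DELAY * (1 + REL_DEGRADATION)] * N_GATES,
    T_SETUP * (1 + REL_DEGRADATION),
)

# Slack lost through the combinational gates vs. through the setup time.
path_loss = N_GATES * GATE_DELAY * REL_DEGRADATION
setup_loss = T_SETUP * REL_DEGRADATION

print(f"fresh slack: {fresh_slack * 1e12:.1f} ps, aged slack: {aged_slack * 1e12:.1f} ps")
print(f"loss from gates: {path_loss * 1e12:.1f} ps, loss from setup time: {setup_loss * 1e12:.1f} ps")
```

With these assumptions, the gates account for 60 ps of lost slack and the setup time for only 3 ps, which mirrors the reasoning that the gate delay degradation dominates for long paths.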
Leakage power dissipation (Pleakage): Leakage power originates from the leakage currents Ileakage that flow through a transistor in the off-state. For sub-100 nm technologies, the gate tunneling current and the subthreshold leakage current are the two dominant components [Piguet, 2005]. The gate tunneling current strongly depends on the oxide thickness, whereas the subthreshold leakage current depends, amongst other factors, on the threshold voltage.
Pswitching and Pshort−circuit together form the dynamic power dissipation, and Pleakage is also known as static power consumption. For the fan-out-3 test structure, Pshort−circuit is responsible for about 10 % of the dynamic power. A slower input transition or a smaller output load would increase the share of Pshort−circuit. To investigate the impact of aging on these components of power dissipation, Vth of the PMOS transistors is increased. Pswitching does not depend on the threshold voltage and stays constant. The same is true for the gate tunneling current. The subthreshold current, in contrast, depends exponentially on the gate-source voltage Vgs and the threshold voltage Vth.
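The exact subthreshold model used here is not restated in this section. A common textbook approximation is Isub ∝ exp((Vgs − Vth)/(n·kT/q)). Under that assumption, with Vgs and Vds held fixed as in Figure 3.26(b), the leakage of a degraded PMOS transistor relative to the fresh one is exp(−|∆Vth|/(n·kT/q)). The slope factor n = 1.5 is an assumed value, so the printed numbers are only indicative and need not match the simulated curve.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant [J/K]
Q_E = 1.602176634e-19   # elementary charge [C]


def thermal_voltage(temp_c):
    """Thermal voltage kT/q in volts for a temperature in degrees Celsius."""
    return K_B * (temp_c + 273.15) / Q_E


def relative_subthreshold_current(delta_vth, n=1.5, temp_c=27.0):
    """I_sub(Vth0 + |dVth|) / I_sub(Vth0) at fixed Vgs and Vds.

    Assumes the textbook model I_sub ~ exp((Vgs - Vth) / (n * kT/q)),
    in which the Vgs- and Vds-dependent factors cancel in the ratio.
    The slope factor n = 1.5 is an assumed, illustrative value.
    """
    return math.exp(-abs(delta_vth) / (n * thermal_voltage(temp_c)))


for dv in (0.0, 0.02, 0.05, 0.1):
    print(f"|dVth| = {dv * 1e3:5.1f} mV -> "
          f"I_sub = {100 * relative_subthreshold_current(dv):5.1f} % of fresh value")
```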
Figure 3.26.: (a) Change of Pshort−circuit [%] of an inverter (90 nm, Vnom, 27 °C) when Vth is altered; Pshort−circuit decreases for both rising and falling input transitions. (b) Subthreshold current Ileakage [%] of a PMOS transistor (90 nm, Vgs = 0 V, Vds = 1.2 V, 27 °C) for altered ∆Vth values.
Figure 3.27.: Vertical electrical field over technologies (130 nm to 45 nm) at nominal supply voltage; the axis is scaled by 10^8.
the gate oxide thickness. Low-power technologies have a thicker gate oxide, which results in lower leakage currents.
The prevailing view in the literature is that the Vth drift due to NBTI increases with newer technologies (e.g., see [Strong et al., 2009; Huard et al., 2009]). This is due to the strong dependence of the drift on the vertical electrical field. The electrical field increases because the transistor dimensions are scaled more aggressively than the supply voltage. Figure 3.27 shows the increasing electrical field, calculated from data in the International Technology Roadmap for Semiconductors [ITRS, 2001, 2009]. The vertical electrical field is given by:
$$E_\mathrm{vertical} = \frac{V_\mathrm{nom}}{t_\mathrm{ox}} \tag{3.14}$$
Vnom is the predicted nominal supply voltage for a technology and tox is the corresponding physical oxide thickness. However, the degradation equations for the technologies do not show such a clear picture. Comparing Figure 3.28(a) and Figure 3.28(b) reveals a correlation between drift and Vnom: the 120 nm technology with a Vnom of 1.5 V has the largest drift, followed by 90 nm, 65 nm LP and 45 nm LP (Vnom = 1.2 V). The 65 nm HP technology (Vnom = 1.0 V) shows the smallest drift over the lifetime. If the drifts over the lifetime are calculated for a VDD of 1.2 V for all technologies, the difference between the technologies (see Figure 3.28(b)) is less than 10 mV.
All five technologies still use an SiO2 gate dielectric; hence, the impact of high-k metal gates is not yet considered. With high-k metal gates, the gate dielectric becomes thicker again, which reduces the electrical field. However, it is observed that NMOS transistors experience a Vth drift due to PBTI of the same order of magnitude as the drift of PMOS transistors due to NBTI.
In recent years, research has focused on the NBTI effect, but Huard et al. [2009] argue that HCI is no longer negligible due to a continuous increase of the lateral field since the 120 nm technology node. As can be seen in Figure 3.29(a), the lateral electrical field also increases with newer technologies (Elateral = Vnom/Lmin, with Lmin being the minimal gate length).
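Equation 3.14 and the lateral field Elateral = Vnom/Lmin are simple ratios. The sketch below evaluates both; the oxide thickness (2 nm) and channel length (90 nm) are hypothetical example values, not ITRS data.

```python
def vertical_field(v_nom, t_ox):
    """E_vertical = V_nom / t_ox (Equation 3.14), in V/m."""
    return v_nom / t_ox


def lateral_field(v_nom, l_min):
    """E_lateral = V_nom / L_min, in V/m."""
    return v_nom / l_min


# Hypothetical node parameters, chosen only to illustrate the arithmetic.
e_vert = vertical_field(1.2, 2.0e-9)   # 1.2 V across 2 nm of oxide, about 6e8 V/m
e_lat = lateral_field(1.2, 90e-9)      # 1.2 V along a 90 nm channel

print(f"E_vertical = {e_vert:.2e} V/m, E_lateral = {e_lat:.2e} V/m")
```

Lowering Vnom with an unchanged tox reduces both fields proportionally, which is why the comparison at a common VDD of 1.2 V narrows the differences between technologies.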
Figure 3.29(b) shows the ∆Ion drift for PMOS and NMOS transistors due to HCI as calculated with the degradation equations. The PMOS transistors degrade more strongly than the NMOS transistors and show a clear technology trend: their drift increases with newer technologies. The only exception is the 45 nm LP technology, whose drift is smaller than that of the 65 nm technologies.
Figure 3.28.: Transistor drifts due to NBTI for different technologies at nominal supply voltage (a) and at a supply voltage of 1.2 V (b). [Both panels: PMOS, 125 °C, W = 10 µm, Lmin; |∆Vth| in % of Vth0 over lifetime [y]; technologies 120 nm, 90 nm, 65 nm LP, 65 nm HP, 45 nm LP]
Figure 3.29.: (a) Lateral electrical field over technologies at nominal supply voltage. (b) Transistor drifts ∆ION [%] due to HCI over lifetime [h] for PMOS and NMOS transistors and for different technologies at nominal supply voltage.
However, the parameter drift tells only half the story; it is equally important how the sensitivities evolve across technologies. Figure 3.30(a) shows the sensitivity of the gate delay with respect to a Vth drift. The 120 nm technology, with a nominal supply voltage of 1.5 V, has the lowest sensitivity, and 65 nm HP (1.0 V nominal supply voltage) has the largest. The other technologies have a nominal supply voltage of 1.2 V and lie in between. Hence, the sensitivities show the opposite behavior to the drifts: the 65 nm HP technology, for instance, has the smallest drifts but the highest sensitivity.
To compare the degradation of the gate delay for these five technologies, seven use profiles from the business units of an industry partner were chosen. A use profile specifies the operating conditions a circuit must be able to sustain during its lifetime. It consists, among other parameters, of a specified lifetime, a maximal supply voltage and a temperature profile. The temperature is either given by a mean value, by intervals or
[Figure 3.30: sensitivity of the gate delay to a Vth drift for the 120 nm, 90 nm, 65 nm LP, 65 nm HP and 45 nm LP technologies; (a) over ∆Vt [V], (b) as ∆delay/∆Vth [%/100mV] per technology node]
Figure 3.31.: Degradation of inverter delay for different technologies and use profiles.
3.4. Summary
For an aging-aware timing analysis, the relevant aging effects are those that cause a parameter drift. The two most severe drift-related aging effects nowadays are NBTI and HCI. To determine the degradation caused by an aging effect, both the parameter drifts due to that effect and the sensitivity of a gate performance with respect to a parameter drift have to be considered.
The physical mechanism behind NBTI is not yet completely understood. NBTI is best modeled by an increased Vth of the PMOS transistors. A special characteristic of NBTI is that the Vth drift recovers when the transistor is no longer stressed. NBTI strongly depends on the supply voltage. As soon as high-k metal gates are used, PBTI must also be considered, because for such gates PBTI shows a drift in the
4. Aging-aware static timing analysis
To perform an aging-aware TA at gate level, a gate model is required that provides the aged gate delay instead of the fresh one. This is the main difference compared to a traditional STA without aging. The proposed aging-aware gate model is called AgeGate [Lorenz et al., 2009a] and has the following advantages compared to the state-of-the-art approaches discussed in Section 2.2.2:
Analyzing impact of NBTI and HCI: The proposed aging-aware gate model is not limited to one aging effect; it analyzes the combined impact of NBTI and HCI. Of the previously introduced aging-aware gate models, all except the LUT-based gate model in [Chen et al., 2011] consider only one aging effect. The results of the proposed approach show that the mean degradation of the circuit delay is 10.1 % for NBTI and 3.2 % for HCI. Hence, HCI cannot be neglected, although NBTI is the dominant aging effect for the investigated 90 nm technology and the chosen operating conditions.
Individual parameter drifts: The single transistors of a gate degrade individually, because the time each transistor spends in stress mode differs depending on the workload at the gate inputs and the internal gate structure. A formal way to calculate individual parameter drifts for every transistor is developed. A canonical gate model is used that can consider the impact of the individual parameter drifts on the gate performances. The results show that the degradation is overestimated by 20 % if individual parameter drifts are not considered.
Degradation of the output slope: The proposed approach calculates not only an aged gate delay but also an aged output slope. As in a traditional STA, signal waveforms are modeled as ramps. The output slope of one gate determines the input slope of a succeeding gate, and this, in turn, impacts the delay of the succeeding gate. The results show that the degradation of the circuit delay is underestimated by 24 % when the fresh output slope instead of the aged output slope is used to calculate the gate delay.
Easy extensibility: The gate model considers two aging effects at the moment, but the
approach can easily be extended. The proposed approach is based on calculating
the transistor drifts and then computing the aged gate performances. For calcu-
lating the aged gate performances, the sensitivities of the gate performances with
respect to a parameter drift are required. Other aging effects that cause a drift
of transistor parameters can be taken into account if degradation equations are
available and the sensitivities for this new aging effect are characterized.
One effect that becomes important in technologies with a metal gate is positive bias temperature instability (PBTI). PBTI is the counterpart of NBTI and degrades NMOS transistors. Another effect that might become relevant in the future and must then be modeled is TDDB. In [Choudhury et al., 2010] it is shown that TDDB also degrades the transistor characteristics before a catastrophic breakdown occurs.
The degradation equations can also be replaced by more accurate ones to take the
recovery effect for NBTI into account.
The chapter is organized as follows: First the complete aging analysis flow is intro-
duced (Section 4.1), then it is explained how the workload can be determined (Sec-
tion 4.2). In Section 4.3 the proposed aging-aware gate model is explained. The charac-
terization of the standard cells is described in Section 4.4 and results for several bench-
mark circuits are given in Section 4.5.
1. The operating conditions over lifetime are specified by globally setting the supply voltage Veff and the temperature Teff. The approach could also be extended to take voltage drops and temperature gradients across a chip into account by assigning individual supply voltage and temperature values to every gate. That way, the accuracy of the aging analysis could be increased.
2. The workload at the gate inputs is required to calculate individual transistor drifts. The workload is defined by the gate input signals over lifetime and is characterized by two statistical parameters, signal probability (SP) and transition density (TD):
• SP and TD can be obtained by performing a logic simulation of the circuit. However, this requires typical input signals for the circuit and contradicts the fundamental idea of a static timing analysis, which is independent of input signals.
• If neither realistic input signals nor values for SP and TD at the primary inputs are available, a worst-case analysis can be performed. Worst-case values for SP and TD are specified and used for all nets of the circuit. Choosing 0 % for SP guarantees that all PMOS transistors are in inversion during the entire lifetime, so that the NBTI-induced degradation of the circuit is maximal. For TD, the specification of worst-case values is more difficult, because, due to the delay of the gates, a signal may change several times before settling to its static value (glitches).
3. After the operating conditions and the workload are determined, the aged gate performances can be calculated by modifying the Function update_edge from page 21 (see Function update_edge_aged on page 66). First, the stress probabilities of the single transistors of a gate are obtained. The stress probability is the percentage of the lifetime during which a transistor is stressed by a particular aging effect. Next, the parameter drifts of the single transistors are computed by means of degradation equations for NBTI and HCI. Finally, the aged gate delay is computed as the sum of the fresh gate delay and the gate delay degradation.
Function update_edge_aged(u, v)
    /* Recursive function to update the gate delay of an edge (u, v) */
    if dvalid((u, v)) == False then
        /* Update gate delay based on input slope and output load */
        slope = get_slope_from_node(u);
        load = get_load_from_node(v);
        stress_probabilities = get_Pstress();
        drifts = get_drifts(use_profile, stress_probabilities);
        dfresh((u, v)) = get_delay_from_LUT(slope, load);
        ∆d((u, v)) = get_degradation(slope, load, drifts);
        daged((u, v)) = dfresh((u, v)) + ∆d((u, v));
        dvalid((u, v)) = True
    end
    return daged((u, v))
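For experimentation, the Function update_edge_aged can be sketched in Python as below. This is a minimal illustration under assumptions: the helper functions (slope and load lookup, stress probabilities, drift and degradation models) are passed in as hypothetical stand-ins, the dvalid flag is replaced by a dictionary cache, and the recursion over predecessor edges, which a full STA would perform inside the slope lookup, is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

EdgeKey = Tuple[str, str]


@dataclass
class AgedEdgeCache:
    """Caches aged edge delays of a timing graph (plays the role of d_valid).

    All callables are hypothetical stand-ins for the helpers named in the
    pseudocode; a real flow would back them with LUTs and degradation models.
    """
    get_slope_from_node: Callable[[str], float]
    get_load_from_node: Callable[[str], float]
    get_p_stress: Callable[[EdgeKey], List[float]]
    get_drifts: Callable[[dict, List[float]], List[float]]
    get_delay_from_lut: Callable[[float, float], float]
    get_degradation: Callable[[float, float, List[float]], float]
    use_profile: dict
    d_aged: Dict[EdgeKey, float] = field(default_factory=dict)

    def update_edge_aged(self, u: str, v: str) -> float:
        key = (u, v)
        if key not in self.d_aged:  # equivalent to d_valid((u, v)) == False
            slope = self.get_slope_from_node(u)
            load = self.get_load_from_node(v)
            p_stress = self.get_p_stress(key)
            drifts = self.get_drifts(self.use_profile, p_stress)
            d_fresh = self.get_delay_from_lut(slope, load)
            delta_d = self.get_degradation(slope, load, drifts)
            self.d_aged[key] = d_fresh + delta_d
        return self.d_aged[key]


# Toy models, purely for illustration (not characterized data).
cache = AgedEdgeCache(
    get_slope_from_node=lambda node: 20e-12,   # 20 ps input slope
    get_load_from_node=lambda node: 5e-15,     # 5 fF output load
    get_p_stress=lambda edge: [0.5, 0.25],     # two transistors
    # Toy drift model: drift proportional to the stress probability.
    get_drifts=lambda profile, ps: [profile["dvth_at_full_stress"] * p for p in ps],
    get_delay_from_lut=lambda s, c: 10e-12 + 0.5 * s + 2e3 * c,
    get_degradation=lambda s, c, drifts: sum(100e-12 * dv for dv in drifts),
    use_profile={"dvth_at_full_stress": 0.05},  # volts
)
d = cache.update_edge_aged("a", "z")
```

A second call for the same edge returns the cached value without re-evaluating the models, which is the purpose of the dvalid flag in the pseudocode.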
4.2. Workload determination
Xakellis and Najm [1994] generate random signal vectors for the primary inputs that have the specified SP and TD. Then, logic simulation is used to obtain the signal waveforms at every gate input, and SP and TD of those signals are computed. This is repeated until a stopping criterion is met: at every iteration, a new mean value of SP and TD at every gate input is calculated, and the criterion is fulfilled when all mean values lie within a confidence interval specified in advance. Although this approach is very accurate, it is quite time-consuming.
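The simulation-based idea can be sketched as follows. This is a simplified illustration, not the original algorithm of Xakellis and Najm: input sequences with a given SP and TD (in transitions per clock cycle) are generated by a two-state Markov chain, a single gate is simulated with zero delay cycle by cycle, and batches are repeated until the confidence intervals of the mean SP and TD are narrower than a relative tolerance. The batch size, tolerance and seed are assumed values.

```python
import math
import random
import statistics


def markov_input(sp, td, n_cycles, rng):
    """0/1 sequence with signal probability sp and, on average, td transitions
    per clock cycle, generated by a two-state Markov chain."""
    if not 0.0 < sp < 1.0 or td > 2.0 * min(sp, 1.0 - sp):
        raise ValueError("need 0 < sp < 1 and td <= 2 * min(sp, 1 - sp)")
    p01 = td / (2.0 * (1.0 - sp))  # probability of a 0 -> 1 transition
    p10 = td / (2.0 * sp)          # probability of a 1 -> 0 transition
    x = 1 if rng.random() < sp else 0
    seq = []
    for _ in range(n_cycles):
        seq.append(x)
        flip = p01 if x == 0 else p10
        if rng.random() < flip:
            x = 1 - x
    return seq


def simulate_batch(gate, sps, tds, n_cycles, rng):
    """Zero-delay, cycle-based logic simulation of one gate; returns (SP, TD)."""
    inputs = [markov_input(sp, td, n_cycles, rng) for sp, td in zip(sps, tds)]
    out = [gate(*values) for values in zip(*inputs)]
    sp_out = sum(out) / n_cycles
    td_out = sum(a != b for a, b in zip(out, out[1:])) / (n_cycles - 1)
    return sp_out, td_out


def estimate_sp_td(gate, sps, tds, n_cycles=1000, rel_tol=0.02, z=1.96,
                   min_batches=10, max_batches=2000, seed=1):
    """Repeat simulation batches until the confidence intervals of the mean
    SP and TD have a relative half-width of at most rel_tol."""
    rng = random.Random(seed)
    sp_samples, td_samples = [], []
    for _ in range(max_batches):
        sp, td = simulate_batch(gate, sps, tds, n_cycles, rng)
        sp_samples.append(sp)
        td_samples.append(td)
        if len(sp_samples) >= min_batches and all(
            z * statistics.stdev(s) / math.sqrt(len(s)) <= rel_tol * statistics.mean(s)
            for s in (sp_samples, td_samples)
        ):
            break
    return statistics.mean(sp_samples), statistics.mean(td_samples), len(sp_samples)


def and2(a, b):
    return a & b


sp_y, td_y, n_batches = estimate_sp_td(and2, [0.5, 0.5], [0.5, 0.5])
```

For these parameters, each input is effectively an independent fair coin in every cycle, so the exact values are SP = 0.25 and TD = 2 · 0.25 · 0.75 = 0.375 transitions per cycle. Because both inputs may switch in the same cycle, this cycle-based TD need not match a continuous-time propagation formula.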
The following approaches are called probabilistic methods because they propagate
the statistical signal properties directly from the primary inputs into the circuit. The
approaches differ in how accurately they consider the spatial and temporal dependence
of signals.
Spatial and temporal dependence are defined as follows:
Spatial dependence: Two signals may depend on one another. For instance, two signals may never be logic “0” at the same point in time. Spatial dependence arises when a circuit has feedback (sequential circuits) and when a signal splits and reconverges again. In general, probabilistic methods assume spatially independent signals at the primary inputs.
Temporal dependence: The logic values of a signal at two points in time may be interdependent. A clock signal, for instance, is logic “1” during half a clock period and logic “0” in the succeeding half.
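$f_{x_1}$ and $f_{\bar{x}_1}$ are the cofactors of $f$ with respect to $x_1$. For a NAND gate ($y = \overline{x_1 \cdot x_2}$) with independent inputs, the signal probability at the output is:
$$SP(y) = 1 - SP(x_1) \cdot SP(x_2)$$
The transition density at the output of a gate with inputs $x_1$ to $x_n$ is:
$$TD(y) = \sum_{i=1}^{n} SP\!\left(\frac{\partial y}{\partial x_i}\right) \cdot TD(x_i) \tag{4.4}$$
$\partial y/\partial x$ is the Boolean difference ($\partial y/\partial x := y_x \oplus y_{\bar{x}}$), and the signal probability of the Boolean difference is the probability that the gate is sensitized and the transition at input $x_i$ is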
observed at the gate output. It is assumed that the signals at the inputs x1 to xn are
spatially independent.
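The propagation rules can be checked on a single gate. The sketch below evaluates SP by enumerating all input combinations (equivalent to applying the cofactor expansion for every input) and TD with the Boolean-difference formula, assuming spatially independent inputs as stated above. The enumeration is exponential in the number of gate inputs, which is acceptable for a standard cell but not for a whole circuit.

```python
from itertools import product


def signal_probability(f, sps):
    """SP of the Boolean function f for spatially independent inputs.

    Summing over all input combinations is equivalent to applying the
    cofactor (Shannon) expansion recursively for every input.
    """
    total = 0.0
    for bits in product((0, 1), repeat=len(sps)):
        weight = 1.0
        for bit, p in zip(bits, sps):
            weight *= p if bit else 1.0 - p
        total += weight * f(*bits)
    return total


def boolean_difference(f, i):
    """Return dy/dx_i = f|x_i=1 XOR f|x_i=0 as a function of all inputs."""
    def diff(*bits):
        high, low = list(bits), list(bits)
        high[i], low[i] = 1, 0
        return f(*high) ^ f(*low)
    return diff


def transition_density(f, sps, tds):
    """TD(y) = sum_i SP(dy/dx_i) * TD(x_i)."""
    return sum(signal_probability(boolean_difference(f, i), sps) * td
               for i, td in enumerate(tds))


def nand2(a, b):
    return 1 - (a & b)


sp_y = signal_probability(nand2, [0.5, 0.5])
td_y = transition_density(nand2, [0.5, 0.5], [1.0, 1.0])
```

For the NAND gate, the Boolean difference with respect to x1 is simply x2, so with SP = 0.5 and TD = 1 at both inputs the sketch yields SP(y) = 0.75 and TD(y) = 1.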
Alternatively, SP and TD at the nets can be computed directly from the statistical signal properties at the primary inputs instead of propagating SP and TD from the gate inputs to the gate outputs. In this case, no internal spatial independence has to be assumed; it is only assumed that the signals at the primary inputs are independent. A binary decision diagram (BDD) is used to express the logic function of a signal in terms of the primary inputs. The cofactors can then easily be calculated by following the true and the false branch of the particular node of the BDD. The Boolean difference can be computed with a BDD as well. Hence, Equations 4.1 and 4.4 can be used to compute SP and TD of a signal directly from SP and TD at the primary inputs.
The difference between propagating SP from the gate inputs to the gate outputs and computing SP directly from the primary inputs is illustrated by the following example (see Figure 4.2). All three primary inputs have a signal probability of 0.5 and a transition density of 1. SP and TD at the internal nets x and y are the same for both approaches:
When the BDD is used and SP(z) and TD(z) are computed directly from SP and TD at the primary inputs, the result is as follows:
The difference in SP(z) and TD(z) results from the fact that the first approach does not consider the spatial correlation caused by the reconvergent paths starting at b.
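The effect of reconvergent fanout can be reproduced on a tiny hypothetical circuit (not the circuit of Figure 4.2): input b fans out to x = a·b and y = b·c, which reconverge at z = x·y. Gate-by-gate propagation treats x and y as independent, whereas computing SP(z) from the primary inputs, as a BDD would, keeps the correlation through b. Here the exact value is obtained by plain enumeration instead of a BDD.

```python
from itertools import product

SP_IN = {"a": 0.5, "b": 0.5, "c": 0.5}


def and_sp(p, q):
    """SP at the output of a 2-input AND gate, assuming independent inputs."""
    return p * q


# Gate-by-gate propagation: x = a AND b, y = b AND c, z = x AND y.
sp_x = and_sp(SP_IN["a"], SP_IN["b"])
sp_y = and_sp(SP_IN["b"], SP_IN["c"])
sp_z_propagated = and_sp(sp_x, sp_y)  # wrongly treats x and y as independent


def z_of(a, b, c):
    """Logic function of z in terms of the primary inputs."""
    return (a & b) & (b & c)


# Exact SP(z) computed from the primary inputs (what a BDD would deliver),
# here by enumerating all input combinations.
sp_z_exact = 0.0
for bits in product((0, 1), repeat=3):
    weight = 1.0
    for bit, name in zip(bits, "abc"):
        weight *= SP_IN[name] if bit else 1.0 - SP_IN[name]
    sp_z_exact += weight * z_of(*bits)
```

Gate-by-gate propagation gives SP(z) = 0.0625, while the exact value is 0.125, because z reduces to a·b·c.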
Unfortunately, circuits of industrial complexity are too large to build a BDD for the entire circuit. For this reason, the statistical signal properties in this thesis are propagated from the gate inputs to the gate outputs, and the resulting accuracy penalty is accepted. In [Najm, 1993] a compromise is proposed: the circuit is partitioned and a BDD is generated for each partition. That way, the spatial correlation within a partition is kept and is lost only at the partition borders.
The canonical gate model provides the aged gate performances as a function of the parameter drifts of the single transistors. These drifts are calculated by technology-specific degradation equations. The workload has an essential impact on the parameter drifts, since it defines the fraction of the lifetime during which a transistor is actually stressed by a particular aging effect. To determine this impact, information about the internal gate structure is required.
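$$q_\mathrm{aged} = q_\mathrm{fresh} + \Delta q = q_\mathrm{fresh} + \sum_{m \in G} \sum_{p \in P} \chi^q_{m,p} \cdot \Delta p_m \tag{4.5}$$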
The aged gate performance (qaged) is the sum of the fresh gate performance (qfresh) and the degradation of the gate performance (∆q). G is the set of all transistors of the gate and P is the set of all parameters that drift due to aging effects. $\chi^q_{m,p}$ are the sensitivity coefficients and ∆pm is the drift of parameter p of a particular transistor m. The sensitivity coefficients are defined as:
$$\chi^q_{m,p} = \left.\frac{\partial q}{\partial \Delta p_m}\right|_{\Delta p_m = 0} \tag{4.6}$$
It is the partial derivative of q with respect to a drift ∆pm at the nominal parameter value (∆pm = 0). For the aged gate delay daged, this results in:
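$$d_\mathrm{aged} = d_\mathrm{fresh} + \sum_{m \in G} \left( \frac{\partial d}{\partial V_{th,m}} \cdot \Delta V_{th,m} + \frac{\partial d}{\partial I_{on,m}} \cdot \Delta I_{on,m} \right) \tag{4.7}$$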
The sensitivity coefficients ∂d/∂Vth,m and ∂d/∂Ion,m are obtained together with the fresh gate delay dfresh when the gate is characterized. The drifts ∆Vth,m and ∆Ion,m are computed during aging analysis by degradation equations. The aged output slope is modeled similarly to the aged gate delay.
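Equation 4.7 amounts to a sum of products of characterized sensitivities and computed drifts. The sketch below implements it for a single gate; the sensitivities and drifts in the example are hypothetical numbers chosen only to show the bookkeeping, and expressing ∆Ion as a relative change is an assumption of this sketch.

```python
def aged_delay(d_fresh, dd_dvth, dd_dion, dvth, dion):
    """Aged gate delay according to the linear canonical gate model (Eq. 4.7).

    d_fresh           fresh gate delay
    dd_dvth, dd_dion  per-transistor sensitivities dd/dVth,m and dd/dIon,m
    dvth, dion        per-transistor drifts dVth,m and dIon,m
    """
    if not len(dd_dvth) == len(dd_dion) == len(dvth) == len(dion):
        raise ValueError("need one sensitivity and one drift per transistor")
    degradation = sum(
        s_vth * d_vth + s_ion * d_ion
        for s_vth, s_ion, d_vth, d_ion in zip(dd_dvth, dd_dion, dvth, dion)
    )
    return d_fresh + degradation


# Hypothetical inverter (index 0: PMOS, index 1: NMOS). Vth drifts are given
# as magnitudes in volts, Ion drifts as relative changes (-0.01 = -1 %).
d = aged_delay(
    d_fresh=20e-12,
    dd_dvth=[60e-12, 50e-12],    # s per V
    dd_dion=[-15e-12, -15e-12],  # s per unit relative Ion change
    dvth=[0.03, 0.0],            # NBTI shifts only the PMOS
    dion=[-0.01, -0.02],         # HCI reduces Ion of both transistors
)
```

Because the model is linear, each transistor's contribution can be inspected separately; in this example, the PMOS Vth drift contributes 1.8 ps of the 2.25 ps total degradation.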
Figure 4.3 shows that only a small error is introduced by linearizing the dependence of the gate performance on a parameter drift. The degradation of an inverter delay for a drift of Vth and Ion is shown. The dependencies are both simulated at transistor level and calculated by means of the sensitivities. The comparison shows a good match up to a 10 % degradation of Ion and a 50 mV Vth drift. Such drift values are reached only under very demanding operating conditions over lifetime (10 y, 125 °C, and 110 % Vnom). Hence, the linearized sensitivities in the canonical gate model are justified. Should the parameter drifts become too large and the error of the linear model no longer be acceptable in future technologies, it is possible to move to a quadratic gate model as proposed in [Zhang et al., 2005] for statistical static timing analysis (SSTA).
The degradation equations have already been discussed in Sections 3.1.1 and 3.1.2. The drifts depend on the effective supply voltage over lifetime Veff, the effective temperature over lifetime Teff, the stress time tstress and the transistor sizes W and L.
The stress time tstress specifies for how long the transistor is stressed by an aging effect during the lifetime tlife. It can be expressed as:
$$t_\mathrm{stress} = P_\mathrm{stress} \cdot t_\mathrm{life}$$
Figure 4.3.: Degradation of inverter delay by ∆Ion and ∆Vth, respectively. Solid lines show dependencies calculated with sensitivities and dotted lines show dependencies simulated at transistor level. Analyzing conditions are 27 °C, 1.2 V and 15 pF capacitive load.
Pstress is the probability that a transistor is stressed during tlife. Pstress differs for the individual transistors of a gate: the individual stress probability depends on the workload at the gate inputs and on the internal gate structure.
The gate of a transistor must be negatively biased with respect to source and drain in order to degrade due to NBTI (see example in Figure 4.4). For transistor MPC, this is the case when a logic “0” is applied to input C. Hence, the stress probability of MPC depends only on the signal probability at C. For transistor MPB, on the other hand, a logic “0” must be applied not only to B but also to input C; otherwise, the gate is not negatively biased with respect to the source node of the transistor. Hence, the stress probability of MPB depends on the workload at inputs B and C and, in addition, on the internal structure of the gate [Kumar et al., 2007b], more precisely on the position of the transistor in the PMOS stack. The challenge in determining individual transistor drifts is to obtain the stress probabilities of all transistors of a gate from the values of SP and TD at the gate inputs and the internal gate structure.
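[Figure 4.4: three-input NOR gate with output Z; the PMOS transistors MPC, MPB and MPA are stacked from VDD to Z and driven by inputs C (SP = 0.5), B (SP = 0.4) and A (SP = 0.3); the NMOS transistors MNC, MNB and MNA are connected in parallel]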
For the calculation of the stress probability for HCI, which is introduced in Section 4.3.3, the probability that an NMOS transistor is conducting is needed as well. Therefore, a new probability Pon is introduced. For a PMOS transistor, Pon is the probability that the gate terminal is at logic “0”, given by 1 − SP at the gate terminal. For an NMOS transistor, Pon is the probability that the gate terminal is at logic “1”, which equals SP at the gate terminal. Hence, the probability of Condition A equals Pon of M:
$$P(A) = P_{on,M}$$
For NBTI, the gate terminal must be negatively biased with respect to its source and drain terminals. However, Condition B only considers the logic value at one of the two terminals. The reason is that when the transistor is conducting (Condition A fulfilled), a logic “1” at the source (drain) terminal is sufficient, since the drain (source) terminal will be charged to the same value.
Condition B is fulfilled if a conducting path exists between the supply voltage VDD and the source or drain terminal of the transistor M [Stempkovsky et al., 2009]. Hence, all PMOS transistors along the conducting path must have a logic “0” applied to their gate terminals as well. There might be multiple paths from VDD to the source or drain terminal of a transistor. In this case, P(B)i is calculated separately for every path PATH_NBTI,i:
$$P(B)_i = \prod_{t \in PATH_{NBTI,i}} P_{on,t}$$
PATH_NBTI,i is the set of all transistors along a conducting path. How those paths are determined is explained in Section 4.4.2. For independent signals at the gate inputs, the probabilities can simply be multiplied.
The overall probability P(B) is the probability that at least one path is conducting¹:
$$P(B) = 1 - \prod_i \left(1 - P(B)_i\right) \tag{4.13}$$
However, if the signals are dependent, this has to be taken into account when the probability of Condition B is calculated. To calculate P(B) for transistor MPA (see Figure 4.4), PATH_NBTI consists of MPB and MPC. Both gate terminals have a signal probability of 0.5. If the signals are independent, P(B) is 0.5 · 0.5 = 0.25. If the signals B and C are dependent (see Figure 4.5), it is not that easy to calculate P(B) for transistor MPA. In the first case (signals C and B), both signals are never logic “0” at the same time; hence, both transistors are never in inversion at the same time, and therefore P(B) is 0. In the second case (signals C and B′), the signals are always logic “0” at the same time, and P(B) is 0.5.
The larger the probability of Condition B, the larger the transistor drift and the resulting increase of the gate delay. A worst-case assumption for Condition B is that all transistors of a path tend to be in inversion at the same time. In this case, P(B)i is limited by the minimum of the probabilities Pon of all transistors in PATH_NBTI,i:
$$P(B)_i = \min_{t \in PATH_{NBTI,i}} \left(P_{on,t}\right), \quad \text{worst-case assumption if signals are dependent} \tag{4.14}$$
If there is more than one path, a worst-case assumption for Condition B is that only one path is conducting at a time, so that the probabilities P(B)i can simply be added:
$$P(B) = \min\left(\sum_i P(B)_i,\ 1\right), \quad \text{worst-case assumption if signals are dependent} \tag{4.15}$$
When the workload is to be considered in the aging analysis, a probabilistic method is used to compute the signal probabilities at the gate inputs. The probabilistic method assumes independent input signals, and the dependence of reconvergent signals is lost as well. Hence, the worst-case assumption is used in order to obtain a conservative result.
Finally, Pstress,NBTI is the probability that both Conditions A and B are fulfilled:
$$P_{stress,NBTI} = P(A \wedge B) = \begin{cases} P(A) \cdot P(B), & \text{if signals are independent} \\ \min\left(P(A), P(B)\right), & \text{if signals are dependent} \end{cases} \tag{4.16}$$
¹ This is 1 minus the probability that no path is conducting at all.
For Pstress,NBTI, too, it has to be taken into account whether independent signals are assumed or not.
For illustration, the probability Pstress,NBTI for transistor MPA in Figure 4.4 is calculated. For independent signals, Pstress,NBTI = (1 − 0.5) · (1 − 0.5) · (1 − 0.3) = 0.175. Otherwise, Pstress,NBTI is the minimum of the three Pon values of the transistors in the stack; hence, Pstress,NBTI = min(0.5, 0.5, 0.7) = 0.5.
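Equations 4.13 to 4.16 can be combined into a small routine. The sketch below is one possible implementation under the assumptions stated above (a list of Pon values per path from VDD, with an empty list for a transistor directly at VDD); it reproduces the two numbers of the MPA example. The function names are illustrative and not taken from the thesis.

```python
from math import prod


def p_on(sp, is_pmos=True):
    """Probability that a transistor conducts: 1 - SP for PMOS, SP for NMOS."""
    return 1.0 - sp if is_pmos else sp


def p_condition_b(paths, independent=True):
    """P(B): probability that at least one path from VDD to the transistor conducts.

    paths: one list of P_on values per path (an empty list means the
    transistor is directly connected to VDD).
    """
    if independent:
        per_path = [prod(path) for path in paths]           # product along each path
        return 1.0 - prod(1.0 - p for p in per_path)         # Eq. 4.13
    per_path = [min(path, default=1.0) for path in paths]    # Eq. 4.14
    return min(sum(per_path), 1.0)                           # Eq. 4.15


def p_stress_nbti(p_a, paths, independent=True):
    """Eq. 4.16: probability that Conditions A and B hold at the same time."""
    p_b = p_condition_b(paths, independent)
    return p_a * p_b if independent else min(p_a, p_b)


# Transistor MPA of the NOR example: SP(A) = 0.3, path through MPB and MPC
# whose gate inputs B and C both have SP = 0.5 (the values used in the text).
path_mpa = [[p_on(0.5), p_on(0.5)]]
p_indep = p_stress_nbti(p_on(0.3), path_mpa, independent=True)
p_dep = p_stress_nbti(p_on(0.3), path_mpa, independent=False)
```

The worst-case branch never yields a smaller value than the independent branch for a single path, which is why it gives the conservative result discussed above.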
A transistor degrades due to HCI when carriers are accelerated and injected into the gate oxide. The required electric field along the channel exists when the transistor switches from its non-conducting (off) to its conducting (on) state. For an NMOS (PMOS) transistor, this implies a rising (falling) signal transition at the gate terminal. Furthermore, the degradation depends on the charge that flows through a transistor. The output load is only charged or discharged if all other transistors along a path PATH_HCI from the supply voltage or ground to the cell output are in inversion. Otherwise, only internal capacitances are charged, which are substantially smaller and are neglected in the proposed approach. For HCI, two stress conditions have to be fulfilled for a transistor M:
TD at the gate terminal of M is a measure of the number of transitions (Condition C). Furthermore, all other transistors along the path PATH_HCI,i must be in inversion to form a conducting path (Condition D):
[Figure 4.6 schematic: a three-input NOR stage (PMOS MPC, MPB, MPA; NMOS MNC, MNB, MNA; inputs C, B, A with SP = 0.5, 0.4, 0.3) drives the internal net int, which is followed by an inverter (MPZ, MNZ) with output Z]
Figure 4.6.: OR gate with three inputs and an internal signal int.
transitions. The number of effective transitions times the input slope is tstress,HCI. Hence, Pstress,HCI is:
$$P_{stress,HCI} = \frac{t_{stress,HCI}}{t_{life}} = \frac{TD}{2} \cdot f_{CLK} \cdot P(D) \cdot s_{IN} \tag{4.19}$$
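Equation 4.19 can be evaluated directly. The sketch assumes that TD is counted in transitions per clock cycle, so that TD/2 · fCLK is the number of rising (or falling) transitions per second; the operating point in the example is hypothetical. Multiplying by tlife gives the effective stress time tstress,HCI.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def p_stress_hci(td, f_clk, p_d, s_in):
    """Eq. 4.19: P_stress,HCI = TD/2 * f_CLK * P(D) * s_IN.

    td     transition density at the gate terminal [transitions per clock cycle]
    f_clk  clock frequency [Hz]
    p_d    probability of Condition D (rest of the path conducts)
    s_in   input transition time [s]
    Only TD/2 transitions count, because only one switching direction
    (rising for NMOS, falling for PMOS) causes HCI stress.
    """
    return td / 2.0 * f_clk * p_d * s_in


# Hypothetical operating point: TD = 0.2, 500 MHz, P(D) = 0.5, 50 ps input slope.
p = p_stress_hci(td=0.2, f_clk=500e6, p_d=0.5, s_in=50e-12)
t_life = 10 * SECONDS_PER_YEAR
t_stress = p * t_life  # effective HCI stress time over a 10-year lifetime [s]
```

Even with a fast clock, the stress probability is small (here 0.125 %) because the transistor is only stressed during the short input transitions.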
Multi-stage gates
The aging-aware gate model as described so far can determine the aged gate performances of single-stage gates. Single-stage gates have no internal nets that are connected to gate terminals of transistors. Examples of single-stage gates are inverters, NAND and NOR gates. A simple buffer, however, is already a multi-stage gate, because its two inverters are connected via an internal net. For multi-stage gates, the following problems arise when the stress probabilities are calculated [Lorenz et al., 2010d]:
1. The values for SP and T D of internal signals (e.g., int in Figure 4.6) are un-
known. These values are necessary to calculate Pstress,N BT I and Pstress,HCI for
the transistors MP Z and MN Z .
2. The transition time sIN of internal signals is unknown as well. sIN is required to
compute Pstress,HCI with Equation 4.19.
To obtain the statistical signal properties, the probabilistic method from [Najm, 1991] is used. Probabilistic methods can propagate SP and TD not only from the gate inputs to the gate output but also to internal nets. To do so, the logic function of the internal signal is determined when the gate is characterized.
The transition time sIN of internal signals needed in Equation 4.19 is obtained dur-
ing the characterization of the gate. Like the output slope sOU T , it is characterized
dependent on input slope at the gate input and output load at the gate output.
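As an illustration, the following Python sketch propagates SP and TD to the internal signal int of the OR gate in Figure 4.6 and on to Z. It assumes that int is the output of a NOR stage and Z the output of an inverter, and that the inputs are statistically independent, as in Najm's transition density model $D(y) = \sum_i P(\partial y / \partial x_i)\, D(x_i)$. The SP values are taken from Figure 4.6; TD = 0.2 and the function names are hypothetical.

```python
from math import prod


def nor_signal_props(sp, td):
    """SP and TD of a NOR output, assuming statistically independent inputs."""
    # The NOR output is 1 only if all inputs are 0.
    sp_out = prod(1.0 - p for p in sp)
    # Transition density: D(y) = sum_i P(dy/dx_i) * D(x_i).
    # For a NOR, dy/dx_i = 1 exactly when all other inputs are 0.
    td_out = sum(
        prod(1.0 - p for j, p in enumerate(sp) if j != i) * d
        for i, d in enumerate(td)
    )
    return sp_out, td_out


def inverter_signal_props(sp, td):
    # An inverter complements the probability but passes every transition.
    return 1.0 - sp, td


# Inputs A, B, C of Figure 4.6; TD = 0.2 is a hypothetical value.
sp_int, td_int = nor_signal_props([0.3, 0.4, 0.5], [0.2, 0.2, 0.2])
sp_z, td_z = inverter_signal_props(sp_int, td_int)
print(round(sp_int, 3), round(td_int, 3), round(sp_z, 3))  # 0.21 0.214 0.79
```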
Table 4.1.: An example of a temperature profile. The lifetime is 10 y and $V_{\mathrm{eff}}$ is $V_{\mathrm{nom}}$.
differences across the chip can be taken into account by assigning individual $T_{\mathrm{eff}}$ and $V_{\mathrm{eff}}$ values to every gate. In this section, two methods are proposed to determine the parameter drifts when $T_{\mathrm{eff}}$ and $V_{\mathrm{eff}}$ change during the lifetime, i.e., when a temperature and/or voltage profile is given. Such a profile has to be defined in the specifications of a circuit. A temperature profile could, for instance, look as shown in Table 4.1.
To ensure a conservative result, the upper bounds of the temperature intervals are taken. This results in the following time-temperature tuples $(t_i, T_i)$:
The first proposed method only works for temperature profiles. The basic idea is to determine the effective temperature $T_{\mathrm{eff}}$ that results in the same drift as the temperature profile over the same time. In both degradation equations, for NBTI and HCI, the temperature dependence is given by the Arrhenius equation:
$$\Delta V_{th},\ \Delta I_{on} \propto e^{-\frac{E_a}{k_b T}} \qquad (4.21)$$
$E_a$ is the activation energy (e.g., 0.16 eV for NBTI) and $k_b$ is the Boltzmann constant.
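The time dependence is modeled for both effects as follows:
$$\Delta V_{th},\ \Delta I_{on} \propto t^{n} \qquad (4.22)$$
where $n$ is a constant (e.g., 0.23 for NBTI). First, an arbitrary reference temperature $T_{\mathrm{ref}}$ is chosen, and the times of the time-temperature tuples are adjusted such that the degradation stays the same, $(t_i, T_i) \rightarrow (t_{i,\mathrm{ref}}, T_{\mathrm{ref}})$:
$$t_i^{\,n} \cdot e^{-\frac{E_a}{k_b T_i}} \overset{!}{=} t_{i,\mathrm{ref}}^{\,n} \cdot e^{-\frac{E_a}{k_b T_{\mathrm{ref}}}} \qquad (4.23)$$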
When this is done for the example above and NBTI, the following times are calculated:
$t_1$ equals $t_{1,\mathrm{ref}}$, because $T_1$ was chosen as the reference temperature. It can be seen that the first tuple, with the high temperature, dominates the degradation (the drift after 2 y at 27 °C equals the drift after 7 h at 150 °C). Because all times now refer to the same reference temperature, they can be added:
$$t_{\mathrm{tot}} = \sum_{i=1}^{n} t_{i,\mathrm{ref}} \qquad (4.26)$$
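The tuple $(t_{\mathrm{life}}, T_{\mathrm{eff}})$ is calculated from the tuple $(t_{\mathrm{tot}}, T_{\mathrm{ref}})$ by setting $t_{\mathrm{tot}}$ to $t_{\mathrm{life}}$ and adjusting the temperature such that the drift stays the same:
$$T_{\mathrm{eff}} = \left( \frac{1}{T_{\mathrm{ref}}} - \frac{k_b}{E_a} \cdot n \cdot \ln\frac{t_{\mathrm{tot}}}{t_{\mathrm{life}}} \right)^{-1} \qquad (4.27)$$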
In the example, the effective temperature is 119 °C. This results in a threshold voltage drift of 50 mV due to NBTI.
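The first method can be sketched in a few lines of Python. This is a minimal sketch that uses the NBTI values from the text ($E_a$ = 0.16 eV, n = 0.23), temperatures in kelvin and times in hours; the function names are hypothetical, and the values of Table 4.1 are not reproduced here, so the example only checks the statement that 2 y at 27 °C correspond to about 7 h at 150 °C.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def to_reference_time(t, temp_k, t_ref_k, ea=0.16, n=0.23):
    """Time at t_ref_k that causes the same drift as time t at temp_k (Eq. 4.23)."""
    # t^n exp(-Ea/(kb T)) = t_ref^n exp(-Ea/(kb T_ref)), solved for t_ref
    return t * math.exp(-ea / (n * K_B) * (1.0 / temp_k - 1.0 / t_ref_k))


def effective_temperature(profile, t_ref_k, t_life, ea=0.16, n=0.23):
    """T_eff in kelvin for a profile of (t_i, T_i) tuples (Eqs. 4.26 and 4.27)."""
    t_tot = sum(to_reference_time(t, temp, t_ref_k, ea, n) for t, temp in profile)
    return 1.0 / (1.0 / t_ref_k - K_B / ea * n * math.log(t_tot / t_life))


# Consistency check with the text: 2 y at 27 °C, reference temperature 150 °C
t_eq = to_reference_time(2 * 8760, 27 + 273.15, 150 + 273.15)
print(f"{t_eq:.0f} h")  # 7 h
```

A real profile, such as the one in Table 4.1, would be passed to `effective_temperature` as a list of (hours, kelvin) tuples together with the lifetime in hours.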
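The second method works for temperature as well as voltage profiles. The drift for every time interval is first calculated separately, and then the drifts are combined. In the example above, the following threshold voltage drifts $\Delta V_{th,i}$ can be computed:
The drifts cannot simply be added, because the nonlinear time dependence has to be taken into account. Since the drift grows with $t^n$ (Equation 4.22), $\Delta V_{th,i}^{1/n}$ is proportional to the stress time that causes the same drift under common reference conditions, and these equivalent times can be added:
$$\Delta V_{th} = \left( \Delta V_{th,1}^{1/n} + \Delta V_{th,2}^{1/n} + \Delta V_{th,3}^{1/n} \right)^{n} \qquad (4.28)$$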
This method also results in a degradation due to NBTI of 50 mV.
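Equation 4.28 is easy to check numerically. In the following Python sketch, the drift model $k \cdot t^n$ with k = 10 and the drift values of 20 mV and 10 mV are hypothetical; the first check confirms that splitting a stress period at constant conditions does not change the combined drift, and the second shows how far the result lies below a naive sum.

```python
def combine_drifts(drifts, n=0.23):
    """Combine per-interval drifts with a t^n time law (Eq. 4.28)."""
    # Each dV_i^(1/n) is proportional to an equivalent stress time under common
    # reference conditions; the equivalent times add up, then t^n is applied again.
    return sum(d ** (1.0 / n) for d in drifts) ** n


def drift(t, k=10.0, n=0.23):
    # Hypothetical drift model dV = k * t^n (mV, t in arbitrary time units).
    return k * t ** n


# Splitting a stress period at constant conditions must not change the drift.
print(abs(combine_drifts([drift(1.0), drift(3.0)]) - drift(4.0)) < 1e-9)  # True
# Naive addition of 20 mV and 10 mV would give 30 mV.
print(round(combine_drifts([20.0, 10.0]), 1))  # 20.2
```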
• The conducting paths $PATH_{\mathrm{NBTI},i}$ and $PATH_{\mathrm{HCI},i}$ for all transistors of a gate are required. This information enables the calculation of the probabilities $P(B)$ (Equation 4.13) and $P(D)$ (Equation 4.18).
• The logic function and signal slope of all internal signals of multi-stage gates. These are required to calculate the stress probabilities.
The determination of sensitivities and paths is discussed in the next subsections. The logic function is obtained by a structural recognition algorithm developed at the EDA institute at TUM, which is based on a structural recognition algorithm for analog circuits [Massier et al., 2008]. It analyzes the pull-up as well as the pull-down network of the individual gate stages and generates the logic functions of all internal nodes and of the output node. The slope of internal signals is determined when the delay and output slope of the gate are characterized. It is stored in two-dimensional LUTs indexed by the gate input slope and the output load.
For HCI, the sensitivity of $q$ with respect to $\Delta I_{on}$ is needed. Unfortunately, it cannot be determined directly by means of the adjoint sensitivity analysis, because $\Delta I_{on}$, unlike $\Delta V_{th}$, is not a transistor parameter. However, there is an equivalent circuit for a transistor degraded by HCI (see Figure 3.12(a)). The equivalent circuit can be used to simulate an aged transistor on circuit level. It maps $\Delta I_{on}$ onto a threshold voltage drift $\Delta V_{th}$ and a mobility degradation $\Delta\mu_0$: $\Delta V_{th}$ is realized by a voltage source $V_{Deg}$, and a current-controlled current source $I_{Deg}$ realizes the mobility degradation. This equivalent circuit can be used to calculate the sensitivity $\chi^q_{\Delta I_{on}}$ by means of the chain rule:
$$\chi^q_{\Delta I_{on},n} = \frac{\partial q}{\partial \Delta I_{on,n}} = \frac{\partial q}{\partial V_{Deg,n}} \cdot \frac{\partial V_{Deg,n}}{\partial \Delta I_{on,n}} + \frac{\partial q}{\partial I_{Deg,n}} \cdot \frac{\partial I_{Deg,n}}{\partial \Delta I_{on,n}} \qquad (4.30)$$
The partial derivatives $\partial q/\partial V_{Deg}$ and $\partial q/\partial I_{Deg}$ are obtained with the adjoint sensitivity analysis after all transistors have been replaced by their equivalent circuits. The remaining partial derivatives can be derived from the equations for $V_{Deg}$ and $I_{Deg}$.
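[Figure 4.7: complex gate with inputs A, B, C and output Z; pull-up transistors $M_{PA}$, $M_{PB}$, $M_{PC}$ and pull-down transistors $M_{NA}$, $M_{NB}$, $M_{NC}$.]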
path in a gate. An example of multiple paths is the complex gate with the logic function $z = \overline{a \cdot (b + c)}$ shown in Figure 4.7. There exist two paths for transistor $M_{PC}$. If transistor $M_{PB}$ is in the on-state, the source of $M_{PC}$ is connected to $V_{DD}$. Hence, $PATH_{\mathrm{NBTI},1}$ consists only of transistor $M_{PB}$. The second path, $PATH_{\mathrm{NBTI},2}$, consists of transistor $M_{PA}$, since the drain of $M_{PC}$ is connected to $V_{DD}$ if $M_{PA}$ is conducting.
$PATH_{\mathrm{HCI}}$ is required for HCI. It leads from $V_{DD}$/ground through the considered transistor to the output of the gate. $PATH_{\mathrm{HCI}}$ is determined by performing two breadth-first searches. As before, the pull-up network is considered for a PMOS transistor and the pull-down network for an NMOS transistor. The first search starts at the source terminal of the considered transistor and looks for $V_{DD}$ or ground, respectively. The second search starts at the drain terminal of the considered transistor and looks for
the gate output. Again, multiple paths are possible. In the example of the complex gate (Figure 4.7), two paths exist for the transistor $M_{NA}$: the first path consists of the transistors $M_{NA}$ and $M_{NB}$, and the second path consists of the transistors $M_{NA}$ and $M_{NC}$.
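The path searches can be sketched as a breadth-first search over partial paths. The following Python code is a minimal sketch, not the characterization tool itself: the dictionary encoding of the transistor networks and the node names `n1`, `n2` are hypothetical, and the topology is our reading of Figure 4.7 (pull-up: $M_{PA}$ in parallel with the series connection of $M_{PB}$ and $M_{PC}$; pull-down: $M_{NA}$ in series with $M_{NB}$ and $M_{NC}$ in parallel).

```python
from collections import deque


def find_paths(network, start, target, exclude):
    """Enumerate all simple transistor paths from node `start` to node `target`.

    Breadth-first search over partial paths.
    network -- dict: transistor name -> (node1, node2) of its channel
    exclude -- the transistor under consideration, which must not be traversed
    """
    paths = []
    queue = deque([(start, [], {start})])
    while queue:
        node, path, visited = queue.popleft()
        if node == target:
            paths.append(path)
            continue
        for name, (n1, n2) in network.items():
            if name == exclude or node not in (n1, n2):
                continue
            nxt = n2 if node == n1 else n1
            if nxt not in visited:
                queue.append((nxt, path + [name], visited | {nxt}))
    return paths


def hci_paths(network, transistor, source, drain, rail, output):
    """HCI path candidates: rail -> ... -> transistor -> ... -> gate output."""
    return [src + [transistor] + drn
            for src in find_paths(network, source, rail, transistor)
            for drn in find_paths(network, drain, output, transistor)]


# Assumed topology of the complex gate in Figure 4.7
pull_up = {"MPA": ("VDD", "Z"), "MPB": ("VDD", "n2"), "MPC": ("n2", "Z")}
pull_down = {"MNA": ("Z", "n1"), "MNB": ("n1", "GND"), "MNC": ("n1", "GND")}

# NBTI paths of M_PC: paths connecting either channel terminal to VDD
print(find_paths(pull_up, "n2", "VDD", "MPC") + find_paths(pull_up, "Z", "VDD", "MPC"))
# [['MPB'], ['MPA']]
print(hci_paths(pull_down, "MNA", source="n1", drain="Z", rail="GND", output="Z"))
# [['MNB', 'MNA'], ['MNC', 'MNA']]
```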
To calculate $P(B)$ and $P(D)$ during aging analysis, all paths $PATH_{\mathrm{NBTI},i}$ and $PATH_{\mathrm{HCI},i}$ for all transistors of a gate are determined in advance during the characterization of the gate and stored in the gate model.
ternal nets. For every nominal LUT, one sensitivity per NMOS transistor ($\chi^{q_m}_{\Delta I_{on}}$) and two sensitivities per PMOS transistor ($\chi^{q_m}_{\Delta V_{th}}$ and $\chi^{q_m}_{\Delta I_{on}}$) are required. Hence, the AgeGate model of a two-input NAND gate has 56 LUTs, and an OR gate with four inputs already has 408 LUTs.
LUTs of sensitivities that have (almost) no impact on the degradation of the gate performance can be removed. This impact is given by:
For every sensitivity, it is checked whether $\Delta q_{m,p}$ is smaller than a specified limit. For this purpose, the drift $\Delta p_m$ must be specified as well. For instance, the threshold voltage drift of a PMOS transistor has no noteworthy impact on the gate delay for a rising input transition, because in this case the pull-down network recharges the output load.
If 0.1 % of the nominal gate performance is chosen as the limit and the drifts are
100 mV for ∆Vth and 20 % for ∆Ion (these drift values are much larger than what can
be observed in reality), the LUTs of the NAND gate are reduced from 56 to 39 and the
LUTs for the OR gate are reduced from 408 to 168.
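The pruning rule can be sketched as follows. Because the impact equation is not reproduced above, this Python sketch assumes a first-order impact $|\chi \cdot \Delta p| / q$, taken as the worst case over all LUT entries; the function name, the two-entry LUTs, and their sensitivity values are hypothetical, while the 0.1 % limit and the drifts of 100 mV and 20 % follow the text.

```python
def prune_sensitivity_luts(nominal, sensitivities, drifts, limit=1e-3):
    """Drop sensitivity LUTs whose first-order impact stays below `limit`.

    nominal       -- nominal LUT entries of the performance q (flattened)
    sensitivities -- dict: (transistor, parameter) -> LUT of dq/dp, same layout
    drifts        -- dict: parameter -> assumed drift dp
    limit         -- relative limit (0.1 % of the nominal performance)
    """
    kept = {}
    for (transistor, param), lut in sensitivities.items():
        # Assumed impact measure: |chi * dp| / q, worst case over all LUT entries
        impact = max(abs(chi * drifts[param]) / q for chi, q in zip(lut, nominal))
        if impact >= limit:
            kept[(transistor, param)] = lut
    return kept


nominal = [20.0, 30.0]                   # hypothetical delays in ps
sens = {("MPA", "dVth"): [1e-4, 2e-4],   # ps per mV
        ("MNA", "dIon"): [-0.5, -0.8]}   # ps per percent
drifts = {"dVth": 100.0, "dIon": 20.0}   # 100 mV and 20 %, as in the text
print(list(prune_sensitivity_luts(nominal, sens, drifts)))  # [('MNA', 'dIon')]
```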
4.5. Results
4.5.1. Waveform dependence of parameter drift
Transistor parameter drifts and aged signal slopes are mutually dependent. A small experiment shows whether it is justified to calculate the parameter drifts in the proposed approach from fresh output slopes or whether an iterative approach is beneficial. For this purpose, a NOR2 ring oscillator is simulated with RelXpert (65 nm LP, 1.7 V, 145 °C, 700 h). In a first run, the fresh waveforms are used to degrade the transistors. In a second run, the aged waveforms after 700 h are used; they are obtained by simulating the degraded ring oscillator from the first run. The true degradation should lie between the results of these two simulations, since in reality the waveform degrades continuously within the 700 h, which affects the parameter drift, and the drift, vice versa, affects the signal waveform. The degradation of the oscillator frequency is 5.35 % for fresh slopes and 5.43 % for aged slopes (see Figure 4.8). An iterative approach would yield a value in between. Hence, an iterative approach offers no significant advantage.
This can be explained by the fact that NBTI is a static effect: the slope of the waveform has no impact on the degradation it causes. Only the degradation caused by HCI depends on the time the signal is in transition. Moreover, as shown later in Section 4.5.3, NBTI is the dominant aging effect.
Figure 4.8.: Ring oscillator waveforms of fresh (leading waveform in magenta) and aged (shifted waveforms in red and blue) simulations. The transistor drifts for the aged simulations were determined once from the fresh waveform and once from the aged waveform. Independent of which waveform was used to determine the drifts, the aged waveforms are almost indistinguishable.
transistor level, the transistors in the transistor-level netlist are replaced by the equivalent circuits (see Figure 3.12(a)), the same parameter drifts as for the aging analysis on gate level are applied, and a SPICE simulation is performed. The upper diagram shows the degradation when the device under test did not oscillate during stress. During this static stress, the device is only affected by NBTI. In the lower diagram, the device was oscillating during stress, so both aging effects are relevant. Simulation and aging analysis match quite well. Measurement results were only available for the upper case; they deviate from both the aging analysis and the simulation. Presumably, a large part of this error is caused by inaccurate degradation equations.
The degradation determined with the proposed aging analysis is slightly smaller than the degradation simulated on transistor level. This can be explained by the linearization of the sensitivities: as can be seen in Figure 4.9, the degradation calculated with linearized sensitivities is smaller than the degradation simulated on transistor level.
Figure 4.9.: Frequency degradation of a 65 nm inverter ring oscillator stressed for 500 h at defined stress conditions. [Plot: simulation and aging analysis over stress duration (5 h, 144 h, 500 h).]
method for SP = 0.2 and TD = 0.2 at the primary inputs. The figure indicates that it is not sufficient to consider only the most critical nominal path during aging analysis, because the order of the arrival times can change over the lifetime (signals 866 and 874).
It is difficult to compare AgeGate with the different state-of-the-art aging-aware gate models, because the published results are based on different technologies; in particular, the degradation equations differ. Instead, it is shown how the special features of the proposed aging-aware gate model increase the accuracy of the aging analysis. These special features are: consideration of NBTI and HCI, computation of aged output slopes, and calculation of individual parameter drifts.
Table 4.2 lists the path delay degradation ∆delay of the critical path for a worst-case analysis of the ISCAS’85 benchmark circuits. The nominal path delays without aging (NOM) are given as a reference. The degradation due to both effects (BOTH) as well as due to just one effect (NBTI, HCI) is analyzed. When both effects are considered, the degradation of the critical path delay is between 12.0 % and 15.4 %. The dominant aging effect for this technology and the chosen use profile is NBTI, with a performance degradation of up to 12.3 %. The last column (NO_SLP) gives ∆delay when no aged output slope is computed. Comparing ∆delay with and without aged output slopes shows that neglecting aged output slopes underestimates the degradation by 24 % on average.
For the column BOTH, the run time on an Opteron workstation with 2.4 GHz and 2 GB RAM is also given in parentheses. It shows that the proposed model can be evaluated quickly.
Figure 4.10.: The five slowest output arrival times over lifetime for ISCAS’85 circuit c880. Individual workloads for the gates were obtained for SP = 0.2 and TD = 0.2 at the primary inputs. Signals 866 and 874 change order over time.
Table 4.2.: Degradation of critical path delays for different analyzer settings.
Figure 4.11.: Comparison of analysis with and without individual transistor drifts.
For the diagram in Figure 4.11, an aging analysis with individual transistor drifts is compared to an aging analysis in which all transistors of a gate are assumed to degrade as much as the worst among them. The diagram shows the benefit of calculating individual transistor drifts: without them, the mean degradation is overestimated by 20 %.
4.6. Summary
An aging analysis flow on gate level that determines the impact of the two dominant drift-related aging effects on circuit timing was introduced. The developed aging-aware gate model, AgeGate, consists of a canonical gate model, technology-specific degradation equations, and information about the internal gate structure. What distinguishes AgeGate from existing aging-aware gate models is that it considers the aged output slope, takes NBTI and HCI into account, and calculates individual transistor drifts. The results show that both aging effects are relevant, that not calculating an aged output slope underestimates the performance degradation by 24 %, and that not computing individual transistor drifts overestimates the degradation by 20 %.
5. Identifying possible critical paths in aged circuits
When the operating conditions over the lifetime and the individual workloads of the gates are known, the degraded circuit delay and the critical path causing this delay can be determined by the aging-aware timing analysis described in Chapter 4. If the operating conditions and workload are not (exactly) known, only a worst-case analysis can be performed (see Section 4.1). Due to this uncertainty, multiple possible critical paths (PCPs) may exist. This chapter addresses the identification of these PCPs.
A PCP is a path that is the critical path of a circuit for a specific combination of operating conditions and workload. However, identifying PCPs with this definition would be too complex and inefficient. Hence, a weaker definition is used instead:
A possible critical path (PCP) is a path that cannot be excluded from the set of paths that become the critical path of a degraded circuit for a specific combination of temperature $T_{\mathrm{eff}}$, supply voltage $V_{\mathrm{eff}}$, workload of the input signals, and lifetime $t_{\mathrm{life}}$.
This definition reflects how PCPs are determined: the paths that are certainly not PCPs are identified, and all remaining paths are considered PCPs.
Several applications arise from knowing the PCPs:
• The TG of a combinational circuit can be reduced until it only contains PCPs. This reduced TG can be used as a timing model for modules, such as adders or multipliers. Since such a timing model is generated once and can be reused whenever the module is instantiated in a more complex hierarchical design, it accelerates the aging-aware timing verification of complex digital circuits compared to an analysis on gate level.
• PCPs can also be utilized to monitor a system during its lifetime. The delay of
the PCPs is determined in periodic intervals and countermeasures are taken if the
path delay is no longer within the safe operating range. Such an adaptive system
can react, for instance, by reducing the clock frequency of the aged circuit.
• PCPs are also beneficial for optimizing a circuit to minimize the performance loss due to aging. Existing optimization approaches [Wu and Marculescu, 2009; Wang et al., 2009a,b; Bild et al., 2009] depend on knowing in advance which gates degrade the most. Hence, the operating conditions and the workload of a circuit must be known. If they are unknown, the PCPs can help to optimize a circuit nevertheless, because they indicate which gates might become critical. By combining this information with the knowledge of which gates have a large impact on circuit performance, the gates that should be protected from excessive degradation can be identified.
• Finally, PCPs are required by previously published approaches. Chen et al. [2011] propose a path-based aging-aware timing analysis. Wang et al. [2008] introduce node criticality computation; by protecting the identified critical nodes, the delay degradation can be reduced. Both approaches need PCPs. However, they simply take the top X % (e.g., 10 %) of the paths with the longest aged delay. This is quite inaccurate, and the number of PCPs can be either over- or underestimated.
The remainder of this chapter is organized as follows: The next section describes
prerequisites for the proposed approach. Then the method to identify the PCPs is
introduced (see Section 5.2). This method is extended in Section 5.3 by considering that
some paths must degrade even if the workload is unknown. In Section 5.4 it is described
how process variations and variations of the operating conditions can be considered as
well. Section 5.5 introduces two applications of PCPs, the generation of aging-aware
timing models and the benefit of PCPs for testing aged circuits. Results follow in
Section 5.6 and the chapter is summarized in Section 5.7.
5.1. Prerequisites
Without exact operating conditions and workloads, the degraded gate delay cannot be determined exactly. However, it is possible to determine an interval for the gate delay. The lower bound is the fresh gate delay ($d_{\mathrm{fresh}}$), since aging always increases the gate delay. The upper bound is the maximal aged gate delay ($d_{\mathrm{aged}}$). To determine the upper bound, a validity region must be defined by specifying maximal values for the effective temperature $T_{\mathrm{eff}}$, the effective supply voltage $V_{\mathrm{eff}}$, and the lifetime itself.
The foundation for identifying the PCPs is a timing graph as described in Section 2.1.2. However, this time the edge weights, which are given by the gate delays, are not deterministic quantities but intervals. For this reason, all other timing quantities ($q$) (e.g., AT, D2S, SLACK, ...) are intervals as well. The intervals are stored as tuples: the first element of the tuple is the fresh value $q_{\mathrm{fresh}}$ and the second element is the aged value $q_{\mathrm{aged}}$. An example of a TG with annotated nodes and edges is given in Figure 5.1.
The timing quantities that are stored at the nodes and edges can change during the
computation of the PCPs because elements of the timing graph that do not belong to
a PCP are removed. An incremental timing analysis, as described in Section 2.1.3, is
used to update the changed timing quantities whenever they are read. Every tuple has
a valid flag. When the timing quantity is read, it is first checked whether it is valid. If
not, the timing quantity is updated, stored and the valid flag is set again.
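The valid-flag mechanism amounts to lazy evaluation. The following Python sketch shows the idea for a single timing quantity; the class name, the callback, and the example arrival time intervals are hypothetical, and propagating the invalidation to dependent quantities (as an incremental STA would do) is omitted.

```python
class TimingQuantity:
    """Interval-valued timing quantity (fresh, aged) with a valid flag."""

    def __init__(self, compute):
        self._compute = compute  # callback returning the (fresh, aged) tuple
        self._value = None
        self.valid = False

    def invalidate(self):
        # Called when the timing graph changes, e.g. after an edge was removed.
        # Propagating the invalidation to dependent quantities is omitted here.
        self.valid = False

    def read(self):
        if not self.valid:  # update, store and set the valid flag again
            self._value = self._compute()
            self.valid = True
        return self._value


# Hypothetical arrival time intervals along the incoming edges of a node v
along = {"u1": (3.0, 4.0), "u2": (2.5, 5.0)}
calls = []


def arrival_time_v():
    calls.append(1)  # count recomputations
    return (max(f for f, _ in along.values()), max(a for _, a in along.values()))


at_v = TimingQuantity(arrival_time_v)
print(at_v.read(), at_v.read(), len(calls))  # (3.0, 5.0) (3.0, 5.0) 1
del along["u2"]    # remove edge (u2, v) ...
at_v.invalidate()  # ... and mark AT(v) as outdated
print(at_v.read(), len(calls))  # (3.0, 4.0) 2
```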
5.2. Identification of PCPs
Figure 5.1.: TG annotated with arrival time and delay to sink at every node.
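$$\mathrm{sum}(a, b) = \mathrm{sum}([a_{\mathrm{fresh}}, a_{\mathrm{aged}}], [b_{\mathrm{fresh}}, b_{\mathrm{aged}}]) := [a_{\mathrm{fresh}} + b_{\mathrm{fresh}},\ a_{\mathrm{aged}} + b_{\mathrm{aged}}] \qquad (5.1)$$
$$\max(a, b) := [\max(a_{\mathrm{fresh}}, b_{\mathrm{fresh}}),\ \max(a_{\mathrm{aged}}, b_{\mathrm{aged}})] \qquad (5.2)$$
$$a < b := a_{\mathrm{aged}} < b_{\mathrm{fresh}} \qquad (5.3)$$
Operations for subtraction, min, and greater than (>) can be defined correspondingly.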
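The interval operations map directly onto a small tuple type. In this Python sketch, `imin` and `igreater` are our reading of "defined correspondingly" (subtraction is left out), and all names are hypothetical. Note that Python's built-in tuple comparison is lexicographic and must not be used in place of Eq. 5.3.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    fresh: float  # lower bound: value of the fresh circuit
    aged: float   # upper bound: maximal aged value


def isum(a, b):  # Eq. (5.1)
    return Interval(a.fresh + b.fresh, a.aged + b.aged)


def imax(a, b):  # Eq. (5.2)
    return Interval(max(a.fresh, b.fresh), max(a.aged, b.aged))


def imin(a, b):
    return Interval(min(a.fresh, b.fresh), min(a.aged, b.aged))


def iless(a, b):  # Eq. (5.3): a is smaller in every aging scenario
    return a.aged < b.fresh


def igreater(a, b):
    return a.fresh > b.aged


a, b = Interval(2.0, 3.0), Interval(4.0, 6.0)
print(isum(a, b), imax(a, b))  # Interval(fresh=6.0, aged=9.0) Interval(fresh=4.0, aged=6.0)
c = Interval(2.0, 5.0)
# Overlapping intervals (c and b) are neither less nor greater than each other.
print(iless(a, b), iless(c, b), igreater(b, c))  # True False False
```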
Even though the goal is to determine the PCPs, it is crucial that the reduction steps do not depend on enumerating every single path in the timing graph and deciding whether it is a PCP or not. This would make it impossible to determine the PCPs for circuits of industrial relevance, since the number of paths increases exponentially with the number of nodes. Two criteria are used to determine whether a path is a PCP or not:
Criterion 1: A path must have a maximal aged path delay $D_{\mathrm{aged}}$ greater than the critical path delay of the fresh circuit, $D(P_{\mathrm{crit}})_{\mathrm{fresh}}$ (or just $D_{\mathrm{crit,fresh}}$). Otherwise, it is not a PCP, because $P_{\mathrm{crit}}$ will always have a greater path delay.
Criterion 2: Even if a path A has an aged path delay greater than $D_{\mathrm{crit,fresh}}$, it might not be a PCP. If there is another path B that has a greater path delay than A for all possible operating conditions and workloads, path A is not a PCP.
The slack reduction step (see Algorithm 3) has a time complexity of O(n), with n being the number of nodes in the timing graph. The nodes are not simply removed from the timing graph; instead, the function clean_remove_node (on page 94) is called. This function checks whether additional nodes and edges can be removed from the timing graph and ensures that the remaining graph is a valid TG (see Section 5.2.6).
Figure 5.2.: Illustration of path delay reduction step. Edge (b, d) can be removed because
the delay of path P is less than the delay of path Pother .
Figure 5.3.: Illustration of arrival time reduction step. Edge (d, e) can be removed because the arrival time interval along edge (d, e) is less than the arrival time at e after the max-operation.
determines the fresh arrival time ($AT(v)_{\mathrm{fresh}}$), and path segment U (solid line) is the path to v that determines the maximal aged arrival time at v along the edge (u, v). These path segments can easily be obtained, because for each arrival time it is stored from which edge it results. If the path delay interval of segment U is less than the interval of segment V, the edge (u, v) can be removed.
The arrival time reduction step (see Algorithm 5) has a time complexity of O(e), with
e being the number of edges in the timing graph.
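A single pass of the arrival time reduction step can be sketched in Python on an acyclic timing graph. This is an illustration under stated assumptions, not Algorithm 5 itself: the edge dictionary, the node names, and the example delays are hypothetical, and the cleanup of dangling nodes (clean_remove_node) is omitted. Removing an edge that satisfies the test never changes AT(v), because both its fresh and aged values lie below $AT(v)_{\mathrm{fresh}}$, so one pass in topological order suffices.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def arrival_time_reduction(edges, source):
    """Arrival time reduction step on an acyclic timing graph (sketch).

    edges  -- dict: (u, v) -> (fresh_delay, aged_delay)
    source -- the virtual source node S
    Returns the arrival time intervals and the set of removable edges.
    """
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
    # TopologicalSorter expects node -> predecessors and yields sources first.
    order = TopologicalSorter({v: set(us) for v, us in preds.items()}).static_order()
    at = {source: (0.0, 0.0)}
    removable = set()
    for v in order:
        if v == source:
            continue
        # Arrival time interval at v along every incoming edge (Eq. 5.1)
        along = {u: (at[u][0] + edges[u, v][0], at[u][1] + edges[u, v][1])
                 for u in preds[v]}
        # max-operation (Eq. 5.2)
        at[v] = (max(f for f, _ in along.values()), max(a for _, a in along.values()))
        for u, (_, aged) in along.items():
            if aged < at[v][0]:  # Eq. (5.3): interval along (u, v) is less than AT(v)
                removable.add((u, v))
    return at, removable


edges = {("S", "a"): (1.0, 2.0), ("S", "b"): (4.0, 4.5),
         ("a", "c"): (1.0, 1.0), ("b", "c"): (1.0, 1.0), ("c", "T"): (0.0, 0.0)}
at, removable = arrival_time_reduction(edges, "S")
print(at["c"], removable)  # (5.0, 5.5) {('a', 'c')}
```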
This reduction step is almost equivalent to the arrival time reduction step. This time, not the delay from S to a node is considered but the delay from the node to T (D2S). D2S is determined by computing the delay to T along all outgoing edges of a node u and taking their maximum. If the delay to T along an edge (u, v) is less than the delay to T at u, the edge can be removed (see Algorithm 6).
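The delay to sink reduction step mirrors the previous sketch, traversing the graph in reverse topological order. Again, this Python code is a hypothetical illustration, not Algorithm 6: the graph encoding and the example delays are assumptions, and node cleanup is omitted.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def delay_to_sink_reduction(edges, sink):
    """Delay to sink (D2S) reduction step on an acyclic timing graph (sketch).

    edges -- dict: (u, v) -> (fresh_delay, aged_delay)
    sink  -- the virtual sink node T
    """
    succs = defaultdict(list)
    for u, v in edges:
        succs[u].append(v)
    # Passing successors as "predecessors" makes the sorter yield the sink first.
    order = TopologicalSorter({u: set(vs) for u, vs in succs.items()}).static_order()
    d2s = {sink: (0.0, 0.0)}
    removable = set()
    for u in order:
        if u == sink:
            continue
        # Delay to T along every outgoing edge, then the max-operation
        along = {v: (edges[u, v][0] + d2s[v][0], edges[u, v][1] + d2s[v][1])
                 for v in succs[u]}
        d2s[u] = (max(f for f, _ in along.values()), max(a for _, a in along.values()))
        for v, (_, aged) in along.items():
            if aged < d2s[u][0]:  # delay to T along (u, v) is less than D2S(u)
                removable.add((u, v))
    return d2s, removable


edges = {("S", "a"): (0.0, 0.0), ("a", "b"): (1.0, 1.5), ("a", "c"): (3.0, 3.25),
         ("b", "T"): (1.0, 1.0), ("c", "T"): (1.0, 1.0)}
d2s, removable = delay_to_sink_reduction(edges, "T")
print(d2s["a"], removable)  # (4.0, 4.25) {('a', 'b')}
```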
But how can common edges be taken into account? An exact method is the following: First, all paths whose aged path delay is greater than $D_{\mathrm{crit,fresh}}$ are enumerated. Then, the delays of every pair of paths are compared. For common edges, the fresh edge delay is assumed for the aged edge delay as well, so that a shared edge contributes the same delay to both paths. If one path interval is less than the other, this path is not a PCP and can be removed. This exact method has an exponential time complexity and cannot be used for complex circuits.
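For small examples, the exact method can be written down directly. The following Python sketch applies Criterion 1 and then compares all ordered pairs of remaining paths, using the fresh delay on common edges as described above. The function names, the string edge identifiers, and the delays are hypothetical. In the example, paths A and B share edge e1 with a large degradation; A can only be excluded because e1 contributes the same delay to both paths.

```python
from itertools import permutations


def path_delay(path, delays, aged, fresh_edges=frozenset()):
    """Delay of a path; edges in fresh_edges always use their fresh delay."""
    return sum(delays[e][0] if (not aged or e in fresh_edges) else delays[e][1]
               for e in path)


def exact_pcps(paths, delays):
    """Exact PCP identification with common edges (exponential; small examples only).

    paths  -- dict: path name -> list of edge identifiers
    delays -- dict: edge identifier -> (fresh_delay, aged_delay)
    """
    d_crit_fresh = max(path_delay(p, delays, aged=False) for p in paths.values())
    # Criterion 1: the aged delay must exceed the fresh critical path delay.
    cands = {name: p for name, p in paths.items()
             if path_delay(p, delays, aged=True) > d_crit_fresh}
    removed = set()
    for a, b in permutations(cands, 2):
        common = frozenset(cands[a]) & frozenset(cands[b])
        # Criterion 2: a is always faster than b if its aged delay, with fresh
        # delays on the common edges, is below the fresh delay of b.
        if path_delay(cands[a], delays, True, common) < path_delay(cands[b], delays, False):
            removed.add(a)
    return set(cands) - removed


delays = {"e1": (5.0, 8.0), "e2": (3.0, 3.5), "e3": (4.0, 4.25),
          "e4": (4.0, 5.0), "e5": (4.0, 4.5)}
paths = {"A": ["e1", "e2"], "B": ["e1", "e3"], "C": ["e4", "e5"]}
print(sorted(exact_pcps(paths, delays)))  # ['B', 'C']
```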
Baba and Mitra [2009] propose a more efficient method to consider common edges. This method extends the arrival time reduction step; hence, it is block-based, not path-based like the exact method. The arrival time reduction step removes an edge (u, v) if the arrival time at v along this edge is less than the resulting arrival time at v after the max-operation. As shown in the arrival time reduction step, this can be interpreted as comparing two path segments. The longest path segment V consists of all edges that determine the fresh arrival time at v after the max-operation, and the longest path segment U determines the aged arrival time at v along edge (u, v). If V and U have common edges, the same edge delay must be assumed for the common edges.
However, just adding up the updated edge delays along the path segments is not enough. By changing the common gate delays, U and V themselves could change. In the example (see Figure 5.4), by setting the aged delay of (a, b) to 2, the aged arrival time at b is now determined by the second incoming edge (f, b). In [Baba and Mitra, 2009], this is solved by setting the aged delay of common edges to the fresh value and running the STA again to determine the changed arrival times. Hence, whenever common edges are detected, the STA is performed again with changed edge delays to decide whether an edge can be removed or not¹. As the results of [Baba and Mitra, 2009] show, this takes a lot of time.
In the proposed approach, it is not necessary to run the STA again. This is possible because the join-slacks (see Section 2.1.5) indicate how far the gate delay can be decreased before the arrival time at a node is determined by another edge. In the example, the aged join-slack between edge (a, b) and edge (f, b) indicates that when the aged gate delay is reduced by more than 2 time units, the arrival time is determined by edge (f, b).
For the path segment U, two path delay intervals are calculated: the path segment delay $D(U)$ when common edges are not considered, and the path segment delay $\tilde{D}(U)$ when the fresh gate delay is used for common edges. To decide whether an edge (u, v) can be removed, the following cases have to be distinguished (see Figure 5.6):
1. If $D(U) < D(V)$, then remove common edge:
Even without considering common edges, the edge (u, v) can be removed.
2. If $\tilde{D}(U) \not< D(V)$, then do not remove common edge:
The edge cannot be removed, because even if the fresh delay is used for common edges, the path delay is still too large.
3. Else ($\tilde{D}(U) < D(V)$ and $D(U) \not< D(V)$), it depends on $U'$:
If $D(V)_{\mathrm{fresh}}$ lies between $\tilde{D}(U)_{\mathrm{aged}}$ and $D(U)_{\mathrm{aged}}$, the path segment $U'$ with the next smaller delay than U determines whether (u, v) can be removed or not.
a) If $D(U') < D(V)$, then remove common edge:
This is like case 1. Whether (u, v) can be removed cannot be decided from U, but it can be decided from $U'$.
b) If $\tilde{D}(U') \not< D(V)$, then do not remove common edge:
This is like case 2. Whether (u, v) can be removed cannot be decided from U, but it can be decided from $U'$.
c) If $\tilde{D}(U') < D(V)$ and $D(U') \not< D(V)$, it depends on the next $U'$:
Like case 3. Look at the $U'$ with the next smaller delay.
d) If there is no $U'$, then remove common edge:
If there is no path segment $U'$ with a smaller delay than U, the delay $\tilde{D}(U)$ can be assumed and the edge is removed.
¹ After that, the common edge delays have to be reset and the STA has to be run once more to restore the original state of the timing graph.
The pseudo code for this reduction step is given in Algorithm 7. The only difference from Algorithm 5 for the arrival time reduction step is the function edge_can_be_removed, which checks the cases given above.
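The case distinction can be condensed into a loop over the candidate path segments. This Python sketch is our own illustration and not the body of the thesis's edge_can_be_removed: it assumes the caller has already determined the segments U, U′, U″, ... (via the join-slacks) and supplies for each one the pair $(D(U)_{\mathrm{aged}}, \tilde{D}(U)_{\mathrm{aged}})$, where $\tilde{D}(U)$ uses the fresh delay on edges shared with V. The example values are hypothetical.

```python
def edge_can_be_removed(segments, v_fresh):
    """Case distinction for removing edge (u, v) with common edges (sketch).

    segments -- candidate path segments U, U', U'', ... ending with edge (u, v),
                sorted by decreasing aged delay; each entry is the pair
                (D(U)_aged, D~(U)_aged), where D~ uses fresh delays on common edges
    v_fresh  -- D(V)_fresh of the longest path segment V
    """
    for d_aged, d_tilde_aged in segments:
        if d_aged < v_fresh:            # case 1 / 3a: removable even without common edges
            return True
        if not d_tilde_aged < v_fresh:  # case 2 / 3b: too slow even with fresh common edges
            return False
        # case 3 / 3c: undecided, continue with the next smaller segment U'
    return True                         # case 3d: no further U', D~(U) can be assumed


print(edge_can_be_removed([(10.0, 7.5), (8.0, 7.0)], 9.0))  # True  (decided by U')
print(edge_can_be_removed([(10.0, 9.5)], 9.0))              # False (case 2)
print(edge_can_be_removed([(10.0, 8.0)], 9.0))              # True  (case 3d)
```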
The delay to sink reduction step can be extended in a similar way to take common
edges into account. In this case the branch-slacks are required to iterate over the path
segments from a node to T .
With the exact method, more edges can be removed, because all paths that share common edges are compared with each other. The example in Figure 5.5 illustrates the difference: assume that path A (solid line) and path B (dashed line) are PCPs, but path C (dotted line) is not a PCP, because the path delay interval of C is smaller than that of A when common edges are considered. However, path C cannot be excluded by comparing it with path B. When just the longest path segments at x are compared with each other, A and C are never compared, and path C is not removed from the PCPs.
Whenever case 3.c is detected, the next longest path segment $U'$ must be determined. In order not to have a worst-case time complexity dependent on the number of paths
Figure 5.5.: Example that shows the difference between the proposed and the exact method for common edges.
Figure 5.6.: Graphical representation of the common edge reduction step cases. Edge (u, v) can be removed if the aged delay of path U is smaller than the fresh delay of path V.
5.3. Realistic aged path delays
In this section, it is investigated whether it is justified to use intervals for gate and path delays or not².
In the following, it is shown that intervals for the gate delay are justified. It is also shown that the upper bound of the path delay interval is realistic as long as a given path is statically sensitizable. The lower bound of a path delay can also be reached as long as only one input transition is considered. However, the lower bound of the interval is often too pessimistic if, for a given path, the maximum of the delays for a rising and a falling input transition is considered. This can be used to further reduce the number of PCPs.
Figure 5.7.: Path delay of an inverter chain (10 inverters) with respect to SP at the input.
The inverter chain can still degrade maximally (for $SP_{IN} = 1$ or $SP_{IN} = 0$). However, it is no longer possible that the path does not degrade at all: the gate delays for one transition do not degrade when the delays for the opposite transition degrade the most, and vice versa. The inverter chain now degrades minimally for $SP_{IN} = 0.5$ (intersection of the solid lines), but the minimal degradation is already 85 % of the maximal degradation.
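The 85 % figure can be reproduced with a simple model. The following Python sketch assumes that a gate input with a falling transition degrades the gate by $k \cdot (1 - SP)^n$ (the PMOS is under NBTI stress while the input is 0), with the same k for every stage and n = 0.23; this model and the function name are our assumptions, not the thesis's equations. Under these assumptions, the worst case over both transitions is proportional to $\max(SP_{IN}^n, (1 - SP_{IN})^n)$, whose minimum $0.5^{0.23} \approx 0.853$ at $SP_{IN} = 0.5$ matches the value stated above.

```python
def chain_degradation(sp_in, n=0.23, stages=10):
    """Worst-case delay degradation of an inverter chain, relative to its maximum.

    Assumed model: a gate input with a falling transition degrades the gate by
    k * (1 - SP)^n (PMOS under NBTI stress while the input is 0), equal k for all.
    """
    # Signal probabilities alternate along the inverter chain.
    sp = [sp_in if i % 2 == 0 else 1.0 - sp_in for i in range(stages)]
    # A rising path input causes falling transitions at the odd gate inputs
    # (set N_r), a falling path input at the even gate inputs (set N_f).
    rise = sum((1.0 - sp[i]) ** n for i in range(1, stages, 2))
    fall = sum((1.0 - sp[i]) ** n for i in range(0, stages, 2))
    # Maximum over both transitions, normalized to the maximal degradation.
    return max(rise, fall) / (stages // 2)


best = min((chain_degradation(s / 100), s / 100) for s in range(101))
print(round(best[0], 3), best[1])  # 0.853 0.5
```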
Hence, the input and output of a gate are two consecutive nodes $i$ and $i+1$. $f_i$ denotes the logic function of node $i$ in terms of the primary inputs. The static sensitization condition of a logic gate is given by the Boolean difference:
$$\frac{\partial f_{i+1}}{\partial f_i} = f_{i+1}\big|_{f_i} \oplus f_{i+1}\big|_{\overline{f_i}} \qquad (5.6)$$
The sensitization condition specifies the input vectors for which a transition at the gate input propagates to the gate output. A path is statically sensitizable if all the gates along the path fulfill the sensitization condition:
$$\prod_{i=0}^{m-1} \frac{\partial f_{i+1}}{\partial f_i} = 1 \qquad (5.7)$$
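Static sensitizability can be checked by brute force for small examples. This Python sketch evaluates the Boolean difference of each on-path gate as the XOR of its output with the on-path input at 1 and at 0, and searches all primary input vectors for one that satisfies Eq. 5.7. The path encoding and the two example paths (a NAND followed by a NOR, and two ANDs whose side inputs are b and its complement) are hypothetical; exhaustive enumeration is only practical for a few inputs.

```python
from itertools import product


def nand(*xs):
    return int(not all(xs))


def nor(*xs):
    return int(not any(xs))


def and_(*xs):
    return int(all(xs))


def boolean_difference(gate, side_values):
    """Eq. (5.6) for one gate: output with on-path input 1 XOR output with 0."""
    return gate(1, *side_values) ^ gate(0, *side_values)


def statically_sensitizable(path, n_inputs):
    """Search for a primary input vector satisfying Eq. (5.7) by enumeration.

    path -- list of (gate, side) pairs; gate(on_path_value, *side_values) is the
            gate function and side(vector) returns the side input values
    """
    for vector in product((0, 1), repeat=n_inputs):
        if all(boolean_difference(gate, side(vector)) for gate, side in path):
            return True, vector
    return False, None


# Path a -> NAND(a, b) -> NOR(., c): needs b = 1 and c = 0
path1 = [(nand, lambda v: (v[1],)), (nor, lambda v: (v[2],))]
print(statically_sensitizable(path1, 3))  # (True, (0, 1, 0))
# Path a -> AND(a, b) -> AND(., not b): contradictory side conditions
path2 = [(and_, lambda v: (v[1],)), (and_, lambda v: (1 - v[1],))]
print(statically_sensitizable(path2, 2))  # (False, None)
```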
The first constraint for the optimization is that the signal probabilities lie between 0 % and 100 %:
$$\text{s.t.} \quad 0 \le SP \le 1 \qquad (5.9)$$
An exact approach to this minimization problem would be to use the signal proba-
bilities at the PIs as free variables. The aged delay of the considered path depends on
the signal probabilities of the on-path gate inputs and the off-path gate inputs (side
inputs). The relation between the signal probabilities at the gate inputs and the signal
probabilities at the PIs is given by the logic interconnection. Considering this during
the optimization would lead to a complex nonlinear optimization problem with multiple
local minima.
To find the global minimum efficiently, the problem is simplified and a valid lower bound for the minimization problem is obtained. It is assumed that the signal probabilities at the side inputs can be chosen such that the aged gate delay becomes minimal, without considering the logic interconnection. Only the logic interconnection of the path itself is considered; the logic interconnection of the rest of the circuit is neglected. This allows the aged path delay to be minimized further than would be possible without this simplification. Hence, one obtains a valid lower bound of the minimal aged path delay.
The free variables SP are now only the signal probabilities at the on-path gate inputs. Hence, the path delay must be expressed as a function of SP.
First, the gate delay degradation $\Delta d$ of an edge $(i, o)$ for a falling input transition depends on $SP_i$ of the on-path gate input $i$:
$n$ is the time exponent given in the degradation equation 3.7. The factor $k_i$ combines the other dependencies (operating conditions, lifetime), which are fixed for this optimization.
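The path delays for a rising and a falling input transition can now be written as:

$$D(P_r) = D(P_r)_{fresh} + \sum_{l \in N_r} \Delta d_l(SP_l) \tag{5.11}$$

$$D(P_f) = D(P_f)_{fresh} + \sum_{l \in N_f} \Delta d_l(SP_l) \tag{5.12}$$

SP_l is the signal probability at node l. N_r (N_f) is the set of gate inputs along the path that have a falling input transition for a rising (falling) transition at the path input.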
Additional constraints account for the fact that the values of SP cannot be chosen freely, since the signal probability at a gate output depends on the signal probabilities at the gate inputs. For an inverter, the signal probability at the output SP_o equals 1 − SP_i, where SP_i is the signal probability at the input. For a two-input NOR gate, the output signal probability SP_o depends on both inputs SP_i and SP_j:

$$SP_o = (1 - SP_i)\cdot(1 - SP_j) \tag{5.13}$$

Let i be the on-path input and j the side input. SP_j is not a free variable of the optimization, but it affects SP_o, which again is a free variable. This can be taken into account by solving (5.13) for SP_j:

$$SP_j = 1 - \frac{SP_o}{1 - SP_i} \tag{5.14}$$
5. Identifying possible critical paths in aged circuits
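[Figure 5.9: Graphical representation of the constraints for the gate types. SP_i (0 to 1) is plotted on the horizontal axis and SP_o (0 to 1) on the vertical axis, with regions labeled NAND, INV and NOR.]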
Since SP_j lies between 0 and 1, the following two relations between SP_i and SP_o are obtained:

$$0 \le 1 - \frac{SP_o}{1 - SP_i} \tag{5.15}$$

$$1 \ge 1 - \frac{SP_o}{1 - SP_i} \tag{5.16}$$

Relation (5.16) is always satisfied because SP_o ≥ 0. From (5.15), the following constraint for a NOR gate can be derived:

$$SP_o \le 1 - SP_i \tag{5.17}$$
In Appendix A it is shown how the constraint for the NAND gate is derived and that
these constraints are also valid if a NAND or NOR gate has more than two inputs.
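The SP relations used as constraints can be checked with a short Python sketch: (5.13) generalized to n inputs, the side-input solution (5.14), and linear feasibility tests between the on-path input SP and the output SP (NOR from (5.17), NAND from (A.5) in Appendix A, inverter SP_o = 1 − SP_i). Like (5.13), it assumes statistically independent gate inputs; the function names are hypothetical. The grid search at the end checks numerically that extra side inputs do not widen the two-input constraints.

```python
import math

TOL = 1e-12

def nor_output(sp_i, side_sps):
    """Output SP of an n-input NOR with independent inputs (generalizes (5.13))."""
    return (1.0 - sp_i) * math.prod(1.0 - s for s in side_sps)

def nand_output(sp_i, side_sps):
    """Output SP of an n-input NAND with independent inputs (cf. (A.1), (A.6))."""
    return 1.0 - sp_i * math.prod(side_sps)

def nor_side_input(sp_o, sp_i):
    """Side-input SP that yields output SP sp_o for a two-input NOR, cf. (5.14).
    Only defined for sp_i < 1 (for sp_i = 1 the output SP is always 0)."""
    return 1.0 - sp_o / (1.0 - sp_i)

def feasible(gate, sp_i, sp_o):
    """Linear constraint between on-path input SP and output SP per gate type."""
    if gate == "INV":
        return abs(sp_o - (1.0 - sp_i)) <= TOL
    if gate == "NOR":   # (5.17)
        return sp_o <= 1.0 - sp_i + TOL
    if gate == "NAND":  # (A.5)
        return sp_o >= 1.0 - sp_i - TOL
    raise ValueError(f"unknown gate type: {gate!r}")

# Only a NAND gate can have SP = 1 at both its input and its output.
print(feasible("NAND", 1.0, 1.0), feasible("NOR", 1.0, 1.0), feasible("INV", 1.0, 1.0))
# True False False

# Extra side inputs do not widen the feasible range (cf. Appendix A).
grid = [k / 10 for k in range(11)]
print(max(nor_output(0.3, [a, b]) for a in grid for b in grid))   # 0.7
print(min(nand_output(0.3, [a, b]) for a in grid for b in grid))  # 0.7
```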
The diagram in Figure 5.9 shows the constraints for the gate types graphically. The optimization tries to choose the SPs such that the gates degrade as little as possible. Hence, the SP at a gate input should be 1. The signal probability at the gate output should also be 1, because the gate output is the input of the succeeding gate. However, for an inverter, an SP of 1 at the input means that the SP at the output is 0. The inverter does not degrade, but the succeeding gate degrades maximally (this increases the path delay for the opposite transition at the input). The same is true for a NOR gate: if the SP at the input is 1, then the SP at the output is 0. Only for a NAND gate is it possible to have an SP of 1 at both the input and the output.
The equality and inequality constraints (5.9, 5.17, 5.18, 5.19) are linear, but the cost function (5.8, 5.11) is nonlinear. Unfortunately, this nonlinear optimization problem still has multiple local minima. Therefore, the optimization problem was transformed into a linear optimization problem by setting the time exponent n to 1:
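$$D(P_r) = D(P_r)_{fresh} + \sum_{l \in N_r} \Delta d_l(SP_l)\Big|_{n=1} \tag{5.20}$$

$$D(P_f) = D(P_f)_{fresh} + \sum_{l \in N_f} \Delta d_l(SP_l)\Big|_{n=1} \tag{5.21}$$

$$\min_{SP,\,s}\; s \quad \text{s.t.} \quad s \ge D(P_r), \quad s \ge D(P_f)$$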
Now the minimization problem can be solved efficiently. The solution of this linear problem is still a valid lower bound for the minimal path delay. This can be seen by looking once again at Figure 5.7, which shows the exact path delays (solid lines) as well as the linearized path delays (dashed lines). The intersection of the two dashed lines is the minimum of the maximum of both path delays. After linearization, the minimal aged path delay degradation is 50 % of the maximal aged path delay degradation, compared to 85 % in the exact case. Hence, the linearized solution is a valid lower bound.
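Why linearization can only lower the result can be sketched numerically. Assume, for illustration only, that a gate with a falling input degrades by k·(1 − SP)^n (a hypothetical form standing in for Equation 5.10). For 0 < n ≤ 1 and every t in [0, 1], t ≤ t^n, so setting n = 1 never increases any degradation term. The sketch below applies this to an inverter chain with equal k and equal fresh rise and fall delays; the exponent n = 0.25 is a hypothetical value, not the one used in this thesis.

```python
def chain_degradation(x, n, stages=6, k=1.0):
    """Aged-minus-fresh delay (rising, falling path input) of an inverter chain whose
    input SP is x. Node SPs alternate x, 1 - x, ... along the chain. Assumes the
    hypothetical form delta_d = k * (1 - SP)**n for a gate with a falling input."""
    sps = [x if l % 2 == 0 else 1.0 - x for l in range(stages)]
    # Rising path input: gates at odd positions see a falling input (the set N_r).
    d_r = sum(k * (1.0 - sps[l]) ** n for l in range(1, stages, 2))
    # Falling path input: gates at even positions see a falling input (the set N_f).
    d_f = sum(k * (1.0 - sps[l]) ** n for l in range(0, stages, 2))
    return d_r, d_f

def normalized_min_degradation(n, stages=6, k=1.0, steps=1000):
    """Grid-search the input SP that minimizes max(d_r, d_f), normalized to the
    upper bound in which every degrading gate sees SP = 0 (cf. (5.32))."""
    worst = (stages // 2) * k
    best = min(max(chain_degradation(i / steps, n, stages, k)) for i in range(steps + 1))
    return best / worst

print(normalized_min_degradation(1.0))   # 0.5 (linearized)
print(normalized_min_degradation(0.25))  # 0.5 ** 0.25, roughly 0.84
```

In this toy model the normalized minimum is 0.5^n, so the linearized value of 50 % is always the smaller one, in line with the comparison to Figure 5.7.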
The degradation of the gate delay in Equation 5.10 depends only on the signal probability of the on-path input (switching input). This is correct for an inverter because it has only one input. For a NAND gate it is correct as well, since the delay degradation of the timing arc from the switching input to the output depends (almost) entirely on the signal probability at the switching input, owing to the parallel connection of the PMOS transistors. For a NOR gate, however, this is not the case, because its PMOS transistors are connected in series. If the PMOS transistor nearest to the supply voltage is connected to an input with an SP of 1, then the gate does not degrade. In the path delay equations (5.20, 5.21), this is taken into account by removing from the sets N_r and N_f those NOR gates in which the PMOS transistor of the switching input is not directly connected to the supply voltage. In that case, a side input is connected to the PMOS transistor that is directly connected to the supply voltage, and the signal probability of this input can be chosen freely, since the interconnection of the rest of the circuit is neglected.
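The construction of N_r and N_f, including the NOR rule just described, can be sketched as follows. The `PathGate` record and its `pmos_at_vdd` flag are hypothetical; the sketch relies only on the facts stated above: every gate type considered (INV, NAND, NOR) is inverting, so the transition direction alternates along the path, and NOR gates whose switching-input PMOS is not next to VDD are left out.

```python
from dataclasses import dataclass

@dataclass
class PathGate:
    kind: str                 # "INV", "NAND" or "NOR" (all inverting)
    pmos_at_vdd: bool = True  # NOR only: is the switching input's PMOS next to VDD?

def degrading_sets(path):
    """Return (N_r, N_f): positions of on-path gate inputs that see a falling
    transition for a rising (N_r) or falling (N_f) transition at the path input."""
    n_r, n_f = set(), set()
    for pos, gate in enumerate(path):
        if gate.kind == "NOR" and not gate.pmos_at_vdd:
            # A side input drives the PMOS next to VDD; its SP can be set to 1,
            # so this gate need not degrade and is left out of both sets.
            continue
        # Every gate inverts, so the transition flips at each stage:
        # gate `pos` sees a falling input for a rising path input iff pos is odd.
        (n_r if pos % 2 == 1 else n_f).add(pos)
    return n_r, n_f

path = [PathGate("INV"), PathGate("NOR", pmos_at_vdd=False), PathGate("NAND"), PathGate("NOR")]
print(degrading_sets(path))  # ({3}, {0, 2})
```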
slower maximal aged path delay will never have a minimal aged path delay greater than D(P_crit)_fresh.
The number of paths that would have to be considered in the exact method might be too large. Instead, a lower bound for the minimal aged circuit delay is again determined by obtaining the minimal aged path delay of the N slowest paths and taking the maximum of them.
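This bound is valid because the minimal aged circuit delay is a minimum (over workloads) of a maximum (over paths), which can never be smaller than the maximum over paths of the per-path minima; restricting the maximum to the N slowest paths can only make the bound looser, never invalid. A minimal sketch, where `min_aged_path_delay` is a hypothetical hook returning the lower bound of a single path (for example, the solution of the linear problem above):

```python
def min_aged_circuit_delay_bound(paths, n_slowest, min_aged_path_delay):
    """Lower bound of the minimal aged circuit delay.

    paths: list of (path_id, maximal_aged_delay) pairs.
    min_aged_path_delay: callable path_id -> lower bound of that path's
    minimal aged delay (hypothetical hook).
    """
    slowest = sorted(paths, key=lambda p: p[1], reverse=True)[:n_slowest]
    # min over workloads of max over paths >= max over paths of min over workloads
    return max(min_aged_path_delay(pid) for pid, _ in slowest)

paths = [("p1", 10.0), ("p2", 9.5), ("p3", 8.0)]
lower = {"p1": 8.2, "p2": 8.6, "p3": 7.9}
print(min_aged_circuit_delay_bound(paths, 2, lower.get))  # 8.6
```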
5.3.7. Wrap-up
This section investigated whether intervals for the gate and path delays are justified. It was shown that an interval for the aged gate delay is justified, and that the upper bound of the aged path delay interval is justified as well if the path is statically sensitizable.
The lower bound of the aged path delay interval is equal to the fresh path delay. However, for many paths the minimal aged path delay differs from the fresh path delay if the maximum of the path delays for both input transitions is considered. This is because, due to the inverting characteristic of CMOS logic, a gate and its succeeding gate cannot both remain undegraded (if the gate is an inverter or a NOR gate).
An optimization problem was formulated to obtain the minimal aged path delay. This problem was too complex to be solved exactly. By simplifying and linearizing the optimization problem, it could be solved efficiently. The solution is still a valid lower bound for the minimal aged circuit delay.
From the minimal aged path delay, a minimal aged circuit delay can be determined. This minimal aged circuit delay could be used to further reduce the number of PCPs by refining Condition 1.
5.4. Considering process variations
Besides aging, process variation is another limiting factor for circuit reliability. So far, process variation has not been considered, and deterministic values have been assumed for the gate delays. However, due to process variation, even the fresh gate delays cannot be determined exactly.
Until recently, global process variations and uncertainties of the current operating conditions (T_curr and V_curr) were handled by corner cases. Corner cases can also be used to take global process variation and these uncertainties into account when the PCPs are determined: the PCPs are obtained for each corner case, and their union forms the set of all PCPs.
Due to ongoing miniaturization, local process variation within a single chip has increased so much that it can no longer be neglected. Local variations cannot be captured with corner cases. For that purpose, SSTA has been developed, which models timing quantities (e.g., delay, arrival time, slack) as probability distributions.
Figure 5.10 illustrates the idea of how aging analysis and SSTA can be combined. In
the nominal case, timing quantities are deterministic values. On the one hand, there is
aging. Due to aging, those deterministic timing quantities become time dependent. To
identify the PCPs, for each timing quantity an interval is considered. On the other hand,
there is process variation. It results in a probability distribution for timing quantities.
Combining aging and process variation results in an interval with random variables as
lower and upper bounds.
In SSTA, each timing quantity is represented in a canonical form:

$$\hat{a} = a_0 + \sum_{i=1}^{n} a_i x_i + a_r x_r \tag{5.23}$$

a_0 is the nominal value, x_i represents the variation of the n global sources, and x_r is a random variable modeling the purely random effect of process variation. a_i and a_r are the sensitivities of the timing quantity to x_i and x_r, respectively. They are scaled such that x_i and x_r are Gaussian distributions with zero mean and unit variance (N(0, 1)).
For a block-based timing analysis, two operations are required: sum and max. The sum ŝ of the random variables â and b̂ = b_0 + Σ_{i=1}^{n} b_i·x_i + b_r·x_{r,b} is obtained by adding the nominal values and the coefficients of the global variation, s_i = a_i + b_i. The independent variation a_r·x_{r,a} + b_r·x_{r,b} is replaced by s_r·x_{r,s}, where s_r is determined by matching the variances of a_r·x_{r,a} + b_r·x_{r,b} and s_r·x_{r,s}, which gives s_r = √(a_r² + b_r²).
To compute the maximum m̂ = max(â, b̂), the tightness probability T_a is required. T_a is the probability that â is greater than b̂:

$$T_a = P(\hat{a} > \hat{b}) = \Phi\left(\frac{a_0 - b_0}{\theta}\right) \tag{5.24}$$

Φ is the cumulative distribution function of the standard normal distribution, and θ = √(σ_a² + σ_b² − 2·cov), where σ_a² and σ_b² are the variances and cov is the covariance of â and b̂. The tightness probability T_b is (1 − T_a).
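The canonical-form sum and the tightness probability (5.24) can be sketched in Python with the standard library's `statistics.NormalDist`. The class and function names are hypothetical; the covariance formula uses the standard fact that only the shared global sources correlate two canonical forms. The `greater` helper applies the threshold δ = 0.9 that is used later in the results section to decide when one timing quantity is considered greater than another. The max operation itself is not reproduced here.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist

@dataclass
class Canonical:
    """Canonical form a0 + sum_i a_i * x_i + a_r * x_r with x_i, x_r ~ N(0, 1), cf. (5.23)."""
    a0: float
    a: list     # sensitivities to the n global sources x_i
    ar: float   # sensitivity to the independent random source x_r

    def variance(self):
        return sum(c * c for c in self.a) + self.ar ** 2

    def __add__(self, other):
        # Nominal values and global sensitivities add; the two independent random
        # parts are merged into one term with matching variance: s_r^2 = a_r^2 + b_r^2.
        return Canonical(self.a0 + other.a0,
                         [x + y for x, y in zip(self.a, other.a)],
                         math.hypot(self.ar, other.ar))

def covariance(p, q):
    # Only the shared global sources correlate p and q.
    return sum(x * y for x, y in zip(p.a, q.a))

def tightness(p, q):
    """T_p = P(p > q) = Phi((p0 - q0) / theta), cf. (5.24)."""
    # max() guards against tiny negative values caused by rounding.
    theta = math.sqrt(max(0.0, p.variance() + q.variance() - 2.0 * covariance(p, q)))
    if theta == 0.0:  # p - q is deterministic
        return 1.0 if p.a0 > q.a0 else 0.0
    return NormalDist().cdf((p.a0 - q.a0) / theta)

def greater(p, q, delta=0.9):
    """Treat p as greater than q if P(p > q) exceeds the threshold delta."""
    return tightness(p, q) > delta

a = Canonical(12.0, [1.0], 0.5)
b = Canonical(10.0, [1.0], 0.5)
print(greater(a, b), greater(b, a))  # True False
```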
[Figure 5.10: Basic idea for combining aging effects and process variations. Panels plot the delay distribution P(d) over the delay d for the nominal case, with aging (from d_fresh to d_aged over time t), with process variations, and for the combination (d̂).]
The result of the sum and the max of two canonical forms is again a canonical form.
Hence, all timing quantities of the timing graph can be expressed as canonical forms.
$$\hat{q}_{aged} = q_{aged} + \sum_{i=1}^{n} q_i x_i + q_r x_r \tag{5.29}$$
For â_aged, the impact of aging and the impact of process variation can be added because they are independent (see [Fischer et al., 2008]).
For intervals with random variables, the operations sum, max, and greater than (“>”) have to be defined as well. For sum, the lower bounds and the upper bounds are added, as in Equation 5.1. For max, the maximum of the lower bounds and the maximum of the upper bounds are calculated, as in Equation 5.2. For intervals with random variables, the greater-than operation returns a probability:
The combined statistical and aging-aware timing analysis is used for, but not limited to, determining the PCPs. It can also be used on its own as an aging-aware SSTA. What has not been taken into account so far is that the transistor parameter drifts are also probability distributions. Aging effects are statistical processes: two identical transistors under identical conditions age differently. To account for this, the nominal aged value a_aged would also have to be a random variable.
5.5. Applications
Having introduced the methods to identify the PCPs, this section presents two applications of PCPs.
Timing models that take process variation into account have already been published [Garg and Marculescu, 2007; Li et al., 2009]. To the best of my knowledge, the model presented here is the first aging-aware timing model above gate level.
Such models can, for instance, be used in high-level synthesis (HLS). One important step in HLS is scheduling. During scheduling, arithmetic/logical operations are mapped onto time slots of duration T0 (see Figure 5.11). This requires a pre-characterized library with different implementations of modules. The individual implementations differ in their characteristics (delay, area, power). The schedule is generated by choosing optimal implementations from the pre-characterized library [Coussy and Morawiec, 2008].
When a module ages, its delay increases. If this is not taken into account during synthesis, the system may fail before the end of its specified lifetime because the time available for performing a calculation is no longer sufficient. When a module is characterized for the library, it is unknown how the module will be utilized. Therefore, a timing model is needed that provides the delay of a module as a function of the operating conditions over lifetime and of the workload.

[Figure 5.11: Two adders scheduled in time slots of duration T0, before and after aging. The dotted circles indicate the aged performances. The circuit fails because the second adder needs the result before the first adder has finished its calculation.]

[Figure 5.12: (a) The timing graph of the ISCAS’85 circuit c17, with source S and sink T. This is a simplified TG because each net is represented by just one node. (b) An example of a reduced TG.]
The fundamental idea is to use a strongly reduced TG as a timing model. The maximal aged circuit delay is determined by a PCP. Hence, it is not necessary to consider the complete timing graph of a module (see Figure 5.12(a)); it suffices to consider only the part of the TG that consists of edges belonging to PCPs (see Figure 5.12(b)).
The timing model is characterized by generating the reduced timing graph, which contains only edges that are part of PCPs. When an aging-aware timing analysis is performed and the aged delay of the module is needed, the reduced timing graph is evaluated. First, the workload at the module inputs has to be propagated to the nodes of the timing graph. Then, the delays of the remaining edges of the reduced timing graph can be computed by means of AgeGate.
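Once the aged edge delays are known, evaluating the reduced timing graph amounts to a longest-path computation over a DAG. The following is a minimal Python sketch under that assumption; `aged_edge_delay` is a hypothetical hook that would wrap an AgeGate evaluation with the propagated workload, and `graphlib` requires Python 3.9 or later.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

def aged_module_delay(edges, aged_edge_delay, source="S", sink="T"):
    """Longest source-to-sink delay of a reduced timing graph.

    edges: (u, v) pairs kept after the PCP reduction.
    aged_edge_delay: callable (u, v) -> aged edge delay (hypothetical hook
    standing in for an AgeGate evaluation).
    """
    preds = defaultdict(set)
    for u, v in edges:
        preds[v].add(u)
    arrival = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        if node == source:
            arrival[node] = 0.0
        elif preds[node]:
            arrival[node] = max(arrival[u] + aged_edge_delay(u, node) for u in preds[node])
        else:
            arrival[node] = float("-inf")  # not reachable from the source
    return arrival[sink]

edges = [("S", "a"), ("S", "b"), ("a", "T"), ("b", "T")]
delays = {("S", "a"): 1.0, ("a", "T"): 2.0, ("S", "b"): 1.5, ("b", "T"): 1.0}
print(aged_module_delay(edges, lambda u, v: delays[(u, v)]))  # 3.0
```

Because the reduced graph contains only PCP edges, the work per evaluation shrinks with the reduction, which is where the speed-up of the timing model comes from.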
The timing model [Lorenz et al., 2010a,b] is a gray-box model, because it takes the
internal structure of the module into account. It is as accurate as an aging analysis on
gate level, but it is much faster. The speed-up of the timing model depends on how far the timing graph could be reduced. The results show a mean speed-up of 30 ×.

[Figure: histogram of the circuit delay degradation; horizontal axis: circuit delay degradation [%] (0 to 10), vertical axis: counts (0 to 200).]
[Figure 5.14: Enhanced-scan design. The standard scan design (scan flip-flops with D, Q, SCAN and SCAN EN around the combinational logic) is extended by hold latches. Thereby, the first delay test vector V1 is latched by the hold latches while the second delay test vector V2 is read into the scan chain.]
• By testing all PCPs, the system not only knows that a circuit degrades but also which paths of that circuit are too slow. Hence, the circuit is only too slow for certain input vectors, which then have to be avoided.
Testability of paths
When nodes or edges are removed, it is crucial that all remaining PCPs are testable; otherwise, it might not be possible to determine the degradation of the current critical path. This has to be considered in the arrival time and in the delay to sink reduction steps. To remove an edge from the timing graph, it is checked whether a path segment B has a larger delay than a path segment A. The edge is only removed if the path segment with the greater delay is statically sensitizable. In fact, to remove the edge it is enough that some testable path segment has a greater delay than path segment A; it does not necessarily have to be path segment B.
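The removal rule can be stated as a one-line predicate. In this Python sketch, the plain comparison `delay > delay_a` is a simplification standing in for whatever delay comparison the reduction step uses (interval-based or probabilistic), and the function name is hypothetical.

```python
def can_remove_edge(delay_a, other_segments):
    """Decide whether the edge belonging to path segment A may be removed.

    other_segments: iterable of (delay, statically_sensitizable) for competing
    path segments. The edge is removable only if at least one testable
    (statically sensitizable) segment is slower than A; it need not be the
    segment that was originally compared with A.
    """
    return any(sensitizable and delay > delay_a for delay, sensitizable in other_segments)

print(can_remove_edge(5.0, [(6.0, False), (5.5, True)]))  # True
print(can_remove_edge(5.0, [(6.0, False)]))               # False
```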
them all, they are too many to test them all as well. Therefore, at least the final reduction step, which works on an already reduced TG, may be path-based. This final reduction step does not remove any nodes or edges; it just identifies those paths in the reduced TG that have a path delay greater than the fresh critical path delay. This is done by enumerating all paths in descending order of path delay. The enumeration is stopped when the first path has a maximal aged path delay less than the required time at T (see Figure 5.15).
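One common way to enumerate paths in descending order of delay is a best-first search keyed by the partial path delay plus the longest remaining delay to the sink. The following Python sketch shows this technique; it is not necessarily the enumeration used in this thesis. Edge delays are assumed to be maximal aged delays supplied by a hypothetical `delay` callable, and `required` plays the role of the required time at T.

```python
import heapq
from collections import defaultdict
from graphlib import TopologicalSorter

def paths_above(edges, delay, required, source="S", sink="T"):
    """Enumerate source-to-sink paths in descending order of path delay and stop
    at the first path whose delay is below `required`.
    Returns a list of (path_delay, [nodes])."""
    succ, preds = defaultdict(list), defaultdict(set)
    for u, v in edges:
        succ[u].append(v)
        preds[v].add(u)
    order = list(TopologicalSorter(preds).static_order())
    # Longest delay from every node to the sink (the "delay to sink").
    to_sink = {sink: 0.0}
    for node in reversed(order):
        tails = [delay(node, v) + to_sink[v] for v in succ[node] if v in to_sink]
        if node != sink and tails:
            to_sink[node] = max(tails)
    # Key = partial delay + delay to sink: an upper bound on every completion that
    # is exact for complete paths, so complete paths pop in descending order.
    heap = [(-to_sink[source], 0.0, [source])]
    found = []
    while heap:
        _, partial, path = heapq.heappop(heap)
        node = path[-1]
        if node == sink:
            if partial < required:
                break  # all remaining paths are at most this slow
            found.append((partial, path))
            continue
        for v in succ[node]:
            if v in to_sink:
                new = partial + delay(node, v)
                heapq.heappush(heap, (-(new + to_sink[v]), new, path + [v]))
    return found

edges = [("S", "a"), ("S", "b"), ("a", "b"), ("a", "T"), ("b", "T")]
d = {("S", "a"): 2.0, ("S", "b"): 1.0, ("a", "b"): 1.0, ("a", "T"): 1.0, ("b", "T"): 2.0}
print(paths_above(edges, lambda u, v: d[(u, v)], required=3.5))  # [(5.0, ['S', 'a', 'b', 'T'])]
```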
Finding paths for the purpose of testing a circuit has been a research topic for quite some time. The first approaches considered only the nominal gate delay [Li et al., 1989; Sharma and Patel, 2002]. Their goal was to identify paths to test all gates for delay faults. Later, process variation was considered [Lu et al., 2005; Zolotov et al., 2010] by identifying critical paths for all process space conditions.
Two other publications are concerned with testing aged circuits. Wang et al. [2007a] introduce path-enumerative methods to identify paths that exceed the clock period under worst-case aging conditions. An optimization problem is set up to obtain these maximal aged path delays. However, the formulation overlooks that NBTI only degrades every other gate (those with a falling input transition). Without this mistake, the maximal aged path delay would be equal to the upper bound of the path delay intervals, as discussed in Section 5.3.3.
Baba and Mitra [2009] proposed a method to identify the paths of an aged circuit that must be tested. Gate delay intervals are defined, and methods are introduced to remove nodes and edges. The approach presented here improves on theirs in the following points:
• The impact of process variation is considered when the PCPs are determined (Section 5.4).
• The correlation of gate delays along a path is taken into account (Section 5.3). In [Baba and Mitra, 2009], path delays are always intervals.
• Baba and Mitra [2009] determine the PCPs first and check in a separate step whether those paths are sensitizable. If a path turns out not to be sensitizable, it must be checked whether the removal of other paths from the PCPs was unjustified.
• An aging-aware STA has to be performed only once. In [Baba and Mitra, 2009], an STA has to be run again whenever a common edge is detected (Section 5.2.5).
• The final reduction step to determine the PCPs for testing aged circuits is enumerative, which allows the removal of some additional paths from the already reduced TG.
• The results section shows that, by calculating the number of PCPs for different lifetimes, the number of paths that must be tested at the beginning of the lifetime can be reduced significantly.
5.6. Results
The proposed approach is tested with ISCAS’85 and ITC’99 benchmark circuits. The circuits are synthesized with an industrial 90 nm cell library. To generate the aging-aware gate models, single-stage gates (inverters, NOR and NAND gates with 2 to 4 inputs) from the library are characterized. The operating conditions are 1.32 V, 125 °C and a specified lifetime of 10 years. These harsh conditions result in a large maximal threshold voltage drift (17 % of the nominal threshold voltage) and, therefore, lead to large intervals for the gate delays.
The benchmark circuits are used for the following investigations:
• It is checked how far the timing graph can be reduced. This is relevant for the
aging-aware timing model.
• The number of PCPs for the circuits is obtained. The PCPs are important for testing the circuits during their lifetime.
$$\Delta D_{aged,min} = \frac{D_{aged,min} - D_{fresh}}{D_{aged} - D_{fresh}} \tag{5.32}$$
• The linearization of the gate delay dependencies results in a smaller minimal aged delay (50 % compared to 85 % for the inverter chain).
• It depends on the gate types along a given path. As shown in Section 5.3.4, only the constraints for an inverter and a NOR gate prevent all gates along the path from remaining undegraded. (If a given path consisted only of NAND gates, then D_aged,min would be equal to D_fresh.)
• Furthermore, the difference between the fresh path delay for a rising input transition, D(P_r)_fresh, and for a falling input transition, D(P_f)_fresh, is relevant. The quantity minimized is the maximum of D(P_r)_aged and D(P_f)_aged. If the difference is too large, the maximum does not change, since the SPs are chosen such that only the smaller of the two path delays is increased.
margin it is almost inevitable that the system has to react because the circuit degrades too much.
As Baba and Mitra [2009] do not take process variations (PVs) into account, the results are first compared without considering PVs. The proposed approach reduces the number of PCPs compared to Baba and Mitra [2009] by a factor of 2.7 × (column Impr. in Table 5.3). For all circuits except c6288 and b19, the number of PCPs is small enough that it is feasible to test them all. For circuits c6288 and b19, a traditional worst-case design must be used. For the other circuits, a better-than-worst-case design can be used by testing all identified PCPs periodically. The number of identified PCPs seems to depend more on the TG structure than on the pure circuit size: circuit c6288 with just 2600 gates has over 10^12 PCPs, whereas b18 with over 80 000 gates has just 236 PCPs.
The runtime to determine all PCPs of a circuit when PVs are considered is 30 min on average on a workstation with a 2.4 GHz CPU and 8 GB RAM. Excluding circuit b19, which took about 7 h, the average runtime of the remaining circuits is 10 min.
Finally, the last two columns show the results when all reduction steps are performed and local PVs are considered. The threshold δ, which defines when a timing quantity is considered greater than another quantity, is set to 0.9. Hence, a timing quantity is considered greater than another one when the probability for this exceeds 90 %. For the moment, just one source of variation is considered, namely the purely random variation of the threshold voltage, x_r, which is set to 10 % of the nominal V_th. However, as described in Visweswariah et al. [2006], an arbitrary number of varying parameters can be considered. Due to the uncertainty of the gate delays introduced by PVs, the number of PCPs that have to be tested increases.
More detailed results for all benchmark circuits are given in Appendix B, which shows how far the number of PCPs can be reduced by the individual reduction steps.
The test time can be further reduced by determining sets of PCPs for different time periods (see Table 5.4). The PCPs for circuit c3540 are calculated every 2 years until the specified lifetime is reached. Hence, for the first 2 years just 175 paths have to be checked. The number of paths to be checked increases with time, and in the last two years all 1318 PCPs have to be checked.
5.7. Summary
Aging is one of the main factors limiting the reliability of nano-scale circuits. The
degradation of a circuit strongly depends on operating conditions and workload. If
those conditions are not (yet) known, it is hard to accurately predict the degraded
timing behavior of a circuit, which is given by the delay of the critical path. A method
is proposed to identify all paths of a circuit that may become critical due to degradation, the so-called possible critical paths.
First, this is done by introducing intervals for gate delays, since the exact delays are unknown. Later, it is shown that the resulting path delay intervals are sometimes too pessimistic, and an efficient method to calculate a lower bound for the path delay, called the minimal aged path delay, is presented. A way to incorporate process variation when the PCPs are determined is introduced as well.
Two applications of PCPs are given: an aging-aware timing model for modules and the use of PCPs to monitor a circuit in the field. The results show that the timing model achieves a mean speed-up of 30 × compared to a timing analysis on gate level, and that the number of paths that must be tested can be reduced by 2.7 × compared to a state-of-the-art approach.

Table 5.3: Comparison of the proposed approach to the approach from Baba and Mitra [2009]. Shown are the initial number of gates and paths in the circuit, the number of PCPs, the improvement in the number of PCPs of our approach compared to Baba and Mitra [2009], and the runtimes with and without considering process variations.
^a The number of PCPs for these circuits is determined without checking static sensitizability because the BDDs for the circuits were too large and could not be set up. However, static sensitizability checking should be possible with a SAT-solver-based approach (e.g., Drechsler et al. [2008]).

Table 5.4: Number of PCPs of circuit c3540 for different time periods.

| Time [y] | 2 | 4 | 6 | 8 | 10 |
|---|---|---|---|---|---|
| # PCPs | 175 | 396 | 773 | 1114 | 1318 |
6. Conclusion
Aging leads to a time-dependent change of device parameters. Unlike other effects that cause variation in ICs, aging effects have not yet received much attention. However, due to the ongoing miniaturization, the degradation of the circuit performance caused by aging effects increases. Furthermore, the performance gain from moving from one technology to the next decreases. Hence, generous safety margins are no longer affordable, since they make the transition to the latest technology generations uneconomical. To enable continued scaling, new design techniques are required that allow the reduction of safety margins. The contributions of this thesis are accurate analysis and monitoring methods to determine the timing degradation of aged circuits.
First, the two dominant drift-related aging effects were investigated. It was shown how the parameter drift can be modeled and what impact those drifts have on the gate performance. It turned out that both the gate delay and the output slope are increased. However, the power dissipation of a gate is not affected or is even slightly reduced.
An aging analysis flow on gate level, capable of determining the impact of the two dominant drift-related aging effects on circuit timing, was developed and implemented. The centerpiece of the analysis flow is an aging-aware gate model called AgeGate. AgeGate consists of a canonical gate model, technology-specific degradation equations, and information about the internal gate structure. In contrast to existing aging-aware gate models, AgeGate takes both NBTI and HCI into account, computes not only an aged gate delay but also an aged output slope, and, last but not least, considers individual transistor drifts. The results show that both aging effects are relevant: not calculating an aged output slope underestimates the performance degradation by 24 % on average, and not computing individual transistor drifts overestimates the degradation by 20 % on average.
The continued scaling requires that the design is done on higher and higher levels of
abstraction. Based on AgeGate, an aging-aware timing model for modules was proposed.
The basic idea of the timing model was to determine all possible critical paths of a
module that might become critical due to aging. This is done by removing all elements
of a timing graph that do not belong to a possible critical path. This way, the timing
model is as accurate as an aging-aware timing analysis on gate level but a mean speed-up
of 30 × (maximum speed-up 96 ×) could be achieved.
Aging is an increasing reliability concern in advanced technologies. The timing degradation of a circuit strongly depends on the workload and the operating conditions over lifetime. However, these factors are often unknown during the design of a circuit. A method was introduced that monitors the circuit by testing the delay of all possible critical paths. This way, countermeasures must only be taken if the circuit degrades too much. The circuit is more competitive, since it need not be designed for worst-case conditions.
A. Constraints for NAND and NOR gates
First, the constraint for a NAND gate with two inputs is derived. The output signal probability SP_o of the NAND gate is:

$$SP_o = 1 - SP_i \cdot SP_j \tag{A.1}$$

Solving Equation A.1 for the side input SP_j:

$$SP_j = \frac{1 - SP_o}{SP_i} \tag{A.2}$$

Taking into account that SP_j is between 0 and 1 gives the following constraint:

$$0 \le SP_j \le 1 \tag{A.3}$$

$$0 \le \frac{1 - SP_o}{SP_i} \le 1 \tag{A.4}$$

$$SP_o \ge 1 - SP_i \tag{A.5}$$
Next, the constraint for a NAND gate with three inputs is derived:

$$SP_o = 1 - SP_i \cdot SP_j \cdot SP_k \tag{A.6}$$

$$SP_k = \frac{1 - SP_o}{SP_i \cdot SP_j} \tag{A.7}$$

$$0 \le SP_k \le 1 \tag{A.8}$$

$$0 \le \frac{1 - SP_o}{SP_i \cdot SP_j} \le 1 \tag{A.9}$$

$$SP_o \ge 1 - SP_i \cdot SP_j \tag{A.10}$$

Since SP_j can take any value between 0 and 1, the loosest lower bound for SP_o of a three-input NAND gate (obtained for SP_j = 1) is equal to the constraint for a NAND gate with two inputs:

$$SP_o \ge 1 - SP_i \tag{A.11}$$
Similarly, for a NOR gate with three inputs:

$$SP_o = (1 - SP_i)\cdot(1 - SP_j)\cdot(1 - SP_k) \tag{A.12}$$
$$SP_k = 1 - \frac{SP_o}{(1 - SP_i)\cdot(1 - SP_j)} \tag{A.13}$$

$$0 \le SP_k \le 1 \tag{A.14}$$

$$0 \le 1 - \frac{SP_o}{(1 - SP_i)\cdot(1 - SP_j)} \le 1 \tag{A.15}$$

$$0 \le \frac{SP_o}{(1 - SP_i)\cdot(1 - SP_j)} \le 1 \tag{A.16}$$

$$SP_o \le (1 - SP_i)\cdot(1 - SP_j) \tag{A.17}$$

Since SP_j can take any value between 0 and 1, the upper bound of the inequality for SP_o of a three-input (or an n-input) NOR gate is equal to the constraint for a two-input NOR gate:

$$SP_o \le 1 - SP_i \tag{A.18}$$
B. More detailed results for PCP identification
Table B.1 shows the number of PCPs and the corresponding runtimes for all ISCAS’85 and ITC’99 circuits. The reduction steps, as discussed in Section 5.2, are applied to the initial TG one after another. First, the slack reduction step is performed. Next, the path delay reduction step is applied to the already reduced TG. Then, the arrival time and the delay to sink reduction steps are performed. The column “All reduction steps considering minimum aged circuit delay” shows the resulting number of PCPs when D_crit,fresh is replaced by D_crit,aged,min (which is relevant for the slack, the path delay, and the path-based reduction steps) and all reduction steps are performed again. Finally, the last column shows the number of PCPs and the runtime when the same reduction steps as in the previous column are performed, but with process variations considered as well.
123
Initial Slack reduc- Path delay reduc- Arrival time and All reduction All reduction
tion step tion step delay to sink re- steps considering steps considering
duction steps minimum aged minimum aged
circuit delay circuit delay and
process variations
# Gates # Paths # PCPs ([s]) # PCPs ([s]) # PCPs ([s]) # PCPs ([s]) # PCPs ([s])
c17 7 18 6 (0.00) 6 (0.00) 3 (0.00) 3 (0.00) 5 (0.19)
c432 226 123652 22378 (0.17) 13126 (0.01) 157 (0.05) 157 (0.08) 167 (24.62)
c499 534 452608 69517 (0.99) 29518 (0.05) 1487 (1.24) 375 (2.53) 696 (65.25)
c880 438 16956 1650 (0.58) 1007 (0.01) 98 (0.05) 74 (0.08) 118 (18.92)
c1355 589 522368 72216 (0.74) 40432 (0.05) 3376 (2.55) 2224 (4.90) 3074 (117.91)
c1908 431 1.5e+06 138412 (0.86) 130736 (0.02) 4596 (16.49) 2091 (23.44) 5407 (394.14)
c2670 708 31286 2402 (1.98) 1068 (0.01) 21 (0.02) 21 (0.04) 46 (23.42)
c3540 905 4.2e+06 923326 (0.80) 357372 (0.06) 15276 (0.45) 1345 (0.45) 2608 (173.93)
c5315 1484 738816 82572 (4.44) 75368 (0.02) 1568 (0.06) 899 (0.11) 2012 (94.40)
c6288a 2601 5.1e+16 3.5e+16 (0.92) 2.3e+16 (0.33) 6.8e+12 (6.99) 4.1e+12 (10.92) 9.7e+12 (661.31)
c7552 2242 448564 18310 (6.08) 9480 (0.03) 3173 (0.12) 522 (0.26) 942 (79.07)
b02 28 72 4 (0.02) 3 (0.00) 3 (0.00) 3 (0.00) 3 (0.55)
b03 233 1632 164 (0.20) 131 (0.01) 80 (0.03) 80 (0.17) 96 (9.97)
b04 540 185324 17172 (1.45) 15336 (0.01) 120 (0.03) 120 (0.05) 378 (39.64)
B. More detailed results for PCP identification
b05 1156 189666 19653 (1.48) 12392 (0.04) 3295 (0.17) 981 (0.53) 1672 (61.59)
b06 74 238 23 (0.03) 18 (0.00) 10 (0.01) 7 (0.02) 16 (2.08)
b07 615 4046 37 (1.10) 35 (0.01) 10 (0.01) 10 (0.03) 24 (14.50)
b08 155 2632 176 (0.13) 91 (0.00) 9 (0.01) 9 (0.02) 6 (11.08)
b09 167 1858 270 (0.18) 200 (0.00) 83 (0.04) 59 (0.08) 40 (7.78)
b10 213 1790 183 (0.17) 143 (0.00) 45 (0.02) 42 (0.07) 55 (9.33)
b11 1050 7156 106 (2.17) 87 (0.00) 21 (0.02) 16 (0.05) 25 (21.77)
b12 1497 20020 708 (2.66) 523 (0.02) 237 (0.09) 144 (0.35) 247 (32.79)
b13 307 1216 3 (0.45) 3 (0.00) 2 (0.00) 2 (0.01) 5 (7.89)
b14 5718 2e+08 6.1e+06 (24.44) 5.2e+06 (0.14) 52170 (32.75) 10948 (46.50) 49004 (1340.81)
b15 10236 2e+07 1.7e+06 (28.39) 894146 (0.38) 42089 (2.29) 6387 (10.95) 14116 (649.80)
b17 24840 6.1e+07 10898 (212.04) 7608 (0.04) 275 (0.21) 173 (0.63) 238 (782.09)
b18a 83679 2.6e+26 2.6e+07 (1001.56) 2.4e+07 (0.06) 236 (0.14) 236 (0.23) 608 (4642.54)
b19a 144747 4.7e+22 2.2e+22 (2133.45) 1.8e+22 (2.48) 1e+19 (67.52) 1e+19 (228.69) 2.4e+19 (25182.16)
b20a 13097 6.7e+12 2.7e+06 (82.07) 2.5e+06 (0.13) 6138 (0.99) 5283 (1.09) 9998 (890.82)
b21 13052 7.2e+12 7.6e+11 (61.67) 7.3e+11 (0.20) 3796 (1.05) 3452 (1.85) 5755 (771.66)
b22 19731 6.7e+12 519916 (114.18) 385166 (0.12) 2436 (0.69) 1489 (1.03) 2828 (871.60)
Table B.1.: Detailed results for the proposed reduction steps
a The number of PCPs for these circuits was determined without checking static sensitizability, because the BDDs for these
circuits were too large to be set up. However, static sensitizability checking should be possible with a SAT-solver-based approach (e.g. Drechsler et al. [2008]).
Bibliography
M. Agarwal, Bipul C. Paul, Ming Zhang, and Subhasish Mitra. Circuit Failure Prediction
and Its Application to Transistor Aging. In IEEE VLSI Test Symposium, pages 277–
286, May 2007.
Charles J. Alpert, Anirudh Devgan, and Stephen T. Quay. Buffer Insertion for Noise
and Delay Optimization. IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, 18(11):1633–, November 1999.
Todd Austin, Valeria Bertacco, Scott Mahlke, and Yu Cao. Reliable Systems on Unreliable
Fabrics. IEEE Design and Test, 2008.
A. H. Baba and Subhasish Mitra. Testing for transistor aging. In IEEE VLSI Test
Symposium, pages 215–220, May 2009.
Thomas Baumann, Stefan Drapatz, Georg Georgakos, Karl Hofmann, and Christian
Pacha. Accelerating and Masking Properties of Transistor Degradation of Selected
Digital Circuit Topologies. Honey milestone report 3.1.2-q11, Infineon Technologies,
August 2010.
Manuel J. Bellido, Jorge Juan, and Manuel Valencia. Logic-Timing Simulation and the
Degradation Model. Imperial College Press, London, 2006.
D. R. Bild, G. E. Bok, and R. P. Dick. Minimization of NBTI performance degradation
using internal node control. In Design, Automation and Test in Europe (DATE), pages
148–153, 2009.
David Blaauw, Kaviraj Chopra, Ashish Srivastava, and Louis Scheffer. Statistical Timing
Analysis: From Basic Principles to State of the Art. IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, 4:589–607, 2008.
David T. Blaauw, Chanhee Oh, Vladimir Zolotov, and Aurobindo Dasgupta. Static
electromigration analysis for on-chip signal interconnects. IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, 22(1):39–48, January
2003.
Jifeng Chen, Shuo Wang, Nemat Bidokhti, and Mohammad Tehranipoor. A Framework
for Fast and Accurate Critical-Reliability Paths Identification. In IEEE North Atlantic
Test Workshop (NATW), May 2011.
Liang-Chi Chen, Sandeep K. Gupta, and Melvin A. Breuer. A new gate delay model
for simultaneous switching and its applications. In ACM/IEEE Design Automation
Conference (DAC), pages 289–294, New York, NY, USA, 2001. ACM.
Mihir Choudhury, Vikas Chandra, Kartik Mohanram, and Robert C. Aitken. Analytical
model for TDDB-based performance degradation in combinational logic. In Design,
Automation and Test in Europe (DATE), pages 423–428, 2010.
Philippe Coussy and Adam Morawiec. High-Level Synthesis from Algorithms to Digital
Circuits. Springer, 2008.
John Croix and Martin Wong. Blade and razor: cell and interconnect delay analysis
using current-based models. In ACM/IEEE Design Automation Conference (DAC),
pages 386–389, June 2003.
S. Das, C. Tokunaga, S. Pant, W. H. Ma, S. Kalaiselvan, K. Lai, D.M. Bull, and David T.
Blaauw. RazorII: In Situ Error Detection and Correction for PVT and SER Tolerance.
IEEE Journal of Solid-State Circuits, 44(1):32–48, January 2009.
Robert Entner. Modeling and Simulation of Negative Bias Temperature Instability. PhD
thesis, Technische Universität Wien, 2007.
Thomas Fischer, E. Amirante, Karl Hofmann, M. Ostermayr, Peter Huber, and Doris
Schmitt-Landsiedel. A 65nm test structure for the analysis of NBTI induced statistical
variation in SRAM transistors. In European Solid-State Device Research Conference
(ESSDERC), pages 51–54, September 2008.
Stephan Henzler, Martin Wirnshofer, and Dominik Lorenz. Intrinsic time margin
monitoring for assessment of process variation and aging, 2009.
Vincent Huard, CR Parthasarathy, Alain Bravaix, Chloe Guerin, and Emmanuel Pion.
CMOS device design-in reliability approach in advanced nodes. In IEEE International
Reliability Physics Symposium (IRPS), pages 624–633, 2009.
Kunhyuk Kang, Sang Phill Park, Kaushik Roy, and Muhammad A. Alam. Estimation of
statistical variation in temporal NBTI degradation and its impact on lifetime circuit
performance. In IEEE/ACM International Conference on Computer-Aided Design
(ICCAD), pages 730–734, Piscataway, NJ, USA, 2007. IEEE Press.
Christoph Knoth, Irina Eichwald, Petra Nordholz, and Ulf Schlichtmann. White-Box
Current Source Modeling Including Parameter Variation and Its Application in Timing
Simulation. In International Workshop on Power and Timing Modeling, Optimization
and Simulation (PATMOS), pages 200–210, September 2010.
Christoph Knoth, Carsten Uphoff, Sebastian Kiesel, and Ulf Schlichtmann. SWAT:
Simulator for Waveform-Accurate Timing including Parameter Variations and Transistor
Aging. In International Workshop on Power and Timing Modeling, Optimization and
Simulation (PATMOS), September 2011. To appear.
Sanjay V. Kumar, Chris H. Kim, and Sachin S. Sapatnekar. An Analytical Model for
Negative Bias Temperature Instability. In IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pages 493–496, 2006.
Yung-Huei Lee, Neal Mielke, Marty Agostinelli, Sukirti Gupta, Ryan Lu, and William
McMahon. Prediction of Logic Product Failure Due To Thin-Gate Oxide Breakdown.
In IEEE International Reliability Physics Symposium (IRPS), pages 18–28, 2006.
Bing Li, Ning Chen, Manuel Schmidt, Walter Schneider, and Ulf Schlichtmann. On
Hierarchical Statistical Static Timing Analysis. In Design, Automation and Test in
Europe (DATE), April 2009.
Wing Ning Li, Sudhakar M. Reddy, and Sartaj K. Sahni. On Path Selection in Combi-
national Logic Circuits. IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, 26(1):56–63, January 1989.
Zhihong Liu, Bruce W. McGaughy, and James Z. Ma. Design tools for reliability analysis.
In ACM/IEEE Design Automation Conference (DAC), pages 182–187, 2006.
Dominik Lorenz, Georg Georgakos, and Ulf Schlichtmann. Aging Analysis of Circuit
Timing Considering NBTI and HCI. In IEEE International On-Line Testing Sympo-
sium (IOLTS), pages 3–8, June 2009a.
Dominik Lorenz, Martin Barke, and Ulf Schlichtmann. Aging analysis at gate and
macro cell level. In IEEE/ACM International Conference on Computer-Aided Design
(ICCAD), pages 77–84, November 2010b.
Dominik Lorenz, Martin Barke, and Ulf Schlichtmann. Timing-Modell für Makrozellen
zur Alterungsanalyse. In GMM/GI/ITG-Fachtagung Zuverlässigkeit und Entwurf,
pages 41–47, September 2010c.
Dominik Lorenz, Georg Georgakos, and Ulf Schlichtmann. Aging-aware Timing Analysis
of Combinatorial Circuits on Gate Level. it - Information Technology, 4, August 2010d.
Dominik Lorenz, Martin Barke, and Ulf Schlichtmann. Efficiently analyzing the impact of
aging effects on large integrated circuits. Microelectronics Reliability, 2012. ISSN
0026-2714. doi: 10.1016/j.microrel.2011.12.029. URL
http://www.sciencedirect.com/science/article/pii/S0026271411005622.
Xiang Lu, Zhuo Li, Wangqi Qiu, D. M. H. Walker, and Weiping Shi. Longest-path
selection for delay test under process variation. IEEE Transactions on Computer-
Aided Design of Integrated Circuits and Systems, pages 1924–1929, December 2005.
Yinghai Lu, Li Shang, Hai Zhou, Hengliang Zhu, Fan Yang, and Xuan Zeng. Statistical
reliability analysis under process variation and aging effects. In ACM/IEEE Design
Automation Conference (DAC), pages 514–519, July 2009.
Hong Luo, Yu Wang, Ku He, Rong Luo, Huazhong Yang, and Yuan Xie. A Novel
Gate-Level NBTI Delay Degradation Model with Stacking Effect. In Nadine Azemard
and Lars Svensson, editors, Integrated Circuit and System Design. Power and Timing
Modeling, Optimization and Simulation, volume 4644 of Lecture Notes in Computer
Science, pages 160–170. Springer Berlin / Heidelberg, 2007a.
Hong Luo, Yu Wang, Ku He, Rong Luo, Huazhong Yang, and Yuan Xie. Modeling
of PMOS NBTI Effect Considering Temperature Variation. In IEEE International
Jessica Qian, Satyamurthy Pullela, and Lawrence T. Pillage. Modeling the “Effective
capacitance” for the RC interconnect of CMOS gates. IEEE Transactions on Computer-
Aided Design of Integrated Circuits and Systems, 13(12), December 1994.
Stewart E. Rauch III. The statistics of NBTI-induced VT and beta mismatch shifts in
pMOSFETs. IEEE Transactions on Device and Materials Reliability, pages 89–93,
December 2002.
Hans Reisinger, O. Blank, Wolfgang Heinrigs, A. Muhlhoff, Wolfgang Gustin, and Chris-
tian Schlünder. Analysis of NBTI Degradation- and Recovery-Behavior Based on
Ultra Fast VT-Measurements. In IEEE International Reliability Physics Symposium
(IRPS), pages 448–453, March 2006.
Hans Reisinger, O. Blank, Wolfgang Heinrigs, Wolfgang Gustin, and Christian Schlün-
der. A Comparison of Very Fast to Very Slow Components in Degradation and Re-
covery Due to NBTI and Bulk Hole Trapping to Existing Physical Models. IEEE
Transactions on Device and Materials Reliability, 7(1):119–129, 2007.
T. Sakurai and A. R. Newton. Alpha-Power Law MOSFET Model and its Applications
to CMOS Inverter Delay and Other Formulas. IEEE Journal of Solid-State Circuits,
25(2):584–594, April 1990.
Louis Scheffer, Luciano Lavagno, and Grant Martin, editors. EDA for IC
implementation, circuit design, and process technology. Electronic Design Automation
for Integrated Circuits Handbook. CRC Press, Boca Raton, 2006.
Dieter K. Schroder and Jeff A. Babcock. Negative bias temperature instability: Road
to cross in deep submicron silicon semiconductor manufacturing. Journal of Applied
Physics, 94(1), 2003.
Ellen M. Sentovich, Kanwar Jit Singh, Luciano Lavagno, Cho Moon, Rajeev Murgai,
Alexander Saldanha, Hamid Savoj, Paul R. Stephan, Robert K. Brayton, and Al-
berto L. Sangiovanni-Vincentelli. SIS: A System for Sequential Circuit Synthesis.
Memorandum UCB/ERL M92/41, Electronics Research Laboratory, University of
California, Berkeley, CA 94720, May 1992.
M. Sharma and J. H. Patel. Finding a small set of longest testable paths that cover
every gate. In IEEE International Test Conference (ITC), pages 974–982, December
2002.
Alexander Stempkovsky, Alexey Glebov, and Sergey Gavrilov. Calculation of Stress
Probability for NBTI-Aware Timing Analysis. In IEEE International Symposium on
Quality Electronic Design (ISQED), pages 714–718, March 2009.
Alvin W. Strong, Ernest Y. Wu, Rolf-Peter Vollertsen, Jordi Sune, Giuseppe La Rosa,
Stewart E. Rauch III, and Timothy D. Sullivan. Reliability Wearout Mechanisms in
Advanced CMOS Technologies. Series on Microelectronic Systems. IEEE Press, 2009.
Dennis Sylvester, David Blaauw, and Eric Karl. Elastic: An adaptive self-healing archi-
tecture for unpredictable silicon. IEEE Design & Test of Computers, 23(6):484–490,
2006.
Synopsys. Composite Current Source.
http://www.synopsys.com/products/solutions/galaxy/ccs/cc_source.html, 2006.
Synopsys. HSPICE User Guide: Simulation and Analysis, September 2008.
E. Talpes and D. Marculescu. Toward a multiple clock/voltage island design style for
power-aware processors. IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, 13(5):591–603, May 2005. ISSN 1063-8210. doi: 10.1109/TVLSI.2005.844305.
James Tschanz, Keith A. Bowman, Steve Walstra, Marty Agostinelli, Tanay Karnik,
and Vivek De. Tunable replica circuits and adaptive voltage-frequency techniques for
dynamic voltage, temperature, and aging variation tolerance. In Symposium on VLSI
Circuits, pages 112–113, June 2009.
Robert H. Tu, Elyse Rosenbaum, Wilson Y. Chan, Chester C. Li, Eric Minami, Khandker
Quader, Ping Keung Ko, and Chenming Hu. Berkeley Reliability Tools - BERT.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,
12:1524–1533, 1993.
John P. Uyemura. CMOS logic circuit design. Kluwer Academic Publishers, 2001.
Chandu Visweswariah, Kaushik Ravindran, Kerim Kalafala, Steven G. Walker, Samba-
sivan Narayan, Daniel K. Beece, Jeff Piaget, Natesan Venkateswaran, and Jeffrey G.
Hemmet. First-Order Incremental Block-Based Statistical Timing Analysis. IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(10),
October 2006.
Wenping Wang, Zile Wei, Shengqi Yang, and Yu Cao. An efficient method to iden-
tify critical gates under circuit aging. In IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pages 735–740, Piscataway, NJ, USA, 2007a. IEEE
Press.
Wenping Wang, Shengqi Yang, Sarvesh Bhardwaj, Rakesh Vattikonda, Sarma Vrudhula,
Frank Liu, and Y. Cao. The impact of NBTI on the performance of combinational
and sequential circuits. In ACM/IEEE Design Automation Conference (DAC), pages
364–369, New York, NY, USA, 2007b. ACM.
Wenping Wang, Shengqi Yang, and Yu Cao. Node Criticality Computation for Cir-
cuit Timing Analysis and Optimization under NBTI Effect. In IEEE International
Symposium on Quality Electronic Design (ISQED), pages 763–768, March 2008.
Yu Wang, Xiaoming Chen, Wenping Wang, Varsha Balakrishnan, Yu Cao, Yuan Xie,
and Huazhong Yang. On the efficacy of input Vector Control to mitigate NBTI effects
and leakage power. In IEEE International Symposium on Quality Electronic Design
(ISQED), pages 19–26, March 2009a.
Yu Wang, Xiaoming Chen, Wenping Wang, Yu Cao, Yuan Xie, and Huazhong Yang.
Gate replacement techniques for simultaneous leakage and aging optimization. In
Design, Automation and Test in Europe (DATE), pages 328–333, April 2009b.
Kai-Chiang Wu and Diana Marculescu. Joint logic restructuring and pin reordering
against NBTI-induced performance degradation. In Design, Automation and Test in
Europe (DATE), pages 75–80, April 2009.
Lifeng Wu, Jingkun Fang, Hirokazu Yonezawa, Yoshiyuki Kawakami, Nobufusa Iwan-
ishi, Heting Yan, Ping Chen, Alvin I-Hsien Chen, Norio Koike, Yoshifumi Okamoto,
and Chune-Sin Ye. GLACIER: a hot carrier gate level circuit characterization and
simulation system for VLSI design. In IEEE International Symposium on Quality
Electronic Design (ISQED), pages 73–79, 2000.
Michael G. Xakellis and Farid N. Najm. Statistical Estimation of the Switching Activity
in Digital Circuits. In ACM/IEEE Design Automation Conference (DAC), pages 728–733,
June 1994.
Gary Kok-Hoo Yeap. Practical Low Power Digital VLSI Design. Springer, 1998.
Vladimir Zolotov, Jinjun Xiong, Hanif Fatemi, and Chandu Visweswariah. Statistical
Path Selection for At-Speed Test. IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, pages 749–759, May 2010.
List of Figures
3.1. 36 mV Vth drift due to NBTI at 1.2 V VDD (a). Sensitivity of the gate
delay degradation to a threshold voltage drift (b). Hence, NBTI causes
about 10 % degradation of the output delay for a rising input transition. . 36
3.2. Cross section of a PMOS transistor. . . . . . . . . . . . . . . . . . . . . . 37
3.3. Output characteristic of a PMOS transistor for different values of ∆Vth . . . 39
3.4. Time dependence of Vth drift due to NBTI. . . . . . . . . . . . . . . . . . 40
3.5. Temperature dependence of ∆Vth for different values of Vgs . . . . . . . . . . 40
3.6. Transistor width dependence. The minimal transistor width used in the
standard cell libraries is marked. . . . . . . . . . . . . . . . . . . . . . . 41
3.7. Drift over time for an AC stress. . . . . . . . . . . . . . . . . . . . . . . . 42
3.8. Duty cycle dependence of NBTI. . . . . . . . . . . . . . . . . . . . . . . . 43
3.9. Drain avalanche hot carrier. . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.1. TG annotated with arrival time and delay to sink at every node. . . . . . 87
5.2. Illustration of the path delay reduction step. Edge (b, d) can be removed
because the delay of path P is less than the delay of path Pother . . . . . . 89
5.3. Illustration of the arrival time reduction step. Edge (d, e) can be removed
because the arrival time interval along edge (d, e) is less than the arrival time
at e after the max-operation. . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.4. Example for the common edge reduction step. . . . . . . . . . . . . . . . . 91
5.5. Example showing the difference between the proposed and the exact method for
common edges. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.6. Graphical representation of the common edge reduction step cases. Edge
(u, v) can be removed if the aged delay of path U is smaller than the fresh delay
of path V . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.7. Path delay of an inverter chain (10 inverters) with respect to SP at the
input. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.8. A general path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.9. Graphical representation of the constraints for the gate types. . . . . . . . 100
5.10. Basic idea for combining aging effects and process variations. . . . . . . . 104
5.11. The dotted circles indicate the aged performances. The circuit fails be-
cause the second adder needs the result before the first adder has finished
its calculation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.12. The timing graph of the ISCAS’85 circuit c17 is shown in (a). This is
a simplified TG because each net is represented by only one node. An
example of a reduced TG is shown in (b). . . . . . . . . . . . . . . . . . . 107
5.13. Distribution of delay degradation for 1000 workload samples. The dotted
line is the worst-case degradation, assuming that all PMOS transistors
degrade maximally. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.14. Enhanced-scan design. The standard scan design is extended by hold
latches. Thereby, the first delay test vector V1 is latched by the hold
latches while the second delay test vector V2 is read into the scan chain. . 110
5.15. Path-based reduction step . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
List of Tables
2.1. Execution trace of the k most critical paths algorithm for the five slowest
paths. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2. Comparison of state-of-the-art gate models with the proposed aging-aware
gate model AgeGate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.1. An example of a temperature profile. The lifetime is 10 years and Veff is Vnom . 76
4.2. Degradation of critical path delays for different analyzer settings. . . . . . 83
List of Algorithms
-. Function reset_node(node) . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
-. Function reset_edge(u,v) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1. Circuit delay computation . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
-. Function update_node(node) . . . . . . . . . . . . . . . . . . . . . . . . . . 21
-. Function update_edge(u,v) . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2. k most critical paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
-. Function update_edge_aged(u,v) . . . . . . . . . . . . . . . . . . . . . . . 66
Acronyms
ASTA aging-aware static timing analysis
FF flip-flop
IC integrated circuit
RD reaction diffusion
RTL register transfer level
TA timing analysis
TDDB time-dependent dielectric breakdown
TG timing graph
List of Symbols
activation energy Ea
aged gate delay daged
aged path delay Daged
aged timing quantity qaged
arrival time AT
branch slack BS
gate current Ig
gate delay d
gate source voltage Vgs
join slack JS
output load CL
output slope sOUT
oxide thickness tox
parameter drift ∆p
path P
path delay D
probability that transistor is “on” Pon
source node S
stress probability Pstress
stress probability HCI Pstress,HCI
stress probability NBTI Pstress,NBTI
stress time tstress
substrate current Isub
supply voltage VDD
switching power Pswitching
temperature T
threshold voltage Vth
threshold voltage drift ∆Vth
timing quantity q
transistor length L
transistor width W
transition density TD
Index
required time, 21
technology mapping, 11
temporal dependence, 67
time-dependent dielectric breakdown, 35
time-complexity, 18
timing analysis, 12, 15
timing arc, 15
timing graph, 17
timing quantity, 19
timing sign-off, 15
transition density, 66
transition time, 16
use profile, 59
variability, 9
workload, 64, 66