
TECHNISCHE UNIVERSITÄT MÜNCHEN

Lehrstuhl für Entwurfsautomatisierung

Aging Analysis of Digital Integrated Circuits

Dominik Lorenz

Complete copy of the dissertation approved by the Faculty of Electrical Engineering and
Information Technology of the Technische Universität München for the award of the
academic degree of

Doktor-Ingenieur.

Chair: Univ.-Prof. Dr. sc.techn. Andreas Herkersdorf

Examiners of the dissertation:
1. Univ.-Prof. Dr.-Ing. Ulf Schlichtmann
2. Prof. Diana Marculescu, Ph.D.,
Carnegie Mellon University, PA, USA

The dissertation was submitted to the Technische Universität München on 31.01.2012 and
accepted by the Faculty of Electrical Engineering and Information Technology on
24.04.2012.

Acknowledgments
This thesis results from my work as a research assistant at the Institute for Electronic
Design Automation at the Technische Universität München.
First of all, I would like to thank Professor Ulf Schlichtmann for giving me the
opportunity to do research at his institute and for encouraging me to work on this novel
topic. His guidance and continued support, as well as the open and creative atmosphere at
the institute, have been essential for the successful completion of this research project.
I would also like to thank the second examiner, Professor Diana Marculescu, for her
interest in my research.
Most of the work would not have been possible without the valuable cooperation of
the Infineon Technologies employees working together with me on the HONEY research
project. Special thanks go to Georg Georgakos for his guidance and the fruitful
discussions with him.
It is a pleasure for me to thank my colleagues at the EDA institute for their
collaboration and their friendship. It was a great time at the institute, which I will
never forget.
Finally, I want to express my heartfelt gratitude towards my wife, Nicole, and my little
sunshine, Annika, for their continuous support or just for smiling when I come home.


Contents

1. Introduction
1.1. Objective of this thesis
1.2. Semi-custom design flow
1.3. Structure of the thesis

2. Fundamentals
2.1. (Static) timing analysis
2.1.1. Gate models
2.1.2. Timing graph
2.1.3. Incremental timing analysis
2.1.4. Sequential circuits
2.1.5. Path enumeration
2.2. State of the art of aging analysis
2.2.1. Circuit level
2.2.2. Gate level

3. Aging effects and their impact on standard cells
3.1. Aging effects
3.1.1. Negative Bias Temperature Instability
3.1.2. Hot Carrier Injection
3.1.3. Stress conditions in CMOS logic gates
3.2. Impact on gate performance
3.2.1. Impact on combinational gates
3.2.2. Impact on flip-flops
3.2.3. Impact on power dissipation
3.3. Technology trend
3.4. Summary

4. Aging-aware static timing analysis
4.1. Aging-aware STA flow
4.2. Workload determination
4.3. AgeGate: Aging-aware gate model
4.3.1. Canonical gate model
4.3.2. Degradation equations
4.3.3. Calculation of Stress Probabilities
4.4. Characterizing the standard cells
4.4.1. Obtaining the sensitivities
4.4.2. Obtaining the internal gate structure
4.4.3. Simplification of the gate model
4.5. Results
4.5.1. Waveform dependence of parameter drift
4.5.2. Comparison of AgeGate, circuit-level simulation and measurements
4.5.3. Aging analysis results
4.6. Summary

5. Identifying possible critical paths in aged circuits
5.1. Prerequisites
5.2. Identification of PCPs
5.2.1. Slack reduction step
5.2.2. Path delay reduction step
5.2.3. Arrival time reduction step
5.2.4. Delay to sink reduction step
5.2.5. Common edge reduction step
5.2.6. Removing edges and nodes
5.3. Realistic aged path delays
5.3.1. Gate delay interval
5.3.2. Realistic aged path delays for an inverter chain
5.3.3. Maximal aged path delay of a general path
5.3.4. Minimal aged path delay for a general path
5.3.5. Minimal aged circuit delay
5.3.6. Use of minimal aged circuit delay in reduction steps
5.3.7. Wrap-up
5.4. Considering process variations
5.4.1. Block-based statistical static timing analysis
5.4.2. Representation of timing quantities
5.5. Applications
5.5.1. Aging-aware timing model for modules
5.5.2. Monitoring of aging circuits
5.6. Results
5.6.1. Minimal aged delay
5.6.2. Node and edge reduction
5.6.3. Possible critical paths
5.7. Summary

6. Conclusion

A. Constraints for NAND and NOR gates

B. More detailed results for PCP identification

Bibliography

Acronyms

List of Symbols

1. Introduction
In biology, aging of an organism is defined as a progressive, irreversible process that
inevitably ends with death. The maximal lifetime of an individual is significantly affected
by aging [Wikipedia, 2011].
The same is true for integrated circuits (ICs). Aging effects cause the circuit
performance to degrade, and they have a significant impact on the specified lifetime of a
circuit.
Circuit aging can be regarded as a time-dependent variation. Aging is not the only
variability the IC industry must cope with. In fact, variability has always been a fact
of life in the IC industry. The reasons for variability can be classified into these three
categories:

Variations of the operating conditions: Primarily changes in supply voltage and
operating temperature.

Process variations: These denote deviations in process parameters from their nominal
values that are present in an IC after it has been manufactured. Examples are
variations in the concentration of dopants or the oxide thickness. In contrast to
aging, manufacturing variations do not change over time once the IC has been
manufactured.

Time-dependent variations: These denote changes in the physical (and consequently,
in the electrical) properties of an IC over time caused by aging effects.

Variations of the operating conditions are handled during the design process by
specifying a range (e.g., VDD,min and VDD,max) within which the IC has to meet the
specified properties (e.g., frequency or power consumption). Process variations have
traditionally been considered by specifying so-called process corners, which describe the
best or worst realistic combinations of process parameters (e.g., with respect to delay),
thus establishing generous guardbands against parameter variations. This modeling is
increasingly considered to be problematic, and statistical design methodologies have
therefore been proposed as a remedy for dealing with manufacturing variations. A detailed
overview of this field is given in Blaauw et al. [2008].
Time-dependent variation caused by aging effects, on the other hand, has received far
less attention. Aging effects lead to a change of device parameters over time that depends
on the operating conditions over the lifetime and on the workload. The workload defines
the portion of the lifetime a device spends in a particular operating point. Negative bias
temperature instability (NBTI), for instance, is regarded as the most severe aging effect
nowadays. NBTI results in an increased threshold voltage (Vth) of PMOS transistors
whenever the transistor is in inversion. The threshold voltage drift (∆Vth) is accelerated
by elevated temperature or supply voltage.
The impact of variations on the circuit performance increases due to the continued
technology scaling [Nassif, 2000]. The same absolute variation of the gate length, for
instance, results in a larger relative variation, since the nominal gate length is scaled
by a factor of 0.7× every two years according to Moore’s law [Moore, 1965]. The supply
voltage is scaled as well. Therefore, a supply voltage variation or a threshold voltage
variation has a larger impact on circuit performance. This holds even if a constant
absolute variation is assumed for the different variability mechanisms. However, the
variation caused by aging effects is going to increase, since these effects strongly depend
on the strength of the electric fields. The electric fields continue to increase with
scaling, because the transistor sizes have been scaled more aggressively than the supply
voltage for several technology generations¹.
Variability is the reason why performance and power consumption vary from chip
to chip and over time. To still be able to manufacture working and reliable products
despite increasing variability, the performance guardbands must be increased or other
techniques must be applied to make a product robust against variations. Examples of
such techniques are dynamic voltage frequency scaling (DVFS) [Semeraro et al., 2002;
Talpes and Marculescu, 2005; Herbert and Marculescu, 2009] or the use of redundant
circuitry [Lyons and Vanderkulk, 1962]. As a consequence, the operating frequency is not
as high as it could be, chip area is wasted and the power consumption is higher than
necessary. Hence, conservative safety margins and variation-aware design techniques make
the design of competitive products more difficult and diminish or even eliminate the
advantages of moving to the next technology node. According to Austin et al. [2008], one
way out of this dilemma is the use of innovative design techniques that reduce the
reliability costs again.

1.1. Objective of this thesis


The contribution of this thesis toward reducing the reliability costs consists of methods
to accurately analyze the timing degradation of a circuit caused by drift-related aging
effects. This allows the safety margins to be tightened again.
Within this thesis the following objectives have been set and achieved:
• Investigate the impact of aging effects on transistors, how these effects can be
modeled, and on which parameters they depend. Furthermore, quantify the degradation
of the properties of standard cells caused by aging effects.
• Develop and implement an aging analysis to determine the timing degradation
of ICs on gate level. The developed aging-aware gate model should consider the
dominant aging effects.
• Develop an aging analysis on higher abstraction levels. This enables considering
aging in earlier design stages and for complex systems.
• Develop a further approach that reduces the safety margins even further by enabling
a better-than-worst-case design style. To assure that such an aggressively designed
circuit still works correctly during the specified lifetime, the degradation of the
circuit caused by aging is periodically monitored and countermeasures are taken if the
circuit ages too much.

¹ Under the assumption that no breakthroughs are achieved to mitigate aging on technology level.

In the course of this thesis seven pre-publications [Lorenz et al., 2009a,b, 2010a,b,c,d,
2012] have been contributed to the scientific community. Furthermore, a patent for a
time margin monitor for the assessment of aging and process variation was filed and
granted [Henzler et al., 2009].

1.2. Semi-custom design flow


In the beginning of IC design in the early 1970s², circuit design was entirely manual work;
even the layout was drawn by hand. However, without the development of sophisticated
electronic design automation (EDA) tools, the design of state-of-the-art ICs would not
be possible.
Figure 1.1 depicts a simplified design flow from a hardware description language (HDL)
to a layout, also referred to as the register transfer level (RTL) to GDSII flow. The
purpose of this figure is to illustrate where timing analysis (TA) is required and where
an aging-aware TA would reduce the uncertainty of the delay prediction. Design flows are
getting more and more complex, and according to Scheffer et al. [2006, Chapter 1] this
trend continues, amongst other things, due to variability and reliability challenges:

“The RTL to GDSII flow has undergone significant changes in the last 25
years. The continued scaling of CMOS technologies significantly changed the
objectives of the various design steps. The lack of good predictors for delay
has led to significant changes in recent design flows. Challenges like leakage
power, variability, and reliability will continue to require significant changes
to the design-closure process in the future”.

Everything starts with a product specification, which includes constraints for
performance, area, and power. Further constraints, especially in advanced technologies,
are reliability and yield. The next step is to write a synthesizable description in an HDL
(VHDL or Verilog). This representation at RTL is then transferred into a logic
representation by logic synthesis [Sentovich et al., 1992]. A netlist of generic cells
(e.g., NAND and NOT cells), which represent the logic function, is obtained and mapped to
cells from a standard cell library. Next, the cells of the netlist are placed and the nets
are routed. Before the chip can be processed, tested and packaged, the sign-off is
performed by thoroughly verifying that the timing and other electrical performances meet
the specification.
² The first microprocessor, Intel’s 4004, was fabricated in 1971.

It is very expensive and time consuming to process a chip. Therefore, it is not feasible
to design a chip iteratively by processing it, testing it and making design changes. In
fact, the IC industry is quite unique in relying heavily on abstract models for designing
a product. There are, for instance, transistor models to simulate the voltage and current
waveforms on circuit level, and gate models, which provide, amongst other things, the
delay of the standard cells. The goal of models is to provide all the information that is
necessary on a particular abstraction level and to omit unimportant information. Only
through abstraction is it possible to design state-of-the-art circuits with up to billions
of transistors³. The models must be as accurate as possible to provide a good prediction
of the performance, power and area of a design. Otherwise the final product might not meet
the specification.

Figure 1.1.: IC design flow (specification, HDL, logic synthesis, logic functions,
technology mapping, netlist, place & route, layout, sign-off, tape-out).

TA is a crucial step during the design of a digital circuit. For complexity reasons, TA
is done on gate level or even higher abstraction levels. Basically, the gate and wire
delays along the longest path, the so-called critical path, are added up, and it is
verified whether the resulting circuit delay fulfills the timing specification. When a
circuit ages, the gate delays increase and the circuit may violate the timing
specification although the specification was met right after manufacturing (see
Figure 1.2).
In Chapter 3 it is shown that aging significantly degrades the gate delay. To consider
this, a TA with an aging-aware gate model is required. Such an aging-aware TA is
developed in this thesis.
TA is required in many design flow steps, not just for the final timing sign-off. This
enables the consideration of timing at every synthesis step, and the synthesis tool can
optimize the design until the timing constraints are met. With each synthesis step, the
available information becomes more accurate, which in turn increases the accuracy of
the TA. Only the multi-level logic functions are known at the logic synthesis stage, and
the circuit delay can only be roughly estimated from the logic depths of those functions.
Technology mapping is the first step at which it is known which gates from the standard
cell library are instantiated. From this step on, aging can be considered by an
aging-aware gate model. The exact net length is available during the place and route
synthesis stage, which increases the accuracy of the TA because the parasitic capacitance
and resistance of the nets are known. Finally, the coupling capacitances are available for
timing sign-off, which again increases the accuracy of the TA. Hence, an aging-aware TA is
beneficial at all synthesis steps from technology mapping on.

³ Intel’s Six-Core Core i7 CPU from 2010 has 1.17 · 10^9 transistors.

Figure 1.2.: Aging-aware timing analysis of a circuit. Aging effects degrade transistor
parameters, which results in increased gate delays over time. The critical
path delay increases as well, and the timing specification might be violated
during the specified lifetime.

1.3. Structure of the thesis


Chapter 2 discusses the fundamentals of TA and the state of the art of aging-aware timing
analysis. Chapter 3 introduces the two dominant drift-related aging effects, NBTI and
hot carrier injection (HCI). Their physical mechanisms are explained and it is shown
how the device parameter degradation can be modeled. Furthermore, their impact on
the gate performance is investigated. An aging-aware timing analysis flow is described in
Chapter 4. Its basis is an aging-aware gate model called AgeGate. Accuracy benefits of
the proposed approach are demonstrated on benchmark circuits. The degradation of a
circuit strongly depends on the operating conditions and the workload. Chapter 5 shows
methods to identify the paths of a circuit that might become critical without knowing
the exact operating conditions and workload. Two applications are presented which use
this information: an aging-aware timing model for modules and a methodology to design
better-than-worst-case circuits by monitoring all possible critical paths and intervening
if one of them degrades too much. Finally, the thesis is summarized in Chapter 6.

2. Fundamentals

2.1. (Static) timing analysis


Timing analysis is required for many different steps during the design process of a digital
circuit. The most obvious task for a TA is to determine the maximum clock frequency
a circuit can operate at. Therefore, a TA as accurate as possible is needed for timing
sign-off at the end of the digital design flow. A TA is also needed for circuit
optimization. During synthesis and layout (placement as well as routing), timing analysis is
performed in the inner optimization loop. This requires a timing analyzer that responds
to several thousand timing queries as fast as possible (see incremental timing analysis in
Section 2.1.3). When local optimizations are performed on a design (e.g., buffer insertion
[Alpert et al., 1999]), the TA checks that no timing constraint is violated due to a local
modification.
The timing analysis of complex digital circuits with up to millions of gates is performed
on gate level (or even higher abstraction levels), since a SPICE simulation on circuit level
of such large circuits is too time consuming. The required input vectors for a SPICE
or logic simulation are another problem. It is not practical to simulate a circuit for all
possible input vectors. Nevertheless, a SPICE simulation is more accurate and can be
used to verify the paths with the longest delays that are determined by a static timing
analysis (STA).
An STA has two main advantages compared to a timing simulation on circuit level. It
is significantly faster, since a simplified gate model (see Section 2.1.1) and a simplified
interconnect model are used. Furthermore, no input vectors are needed, because the
logic function of a gate is not considered for the signal propagation. Instead, the
propagation of signal arrival times just depends on the circuit topology. Bellido et al.
[2006, Chapter 2] compare state-of-the-art gate models to a SPICE simulation. The average
speed-up is three orders of magnitude and the mean error is 6.75 %.
An STA tool can operate in an early and a late mode. In late mode, the latest arrival
times of a signal are determined. In early mode, on the other hand, the earliest time a
signal transition can take place at a node is obtained. The circuit delay calculation and
the verification of the setup time constraints (see Section 2.1.4) are performed in late
mode. The hold time constraints are checked in early mode.

2.1.1. Gate models


For STA a gate model is needed to compute the gate delays. The gate model provides
a delay for a falling and a rising input transition for each of its timing arcs. A timing
arc is defined from a gate input to a gate output (see Figure 2.1(a)). Typically, it is
assumed that the output transition is caused by the switching of just one input signal
(single input switching assumption). A simultaneous transition at two or more inputs
can significantly increase the gate delay. Hence, gate models that take simultaneous
input switching into account are more accurate [Chen et al., 2001].

[Figure: (a) NAND gate with a transition at input A; timing arcs are depicted as lines.
(b) Corresponding waveforms.]

To obtain the gate delays, the gates of a standard cell library are pre-characterized
by SPICE simulations. Those simulations are used to create a gate model. During the
STA, just the gate model is evaluated. This is why an STA is much faster than
performing a SPICE simulation for the entire circuit.
There are several techniques to model the gate delay. One of the first was to use the
following equation [Sapatnekar, 2004, chap. 4]:

\[ d = k_1 \cdot C_L + k_2 \tag{2.1} \]

The gate delay is split into two parts. The dependence of the gate delay on the output
load ($C_L$) is given by $k_1$ and the intrinsic gate delay is given by $k_2$. $C_L$ is
given by the input capacitance of succeeding gates and the interconnect capacitance. This
quite simple model neglects the impact of the input slope ($s_{IN}$) on the gate delay.
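
As a minimal sketch, Equation 2.1 can be evaluated as follows. The function names, the units (ps, fF) and all numeric values are illustrative assumptions and are not taken from a real cell library.

```python
def output_load(fanout_input_caps, c_wire=0.0):
    """C_L: sum of the input capacitances of the succeeding gates plus wire capacitance."""
    return sum(fanout_input_caps) + c_wire


def linear_gate_delay(k1, k2, c_load):
    """Gate delay according to Equation 2.1: d = k1 * C_L + k2."""
    return k1 * c_load + k2


# Hypothetical values: k1 in ps/fF, k2 in ps, capacitances in fF.
c_l = output_load([2.0, 2.0, 1.5], c_wire=0.5)
delay = linear_gate_delay(k1=4.0, k2=12.0, c_load=c_l)
print(f"C_L = {c_l} fF, d = {delay} ps")
```

Note that the input slope does not appear anywhere in this model, which is exactly the limitation mentioned above.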
To consider the impact of the slope, signals are modeled as ramps for STA (see
Figure 2.1(b)). A signal is defined by two values: the arrival time (AT) and the
corresponding slope. The slope ($s$) is given by the transition time. This is the time a
signal takes to change from logic “0” to logic “1”. Hence, bounds for the logic values
have to be defined (e.g., 50 % of VDD for signal crossing and 20 % and 80 % of VDD for
transition time).
A commonly used gate model is based on a look-up table (LUT). The industry
quasi-standard, the Liberty file format from Synopsys, is such a LUT-based gate model. It
stores the gate delays in 2-dimensional LUTs dependent on input slope and output load
(see Figure 2.1):

\[ d = f(s_{IN}, C_L) \tag{2.2} \]

Values in between the stored values of the LUTs are obtained by interpolation. The
input slope is now required in addition to the output load in order to compute the gate
delay. For this reason, the output slope ($s_{OUT}$) is stored dependent on $s_{IN}$ and
$C_L$ in LUTs as well. Now, the input slope of a gate can be calculated based on the
output slope of its predecessor gate. An advantage of LUT-based gate models is that their
accuracy can easily be increased by characterizing the gate at additional supporting
points.
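
The LUT evaluation of Equation 2.2 can be sketched as below. Bilinear interpolation is assumed here as one common choice; the text only states that interpolation is used. The table contents, axis values and function names are hypothetical.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Indices (lo, hi) of the two axis points used for x, clamped to the grid."""
    hi = min(max(bisect_right(axis, x), 1), len(axis) - 1)
    return hi - 1, hi


def lut_lookup(slopes, loads, table, s_in, c_load):
    """Bilinear interpolation of table[i][j], indexed by slopes[i] and loads[j].

    Points outside the grid are linearly extrapolated from the outermost cell.
    """
    i0, i1 = _bracket(slopes, s_in)
    j0, j1 = _bracket(loads, c_load)
    ts = (s_in - slopes[i0]) / (slopes[i1] - slopes[i0])
    tc = (c_load - loads[j0]) / (loads[j1] - loads[j0])
    low = table[i0][j0] * (1 - tc) + table[i0][j1] * tc
    high = table[i1][j0] * (1 - tc) + table[i1][j1] * tc
    return low * (1 - ts) + high * ts


# Hypothetical 2x2 tables: rows = input slope (ps), columns = output load (fF).
slopes = [10.0, 50.0]
loads = [1.0, 5.0]
delay_table = [[20.0, 40.0], [30.0, 50.0]]
slope_table = [[15.0, 35.0], [25.0, 45.0]]

d = lut_lookup(slopes, loads, delay_table, s_in=30.0, c_load=3.0)
s_out = lut_lookup(slopes, loads, slope_table, s_in=30.0, c_load=3.0)
print(d, s_out)
```

The second lookup returns the output slope, which would serve as the input slope of the succeeding gate.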

Figure 2.1.: LUT-based gate model

Due to the ongoing miniaturization, the input capacitance of the gates decreases and
the resistance of the interconnect network increases. This leads to an increased
inaccuracy when purely capacitive loads are assumed. For this reason, an effective
capacitance was introduced by Qian et al. [1994]. The effective capacitance represents the
complex interconnect network by a single value. This enabled the continued usage of the
existing models.
However, the signal waveform in advanced technologies differs significantly from a
simple ramp (signals have a long “tail” now), which leads to inaccuracies as well. This
is the reason why current source models (CSMs) have been developed. The goal of CSMs is
to model the signal waveform more accurately by modeling gates as voltage controlled
current sources which charge the complex interconnect network and the fan-out gates.
Several approaches have been published. The composite current source model (CCSM)
[Synopsys, 2006] stores time-current waveforms in the LUTs. The effective current source
model (ECSM) [Cadence, 2007] differs only slightly from the CCSM by storing time-voltage
waveforms, which are again converted to current waveforms and applied to the
interconnect network. CCSM and ECSM have the advantage that they are compatible
with the existing timing analysis tools and were adopted quite fast by the industry.
Another CSM approach by Croix and Wong [2003] is to store the static output current
depending on gate input voltage and gate output voltage in LUTs. By solving differential
equations the voltage waveform at the succeeding gate input can be computed.
The aging-aware gate model introduced in Chapter 4 is LUT-based. However, Knoth
et al. [2011] show that the approach can be combined with a CSM [Knoth et al., 2010]
to an aging-aware CSM.

2.1.2. Timing graph


A timing graph (TG) is used in STA tools to represent a combinational circuit. A TG
is a directed acyclic graph (DAG): TG = (N, E). The nodes N of a timing graph are
the gate inputs and outputs. These are connected by two types of edges E. The weights of
edges connecting gate inputs with gate outputs are the gate delays for the corresponding
timing arc. Edges between gate outputs and inputs of succeeding gates represent the
delays caused by the interconnect network.
The focus of this thesis is on aging effects causing a drift of transistor parameters.
Hence, the passive interconnect network is not affected and not considered in the course
of this thesis. This enables us to simplify the timing graph. The nets of the gate level
netlist can be taken as nodes N and the weighted edges E correspond to gate delays.

Figure 2.2.: Circuit and corresponding timing graph. (a) Gate level netlist for ISCAS’85
circuit c17. (b) Simplified timing graph for c17 (for every net just one node is added
and not two, as described in the text).

The gate model provides a delay for a rising and a falling input transition. Hence,
every TG edge has two edge weights. To be able to use unmodified standard graph
algorithms, this should be avoided. A very clean and elegant way is described by Ju and
Saleh [1991]: For every net two nodes are added to the timing graph, one for a rising
transition, and another one for a falling transition. If two nets, u and v, are connected
by an inverting gate, the node u for a rising (falling) transition is connected to the node
v for a falling (rising) transition. If it is a non-inverting gate, the node u for a rising
(falling) transition is connected to v for a rising (falling) transition. That way every
edge in the timing graph has just one edge weight.
Two additional nodes are added to the TG: a source node (S), which is connected to all
primary input (PI) nodes, and a sink node (T), to which all primary output (PO) nodes
are connected (see Figure 2.2). To model unequal arrival times at the primary inputs,
delays can be assigned to the edges from S to the PIs.
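
The construction described above (two nodes per net, plus source and sink) can be sketched as follows. The input format, the function name and the delay values are assumptions made for illustration; they do not reflect the data structures of any particular STA tool.

```python
from collections import defaultdict

RISE, FALL = "r", "f"


def build_timing_graph(arcs, primary_inputs, primary_outputs):
    """Build a timing graph with one node per (net, transition).

    arcs: iterable of (in_net, out_net, inverting, d_rise_in, d_fall_in), one
          entry per timing arc; d_rise_in / d_fall_in are the gate delays for
          a rising / falling transition at the gate input.
    Returns a dict mapping each node to a list of (successor, delay).
    """
    edges = defaultdict(list)
    for in_net, out_net, inverting, d_rise_in, d_fall_in in arcs:
        for tr_in, delay in ((RISE, d_rise_in), (FALL, d_fall_in)):
            if inverting:
                tr_out = FALL if tr_in == RISE else RISE
            else:
                tr_out = tr_in
            edges[(in_net, tr_in)].append(((out_net, tr_out), delay))
    # Source S feeds all primary inputs, all primary outputs feed sink T.
    for net in primary_inputs:
        for tr in (RISE, FALL):
            edges["S"].append(((net, tr), 0.0))
    for net in primary_outputs:
        for tr in (RISE, FALL):
            edges[(net, tr)].append(("T", 0.0))
    return dict(edges)


# Hypothetical single inverter from net "a" to net "z".
tg = build_timing_graph([("a", "z", True, 10.0, 8.0)], ["a"], ["z"])
for node, successors in tg.items():
    print(node, "->", successors)
```

Every edge carries exactly one weight, so unmodified standard graph algorithms can be applied. Nonzero delays on the edges leaving S could be used to model unequal arrival times at the primary inputs.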

2.1.3. Incremental timing analysis

When the TG is annotated with gate delays as edge weights, the circuit delay can be
determined. The circuit delay is defined by the path (P) with the longest path delay
(D(P)). This path is called critical path (Pcrit), its path delay is the critical path
delay (D(Pcrit) or just Dcrit).
The circuit delay can be determined by path-based or block-based methods. The
path-based method enumerates all paths in the TG and computes their path delays by
adding up the gate delays along the path. The critical path with the longest path delay
determines the circuit delay. The path-based method has an exponential worst-case
time-complexity because the number of paths in a circuit increases (in the worst case)
exponentially with the number of nodes.
The block-based method propagates the arrival times (ATs) through the circuit, starting
at S until T is reached. For a given node n, AT(n) is the maximal point in time
that the signal at n can change¹. The arrival time of a node n can be calculated when
the arrival times of all predecessor nodes i and the gate delays d of all incoming edges
are known (see Figure 2.3):

\[ AT(n) = \max_{i \in \mathrm{predecessors}(n)} \bigl( AT(i) + d((i, n)) \bigr) \tag{2.3} \]

Figure 2.3.: Computation of the arrival time (AT).

AT(T) corresponds to the circuit delay. In contrast to the path-based method, each
node is visited just once, so the time complexity is O(|N|).
In short, the block-based method calculates the maximal arrival time for each node,
whereas the path-based method computes all path delays first and then takes the
maximum of them.
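
A minimal sketch of the block-based propagation from Equation 2.3 is given below. The edge-list format, the function names and the example delays are assumptions; the `late` switch illustrates how the same traversal could serve late mode (maximum) and early mode (minimum).

```python
from collections import defaultdict, deque


def arrival_times(edges, source="S", late=True):
    """Propagate arrival times through a DAG in topological order.

    edges: list of (u, v, delay). Returns (at, ctrl), where ctrl[n] is the
    controlling predecessor of n, i.e. the node i that determines AT(n).
    """
    succ, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for u, v, d in edges:
        succ[u].append((v, d))
        indeg[v] += 1
        nodes.update((u, v))
    # Kahn's algorithm: every node is placed in the order exactly once.
    order, queue = [], deque(n for n in nodes if indeg[n] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    at, ctrl = {source: 0.0}, {}
    for u in order:
        if u not in at:  # not reachable from the source
            continue
        for v, d in succ[u]:
            cand = at[u] + d
            if v not in at or (cand > at[v] if late else cand < at[v]):
                at[v], ctrl[v] = cand, u
    return at, ctrl


def critical_path(ctrl, sink="T"):
    """Trace the controlling nodes back from the sink to the source."""
    path = [sink]
    while path[-1] in ctrl:
        path.append(ctrl[path[-1]])
    return path[::-1]


# Hypothetical timing graph with delays in ps.
example = [("S", "a", 0.0), ("S", "b", 0.0), ("a", "c", 3.0),
           ("b", "c", 5.0), ("c", "T", 2.0), ("a", "T", 4.0)]
at_late, ctrl_late = arrival_times(example)
at_early, _ = arrival_times(example, late=False)
print(at_late["T"], critical_path(ctrl_late), at_early["T"])
```

As in the text, the logic function of the gates is ignored here, so the reported path is not guaranteed to be sensitizable.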
Both methods add up the gate delays without considering the logic function of the
gate. Hence, the critical path may not be sensitizable. A path is not sensitizable if
there does not exist an input assignment that enables a signal to propagate along the
path (see Section 5.3.3). A path that is not sensitizable is called a false path. If the
critical path is a false path, then the circuit delay is overestimated. The path-based
method can easily recognize a false path by checking for every path whether it is
sensitizable. For the block-based method this is more difficult, since one cannot easily
determine the path with the next longest path delay if the critical path is a false path.
An efficient method to enumerate the paths with respect to the path delay is discussed in
Section 2.1.5.
When the static timing analyzer is used in the inner optimization loop, the design is
often modified only slightly before the timing must be reevaluated. It would be very
inefficient to analyze the complete design again in this case. The incremental timing
analysis instead just analyzes the part of the timing graph that is affected by the change.
The foundation of an incremental timing analysis is that every timing quantity (e.g.,
arrival time or gate delay) has a valid flag (e.g., ATvalid or dvalid). It is crucial that,
whenever the circuit and therefore the timing graph changes, the valid flags of all
affected timing quantities are reset. This is done by two recursive functions, reset_node
and reset_edge. In reset_edge, the controlling node of the arrival time (ATctrl) is
needed. The controlling node is the predecessor node that defines the arrival time (i.e.,
the node i in Equation 2.3 that is responsible for the maximal arrival time at n).
¹ or the minimal time a signal changes if hold time constraints should be checked


Function reset_node(node)
    /* Function to set the arrival time of a node to invalid */
    ATvalid(node) ← False;
    foreach successor suc of node do
        /* Delays of outgoing edges are invalid because the edge input slope is invalid */
        reset_edge(node, suc);
    end

Function reset_edge(u, v)
    /* Recursive function to set the delay of an edge (u, v) to invalid */
    dvalid((u, v)) ← False;
    if ATctrl(v) == u then
        /* Arrival time at node v is invalid because it was controlled by edge (u, v) */
        reset_node(v);
    end

Whenever a timing quantity is read, it must first be checked whether it is still
valid. If not, it must be recalculated. This is done by two recursive functions,
update_node and update_edge. Assume that the circuit delay is to be reevaluated
after a design change. First, it is checked whether the arrival time at T is still valid.
If this is the case, then the change did not affect the circuit delay. Otherwise, one has
to proceed backwards into the timing graph, starting at T, until one reaches valid arrival
times and gate delays, and recalculate AT(T) based on those values.
The algorithm to calculate the circuit delay for an incremental timing analyzer is given
in Algorithm 1. As an initialization step the arrival time of the source node, which is
equal to 0, must be set to valid. Then, the arrival time at the sink node is queried. The
propagation of the arrival time from source node to sink node is done behind the scenes
by update_node and update_edge.

Algorithm 1: Circuit delay computation

    /* Set arrival time at source node to valid */
    ATvalid(S) ← True;
    /* Update arrival time at the sink node */
    update_node(T);

Figure 2.4 shows an example of the incremental timing analysis. Due to a design
change, the arrival time at node 6 is invalid, which results in the other nodes marked
red (or dark gray) also being invalid. Now the circuit delay is reevaluated by calling
update_node(T). This results in recursively calling update_node for all invalid nodes
down to node 6.

Function update_node(node)
    /* Recursive function to update the arrival time of a node */
    if ATvalid(node) == True then
        return AT(node)
    else
        AT(node) ← max_{i ∈ predecessors(node)} ( update_node(i) + update_edge(i, node) );
        /* Remember the predecessor that defines the arrival time */
        ATctrl(node) ← predecessor i for which the maximum is attained;
        ATvalid(node) ← True;
        return AT(node)
    end

Function update_edge(u, v)
    /* Recursive function to update the gate delay of an edge (u, v) */
    if dvalid((u, v)) == False then
        /* Update gate delay based on input slope and output load */
        slope ← get_slope_from_node(u);
        load ← get_load_from_node(v);
        d((u, v)) ← get_delay_from_LUT(slope, load);
        dvalid((u, v)) ← True;
    end
    return d((u, v))
The methods to identify possible critical paths in an aged circuit, discussed in
Chapter 5, continuously modify the TG by removing nodes and edges. Hence, without an
incremental TA, a complete STA would have to be performed whenever the TG is modified.
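The interplay of the reset and update functions can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the implementation used in this thesis: edge delays are fixed numbers (so the dvalid flags and the LUT lookup are omitted), the only supported design change is the removal of an edge, and the class name, the attribute names and the example graph are hypothetical.

```python
class IncrementalTimingGraph:
    """Timing graph with valid flags; edge delays are fixed numbers here."""

    def __init__(self, delays, source):
        self.delay = dict(delays)        # (u, v) -> gate delay
        self.preds, self.succs = {}, {}
        for (u, v) in self.delay:
            self.succs.setdefault(u, []).append(v)
            self.preds.setdefault(v, []).append(u)
        self.at = {source: 0}            # arrival times
        self.at_valid = {source: True}   # Algorithm 1: AT(S) is valid
        self.at_ctrl = {}                # controlling predecessor per node
        self.recomputed = 0              # number of recalculated arrival times

    def reset_node(self, n):
        self.at_valid[n] = False
        for s in self.succs.get(n, []):
            self.reset_edge(n, s)

    def reset_edge(self, u, v):
        if self.at_ctrl.get(v) == u:     # AT(v) was defined by edge (u, v)
            self.reset_node(v)

    def update_node(self, n):
        if not self.at_valid.get(n, False):
            best_at, best_pred = None, None
            for i in self.preds[n]:
                candidate = self.update_node(i) + self.delay[(i, n)]
                if best_at is None or candidate > best_at:
                    best_at, best_pred = candidate, i
            self.at[n], self.at_ctrl[n] = best_at, best_pred
            self.at_valid[n] = True
            self.recomputed += 1
        return self.at[n]

    def remove_edge(self, u, v):
        """Design change: delete edge (u, v) and invalidate what depends on it."""
        del self.delay[(u, v)]
        self.succs[u].remove(v)
        self.preds[v].remove(u)
        self.reset_edge(u, v)


g = IncrementalTimingGraph({("S", "a"): 2, ("S", "b"): 1, ("a", "c"): 1,
                            ("b", "c"): 4, ("a", "T"): 3, ("c", "T"): 1}, "S")
first = g.update_node("T")   # full analysis: every node is visited
g.recomputed = 0
g.remove_edge("b", "c")      # invalidates AT(c) and AT(T) only
second = g.update_node("T")  # incremental update
```

After the edge removal only the two invalidated arrival times are recalculated; the arrival times at the nodes a and b are reused.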
There are several other timing quantities of interest. AT gives the maximal time a
signal takes from the source node to a given node. Delay to sink (D2S), on the other
hand, defines the maximal time a signal takes from a given node until it reaches the sink
node. D2S is calculated as follows:

\[ \mathrm{D2S}(n) = \max_{i \in \mathrm{successors}(n)} \bigl( \mathrm{D2S}(i) + d((n, i)) \bigr) \tag{2.4} \]

To calculate D2S for all nodes, one starts at T and computes D2S for the predecessor
nodes until S is reached.
The required time (REQT(n)) is the latest time at which a signal must arrive at a node n
such that it still arrives at T in time. Therefore, REQT at T must be specified first. REQT
at a node n is the difference between REQT(T) and the D2S at n:

REQT(n) = REQT(T ) − D2S(n) (2.5)

The difference between required time and arrival time is called slack (SLACK):

SLACK(n) = REQT(n) − AT(n) (2.6)


Figure 2.4.: Example of the incremental timing algorithm. Arrival time at red (dark
grey) nodes is not valid. To update arrival time at node T, all invalid
arrival times are recursively updated (dashed arrows).

A negative slack implies that the signal arrives at a node later than required in order to
meet the required time at the sink node. The slack of a node is important information for
circuit optimization.
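Equations 2.3 to 2.6 can be combined in a few lines. The sketch below is only an illustration: the timing graph, the required time at the sink and the function names are hypothetical and not taken from this thesis.

```python
from functools import lru_cache

# Hypothetical timing graph: edge (u, v) -> gate delay d((u, v)).
DELAYS = {
    ("S", "a"): 2, ("S", "b"): 1,
    ("a", "c"): 1, ("b", "c"): 4,
    ("a", "T"): 3, ("c", "T"): 1,
}
REQT_SINK = 5  # assumed required time at the sink node T


@lru_cache(maxsize=None)
def arrival_time(n):
    """Equation 2.3; the source node has no incoming edges and gets AT = 0."""
    incoming = [(u, d) for (u, v), d in DELAYS.items() if v == n]
    return max((arrival_time(u) + d for u, d in incoming), default=0)


@lru_cache(maxsize=None)
def delay_to_sink(n):
    """Equation 2.4; the sink node has no outgoing edges and gets D2S = 0."""
    outgoing = [(v, d) for (u, v), d in DELAYS.items() if u == n]
    return max((delay_to_sink(v) + d for v, d in outgoing), default=0)


def required_time(n, reqt_sink):
    """Equation 2.5."""
    return reqt_sink - delay_to_sink(n)


def slack(n, reqt_sink):
    """Equation 2.6."""
    return required_time(n, reqt_sink) - arrival_time(n)


slacks = {n: slack(n, REQT_SINK) for n in ("S", "a", "b", "c", "T")}
print(slacks)
```

In this example all nodes on the critical path S, b, c, T share the same negative slack, whereas node a, which lies on shorter paths only, has a slack of zero.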

2.1.4. Sequential circuits


In contrast to a combinational circuit, a sequential circuit has storage elements in addi-
tion to logic gates. Hence, the output of a sequential circuit does not only depend on
the input signals but on the internal state as well. For synchronous sequential circuits,
the output signals of the combinational logic, which are fed back into the combinational
logic, are synchronized by a clock signal (see Figure 2.5).
Due to its simplicity, regarding design and verification, the common storage element
used in synchronous sequential circuits is the flip-flop (FF). FFs capture the data signal
at the active clock edge (in Figure 2.5 the rising transition is the active clock edge).
Synchronous sequential circuits can be used to realize finite state machines. They can
also be used to split complex combinational circuits into several parts. That way the
performance of the circuit can be increased, since only the individual parts, rather than
the complete circuit, must fulfill the timing constraints within one clock period. This is
called pipelining and is used, for instance, in microprocessors.
To store data correctly in a flip-flop, the following two timing constraints have to
be fulfilled (see waveform in Figure 2.5):

• setup time (tSUP) is the time interval during which the data signal has to be stable
before the active clock edge so that the data is sampled correctly. This can be verified
during STA by the following inequality:

\[ d_{\mathrm{CLK\text{-}to\text{-}Q}} + D_{\max} + t_{\mathrm{SUP}} < t_{\mathrm{CLK}} \tag{2.7} \]

The clock-to-Q delay (dCLK−to−Q) is the delay from an active clock edge until the
output of the sending FF changes. Dmax is the maximal delay of the combinational
circuit to the receiving FF input.


Figure 2.5.: Diagram of a sequential logic circuit. The timing constraints (setup and
hold time) of a flip-flop are given as well.

• hold time (tHLD) is the time interval during which the data signal has to remain stable
after the active clock edge so that the data is sampled correctly. This can be checked by
the following inequality:

\[ d_{\mathrm{CLK\text{-}to\text{-}Q}} + D_{\min} > t_{\mathrm{HLD}} \tag{2.8} \]

Dmin is the minimal circuit delay to the receiving FF input. Dmin is obtained by
the STA tool in the early mode.
The STA algorithm must be modified slightly to analyze sequential circuits. The
flip-flops are removed from the netlist. Every signal connected to an FF input becomes a PO
and every signal connected to an FF output becomes a PI. The remaining circuit is now
purely combinational and the TG can be set up. The timing constraints for the flip-flops
are considered by weights of edges to the sink node and from the source node. Edge
weights from S to former FF outputs are set to dCLK−to−Q.
To check the setup time constraints, the edge weights from former FF inputs to T are
set to tSUP. If the maximal arrival time at the sink node is less than tCLK, then all
setup time constraints are met.
To check the hold time constraints, the edge weights from former FF inputs to T are
set to −tHLD. Now, if the minimal arrival time at the sink node is greater than 0,
then all hold time constraints are met, in accordance with Equation 2.8. The minimal
arrival time at a node is calculated by simply exchanging the max-operation in
Equation 2.3 with the min-operation.
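The two checks can be illustrated with a short Python sketch that evaluates the inequalities 2.7 and 2.8 directly. The combinational logic between the sending FF output q and the receiving FF input d, all delay values (in ps) and the function names are hypothetical; the late and early modes differ only in the max- or min-operation.

```python
# Hypothetical combinational logic between FF output "q" and FF input "d".
LOGIC = {("q", "g1"): 40, ("q", "g2"): 90, ("g1", "d"): 30, ("g2", "d"): 110}


def combinational_delay(n, op, start="q"):
    """Equation 2.3 with op=max (late mode) or op=min (early mode)."""
    if n == start:
        return 0
    return op(combinational_delay(u, op, start) + d
              for (u, v), d in LOGIC.items() if v == n)


def setup_ok(d_clk_to_q, d_max, t_sup, t_clk):
    """Inequality 2.7."""
    return d_clk_to_q + d_max + t_sup < t_clk


def hold_ok(d_clk_to_q, d_min, t_hld):
    """Inequality 2.8."""
    return d_clk_to_q + d_min > t_hld


D_MAX = combinational_delay("d", max)
D_MIN = combinational_delay("d", min)
print(setup_ok(50, D_MAX, t_sup=30, t_clk=300))
print(hold_ok(50, D_MIN, t_hld=20))
```

Reducing the clock period in this example below the sum of clock-to-Q delay, Dmax and setup time makes the setup check fail, while the hold check is independent of the clock period.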

2.1.5. Path enumeration


When a block-based STA is performed, the circuit delay is given by the arrival time at
the sink node. The corresponding critical path can be obtained efficiently, because the
controlling nodes are stored for the delays to sink. The controlling node of a node n
is the successor node which is responsible for the maximal D2S at n. By following the
path from a node to its controlling node starting at S, the critical path is determined.

Figure 2.6.: An example for calculating the branch slacks.

However, often not only the critical path itself is of interest, but also the paths with
the next longest path delays. These paths are required, for instance, to re-simulate their
delays on circuit level. This problem is referred to as the k most critical paths problem.
Determining the next longest paths is not as easy as determining Pcrit in a block-based
STA approach.
Ju and Saleh [1991] propose an efficient way to compute the k most critical paths.
One advantage of their algorithm is that k does not have to be specified in advance, but
the path enumeration can be suspended and continued as required. The key idea of the
algorithm is the introduction of branch slacks (BSs).
In an initialization phase, the BSs are calculated for every edge in the TG. For this
purpose, the successor nodes vi of a node u are sorted in non-increasing order of the
following cost function fcost:
fcost (u, vi ) = d((u, vi )) + D2S(vi ) (2.9)
This is the maximal delay from node u to T over the edge (u, vi ). The branch slack is
now the difference between the cost function of two nodes vi and vi+1 next to each other
in the sorted successor list of u:

BS(u, vi ) = fcost (u, vi ) − fcost (u, vi+1 ) (2.10)

The branch slack of an edge (u, vi ) tells us that the path with the next longest path
delay, which branches out from node u, goes over edge (u, vi+1 ) and its path delay is
BS(u, vi ) shorter. Figure 2.6 shows the calculation of the branch slacks.
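The initialization phase can be sketched as follows. The example graph, the given delays to sink and the function name are hypothetical; the last successor in each sorted list has no successor to compare with and therefore receives no branch slack in this sketch.

```python
def branch_slacks(delays, d2s):
    """Return {(u, v_i): BS(u, v_i)} according to Equations 2.9 and 2.10.

    delays maps edges (u, v) to gate delays, d2s maps nodes to delays to sink.
    """
    costs_per_node = {}
    for (u, v), d in delays.items():
        # Equation 2.9: f_cost(u, v) = d((u, v)) + D2S(v)
        costs_per_node.setdefault(u, []).append((d + d2s[v], v))

    slacks = {}
    for u, costs in costs_per_node.items():
        costs.sort(key=lambda c: c[0], reverse=True)   # largest f_cost first
        for (f_i, v_i), (f_next, _) in zip(costs, costs[1:]):
            slacks[(u, v_i)] = f_i - f_next            # Equation 2.10
    return slacks


# Hypothetical timing graph with precomputed delays to sink.
DELAYS = {("S", "a"): 2, ("S", "b"): 1, ("a", "c"): 1,
          ("b", "c"): 4, ("a", "T"): 3, ("c", "T"): 1}
D2S = {"T": 0, "c": 1, "a": 3, "b": 5, "S": 6}

print(branch_slacks(DELAYS, D2S))
```

For node S, for instance, the edge to b has the larger cost (6 versus 5), so the path that branches out over the edge (S, a) is shorter by BS(S, b) = 1.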
In the path enumeration phase, the next longest paths are determined by means of
the branch slacks. First, Pcrit is determined as discussed before. The path with the next
longest path delay branches out of Pcrit at the edge (u, vi ) with the smallest branch slack.
This path can be determined by branching off at u to vi+1 and following the controlling
nodes of vi+1 recursively until the sink node is reached.
Additional paths can be computed as follows. Assume that the path Pk+1 with the next
longest path delay is to be determined. Pk+1 can be generated by branching out at a branch
point from one of the k already determined paths. For this purpose, a data structure
list[i] is required, which keeps a list of branch points for every path Pi that is already
determined. This list is sorted according to the branch slacks. Hence, the branch point
resulting in the path with the next longest path delay which branches out from Pi comes
first in the list. The data structure next_delay is another sorted list, which contains the
delay of the next longest path branching out from every already determined path Pi. The
next longest path delay for Pi can be calculated as follows:

next_delay(Pi) = D(Pi) − BS of the first element in list[i] (2.11)

When the next longest path is to be determined, one takes the first path from
next_delay and looks in list[i] for the first branch point of this path (see Algorithm 2).
In Figure 2.7 an annotated TG with branch slacks and delays to sink is given. Table 2.1
shows the corresponding execution trace of the k most critical path algorithm for the
first five iterations. Given are the determined path and its delay, the branch points with
corresponding branch slacks and the next longest path delay of a path branching out
from this path. The first path is Pcrit with a path delay of 12. Pcrit has two branch points:
S with BS = 1 and node 6 with BS = 2. The branch points are sorted in non-decreasing
order with respect to the branch slack. Hence, next_delay is 11 (= D(Pcrit) − BS((S, 2)))
and the corresponding path is branching out from Pcrit at S. To determine the path in
the second iteration the path with the largest next_delay is taken. In this case there is
only one next_delay, hence, the path in the second iteration is branching out from Pcrit
at S. The used branch point is crossed out (indicated by the arrow with the 2 on top
standing for the iteration in which it is crossed out). The next_delay = 9 is computed
for the second path and a new next_delay for the first path must be calculated as well
(indicated by the arrow with the 2 on top). The execution trace shows how the algorithm
continues to determine the next three longest paths.

Algorithm 2: k most critical paths

    P1 ← Pcrit;
    prepare list[1] and calculate next_delay(P1);
    k ← 1;
    while path enumeration not stopped yet do
        i ← path with longest next_delay;
        j ← first branch point in list[i];
        generate the next longest path Pk+1 by branching out from the j-th node on path Pi;
        prepare list[k + 1] and calculate next_delay(Pk+1);
        remove first element in list[i] and update next_delay(Pi);
        k ← k + 1;
    end
    return (P1, P2, . . . , Pk)
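The enumeration can be sketched in Python as follows. This is a simplified variant and not the exact bookkeeping of Algorithm 2: instead of a sorted list[i] per path and a separate next_delay list, all branch points of a newly found path are pushed into one priority queue, which yields the same order of paths. The graph, the function name and the data layout are hypothetical.

```python
import heapq
from functools import lru_cache
from itertools import count


def k_most_critical_paths(delays, source, sink, k):
    """Return up to k (delay, path) pairs in order of non-increasing delay.

    delays maps every node to a dict {successor: gate delay}; the sink maps
    to an empty dict. Every node is assumed to reach the sink.
    """
    @lru_cache(maxsize=None)
    def d2s(n):
        if n == sink:
            return 0
        return max(d + d2s(v) for v, d in delays[n].items())

    def f_cost(u, v):  # Equation 2.9
        return delays[u][v] + d2s(v)

    # Successors of every node, sorted by non-increasing f_cost.
    ranked = {u: sorted(vs, key=lambda v, u=u: f_cost(u, v), reverse=True)
              for u, vs in delays.items()}

    def complete(nodes, ranks):
        """Follow the controlling nodes (rank 0) until the sink is reached."""
        nodes, ranks = list(nodes), list(ranks)
        while nodes[-1] != sink:
            ranks.append(0)
            nodes.append(ranked[nodes[-1]][0])
        return nodes, ranks

    tie = count()  # tie-breaker so that paths are never compared directly
    nodes, ranks = complete([source], [])
    heap = [(-d2s(source), next(tie), nodes, ranks, 0)]
    result = []
    while heap and len(result) < k:
        neg_delay, _, nodes, ranks, first = heapq.heappop(heap)
        result.append((-neg_delay, nodes))
        # Branch points of this path: its nodes from index `first` onwards.
        for j in range(first, len(ranks)):
            u, r = nodes[j], ranks[j]
            if r + 1 < len(ranked[u]):
                # Equation 2.10: branch slack of the edge currently used at u.
                bs = f_cost(u, ranked[u][r]) - f_cost(u, ranked[u][r + 1])
                child = complete(nodes[:j + 1] + [ranked[u][r + 1]],
                                 ranks[:j] + [r + 1])
                heapq.heappush(heap, (neg_delay + bs, next(tie), *child, j))
    return result


GRAPH = {"S": {"a": 2, "b": 1}, "a": {"T": 3, "c": 1},
         "b": {"c": 1}, "c": {"T": 1}, "T": {}}
paths = k_most_critical_paths(GRAPH, "S", "T", k=5)
```

Since every path is generated from exactly one parent path and is never longer than its parent, the priority queue returns the paths in order of non-increasing delay, and the enumeration can be stopped after any number of paths.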


Figure 2.7.: TG with branch slacks (arc between two edges) and delays to sink (number
next to the node)

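k | path (delay)           | branch points (branch slack) | next_delay
--|------------------------|------------------------------|------------
1 | S, 2, 6, 7, 10, T (12) | S(1) [x2], 6(2) [x3]         | 11 →(2) 10
2 | S, 4, 6, 7, 10, T (11) | S(2) [x4], 6(2) [x5], 4(5)   | 9 →(4) 9
3 | S, 2, 6, 8, 10, T (10) | 8(2)                         | 8
4 | S, 1, 7, 10, T (9)     | S(2)                         | 7
5 | S, 4, 6, 8, 10, T (9)  | S(2), 8(2)                   | 7

[xj]: branch point crossed out in iteration j; →(j): next_delay updated in iteration j.

Table 2.1.: Execution trace of the k most critical paths algorithm for the five slowest
paths.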


The algorithm discussed so far is not only capable of enumerating all paths from S
to T; it can also enumerate all paths from an arbitrary node to T. In order to enumerate
all paths from the source node to an arbitrary node, the algorithm must be slightly
changed. The most important change is the introduction of join slacks (JSs). Join slacks
are quite similar to branch slacks. The join slack is the delay difference between two
path segments from S to a given node.
In this thesis the k most critical paths algorithm is required in Chapter 5. It is used
to consider common edges when the possible critical paths of a circuit are identified and
to determine whether a possible critical path of an aged circuit is sensitizable.

2.2. State of the art of aging analysis


Several tools have been published that analyze the circuit performance degradation
caused by aging effects on circuit level as well as gate level [Liu et al., 2006].
Tools that analyze the degradation caused by drift related aging effects, such as NBTI
and HCI, are discussed in the following. There are other tools as well that compute the
impact on circuit reliability caused by electromigration (EM) [Blaauw et al., 2003] or
radiation-induced soft errors [Miskov-Zivanov and Marculescu, 2008].

2.2.1. Circuit level


The general flow of tools to analyze the performance degradation on circuit level can be
divided into the following three steps:
1. The fresh circuit is simulated and the current and voltage waveforms at the tran-
sistor terminals, which are relevant for the prediction of the device degradation,
are stored.
2. Those waveforms are used to generate degraded device models for each individual
device.
3. Finally, the degraded circuit performances are obtained by a second SPICE simu-
lation with aged device models.
The first published reliability simulator is called Berkeley reliability tools (BERT) [Tu
et al., 1993]. BERT is able to determine the performance degradation caused by HCI. Be-
sides that, BERT can compute the probability that a circuit fails due to time-dependent
dielectric breakdown (TDDB) and EM. In the first step, BERT determines the drain
current Id (t), the gate current Ig (t) and the substrate current Isub (t). In the second
step, from Id (t), Ig (t) and Isub (t) a parameter AGE is determined for every transistor.
AGE quantifies the amount of degradation:
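\[ \mathrm{AGE}_{\mathrm{NMOS}} = \int_0^{t_{\mathrm{life}}} \frac{I_d(t)}{W \cdot H_n} \left( \frac{I_{sub}(t)}{I_d(t)} \right)^{m_n} dt \tag{2.12} \]

\[ \mathrm{AGE}_{\mathrm{PMOS}} = \int_0^{t_{\mathrm{life}}} \frac{1}{H_p} \left( \frac{I_g(t)}{W} \right)^{m_p} dt \tag{2.13} \]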


H and m are determined experimentally for a given technology. W is the transistor
width and tlife the lifetime. Of course, it is not possible to simulate the circuit for
the entire lifetime tlife. Hence, the circuit is simulated for a shorter time interval and
AGENMOS and AGEPMOS are extrapolated.
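The evaluation of Equations 2.12 and 2.13 on simulated waveforms can be sketched as follows. This is not BERT code: the trapezoidal integration, the function names and all numerical values are hypothetical, and the linear extrapolation to the lifetime is an assumption that holds only if the simulated interval is representative of the whole lifetime (e.g., a periodically repeating waveform).

```python
def trapezoid(times, values):
    """Trapezoidal rule for samples values[k] taken at times[k]."""
    return sum(0.5 * (values[k] + values[k + 1]) * (times[k + 1] - times[k])
               for k in range(len(times) - 1))


def age_nmos(times, i_d, i_sub, width, h_n, m_n):
    """Equation 2.12 over the simulated interval (requires I_d > 0)."""
    integrand = [(d / (width * h_n)) * (s / d) ** m_n
                 for d, s in zip(i_d, i_sub)]
    return trapezoid(times, integrand)


def age_pmos(times, i_g, width, h_p, m_p):
    """Equation 2.13 over the simulated interval."""
    integrand = [(g / width) ** m_p / h_p for g in i_g]
    return trapezoid(times, integrand)


def extrapolate(age_simulated, t_simulated, t_life):
    """Assumption: AGE grows linearly with time for a repeating waveform."""
    return age_simulated * t_life / t_simulated


# Hypothetical, unit-free sample waveforms with constant currents.
TIMES = [0.0, 0.5, 1.0]
AGE_N_SIM = age_nmos(TIMES, i_d=[2.0] * 3, i_sub=[0.5] * 3,
                     width=1.0, h_n=4.0, m_n=2.0)
AGE_N_LIFE = extrapolate(AGE_N_SIM, t_simulated=1.0, t_life=10.0)
```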
Two methods are implemented in BERT to determine the degraded device models:
either by interpolating between degraded device model cards for a particular AGE, or
by obtaining the parameter degradation ∆p of the aged device model card from functions
that depend on AGE:
∆p = f(AGE) (2.14)

After generating the degraded device models, the degraded circuit performance can be
simulated in the third step.
Commercial reliability simulators, like RelXpert [Cadence, 2003], are already available
and the latest versions of HSPICE [Synopsys, 2008] and ELDO [Karam et al., 2001]
come with an integrated reliability analysis. RelXpert can consider the impact of HCI
and NBTI. ELDO is capable of determining the degraded device parameters iteratively.
For this purpose, the specified lifetime is divided into n time intervals (of equal length).
Steps one and two are conducted in every time interval. That way, the impact of the
degraded waveforms on the parameter drift can be considered.
Maricau and Gielen [2010] analyze the combined impact of aging and process variation
on circuit behavior. Like ELDO, their approach is iterative, but the length of the time
intervals is variable. In Section 4.5.1 it is shown by a simple experiment that such an
iterative approach is (at least for digital circuits) not necessary.
A drawback of commercial tools like RelXpert and ELDO is that the degradation
equations are proprietary. Hence, the user has to trust the tool and cannot verify how
the degradation is calculated. Kufluoglu et al. [2010] show that RelXpert only reaches
an acceptable accuracy when the proprietary degradation equations are replaced by
improved user defined equations.
Reliability simulators on circuit-level can be very accurate. However, a reliability
simulation on circuit-level is quite time consuming and realistic input vectors are re-
quired. For the first step of the aging analysis, input vectors are needed that cause a
realistic/worst-case degradation of the circuit. The third step requires input vectors to
measure the degraded circuit performances. In general, the input vectors in the first and
third step are not equal.
Like SPICE simulators for timing analysis (see Section 2.1), these tools are not capable
of simulating complex digital circuits. Nevertheless, they can be used to verify the critical
aged path determined by an aging-aware timing analysis on gate level.

2.2.2. Gate level


Aged LUT-based gate models

Although reliability simulators on circuit level are not applicable for timing analysis of
complex digital circuits, they can be used to characterize aged gate models.


Figure 2.8.: Aged LUT-based gate model as proposed in [Chen et al., 2011].

Chen et al. [2011] propose a path-based analysis flow, although the gate model can
also be used for a block-based approach. HSPICE [Synopsys, 2008] is used to generate
several aged LUTs for different conditions like lifetime, temperature or signal probability.
This approach results in a lot of LUTs, especially when the workload at the gate inputs
should be considered. If, for instance, LUTs should be generated for five different signal
probabilities, 5 LUTs would be enough for a gate with one input (see Figure 2.8). A gate
with three inputs already needs 125 (= 5 · 5 · 5) LUTs, and there are gates in a standard
cell library that have even more inputs.
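The growth of the number of LUTs can be reproduced by enumerating all combinations of signal probabilities at the gate inputs. The grid of five signal probabilities and the function name in the following sketch are hypothetical.

```python
from itertools import product

# Hypothetical grid of five signal probabilities per gate input.
SIGNAL_PROBABILITIES = (0.1, 0.3, 0.5, 0.7, 0.9)


def aged_lut_conditions(num_inputs, signal_probabilities=SIGNAL_PROBABILITIES):
    """One aged LUT is needed per combination of input signal probabilities."""
    return list(product(signal_probabilities, repeat=num_inputs))


for inputs in (1, 2, 3, 4):
    print(inputs, len(aged_lut_conditions(inputs)))
```

The count is multiplied further by every additional condition (e.g., lifetime or temperature) for which separate LUTs are characterized.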
The aging-aware gate model GLACIER [Wu et al., 2000] considers HCI and defines a
factor α as follows:

\[ \alpha(s_{IN}, C_L, TD) = \frac{d_{aged}}{d_{fresh}} \tag{2.15} \]
The aged gate delay daged and the fresh gate delay dfresh have to be simulated. dfresh
depends on the input slope sIN and the output load CL. daged also depends on the
transition density TD at the input. For a multiple-input gate, daged depends on TD at
every input. To reduce the complexity, it is assumed that the gate delay for each input
can be calculated by considering the contributions from the switching of all gate inputs
separately from one another as follows:

\[ \alpha = \left( \sum_{i=1}^{n} \alpha_i \right) - (n - 1) \tag{2.16} \]

where n is the number of transistors connected in series and αi is the contribution of one
input pin i when just this input switches. However, this approach neglects the impact
of the workload at the other inputs and of the internal gate structure on the parameter
drift (see Section 4.3.3).
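As a small numerical illustration of Equations 2.15 and 2.16, the following sketch combines per-input factors into one factor α and applies it to a fresh gate delay. The function names and the values are hypothetical, and it is assumed that exactly one factor αi is given per series transistor.

```python
def combined_alpha(alphas):
    """Equation 2.16: alphas[i] is the factor when only input i switches."""
    n = len(alphas)
    return sum(alphas) - (n - 1)


def aged_delay(d_fresh, alphas):
    """Equation 2.15 solved for the aged gate delay."""
    return combined_alpha(alphas) * d_fresh


# Hypothetical three-input gate with 5 %, 2 % and 3 % degradation per input.
ALPHAS = [1.05, 1.02, 1.03]
print(combined_alpha(ALPHAS))
print(aged_delay(100.0, ALPHAS))  # fresh delay of 100 ps
```

Subtracting n − 1 ensures that a gate without any degradation (all αi = 1) keeps α = 1, so that only the individual increases above 1 are summed up.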
When a reliability simulator on circuit level is used to characterize a gate library, then
the gate models are valid just for one specific use profile, i.e., they depend on the use
profile. If, for example, the specified lifetime changes, the entire library has to be
re-characterized.


Figure 2.9.: Gate delay degradation as a linear function of ∆Vth

Aged gate delay as a function of parameter drift


All other proposed gate models have in common that they consider only NBTI and that
daged is the sum of dfresh and the degradation ∆d(∆Vth) as a function of the threshold
voltage drift caused by NBTI:

daged = dfresh + ∆d(∆Vth) (2.17)

The advantage of such a gate model is that it is independent of the use profile and the
workload, because they only impact the parameter drift and the drift is computed during
the analysis and not in advance during the gate model characterization.
As long as the parameter drift caused by aging is small enough, a linear approximation
for the dependence of ∆d on ∆Vth can be used (see Figure 2.9):

\[ d_{aged} = d_{fresh} + \frac{\partial d}{\partial V_{th}} \cdot \Delta V_{th} \tag{2.18} \]
Paul et al. [2006] use the α-power law [Sakurai and Newton, 1990] to obtain the
sensitivity ∂d/∂Vth:

\[ I_d \propto (V_{gs} - V_{th})^{\alpha} \tag{2.19} \]

It is assumed that the gate delay is solely determined by recharging the output load (no
intrinsic gate delay):

\[ d = \frac{C_L \cdot V_{DD}}{I_d} = \frac{\mathrm{const.}}{(V_{gs} - V_{th})^{\alpha}} \tag{2.20} \]

Differentiating the expression with respect to Vth results in:

\[ \frac{\partial d}{\partial V_{th}} = \frac{\alpha \cdot d}{V_{gs} - V_{th}} \tag{2.21} \]
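The quality of the linear approximation can be checked numerically against the α-power law itself. In the following sketch, the function names and the operating point (Vgs = 1.2 V, Vth = 0.3 V, α = 1.3, ∆Vth = 36 mV) are hypothetical; voltages are treated as magnitudes and the constant of Equation 2.20 is set to 1, since only relative changes are of interest.

```python
def gate_delay(v_gs, v_th, alpha, const=1.0):
    """Equation 2.20: d = const / (V_gs - V_th)^alpha."""
    return const / (v_gs - v_th) ** alpha


def delay_sensitivity(v_gs, v_th, alpha, const=1.0):
    """Equation 2.21: dd/dV_th = alpha * d / (V_gs - V_th)."""
    return alpha * gate_delay(v_gs, v_th, alpha, const) / (v_gs - v_th)


def aged_delay_linear(v_gs, v_th, alpha, delta_v_th, const=1.0):
    """Equation 2.18 with the sensitivity of Equation 2.21."""
    return (gate_delay(v_gs, v_th, alpha, const)
            + delay_sensitivity(v_gs, v_th, alpha, const) * delta_v_th)


V_GS, V_TH, ALPHA, DELTA_V_TH = 1.2, 0.3, 1.3, 0.036  # assumed values
fresh = gate_delay(V_GS, V_TH, ALPHA)
linear = aged_delay_linear(V_GS, V_TH, ALPHA, DELTA_V_TH) / fresh - 1.0
exact = gate_delay(V_GS, V_TH + DELTA_V_TH, ALPHA) / fresh - 1.0
print(f"linear: {100 * linear:.2f} %, alpha-power law: {100 * exact:.2f} %")
```

Because the delay of Equation 2.20 is a convex function of Vth, the linear model slightly underestimates the degradation; for the assumed operating point the difference stays well below one percentage point.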

In contrast to that, Kumar et al. [2006] determine the dependence ∆d(∆p) by simulation
and store the results in LUTs. Kumar et al. [2006] also describe how to calculate
the threshold voltage drift iteratively based on the reaction diffusion (RD) equations for
NBTI (see Section 3.1.1). However, this involves solving an equation for every stress and
recovery phase during the lifetime and makes the calculation of the drift very inefficient,
especially for long lifetimes. A third contribution is the observation that arbitrary
signals result in the same drift as periodic signals with the same signal probability and
transition density. Hence, it is not necessary to know the exact waveform of the gate input
signals, but it is enough to know their signal probabilities and transition densities (see
Figure 2.10). Otherwise, aging analysis would not be feasible, since the exact input
signals are unknown when a circuit is developed.

Figure 2.10.: Transformation of arbitrary signals into periodic signals with same signal
probability and transition density.

Figure 2.11.: Drawing of an NBTI threshold voltage drift caused by consecutive stress
and relaxation phases (thin black line) and the ∆Vth drift given by the long
term prediction model (thick orange line).
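The transformation into an equivalent periodic signal can be sketched as follows. The sketch assumes the common definitions, which are not spelled out in this section: the signal probability is the fraction of time the signal is at logic 1, and the transition density is the number of transitions per unit time. The function names and the sampled waveform are hypothetical.

```python
def signal_statistics(samples, dt):
    """Signal probability and transition density of a sampled 0/1 waveform.

    samples holds one logic value per time step of length dt.
    """
    duration = len(samples) * dt
    probability = sum(samples) / len(samples)
    transitions = sum(a != b for a, b in zip(samples, samples[1:]))
    return probability, transitions / duration


def equivalent_periodic(probability, density):
    """Period and high time of a periodic signal with the same statistics.

    A periodic signal has two transitions per period, hence period = 2 / TD.
    """
    period = 2.0 / density
    return period, probability * period


SP, TD = signal_statistics([1, 1, 0, 1, 0, 0, 1, 1], dt=1.0)
PERIOD, HIGH_TIME = equivalent_periodic(SP, TD)
print(SP, TD, PERIOD, HIGH_TIME)
```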
Wang et al. [2007b] derive a closed-form equation to calculate the upper bound of
the parameter drift caused by NBTI (see long term prediction model in Figure 2.11).
Hence, the drift does not have to be calculated iteratively. It is also shown that NBTI
has a negligible impact on the clock distribution network of a sequential circuit. For
sequential circuits it is important that the clock distribution network has the same delay
to the sending and to the receiving FFs. Only that way is it assured that the
signals in the combinational logic have one full clock period to propagate from sending
to receiving FFs. Wang et al. [2007b] argue that the clock period is unaffected by aging,
because the clock signals to the sending and receiving FFs are delayed equally. However,
clock gating is not considered. If the sending and receiving FFs are in separate clock
domains, both clock signals can degrade differently. This would have to be considered
during the analysis of sequential circuits.
The gate model by Luo et al. [2007b] is based on the α-power law as well. It considers
different temperatures in active and standby mode. In standby mode the transistors
degrade as well, but due to the lower temperature and the exponential dependence of
parameter drift on temperature, the parameter drift is much smaller. In Section 4.3.3 it
is shown how different temperatures can be considered for the gate model introduced in
this thesis.
Luo et al. [2007a] introduce a model that takes the stacking effect into account. The
stacking effect describes the fact that not all transistors in a transistor stack have VDD
as their gate-source voltage.


All gate models discussed so far have in common that they use just one value for ∆Vth,
although ∆Vth in general differs between the transistors of a gate. Either ∆Vth is
calculated for every transistor and the maximum is taken, or the ∆Vth of the transistor
with an input transition is taken.
Kumar et al. [2007a] show that the parameter drift of a NOR gate with two inputs
depends on the signal probability at both inputs. However, this is just shown exemplarily
and there is no formal algorithm derived to calculate the parameter drift of arbitrary
logic gates dependent on the signal probabilities at their inputs.
Stempkovsky et al. [2009] do not propose a self-contained aging-aware gate model, but
an algorithm to compute the time each individual transistor of a gate is in stress
condition. It considers the signal correlation at the gate inputs. The model also takes into
account that the supply voltage, which must be applied to the source and drain contacts
of a PMOS transistor so that it is stressed due to NBTI, can come from the drain or the
source contact (see Section 4.4.2).
Aging effects are stochastic processes. NBTI, for instance, is caused by breaking Si-H
bonds and this happens with a certain probability. This results in a distribution of the
threshold voltage drift. Hence, two identical transistors that are stressed identically do
not have the same threshold voltage drift. Kang et al. [2007] model the Vth variation of
PMOS transistors and investigate its impact on SRAM cells and combinational logic. Lu
et al. [2009] propose a statistical reliability analysis which jointly considers the impact
of process variation and aging effects.
Table 2.2 compares all aging-aware gate models discussed so far and the proposed
aging-aware gate model, AgeGate.
First optimization methods to minimize the impact of NBTI have been published. This
can for instance be done by pin reordering and logic restructuring [Wu and Marculescu,
2009] or by controlling the signals at internal nodes when the circuit is idle [Bild et al.,
2009].

32
Table 2.2.: Comparison of state-of-the-art gate models with the proposed aging-aware gate model AgeGate.

Gate model Description NBTI HCI Individual Aged output Use profile
transistor slope independent
drifts model
[Chen et al., 2011] aged LUT 3 3 3 7 7
[Wu et al., 2000] aged LUT 7 3 3a 3 7
[Paul et al., 2006] α-power law 3 7 7 7 3
[Kumar et al., 2006] simulated sensi- 3 7 7 7 3
tivities
[Wang et al., 2007b] closed form ex- 3 7 7 7 3
pression for pa-
rameter drift
[Luo et al., 2007b] different tempera- 3 7 7 7 3
ture in active and
standby mode
[Luo et al., 2007a] considers stacking 3 7 7 7 3
effect
[Kumar et al., 2007a] individual tran- 3 7 3b 7 3
sistor drifts
considered
[Lu et al., 2009] jointly considers 3 7 7 7 3
aging effects and
process variation
AgeGate based on canoni- 3 3 3 3 3
cal gate model

a
neglects impact of the workload at other inputs and of internal gate structure on parameter drift
b
Doesn’t describe formal way to calculate individual transistor drifts

33
2.2. State of the art of aging analysis
3. Aging effects and their impact on standard cells
The objective of this thesis is to develop methods to analyze the degradation of complex
digital circuits due to aging. But prior to that, the aging effects and their impact on the
performance of single gates are investigated.
Aging effects can be classified into effects that cause a catastrophic failure of a device
and effects that cause a drift of device parameters with time. For the analysis of the
circuit degradation the drift-related aging effects have to be taken into account. In
addition, the amount of gate performance degradation due to an aging effect and the
factors on which it depends¹ are investigated. This helps to decide which dependencies
have to be modeled by the aging-aware gate model that is developed in Chapter 4.
To determine the impact of aging effects on the degradation of the gate performance,
the procedure is as follows (see Figure 3.1): The parameter drifts caused by aging effects
and the sensitivity of a gate performance with respect to a parameter drift are obtained.
Combining both pieces of information provides the degradation of the gate performance.
Finally, it is identified how the degradation due to aging evolves over different process
technologies. The parameter drifts due to HCI do not show a consistent trend, but it is
shown that the circuits are getting more and more sensitive to a parameter drift because
of the reduced supply voltage.

3.1. Aging effects


Aging effects change device parameters with time. One can distinguish between
aging effects that lead to an abrupt, catastrophic failure and effects that lead to a device
parameter drift. Representatives of effects that lead to a catastrophic failure are TDDB
and EM.
TDDB can be split up into two phases [Lee et al., 2006]. The first phase is called soft
break down (SBD). With time, traps in the gate oxide are generated and these traps
eventually form a conducting path through the oxide. Once a conducting path has been
established, new traps are generated due to thermal damage. The new traps result in
higher currents, the temperature in the oxide is further increased and even more traps
are formed. This condition is called thermal runaway and finally leads to a hard break
down (HBD) and the transistor suddenly fails.
The phenomenon that electrons carry metal atoms along a wire is called electromigra-
tion. EM causes shorts or opens in signal wires and especially in supply wires [Strong
et al., 2009].
¹ e.g., dependency of the gate delay degradation on temperature and supply voltage


[Figure 3.1 plots. (a) |∆Vth| [V] versus supply voltage VDD [V]; 90nm; 10y; 125°C; W=10µm; Lmin. (b) ∆delay (falling input) [%] versus |∆Vth| [V]; INV; 90nm; 27°C; 1.2V.]

Figure 3.1.: (a) Vth drift of 36 mV due to NBTI at a VDD of 1.2 V. (b) Sensitivity of the
             gate delay degradation to a threshold voltage drift. Combined, NBTI causes
             about 10 % degradation of the gate delay for a falling input transition.
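
The combination of drift and sensitivity can be sketched as a small calculation. This is only an illustration: the linear relation between drift and delay degradation and the sensitivity of roughly 30 % per 100 mV (read off the axis range of Figure 3.1(b)) are assumptions, and the function name is hypothetical.

```python
def delay_degradation_percent(delta_vth_v, sensitivity_pct_per_100mv):
    """Linearized delay degradation in percent for a drift |dVth| given in volts."""
    return sensitivity_pct_per_100mv * (delta_vth_v / 0.1)

drift_v = 0.036      # 36 mV NBTI drift at VDD = 1.2 V, from Figure 3.1(a)
sensitivity = 30.0   # assumed: about 30 % delay increase per 100 mV drift

degradation = delay_degradation_percent(drift_v, sensitivity)
print(f"delay degradation (falling input): {degradation:.1f} %")
```

Under these assumptions the result is close to the value of about 10 % stated in the caption.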

Aging effects that cause a catastrophic failure have to be treated stochastically by
computing a failure rate or a mean time to failure for a circuit.

Aging effects that cause a parameter drift, on the other hand, can be treated determin-
istically. They cause a degradation of the transistor characteristics, which, in turn, leads
to a degradation of the gate performance. This is the reason why drift-related aging ef-
fects have to be considered for an aging-aware timing analysis. The two dominant effects
that cause a parameter drift are negative bias temperature instability (NBTI) and hot
carrier injection (HCI). Both effects are described in detail in the following subsections.

Unfortunately, the distinction between drift-related aging effects and aging effects that
cause a catastrophic failure is not as clear-cut as described so far. For the latter, a
parameter drift can be observed as well before the catastrophic failure takes place. With
electromigration, the resistance of a wire first increases before an open is generated.
For TDDB, conducting paths lead to a gradual increase of the gate current during
the SBD phase before the transistor actually fails. If the time interval in which a
parameter drift can be observed is short, the effect does not have to be considered
in an aging-aware timing analysis (TA), because the device is going to fail anyway within
a short period of time. Lee et al. [2006] show, however, that the time between an SBD and
an HBD is significant in advanced technologies. A gate model for the SBD phase of TDDB
has already been proposed in [Choudhury et al., 2010]. The equivalent circuit used there
to model the impact of SBD on a transistor could also be used to incorporate SBD into the
aging-aware gate model proposed in Chapter 4. EM does not affect the gate itself, but the
delay of signal lines and the voltage drop across supply lines. Hence, if EM becomes
relevant, it must be considered in the wire load model for timing analysis.

[Figure 3.2 drawing: transistor cross section with gate, gate oxide, source, drain, and channel; at the interface between the Si substrate and the oxide, some Si atoms are bonded to H atoms.]

Figure 3.2.: Cross section of a PMOS transistor.

3.1.1. Negative Bias Temperature Instability


NBTI is regarded as the most severe aging effect nowadays. It has been a research topic
for more than 40 years [Miura and Matukura, 1966] and has gained increased interest in the
last decade due to the problems it causes in modern semiconductor technologies [Entner, 2007].
NBTI only affects PMOS transistors. The stress mode for NBTI is a negatively biased
gate terminal with respect to source and drain. Hence, the transistor is in inversion.
The main impact of NBTI on a PMOS transistor can be modeled by an increase of the
absolute value of the threshold voltage. A (normally-off) PMOS transistor has a negative
threshold voltage. Due to NBTI the threshold voltage becomes more negative. It could
be misleading to say that NBTI decreases the threshold voltage, because a reduction
of Vth (for NMOS transistors) implies a performance increase. The convention in this
thesis is to say that NBTI increases (the absolute value of) the threshold voltage |Vth|.
As the name negative bias temperature instability implies, NBTI is accelerated by an
increased temperature and an increased supply voltage.

Physical mechanism of NBTI


There is still no consensus on the physical mechanism of NBTI. One quite popular
theory is the reaction-diffusion (RD) model.
According to Alam et al. [2007], NBTI originates from broken Si-H bonds at the
interface between the substrate and the gate oxide. Figure 3.2 shows a cross section of
a transistor. The substrate consists of crystalline silicon (Si). To isolate the gate from
the substrate, a layer of silicon dioxide (SiO2 ) is grown upon the substrate. The gate
itself consists of polycrystalline silicon. After the SiO2 layer is processed, dangling bonds
remain at the Si/SiO2 interface. A dangling bond is a Si atom with an unsatisfied valence.
Dangling bonds are called interface states. These states can capture charges and have a
significant negative impact on the transistor performance. During the manufacturing of
a chip, interface states are satisfied by hydrogen atoms (H). Those Si-H bonds can break
up again during the NBTI stress mode. The generated interface states are responsible
for the degradation of the transistor parameters. There are contradictory opinions about
what happens to the released H atoms. It is still under discussion whether there is a
diffusion of neutral H atoms, a diffusion of H2 molecules, or a drift of H+ ions in the
direction of the gate. Alam et al. [2007] argue that H atoms react to H2 and H2 then
diffuses.
The generation of the interface states and the diffusion of the hydrogen can be modeled
by an RD system. In an RD system two processes are involved: a local reaction and a
diffusion (or drift) of the reaction products.
The rate of interface state generation due to NBTI is given by the following equation:
\[
\frac{dN_{it}}{dt} = \underbrace{k_F \, (N_0 - N_{it})}_{\text{generation}} - \underbrace{k_R \, N_H(0) \, N_{it}}_{\text{annealing}} \tag{3.1}
\]

N0 is the initial number of Si-H bonds, Nit is the number of interface states and kF
is the rate constant of broken bond creation (dissociation rate constant). NH(0) is the
number of hydrogen atoms at the Si/SiO2 interface. The breaking of Si-H bonds
can also be reversed, which is described by the second term. kR is the rate constant of
the reverse reaction, in which a dangling bond and an H atom anneal to a Si-H bond. This
annealing or recovery effect is a special property of NBTI. It means that the number of
interface states decreases again when the stress is removed.
The creation of interface states is limited by the diffusion (or drift) of hydrogen. This
is modeled by a second rate equation:
\[
\frac{dN_{it}}{dt} = -D_H \frac{dN_H}{dx} + N_H \cdot \mu_H \cdot E_{ox} \tag{3.2}
\]
DH is the diffusion coefficient, µH is the mobility and Eox the electrical field across
the oxide. The second term can be neglected for neutral atoms or molecules. kF , kR and
DH are temperature dependent. kF depends on the electrical field as well. This means
that for the generation of interface states an electrical field is required but not for the
annealing and the diffusion. Equations 3.1 and 3.2 form a system of partial differential
equations. This system can either be solved numerically or a closed form equation can
be derived if some justified assumptions are made:
\[
N_{it}(t) = \sqrt{\frac{k_F N_0}{2 k_R}} \, (D_H t)^{1/4} \tag{3.3}
\]
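
One common route to a closed form of this shape can be sketched as follows. It is not necessarily the derivation used by the cited authors: the sketch assumes neutral H diffusion (no drift term) and a linearly decaying hydrogen profile of depth \sqrt{D_H t}, and the profile shape in particular is an assumption of this sketch.

```latex
\begin{align*}
% Quasi-equilibrium of Eq. (3.1): dN_it/dt \approx 0 and N_it \ll N_0
k_F N_0 &\approx k_R \, N_H(0) \, N_{it}
  \quad\Rightarrow\quad N_H(0) \approx \frac{k_F N_0}{k_R \, N_{it}} \\
% Every broken bond releases one H atom into the (assumed) triangular profile
N_{it}(t) &\approx \int_0^{\sqrt{D_H t}} N_H(0)\left(1 - \frac{x}{\sqrt{D_H t}}\right) dx
  = \tfrac{1}{2} \, N_H(0) \, \sqrt{D_H t} \\
% Eliminating N_H(0) reproduces the exponent 1/4 of Eq. (3.3)
N_{it}^2 &\approx \frac{k_F N_0}{2 k_R} \, (D_H t)^{1/2}
  \quad\Rightarrow\quad N_{it}(t) \approx \sqrt{\frac{k_F N_0}{2 k_R}} \, (D_H t)^{1/4}
\end{align*}
```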
The assumptions are that the net rate of interface state generation is small and that Nit
is much smaller than N0. The exponent of the time dependence is 1/4 for H diffusion and
1/6 for H2 diffusion. The dependence of Vth on Nit is given by [Schroder and Babcock, 2003]:

\[
V_{th} \propto -\frac{q \, N_{it}(\Phi_S)}{C_{ox}} \tag{3.4}
\]
Cox is the oxide capacitance and ΦS is the surface potential. Increasing Nit increases the
absolute value of Vth. Other device parameters change as a consequence of the Vth drift:

\[
I_d \propto (V_{gs} - V_{th})^2 \tag{3.5}
\]

\[
g_m \propto (V_{gs} - V_{th}) \tag{3.6}
\]

[Figure 3.3 plot: Id [A] versus Vds [V] for |∆Vth| = 0 mV, 33 mV, 66 mV, and 100 mV; the magnitude of the drain current decreases with increasing drift.]

Figure 3.3.: Output characteristic of a PMOS transistor for altered values of ∆Vth.


The drain current Id is important for the performance of digital circuits and the
transconductance gm is relevant for analog circuits. Figure 3.3 shows the output char-
acteristic of a PMOS transistor for altered values of ∆Vth .
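
To get a feeling for the size of the effect, the square law of Equation 3.5 can be evaluated for the drift values of Figure 3.3. This is a rough sketch only: the fresh threshold voltage of 0.3 V and the gate-source voltage of 1.2 V are hypothetical values, magnitudes are used so that the same expression applies to a PMOS transistor, and the simple square law ignores short-channel effects.

```python
def relative_drain_current(v_gs_abs, v_th_abs, delta_v_th):
    """Ratio Id(aged) / Id(fresh) under the square law Id ~ (|Vgs| - |Vth|)^2."""
    fresh = (v_gs_abs - v_th_abs) ** 2
    aged = (v_gs_abs - v_th_abs - delta_v_th) ** 2
    return aged / fresh

V_GS = 1.2   # assumed |Vgs| in volts
V_TH0 = 0.3  # hypothetical fresh |Vth| in volts, not taken from the thesis

for drift_mv in (0, 33, 66, 100):  # drift values used in Figure 3.3
    ratio = relative_drain_current(V_GS, V_TH0, drift_mv / 1000.0)
    print(f"|dVth| = {drift_mv:3d} mV -> Id / Id_fresh = {ratio:.3f}")
```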
Unfortunately, the reaction diffusion theory is not able to explain all properties of
NBTI. The RD theory cannot model the temporal behavior of the recovery effect, the
bias dependence of the recovery effect, and the dependency of the parameter drift on
the duty cycle of the signal at the gate terminal [Grasser et al., 2009].
One attempt to explain this is by extending the RD model by a second component
[Islam et al., 2007]. Besides the creation of interface states, hole trapping might be
responsible for the threshold voltage drift as well. The holes are trapped by already
existing traps in the oxide. Another explanation is a two-stage model based on E’
centers [Grasser et al., 2009]. E’ centers are a well known defect in SiO2 oxides. In
the first stage the E’ centers are charged and discharged. This explains the recovery
effect. In the second stage a dangling bond can be created at the Si/SiO2 interface by a
positively charged E’ center.

Modeling of NBTI
To compute the threshold voltage drift for NBTI, degradation equations from an industry
partner are used:

\[
\Delta V_{th} = A \cdot \exp\left(\frac{E_a}{k_B \cdot T}\right) \cdot V_{gs}^{\,b} \cdot t_{stress}^{\,n} \cdot \left(1 + \frac{C}{W}\right) \tag{3.7}
\]

The drift depends on the temperature T, the gate-source voltage Vgs, the time tstress
the transistor is in NBTI stress mode, and the transistor width W. A, Ea, kB, b, n and
C are constants. The time dependence (n) is shown in Figure 3.4. Reported values for
n in the literature are between 0.15 and 0.30 [Massey, 2004]. This range is compatible
with H diffusion (exponent 1/4) as well as with H2 diffusion (exponent 1/6). ∆Vth
increases monotonically with time (without taking recovery into account). For an
aging-aware timing analysis, this means that it is enough to verify that a circuit is
fast enough at the end of the specified lifetime. Due to the power law, the drift
increases very fast at the beginning and settles with time. Suppose n is 0.25. If a
certain threshold voltage drift is reached after a time t1, it takes 16 · t1 to reach a
threshold voltage drift twice as high, because 16^0.25 = 2.

[Figure 3.4 plot: ∆Vth [mV] versus lifetime [y] on logarithmic axes; 90nm; Vnom; 125°C; W=10µm; Lmin.]

Figure 3.4.: Time dependence of Vth drift due to NBTI.

[Figure 3.5 plot: |∆Vth| [V] versus T [°C] for 1.08V, 1.2V, and 1.32V; 90nm; 10y; SP=0%; W=10µm; Lmin.]

Figure 3.5.: Temperature dependence of ∆Vth for altered values of Vgs.

The temperature dependence (see Figure 3.5) is modeled by the Arrhenius equation.
The reported values for the activation energy Ea vary between 0.1 and 0.36 eV [Massey,
2004]. The voltage dependence is given by a power law. The higher the gate-source
voltage, the higher the electrical field across the gate oxide and the resulting drift.
The drift is determined by the temperature and voltage over the lifetime. From now
on, these are referred to as effective temperature (Teff) and effective supply voltage
(Veff), to distinguish them from the current temperature Tcurr and voltage Vcurr at the
moment the circuit is analyzed. The current values of temperature and voltage define
the sensitivities, as can be seen later in Section 3.2.1.
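
The structure of the NBTI degradation equation and the consequence of its power law in time can be illustrated with a short sketch. All constants below are hypothetical placeholders, since the actual values from the industry partner are not given in the text. The width term is read as (1 + C/W). With the exponent written as Ea/(kB · T), a negative Ea is assumed here so that the drift grows with temperature, and the magnitude of Vgs is used.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def nbti_delta_vth(temp_k, v_gs, t_stress, width, A, E_a, b, n, C):
    """Power-law/Arrhenius form of Equation 3.7 (constants are placeholders)."""
    return (A * math.exp(E_a / (K_B * temp_k)) * abs(v_gs) ** b
            * t_stress ** n * (1.0 + C / width))

def time_factor_for_drift_ratio(ratio, n):
    """Factor by which the stress time must grow to scale the drift by `ratio`."""
    return ratio ** (1.0 / n)

# Hypothetical constants for illustration only.
params = dict(A=1e-3, E_a=-0.1, b=3.0, n=0.25, C=0.1)

d1 = nbti_delta_vth(398.15, -1.2, 1.0, 10.0, **params)
d16 = nbti_delta_vth(398.15, -1.2, 16.0, 10.0, **params)
print("drift ratio after 16x stress time:", d16 / d1)
print("time factor for doubling the drift:", time_factor_for_drift_ratio(2.0, 0.25))
```

Because only ratios are evaluated, the doubling after 16 times the stress time does not depend on the placeholder constants, only on n = 0.25.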

[Figure 3.6 plot: ∆Vth [V] versus transistor width [µm] for the 120nm, 90nm, 65nm LP, and 65nm HP technologies, with the minimal width in the cell library marked; Vnom; 125°C; 10y; SP=0%(wc); Lmin.]

Figure 3.6.: Transistor width dependence. Marked is the minimal transistor width used
in the standard cell libraries.

During the homogeneous stress mode of NBTI, only a vertical electrical field and no lateral
field exists. The creation of interface states is uniformly distributed over the whole
gate area and a dependence on transistor sizes should not be observable. A dependence
on transistor length for very short transistors is reported in literature [Massey, 2004],
but not modeled in the degradation equations. However, a transistor width dependence
for small transistors is modeled by the degradation equations. Edge effects
are assumed to be responsible for the dependence on transistor sizes. Figure 3.6 shows
the transistor width dependence for different technologies. Marked are the minimal
transistor widths used in the standard cell libraries. One can see that for some
technologies (65 nm LP) the transistor width actually affects the drift, whereas for other
technologies (120 nm, 90 nm) the minimal transistor width used in the standard cell
library is too large to have a significant effect on the drift.
NBTI strongly depends on the process technology as well. Manufacturing steps that
have an impact on NBTI drift are, for instance, concentration of hydrogen, deuterium
and nitrogen in the oxide, the gate material, and initial quality of the Si/SiO2 interface
[Schroder and Babcock, 2003].
NBTI is a statistical process [Schlünder et al., 2011]. A Si-H bond is broken with a
certain probability. Hence, the threshold voltage drift for defined stress parameters is
a probability distribution. However, the degradation equations just provide the mean
value for the drift. Rauch III [2002] shows that the sigma of the threshold voltage drift
is dependent on the transistor area:

\[
\sigma(\Delta V_{th}) \propto \frac{1}{\sqrt{W \cdot L}} \tag{3.8}
\]

It is also shown that ∆Vth due to aging and ∆Vth due to process variation are uncorre-
lated [Fischer et al., 2008].

[Figure 3.7 sketch: ∆Vth versus time for an AC stress.]

Figure 3.7.: Drift over time for an AC stress.

NBTI² is the only aging effect that shows a recovery effect. In the RD model, recovery
can be explained by the second term in Equation 3.1. This term describes the reverse
annealing of Si-H bonds. There is no consensus about whether the complete drift recovers
or a permanent part remains [Massey, 2004]. What has been understood is that the
recovery of a certain amount of drift takes substantially longer than the time needed to
generate this drift. In [Grasser et al., 2009] a proportion of recovery to degradation of
2.5/1 in logarithmic timescale is reported. This means, for instance, when a threshold
voltage drift is generated with 25 mV/decade the recovery has a slope of 10 mV/decade.
The recovery effect makes it more difficult to characterize NBTI and complicates the
analysis of a circuit as well. To extract the constants for the degradation equation, single
transistors are stressed under defined conditions and the resulting drifts are measured.
Before the drift can be measured, the stress has to be removed. Reisinger et al. [2007]
argue that a conventional measurement setup takes up to 1 s to obtain the threshold
voltage drift. Hence, the transistor has 1 s to recover before the drift is measured.
Reisinger’s proposed on-the-fly measurement takes just 1 µs and it is shown that the
drift already loses 50 % of its value in the interval between 1 µs and 1 s. How much
of the drift is recovered before 1 µs is unknown. 1 µs already seems sufficiently fast,
but in a circuit that is operated at 1 GHz the recovery time might just be 1 ps. Hence, the
error between the real drift value and the measured, already recovered value might be
larger than 50 %.
The degradation due to NBTI is frequency independent, but it strongly depends on
the duty cycle of the signal at the transistor gate. NBTI is a static aging effect. The
drift is determined by the portion of the lifetime the gate voltage is negative with respect
to source and drain and not by the number of signal transitions (frequency). Although
the degradation is frequency independent, a substantial difference between a DC and an
AC stress is observed [Massey, 2004]. This is due to the recovery effect. For a DC stress
the drift cannot recover and increases monotonically. For an AC stress, the drift can
recover in between the stress phases. This results in a sawtooth curve for the drift over
time as depicted in Figure 3.7. Because the drift builds up faster than it
recovers, the mean of the drift increases monotonically.
Figure 3.8(a) shows the dependence of the drift on the stress-duty-cycle as modeled
by the degradation equations. For a stress-duty-cycle of 100 %, the transistor is
constantly stressed (DC stress) and the drift is maximal. For a stress-duty-cycle of 0 %,
the transistor is never in stress mode and there is no drift observable.

² except for its counterpart, positive bias temperature instability (PBTI)

[Figure 3.8 plots. (a) |∆Vth| [V] and ∆Vth [%] versus stress duty cycle [%]; 90nm; Vnom; 125°C; 10y; W=10µm; Lmin. (b) Measured curve from [Baumann et al., 2010].]

Figure 3.8.: Duty cycle dependence of NBTI.


Unfortunately, the degradation equations used in this thesis do not take the recovery
effect into account. Figure 3.8(b) shows a measured curve of the stress-duty-cycle
dependence with recovery for a 40 nm technology [Baumann et al., 2010]. This curve has
an S-shape and the drift values for AC stress (stress-duty-cycle < 100 %) are far below
the drift for DC stress.
Not being able to consider the recovery influences the accuracy of the proposed aging
analysis results³. However, because the recovery effect has an impact on
the characterization as well, it is not certain whether the results are too pessimistic
or too optimistic. On the one hand, recovery is not taken into account for the dependency
on the stress-duty-cycle. If it is assumed, for instance, that a transistor experiences
a stress-duty-cycle of 50 %, the degradation equations that are used provide a drift of
about 80 % of the maximal drift. By considering recovery, the drift would just be about
40 % of the maximal drift (16 mV/42 mV from Figure 3.8(b)). Hence, the error of the
analysis would be 50 %.
However, it must be considered as well that recovery makes the measurement of the
drift more difficult. The drift values to extract the parameters for the degradation
equations were not determined by the on-the-fly measurement set-up from [Reisinger
et al., 2007]. This results in an error of at least 50 % as well. In this case both errors
would cancel each other out. The measurement underestimates the actual drift by 50 %,
because the drift has already recovered from its initial value until the measurement
starts, and the analysis overestimates the drift by 50 %, because recovery is not taken
into account for the stress-duty-cycle dependence.
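
The argument that the two errors offset each other can be written down as simple arithmetic. This sketch normalizes the true DC drift to 1 and assumes that the relative duty cycle factors quoted above (80 % without recovery, about 40 % with recovery) can be applied to the same transistor, which the text does not claim explicitly; the function names are hypothetical.

```python
def modeled_drift(dc_drift_measured, duty_factor_without_recovery):
    """Drift predicted by equations fitted to slowly measured (recovered) DC data."""
    return dc_drift_measured * duty_factor_without_recovery

def actual_drift(dc_drift_true, duty_factor_with_recovery):
    """Drift expected when recovery is included in the duty cycle dependence."""
    return dc_drift_true * duty_factor_with_recovery

true_dc = 1.0                # normalized true DC drift
measured_dc = 0.5 * true_dc  # the slow measurement sees only about 50 % of the drift

model = modeled_drift(measured_dc, 0.80)  # 80 % of the maximum at 50 % duty cycle
actual = actual_drift(true_dc, 0.40)      # about 40 % of the maximum with recovery

print(f"modeled: {model:.2f}, actual: {actual:.2f}")
```

With these round numbers both values coincide, which is the cancellation described above. With other duty cycles or measurement delays the two errors would generally not match.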

³ if the workload is taken into account


Positive bias temperature instability


NBTI occurs only in PMOS transistors. A similar aging effect for NMOS transistors is
called PBTI. The stress condition for PBTI is that the NMOS transistor is in inversion.
Hence, the gate terminal is positively biased with respect to source and drain.
Before high-k metal gates were introduced, PBTI could be neglected. Since then,
degradation due to PBTI is reported to be of the same order of magnitude as that due to
NBTI [Tschanz et al., 2009].
The developed aging analysis is based on a 90 nm technology with SiO2 as gate
dielectric. Hence, PBTI can be neglected. Nevertheless, nothing fundamental prevents
the proposed aging analysis methodology from considering PBTI as well.

3.1.2. Hot Carrier Injection


Hot carrier injection (HCI) affects both NMOS and PMOS transistors. Carriers are
accelerated until they have enough energy to overcome the potential barrier of the Si/SiO2
interface and leave the channel. A small number of those hot carriers damage the gate
oxide and the interface or get trapped in the oxide and form space charges. Both
mechanisms lead to a degradation of the transistor characteristics. The rest of the
carriers contribute to the gate current. Hot carriers are holes or electrons that have
gained a high kinetic energy from an electrical field. By secondary effects (e.g.,
electron-electron scattering) their energy can be further increased [Strong et al., 2009].
They are called “hot” because their energy is substantially higher than their energy in
thermal equilibrium. The carriers are accelerated by the drain-source voltage Vds across
the inverted channel. In the drain region the carriers have collected enough energy to
overcome the potential barrier of the Si/SiO2 interface. Hence, HCI is an asymmetric aging
effect that damages the drain region of a transistor.

Physical mechanism of HCI


Four different mechanisms for hot carrier generation and injection can be distinguished
[Renesas, 2008]:

• Drain avalanche hot carrier (DAHC)

• Channel hot carrier (CHC)

• Secondary generated hot carrier (SGHC)

• Substrate hot carrier (SHC)

DAHC and CHC are the two major mechanisms and are further discussed.

Drain avalanche hot carrier


High-energy carriers collide with Si atoms and generate electron-hole pairs by impact
ionization (see Figure 3.9). Those generated carriers are themselves accelerated and can
again cause impact ionization (avalanche multiplication). Some generated carriers are
injected into the oxide or damage the interface. DAHC is maximal for Vds = 2 · Vgs.

[Figure 3.9 drawing: transistor cross section with gate, source, and drain, the terminal voltages Vg, Vs, and Vd, the gate current Ig, and the drain current Id.]

Figure 3.9.: Drain avalanche hot carrier.

[Figure 3.10 drawing: transistor cross section with gate, source, and drain, the terminal voltages Vg, Vs, and Vd, the gate current Ig, and the drain current Id.]

Figure 3.10.: Channel hot carrier.

Channel hot carrier


This time, impact ionization is not the reason for carrier injection. For CHC (see Fig-
ure 3.10), the hot carriers themselves are injected into the oxide. They are accelerated
in the direction of the gate by a high gate voltage. Some “lucky electrons” are able to
overcome the potential barrier at the Si/SiO2 interface and enter the oxide. CHC is
maximal for Vds = Vgs .

Modeling of HCI
HCI damage can be modeled by an increase of the absolute value of the threshold voltage
Vth and a decrease of the mobility µ0 [Strong et al., 2009]. The degradation equations
used in this thesis provide the reduction of the drain saturation current Ion as a
percentage:

\[
\Delta I_{on} = \frac{I_{on,fresh} - I_{on,aged}}{I_{on,fresh}} = A \cdot e^{\frac{E_a}{k_B \cdot T}} \cdot V_{ds}^{\,b} \cdot t_{stress}^{\,n} \cdot L^{-m} \tag{3.9}
\]
Ion,f resh

∆Ion depends on Teff, the effective drain-source voltage (Vds), the stress time (tstress)
and the transistor length (L). Figure 3.11 shows the supply voltage, temperature, and
lifetime dependence of HCI. The dependence on supply voltage and lifetime follows a
power law. To determine the time the transistor is stressed, a duty factor DF is given.
The stress time tstress is tlife/DF. A DF of 100 means that the transistor is stressed
for 1/100 of its lifetime. Reported values for n from literature are 0.25 for PMOS and
0.5 for NMOS transistors. Furthermore, a negative temperature dependence is reported;
hence, HCI is the only effect that gets worse when the temperature is decreased. This
is explained by an increase of the mean free path of the hot carriers. However, in the
degradation equations used in this thesis there is almost no temperature dependence for
NMOS transistors and for PMOS transistors it is positive.

[Figure 3.11 plots: ∆Ion [%] of PMOS and NMOS transistors versus supply voltage [V], T [°C], and lifetime [y]; 90nm; 10y; 1.32V; 25°C, DF=100; W=10µm; Lmin.]

Figure 3.11.: Voltage, temperature and lifetime dependence of HCI.
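
The role of the duty factor in the HCI degradation equation can be illustrated with a short sketch. The constants are hypothetical placeholders (including Ea = 0, which mimics the nearly absent temperature dependence mentioned for NMOS transistors), and the function name is not taken from the thesis. Only the ratio between two duty factors is evaluated, so the placeholders cancel.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def hci_delta_ion(temp_k, v_ds, t_life, duty_factor, length, A, E_a, b, n, m):
    """Power-law form of Equation 3.9 with t_stress = t_life / DF."""
    t_stress = t_life / duty_factor
    return (A * math.exp(E_a / (K_B * temp_k)) * abs(v_ds) ** b
            * t_stress ** n * length ** (-m))

# Hypothetical constants; n = 0.5 is the literature value quoted for NMOS.
params = dict(A=1.0, E_a=0.0, b=3.0, n=0.5, m=1.0)

dc = hci_delta_ion(298.15, 1.32, 10.0, 1.0, 0.1, **params)    # stressed all the time
ac = hci_delta_ion(298.15, 1.32, 10.0, 100.0, 0.1, **params)  # DF = 100

print("degradation ratio DF=100 vs. DF=1:", ac / dc)
```

With n = 0.5, stressing the transistor for 1/100 of its lifetime reduces the degradation only by a factor of about 10, which is a direct consequence of the sublinear power law.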
∆Ion is, unlike ∆Vth, not a parameter of the transistor model. Hence, ∆Ion cannot be
used directly to simulate a degraded transistor. However, there is an equivalent circuit
for a transistor degraded by HCI (see Figure 3.12(a)). The equivalent circuit is used
to simulate an aged transistor on circuit level. It maps ∆Ion to a threshold voltage drift
∆Vth and a mobility degradation ∆µ0. ∆Vth is realized by a voltage source VDeg, and a
current-controlled current source IDeg is responsible for the mobility degradation. The
values of VDeg and IDeg depend on ∆Ion.

3.1.3. Stress conditions in CMOS logic gates


Static CMOS logic is the primary design style used in digital integrated circuits, since
it has a low static power consumption and it is quite immune to noise [Uyemura, 2001].
Every CMOS logic gate consists of a pull-up and a pull-down network. Those comple-
mentary networks represent two switches, with exactly one switch being closed for every
input combination. The pull-up network, composed of PMOS transistors, is connected
to the supply voltage VDD (logic “1”) and the pull-down network, composed of NMOS
transistors, is connected to ground (logic “0”). The gate delay is determined by the time
the pull-up/pull-down network takes to recharge the output capacitance.
Single-stage logic gates have no internal nets connected to gate terminals of transistors.
Those single-stage gates can only represent inverting logic functions. For more complex,
non-inverting logic functions, multi-stage gates have to be used by connecting single-
stage gates in series.

[Figure 3.12. (a) Equivalent circuit: transistor with terminals G, D, and S, drain current id, a voltage source VDeg = f(∆Ion), and a current source IDeg = f(id, ∆Ion). (b) Plot of Id [A] versus Vds [V] for ∆Ion = 0 %, 5 %, 10 %, and 20 %.]

Figure 3.12.: (a) HCI equivalent circuit for a degraded transistor. VDeg and IDeg depend
on ∆Ion . (b) Output characteristic of an NMOS transistor for altered
values of ∆Ion .

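[Figure 3.13 drawing: inverter with input A and output Z, and the waveforms of A and Z divided into the regions 1 to 4.]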

Figure 3.13.: Inverter gate and waveform.

The simplest logic gate is the inverter. Its pull-up and pull-down networks just consist
of one transistor (see Figure 3.13).
For NBTI the gate terminal of the PMOS transistor has to be negatively biased with
respect to source and drain. Therefore, a logic “0” is applied to the gate input. In this
case, the gate-source voltage Vgs is −VDD , the transistor is in inversion and the channel
is conducting. Hence, the drain of the transistor is charged to VDD as well (Vds = 0 V).
Whenever a logic “0” is applied, the PMOS transistor degrades due to NBTI. NBTI is
frequency independent and Kumar et al. [2006] have shown that every arbitrary signal
can be converted into a periodical signal that causes the same NBTI drift as the original
signal. Hence, it is enough to know the portion of the lifetime a signal is at logic “0”.
This can be expressed by 1 − SP . The static signal probability (SP ) is a statistical
signal property that is defined as the average amount of time a signal is at logic “1”.

[Figure 3.14 schematic: two-input NOR gate with the PMOS transistors MPB (input B) and MPA (input A) stacked above the output Z and the NMOS transistors MNB and MNA below it.]

Figure 3.14.: NOR gate with two inputs.

For more complex pull-up networks it is more difficult to determine the time a
transistor is stressed due to NBTI. The NOR gate in Figure 3.14 has two PMOS transistors
connected in series, called a stack. For transistor MPB, the condition is the same as for
the single transistor of an inverter (logic “0” at input B). For transistor MPA, a
logic “0” at the gate terminal is again required, but that is not enough. To have the source
of this transistor connected to VDD, a logic “0” has to be applied to input B as well
[Kumar et al., 2007b]. Hence, whenever MPA is stressed due to NBTI, MPB is stressed
as well and the Vth drift of MPB is always equal to or larger than the Vth drift of MPA. A
formal method to calculate the portion of the lifetime a transistor is stressed, depending
on the signal probabilities at the inputs and the internal gate structure, is derived in
Section 4.3.3.
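
The stress conditions described above for the inverter and the NOR stack can be sketched in a few lines. This is not the formal method of Section 4.3.3: the sketch assumes statistically independent inputs, and all function names are hypothetical.

```python
def signal_probability(samples):
    """Static signal probability: fraction of equally spaced samples at logic 1."""
    return sum(samples) / len(samples)

def nbti_stress_fraction_inverter(sp_in):
    """The PMOS of an inverter is in NBTI stress whenever the input is logic 0."""
    return 1.0 - sp_in

def nbti_stress_fractions_nor2(sp_a, sp_b):
    """Stress fractions (MPA, MPB) of the PMOS stack, assuming independent inputs.

    MPB (next to VDD) is stressed when B = 0.
    MPA is stressed only when A = 0 and B = 0.
    """
    stress_mpb = 1.0 - sp_b
    stress_mpa = (1.0 - sp_a) * (1.0 - sp_b)
    return stress_mpa, stress_mpb

sp = signal_probability([1, 0, 0, 1])
print("inverter stress fraction:", nbti_stress_fraction_inverter(sp))
print("NOR2 stress fractions (MPA, MPB):", nbti_stress_fractions_nor2(0.5, 0.5))
```

Under the independence assumption the stress fraction of MPA can never exceed that of MPB, which matches the statement about the Vth drifts of the two transistors.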
NBTI only affects PMOS transistors, hence, only the pull-up network is degraded.
This increases the gate delay just for a falling input transition. The gate delay for a
rising input transition only degrades indirectly. NBTI degrades the output slope as
well. The output slope serves as the input slope for succeeding gates. If the input slope
degrades, the gate delay increases as well. Due to this, the gate delay for a rising input
signal can increase as well.
For HCI, a strong lateral electrical field is needed that accelerates the carriers in the
channel. This is true for the NMOS transistor of the inverter (see Figure 3.13) when a
rising transition is applied to the inverter input. When the signal at the input is still
logic “0”, the NMOS transistor is in its non-conducting and the PMOS transistor is in
its conducting state. The drain of the PMOS transistor is at VDD , the voltage drop and
the electric field across the transistor are maximal. As soon as the NMOS transistor
begins to conduct, hot electrons are generated which damage the transistor.
Vgs of the NMOS transistor is equal to the input voltage and Vds is equal to the output
voltage of the inverter. The conditions for which the hot carrier generation is maximal
are a high Vds together with Vgs = Vds for CHC and Vgs = 1/2 · Vds for DAHC. However, at
least the conditions for CHC are never met for an inverter (or any other logic gate),
because Vds has already started to decrease when Vgs is maximal (see waveforms in
Figure 3.13). To account for the fact that the HCI stress in a logic gate differs from the
DC stress of a single transistor, an empirical correction factor for the degradation
equation is given. This correction factor is multiplied by the time the transistor is in
stress. It reduces the stress time depending on the signal slopes of Vds and Vgs. The
considerations above are valid for a PMOS transistor as well. A PMOS transistor degrades
due to HCI for a falling input slope.
Consider again the PMOS transistor stack in Figure 3.14. For transistor
MPA to degrade due to HCI, a falling transition at input A is required but not sufficient.
A strong lateral electrical field only exists if transistor MPB is in its conducting state as
well. This results in a conducting path from VDD to the output capacitance. Current
flows through transistor MPA until the output capacitance is recharged and the number
of hot carriers is proportional to the drain current. The same considerations are true
for transistor MPB in the stack. A formal method to calculate the portion of lifetime
a transistor is stressed due to HCI, depending on the signal probabilities and transition
densities at the gate inputs and the internal gate structure, is derived in Section 4.3.3.
Now the waveform in Figure 3.13 can be divided into regions when a transistor is
stressed due to NBTI or HCI. For a rising input slope (region 1) the NMOS transistor is
degraded due to HCI. When the input signal is logic “1” (region 2) the NMOS transistor
would be stressed due to PBTI (PBTI is not considered yet). For a falling input slope
(region 3) the PMOS transistor is degraded due to HCI and when the input signal is
logic “0” (region 4) the PMOS transistor is in NBTI stress. NBTI increases the gate
delay and output slope for a falling input transition; HCI increases delay and output
slope for both transitions.
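
The four regions of the inverter waveform can be expressed as a small lookup on consecutive input samples. This is a simplified sketch with hypothetical function names: it treats each pair of samples as either a transition or a static level and ignores the finite duration of real signal slopes.

```python
def stressed_transistor(prev_level, level):
    """Map one step of the inverter input A to (region, transistor, effect)."""
    if (prev_level, level) == (0, 1):
        return (1, "NMOS", "HCI")   # rising input slope
    if (prev_level, level) == (1, 1):
        return (2, "NMOS", "PBTI")  # input at logic 1 (PBTI is not considered yet)
    if (prev_level, level) == (1, 0):
        return (3, "PMOS", "HCI")   # falling input slope
    return (4, "PMOS", "NBTI")      # input at logic 0

def classify_waveform(samples):
    """Classify every pair of consecutive samples of the input waveform."""
    return [stressed_transistor(a, b) for a, b in zip(samples, samples[1:])]

for region, transistor, effect in classify_waveform([0, 1, 1, 0, 0]):
    print(f"region {region}: {transistor} stressed by {effect}")
```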

3.2. Impact on gate performance


So far, it was discussed which transistor parameters degrade and on which factors the
parameter drift depends. Furthermore, the conditions that have to be fulfilled for a
transistor of a logic gate to degrade were derived. Now, the impact of such a parameter
drift on the performance of logic gates and sequential cells is investigated.

3.2.1. Impact on combinational gates


The impact of aging on combinational gates is obtained via SPICE simulation, by
performing a parameter sweep over the parameter that drifts. This provides the sensitivity
of a gate to a parameter drift. To determine the impact of other factors (e.g.,
temperature or supply voltage) on the sensitivity, these factors are altered while the
sensitivities are determined. Unless stated otherwise, an inverter with a small driving
strength from an industrial 90 nm cell library is chosen. The simulation conditions are:
nominal supply voltage (1.2 V), 27 °C, and nominal process corner. To compare different
gate types and technologies, a fan-out-3 test structure (see Figure 3.15) is chosen. The
input slope and output load of the device under test (DUT) are defined by the test
structure. Such test structures are used, for instance, to evaluate and compare the
performance of different standard cell libraries.

Negative Bias Temperature Instability


A parameter sweep over the local threshold voltage is performed for NBTI. Only the
performance degradation for a falling input slope is investigated.
Figure 3.16 shows a strong dependence of the gate delay sensitivity on the supply
voltage. The sensitivity is given by the slope of the curve in Figure 3.16(a). The

3. Aging effects and their impact on standard cells


Figure 3.15.: Fan-out-3 structure: All gates in the test structure are identical to the
DUT. The voltage source generates a step function. To have a realistic
input signal at the DUT, the step function has to propagate through two
gates before reaching the DUT. Those two gates and the DUT have to
drive three gates.

[Figure 3.16 plots (INV; 90 nm; 27 °C): (a) ∆delay (falling input) [%] over |∆Vth| [V] for
VDD = 0.7 V, 0.9 V, 1.2 V, and 1.5 V; (b) ∆delay/∆Vth (falling input) [%/100 mV] over VDD [V].]

Figure 3.16.: Supply voltage dependence.

degradation of the gate delay ∆delay is the change of the gate delay normalized to the
gate delay without a parameter drift. Figure 3.16(b) depicts the degradation of the gate
delay over the supply voltage for a ∆Vth of 100 mV. The lower the supply voltage, the
larger the sensitivity.
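The two quantities defined above can be illustrated with a minimal sketch. It assumes a parameter sweep has already produced one gate delay per |∆Vth| value; the delay numbers below are invented for illustration and are not taken from Figure 3.16. The function names are hypothetical. The sensitivity is estimated as the slope of a least-squares line through the origin, which presumes that the degradation is close to linear in the drift.

```python
# Illustrative only: the sweep values are made up, not simulation results.

def delay_degradation(d_fresh, d_drifted):
    """Change of the gate delay normalized to the fresh delay, in percent."""
    return 100.0 * (d_drifted - d_fresh) / d_fresh


def sensitivity_per_100mv(dvth, degradation):
    """Slope of a least-squares line through the origin, in % per 100 mV."""
    num = sum(x * y for x, y in zip(dvth, degradation))
    den = sum(x * x for x in dvth)
    return 0.1 * num / den  # convert %/V to %/100 mV


# Hypothetical sweep: |dVth| in V and gate delay in ps (first entry is the fresh gate).
dvth_sweep = [0.0, 0.02, 0.04, 0.06, 0.08, 0.10]
delay_ps = [50.0, 52.0, 54.0, 56.0, 58.0, 60.0]

degr = [delay_degradation(delay_ps[0], d) for d in delay_ps]
sens = sensitivity_per_100mv(dvth_sweep, degr)
print(f"degradation at 100 mV: {degr[-1]:.1f} %, sensitivity: {sens:.1f} %/100 mV")
```

Repeating the sweep at a different supply voltage or temperature would give one such slope per operating point, which is what panel (b) of the figure plots.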
The impact of temperature is much smaller (see Figure 3.17). Again, the sensitivity
is increased by a lower temperature. As described in Section 3.1.1, it is important to
distinguish between the current and the effective temperature and supply voltage. The
current value defines the sensitivity and the effective value, which is the value over the
lifetime, defines the parameter drift. The worst case is a high effective and a low current
temperature and supply voltage. This is, for instance, the case for a circuit with a high
performance mode and a low power mode. If the circuit is operated for a long time in the
high performance mode, the transistors experience a large parameter drift. If the circuit
is then switched into a low power mode, the circuit becomes very sensitive. Hence, a
large degradation of the circuit performance can be observed.


[Figure 3.17 plots (INV; 90 nm; 1.2 V): (a) ∆delay (falling input) [%] over |∆Vth| [V] for
−40 °C, 27 °C, 85 °C, and 125 °C; (b) ∆delay/∆Vth (falling input) [%/100 mV] over T [°C].]

Figure 3.17.: Temperature dependence.

[Figure 3.18 plots: (a) INV; 90 nm; 1.2 V; 27 °C: ∆delay (falling input) [%] over |∆Vth| [V]
for the driving strengths A, B, C, and D; (b) 90 nm; 27 °C; 1.2 V; all PMOS with identical ∆Vth:
∆delay (falling input) [%] over |∆Vth| [V] for INV, NAND2, NOR2, and NOR3.]

Figure 3.18.: Dependence on driving strength and gate type.

Figure 3.18(a) shows that the driving strength of a cell has almost no effect on the
sensitivity. The gate type, on the other hand, has an impact on the sensitivity (see
Figure 3.18(b)). For the NAND and NOR gates, it is assumed that all PMOS transistors
have the same threshold voltage drift. The NOR gates degrade much more strongly than the
NAND gate and the inverter for the same ∆Vth. This is caused by the stacked PMOS
transistors in a NOR gate. For a falling input signal, the output load is charged through
two (NOR2) or three (NOR3) degraded PMOS transistors in series.
In Figure 3.19(a) and 3.19(b) the process corner (fast, nominal, slow) and the transistor
type (low Vth , regular Vth , or high Vth ) are altered. Both have only a minor impact on
the sensitivity.
To determine the impact of input slope and output load, a single gate is simulated
and the input slope and output load are altered. Figure 3.20(a) gives the sensitivity
for four different slope load combinations (slow/fast input slope and small/large output


[Figure 3.19 plots (INV; 90 nm; 27 °C; 1.2 V): ∆delay (falling input) [%] over |∆Vth| [V]
(a) for the slow, nominal, and fast process corner; (b) for regular Vth, high Vth, and low Vth
transistors.]

Figure 3.19.: Dependence on transistor type and process corner.

Figure 3.20.: Dependence on input slope and output load.

load). The sensitivity stays almost constant except for the case with a slow input slope
and a small output load, which shows a much higher sensitivity. Figure 3.20(b) depicts
the degradation of the gate delay for a ∆Vth of 100 mV over the range of characterized
input slope and output load pairs.
Besides the gate delay, the impact on the output slope is investigated as well.
Figures 3.21(a) and 3.21(b) show the impact of supply voltage and temperature, respectively.
The degradation of the output slope and the degradation of the gate delay (in terms of
percentage) are about the same.

Hot Carrier Injection


To determine the impact of HCI on the sensitivities, the transistors of the logic gates
are replaced by the HCI equivalent circuit (see Figure 3.12(a)) and a parameter sweep


[Figure 3.21 plots: ∆slopeout (falling input) [%] over |∆Vth| [V]
(a) INV; 90 nm; 27 °C for VDD = 0.7 V, 0.9 V, 1.2 V, and 1.5 V;
(b) INV; 90 nm; 1.2 V for −40 °C, 27 °C, 85 °C, and 125 °C.]

Figure 3.21.: Dependence of output slope degradation on supply voltage and temperature.

[Figure 3.22 plots: ∆delay (falling input) [%] over ∆ION [%]
(a) INV; 90 nm; 27 °C for VDD = 0.9 V, 1.2 V, and 1.5 V;
(b) INV; 90 nm; 1.2 V for −40 °C, 27 °C, 85 °C, and 100 °C.]

Figure 3.22.: Supply voltage and temperature dependence for HCI.

is performed. This time, ∆Ion is varied, which changes the values of the voltage source
and the current controlled current source of the equivalent circuit.
Figure 3.22 shows the dependence of the sensitivity on supply voltage and temperature.
It is similar to the dependence with respect to NBTI. Therefore, no further
dependencies are given for HCI. By comparing the degraded transistor characteristics
for NBTI and HCI (see Figures 3.3 and 3.12(b)), one can see that both aging effects have
a similar impact on the transistor characteristics. This explains their similar impact on
the sensitivities.

3.2.2. Impact on flip-flops


Sequential circuits consist of logic gates and storage elements. Most sequential circuits
are synchronous designs and use edge-triggered flip-flops. In this section, the impact of


[Figure 3.23 schematic; labeled elements: inputs D and CLK, output Q, inverters IV1 to IV8,
transmission gates TG1 and TG2, internal clock signals CN and CP.]

Figure 3.23.: Schematic of master-slave flip-flop.

[Figure 3.24 plots: (a) "Comparison of setup time and delay degradation": ∆tSUP (rising input) [%]
over |∆Vth| [V]; (b) "Comparison of hold time and delay degradation": ∆tHLD (rising input) [%]
over |∆Vth| [V]. Curves for individual PMOS transistors (PMOS_IV1, PMOS_IV2, PMOS_IV3, PMOS_IV8,
PMOS_TG1, PMOS_TG2), for all PMOS, and for an inverter as reference.]

Figure 3.24.: Plot of sensitivities for setup and hold time.

aging on a master-slave flip-flop (MSFF) (see schematic in Figure 3.23), a commonly
used flip-flop type, is investigated.
Besides the gate delay (in this case the clock-to-Q delay dCLK−to−Q) and output slope,
two other timing constraints, the setup time tSUP and the hold time tHLD, are important
for sequential cells. In contrast to gate delay and output slope, tSUP and tHLD cannot
be measured directly, but are obtained by solving an optimization problem:
Optimize the time difference between data signal change and clock edge such that
dCLK−to−Q is 110 % of the relaxed value.
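A minimal sketch of such a search is given below. It assumes that the clock-to-Q delay decreases monotonically as the data edge moves away from the clock edge, so that a bisection is sufficient. The function `clk_to_q` is a hypothetical analytic stand-in for one SPICE simulation of the flip-flop; its shape and all numbers are assumptions made for illustration.

```python
import math


def clk_to_q(offset_ps, relaxed_ps=100.0):
    """Toy stand-in for a SPICE run: clock-to-Q delay vs. data-to-clock offset.

    The delay approaches relaxed_ps for large offsets and grows as the data
    edge moves toward the clock edge. The shape is assumed, not measured.
    """
    return relaxed_ps * (1.0 + math.exp(-offset_ps / 10.0))


def find_setup_time(model, relaxed_ps, lo=0.0, hi=200.0, tol=1e-6):
    """Bisect for the offset at which clock-to-Q reaches 110 % of the relaxed value."""
    target = 1.10 * relaxed_ps
    # The model is assumed to decrease monotonically in the offset on [lo, hi].
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if model(mid) > target:
            lo = mid  # data edge too close to the clock edge: move it away
        else:
            hi = mid
    return 0.5 * (lo + hi)


t_setup = find_setup_time(clk_to_q, relaxed_ps=100.0)
print(f"setup time of the toy model: {t_setup:.2f} ps")
```

To obtain a sensitivity, the same search would be repeated for each drifted transistor parameter, with the stand-in replaced by an actual circuit simulation.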
Figure 3.24(a) shows the degradation of the setup time when the threshold voltage of the
PMOS transistors decreases (this is the case for a degradation due to NBTI). The solid
lines show the degradation when just one particular transistor degrades. Some transistors
increase and others decrease the degradation of the setup time. Only the transistors with
the highest positive or negative degradation are shown. The blue dashed line depicts
the degradation if all transistors have the same Vth drift. The green dotted line shows
the delay degradation of an inverter for comparison. One can see that the setup time
degradation and the delay degradation are almost exactly the same. Figure 3.24(b)
shows the same information for the hold time.


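[Figure 3.25 schematic: flip-flops with data inputs D1 and D2 and a common clock Clk,
and a timing diagram marking tSUP and tHLD relative to the clock edge.]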

Figure 3.25.: Sequential circuit with setup and hold time.

From this study it can be seen that the sensitivities can be linearized well and that
the degradation of tSUP and tHLD is of the same order as the degradation of gate delays.
This has the following implications for the timing behavior of a sequential circuit:

• For a long timing path (e.g., the path ending at D1 in Figure 3.25), the setup time
constraint is relevant. It is violated if the data signal arrives at the receiving FF
later than tSUP before the clock edge. Due to aging, the gate delays along the data path
degrade and, therefore, the path delay increases. Whether tSUP increases or decreases depends
on which transistors degrade the most. If tSUP decreases, this compensates
some of the slowdown of the data path. If tSUP increases, the timing problem due
to the slower data path is amplified. One has to consider that a long data path
consists of many gates and the degradation of one gate delay is approximately
as large as the setup time degradation. For the investigated MSFF, tSUP is of
the same order of magnitude as the delay of combinational gates (several tens of
picoseconds). Hence, the degradation of tSUP plays a minor role compared to the
degradation of the gate delays along the path.

• For a short timing path (e.g., the path ending at D2 in Figure 3.25), the hold time
constraint is relevant. It has to be ensured that the data signal at the MSFF does
not change before the hold time has elapsed. In this case, the data path consists of only
a few gates or even none at all. If the path consists of a few gates, the degradation of
the path delay and the hold time degradation can cancel each other out. This is
not the case when there are no gates along the path. For the investigated MSFF,
one has to consider that the nominal hold time is only a few picoseconds. This
means that even a degradation by 150 % (as seen in Figure 3.24(b)) does not change the
absolute value of the hold time much.

Following this argumentation, it is shown that for timing verification the modeling of


the gate delay degradation is more important than the modeling of the degraded setup
and hold time. A long timing path consists of many gates and the degradation of one
single gate is comparable to the setup time degradation. For a short timing path without
any gates, the degradation of the hold time can be relevant, but not for the investigated
MSFF, because its hold time is only a few picoseconds.
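The argument can be checked with simple arithmetic. The sketch below uses the textbook setup and hold constraints without clock skew, which is a simplification and not a formulation taken from this thesis. All delays are hypothetical values in picoseconds, chosen only to match the orders of magnitude mentioned above (setup time of several tens of picoseconds, hold time of a few picoseconds).

```python
def setup_slack(t_clk, t_clk2q, t_path, t_setup):
    """Setup slack: the data must arrive t_setup before the next clock edge."""
    return t_clk - (t_clk2q + t_path + t_setup)


def hold_slack(t_clk2q, t_path, t_hold):
    """Hold slack: the data must stay stable for t_hold after the clock edge."""
    return (t_clk2q + t_path) - t_hold


# Hypothetical values in ps.
T_CLK, T_CLK2Q, T_GATE, T_SETUP, T_HOLD = 1000.0, 60.0, 40.0, 40.0, 4.0
AGING = 0.10  # assumed 10 % degradation of every delay and of the setup time

# Long path with 20 gates: compare the slack loss caused by the data path
# with the slack loss caused by the degraded setup time.
path_fresh = 20 * T_GATE
slack_fresh = setup_slack(T_CLK, T_CLK2Q, path_fresh, T_SETUP)
slack_aged = setup_slack(T_CLK, (1 + AGING) * T_CLK2Q,
                         (1 + AGING) * path_fresh, (1 + AGING) * T_SETUP)
loss_from_path = AGING * (T_CLK2Q + path_fresh)
loss_from_setup = AGING * T_SETUP

# Short path without gates: the hold time degrades by 150 % (factor 2.5),
# while the clock-to-Q delay is pessimistically kept at its fresh value.
hold_fresh = hold_slack(T_CLK2Q, 0.0, T_HOLD)
hold_aged = hold_slack(T_CLK2Q, 0.0, 2.5 * T_HOLD)

print(f"setup slack: {slack_fresh:.0f} ps -> {slack_aged:.0f} ps "
      f"(path: {loss_from_path:.0f} ps, setup time: {loss_from_setup:.0f} ps)")
print(f"hold slack: {hold_fresh:.0f} ps -> {hold_aged:.0f} ps")
```

With these assumed numbers, almost the entire loss of setup slack stems from the data path, and the hold slack changes by only a few picoseconds despite the large relative hold time degradation.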

3.2.3. Impact on power dissipation


There are three main factors of power dissipation in a CMOS gate [Chandrakasan and
Brodersen, 1995]:

Switching power dissipation (Pswitching): Power is consumed by charging the output
load. First, the load capacitance CL is charged to VDD by the pull-up network.
At the next output signal transition, the charge stored at the capacitance flows
through the pull-down network to ground. Pswitching is given by the following
formula:

    P_{switching} = \frac{TD}{2} \cdot f_{CLK} \cdot V_{DD}^{2} \cdot C_L    (3.10)

Short-circuit power dissipation (Pshort−circuit): Power dissipation caused by a conducting
path from VDD to ground that is formed when the NMOS and PMOS transistors
are conducting simultaneously for a short period of time during a transition.
Pshort−circuit is given by:

    P_{short-circuit} = I_{short-circuit} \cdot V_{DD}    (3.11)

Leakage power dissipation (Pleakage): Leakage power originates from the leakage currents
Ileakage of a transistor when it is in off-state:

    P_{leakage} = I_{leakage} \cdot V_{DD}    (3.12)

For sub-100 nm technologies, the gate tunneling current and the subthreshold leak-
age current are the two dominant factors [Piguet, 2005]. The gate tunneling current
strongly depends on oxide thickness, whereas the subthreshold leakage current de-
pends, amongst others, on the threshold voltage.
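The three components can be written down directly as functions, as sketched below. The operating point (transition density, clock frequency, load capacitance, currents) is hypothetical and serves only to show the units and the arithmetic. Note that the threshold voltage does not appear as an argument of the switching power.

```python
def switching_power(td, f_clk, vdd, c_load):
    """Eq. (3.10): P_switching = TD/2 * f_CLK * VDD^2 * C_L, in W."""
    return 0.5 * td * f_clk * vdd ** 2 * c_load


def short_circuit_power(i_short_circuit, vdd):
    """Eq. (3.11): P_short-circuit = I_short-circuit * VDD, in W."""
    return i_short_circuit * vdd


def leakage_power(i_leakage, vdd):
    """Eq. (3.12): P_leakage = I_leakage * VDD, in W."""
    return i_leakage * vdd


# Hypothetical operating point: TD = 0.2, 500 MHz clock, 1.2 V, 10 fF load,
# 60 nA mean short-circuit current, 5 nA leakage current.
p_sw = switching_power(td=0.2, f_clk=500e6, vdd=1.2, c_load=10e-15)
p_sc = short_circuit_power(i_short_circuit=60e-9, vdd=1.2)
p_leak = leakage_power(i_leakage=5e-9, vdd=1.2)

print(f"P_switching = {p_sw * 1e6:.3f} uW")
print(f"P_short-circuit = {p_sc * 1e6:.3f} uW")
print(f"P_leakage = {p_leak * 1e9:.1f} nW")
```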

Pswitching and Pshort−circuit are combined to the dynamic power dissipation and Pleakage
is also known as static power consumption. For the fan-out-3 test structure, Pshort−circuit
is responsible for about 10 % of the dynamic power. The portion of Pshort−circuit would
be increased by a slower input transition or by a smaller output load. To investigate the
impact of aging on these components of power dissipation, Vth of the PMOS transistors is
increased. Pswitching does not depend on the threshold voltage and stays constant. The
same is true for the gate tunneling current. The subthreshold current is exponentially
dependent on gate-source voltage Vgs and threshold voltage Vth :

    I_{ds} \propto e^{V_{gs} - V_{th}}    (3.13)


[Figure 3.26 plots: (a) INV; 90 nm; Vnom; 27 °C: Pshort−circuit [%] over |∆Vth| for a rising
and a falling input; (b) PMOS; 90 nm; Vds = Vnom; Vgs = 0 V; 27 °C: Ileakage [%] over |∆Vth|.]

Figure 3.26.: (a) Change of Pshort−circuit by altering Vth . Pshort−circuit decreases for a
rising and a falling input transition. (b) Subthreshold current for a PMOS
transistor (with Vgs = 0 V and Vds = 1.2 V) for altered ∆Vth values.

Hence, by increasing the threshold voltage the subthreshold component of Pleakage is
strongly reduced (see Figure 3.26(b)).
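The exponential dependence can be made tangible with a small sketch. Eq. (3.13) omits the normalization of the exponent; the sketch assumes the common textbook form in which the exponent is divided by n times the thermal voltage kT/q. The subthreshold slope factor n = 1.5 is an assumed value and is not taken from the 90 nm technology used here, so the resulting numbers are illustrative only.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant in J/K
Q_E = 1.602176634e-19   # elementary charge in C


def thermal_voltage(temp_c):
    """Thermal voltage kT/q in V for a temperature given in degrees Celsius."""
    return K_B * (temp_c + 273.15) / Q_E


def leakage_ratio(delta_vth, temp_c=27.0, n=1.5):
    """Subthreshold current relative to the fresh transistor at fixed Vgs.

    Assumes I_ds ~ exp((Vgs - Vth) / (n * kT/q)); n is an assumed slope factor.
    """
    return math.exp(-delta_vth / (n * thermal_voltage(temp_c)))


for dvth in (0.0, 0.05, 0.10):
    print(f"|dVth| = {dvth * 1e3:5.1f} mV -> "
          f"I_leakage = {100.0 * leakage_ratio(dvth):5.1f} % of the fresh value")
```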
The impact of an increased threshold voltage on Pshort−circuit is determined by simulation.
The threshold voltage of the PMOS transistor of an inverter is altered and the
drain current Id of the transistor that is going from on- to off-state for the considered
input transition is measured (see Figure 3.26(a)). One can see that for both transitions
Pshort−circuit decreases when the threshold voltage drift is increased.
By considering all important components of power dissipation, it can be seen that
aging has almost no effect on power dissipation. Indeed, power dissipation is slightly
reduced by aging. Pswitching, which is responsible for a large part of the dynamic power
dissipation, remains unaffected and Pshort−circuit slightly decreases. The static power
dissipation is slightly decreased as well⁴. Hence, it is justified that this thesis focuses on
analyzing the impact of aging on timing and not on power dissipation.

3.3. Technology trend


So far, all investigations were done using a 90 nm technology. Now it is investigated
how the drifts and the sensitivities evolve for different technologies. For that purpose,
five technologies are compared: 120 nm, 90 nm, 65 nm LP (low power), 65 nm HP (high
performance), and 45 nm LP⁵. The main difference between LP and HP technologies is
the gate oxide thickness. Low power technologies have a thicker gate oxide, resulting in
lower leakage currents.

⁴ Pleakage caused by tunneling currents remains constant, but Pleakage caused by subthreshold currents
is reduced.
⁵ For the 45 nm LP and the 65 nm HP technology, just the transistor models and the degradation
equations were available, but no standard cell libraries. In order to have logic gates for those technologies,
gates from the 65 nm LP technology were taken. The transistor types were replaced and the width
and length of the transistors were adjusted.

[Figure 3.27 plot: vertical E-field [V/m] (axis scaled by 10^8) over technology
(130 nm, 90 nm, 65 nm, 45 nm).]

Figure 3.27.: Vertical electrical field over technologies at nominal supply voltage.
The common opinion in the literature is that the Vth drift due to NBTI increases
with newer technologies (e.g., see [Strong et al., 2009; Huard et al., 2009]). This is
due to the strong dependence of the drift on the vertical electrical field. The electrical
field increases because the transistor sizes are scaled more aggressively than the supply
voltage. Figure 3.27 shows the increasing electrical field. It was calculated from data
in the International Technology Roadmap for Semiconductors [ITRS, 2001, 2009]. The
vertical electrical field is given by:

    E_{vertical} = \frac{V_{nom}}{t_{ox}}    (3.14)

Vnom is the predicted nominal supply voltage for a technology and tox is the corresponding
physical oxide thickness. However, the degradation equations for the technologies
do not show such a clear picture. The correlation between drift and Vnom can be seen
by comparing Figure 3.28(a) and Figure 3.28(b). The 120 nm technology with a Vnom of
1.5 V has the largest drift, followed by 90 nm, 65 nm LP, and 45 nm LP (Vnom = 1.2 V).
The 65 nm HP technology (Vnom = 1.0 V) shows the smallest drift over the lifetime. If
the drifts over the lifetime are calculated for a VDD of 1.2 V for all technologies, the
difference between the technologies (see Figure 3.28(b)) is less than 10 mV.
All five technologies still have a SiO2 gate dielectric, hence, the impact of high-k
metal gates is not considered yet. With high-k metal gates, the gate dielectric becomes
thicker again. This reduces the electrical field. However, it is observed that NMOS
transistors experience a Vth drift due to PBTI in the same order of magnitude as the
PMOS transistors due to NBTI.
In the last several years, the research focus was on the NBTI effect, but Huard et al.
[2009] argue that HCI is no longer negligible due to a constant lateral field increase
since the 120 nm technology node. As can be seen in Figure 3.29(a) the lateral electrical
field increases as well with newer technologies (Elateral = Vnom/Lmin with Lmin being the
minimal gate length).
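Both field definitions are simple ratios, as the following sketch shows. The supply voltage, oxide thickness, and gate length used below are placeholder values and not ITRS data. The scaling factors in the second part are likewise assumed, chosen only to illustrate why a field grows when dimensions shrink faster than the supply voltage.

```python
def vertical_field(v_nom, t_ox):
    """Eq. (3.14): E_vertical = V_nom / t_ox, in V/m."""
    return v_nom / t_ox


def lateral_field(v_nom, l_min):
    """E_lateral = V_nom / L_min, in V/m."""
    return v_nom / l_min


# Placeholder values (not ITRS data): 1.2 V, 2 nm oxide, 90 nm gate length.
v_nom, t_ox, l_min = 1.2, 2.0e-9, 90e-9
e_vert = vertical_field(v_nom, t_ox)
e_lat = lateral_field(v_nom, l_min)

# Assumed scaling step: dimensions shrink by a factor of 0.7,
# the supply voltage only by a factor of 0.9.
growth = vertical_field(0.9 * v_nom, 0.7 * t_ox) / e_vert

print(f"E_vertical = {e_vert:.2e} V/m, E_lateral = {e_lat:.2e} V/m")
print(f"field growth per assumed scaling step: {growth:.2f}x")
```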
Figure 3.29(b) shows the ∆Ion drift for PMOS and NMOS transistors due to HCI
as calculated with the degradation equations. The PMOS transistors degrade more strongly


[Figure 3.28 plots: |∆Vth| [% of Vth0] over lifetime [y] (logarithmic axes) for 120 nm, 90 nm,
65 nm LP, 65 nm HP, and 45 nm LP; (a) PMOS; 125 °C; Vnom; W = 10 µm; Lmin;
(b) PMOS; 125 °C; 1.2 V; W = 10 µm; Lmin.]

Figure 3.28.: Transistor drifts due to NBTI for different technologies at nominal
supply voltage (a) and at a supply voltage of 1.2 V (b).

[Figure 3.29 plots: (a) lateral E-field [V/m] (axis scaled by 10^7) over technology
(130 nm, 90 nm, 65 nm, 45 nm); (b) DF = 100; Vnom; 25 °C; W = 10 µm; Lmin: ∆ION [%] over
lifetime [h] (logarithmic axes) for PMOS and NMOS transistors of the 120 nm, 90 nm, 65 nm LP,
65 nm HP, and 45 nm LP technologies.]

(a) Lateral electrical field over technologies at nominal supply voltage.
(b) Transistor drifts due to HCI for different technologies at nominal supply voltage.

Figure 3.29.: HCI over technology nodes.

than the NMOS transistors. The PMOS transistors show a clear technology trend. The
drift increases with newer technologies. The only exception is the 45 nm LP technology
with a drift smaller than the one of the 65 nm technologies.
However, the parameter drift is only half the story; it is equally important how the
sensitivities evolve over the technologies. Figure 3.30(a) shows the sensitivity of the
gate delay with respect to a Vth drift. The 120 nm technology, with 1.5 V nominal supply
voltage, has the lowest sensitivity. The 65 nm HP technology (1.0 V nominal supply voltage)
reveals the largest sensitivity. The other technologies have a nominal supply voltage of
1.2 V and lie in between. Hence, the sensitivities show the opposite behavior to the drifts.
The 65 nm HP technology, for instance, has the smallest drifts, but the highest sensitivity.
To compare the degradation of the gate delay for those five technologies, seven use
profiles from the business units of an industry partner were chosen. A use profile specifies
the operating conditions a circuit must be able to sustain during its lifetime. It consists,
among other parameters, of a specified lifetime, a maximal supply voltage, and a
temperature profile. The temperature is either given by a mean value, by intervals, or

[Figure 3.30 plots: (a) INV; Vnom; 27 °C: ∆delay (falling input) [%] over ∆Vth [V] for 120 nm,
90 nm, 65 nm LP, 65 nm HP, and 45 nm LP; (b) "Sensitivities over technologies":
∆delay/∆Vth [%/100 mV] over technology node.]

Figure 3.30.: Sensitivity of the inverter delay for different technologies.

[Figure 3.31 plot, "INV delay degradation for different stress profiles": ∆delay (falling input) [%]
over technology (65 nm LP, 45 nm LP, 120 nm, 90 nm, 65 nm HP) for Profiles A to G.]

Figure 3.31.: Degradation of inverter delay for different technologies and use profiles.

by a Gaussian distribution. The 65 nm HP technology shows the largest degradation of
the gate delay for a falling input transition (see Figure 3.31), followed by the 90 nm and
the 120 nm technologies. The low power technologies with a thick gate oxide show the
lowest degradation.

3.4. Summary
For an aging-aware timing analysis, aging effects that cause a parameter drift are relevant.
The two most severe drift-related aging effects nowadays are NBTI and HCI. In
order to determine the degradation caused by aging effects, the parameter drifts due
to a particular aging effect have to be considered as well as the sensitivity of a gate
performance with respect to a parameter drift.
The physical mechanism behind NBTI is not yet completely understood. NBTI can
best be modeled by an increased Vth of the PMOS transistors. A special characteristic
of NBTI is that the Vth drift recovers when the transistor is no longer stressed. NBTI
is strongly dependent on the supply voltage. As soon as high-k metal gates are used,
PBTI must also be considered because, for such gates, PBTI shows a drift in the

same order of magnitude as NBTI.


HCI was the dominant aging effect until it was overtaken by NBTI. However, due
to the constantly increasing lateral electric field in newer technology nodes, HCI is no
longer negligible. HCI leads to a threshold voltage drift and to a mobility degradation.
The supply voltage dependence of HCI is quite strong as well.
How sensitive a gate is to a parameter drift caused by NBTI or HCI is also strongly
dependent on the supply voltage. In contrast to the parameter drifts, however, the
sensitivity is increased by a lower supply voltage.
It is shown that modeling the degradation of the gate delay is more important than
modeling the degradation of a flip-flop. A long timing path consists of many gates and
the degradation of one single gate is comparable to the setup time degradation. For
short timing paths without any gates the degradation of the hold time can be relevant,
but not for the investigated master-slave flip-flop, because its hold time is only a few
picoseconds.
Aging causes a circuit to slow down, but the power consumption is almost not affected.
It is even likely that the power consumption is slightly reduced. Pswitching stays constant
and Pshort−circuit is slightly reduced. The subthreshold leakage current, one component
of the static power consumption, is also reduced by an increased threshold voltage.
In theory, the degradation due to NBTI as well as HCI should increase in advanced
technologies, due to increasing vertical and lateral electrical fields in a transistor. For
HCI, the degradation equations represent this trend. However, this trend cannot be
seen in the degradation equations for NBTI. One possible reason is that it is hard to
compare the NBTI drifts for different technologies because NBTI strongly depends on
several manufacturing steps. A clear trend can be seen for the sensitivity of a gate. It
increases with newer technologies, due to the reduced supply voltage. One can conclude
that the degradation caused by HCI will increase in newer technologies. For NBTI, it
depends on what predominates: the reduced drift or the increased sensitivity.

4. Aging-aware static timing analysis
For performing an aging-aware TA on gate level, a gate model is required that provides
the aged gate delay instead of the fresh one. This is the main difference compared to a
traditional STA without aging. The proposed aging-aware gate model is called AgeGate
[Lorenz et al., 2009a] and it has the following advantages compared to the state-of-the-art
approaches discussed in Chapter 2.2.2:

Analyzing impact of NBTI and HCI: The proposed aging-aware gate model is not limited
to just one aging effect. It analyzes the combined impact of NBTI and HCI.
Of the aging-aware gate models that were already introduced, all except the
LUT-based gate model in [Chen et al., 2011] consider just one aging effect. The
results of our proposed approach show that the mean degradation of the circuit
delay is 10.1 % for NBTI and 3.2 % for HCI. Hence, HCI cannot be neglected
although NBTI is the dominant aging effect for the investigated 90 nm technology
and the chosen operating conditions.

Individual parameter drifts: The individual transistors of a gate degrade differently
because the time each transistor spends in stress mode depends on the workload at the
gate inputs and on the internal gate structure. A formal way to calculate individual
parameter drifts for every transistor is developed. A canonical gate model is
used, which can consider the impact of the individual parameter drifts on the gate
performances. The results show that the degradation is overestimated by 20 %
without considering individual parameter drifts.

Degradation of the output slope: The proposed approach not only calculates an aged
gate delay, but also determines an aged output slope. As in a traditional
STA, signal waveforms are modeled as ramps. The output slope of one gate
determines the input slope of a succeeding gate and this, in turn, impacts the gate
delay of the succeeding gate. The results show that the degradation of the circuit
delay is underestimated by 24 % when the fresh output slope instead of an aged
output slope is used to calculate the gate delay.

Easy extensibility: The gate model considers two aging effects at the moment, but the
approach can easily be extended. The proposed approach is based on calculating
the transistor drifts and then computing the aged gate performances. For calcu-
lating the aged gate performances, the sensitivities of the gate performances with
respect to a parameter drift are required. Other aging effects that cause a drift
of transistor parameters can be taken into account if degradation equations are
available and the sensitivities for this new aging effect are characterized.


One effect that becomes important in technologies with a metal gate is positive bias
temperature instability (PBTI). PBTI is the counterpart of NBTI and degrades
NMOS transistors. Another effect that might become relevant in the future and
must be modeled is TDDB. In [Choudhury et al., 2010] it is shown that
TDDB also leads to a degradation of transistor characteristics before a
catastrophic breakdown occurs.
The degradation equations can also be replaced by more accurate ones to take the
recovery effect for NBTI into account.

Independence of the use profile: Another advantage compared to aged LUT-based
approaches, like Glacier [Wu et al., 2000], is that the operating conditions over
lifetime and the workload just affect the degradation equations. Since the sensitivities,
which are obtained when the cell library is characterized, are independent
of the use profile, the library does not have to be re-characterized in case the use
profile changes (e.g., the temperature over lifetime is not 125 °C but 110 °C).
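The separation between library characterization and use profile can be sketched as follows. The drift function below is a generic placeholder (an exponential voltage term, an Arrhenius temperature term, and a power law in time) with invented coefficients; it is not one of the degradation equations used in this thesis. The sensitivity table and all names are likewise hypothetical. The point of the sketch is only that a changed use profile alters the drift, while the characterized sensitivities are reused unchanged.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

# Characterized once per cell library; the value is invented for illustration.
SENSITIVITY = {"INV": 200.0}  # delay degradation in % per V of Vth drift


def vth_drift(v_eff, t_eff_c, years, a=0.05, gamma=2.0, e_a=0.1, n=0.2):
    """Placeholder drift model in V; NOT the degradation equations of the thesis."""
    t_kelvin = t_eff_c + 273.15
    return (a * math.exp(gamma * v_eff)
            * math.exp(-e_a / (K_B_EV * t_kelvin)) * years ** n)


def delay_degradation(cell, use_profile):
    """Aged delay degradation in percent: sensitivity times drift."""
    return SENSITIVITY[cell] * vth_drift(**use_profile)


profile_125 = {"v_eff": 1.2, "t_eff_c": 125.0, "years": 10.0}
profile_110 = {"v_eff": 1.2, "t_eff_c": 110.0, "years": 10.0}

d_125 = delay_degradation("INV", profile_125)
d_110 = delay_degradation("INV", profile_110)
print(f"125 C profile: {d_125:.2f} %, 110 C profile: {d_110:.2f} %")
```

An additional aging effect would enter the same way: one more drift function and one more column of characterized sensitivities.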

The chapter is organized as follows: First the complete aging analysis flow is intro-
duced (Section 4.1), then it is explained how the workload can be determined (Sec-
tion 4.2). In Section 4.3 the proposed aging-aware gate model is explained. The charac-
terization of the standard cells is described in Section 4.4 and results for several bench-
mark circuits are given in Section 4.5.

4.1. Aging-aware STA flow


An aging-aware static timing analysis (ASTA) works similarly to a traditional STA (see
Section 2.1). The main difference is that an aging-aware gate model is required to
compute aged gate performances (gate delay and output slope) instead of fresh ones.
Those aged gate performances depend on the use profile and workload at the gate inputs.
Figure 4.1 summarizes the ASTA flow:

1. The operating conditions over lifetime are specified by globally setting the supply
voltage Veff and the temperature Teff. The approach could also be extended to
take voltage drops and temperature gradients over a chip into account by having
individual supply voltage and temperature values for every gate. That way, the
accuracy of the aging analysis could be increased.

2. The workload at the gate inputs is required to calculate individual transistor drifts.
The workload is defined by gate input signals over lifetime and it is determined by
two statistical parameters, signal probability (SP) and transition density (TD):
• SP and TD can be obtained by performing a logic simulation of the circuit.
However, this requires typical input signals for the circuit and contradicts the
fundamental idea of a static timing analysis, which is independent of input
signals.


Figure 4.1.: Aging analysis flow

• Another way to obtain SP and TD is to use probabilistic methods, which were
developed for analyzing dynamic power dissipation. Probabilistic approaches
just require the values for SP and TD at the primary inputs. These values
are propagated through the circuit.

• If neither realistic input signals nor values for SP and TD at the primary
inputs are available, a worst-case analysis can be performed. Worst-case
values for SP and TD are specified that are used for all nets of the circuit.
By choosing 0 % for SP, it is guaranteed that all PMOS transistors are in
inversion during the entire lifetime and the circuit degrades maximally due to
NBTI. For TD the specification of worst-case values is more difficult because
it has to be considered that, due to the delay of the gates, a signal may change
several times before settling to its static value (this is referred to as glitches).

In the proposed approach, a probabilistic method is used whenever the workload
should be considered. Probabilistic methods are described in more detail in
Section 4.2.

3. After operating conditions and the workload are determined, the aged gate per-
formances can be calculated by modifying the Function update_edge from page
21 (see Function update_edge_aged on page 66). First, the stress probabilities
for the single transistors of a gate are obtained. The stress probability is the per-
centage of time that a transistor is stressed by a particular aging effect during
the lifetime. Next, the parameter drifts for the single transistors are computed by
means of degradation equations for NBTI and HCI. Finally, the aged gate delay is
computed by adding up the fresh gate delay and the gate delay degradation.


Function update_edge_aged(u, v)
    /* Recursive function to update the gate delay of an edge (u, v) */
    if d_valid((u, v)) == False then
        /* Update gate delay based on input slope and output load */
        slope = get_slope_from_node(u);
        load = get_load_from_node(v);
        stress_probabilities = get_Pstress();
        drifts = get_drifts(use_profile, stress_probabilities);
        d_fresh((u, v)) = get_delay_from_LUT(slope, load);
        ∆d((u, v)) = get_degradation(slope, load, drifts);
        d_aged((u, v)) = d_fresh((u, v)) + ∆d((u, v));
        d_valid((u, v)) = True;
    end
    return d_aged((u, v))
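The pseudocode above can be mirrored in a short runnable sketch. It is not the implementation of this thesis: the timing graph, the LUT lookup, the stress probabilities, and the degradation equations are replaced by hypothetical callbacks with invented coefficients, and the validity flag is represented by membership in a cache dictionary.

```python
def update_edge_aged(edge, model, cache):
    """Return the aged delay of a timing-graph edge, computing it at most once."""
    if edge not in cache:  # corresponds to d_valid((u, v)) == False
        u, v = edge
        slope = model["slope_at"](u)
        load = model["load_at"](v)
        p_stress = model["stress_probabilities"](edge)
        drifts = model["drifts"](model["use_profile"], p_stress)
        d_fresh = model["delay_lut"](slope, load)
        delta_d = model["degradation"](slope, load, drifts)
        cache[edge] = d_fresh + delta_d
    return cache[edge]


# Toy callbacks; every coefficient is invented for illustration.
toy_model = {
    "use_profile": {"max_dvth": 0.08},               # V, drift at 100 % stress
    "slope_at": lambda node: 50.0,                   # ps
    "load_at": lambda node: 5.0,                     # fF
    "stress_probabilities": lambda edge: {"pmos": 0.5},
    "drifts": lambda profile, p: {"pmos_dvth": profile["max_dvth"] * p["pmos"]},
    "delay_lut": lambda slope, load: 10.0 + 0.2 * slope + 4.0 * load,
    "degradation": lambda slope, load, drifts: 100.0 * drifts["pmos_dvth"],
}

cache = {}
d_aged = update_edge_aged(("u", "v"), toy_model, cache)
print(f"aged delay: {d_aged:.1f} ps")
```

In a full analysis, the slope callback would itself trigger the update of preceding edges, which is where the recursion mentioned in the pseudocode comment would come from.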

4.2. Workload determination


The degradation of a gate depends strongly on the gate input signals over lifetime. In
Section 3.1.3 it is described that a logic “0” at a gate input results in a degradation of
the PMOS transistors due to NBTI. The fraction of the lifetime a signal is at logic “1” is
given by a statistical signal property called static signal probability. According to Najm
[1994]:
Signal probability: The signal probability SP(x) of a node x is the average fraction of
clock periods in which the signal is at logic “1”.
Hence, the probability that a signal is logic “0” is 1 − SP(x) and is from now on referred
to as \overline{SP}(x). For HCI, on the other hand, it is relevant how often a signal changes its
logic state. The statistical signal property of interest is the transition density [Najm,
1994]:
Transition density: The transition density TD(x) of a signal x is the mean number of
signal transitions per clock period.
Hence, to consider the impact of the workload on the degradation, the exact signal
waveforms for all gate inputs are not required. It is enough to obtain the two statistical
signal properties for all gate inputs.
One possibility to determine SP and T D is by logic simulation. If typical/worst-case
circuit input vectors are available, they can be used to simulate the circuit, store the
signal waveforms at every gate input, and compute SP and T D. Such an approach
is called strongly input pattern dependent because typical circuit input waveforms are
required.
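
As a minimal sketch of this simulation-based approach, the following Python function derives SP and TD from a recorded waveform. It assumes, for illustration only, that the waveform is given as one settled logic value per clock period, so glitches within a period are not counted; the function name and data layout are hypothetical.

```python
def signal_statistics(samples):
    """Estimate (SP, TD) from a list of 0/1 values, one per clock period.

    SP is the fraction of periods at logic 1. TD is the number of value
    changes between consecutive periods divided by the number of period
    boundaries. Glitches inside a period are ignored (assumption).
    """
    if len(samples) < 2:
        raise ValueError("at least two clock periods are required")
    sp = sum(samples) / len(samples)
    transitions = sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)
    td = transitions / (len(samples) - 1)
    return sp, td


sp, td = signal_statistics([0, 1, 0, 1, 0])
```

For the alternating waveform above the signal toggles at every period boundary, so TD is 1, while SP is 2/5.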
The remaining approaches, discussed here, just require values for SP and T D at
the primary inputs to be specified. These approaches are called weakly input pattern
dependent.


Xakellis and Najm [1994] generate random signal vectors for the primary inputs, which
have the specified SP and T D. Then, logic simulation is used to obtain the signal
waveforms at every gate input and SP and T D for those signals are computed. This is
repeated until the stopping criterion is reached. At every iteration a new mean value for
SP and T D at every gate input is calculated. The stopping criterion is fulfilled when
all mean values are within a confidence interval specified in advance. Although this
approach is very accurate, it is quite time-consuming.
The following approaches are called probabilistic methods because they propagate
the statistical signal properties directly from the primary inputs into the circuit. The
approaches differ in how accurately they consider the spatial and temporal dependence
of signals.
Spatial and temporal dependence are defined as follows:

Spatial dependence: Two signals may depend on one another. For instance, it may be
impossible for both signals to be logic “0” at the same point in time. Spatial dependence arises
when a circuit has feedback (sequential circuits) and when a signal splits and
reconverges again. In general, probabilistic methods assume spatially independent
signals at the primary inputs.

Temporal dependence: The logic value of a signal for two points in time may be inter-
dependent. A clock signal, for instance, is logic “1” during half a clock period and
logic “0” in the succeeding half.

Cirit [1987] computes the signal probability at a gate output $y = f(x_1, x_2, \ldots, x_n)$
from the signal probabilities at the gate inputs by the following recursive formula:

\[ SP(y) = SP(x_1) \cdot SP(f_{x_1}) + \overline{SP}(x_1) \cdot SP(f_{\overline{x}_1}) \tag{4.1} \]

$f_{x_1}$ and $f_{\overline{x}_1}$ are the cofactors of f with respect to $x_1$, and $\overline{SP}(x_1) = 1 - SP(x_1)$
is the probability that $x_1$ is at logic “0”. For a NAND gate ($y = \overline{x_1 \cdot x_2}$)
the probability that the output is at logic “0” is:

\[ \overline{SP}(y) = SP(x_1) \cdot SP(x_2) \tag{4.2} \]

If temporal independence is assumed, the transition density is easily calculated by ([Yeap,
1998] p. 64):

\[ TD(y) = 2 \cdot SP(y) \cdot \overline{SP}(y) \tag{4.3} \]
By not just propagating the signal probability but also the transition density through
the circuit, temporal dependence can be taken into account. Najm [1993] propagates
T D by the following formula:
\[ TD(y) = \sum_{i=1}^{n} SP\!\left( \frac{\partial y}{\partial x_i} \right) \cdot TD(x_i) \tag{4.4} \]

$\partial y/\partial x$ is the Boolean difference ($\partial y/\partial x := y_x \oplus y_{\overline{x}}$), and the signal probability of the Boolean
difference is the probability that the gate is sensitized and the transition at input $x_i$ is
observed at the gate output. It is assumed that the signals at the inputs $x_1$ to $x_n$ are
spatially independent.

Figure 4.2.: An example on calculating signal probabilities (parts (a) and (b); primary inputs a, b, c, internal nets x, y and output z)
SP and TD at the nets can also be computed directly from the statistical signal properties
at the primary inputs instead of propagating SP and TD from the gate inputs
to the gate outputs. In this case no internal spatial independence has to be assumed;
it is only assumed that the signals at the primary inputs are independent. A binary
decision diagram (BDD) is used to express the logic function of a signal in terms of
the primary inputs. The cofactors can then easily be calculated by following the true
and the false branch of the particular node of the BDD. The Boolean difference can be
computed with a BDD as well. Hence, Equations 4.1 and 4.4 can be used to compute
SP and TD for a signal directly from SP and TD at the primary inputs.
The difference between propagating SP from the gate inputs to the gate outputs or
directly computing SP from the primary inputs is illustrated by the following example
(see Figure 4.2). All three primary inputs have a signal probability of 0.5 and a transition
density of 1. SP and T D at the internal nets x and y are the same for both approaches:

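\begin{align*}
SP(x) &= SP(a) \cdot SP(x_a) + (1 - SP(a)) \cdot SP(x_{\overline{a}}) = SP(a) \cdot SP(b) + (1 - SP(a)) \cdot 0 = 0.25 \\
TD(x) &= SP\!\left( \frac{\partial x}{\partial a} \right) \cdot TD(a) + SP\!\left( \frac{\partial x}{\partial b} \right) \cdot TD(b) = SP(b) \cdot TD(a) + SP(a) \cdot TD(b) = 1 \\
SP(y) &= SP(b) \cdot SP(c) = 0.25 \\
TD(y) &= 1
\end{align*}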

First SP and T D at z are computed from SP and T D of the internal nets:

\begin{align*}
SP(z) &= SP(x) \cdot SP(y) = SP(a) \cdot SP(b)^2 \cdot SP(c) = 0.0625 \\
TD(z) &= SP(y) \cdot TD(x) + SP(x) \cdot TD(y) = 0.5
\end{align*}


When the BDD is used and SP (z) and T D(z) are computed from SP and T D at the
primary inputs directly, it looks as follows:

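\begin{align*}
SP(z) &= SP(a) \cdot SP(b) \cdot SP(c) = 0.125 \\
TD(z) &= SP\!\left( \frac{\partial z}{\partial a} \right) \cdot TD(a) + SP\!\left( \frac{\partial z}{\partial b} \right) \cdot TD(b) + SP\!\left( \frac{\partial z}{\partial c} \right) \cdot TD(c) \\
&= SP(b \cdot c) \cdot TD(a) + SP(a \cdot c) \cdot TD(b) + SP(a \cdot b) \cdot TD(c) = 0.75
\end{align*}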

The difference in SP (z) and T D(z) results from the fact that the first approach does
not consider the spatial correlation from the reconvergent paths starting at b.
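
The two results can be reproduced with a short Python sketch. It assumes the circuit of the example is x = a AND b, y = b AND c, z = x AND y, as implied by the equations above, and it replaces the BDD by a brute-force enumeration of the primary inputs, which is only feasible for such a tiny example. All function names are illustrative.

```python
from itertools import product

SP = {"a": 0.5, "b": 0.5, "c": 0.5}   # signal probabilities of primary inputs
TD = {"a": 1.0, "b": 1.0, "c": 1.0}   # transition densities of primary inputs


def z(a, b, c):
    """Output of the example circuit: z = (a AND b) AND (b AND c)."""
    return (a and b) and (b and c)


def prob(func, sp):
    """Exact probability that func is 1, enumerating independent inputs."""
    names = list(sp)
    total = 0.0
    for values in product((0, 1), repeat=len(names)):
        weight = 1.0
        for name, value in zip(names, values):
            weight *= sp[name] if value else 1.0 - sp[name]
        if func(**dict(zip(names, values))):
            total += weight
    return total


def boolean_difference(func, name):
    """Return the function f(name=1) XOR f(name=0)."""
    def diff(**kw):
        return bool(func(**{**kw, name: 1})) != bool(func(**{**kw, name: 0}))
    return diff


def and2(sp1, td1, sp2, td2):
    """Gate-by-gate propagation for an AND2 gate (inputs assumed independent)."""
    return sp1 * sp2, sp2 * td1 + sp1 * td2


# Local propagation through x and y, which ignores the reconvergence at b
sp_x, td_x = and2(SP["a"], TD["a"], SP["b"], TD["b"])
sp_y, td_y = and2(SP["b"], TD["b"], SP["c"], TD["c"])
sp_z_local, td_z_local = and2(sp_x, td_x, sp_y, td_y)

# Direct computation from the primary inputs
sp_z_exact = prob(z, SP)
td_z_exact = sum(prob(boolean_difference(z, n), SP) * TD[n] for n in SP)
```

With these inputs the local propagation gives SP(z) = 0.0625 and TD(z) = 0.5, while the direct computation gives 0.125 and 0.75, matching the values in the text.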
Unfortunately, circuits of industrial complexity are too large to set up the BDD for the
entire circuit. This is the reason why in this thesis the statistical signal properties are
propagated from the gate inputs to the gate outputs and the resulting accuracy penalty
is accepted. In [Najm, 1993] a compromise is proposed. The circuit is partitioned and
a BDD is generated for each partition of the circuit. That way the spatial correlation
within a partition is kept and just at the partition borders the correlation is lost.

4.3. AgeGate: Aging-aware gate model


After the operating conditions over lifetime have been specified and the values for SP
and T D at the gate inputs have been obtained, the aged gate performances can be
computed. AgeGate consists of three fundamental parts:

• A canonical gate model

• Technology specific degradation equations

• Information about the internal gate structure

The canonical gate model provides the aged gate performances dependent on the pa-
rameter drifts of the single transistors. These drifts are calculated by technology specific
degradation equations. The workload has an essential impact on the parameter drifts,
since it defines the fraction of the lifetime a transistor is actually stressed by a particular
aging effect. To determine this impact, information about the internal gate structure is
required.

4.3.1. Canonical gate model


The canonical gate model corresponds to a first-order Taylor series approximation at the
nominal gate performance $q_{fresh}$:

\[ q_{aged} = q_{fresh} + \Delta q = q_{fresh} + \sum_{m \in G} \sum_{p \in P} \chi^q_{m,p} \cdot \Delta p_m \tag{4.5} \]

The aged gate performance ($q_{aged}$) is the sum of the fresh gate performance ($q_{fresh}$) and
the degradation of the gate performance ($\Delta q$). G is the set of all transistors of the gate
and P is the set of all parameters that drift due to aging effects. $\chi^q_{m,p}$ are the sensitivity
coefficients and $\Delta p_m$ is the drift of parameter p of a particular transistor m. The sensitivity
coefficients are defined as:

\[ \chi^q_{m,p} = \left. \frac{\partial q}{\partial \Delta p_m} \right|_{\Delta p_m = 0} \tag{4.6} \]

It is the partial derivative of q with respect to a drift $\Delta p_m$, evaluated at the nominal parameter value ($\Delta p_m = 0$).
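For the aged gate delay $d_{aged}$ this results in:

\[ d_{aged} = d_{fresh} + \sum_{m \in G} \left( \frac{\partial d}{\partial V_{th,m}} \cdot \Delta V_{th,m} + \frac{\partial d}{\partial I_{on,m}} \cdot \Delta I_{on,m} \right) \tag{4.7} \]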

The sensitivity coefficients $\partial d/\partial V_{th,m}$ and $\partial d/\partial I_{on,m}$ are obtained together with the fresh
gate delay $d_{fresh}$ when the gate is characterized. The drifts $\Delta V_{th,m}$ and $\Delta I_{on,m}$ are
computed during aging analysis by degradation equations. The aged output slope is
modeled similarly to the aged gate delay.
Figure 4.3 shows that only a small error is introduced by linearizing the dependence
of the gate performance on a parameter drift. The degradation of an inverter delay for
a drift of $V_{th}$ and $I_{on}$ is shown. The dependencies are simulated on transistor level
and also calculated by means of the sensitivities. The comparison shows a good match
up to 10 % degradation of $I_{on}$ and 50 mV $V_{th}$ drift. Such drift values are
reached only for very demanding operating conditions over lifetime (10 y, 125 °C, and
110 % $V_{nom}$). Hence, the linearized sensitivities in the canonical gate model are justified.
Should the parameter drifts become too large and the error of the linear model be no
longer acceptable in future technologies, it is possible to move to a quadratic gate model
as proposed in [Zhang et al., 2005] for statistical static timing analysis (SSTA).
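
The linear model of Equation 4.7 amounts to a weighted sum over all transistors and drifting parameters. The following Python sketch shows the evaluation for one delay value; the dictionary layout and all numbers are hypothetical and only serve as an illustration.

```python
def aged_delay(d_fresh, sensitivities, drifts):
    """Evaluate d_aged = d_fresh + sum_m sum_p chi[m][p] * drift[m][p].

    sensitivities: {transistor: {parameter: chi}}
    drifts:        {transistor: {parameter: drift}}, missing entries count as 0
    """
    delta_d = 0.0
    for transistor, per_parameter in sensitivities.items():
        for parameter, chi in per_parameter.items():
            delta_d += chi * drifts.get(transistor, {}).get(parameter, 0.0)
    return d_fresh + delta_d


# Hypothetical inverter: delay in ps, dVth in mV, dIon in %
sens = {"MP": {"dVth": 0.2, "dIon": -0.5}, "MN": {"dIon": -0.4}}
drift = {"MP": {"dVth": 30.0, "dIon": -2.0}, "MN": {"dIon": -5.0}}
d_aged = aged_delay(100.0, sens, drift)
```

In this made-up example the delay increases from 100 ps to about 109 ps, with contributions of 6 ps, 1 ps and 2 ps from the three sensitivity terms.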

4.3.2. Degradation equations


In order to compute the aged gate performances with the canonical gate model, the
parameter drifts for all transistors are required. These drifts are calculated by degrada-
tion equations. For NBTI a threshold voltage drift (∆Vth ) is provided and for HCI the
degradation equation yields a drift of the drain saturation current in terms of percentage
(∆Ion ):

\[ \Delta V_{th} = f_1(V_{eff}, T_{eff}, t_{stress}, L) \tag{4.8} \]

\[ \Delta I_{on} = f_2(V_{eff}, T_{eff}, t_{stress}, W) \tag{4.9} \]

The equations are already discussed in Sections 3.1.1 and 3.1.2. The drifts depend on
the effective supply voltage over lifetime $V_{eff}$, the effective temperature over lifetime
$T_{eff}$, the stress time $t_{stress}$ and the transistor sizes W and L.
The time $t_{stress}$ states for how long the transistor is stressed due to an aging effect
during the lifetime $t_{life}$. The stress time can be expressed as:

\[ t_{stress} = P_{stress} \cdot t_{life} \tag{4.10} \]


[Figure 4.3 plot: “Inverter delay degradation w.r.t. parameter drift”; two panels showing the delay degradation for a falling input transition in % over the drift in % and in mV, respectively; curves: Simulation, Sensitivity]

Figure 4.3.: Degradation of inverter delay by $\Delta I_{on}$ and $\Delta V_{th}$, respectively. Solid lines
show dependencies calculated with sensitivities and dotted lines show dependencies
simulated on transistor level. Analyzing conditions are 27 °C,
1.2 V and 15 pF capacitive load.

$P_{stress}$ is the probability that a transistor is stressed during $t_{life}$. $P_{stress}$ differs for
the individual transistors of a gate. The individual stress probability depends on the
workload at the gate inputs and on the internal gate structure.
The gate terminal of a PMOS transistor must be negatively biased with respect to source and drain in order for
the transistor to degrade due to NBTI (see example in Figure 4.4). For transistor $M_{PC}$ this is the case
when a logic “0” is applied to the input C. Hence, the stress probability of $M_{PC}$ just
depends on the signal probability at C. For transistor $M_{PB}$, on the other hand, a logic “0”
must be applied to B but also to input C. Otherwise the gate terminal is not negatively biased
with respect to the source node of the transistor. Hence, the stress probability for $M_{PB}$
depends on the workload at inputs B and C and in addition on the internal structure
of the gate [Kumar et al., 2007b]. More precisely, it depends on the position of the
transistor in the PMOS stack. The challenge in determining individual transistor drifts
is to obtain the stress probabilities for all transistors of a gate from
the values for SP and TD at the gate inputs and the internal gate structure.

4.3.3. Calculation of Stress Probabilities


To calculate the parameter drifts of a transistor, the stress probabilities $P_{stress,NBTI}$
and $P_{stress,HCI}$ have to be obtained. Generally applicable methods, which can easily be
automated, are presented to calculate the two stress probabilities for every transistor of
a gate.


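[Figure 4.4 schematic: PMOS transistors $M_{PC}$ (input C, SP = 0.5), $M_{PB}$ (input B, SP = 0.4) and $M_{PA}$ (input A, SP = 0.3); NMOS transistors $M_{NC}$, $M_{NB}$ and $M_{NA}$; output Z]

Figure 4.4.: NOR gate with three inputs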

Stress Probability for NBTI


A PMOS transistor M is in stress condition when it is in inversion. Hence, M degrades
due to NBTI when its gate terminal is negatively biased with respect to its source and
drain terminals. This can be expressed by the following two conditions:

A: logic “0” applied to the gate terminal of M

B: logic “1” applied to the source or drain terminal of M

For the calculation of the stress probability for HCI, which is introduced in Section 4.3.3,
the probability that an NMOS transistor is conducting is needed as well. Due to that,
a new probability Pon is introduced. For a PMOS transistor, Pon is the probability that
the gate terminal is at logic “0”, given by 1 − SP at the gate terminal. For an NMOS
transistor, Pon is the probability that the gate terminal is at logic “1”, which equals SP
at the gate terminal. Hence, the probability for Condition A is equal to Pon of M :

\[ P(A) = P_{on,M} \tag{4.11} \]

For NBTI the gate terminal must be negatively biased with respect to its source and
drain terminal. However, Condition B just considers the logic value at one of both
transistor terminals. The reason is that when the transistor is conducting (condition A
fulfilled) it is enough to have a logic “1” at the source (drain) terminal, since the drain
(source) terminal will be charged to the same value.
Condition B is fulfilled if a conducting path exists between the supply voltage $V_{DD}$
and the source or drain terminal [Stempkovsky et al., 2009] of the transistor M. Hence,
all PMOS transistors along the conducting path must have a logic “0” applied to their
gate terminals as well. There might be multiple paths from $V_{DD}$ to the source or drain
terminal of a transistor. In this case $P(B)_i$ is calculated separately for every path
$PATH_{NBTI,i}$:

\[ P(B)_i = \prod_{t \in PATH_{NBTI,i}} P_{on,t} \quad \text{, if signals are independent} \tag{4.12} \]


Figure 4.5.: Example explaining the signal dependence.

$PATH_{NBTI,i}$ is the set of all transistors along a conducting path. How those paths are
determined is explained in Section 4.4.2. For independent signals at the gate inputs, the
probabilities can simply be multiplied.
The overall probability $P(B)$ is the probability that at least one path is conducting¹:

\[ P(B) = 1 - \prod_i \left( 1 - P(B)_i \right) \tag{4.13} \]

However, if the signals are dependent, this has to be taken into account when the
probability for Condition B is calculated. To calculate $P(B)$ for transistor $M_{PA}$ (see
Figure 4.4), $PATH_{NBTI}$ consists of $M_{PB}$ and $M_{PC}$. Both gate terminals have a signal
probability of 0.5. If the signals are independent, $P(B)$ is 0.5 · 0.5 = 0.25. If the
signals B and C are dependent (see Figure 4.5), it is not that easy to calculate $P(B)$
for transistor $M_{PA}$. In the first case (signals C and B), both signals are never logic “0”
at the same time; hence, both transistors will never be in inversion at the same time
and therefore $P(B)$ is 0. In the second case (signals C and B′), the signals are always
logic “0” at the same time and $P(B)$ is 0.5.
The larger the probability for Condition B, the larger is the transistor drift and the
increase of the gate delay. A worst-case assumption for Condition B is that all transistors
of a path tend to be in inversion at the same time. In this case, the minimum of the
probabilities $P_{on}$ for all transistors in $PATH_{NBTI,i}$ limits $P(B)_i$:

\[ P(B)_i = \min_{t \in PATH_{NBTI,i}} \left( P_{on,t} \right) \quad \text{, worst-case assumption if signals are dependent} \tag{4.14} \]

If there is more than one path, a worst-case assumption for Condition B is that just
one path is conducting at a time and the probabilities $P(B)_i$ can simply be added:

\[ P(B) = \min\left( \sum_i P(B)_i,\ 1 \right) \quad \text{, worst-case assumption if signals are dependent} \tag{4.15} \]

When the workload should be considered for the aging analysis, a probabilistic method
is used to compute the signal probabilities at the gate inputs. The probabilistic method
assumes independent input signals and the dependence of reconvergent signals is lost as
well. Hence, the worst-case assumption is used in order to have a conservative result.
Finally, $P_{stress,NBTI}$ is the probability that both conditions A and B are fulfilled:

\[ P_{stress,NBTI} = P(A \wedge B) = \begin{cases} P(A) \cdot P(B) & \text{, if signals are independent} \\ \min\left( P(A), P(B) \right) & \text{, if signals are dependent} \end{cases} \tag{4.16} \]

¹ This is 1 minus the probability that no path is conducting at all.


For $P_{stress,NBTI}$ it also has to be taken into account whether independent signals are
assumed or not.
For illustration, the probability $P_{stress,NBTI}$ for transistor $M_{PA}$ in Figure 4.4 is calculated.
For independent signals $P_{stress,NBTI} = (1 − 0.5) · (1 − 0.5) · (1 − 0.3) = 0.175$.
Otherwise, $P_{stress,NBTI}$ is the minimum of the three $P_{on}$ values of the transistors in the
stack, hence, $P_{stress,NBTI} = \min(0.5, 0.5, 0.7) = 0.5$.
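
The calculation of the NBTI stress probability (Equations 4.11 to 4.16) can be sketched in Python as follows. The function names and the list-based path representation are illustrative assumptions; in particular, an empty path (a terminal tied directly to the supply) is treated here as always conducting.

```python
from math import prod


def path_probability(p_on_path, independent):
    """P(B)_i for one path, given P_on of every transistor on the path."""
    if not p_on_path:
        return 1.0  # assumption: terminal connected directly to VDD
    return prod(p_on_path) if independent else min(p_on_path)


def condition_b(paths, independent):
    """P(B): at least one path conducts (4.13) or worst-case sum (4.15)."""
    per_path = [path_probability(p, independent) for p in paths]
    if independent:
        return 1.0 - prod(1.0 - p for p in per_path)
    return min(sum(per_path), 1.0)


def p_stress_nbti(p_on_m, paths, independent):
    """P_stress,NBTI = P(A and B) according to Equation 4.16."""
    p_b = condition_b(paths, independent)
    return p_on_m * p_b if independent else min(p_on_m, p_b)


# Worked example from the text: P_on = 0.7 for the considered transistor and
# one path through two transistors with P_on = 0.5 each
independent_case = p_stress_nbti(0.7, [[0.5, 0.5]], independent=True)
worst_case = p_stress_nbti(0.7, [[0.5, 0.5]], independent=False)
```

The two calls reproduce the values 0.175 and 0.5 of the worked example.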

Stress Probability for HCI

A transistor degrades due to HCI when carriers are accelerated and injected into the
gate oxide. The required electric field along the channel exists when the transistor
switches from its non-conducting (off) to its conducting (on) state. For an NMOS
(PMOS) transistor this implies a rising (falling) signal transition at the gate terminal.
Furthermore, the degradation depends on the charge that flows through a transistor.
Only if all other transistors along a path $PATH_{HCI}$ from supply voltage/ground to
the cell output are in inversion, the output load is recharged. Otherwise only internal
capacitances are recharged which are substantially smaller and neglected in the proposed
approach. For HCI, two stress conditions have to be fulfilled for a transistor M :

C: transition from off- to on-state at transistor M

D: conducting path from supply voltage/ground to output load

TD at the gate terminal of M is a measure for the number of transitions (Condition C).
Furthermore, all other transistors along the path $PATH_{HCI,i}$ must be in inversion to
form a conducting path (Condition D):

\[ P(D)_i = \begin{cases} \prod\limits_{t \in PATH_{HCI,i} \setminus \{M\}} P_{on,t} & \text{, if signals are independent} \\ \min\limits_{t \in PATH_{HCI,i} \setminus \{M\}} \left( P_{on,t} \right) & \text{, worst-case assumption if signals are dependent} \end{cases} \tag{4.17} \]

\[ P(D) = \begin{cases} 1 - \prod\limits_i \left( 1 - P(D)_i \right) & \text{, if signals are independent} \\ \min\left( \sum\limits_i P(D)_i,\ 1 \right) & \text{, worst-case assumption if signals are dependent} \end{cases} \tag{4.18} \]

The considered transistor M itself is excluded from the path $PATH_{HCI}$, because M does
not have to be in inversion. For $P(D)$ it has to be distinguished between independent
and dependent signals as well.
[Figure 4.6 schematic: PMOS transistors $M_{PC}$ (input C, SP = 0.5), $M_{PB}$ (input B, SP = 0.4), $M_{PA}$ (input A, SP = 0.3) and $M_{PZ}$; NMOS transistors $M_{NC}$, $M_{NB}$, $M_{NA}$ and $M_{NZ}$; internal net int; output Z]

Figure 4.6.: OR gate with three inputs and an internal signal int.

To obtain $P_{stress,HCI}$, the time $t_{stress,HCI}$ has to be computed first. The number of
transitions from off- to on-state during the whole lifetime is $TD/2 \cdot f_{CLK} \cdot t_{life}$, with $f_{CLK}$
being the clock frequency. This number multiplied by $P(D)$ is the number of effective
transitions. The number of effective transitions times the input slope $s_{IN}$ is $t_{stress,HCI}$.
Hence, $P_{stress,HCI}$ is:

\[ P_{stress,HCI} = \frac{t_{stress,HCI}}{t_{life}} = \frac{TD}{2} \cdot f_{CLK} \cdot P(D) \cdot s_{IN} \tag{4.19} \]
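
A minimal Python sketch of the HCI stress probability is given below. It assumes independent signals, a clock frequency in Hz and an input slope in seconds; the function names and the numerical values are hypothetical.

```python
from math import prod


def condition_d(paths_p_on):
    """P(D) for independent signals.

    paths_p_on holds, per path, the P_on values of all transistors on the
    path except the considered transistor M itself.
    """
    per_path = [prod(p_on) for p_on in paths_p_on]   # empty path gives 1.0
    return 1.0 - prod(1.0 - p for p in per_path)


def p_stress_hci(td, f_clk, p_d, s_in):
    """Fraction of the lifetime spent in effective off-to-on transitions."""
    return td / 2.0 * f_clk * p_d * s_in


# Hypothetical NMOS in a two-transistor stack: the other transistor has
# P_on = 0.5, TD = 0.2 at the gate terminal, 1 GHz clock, 50 ps input slope
p_d = condition_d([[0.5]])
p_hci = p_stress_hci(td=0.2, f_clk=1e9, p_d=p_d, s_in=50e-12)
```

With these made-up numbers the transistor is under HCI stress for about 0.25 % of the lifetime.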

Multi-stage gates
The aging-aware gate model, described so far, is capable of determining the aged gate
performances of single-stage gates. Single-stage gates have no internal nets that are
connected to gate terminals of transistors. Examples for single-stage gates are inverters,
NAND and NOR gates. But a simple buffer is already a multi-stage gate, because the
two inverters are connected via an internal net. For multi-stage gates the following
problems arise when the stress probabilities are calculated [Lorenz et al., 2010d]:
1. The values for SP and TD of internal signals (e.g., int in Figure 4.6) are unknown.
These values are necessary to calculate $P_{stress,NBTI}$ and $P_{stress,HCI}$ for
the transistors $M_{PZ}$ and $M_{NZ}$.
2. The transition time $s_{IN}$ of internal signals is unknown as well. $s_{IN}$ is required to
compute $P_{stress,HCI}$ with Equation 4.19.
To obtain the statistical signal properties, the probabilistic method from [Najm, 1991]
is used. Probabilistic methods can be used not only to propagate SP and TD from the
gate inputs to the gate output but also to propagate them to internal nets. To do so,
the logic function of the internal signal is determined when the gate is characterized.
The transition time $s_{IN}$ of internal signals needed in Equation 4.19 is obtained during
the characterization of the gate. Like the output slope $s_{OUT}$, it is characterized
dependent on the input slope at the gate input and the output load at the gate output.

Consideration of temporal variation of temperature and voltage


So far, an identical temperature $T_{eff}$ and supply voltage $V_{eff}$ for all gates and over the
entire lifetime are assumed. In Section 3.1.1 it is discussed how temperature and voltage
differences across the chip can be taken into account by having an individual $T_{eff}$ and
$V_{eff}$ value for every gate. In this section two methods are proposed to determine the
parameter drifts when temperature and voltage change during the lifetime, i.e., when there exists a
temperature and/or voltage profile. Such a profile has to be defined in the specifications
for a circuit. A temperature profile could for instance look as shown in Table 4.1.

Percentage of lifetime    Temperature
20 %                      125 °C to 150 °C
60 %                      60 °C to 85 °C
20 %                      −20 °C to 27 °C

Table 4.1.: An example for a temperature profile. The lifetime is 10 y and $V_{eff}$ is $V_{nom}$.
To ensure a conservative result, the upper bounds of the temperature intervals are
taken. This results in the following time-temperature-tuples (ti , Ti ):

(2 y, 150 °C), (6 y, 85 °C), (2 y, 27 °C) (4.20)

The first proposed method just works for temperature profiles. The basic idea is
to determine the effective temperature $T_{eff}$ that results in the same drift as the
temperature profile over the same time. In both degradation equations for NBTI and
HCI the temperature dependence is given by the Arrhenius equation:

\[ \Delta V_{th}, \Delta I_{on} \propto e^{-\frac{E_a}{k_b T}} \tag{4.21} \]

$E_a$ is the activation energy (e.g., 0.16 eV for NBTI) and $k_b$ is the Boltzmann constant.
The time dependence is modeled for both effects as follows:

\[ \Delta V_{th}, \Delta I_{on} \propto t^n \tag{4.22} \]

where n is a constant (e.g., 0.23 for NBTI). First, an arbitrary reference temperature
$T_{ref}$ is chosen and the times of the time-temperature-tuples are adjusted in a way that
the degradation stays the same, $(t_i, T_i) \to (t_{i,ref}, T_{ref})$:

\[ t_i^n \cdot e^{-\frac{E_a}{k_b T_i}} \overset{!}{=} t_{i,ref}^n \cdot e^{-\frac{E_a}{k_b T_{ref}}} \tag{4.23} \]

Solving Equation 4.23 for $t_{i,ref}$ results in:

\[ t_{i,ref} = t_i \cdot \left( e^{-\frac{E_a}{k_b} \left( \frac{1}{T_i} - \frac{1}{T_{ref}} \right)} \right)^{1/n} \tag{4.24} \]

When this is done for the example above and NBTI, the following times are calculated:

\[ t_{1,ref} = 2\,\text{y}, \quad t_{2,ref} = 0.19\,\text{y} \quad \text{and} \quad t_{3,ref} = 7\,\text{h} \tag{4.25} \]

$t_1$ equals $t_{1,ref}$, because $T_1$ was chosen as the reference temperature. It can be seen
that the first tuple with the high temperature dominates the degradation (the drift after
2 y at 27 °C equals the drift after 7 h at 150 °C). Because of the identical reference
temperature, the times can now be added:

\[ t_{tot} = \sum_{i=1}^{n} t_{i,ref} \tag{4.26} \]

The tuple $(t_{life}, T_{eff})$ is calculated from the tuple $(t_{tot}, T_{ref})$ by setting $t_{tot}$ to $t_{life}$ and
adjusting the temperature such that the drift stays the same:

\[ T_{eff} = \left( \frac{1}{T_{ref}} - \frac{k_b}{E_a} \cdot n \cdot \ln\left( \frac{t_{tot}}{t_{life}} \right) \right)^{-1} \tag{4.27} \]

In the example the effective temperature is 119 °C. This results in a threshold voltage
drift of 50 mV due to NBTI.
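
The first method can be checked numerically with the following Python sketch, which applies Equations 4.24, 4.26 and 4.27 to the profile of Equation 4.20 using the NBTI example values for the activation energy and the time exponent. The function names are illustrative, and choosing the highest profile temperature as the reference is just one possible choice.

```python
import math

K_B = 8.617333e-5  # Boltzmann constant in eV/K
E_A = 0.16         # activation energy in eV (NBTI example value from the text)
N = 0.23           # time exponent (NBTI example value from the text)


def kelvin(celsius):
    return celsius + 273.15


def equivalent_time(t, temp_c, ref_c):
    """Equation 4.24: time at ref_c causing the same drift as t at temp_c."""
    exponent = -E_A / K_B * (1.0 / kelvin(temp_c) - 1.0 / kelvin(ref_c))
    return t * math.exp(exponent / N)


def effective_temperature(profile, t_life):
    """Equations 4.26 and 4.27; profile is a list of (time, temperature in C)."""
    ref_c = max(temp_c for _, temp_c in profile)  # reference temperature
    t_tot = sum(equivalent_time(t, temp_c, ref_c) for t, temp_c in profile)
    inv_t_eff = 1.0 / kelvin(ref_c) - K_B / E_A * N * math.log(t_tot / t_life)
    return 1.0 / inv_t_eff - 273.15


profile = [(2.0, 150.0), (6.0, 85.0), (2.0, 27.0)]       # years, degrees Celsius
t2_ref = equivalent_time(6.0, 85.0, 150.0)               # in years
t3_ref_hours = equivalent_time(2.0, 27.0, 150.0) * 8760  # in hours
t_eff = effective_temperature(profile, t_life=10.0)      # in degrees Celsius
```

The sketch yields roughly 0.19 y and 7 h for the two adjusted times and an effective temperature of about 119 °C, in line with the values stated above.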
The second method works for temperature as well as voltage profiles. The drift for
every time interval is first calculated separately and then the drifts are combined. In the
example above the following threshold voltage drifts ∆Vth,i can be computed:

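\begin{align*}
(2\,\text{y}, 150\,^\circ\text{C}) &: \Delta V_{th,1} = 49\,\text{mV} \\
(6\,\text{y}, 85\,^\circ\text{C}) &: \Delta V_{th,2} = 28\,\text{mV} \\
(2\,\text{y}, 27\,^\circ\text{C}) &: \Delta V_{th,3} = 7\,\text{mV}
\end{align*}

The drifts cannot simply be added because the nonlinear time dependence has to be
taken into account:

\[ \Delta V_{th} = \left( \Delta V_{th,1}^{1/n} + \Delta V_{th,2}^{1/n} + \Delta V_{th,3}^{1/n} \right)^{n} \tag{4.28} \]

This method also results in a degradation due to NBTI of 50 mV.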
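
Equation 4.28 can be evaluated with a few lines of Python. The function name is an illustrative assumption; the exponent n = 0.23 is the NBTI example value used above.

```python
def combine_drifts(drifts, n=0.23):
    """Combine per-interval drifts under a t**n time dependence (Eq. 4.28)."""
    return sum(d ** (1.0 / n) for d in drifts) ** n


total_drift_mv = combine_drifts([49.0, 28.0, 7.0])
```

For the three interval drifts of the example the combined drift is close to 50 mV, and the first interval clearly dominates the result.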

4.4. Characterizing the standard cells


The completely automated characterization of the standard cells collects all the infor-
mation required for an aging analysis on gate level. For a traditional STA using a
LUT-based delay model without considering aging, the fresh delay and output slope for
every timing arc are required. Delay and slope are stored in two-dimensional LUTs de-
pendent on input slope and output load. To calculate aged gate performances additional
information is necessary for AgeGate:
• Sensitivities ∂q/∂∆p of the gate performances with respect to a parameter drift for
the canonical gate model (Equation 4.5).

• The conducting paths $PATH_{NBTI,i}$ and $PATH_{HCI,i}$ for all transistors of a gate
are required. This information enables the calculation of the probabilities $P(B)$
(Equation 4.13) and $P(D)$ (Equation 4.18).

• The logic function and signal slope for all internal signals of multi-stage gates.
These are required to calculate the stress probabilities.


The determination of sensitivities and paths is discussed in the next subsections. The
logic function is obtained by a structural recognition algorithm developed at the EDA
institute at TUM. The algorithm is based on a structural recognition algorithm for
analog circuits [Massier et al., 2008]. It analyzes the pull-up as well as the pull-down
network of the single gate stages and generates the logic function of all internal nodes
and the output node. The slope of internal signals is determined when the delay and
output slope of the gate is characterized. It is stored in two-dimensional LUTs dependent
on gate input slope and output load.

4.4.1. Obtaining the sensitivities


The sensitivities for the canonical gate model are obtained together with the fresh gate
performances $q_{fresh}$. The adjoint sensitivity analysis [Pillage et al., 1995, chap. 9],
integrated in the SPICE simulator, is used for this purpose. It is a very efficient approach,
much faster than using finite differences for the sensitivities. For NBTI the sensitivity
of q with respect to a drift of the threshold voltage $\Delta V_{th}$ is required:

\[ \chi^q_{\Delta V_{th},n} = \frac{\partial q}{\partial \Delta V_{th,n}} \tag{4.29} \]

For HCI the sensitivity of q with respect to $\Delta I_{on}$ is needed. Unfortunately, it cannot
be determined directly by means of the adjoint sensitivity analysis, because $\Delta I_{on}$
is, unlike $\Delta V_{th}$, not a transistor parameter. But there is an equivalent circuit for a
transistor degraded by HCI (see Figure 3.12(a)). The equivalent circuit can be used
to simulate an aged transistor on circuit level. It maps $\Delta I_{on}$ to a threshold voltage drift
$\Delta V_{th}$ and a mobility degradation $\Delta \mu_0$. $\Delta V_{th}$ is realized by a voltage source $V_{Deg}$, and a
current controlled current source $I_{Deg}$ realizes the mobility degradation. This equivalent
circuit can be used to calculate the sensitivity $\chi^q_{\Delta I_{on}}$ by means of the chain rule:

\[ \chi^q_{\Delta I_{on},n} = \frac{\partial q}{\partial \Delta I_{on,n}} = \frac{\partial q}{\partial V_{Deg,n}} \cdot \frac{\partial V_{Deg,n}}{\partial \Delta I_{on,n}} + \frac{\partial q}{\partial I_{Deg,n}} \cdot \frac{\partial I_{Deg,n}}{\partial \Delta I_{on,n}} \tag{4.30} \]

The partial derivatives $\partial q/\partial V_{Deg}$ and $\partial q/\partial I_{Deg}$ are obtained with the adjoint sensitivity
analysis after replacing all transistors by their equivalent circuit. The remaining partial
derivatives can be derived from the equations for $V_{Deg}$ and $I_{Deg}$.

4.4.2. Obtaining the internal gate structure


The internal gate structure determines $PATH_{NBTI}$ and $PATH_{HCI}$, which are necessary
for calculating $P(B)$ (Equation 4.13) and $P(D)$ (Equation 4.18).
$PATH_{NBTI}$ is a path from the source or drain terminal of a considered PMOS transistor
to the supply voltage. For the OR3 gate in Figure 4.6, $PATH_{NBTI}$ for transistor
$M_{PC}$ consists of the transistors $M_{PA}$ and $M_{PB}$. To determine $PATH_{NBTI}$, just the
pull-up network is considered. A breadth-first search, starting at the source terminal
of the considered transistor, is performed to find $V_{DD}$. It is important that the search
algorithm does not stop when $V_{DD}$ is reached, because there might be more than one
path in a gate. An example for multiple paths is the complex gate with the logic function
$z = \overline{a \cdot (b + c)}$ shown in Figure 4.7. There exist two paths for transistor $M_{PC}$. If
transistor $M_{PB}$ is in on-state, then the source of $M_{PC}$ is connected to $V_{DD}$. Hence,
$PATH_{NBTI,1}$ consists just of transistor $M_{PB}$. The second path $PATH_{NBTI,2}$ consists
of transistor $M_{PA}$, since the drain of $M_{PC}$ is connected to $V_{DD}$ if $M_{PA}$ is conducting.

[Figure 4.7 schematic: PMOS transistors $M_{PB}$, $M_{PA}$ and $M_{PC}$ (inputs B, A and C); NMOS transistors $M_{NA}$, $M_{NB}$ and $M_{NC}$; output Z]

Figure 4.7.: Complex gate implementing the logic function $z = \overline{a \cdot (b + c)}$.

$PATH_{HCI}$ is required for HCI. It leads from $V_{DD}$/ground along the considered transistor
to the output of the gate. $PATH_{HCI}$ is determined by performing two breadth-first
searches. For a PMOS transistor, again the pull-up network is taken into account and
for an NMOS transistor the pull-down network. The first search starts at the source
terminal of the considered transistor and looks for $V_{DD}$ or ground, respectively. The
second search starts at the drain terminal of the considered transistor and looks for
the gate output. It is again possible to have multiple paths. In the example with the
complex gate (Figure 4.7) two paths exist for the transistor $M_{PA}$. The first path consists
of the transistors $M_{PA}$ and $M_{PB}$ and the second path consists of the transistors $M_{PA}$
and $M_{PC}$.
To calculate $P(B)$ and $P(D)$ during aging analysis, all paths $PATH_{NBTI,i}$ and
$PATH_{HCI,i}$ for all transistors of a gate are already determined during the characterization
of the gate and stored in the gate model.
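
The search for all conducting paths can be sketched in Python as a breadth-first enumeration over the pull-up network. The netlist representation (a dictionary mapping each transistor to its source and drain node), the node names and the pull-up topology used in the example are assumptions made for illustration: $M_{PB}$ and $M_{PC}$ are taken to be in series between the supply and the output, with $M_{PA}$ in parallel to them, as the description of the two NBTI paths of $M_{PC}$ suggests.

```python
from collections import deque


def paths_to_node(network, start, target, exclude):
    """Breadth-first enumeration of all simple transistor paths start -> target.

    network maps a transistor name to its (source, drain) nodes. The search
    records a path when target is reached but keeps exploring the remaining
    branches, so multiple paths are found.
    """
    paths = []
    queue = deque([(start, (), frozenset([start]))])
    while queue:
        node, used, visited = queue.popleft()
        if node == target:
            paths.append(frozenset(used))
            continue
        for name, (n1, n2) in network.items():
            if name == exclude or name in used:
                continue
            if node == n1:
                nxt = n2
            elif node == n2:
                nxt = n1
            else:
                continue
            if nxt not in visited:
                queue.append((nxt, used + (name,), visited | {nxt}))
    return paths


def nbti_paths(network, transistor, supply="VDD"):
    """All paths from the source or drain of a transistor to the supply."""
    found = []
    for terminal in network[transistor]:
        for path in paths_to_node(network, terminal, supply, transistor):
            if path not in found:
                found.append(path)
    return found


# Assumed pull-up network of the complex gate: (source, drain) per transistor
PULL_UP = {"MPA": ("VDD", "Z"), "MPB": ("VDD", "n1"), "MPC": ("n1", "Z")}
paths_mpc = nbti_paths(PULL_UP, "MPC")
```

For `MPC` the sketch returns one path through `MPB` (found from the source terminal) and one path through `MPA` (found from the drain terminal). An empty path in the result means that the terminal is tied directly to the supply.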

4.4.3. Simplification of the gate model


This section is about reducing the gate model by removing LUTs that can be neglected
because they have (almost) no effect on the aged gate performance. The advantages of
such a simplified gate model are that it needs less storage space and that the aging analysis
is accelerated, because when a sensitivity is removed the corresponding parameter drift
does not have to be calculated either. There are four LUTs for every timing arc in a
traditional LUT-based gate model (one LUT each for delay and output slope, for rising and
falling input transition). A NAND gate with two inputs, for instance, has eight LUTs.
AgeGate has additional LUTs for the sensitivities $\chi^q_{m,p}$ and the signal slopes of
internal nets. For every nominal LUT one sensitivity per NMOS transistor ($\chi^q_{m,\Delta I_{on}}$) and
two sensitivities per PMOS transistor ($\chi^q_{m,\Delta V_{th}}$ and $\chi^q_{m,\Delta I_{on}}$) are required. Hence, the
AgeGate model for a two-input NAND gate has 56 LUTs and an OR gate with four
inputs even has 408 LUTs.
LUTs of sensitivities that have (almost) no impact on the degradation of the gate
performance can be removed. This impact is given by:

\[ \Delta q_{m,p} = \chi^q_{m,p} \cdot \Delta p_m \tag{4.31} \]

For every sensitivity it is checked whether $\Delta q_{m,p}$ is smaller than a specified limit. For
this purpose the drift $\Delta p_m$ must be specified as well. For instance, the threshold voltage
drift of a PMOS transistor has no noteworthy impact on the gate delay for a rising input
transition, because in this case the pull-down network has to recharge the output load.
If 0.1 % of the nominal gate performance is chosen as the limit and the drifts are
100 mV for ∆Vth and 20 % for ∆Ion (these drift values are much larger than what can
be observed in reality), the LUTs of the NAND gate are reduced from 56 to 39 and the
LUTs for the OR gate are reduced from 408 to 168.
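
The pruning step based on Equation 4.31 can be sketched in Python as follows. How the entries of a sensitivity LUT are aggregated is not specified here; the sketch assumes that the entry with the largest magnitude is compared against the limit and that a single nominal value represents the gate performance. All names and numbers are hypothetical.

```python
def prune_sensitivities(sens_luts, assumed_drift, q_nominal, rel_limit=1e-3):
    """Keep only sensitivity LUTs whose impact reaches rel_limit * q_nominal.

    sens_luts:     {(transistor, parameter): list of sensitivity values}
    assumed_drift: {parameter: drift used for the check}
    """
    kept = {}
    for (transistor, parameter), lut in sens_luts.items():
        # Impact of Equation 4.31 for the largest entry (assumption)
        impact = max(abs(chi) for chi in lut) * abs(assumed_drift[parameter])
        if impact >= rel_limit * abs(q_nominal):
            kept[(transistor, parameter)] = lut
    return kept


# Hypothetical gate: delay in ps, dVth in mV, dIon in %
luts = {
    ("MP1", "dVth"): [0.02, 0.03],
    ("MP1", "dIon"): [1e-5, 2e-5],
    ("MN1", "dIon"): [-0.4, -0.6],
}
kept = prune_sensitivities(luts, {"dVth": 100.0, "dIon": 20.0}, q_nominal=100.0)
```

In this made-up example the LUT for the $I_{on}$ sensitivity of `MP1` is dropped, because its impact stays below 0.1 % of the nominal delay even for the pessimistic drift of 20 %.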

4.5. Results
4.5.1. Waveform dependence of parameter drift
Transistor parameter drifts and aged signal slopes are mutually dependent. A small
experiment shows whether it is justified to calculate the parameter drifts in the
proposed approach from fresh output slopes or whether an iterative approach is beneficial.
For this purpose a NOR2 ring oscillator is simulated with RelXpert (65 nm LP, 1.7 V,
145 °C, 700 h). In a first run, the fresh waveforms are used to degrade the transistors. In
a second run, the aged waveforms after 700 h are used. The aged waveforms are obtained
by simulating the degraded ring oscillator from the first run. The true degradation should
lie between these two simulations, since in reality the waveform degrades continuously
within the 700 h, affecting the parameter drift, and the drift, in turn, affects the
signal waveform. The degradation of the oscillator frequency is 5.35 % for fresh slopes
and 5.43 % for aged slopes (see Figure 4.8). An iterative approach would give a value in
between. Hence, there is no significant advantage of an iterative approach.
This can be explained by the fact that NBTI is a static effect and the slope of the
waveform has no impact on the degradation caused by it. Only the degradation caused
by HCI depends on the time the signal is in transition. However, as shown later in
Section 4.5.3, NBTI is the dominant aging effect.

4.5.2. Comparison of AgeGate, circuit-level simulation and measurements


Before analyzing the ISCAS’85 test circuits, the accuracy of AgeGate is investigated. In
Figure 4.9 the degradation of a ring oscillator is determined by measurement, simula-
tion on circuit level and the proposed aging analysis approach. For the simulation on


Figure 4.8.: Ring oscillator waveforms of fresh (leading waveform in magenta) and aged
(shifted waveforms in red and blue) simulations. The transistor drifts for
the aged simulations were determined once by the fresh waveform and the
aged waveform. Independent of which waveform was taken to determine the
drifts, the aged waveforms are almost indistinguishable

transistor level, the transistors in the transistor level netlist are replaced by the equiva-
lent circuits (see Figure 3.12(a)), the same parameter drifts as for the aging analysis on
gate level are applied and a SPICE simulation is performed. The upper diagram shows
the degradation when the device under test did not oscillate during stress. During this
static stress the device is only affected by NBTI. In the lower diagram, the device was
oscillating during stress. This time both aging effects are relevant. Simulation and aging
analysis match quite well. Measurement results were only available for the upper case.
The results show a mismatch compared to the aging analysis and the simulation. It can
be assumed that a large part of the error is caused by inaccurate degradation equations.
The degradation determined with the proposed aging analysis is slightly smaller than the
degradation simulated on transistor level, as can be seen in Figure 4.9. This can be
explained by the linearization of the sensitivities.

4.5.3. Aging analysis results


For evaluation purposes, an industrial 90 nm standard cell library is characterized. The
following use profile was chosen for the aging analysis: a lifetime of 10 y, a temperature
T_eff of 125 °C, and a supply voltage V_DD of 1.32 V.
Figure 4.10 shows how the arrival times at the primary outputs of the benchmark
circuit c880 increase over lifetime. SP and TD values are determined by the probabilistic


[Figure 4.9, two panels: "no oscillation during stress" (upper) and "oscillation during
stress" (lower). Vertical axis: frequency degradation [%]; horizontal axis: stress duration
(5 h, 144 h, 500 h). Series: measurement, simulation, aging analysis.]

Figure 4.9.: Frequency degradation of a 65 nm inverter ring oscillator stressed for 500 h
at defined stress conditions.

method for SP = 0.2 and TD = 0.2 at the primary inputs. The figure indicates that
it is not enough just to consider the most critical nominal path during aging analysis
because the order of the arrival times can change over lifetime (signals 866 and 874).
It is difficult to compare AgeGate to the different state-of-the-art aging-aware gate
models, because the published results are based on different technologies. Hence, espe-
cially the degradation equations are different. Instead, it is shown how the accuracy of
the aging analysis is increased by the special features of the proposed aging-aware gate
model. The special features are: consideration of NBTI and HCI, computation of aged
output slopes and calculation of individual parameter drifts.
Table 4.2 lists the path delay degradation ∆delay of the critical path for
a worst-case analysis of the ISCAS’85 benchmark circuits. The nominal path delays
without aging (NOM) are given as a reference. The degradation due to both effects
(BOTH) as well as due to just one effect (NBTI, HCI) is analyzed. When both effects
are considered, the degradation of the critical path delay is between 12.0 % and 15.4 %.
The dominant aging effect for this technology and the chosen use profile is NBTI, with
a performance degradation of up to 12.3 %. In the last column (NO_SLP) values for
∆delay are given if no aged output slope is computed. By comparing ∆delay with and
without considering the aged output slope, it can be seen that not considering aged
output slopes results in an underestimation of the degradation by 24 % on average.
For the column BOTH, the run time on an Opteron workstation with 2.4 GHz
and 2 GB RAM is also given in parentheses. It can be seen that the proposed model can be
evaluated quickly.
For the diagram in Figure 4.11 an aging analysis with individual transistor drifts is


Figure 4.10.: The five slowest output arrival times over lifetime for ISCAS’85 circuit
c880. Individual workloads for the gates were obtained for SP = 0.2 and
TD = 0.2 at primary inputs. Signals 866 and 874 change order with time.

Circuit   NOM [ns]   HCI [%]   NBTI [%]   BOTH [%] (run time [s])   NO_SLP [%]
c17         0.18       4.7        8.4       13.0 (2.57)                 9.8
c432        2.30       4.0       10.9       15.4 (4.19)                11.1
c499        1.51       3.8       11.3       15.2 (6.52)                11.4
c880        1.88       3.4        8.5       12.0 (6.62)                10.0
c1355       1.81       4.1        9.8       13.4 (8.61)                10.2
c1908       2.50       3.2       10.2       13.8 (9.96)                10.0
c2670       2.87       2.6       10.1       12.8 (17.86)               10.2
c3540       3.45       2.7        9.9       13.0 (19.61)               10.2
c5315       3.12       2.6       10.0       12.8 (26.97)               10.4
c6288       8.88       1.7       12.3       14.2 (29.35)                9.0
c7552       2.61       3.0        9.7       12.9 (34.07)                9.9
Ø           2.83       3.2       10.1       13.5 (15.12)               10.2

Table 4.2.: Degradation of critical path delays for different analyzer settings.
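
The stated underestimation of about 24 % can be reproduced from the columns BOTH and NO_SLP of
Table 4.2. The following Python sketch copies these two columns from the table; taking the
mean of the per-circuit relative differences is an assumption about how the average was formed
(the ratio of the column averages gives nearly the same value).

```python
# (BOTH [%], NO_SLP [%]) per circuit, copied from Table 4.2
table = {
    "c17": (13.0, 9.8), "c432": (15.4, 11.1), "c499": (15.2, 11.4),
    "c880": (12.0, 10.0), "c1355": (13.4, 10.2), "c1908": (13.8, 10.0),
    "c2670": (12.8, 10.2), "c3540": (13.0, 10.2), "c5315": (12.8, 10.4),
    "c6288": (14.2, 9.0), "c7552": (12.9, 9.9),
}

# Relative underestimation per circuit when the aged output slope is ignored.
per_circuit = [(both - no_slp) / both for both, no_slp in table.values()]
mean_underestimation = 100.0 * sum(per_circuit) / len(per_circuit)

# Column averages, for comparison with the last row of the table.
mean_both = sum(both for both, _ in table.values()) / len(table)
mean_no_slp = sum(no_slp for _, no_slp in table.values()) / len(table)

print(f"mean underestimation: {mean_underestimation:.1f} %")
print(f"column averages: {mean_both:.1f} % and {mean_no_slp:.1f} %")
```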


Figure 4.11.: Comparison of analysis with and without individual transistor drifts.

compared to an aging analysis where it is assumed that all transistors of a gate degrade
as much as the worst amongst them. The diagram shows the benefit of calculating
individual transistor drifts. Without individual transistor drifts the mean degradation
is overestimated by 20 %.

4.6. Summary
An aging analysis flow on gate level capable of determining the impact of the two dom-
inant drift-related aging effects on circuit timing was introduced. The developed aging-
aware gate model, AgeGate, consists of a canonical gate model, technology specific
degradation equations, and information about the internal gate structure. What dis-
tinguishes AgeGate from existing aging-aware gate models is that it considers the aged
output slope, it takes NBTI and HCI into account, and it calculates individual transistor
drifts. The results show that both aging effects are relevant, that not calculating an aged
output slope underestimates the performance degradation by 24 %, and that not computing
individual transistor drifts overestimates the degradation by 20 %.

5. Identifying possible critical paths in aged
circuits
When the operating conditions over lifetime and the individual workloads of the gates
are known, the degraded circuit delay and the critical path causing this delay can be
determined by the aging-aware timing analysis described in Chapter 4. If the operating
conditions and workload are not (exactly) known, just a worst case analysis can be
performed (see Section 4.1). Due to this uncertainty, multiple possible critical paths
(PCPs) may exist. This chapter is about identifying these PCPs.
A PCP is a path that is the critical path of a circuit for a specific combination of
operating conditions and workload. However, it would be too complex and inefficient to
identify PCPs with this definition. Hence, a weakened definition is used instead:
A possible critical path (PCP) is a path that cannot be excluded from the paths that
become the critical path of a degraded circuit for a specific combination of temperature
T_eff, supply voltage V_eff, workload of the input signals and lifetime t_life.
This definition reflects how PCPs are determined: the paths that are certainly not PCPs
are identified, and the remaining paths are considered PCPs.
Several applications arise from knowing the PCPs:
• The TG of a combinational circuit can be reduced until it just contains PCPs.
This reduced TG can be used as a timing model for modules, such as adders or
multipliers. Since such a timing model is generated once and can be used whenever
the module is instantiated in a more complex hierarchical design, it accelerates the
aging-aware timing verification of complex digital circuits compared to an analysis
on gate level.
• PCPs can also be utilized to monitor a system during its lifetime. The delay of
the PCPs is determined in periodic intervals and countermeasures are taken if the
path delay is no longer within the safe operating range. Such an adaptive system
can react, for instance, by reducing the clock frequency of the aged circuit.
• PCPs are also beneficial for optimizing a circuit to minimize the circuit perfor-
mance loss due to aging. Existing optimization approaches [Wu and Marculescu,
2009; Wang et al., 2009a,b; Bild et al., 2009] depend on knowing in advance the
gates that degrade the most. Hence, the operating conditions and the workload
of a circuit must be known. If the operating conditions and the workload are
unknown, the PCPs can nevertheless help to optimize a circuit. They indicate
which gates might become critical. Combining this information with the
information about which gates have a large impact on circuit performance
identifies the gates that should be protected from excessive degradation.


• Finally, PCPs are also required by previously published approaches. Chen et al. [2011]
propose a path-based aging-aware timing analysis. Wang et al. [2008] introduce node
criticality computation. By protecting the identified critical nodes the delay degra-
dation can be reduced. Both approaches need PCPs. However, they just take the
upper X % (e.g., 10 %) of the paths with the longest aged delay. This is quite
inaccurate and the number of PCPs could either be over- or underestimated.

The remainder of this chapter is organized as follows: The next section describes
prerequisites for the proposed approach. Then the method to identify the PCPs is
introduced (see Section 5.2). This method is extended in Section 5.3 by considering that
some paths must degrade even if the workload is unknown. In Section 5.4 it is described
how process variations and variations of the operating conditions can be considered as
well. Section 5.5 introduces two applications of PCPs, the generation of aging-aware
timing models and the benefit of PCPs for testing aged circuits. Results follow in
Section 5.6 and the chapter is summarized in Section 5.7.

5.1. Prerequisites
Without exact operating conditions and workloads, the degraded gate delay cannot be
exactly determined. However, it is possible to determine an interval for the gate delay.
The lower bound is the fresh gate delay (d_fresh), since aging always increases the gate
delay. The upper bound is the maximal aged gate delay (d_aged). To determine the upper
bound, a validity region must be defined by specifying maximal values for the effective
temperature T_eff, the effective supply voltage V_eff and the lifetime itself.
The foundation to identify the PCPs is a timing graph as described in Section 2.1.2.
However, this time the edge weights, which are given by the gate delays, are not
deterministic quantities but intervals. For this reason, all other timing quantities (q)
(e.g., AT, D2S, SLACK, ...) are intervals as well. The intervals are stored as tuples. The
first element of the tuple is the fresh value q_fresh and the second element is the aged
value q_aged. An example for a TG with annotated nodes and edges is given in Figure 5.1.
The timing quantities that are stored at the nodes and edges can change during the
computation of the PCPs because elements of the timing graph that do not belong to
a PCP are removed. An incremental timing analysis, as described in Section 2.1.3, is
used to update the changed timing quantities whenever they are read. Every tuple has
a valid flag. When the timing quantity is read, it is first checked whether it is valid. If
not, the timing quantity is updated, stored and the valid flag is set again.
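
The valid-flag mechanism can be illustrated with a minimal Python sketch. The class
CachedQuantity and the small two-edge example are hypothetical; in particular, the sketch
invalidates a single quantity by hand, whereas the incremental timing analysis of
Section 2.1.3 also has to invalidate all dependent quantities.

```python
class CachedQuantity:
    """Interval (fresh, aged) with a valid flag, recomputed lazily on read."""

    def __init__(self, compute):
        self._compute = compute    # callable returning a (fresh, aged) tuple
        self._value = None
        self.valid = False

    def invalidate(self):
        self.valid = False

    def read(self):
        if not self.valid:         # stale: update, store, set the flag again
            self._value = self._compute()
            self.valid = True
        return self._value


# Incoming edges of a node v: predecessor -> (arrival time, edge delay),
# each given as a (fresh, aged) tuple with illustrative values.
incoming = {"a": ((1.0, 2.0), (2.0, 3.0)), "b": ((0.5, 1.0), (1.0, 5.0))}


def arrival_time_v():
    fresh = max(at[0] + d[0] for at, d in incoming.values())
    aged = max(at[1] + d[1] for at, d in incoming.values())
    return (fresh, aged)


at_v = CachedQuantity(arrival_time_v)
first = at_v.read()      # computed on the first read
del incoming["b"]        # a reduction step removes edge (b, v)
at_v.invalidate()        # the stored tuple is no longer valid
second = at_v.read()     # recomputed only now, when it is read again
print(first, second)
```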

5.2. Identification of PCPs


The timing graph is annotated with tuples for all required timing quantities. These tuples
represent the intervals the timing quantities are in for the specified validity region. The
reduction steps introduced in this section determine the PCPs. The following interval


Figure 5.1.: TG annotated with arrival time and delay to sink at every node.

operations are required for that purpose:

\mathrm{sum}(a, b) = \mathrm{sum}([a_{fresh}, a_{aged}], [b_{fresh}, b_{aged}]) := [a_{fresh} + b_{fresh},\; a_{aged} + b_{aged}] \qquad (5.1)
\max(a, b) := [\max(a_{fresh}, b_{fresh}),\; \max(a_{aged}, b_{aged})] \qquad (5.2)
a < b := a_{aged} < b_{fresh} \qquad (5.3)

Operations for subtraction, min and greater than (>) can be defined correspondingly.
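
A minimal Python sketch of the three interval operations is given below. The names Interval,
isum, imax and iless are chosen for this illustration only. Note that the comparison is not
a total order: for two overlapping intervals neither a < b nor b < a holds.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    fresh: float   # lower bound: value of the fresh circuit
    aged: float    # upper bound: maximal aged value


def isum(a, b):
    """Equation 5.1: add the fresh bounds and the aged bounds separately."""
    return Interval(a.fresh + b.fresh, a.aged + b.aged)


def imax(a, b):
    """Equation 5.2: element-wise maximum."""
    return Interval(max(a.fresh, b.fresh), max(a.aged, b.aged))


def iless(a, b):
    """Equation 5.3: a < b only if even the aged a is below the fresh b."""
    return a.aged < b.fresh


a, b = Interval(2.0, 3.0), Interval(4.0, 6.0)
c, d = Interval(4.0, 8.0), Interval(7.0, 13.0)    # overlapping intervals
print(isum(a, b), imax(a, b), iless(a, b), iless(c, d), iless(d, c))
```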
Even though the goal is to determine the PCPs, it is crucial that the reduction steps
do not depend on enumerating every single path in the timing graph and deciding whether
it is a PCP or not. This would make it impossible to determine the PCPs for circuits of
industrial relevance, since the number of paths increases exponentially with the number
of nodes. Two criteria are used to determine whether a path is a PCP or not:

Criterion 1: A path must have a maximal aged path delay D_aged greater than the critical
path delay of the fresh circuit D(P_crit)_fresh (or just D_crit,fresh). Otherwise it is
not a PCP because P_crit will always have a greater path delay.

Criterion 2: Even if a path A has an aged path delay greater than D_crit,fresh, it might
not be a PCP. If there is another path B that has a greater path delay than A for
all possible operating conditions and workloads, path A is not a PCP.

5.2.1. Slack reduction step


The slack reduction step checks the first criterion. A positive slack (see Equation 2.6)
at a node n indicates that the signal arrives soon enough at n to arrive at T before the
specified required time REQT(T). By setting the required time at T to D_crit,fresh, a
node n with a positive aged slack (SLACK(n)_aged) indicates that all aged paths through
n arrive at T before D_crit,fresh. Hence, no path through this node n is a PCP and the
node can be removed from the timing graph. This is checked for all nodes of the timing
graph.


Algorithm 3: Slack reduction step


/* Remove nodes with positive aged slack */
foreach node in TG do
    if SLACK(node)_aged > 0 then
        clean_remove_node(node)
    end
end

The slack reduction step (see Algorithm 3) has a time complexity of O(n), with n
being the number of nodes in the timing graph. The nodes are not simply removed from
the timing graph; instead, the function clean_remove_node is called. This
function checks whether additional nodes and edges can be removed from the timing graph
and ensures that the remaining graph is a valid TG (see Section 5.2.6).
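
The slack reduction step can be tried out on a tiny timing graph. The Python sketch below is
an illustration, not the implementation used in this work: the graph and its delays are
invented, nodes are removed directly instead of through clean_remove_node, and the aged slack
is assumed to be the required time at T minus the longest aged path delay through the node
(Equation 2.6 is not repeated here).

```python
# Timing graph: (u, v) -> (fresh delay, aged delay); all values are illustrative.
edges = {
    ("S", "a"): (2.0, 3.0), ("a", "T"): (8.0, 9.0),
    ("S", "b"): (1.0, 2.0), ("b", "T"): (5.0, 7.0),
}
order = ["S", "a", "b", "T"]   # a topological order of the nodes


def arrival_times(k):
    """Longest delay from S to every node for delay component k (0 fresh, 1 aged)."""
    at = {n: 0.0 for n in order}
    for v in order:
        for (u, w), d in edges.items():
            if w == v:
                at[v] = max(at[v], at[u] + d[k])
    return at


def delays_to_sink(k):
    """Longest delay from every node to T for delay component k."""
    d2s = {n: 0.0 for n in order}
    for u in reversed(order):
        for (x, v), d in edges.items():
            if x == u:
                d2s[u] = max(d2s[u], d[k] + d2s[v])
    return d2s


d_crit_fresh = arrival_times(0)["T"]      # required time at T
at_aged, d2s_aged = arrival_times(1), delays_to_sink(1)
# Assumed slack definition: required time minus the longest aged path through n.
slack_aged = {n: d_crit_fresh - (at_aged[n] + d2s_aged[n]) for n in order}
removed = [n for n in order if slack_aged[n] > 0]
remaining = sorted(e for e in edges if e[0] not in removed and e[1] not in removed)
print(removed, remaining)
```

Node b is removed because even the aged path through b (delay 9) arrives before the fresh
critical path delay of 10.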

5.2.2. Path delay reduction step


The second reduction step also checks whether the aged delay of a path is less than
D_crit,fresh (Criterion 1). This time it is checked whether an edge, rather than a node,
can be removed. The largest path delay from S to T along a given edge (u, v) can be
calculated as follows:
D = \mathrm{AT}(u) + d((u, v)) + \mathrm{D2S}(v) \qquad (5.4)
AT(u) gives the maximal delay of all paths to the node u, d((u, v)) is the edge delay
and D2S(v) is the maximal delay of all paths from v to T. If the path delay interval D
is less than D_crit,fresh, this edge can be removed, because no aged path through this
edge is slower than D_crit,fresh. This is checked for all edges of the timing graph.
An example is given in Figure 5.2. With Equation 5.4, the path delay interval of path P
evaluates to [6, 9]. The delay of the critical fresh path is [10, 12]. Hence, P is
not a PCP and the edge (b, d) can be removed.
This reduction step has a time complexity of O(e) with e being the number of edges
in the timing graph. The pseudo code is given in Algorithm 4.
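
The numbers of the example can be checked with a short Python sketch. Only the resulting
intervals [6, 9] and [10, 12] are taken from the example; how the interval [6, 9] splits into
AT(b), d((b, d)) and D2S(d) is not stated, so the three values below are hypothetical.

```python
def isum(*intervals):
    """Add (fresh, aged) intervals element-wise (Equation 5.1)."""
    return (sum(i[0] for i in intervals), sum(i[1] for i in intervals))


at_b = (2.0, 3.0)      # AT(b), hypothetical split
d_bd = (1.0, 2.0)      # d((b, d)), hypothetical split
d2s_d = (3.0, 4.0)     # D2S(d), hypothetical split
d_crit = (10.0, 12.0)  # delay interval of the critical fresh path

path_delay = isum(at_b, d_bd, d2s_d)          # Equation 5.4
remove_edge = path_delay[1] < d_crit[0]       # aged bound below D_crit,fresh
print(path_delay, remove_edge)
```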

5.2.3. Arrival time reduction step


This reduction step checks for conditions described in Criterion 2. The arrival time
at a node v is determined by computing the arrival times along all incoming edges
and calculating the maximum of them. If the arrival time interval along an edge (u, v)
(AT(u)+d((u, v))) is smaller than the arrival time after the max-operation (AT(v)), then
signals along (u, v) never determine the arrival time at v and the edge can be removed.
This is done for all edges in the graph.
This can also be explained by longest path intervals (see Figure 5.3). This interpre-
tation is beneficial for the common edge reduction step discussed in Section 5.2.5. A
longest path interval determines every fresh or aged arrival time (or delay to sink). Two
longest path segments are given. Path segment V (dashed line) is the path to v that


Algorithm 4: Path delay reduction step


/* Remove edges whose maximal path delay along the edge is less than the required time */
foreach node in TG do
    foreach suc in successors(node) do
        /* checked edge is (node, suc) */
        maxPathDelayOverEdge ← AT(node) + d((node, suc)) + D2S(suc) ;
        if maxPathDelayOverEdge < REQT(T) then
            clean_remove_edge(node, suc) ;
        end
    end
end

Figure 5.2.: Illustration of path delay reduction step. Edge (b, d) can be removed because
the delay of path P is less than the delay of path P_other.


Figure 5.3.: Illustration of arrival time reduction step. Edge (d, e) can be removed be-
cause arrival time interval along edge (d, e) is less than the arrival time at e
after the max-operation.

determines the fresh arrival time (AT(v)_fresh) and path segment U (solid line) is the
path to v that determines the maximal aged arrival time at v along the edge (u, v).
These path segments can easily be obtained because for each arrival time it is stored
from which edge it results. If the path delay interval of segment U is less than the
interval of segment V, then the edge (u, v) can be removed.
The arrival time reduction step (see Algorithm 5) has a time complexity of O(e), with
e being the number of edges in the timing graph.

Algorithm 5: Arrival time reduction step


/* Remove edges that do not contribute to atime at a node */
foreach node in TG do
    foreach pre in predecessors(node) do
        atimeOverPre ← AT(pre) + d((pre, node)) ;
        if AT(node) > atimeOverPre then
            clean_remove_edge(pre, node) ;
        end
    end
end
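
As an illustration, the arrival time reduction step can be run on a small graph in Python.
The graph and its delays are invented and are not the values of Figure 5.3; removable edges
are only collected here, whereas Algorithm 5 removes them through clean_remove_edge and relies
on the incremental timing analysis to update the affected quantities.

```python
from functools import reduce


def isum(a, b):
    return (a[0] + b[0], a[1] + b[1])


def imax(a, b):
    return (max(a[0], b[0]), max(a[1], b[1]))


def iless(a, b):
    return a[1] < b[0]      # Equation 5.3: aged bound of a below fresh bound of b


# (u, v) -> (fresh delay, aged delay); illustrative values
edges = {
    ("S", "c"): (5.0, 6.0), ("S", "d"): (1.0, 2.0),
    ("c", "e"): (3.0, 4.0), ("d", "e"): (1.0, 2.0),
}
order = ["S", "c", "d", "e"]    # topological order

at = {"S": (0.0, 0.0)}
for v in order[1:]:
    over_edges = [isum(at[u], d) for (u, w), d in edges.items() if w == v]
    at[v] = reduce(imax, over_edges)     # max-operation over all incoming edges

# An edge is removable if the arrival time along it is less than AT at its head.
removable = [(u, v) for (u, v), d in edges.items() if iless(isum(at[u], d), at[v])]
print(at["e"], removable)
```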

5.2.4. Delay to sink reduction step

This reduction step is almost equivalent to the arrival time reduction step. This time
not the delay from S to a node is considered but the delay from the node to T (D2S).
D2S is determined by computing the delay to T for all outgoing edges of a node u and
computing the maximum of them. If the delay to T along an edge (u, v) is less than the
delay to T at u, the edge can be removed (see Algorithm 6).


Algorithm 6: Delay to sink reduction step


/* Remove edges that do not contribute to delay to sink at a node */
foreach node in TG do
    foreach suc in successors(node) do
        d2sinkOverSuc ← d((node, suc)) + D2S(suc) ;
        if D2S(node) > d2sinkOverSuc then
            clean_remove_edge(node, suc) ;
        end
    end
end

Figure 5.4.: Example for the common edge reduction step.

5.2.5. Common edge reduction step


In Criterion 2 two paths are compared to each other. For paths that share common
edges, this comparison of two path intervals is too pessimistic, as the following example
illustrates (see Figure 5.4): Path V (dashed line) has a delay of [7, 13] and the delay of
path U (solid line) is [4, 8]. V is a PCP, and U is also considered a PCP because the
interval of U is not less than the interval of V (the upper bound of U is not less than
the lower bound of V). Paths V and U have a common edge (a, b). For the calculation of the
lower bound of path V all fresh gate delays along the path are added, and for the
calculation of the upper bound of path U all aged gate delays are added. This means that
for the common edge (a, b) the lower bound of the gate delay is assumed in one case and
the upper bound in the other. This is impossible. Although the actual gate delay is
unknown during the PCP identification, a timing arc must have the same delay independent
of the path that is investigated. By assuming an identical delay for common edges (the
aged delay of common edges is set to the fresh edge delay, resulting in a fixed edge delay
and not an interval), the new path delays are D(V) = [7, 10] and D(U) = [4, 5]. The path
delay interval of U is now less than the interval of V; hence, U is not a PCP.
This example shows that whenever two paths that share common edges are compared,
an identical edge delay has to be assumed for the common edges in order not to be overly
pessimistic.
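
The example can be reproduced with a few lines of Python. The path delays [7, 13] and [4, 8]
and the fresh delay of 2 for edge (a, b) are taken from the example; the aged delay of 5 for
(a, b) follows from the stated results. The remaining edges of each path are lumped into one
entry, and this split is hypothetical.

```python
def path_interval(delays, path, common=frozenset()):
    """(fresh, aged) path delay; edges in `common` use the fresh delay for both bounds."""
    fresh = sum(delays[e][0] for e in path)
    aged = sum(delays[e][0] if e in common else delays[e][1] for e in path)
    return (fresh, aged)


def iless(a, b):
    return a[1] < b[0]       # Equation 5.3


delays = {
    "ab": (2.0, 5.0),        # common edge (a, b)
    "rest_V": (5.0, 8.0),    # remaining edges of V, lumped (hypothetical split)
    "rest_U": (2.0, 3.0),    # remaining edges of U, lumped (hypothetical split)
}
path_v, path_u = ["ab", "rest_V"], ["ab", "rest_U"]
common = set(path_v) & set(path_u)

# Plain interval comparison: U cannot be excluded.
pessimistic = iless(path_interval(delays, path_u), path_interval(delays, path_v))
# Identical (fresh) delay on the common edge: U is excluded.
d_u = path_interval(delays, path_u, common)
d_v = path_interval(delays, path_v, common)
refined = iless(d_u, d_v)
print(pessimistic, d_u, d_v, refined)
```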


How can common edges be taken into account? An exact method is the following: First,
all paths that have an aged path delay greater than D_crit,fresh are enumerated. Then,
the delays of two paths are compared. For common edges, the fresh edge delay is assumed
for the aged edge delay as well. If one path interval is less than the other interval, this
path is not a PCP and can be removed. This exact method has an exponential time
complexity and cannot be used for complex circuits.
Baba and Mitra [2009] propose a more efficient method to consider common edges.
This method extends the arrival time reduction step. Hence, it is block-based not path-
based like the exact method. The arrival time reduction step removes an edge (u, v) if
the arrival time at v along this edge is less than the resulting arrival time at v after
the max-operation. As shown in the arrival time reduction step, this can be interpreted
as comparing two path segments. The longest path segment V consists of all edges
that determine the fresh arrival time at v after the max-operation and the longest path
segment U determines the aged arrival time at v along edge (u, v). If V and U have
common edges, then the same edge delay for common edges must be assumed.
However, just adding up the updated edge delays along the path segments is not
enough. By changing the common gate delays, U and V themselves could have changed. In
the example (see Figure 5.4), by setting the aged delay of (a, b) to 2, the aged arrival
time at b is now determined by the second incoming edge (f, b). In [Baba and Mitra, 2009]
this is solved by setting the aged delay of common edges to the fresh value and running
the STA again to determine the changed arrival times. Hence, whenever common edges
are detected, the STA is performed again with changed edge delays to decide whether an
edge can be removed or not.¹ This takes a lot of time, as can be seen from the results of
[Baba and Mitra, 2009].
In the proposed approach it is not necessary to run the STA again. This is possible
because the join-slacks (see Section 2.1.5) indicate how far the gate delay can be de-
creased before the arrival time at a node is determined by another edge. In the example,
the aged join-slack between edge (a, b) and edge (f, b) indicates that when the aged gate
delay is reduced by more than 2 time units, the arrival time is determined by edge (f, b).
For the path segment U two path delay intervals are calculated: the path segment delay
D(U) when common edges are not considered, and the path segment delay when the fresh
gate delay is used for common edges, denoted D_c(U) in the following. To decide whether an
edge (u, v) can be removed, the following cases have to be distinguished (see Figure 5.6):
1. If D(U) < D(V), then remove edge (u, v):
Even without considering common edges, the edge (u, v) can be removed.
2. If D_c(U) not < D(V), then do not remove edge (u, v):
The edge cannot be removed, because even if the fresh delay is used for common
edges the path delay is still too large.
3. Else (D_c(U) < D(V) and D(U) not < D(V)), it depends on U′:
If D(V)_fresh is between D_c(U)_aged and D(U)_aged, it depends on the path segment U′
with the next smaller delay than U whether (u, v) can be removed or not.
a) If D(U′) < D(V), then remove edge (u, v):
This is like case 1. It cannot be decided from U whether (u, v) can be removed,
but it can be decided from U′.
b) If D_c(U′) not < D(V), then do not remove edge (u, v):
This is like case 2. It cannot be decided from U whether (u, v) can be removed,
but it can be decided from U′.
c) If D_c(U′) < D(V) and D(U′) not < D(V), it depends on the next U′:
This is like case 3. Look at the U′ with the next smaller delay.
d) If there is no U′, then remove edge (u, v):
If there is no path segment U′ with a smaller delay than U, then the delay
D_c(U) can be assumed and the edge is removed.

¹ After that, the common edge delays have to be reset and the STA has to be run once more
to restore the original state of the timing graph.

The pseudo code for this reduction step is given in Algorithm 7. The only difference
compared to Algorithm 5 for the arrival time reduction step is the function
edge_can_be_removed, which checks the different cases given above.

Algorithm 7: Common edge reduction step


/* Remove edges that do not contribute to atime at a node (common edges considered) */
foreach node in TG do
    foreach pre in predecessors(node) do
        atimeOverPre ← AT(pre) + d((pre, node)) ;
        if edge_can_be_removed(pre, node) then
            clean_remove_edge(pre, node) ;
        end
    end
end

The delay to sink reduction step can be extended in a similar way to take common
edges into account. In this case the branch-slacks are required to iterate over the path
segments from a node to T .
With the exact method more edges can be removed, because all paths that share
common edges are compared to each other. The example in Figure 5.5 illustrates the
difference: Let’s assume path A (solid line) and B (dashed line) are PCPs, but path
C (dotted line) is not a PCP because the path delay interval of C is smaller than the
interval of A when common edges are considered. However, the interval of C is not smaller
than the interval of B. When just the longest path segments at x are compared to each
other, A and C are never compared and path C is not removed from the PCPs.
Whenever case 3.c is detected, the next longest path segment U′ must be determined.
In order not to have a worst-case time complexity dependent on the number of paths

Figure 5.5.: Example that shows difference between proposed and exact method for com-
mon edges.

in the timing graph, a maximal number N of paths U′ that should be determined is
specified. If the considered edge cannot be removed after N path segments are checked,
then it is assumed that the edge cannot be removed. This way the worst-case time
complexity just depends on the number of edges e in the timing graph, hence, O(e).
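
The case distinction, including the limit N, can be sketched in Python as follows. This is an
illustration under assumptions: the function receives the candidate path segments U, U′, ...
already ordered by decreasing delay, each with its interval without common edge handling and
its interval with the fresh delay on common edges. How these segments are obtained from the
join-slacks is not shown, and all numeric values are invented.

```python
def less(a, b):
    return a[1] < b[0]      # Equation 5.3


def edge_can_be_removed(segments, d_v, max_segments):
    """Decide whether edge (u, v) can be removed.

    segments: [(plain, fresh_common), ...] for U, U', U'', ... in order of
    decreasing delay (at least U itself); each entry holds the interval without
    and with the fresh delay on common edges. d_v is the interval of segment V.
    """
    for plain, fresh_common in segments[:max_segments]:
        if less(plain, d_v):                 # cases 1 and 3.a
            return True
        if not less(fresh_common, d_v):      # cases 2 and 3.b
            return False
        # cases 3 and 3.c: undecided, look at the next segment
    # Still undecided: remove only if every segment was examined (case 3.d);
    # otherwise the limit N was reached and the edge is kept.
    return len(segments) <= max_segments


d_v = (7.0, 13.0)
u = ((4.0, 8.0), (4.0, 5.0))           # undecided on its own (case 3)
u_next = ((3.0, 7.5), (3.0, 6.0))      # hypothetical U', also undecided

case_3d = edge_can_be_removed([u], d_v, 5)
case_3a = edge_can_be_removed([u, ((3.0, 6.0), (3.0, 6.0))], d_v, 5)
case_3b = edge_can_be_removed([u, ((3.0, 7.5), (3.0, 7.5))], d_v, 5)
limit_hit = edge_can_be_removed([u, u_next], d_v, 1)
print(case_3d, case_3a, case_3b, limit_hit)
```

Since at most N segments are examined per edge, the work per edge is bounded by a constant,
which is consistent with the stated complexity of O(e).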

5.2.6. Removing edges and nodes


When a reduction step detects that a node or an edge can be removed, often additional
nodes and edges have to be removed as well from the graph to have a valid timing graph
again.
Whenever an edge (u, v) is removed by clean_remove_edge, it is checked whether the
node u has any additional successors. If not, this node is removed as well by calling
clean_remove_node. It is also checked whether the node v has any additional predecessors.
If not, v is removed as well by calling clean_remove_node.
The function clean_remove_node removes not just the node itself, but additionally
all the edges heading to or leaving from this node.
By removing the edge (u, v) in Figure 5.3 the node u can also be removed, because u
has no additional successors. When u is removed, all its incoming edges are removed as
well, hence, (b, u) is removed.
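
A possible realization of the two functions is sketched below in Python. The class
TimingGraph and its adjacency sets are hypothetical. The sketch additionally assumes that
the removal cascades: a neighbor that loses its last successor or predecessor through
clean_remove_node is removed as well. The example graph is invented but mirrors the
situation described for edge (u, v).

```python
class TimingGraph:
    def __init__(self, edges):
        self.succ, self.pred = {}, {}
        for u, v in edges:
            for n in (u, v):
                self.succ.setdefault(n, set())
                self.pred.setdefault(n, set())
            self.succ[u].add(v)
            self.pred[v].add(u)

    def edges(self):
        return sorted((u, v) for u, vs in self.succ.items() for v in vs)

    def clean_remove_edge(self, u, v):
        if u not in self.succ or v not in self.succ[u]:
            return
        self.succ[u].discard(v)
        self.pred[v].discard(u)
        if not self.succ[u]:                      # u has no additional successors
            self.clean_remove_node(u)
        if v in self.pred and not self.pred[v]:   # v has no additional predecessors
            self.clean_remove_node(v)

    def clean_remove_node(self, n):
        if n not in self.succ:
            return                                # already removed
        preds, succs = self.pred.pop(n), self.succ.pop(n)
        for p in preds:                           # remove all incoming edges
            if p in self.succ:
                self.succ[p].discard(n)
                if not self.succ[p]:
                    self.clean_remove_node(p)     # assumed cascade
        for s in succs:                           # remove all outgoing edges
            if s in self.pred:
                self.pred[s].discard(n)
                if not self.pred[s]:
                    self.clean_remove_node(s)     # assumed cascade


tg = TimingGraph([("S", "b"), ("b", "u"), ("u", "v"), ("b", "c"), ("c", "v"), ("v", "T")])
tg.clean_remove_edge("u", "v")    # u loses its only successor, so (b, u) goes too
print(tg.edges())
```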

5.3. Realistic aged path delays


So far, intervals are used for the gate delay, because the specific delay of an aged gate
is unknown (since operating conditions and workload over lifetime are unknown). An
interval for the path delay is calculated by adding up the gate delays along the path. But
is it really possible that along a path all gates degrade maximally (upper bound of path
delay interval) or all gates do not degrade at all (lower bound of path delay interval)?


[Figure 5.6: sketch of the path segments U, U′ and V at edge (u, v), followed by one panel
per case. Case 1: remove edge (u, v). Case 2: do not remove edge (u, v). Case 3: depends on
U′. Case 3.a: remove edge (u, v). Case 3.b: do not remove edge (u, v). Case 3.c: depends on
the next smaller U′. Case 3.d: remove edge (u, v), no path U′.]

Figure 5.6.: Graphical representation of the common edge reduction step cases. Edge
(u, v) can be removed if aged delay of path U is smaller than fresh delay of
path V .


In this section it is investigated whether it is justified to use intervals for gate and
path delays, or not.²
In the following, it is shown that intervals for the gate delay are justified. It is shown
as well that the upper bound of the path delay interval is realistic as long as a given
path is statically sensitizable. The lower bound of a path delay can also be reached as
long as just one input transition is considered. But the lower bound of the interval is
often too pessimistic if for a given path the maximum of the delays for a rising and a
falling input transition is considered. This can be used to further reduce the number
of PCPs.

5.3.1. Gate delay interval


The degradation of a gate strongly depends on the workload. NBTI only degrades PMOS
transistors. Hence, only the gate delay for a falling input transition degrades. For NBTI
the workload impact is defined by the signal probability at the gate inputs. If SP is 0
at an inverter input, the inverter delay degrades maximally. On the other hand, if SP is
1, the delay will not degrade at all. A NOR or NAND gate also degrades maximally when
SP at the inputs is 0 and does not degrade if SP is 1. Hence, it is justified to use an
interval for the gate delay because the lower and the upper bound of the interval can be
reached.

5.3.2. Realistic aged path delays for an inverter chain


Before investigating a general path, let’s have a look at an inverter chain. Figure 5.7
shows the dependence of the delay of an inverter chain on the signal probability SP_IN at
the input IN. The aged path delays for a rising input transition D(P_r)_aged and a falling
input transition D(P_f)_aged are shown (solid lines). P_f degrades the most when SP_IN
is 0. Then, SP at all gates with a falling input transition is 0 and the gates degrade
maximally. On the other hand, if SP_IN is 1, the aged path delay is the delay of the fresh
inverter chain, because SP is 1 for all gates with a falling input transition. For P_r it
is the exact opposite: there is no degradation when SP_IN is 0 and maximal degradation
when SP_IN is 1.
The path delay of interest is the maximum of the path delays for a rising and a falling
transition:
\max(D(P_r)_{aged}, D(P_f)_{aged}) \qquad (5.5)

² For the investigation of the realistic aged path delay, it is assumed that just the
workload is unknown and that (at least lower bounds for) T_eff, V_eff and t_life are
known. Otherwise, the lower bound of the path delay interval is of course equal to the
fresh path delay, because a lifetime of 0 could be assumed if t_life is unknown.
At the moment, just NBTI is considered. The reason is that NBTI depends on the static
signal probability and, although the actual SP is unknown, it must be between 0 % and
100 %. Furthermore, the results in Chapter 4 show that NBTI is the dominant aging effect.
Nevertheless, it should be possible to consider HCI as well. To consider HCI, an upper
bound for TD has to be defined. For a glitch-free circuit, for instance, the upper bound
for TD is 1.

5.3. Realistic aged path delays

Figure 5.7.: Path delay of an inverter chain (10 inverters) with respect to SP at the
input. (Schematic labels: input transition, IN, OUT.)

The inverter chain can still degrade maximally (for SP_IN = 1 or SP_IN = 0). However,
it is no longer possible that the path does not degrade at all: the gate delays for one
transition do not degrade exactly when the delays for the opposite transition degrade the
most, and vice versa. The inverter chain now degrades least for SP_IN = 0.5 (intersection
of the solid lines), but this minimal degradation is already 85 % of the maximal
degradation.
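
The behaviour described above can be reproduced with a small sketch. It assumes a power-law dependence of the gate delay degradation on the signal probability, k · (1 − SP)^n, as introduced later in Equation 5.10. The per-gate fresh delay `d_fresh`, the factor `k` and the exponent `n = 0.25` are illustrative assumptions and are not values taken from this thesis.

```python
def inverter_chain_delays(sp_in, n_inv=10, d_fresh=1.0, k=0.1, n=0.25):
    """Return (D(P_r)_aged, D(P_f)_aged) for a chain of identical inverters.

    Assumed model: every gate whose input sees a falling transition adds
    k * (1 - SP)**n to the fresh path delay, where SP is the signal
    probability at that gate input. All parameter values are hypothetical.
    """
    delays = []
    for first_edge_falls in (False, True):    # rising input, then falling input
        degradation = 0.0
        sp = sp_in                 # SP at the input of the current inverter
        falling = first_edge_falls
        for _ in range(n_inv):
            if falling:            # NBTI only slows the falling-input arc
                degradation += k * (1.0 - sp) ** n
            sp = 1.0 - sp          # an inverter complements the SP
            falling = not falling  # and inverts the transition direction
        delays.append(n_inv * d_fresh + degradation)
    return tuple(delays)


def worst_transition_delay(sp_in, **kwargs):
    """Path delay of interest: maximum over both input transitions (Eq. 5.5)."""
    return max(inverter_chain_delays(sp_in, **kwargs))


if __name__ == "__main__":
    fresh = 10 * 1.0
    max_deg = worst_transition_delay(0.0) - fresh
    min_deg = min(worst_transition_delay(i / 100) for i in range(101)) - fresh
    print(f"minimal / maximal degradation: {min_deg / max_deg:.2f}")
```

With the assumed exponent, the minimum lies at SP_IN = 0.5 and the ratio equals 0.5^n, which is in the same range as the 85 % quoted above; the exact figure depends on the exponent of the real degradation model.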

5.3.3. Maximal aged path delay of a general path


A (general) path (see Figure 5.8) consists of an input, an output, the gates along the
path and side inputs. Side inputs are gate inputs that are not on the path. Only
single-stage gates (e.g., inverter, NAND and NOR) are considered. This is no limitation,
because complex gates are built from those basic gates. The signal probabilities of the
gates along a path are interdependent. They depend on the logic interconnection and
the SP at the PIs.
Like an inverter chain, a general path can degrade maximally if it can be sensitized
statically. A path is statically sensitizable if at least one input vector exists that sets all
side inputs of the gates to their non-controlling value.
A path can be specified by the nodes along the path in the timing graph (0, 1, ..., m).

Figure 5.8.: A general path

Hence, the input and output of a gate are two consecutive nodes i and i+1. f_i denotes
the logic function of node i in terms of the primary inputs. The static sensitization
condition of a logic gate is given by the Boolean difference:

    \frac{\partial f_{i+1}}{\partial f_i} = f_{i+1}\big|_{f_i = 1} \oplus f_{i+1}\big|_{f_i = 0}    (5.6)

The sensitization condition specifies the input vectors for which a transition at the gate
input propagates to the gate output. A path is statically sensitizable if all the gates
along the path fulfill the sensitization condition:

    \prod_{i=0}^{m-1} \frac{\partial f_{i+1}}{\partial f_i} = 1    (5.7)

If a path is statically sensitizable, it behaves like an inverter chain and it is possible
that SP is 0 for every on-path gate input with a falling input transition. For a NOR
gate the non-controlling value is logic “0”. Hence, all side inputs are at logic “0” as
well and the gate degrades maximally (this is necessary due to the serial connection of
the PMOS transistors in a NOR gate). For a NAND gate, in contrast, all side inputs are
forced to logic “1” to statically sensitize it. Nevertheless, the timing arc of the NAND
gate that is on the path still degrades maximally because of the parallel connection of
the PMOS transistors. Hence, if the path is statically sensitizable, the upper bound of a
path delay interval is realistic.
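
As a minimal illustration of the sensitization condition in (5.6) and (5.7), the following sketch checks static sensitizability by brute force over all primary-input vectors. The data layout (a path as a list of gate types with side-input functions) and all function names are assumptions made for this example; exhaustive enumeration is only practical for very small circuits.

```python
from itertools import product

# Gate functions over a list of Boolean inputs; index 0 is the on-path input.
GATES = {
    "INV": lambda ins: not ins[0],
    "NAND": lambda ins: not all(ins),
    "NOR": lambda ins: not any(ins),
}


def boolean_difference(gate, side_values):
    """True iff toggling the on-path input toggles the gate output (Eq. 5.6)."""
    f = GATES[gate]
    return f([True] + side_values) != f([False] + side_values)


def sensitizing_vectors(path, n_pis):
    """Yield every primary-input vector that statically sensitizes `path`.

    `path` is a list of (gate_type, side_inputs); each side input is a
    function that maps the primary-input vector to a Boolean value.
    """
    for pis in product([False, True], repeat=n_pis):
        if all(boolean_difference(gate, [s(pis) for s in sides])
               for gate, sides in path):
            yield pis


if __name__ == "__main__":
    # Hypothetical path: NAND (side input x1) -> NOR (side input x2) -> INV.
    path = [("NAND", [lambda v: v[1]]), ("NOR", [lambda v: v[2]]), ("INV", [])]
    print(list(sensitizing_vectors(path, 3)))
```

The path is statically sensitizable exactly when the generator yields at least one vector, which corresponds to the product in (5.7) being 1 for that vector.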

5.3.4. Minimal aged path delay for a general path


As for an inverter chain, it is not possible for a general path that the gates do not
degrade when both input transitions are considered simultaneously, because the aged
path delays for a rising and a falling input transition compete with each other (unless
the path consists just of NAND gates). To determine the minimal delay of an aged path,
an optimization problem is formulated. The task is to minimize the maximum of both
path delays for a rising (P_r) and a falling (P_f) input transition:

    \min_{SP} \; \max\bigl(D(P_r),\, D(P_f)\bigr)    (5.8)

The first constraint for the optimization is that the signal probabilities are between
0 % and 100 %:

    \text{s.t.} \quad 0 \le SP \le 1    (5.9)


An exact approach to this minimization problem would be to use the signal proba-
bilities at the PIs as free variables. The aged delay of the considered path depends on
the signal probabilities of the on-path gate inputs and the off-path gate inputs (side
inputs). The relation between the signal probabilities at the gate inputs and the signal
probabilities at the PIs is given by the logic interconnection. Considering this during
the optimization would lead to a complex nonlinear optimization problem with multiple
local minima.
To find the global minimum efficiently, the problem is simplified and a valid lower
bound for the minimization problem is obtained. It is assumed that the signal probabil-
ities at the side inputs can be chosen in a way that the aged gate delay becomes minimal
without considering the logic interconnection. Only the logic interconnection of the path
itself is considered; the logic interconnection of the rest of the circuit is neglected. This
enables us to minimize the aged path delay further than would be possible without this
simplification. Hence, one gets a valid lower bound of the minimal aged path delay.
The free variables SP are now only the signal probabilities at the on-path gate inputs.
Hence, the path delay dependent on SP is required.
First, the gate delay degradation ∆d of an edge (i, o) for a falling input transition
depends on SP_i of the on-path gate input i:

    \Delta d\bigl((i,o)\bigr)_f = k_i \cdot (1 - SP_i)^n    (5.10)

Here, n is the time exponent given in the degradation equation (Equation 3.7). The factor
k_i combines the other dependencies (operating conditions, lifetime), which are fixed for
this optimization.
The path delays for a rising and a falling input transition can now be written as:

    D(P_r) = D(P_r)_{fresh} + \sum_{l \in N_r} k_l \cdot (1 - SP_l)^n    (5.11)

    D(P_f) = D(P_f)_{fresh} + \sum_{l \in N_f} k_l \cdot (1 - SP_l)^n    (5.12)

SP_l is the signal probability at node l. N_r (N_f) is the set of gate inputs along the
path that have a falling input transition for a rising (falling) transition at the path
input.
Additional constraints consider that the values for SP cannot be chosen freely, since
the signal probability at a gate output depends on the signal probabilities at the gate
inputs. For an inverter, the signal probability at the output SP_o is given by 1 − SP_i
at the input. For a NOR gate with two inputs, the signal probability at the output SP_o
depends on both inputs SP_i and SP_j:

    SP_o = (1 - SP_i) \cdot (1 - SP_j)    (5.13)

Let’s assume i is the on-path input and j is the side input. SP_j is not a free variable
for the optimization, but it affects SP_o, which is again a free variable for the
optimization. This can be considered by solving (5.13) for SP_j:

    SP_j = 1 - \frac{SP_o}{1 - SP_i}    (5.14)

Figure 5.9.: Graphical representation of the constraints for the gate types. (Axes: SP_i
from 0 to 1 and SP_o up to 1; labels in the plot: NAND, INV, NOR.)

By taking into account that SP_j is between 0 and 1, the following two relations between
SP_i and SP_o can be obtained:

    0 \le 1 - \frac{SP_o}{1 - SP_i}    (5.15)

    1 \ge 1 - \frac{SP_o}{1 - SP_i}    (5.16)

From 5.15 and 5.16 the following constraint for a NOR gate can be derived:

    0 \le SP_o \le 1 - SP_i, \quad \text{if } (i,o) \text{ is a NOR gate}    (5.17)

For NAND gates and inverters similar constraints arise:

    1 \ge SP_o \ge 1 - SP_i, \quad \text{if } (i,o) \text{ is a NAND gate}    (5.18)

    SP_o = 1 - SP_i, \quad \text{if } (i,o) \text{ is an inverter}    (5.19)

In Appendix A it is shown how the constraint for the NAND gate is derived and that
these constraints are also valid if a NAND or NOR gate has more than two inputs.
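
For orientation, the two-input NAND case can be sketched in the same way as the NOR case above. This is only a short worked derivation under the same independence assumption that underlies (5.13); the derivation referred to in Appendix A, including the case of more than two inputs, is not reproduced here.

```latex
\begin{align*}
SP_o &= 1 - SP_i \cdot SP_j
  && \text{(the output is 0 only if both inputs are 1)} \\
SP_j &= \frac{1 - SP_o}{SP_i}
  && \text{(solved for the side input, } SP_i > 0\text{)} \\
0 \le \frac{1 - SP_o}{SP_i} \le 1
  &\;\Longrightarrow\; 1 - SP_i \le SP_o \le 1
  && \text{(this is constraint 5.18)}
\end{align*}
```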
The diagram in Figure 5.9 shows the constraints for the gate types graphically. The
optimization tries to choose the SPs in such a way that the gates degrade as little as
possible. Hence, the SP at the gate input should be 1. The signal probability at the
gate output should also be 1, because the gate output is the input of the succeeding
gate. For an inverter, however, an SP of 1 at the input means that the SP at the
output is 0. The inverter does not degrade, but the succeeding gate degrades maximally
(this increases the path delay for the opposite transition at the input). The same is
true for a NOR gate: if the SP at the input is 1, then the SP at the output is 0. Only
for a NAND gate is it possible to have an SP of 1 at both the input and the output.
The equality and inequality constraints (5.9, 5.17, 5.18, 5.19) are linear, but the cost
function (5.8, 5.11) is nonlinear. Unfortunately, this nonlinear optimization problem
still has multiple local minima. Therefore, the optimization problem was transformed
into a linear optimization problem by setting the time exponent n to 1:


    D(P_r) \approx D(P_r)_{fresh} + \sum_{l \in N_r} k_l \cdot (1 - SP_l)    (5.20)

    D(P_f) \approx D(P_f)_{fresh} + \sum_{l \in N_f} k_l \cdot (1 - SP_l)    (5.21)

To linearize the max operation in 5.8, a slack variable s is introduced:

    \min_{SP} \; \max\bigl(D(P_r),\, D(P_f)\bigr) = \min_{s,\,SP} \; s    (5.22)
    \text{s.t.} \quad s \ge D(P_r), \quad s \ge D(P_f)

Now the minimization problem can be solved efficiently. The solution of this linear
problem is still a valid lower bound for the minimal path delay. This can be seen by
looking once again at Figure 5.7. Shown are the exact path delays (solid lines) as
well as the linearized path delays (dashed lines). The intersection of both dashed lines
is the minimum of the maximum of both path delays. The minimal aged path delay
degradation after linearization is 50 % of the maximal aged path degradation, compared
to 85 % in the exact case. Hence, it is a valid lower bound.
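
The 50 % figure for the inverter chain can be checked with a small sketch. For an inverter chain every on-path SP is fixed by SP_IN, so the linear program in (5.22) has only two variables (s and x = SP_IN) and its optimum lies at a vertex of the feasible region. The function below enumerates these vertices directly; it is a special-case illustration with hypothetical numbers, not the general linear program over all on-path signal probabilities.

```python
def minimize_max_of_two_lines(a_r, b_r, a_f, b_f):
    """Solve  min s  s.t.  s >= a_r + b_r*x,  s >= a_f + b_f*x,  0 <= x <= 1.

    The optimum of this two-variable linear program is attained at x = 0,
    at x = 1, or where both lines intersect. Returns (s_opt, x_opt).
    """
    candidates = [0.0, 1.0]
    if b_r != b_f:
        x_cross = (a_f - a_r) / (b_r - b_f)   # a_r + b_r*x == a_f + b_f*x
        if 0.0 <= x_cross <= 1.0:
            candidates.append(x_cross)
    return min((max(a_r + b_r * x, a_f + b_f * x), x) for x in candidates)


if __name__ == "__main__":
    # Hypothetical 10-inverter chain: fresh delay 10, five degrading gates per
    # transition, k = 0.1 each, linearized model (n = 1):
    #   D(P_r) = 10 + 0.5 * x          D(P_f) = 10 + 0.5 * (1 - x)
    s_opt, x_opt = minimize_max_of_two_lines(10.0, 0.5, 10.5, -0.5)
    print(s_opt, x_opt)
```

In this example the maximal degradation is 0.5 and the optimum degrades by 0.25, which is the 50 % ratio stated above for the linearized case.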
The degradation of the gate delay in Equation 5.10 depends only on the signal
probability of the on-path input (switching input). This is correct for an inverter because
it has just one input. For a NAND gate it is correct as well, since the delay degradation
of the timing arc from the switching input to the output depends (almost) entirely on
the signal probability at the switching input due to the parallel connection of the PMOS
transistors. For a NOR gate, however, this is not the case. The PMOS transistors are
connected in series. If the PMOS transistor that is nearest to the supply voltage is
connected to an input with an SP of 1, then the gate does not degrade. In the path delay
equations (5.20, 5.21) this is considered by removing those NOR gates from the sets N_r
and N_f where the PMOS transistor of the switching input is not directly connected to
the supply voltage. For these gates, a side input is connected to the PMOS transistor
that is directly connected to the supply voltage, and the signal probability of this
input can be chosen freely since the interconnection of the rest of the circuit is
neglected.

5.3.5. Minimal aged circuit delay


The minimal aged path delay can now be used to determine a minimal aged circuit delay.
However, it is not enough to determine the minimal aged path delay just for the path
with the largest maximal aged path delay. Another path could have a larger minimal
aged path delay.
An exact method would be to determine the minimal aged path delay for all the
paths with a maximal aged path delay greater than D(P_crit)_fresh. It is enough to
consider these paths, because paths with a smaller maximal aged path delay will never
have a minimal aged path delay greater than D(P_crit)_fresh.
The number of paths that have to be considered in the exact method might be too
large. Instead, a lower bound for the minimal aged circuit delay is again determined by
obtaining the minimal aged path delays of the N slowest paths and taking their maximum.

5.3.6. Use of minimal aged circuit delay in reduction steps


The minimal circuit delay can now be used to reduce the number of identified PCPs.
Criterion 1 says that a path is only a possible critical path if the aged path delay is not
less than D(Pcrit )f resh . This is a necessary condition but the criterion can be refined:
Criterion 1*: A path must have a maximal aged path delay Daged greater than the
minimal aged circuit delay. Otherwise it is not a PCP because the path defining
the minimal aged circuit delay will always have a greater path delay.
The minimal aged circuit delay is then used in the slack and path delay reduction step
instead of D(Pcrit )f resh .
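
The refinement can be sketched as a simple filter. The dictionary keys, the example delays and the choice to rank the "N slowest" paths by their maximal aged delay are assumptions made for this illustration; in the actual method the bound is used inside the graph-based reduction steps rather than on an explicit path list.

```python
def minimal_aged_circuit_delay(paths, n_slowest):
    """Lower bound: largest minimal aged delay among the N slowest paths.

    Assumption: "slowest" is ranked by the maximal aged path delay.
    """
    slowest = sorted(paths, key=lambda p: p["d_max_aged"], reverse=True)
    return max(p["d_min_aged"] for p in slowest[:n_slowest])


def apply_criterion_1_star(paths, n_slowest):
    """Keep only paths whose maximal aged delay exceeds the circuit bound."""
    bound = minimal_aged_circuit_delay(paths, n_slowest)
    return [p for p in paths if p["d_max_aged"] > bound]


# Hypothetical path data (delays in ns); the fresh critical path delay is 1.00.
paths = [
    {"name": "A", "d_min_aged": 1.06, "d_max_aged": 1.10},
    {"name": "B", "d_min_aged": 1.07, "d_max_aged": 1.09},
    {"name": "C", "d_min_aged": 1.00, "d_max_aged": 1.065},
    {"name": "D", "d_min_aged": 0.95, "d_max_aged": 1.02},
]

if __name__ == "__main__":
    print([p["name"] for p in apply_criterion_1_star(paths, n_slowest=2)])
```

In this made-up example all four paths satisfy Criterion 1 because their maximal aged delay is not less than the fresh critical path delay, whereas the refined criterion keeps only the paths that can still exceed the minimal aged circuit delay.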

5.3.7. Wrap-up
This section was about investigating whether intervals for the gate and path delays are
justified. It was shown that an interval for the aged gate delay is justified and the
upper bound of the aged path delay interval is justified as well if the path is statically
sensitizable.
The lower bound of the aged path delay interval is equal to the fresh path delay.
However, for many paths the minimal aged path delay differs from the fresh path
delay if the maximum of the path delay for both input transitions is considered. The
reason is that, due to the inverting characteristic of CMOS logic, it is not possible
that both a gate and its succeeding gate do not degrade (if the gate is an inverter or a
NOR gate).
An optimization problem was formulated to obtain the minimal aged path delay. The
optimization problem was too complex to be solved exactly. By simplifying³ and
linearizing the optimization problem, it could be solved efficiently. The solution is
still a valid lower bound for the minimal aged circuit delay.
From the minimal aged path delay a minimal aged circuit delay can be determined.
This minimal aged circuit delay can be used to further reduce the number of PCPs by
refining Criterion 1.

5.4. Considering process variations


This section shows how SSTA and aging-aware timing analysis can be combined to
consider the impact of process variation on PCPs.
³ The SPs at the side inputs of the considered path can be chosen independently of one
another.


Besides aging, process variation is another limiting factor for circuit reliability. So
far, process variation has not been considered and deterministic values have been assumed
for the gate delays. However, due to process variation even the fresh gate delays cannot
be determined exactly.
Until recently, global process variations and uncertainties of the current operating
conditions (T_curr and V_curr) were considered by corner cases. Corner cases can be used
as well to take global process variation and uncertainties into account when the PCPs are
determined: all PCPs are determined by obtaining the PCPs for the different corner
cases and computing their union.
Due to ongoing miniaturization, local process variation within a single chip has
increased so much that it can no longer be neglected. Local variations cannot be
considered with corner cases. For that purpose, SSTA has been developed, which models
timing quantities (e.g., delay, arrival time, slack) as probability distributions.
Figure 5.10 illustrates the idea of how aging analysis and SSTA can be combined. In
the nominal case, timing quantities are deterministic values. On the one hand, there is
aging. Due to aging, those deterministic timing quantities become time dependent. To
identify the PCPs, for each timing quantity an interval is considered. On the other hand,
there is process variation. It results in a probability distribution for timing quantities.
Combining aging and process variation results in an interval with random variables as
lower and upper bounds.

5.4.1. Block-based statistical static timing analysis


The approach is based on the block-based SSTA by Visweswariah et al. [2006], which is
briefly summarized here. All timing quantities are represented in the canonical
first-order form:

    \hat{a} = a_0 + \sum_{i=1}^{n} a_i x_i + a_r x_r    (5.23)

a_0 is the nominal value, x_i represents the variation of the n global sources and x_r is
a random variable modeling the purely random effect of process variation. a_i and a_r are
the sensitivities of the timing quantity to x_i and x_r, respectively. a_i and a_r are
scaled such that x_i and x_r are Gaussian distributions with zero mean and unit variance
(N(0, 1)).
For a block-based timing analysis two operations are required: sum and max. The sum
ŝ of the random variables â and b̂ = b_0 + \sum_{i=1}^{n} b_i x_i + b_r x_{r,b} is obtained
by adding the coefficients of the global variation, s_i = a_i + b_i. The independent
variation a_r x_{r,a} + b_r x_{r,b} is replaced by s_r x_{r,s}, where s_r is determined by
matching the variance of a_r x_{r,a} + b_r x_{r,b} and s_r x_{r,s}.
To compute the maximum m̂ = max(â, b̂), the tightness probability T_a is required. T_a
is the probability that â is greater than b̂:

    T_a = P(\hat{a} > \hat{b}) = \Phi\!\left(\frac{a_0 - b_0}{\theta}\right)    (5.24)

Φ is the cumulative distribution function of the standard normal distribution and
θ = \sqrt{\sigma_a^2 + \sigma_b^2 - 2\,\mathrm{cov}}. σ_a² and σ_b² are the variances and
cov is the covariance of â and b̂. The tightness probability T_b is (1 − T_a).

Figure 5.10.: Basic idea for combining aging effects and process variations. (Labels in
the figure: Aging, Process variations, P(d), d_fresh, d_aged.)

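µ and σ² of m̂ can be computed as follows:

    \mu_m = T_a a_0 + T_b b_0 + \theta \cdot \varphi\!\left(\frac{a_0 - b_0}{\theta}\right)    (5.25)

    \sigma_m^2 = T_a (\sigma_a^2 + a_0^2) + T_b (\sigma_b^2 + b_0^2) + (a_0 + b_0) \cdot \theta \cdot \varphi\!\left(\frac{a_0 - b_0}{\theta}\right) - \mu_m^2    (5.26)

Here, φ denotes the probability density function of the standard normal distribution.
Now, the maximum m̂ can again be written in canonical form:

    \max(\hat{a}, \hat{b}) = m_0 + \sum_{i=1}^{n} m_i x_i + m_r x_{r,m}    (5.27)

m_0 is the mean of m̂, m_i is T_a · a_i + (1 − T_a) · b_i, and m_r is obtained by
matching the variance of the canonical form of max(â, b̂) to the variance σ_m².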

The result of the sum and the max of two canonical forms is again a canonical form.
Hence, all timing quantities of the timing graph can be expressed as canonical forms.
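
The sum and max operations summarized above can be sketched as follows. The class and function names are chosen for this example only, and the moment formulas are the standard ones for the maximum of two jointly Gaussian variables (standard normal density in the mean and variance, cumulative distribution in the tightness probability). It is a minimal sketch, not the implementation used in this work.

```python
import math
from dataclasses import dataclass


def pdf(x):
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


@dataclass
class Canonical:
    mean: float    # nominal value a_0
    glob: tuple    # sensitivities a_1..a_n to the global sources x_i
    rand: float    # sensitivity a_r to the independent source x_r

    @property
    def var(self):
        return sum(a * a for a in self.glob) + self.rand ** 2


def cov(a, b):
    # Only the shared global sources correlate two canonical forms.
    return sum(x * y for x, y in zip(a.glob, b.glob))


def add(a, b):
    """Sum: add means and global sensitivities, match the random variance."""
    glob = tuple(x + y for x, y in zip(a.glob, b.glob))
    return Canonical(a.mean + b.mean, glob, math.hypot(a.rand, b.rand))


def maximum(a, b):
    """Max: moment matching with the tightness probability (5.24 to 5.27)."""
    theta = math.sqrt(max(a.var + b.var - 2.0 * cov(a, b), 0.0))
    if theta == 0.0:                      # identical variation: plain max
        return a if a.mean >= b.mean else b
    z = (a.mean - b.mean) / theta
    t_a = cdf(z)                          # tightness probability T_a
    t_b = 1.0 - t_a
    mu = t_a * a.mean + t_b * b.mean + theta * pdf(z)
    second_moment = (t_a * (a.var + a.mean ** 2) + t_b * (b.var + b.mean ** 2)
                     + (a.mean + b.mean) * theta * pdf(z))
    var_m = second_moment - mu ** 2
    glob = tuple(t_a * x + t_b * y for x, y in zip(a.glob, b.glob))
    # Remaining variance goes into the independent term; clamp rounding noise.
    rand = math.sqrt(max(var_m - sum(g * g for g in glob), 0.0))
    return Canonical(mu, glob, rand)
```

A quick sanity check is the maximum of two independent standard normal variables, whose mean is 1/√π and whose variance is 1 − 1/π.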

5.4.2. Representation of timing quantities


Without considering process variation, a timing quantity q is an interval with the fresh
value q^fresh as lower bound and the maximal aged value q^aged as upper bound. To
consider process variation, those two bounds become random variables and are represented
as canonical forms:

    \hat{q}^{fresh} = q^{fresh} + \sum_{i=1}^{n} q_i x_i + q_r x_r    (5.28)

    \hat{q}^{aged} = q^{aged} + \sum_{i=1}^{n} q_i x_i + q_r x_r    (5.29)

For q̂^aged the impact of aging and the impact of process variation can be added because
they are independent (see [Fischer et al., 2008]).
For intervals with random variables, the operations sum, max and greater than (“>”)
have to be defined as well. For sum, the lower bounds and the upper bounds are added,
similar to Equation 5.1. For max, the maximum of the lower bounds and the maximum of the
upper bounds are calculated as in Equation 5.2. For intervals with random variables,
the greater than operation returns a probability:

    P(a > b) = P(\hat{a}^{fresh} > \hat{b}^{aged})    (5.30)

However, a binary decision is required to decide whether a node or an edge in a timing
graph can be removed. The solution is to introduce a threshold δ. If the probability is
greater than δ, then the element can be removed:

    a > b \;:=\; P(a > b) > \delta \;\Longleftrightarrow\; P(\hat{a}^{fresh} > \hat{b}^{aged}) > \delta    (5.31)
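
A minimal sketch of this thresholded comparison is given below. Canonical forms are passed as plain tuples (mean, global sensitivities, random sensitivity); this layout, the function names and the default threshold of 0.99 are assumptions for illustration, since no value for δ is fixed here.

```python
import math


def prob_greater(a, b):
    """P(a > b) for two canonical forms (mean, global_sens, rand_sens)."""
    (a0, a_glob, a_rand), (b0, b_glob, b_rand) = a, b
    # Variance of a - b: shared global sources partly cancel, while the
    # independent random parts add up (this equals theta squared).
    var_diff = (sum((x - y) ** 2 for x, y in zip(a_glob, b_glob))
                + a_rand ** 2 + b_rand ** 2)
    if var_diff == 0.0:
        return 1.0 if a0 > b0 else 0.0
    z = (a0 - b0) / math.sqrt(var_diff)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_greater(a_fresh, b_aged, delta=0.99):
    """Eq. (5.31): interval a is greater than interval b if the fresh bound
    of a exceeds the aged bound of b with probability above delta."""
    return prob_greater(a_fresh, b_aged) > delta


if __name__ == "__main__":
    a_fresh = (1.20, (0.05,), 0.02)   # hypothetical lower bound of interval a
    b_aged = (1.00, (0.05,), 0.02)    # hypothetical upper bound of interval b
    print(interval_greater(a_fresh, b_aged))
```

A larger δ makes the reduction more conservative, because fewer nodes and edges satisfy the comparison and are removed.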

The combined statistical and aging-aware timing analysis is used for, but not limited to,
determining the PCPs. It can also be used independently of that as an aging-aware
SSTA. What is not taken into account so far is that the transistor parameter drifts
are also probability distributions. Aging effects are statistical processes: two
identical transistors under identical conditions age differently. To consider this, the
nominal aged value a^aged would also have to be a random variable.

5.5. Applications
After introducing the methods to identify the PCPs, this section presents two applications of PCPs.

5.5.1. Aging-aware timing model for modules


To keep pace with the unabatedly growing complexity of integrated circuits, circuit
design begins at higher and higher levels of abstraction. Furthermore, performance
degradation due to aging effects can no longer be neglected [Austin et al., 2008]. Hence,
timing models at higher abstraction levels are required that accurately describe the
impact of aging.
A module is a circuit with a well-defined function and interface (e.g. adders, multipli-
ers, memory blocks, or even whole processors). The advantage of modules is that they
are designed once and can easily be reused.
The timing model describes the maximal delay of a module. A single value is enough
to specify the timing of a module if aging is not considered (e.g., adder with delay = 1 ns).
When performance degradation due to aging is considered, the aged circuit delay depends
on the operating conditions over lifetime and on the workload. The aged circuit delay is
defined by the critical path, and there is more than one possible critical path for an
aged circuit.
An aging-aware timing model at higher abstraction levels enables one to:

• consider the impact of aging on a system early in the design process,

• determine the system performance quickly at the system level,

• perform an extensive exploration of the design space.

Timing models that take process variation into account have already been published
[Garg and Marculescu, 2007; Li et al., 2009]. To the best of my knowledge, this is the
first aging-aware timing model above gate level.
Such models can, for instance, be used in high-level synthesis (HLS). One important
step in HLS is scheduling. During scheduling, arithmetic/logical operations are mapped
to time slots of duration T_0 (see Figure 5.11). For this purpose, a pre-characterized
library with different implementations of modules is required. The individual
implementations differ in their characteristics (delay, area, power). The schedule is
generated by choosing optimal implementations from the pre-characterized library
[Coussy and Morawiec, 2008].
When a module ages, its delay increases. If this is not taken into account during
synthesis, it is possible that the system fails before the end of its specified lifetime
because the time for performing a calculation is no longer sufficient. When a module is
characterized for the library, it is unknown how the cell will be utilized. Therefore, a
timing model is needed which provides the delay of a module dependent on operating
conditions over lifetime and workload.

Figure 5.11.: The dotted circles indicate the aged performances. The circuit fails because
the second adder needs the result before the first adder has finished its
calculation.

Figure 5.12.: The timing graph of the ISCAS’85 circuit c17 is shown in (a). This is
a simplified TG because each net is just represented by one node. An
example of a reduced TG is shown in (b).

The fundamental idea is to use a strongly reduced TG as a timing model. The maximal
aged circuit delay is determined by a PCP. Hence, it is not necessary to consider the
complete timing graph of a module (see Figure 5.12(a)), but it is enough to just consider
the part of the TG that consists of edges that belong to PCPs (see Figure 5.12(b)).
The timing model is characterized by generating the reduced timing graph that just
contains edges that are part of PCPs. When an aging-aware timing analysis is performed
and the aged delay for the module is needed, the reduced timing graph is evaluated. First,
the workload from the module inputs has to be propagated to the nodes of the timing
graph. Then, the delays of the remaining edges of the reduced timing graph can be
computed by means of AgeGate.
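
The evaluation of such a reduced timing graph amounts to a longest-path computation with aged edge delays. The sketch below assumes a dictionary-based graph and a callback `aged_delay` that stands in for the AgeGate evaluation; both are hypothetical, and the small example graph is made up and is not the c17 graph of Figure 5.12.

```python
def topological_order(edges, source):
    """Topological order (reverse DFS post-order) of nodes reachable from source."""
    order, seen = [], set()

    def visit(u):
        seen.add(u)
        for v in edges.get(u, {}):
            if v not in seen:
                visit(v)
        order.append(u)

    visit(source)
    return order[::-1]


def aged_module_delay(edges, aged_delay, source="S", sink="T"):
    """Longest S-to-T path in the reduced TG using aged edge delays.

    edges[u][v] is the fresh delay of edge (u, v); aged_delay(u, v, d_fresh)
    returns the aged delay of that edge for the given workload.
    """
    arrival = {source: 0.0}
    for u in topological_order(edges, source):
        for v, d_fresh in edges.get(u, {}).items():
            candidate = arrival[u] + aged_delay(u, v, d_fresh)
            arrival[v] = max(arrival.get(v, float("-inf")), candidate)
    return arrival[sink]


# Hypothetical reduced timing graph (delays in ns).
edges = {
    "S": {"a": 0.0, "b": 0.0},
    "a": {"c": 1.0},
    "b": {"c": 1.2, "T": 2.0},
    "c": {"T": 1.0},
}

if __name__ == "__main__":
    print(aged_module_delay(edges, lambda u, v, d: d))           # fresh delay
    shift = {("a", "c"): 0.3}                                    # made-up aging
    print(aged_module_delay(edges, lambda u, v, d: d + shift.get((u, v), 0.0)))
```

In this example the workload-dependent shift on edge (a, c) moves the critical path from S-b-c-T to S-a-c-T, which is why every edge belonging to a PCP has to stay in the model.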
The timing model [Lorenz et al., 2010a,b] is a gray-box model, because it takes the
internal structure of the module into account. It is as accurate as an aging analysis on
gate level, but it is much faster. The speed-up of the timing model depends on how far
the timing graph could be reduced. The results show a mean speed-up of 30×.

Figure 5.13.: Distribution of circuit delay degradation for 1000 workload samples. The
dotted line is the worst-case degradation when it is assumed that all PMOS transistors
degrade maximally. (Axes: circuit delay degradation [%] from 0 to 10, counts from 0 to 200.)

5.5.2. Monitoring of aging circuits


A circuit gets slower and may fail when it ages. The amount of degradation depends on
the operating conditions over lifetime and on the workload. If those conditions are not
(precisely) known in the design phase, then the exact degradation of the circuit cannot
be determined and the circuit may fail.
The histogram in Figure 5.13 shows the distribution of the delay degradation of circuit
c7552 from the ISCAS’85 benchmark circuits for 1000 random workload samples. The
degradation was obtained by an aging analysis on gate level. The degradation of the
circuit delay ranges from just 3.0 % up to 8.6 % of the fresh circuit delay for a 90 nm
technology.
If the workload of a circuit is known in advance, the circuit can be optimally designed.
Otherwise, if the workload is not (exactly) known, a worst-case design must be chosen
to be certain that the circuit doesn’t fail during its specified lifetime. This makes the
product less competitive since area, power and performance are wasted. For instance,
c7552 from Figure 5.13 might be designed for a worst-case degradation of 9 % (dotted
line) but the actual degradation might be just 4 %.
A smarter way to deal with uncertain workload conditions was first proposed in [Agarwal
et al., 2007]. The actual degradation of the circuit is periodically
monitored during its lifetime and the system can take countermeasures if it detects that
the degradation is too large. This facilitates a better-than-worst-case design style with
smaller guard bands. It makes the product more competitive since it need not be assured
at design time that the circuit works correctly for rare workload conditions that it will
quite likely not experience during its specified lifetime.


Delay fault testing


Existing methods to measure circuit degradation rely on single transistors [Reisinger
et al., 2006], a generic test structure [Tschanz et al., 2009], or replicas of critical paths
[Hofmann et al., 2010]. They have the drawback of not considering the actual workload
of the circuit. If, on the other hand, razor flip-flops [Das et al., 2009] are used to detect
delay faults, it is not assured that the current critical path is sensitized before it has
degraded too much and the circuit fails.
The only reliable way to determine the degradation of the circuit is to measure the
path delay of the current critical path. However, the current critical path is unknown,
since it depends on the workload (e.g., c7552 had 46 different critical paths for 1000
workload samples). Therefore, the path delay of all PCPs must be determined.
A delay fault test is used to test the combinational logic of a sequential circuit. It
consists of two vectors V1 and V2. An enhanced-scan test [Bushnell and Agrawal, 2000] is
assumed (see Figure 5.14), which allows two arbitrary vectors to be applied to the
combinational logic. For this purpose, a normal-scan circuit has to be equipped with
additional hold latches after the scan flip-flops. These latches hold vector V1 while V2
is read in by the scan chain. Only in this way can V1 and V2 be applied directly one after
another at speed (at the operating frequency of the circuit). The purpose of V1 is to
sensitize the path under test (set the side inputs to non-controlling values). The
transition from V1 to V2 activates the path by initiating the appropriate transition at
the beginning of the path under test. One clock period later, the resulting output of the
combinational logic is stored in the receiving flip-flops and compared to the target
value to check for a delay fault.
For this application it is not enough to know whether a path still fulfills the timing
specification or not. It is crucial to know in advance that a path will shortly fail.
Therefore, the delay fault test must be performed not at speed, but at a slightly shorter
clock period. The shorter clock period for the test depends on how often the PCPs are
tested and on the desired guard band between the detection of a degraded path and an
actual failure of the circuit.

How the system can react


A controller initiates the delay test and reacts when a path degrades too much. It can
be implemented in software, since the circuit just has to be tested every several weeks
at most. Hence, the proposed aging monitor requires relatively little area overhead,
especially when an enhanced-scan design is already available.
Although the focus of this chapter is on identifying PCPs, here are several alternatives
for how a system can react when a circuit degrades too much:
• The system reduces the clock frequency or increases the supply voltage to
compensate for the degraded circuit delay [Mintarno et al., 2010].

• The degraded circuit can be replaced by another equivalent circuit. Due to the
recovery of the threshold voltage drift caused by NBTI, the degraded circuit
may also be used again after some recovery time [Sylvester et al., 2006].

• Sometimes it is enough to know that a circuit may fail. In probabilistic CMOS
[Chakrapani et al., 2006], a recent research area, faults in CMOS circuits are
accepted. For instance, a degraded processor core can be used for less critical
tasks (audio and video applications are quite fault-tolerant).

• By testing all PCPs, the system does not only know that a circuit degrades but it
also knows which paths of that circuit are too slow. Hence, the circuit is just too
slow for several input vectors, which have to be avoided.

Figure 5.14.: Enhanced-scan design. The standard scan design is extended by hold
latches. Thereby, the first delay test vector V1 is latched by the hold
latches while the second delay test vector V2 is read into the scan chain.

Extensions for the methods to identify PCPs


When the PCPs are used for delay tests, some differences compared to the method
introduced in Section 5.2 have to be considered. First, the PCPs must be sensitizable
and no path may be removed from the PCPs due to another PCP that is not sensitizable.
Second, unlike for timing model generation, the clock period of the circuit is known and
can be used to reduce the number of PCPs further. Third, a final reduction step is
introduced that enumerates the paths.
Those three differences are discussed in more detail in the following three subsections.

Testability of paths
When nodes or edges are removed, it is crucial that all remaining PCPs are testable,
otherwise it might not be possible to determine the degradation of the current critical
path. This has to be considered in the arrival time and in the delay to sink reduction
step. To remove an edge from the timing graph, it is checked whether a path segment A
has a larger delay than a path segment B. An edge is only removed if the path segment
with the greater delay is statically sensitizable. In fact, it is enough if there is a path
segment which is testable and has a greater delay than path segment A to remove the
edge. It does not necessarily have to be the path segment B.
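
The removal rule described above can be sketched in a few lines. This is a minimal illustration and not the implementation used in this work: the `Segment` record, its field names and the example delays are hypothetical, and it is assumed that one segment counts as slower than another when its lower delay bound exceeds the other's upper delay bound.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    name: str
    d_min: float        # lower bound of the segment delay (assumed: fresh delay)
    d_max: float        # upper bound of the segment delay (assumed: maximal aged delay)
    sensitizable: bool  # outcome of a static sensitizability check, assumed to be given

def may_remove_edge(seg_a, competitors):
    """Return True if some testable competing segment is always slower than seg_a."""
    return any(c.sensitizable and c.d_min > seg_a.d_max for c in competitors)

a = Segment("A", 1.0, 1.2, True)
b = Segment("B", 1.3, 1.5, False)   # slower than A, but not testable
c = Segment("C", 1.25, 1.4, True)   # slower than A and testable

print(may_remove_edge(a, [b]))      # False: the only slower segment cannot be tested
print(may_remove_edge(a, [b, c]))   # True: segment C justifies the removal
```

The second call mirrors the remark that the justifying segment does not have to be segment B itself.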

Considering the clock period of the circuit


In contrast to the timing model generation for modules, the specified clock period is
known for this application. The clock period determines the required time at T (used
in the slack and path delay reduction steps). If the resulting required time is larger than
the minimal aged circuit delay obtained in Section 5.3.4, the number of PCPs can be
further reduced.
If, for instance, the clock period is set to 150 % of the critical path delay of the fresh
circuit, there will not be any paths that must be tested, because for the technologies and
aging effects that are considered, the path delay does not degrade by more than 50 %.
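
The margin argument can be checked with simple arithmetic. The sketch below assumes the 50 % bound on the path delay degradation stated above; the function name and the sample path delays are illustrative only.

```python
MAX_DEGRADATION = 0.5  # assumed upper bound on the relative path delay degradation

def paths_to_test(fresh_delays, clock_period, max_degradation=MAX_DEGRADATION):
    """Return the fresh delays of all paths whose worst-case aged delay
    can exceed the clock period."""
    return [d for d in fresh_delays if d * (1.0 + max_degradation) > clock_period]

fresh = [1.00, 0.95, 0.80, 0.60]   # hypothetical fresh path delays in ns
d_crit = max(fresh)

print(paths_to_test(fresh, 1.50 * d_crit))  # []: with a 150 % clock period nothing must be tested
print(paths_to_test(fresh, 1.25 * d_crit))  # [1.0, 0.95]
```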

Path-based reduction step


In Section 5.2, it was argued that path-enumerative methods are too inefficient to handle
complex digital circuits. However, if the remaining PCPs are too numerous to enumerate,
they are also too numerous to test. Therefore, at least the final reduction
step that works on an already reduced TG may be path-based. This final reduction step
does not remove any nodes or edges; it just identifies those paths in the reduced TG that
have a path delay greater than the fresh critical path delay. This is done by enumerating all
paths with respect to the path delay in descending order. The enumeration is stopped
when the first path has a maximal aged path delay less than the required time at T (see
Figure 5.15).

Figure 5.15.: Path-based reduction step
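
One possible way to enumerate paths in descending order of delay is a best-first search that ranks partial paths by their delay so far plus the longest remaining delay to the sink. The sketch below is an illustration under assumptions, not the implementation used in this work: the graph encoding, the function name and the example delays are hypothetical, and the edge delays are taken to be maximal aged delays.

```python
import heapq
from itertools import count

def enumerate_paths_descending(edges, source, sink, required_time):
    """Yield (delay, path) for all source-to-sink paths whose delay is at least
    required_time, slowest path first.

    edges maps a node to a list of (successor, delay); the graph must be acyclic.
    """
    to_sink = {}

    def longest(node):
        # Longest remaining delay from node to the sink (memoised).
        if node == sink:
            return 0.0
        if node not in to_sink:
            to_sink[node] = max(
                (d + longest(s) for s, d in edges.get(node, [])),
                default=float("-inf"),
            )
        return to_sink[node]

    tie = count()  # tie-breaker so that the heap never compares paths
    heap = [(-longest(source), next(tie), 0.0, (source,))]
    while heap:
        neg_bound, _, delay, path = heapq.heappop(heap)
        if -neg_bound < required_time:
            return  # all remaining paths are faster than the required time
        node = path[-1]
        if node == sink:
            yield delay, path
            continue
        for succ, d in edges.get(node, []):
            bound = delay + d + longest(succ)
            if bound != float("-inf"):  # skip branches that never reach the sink
                heapq.heappush(heap, (-bound, next(tie), delay + d, path + (succ,)))

example_edges = {
    "S": [("a", 2.0), ("b", 1.0)],
    "a": [("T", 3.0), ("b", 1.0)],
    "b": [("T", 2.5)],
}
for delay, path in enumerate_paths_descending(example_edges, "S", "T", 4.0):
    print(delay, "-".join(path))
```

In the example, the path S-b-T with a delay of 3.5 is never reported, because the search stops as soon as the best remaining candidate falls below the required time of 4.0.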

State of the art of path enumeration techniques

Finding paths for the purpose of testing a circuit has been a research topic for quite some time.
The first approaches just considered the nominal gate delay [Li et al., 1989; Sharma and
Patel, 2002]. Their goal was to identify paths to test all gates for delay faults. Later,
process variation was considered [Lu et al., 2005; Zolotov et al., 2010] by identifying
critical paths for all process space conditions.
Two other publications are concerned with testing aged circuits. Wang et al. [2007a]
introduce path-enumerative methods to identify paths which exceed the clock period
under worst-case aging conditions. An optimization problem is set up to obtain these
maximal aged path delays. However, a mistake was made by not considering that NBTI
just degrades every other gate (those with a falling input transition). Without this
mistake, the maximal aged path delay would be equal to the upper bound of the path
delay intervals, as it is discussed in Section 5.3.3.
Baba and Mitra [2009] proposed a method to identify the paths of an aged circuit that
must be tested. Gate delay intervals are defined and methods are introduced to remove
nodes and edges. The approach proposed here improves significantly on theirs in the following points:

• The impact of process variation is considered when the PCPs are determined (Sec-
tion 5.4).


• The correlation of gate delays along a path is taken into account (Section 5.3). In
[Baba and Mitra, 2009] path delays are always intervals.

• Baba and Mitra [2009] determine the PCPs first and check in a separate step
whether those paths are sensitizable. If a path is found not to be sensitizable,
it must be checked whether the removal of other paths from the PCPs was
unjustified.

• An aging-aware STA has to be performed only once. In [Baba and Mitra, 2009] an
STA has to be run again whenever a common edge is detected (Section 5.2.5).

• The final reduction step to determine the PCPs for testing aged circuits is enumer-
ative, which allows the removal of some additional paths from the already reduced
TG.

• In the results section it is shown that by calculating the number of PCPs for
different lifetimes the number of paths that must be tested in the beginning of the
lifetime can significantly be reduced.

5.6. Results
The proposed approach is tested with ISCAS’85 and ITC’99 benchmark circuits. The
circuits are synthesized with an industrial 90 nm cell library. To generate the aging-aware
gate models, single-stage gates (inverters, NOR and NAND gates with 2 to 4
inputs) from the library are characterized. The operating conditions are 1.32 V and 125 °C,
with a specified lifetime of 10 years. Those harsh conditions result in a large maximal
threshold voltage drift (17 % of the nominal threshold voltage) and, therefore, lead to large
intervals for the gate delays.
The benchmark circuits are used for the following investigations:

• The minimal aged circuit delay Daged,min is determined.

• It is checked how far the timing graph can be reduced. This is relevant for the
aging-aware timing model.

• And the number of PCPs for the circuits is obtained. The PCPs are important for
testing the circuits during the lifetime.

5.6.1. Minimal aged delay


Table 5.1 shows the fresh circuit delay, the maximal aged circuit delay and the minimal
aged circuit delay of the ISCAS’85 circuits. The maximal aged delay Daged is simply
obtained by adding up the maximal aged gate delays. The minimal aged path delay
(Daged,min) is the result of a minimization of the 1000 slowest aged paths of the circuit,
as discussed in Section 5.3.4.


         Dfresh [s]    Daged [s]    Daged,min [s]    ∆Daged,min [%]
c17      1.16e-10      1.27e-10     1.16e-10         0
c432     1.17e-09      1.28e-09     1.17e-09         0
c499     1.27e-09      1.39e-09     1.29e-09         17.3
c880     9.12e-10      9.93e-10     9.19e-10         8.63
c1355    1.43e-09      1.57e-09     1.44e-09         7.03
c1908    1.88e-09      2.06e-09     1.88e-09         0
c2670    9.83e-10      1.06e-09     9.83e-10         0
c3540    1.63e-09      1.78e-09     1.66e-09         19.3
c5315    1.8e-09       1.97e-09     1.82e-09         9.96
c6288    4.93e-09      5.37e-09     4.98e-09         13.3
c7552    1.68e-09      1.83e-09     1.69e-09         8.8

Table 5.1.: Minimal aged circuit delay (the delay values are printed in seconds)

The minimal delay degradation is defined as follows:

\[
  \Delta D_{\mathrm{aged,min}} = \frac{D_{\mathrm{aged,min}} - D_{\mathrm{fresh}}}{D_{\mathrm{aged}} - D_{\mathrm{fresh}}} \tag{5.32}
\]

A ∆Daged,min of 0 % means that Daged,min is equal to Dfresh and a ∆Daged,min of
100 % would mean that Daged,min is equal to Daged. The values of ∆Daged,min lie between
0 % and 19 %. There are several reasons why those values are below the values reached
for an inverter chain (see Section 5.3.2):

• The linearization of the gate delay dependencies results in a smaller minimal aged
delay (50 % compared to 85 % for the inverter chain).

• The gate types along a given path matter. As shown in Section 5.3.4, only
the constraints for inverters and NOR gates prevent the case in which none of
the gates along the path ages (if a given path consists only of NAND gates,
then Daged,min is equal to Dfresh).

• Furthermore, the difference between the fresh path delay for a rising input
transition, D(Pr)fresh, and for a falling input transition, D(Pf)fresh, is relevant.
The maximum of D(Pr)aged and D(Pf)aged is minimized. If the difference is too
large, the maximum does not change, since the SPs are chosen in a way that only
the smaller of both path delays is increased.
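
As a quick plausibility check, Equation 5.32 can be evaluated for individual rows of Table 5.1. The sketch below uses the delay values as printed in the table; since those values are rounded, the result can deviate slightly from the percentage listed in the last column. The function and variable names are illustrative.

```python
def minimal_delay_degradation(d_fresh, d_aged, d_aged_min):
    """Equation 5.32 in percent: 0 % means D_aged,min equals D_fresh,
    100 % means D_aged,min equals D_aged."""
    return 100.0 * (d_aged_min - d_fresh) / (d_aged - d_fresh)

# (D_fresh, D_aged, D_aged,min) as printed in Table 5.1
table_5_1 = {
    "c432": (1.17e-09, 1.28e-09, 1.17e-09),
    "c880": (9.12e-10, 9.93e-10, 9.19e-10),
}

for name, delays in table_5_1.items():
    print(name, round(minimal_delay_degradation(*delays), 2))
```

For c880 the rounded inputs give a value close to the 8.63 % listed in the table.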

5.6.2. Node and edge reduction


For the aging-aware timing model, the achievable speed-up compared to an aging-aware
analysis on gate level depends on how far the timing graph can be reduced. The reduction
ratio for nodes (edges) is defined as follows:

\[
  \text{reduction ratio} = \frac{\text{initial nodes (edges)} - \text{reduced nodes (edges)}}{\text{initial nodes (edges)}} \tag{5.33}
\]

          Initial         Reduction
Circuit   Nodes   Edges   Nodes [%]  Edges [%]  Time [s]  Speed-up
c17       26      40      57.7       70         0.0       3.6 ×
c432      526     890     79.8       86.3       0.7       22.1 ×
c499      1152    2010    66         77.4       1.7       17.2 ×
c880      998     1750    86         90.8       0.9       28.7 ×
c1355     1262    2104    62.2       73.7       2.3       6.7 ×
c1908     930     1712    80         87.4       1.5       16.9 ×
c2670     1732    2994    95.4       97         2.9       55.0 ×
c3540     1912    3496    71.3       80.7       2.5       30.6 ×
c5315     3326    6200    93.8       96.2       6.5       43.4 ×
c6288     5268    9968    26.8       43.7       17.5      6.7 ×
c7552     4900    8348    90.3       93.5       9.0       96.0 ×
Ø                         73.6       81.5       4.1       29.7 ×

Table 5.2.: Reduction ratios of nodes and edges

Table 5.2 shows the initial number of nodes and edges, the achieved reduction ratios,
the time the characterization took and the speed-up compared to an aging-aware TA on
gate level. On average, the number of nodes could be reduced by 74 % and the number
of edges by 82 %, which results in a speed-up of 30 ×. This is plausible because the
runtime depends on the number of nodes as well as on the number of edges (see
footnote 4).
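
The reduction ratio of Equation 5.33 and the rough runtime estimate of footnote 4 can be reproduced as follows. This is a sketch under assumptions: the function names are illustrative, the remaining node and edge counts for c17 are inferred from the reported percentages rather than taken from the table, and the speed-up model is the simple product model used in the footnote.

```python
def reduction_ratio(initial, reduced):
    """Equation 5.33: fraction of nodes (or edges) that was removed."""
    return (initial - reduced) / initial

def estimated_speedup(node_reduction, edge_reduction):
    """Speed-up if the runtime scales with (#nodes) * (#edges), as assumed in footnote 4."""
    return 1.0 / ((1.0 - node_reduction) * (1.0 - edge_reduction))

# c17 in Table 5.2 starts with 26 nodes and 40 edges. The remaining 11 nodes and
# 12 edges are inferred here from the reported reductions of 57.7 % and 70 %.
print(round(100 * reduction_ratio(26, 11), 1))   # 57.7
print(round(100 * reduction_ratio(40, 12), 1))   # 70.0

# Footnote 4: roughly 1/5 of the nodes and 1/5 of the edges remain.
print(round(estimated_speedup(0.8, 0.8), 1))     # 25.0
```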

5.6.3. Possible critical paths


In Table 5.3 results are given for the proposed approach and, as a comparison, for
the approach described by Baba and Mitra [2009]. Just those benchmark circuits with
more than 500 gates are shown. In the state-of-the-art approach all reduction steps are
performed except the path enumeration step and the minimal aged delay is not computed
for the slack and path delay reduction step.
For the determination of PCPs, it is assumed that the specified clock period of the
circuits is equal to the fresh circuit delay Dcrit,fresh. This is a worst-case assumption,
since it results in the largest possible number of PCPs. In a real product, there would be
a safety margin between Dcrit,fresh and the specified clock period. Without this safety
margin it is almost inevitable that the system has to react because the circuit degrades
too much.

Footnote 4: It is assumed that the runtime depends linearly on the number of edges and
nodes of the timing graph: Roughly 1/5 of the nodes and 1/5 of the edges remain. Hence,
the resulting runtime is 1/5 · 1/5 = 1/25 of the initial runtime, i.e., the speed-up is 25 ×.

As Baba and Mitra [2009] do not take process variations (PVs) into account, the results
are first compared without considering PVs. The proposed approach reduces the
number of PCPs compared to Baba and Mitra [2009] by a factor of 2.7 on average (column
Impr. in Table 5.3). For all circuits except c6288 and b19, the number of PCPs
is reasonably small, so that it is feasible to test them all. For circuits c6288 and b19, a
traditional worst-case design must be used. For the other circuits a better-than-worst-case
design can be used by testing all identified PCPs periodically. It seems that the
number of identified PCPs depends more on the TG structure than on the pure
circuit size: circuit c6288 with just 2600 gates has over 10^12 PCPs, whereas b18 with
over 80 000 gates has just 236 PCPs.
The runtime to determine all PCPs of a circuit when PVs are considered is 30 min on
average on a workstation with a 2.4 GHz CPU and 8 GB RAM. Without circuit b19,
which took about 7 h, the average runtime of the remaining circuits is 10 min.
Finally, the last two columns show the results when all reduction steps are performed
and local PVs are considered. The δ, which defines when a timing quantity is considered
greater than another quantity, is set to 0.9. Hence, a timing quantity is considered
greater than another one when the probability for this is greater than 90 %. For the moment
just one source of variation is considered, namely the purely random variation of the
threshold voltage, xr, which is set to 10 % of the nominal Vth. However, as described in
Visweswariah et al. [2006], an arbitrary number of varying parameters can be considered.
Due to the uncertainty of the gate delays introduced by PVs, the number of PCPs that
have to be tested increases.
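
The role of δ can be illustrated with a small numerical sketch. It assumes, as a simplification, that the two timing quantities are independent Gaussian random variables; a statistical timing analysis in canonical form would additionally track correlations between them. The function names and the sample means and standard deviations are hypothetical.

```python
import math

def prob_greater(mean_a, sigma_a, mean_b, sigma_b):
    """P(A > B) for two independent Gaussian timing quantities A and B."""
    sigma = math.hypot(sigma_a, sigma_b)
    if sigma == 0.0:
        return 1.0 if mean_a > mean_b else 0.0
    z = (mean_a - mean_b) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def is_greater(mean_a, sigma_a, mean_b, sigma_b, delta=0.9):
    """A is considered greater than B only if P(A > B) exceeds delta."""
    return prob_greater(mean_a, sigma_a, mean_b, sigma_b) > delta

print(is_greater(1.10, 0.03, 1.00, 0.04))  # True, the probability is about 0.98
print(is_greater(1.03, 0.03, 1.00, 0.04))  # False, the probability is about 0.73
```

With δ = 0.9, a quantity that is larger only on average is not treated as greater, which is why fewer elements can be removed and more PCPs remain when PVs are considered.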
More detailed results for all benchmark circuits are given in Appendix B. There, it
is shown how far the number of PCPs can be reduced by the individual reduction steps.
The test time can be further reduced by determining sets of PCPs for different time
periods (see Table 5.4). The PCPs for circuit c3540 are calculated every 2 years until
the specified lifetime is reached. Hence, for the first 2 years just 175 paths have to be
checked. The number of paths to be checked increases with time and in the last two
years all 1318 have to be checked.
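
A monitoring routine could select the number of paths to test from such a schedule. The sketch below uses the values of Table 5.4 for circuit c3540 and assumes that the set determined for a given year covers all ages up to and including that year; the function name and the error handling are illustrative.

```python
from bisect import bisect_left

# Table 5.4: number of PCPs of c3540 that must be checked up to a given age
YEARS = [2, 4, 6, 8, 10]
PCPS = [175, 396, 773, 1114, 1318]

def pcps_to_test(age_years):
    """Number of PCPs to test at the given circuit age in years."""
    if not 0 <= age_years <= YEARS[-1]:
        raise ValueError("age outside the specified lifetime")
    return PCPS[bisect_left(YEARS, age_years)]

print(pcps_to_test(1.5))  # 175
print(pcps_to_test(9))    # 1318
```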

                              Mitra     Proposed approach
           Initial            process variations (PVs) are neglected  with PVs
Circuit    # Gates  # Paths   # PCPs    # PCPs    Impr.  runtime [s]  # PCPs    runtime [s]
c499       534      452608    1487      375       4.0    24.79        696       65.25
c1355      589      522368    3376      2224      1.5    41.39        3074      117.91
c2670      708      31286     21        21        1.0    12.43        46        23.42
c3540      905      4248254   15276     1345      11.4   57.13        2608      173.93
c5315      1484     738816    1568      899       1.7    33.09        2012      94.4
c6288 (a)  2601     5.1e+16   6.8e+12   4.1e+12   1.6    559.96       9.7e+12   661.31
c7552      2242     448564    3173      522       6.1    31.02        942       79.07
b04        540      185324    120       120       1.0    17.08        378       39.64
b05        1156     189666    3295      981       3.4    20.57        1672      61.59
b07        615      4046      10        10        1.0    7.03         24        14.5
b11        1050     7156      21        16        1.3    10.32        25        21.77
b12        1497     20020     237       144       1.6    13.18        247       32.79
b14        5718     2.0e+08   52170     10948     4.8    880.73       49004     1340.81
b15        10236    2.0e+07   42089     6387      6.6    176.25       14116     649.8
b17        24840    6.1e+07   275       173       1.6    305.71       238       782.09
b18 (a)    83679    2.6e+26   236       236       1.0    1589.8       608       4642.54
b19 (a)    144747   4.7e+22   1.0e+19   1.0e+19   1.0    3737.23      2.4e+19   25182.16
b20 (a)    13097    6.7e+12   6138      5283      1.2    452.09       9998      890.82
b21        13052    7.2e+12   3796      3452      1.1    470.35       5755      771.66
b22        19731    6.7e+12   2436      1489      1.6    397.52       2828      871.6
Average                                           2.7    7.4 min                30 min

Table 5.3.: Comparison of the proposed approach to the approach from Baba and Mitra
[2009]. Shown are the initial number of gates and paths in the circuit, the
number of PCPs, the improvement in the number of PCPs of our approach
compared to Baba and Mitra [2009] and the runtimes with and without
considering process variations.
(a) The number of PCPs for those circuits is determined without checking static
sensitizability because the BDDs for the circuits were too large and could not be set up.
However, static sensitizability checking should be possible with a SAT-solver based
approach (e.g. Drechsler et al. [2008]).

Time [y]   2     4     6     8      10
# PCPs     175   396   773   1114   1318

Table 5.4.: Number of PCPs over time of circuit c3540

5.7. Summary
Aging is one of the main factors limiting the reliability of nano-scale circuits. The
degradation of a circuit strongly depends on operating conditions and workload. If
those conditions are not (yet) known, it is hard to accurately predict the degraded
timing behavior of a circuit, which is given by the delay of the critical path. A method
is proposed to identify all paths of a circuit that may become critical due to degradation,
the so-called possible critical paths.
First, this is done by introducing intervals for gate delays, since the exact delays are
unknown. Later, it is shown that those intervals for the path delay are sometimes too
pessimistic and an efficient method to calculate a lower bound for the path delay, called
minimal aged path delay, is presented. A way to incorporate process variation when the
PCPs are determined is introduced as well.
Two applications for PCPs are given: An aging-aware timing model for modules and
the usage of PCPs to monitor a circuit in the field. The results show that the timing
model has a mean speed-up of 30 × compared to a timing analysis on gate level and the
number of paths that must be tested can be reduced by 2.7 × compared to a state-of-
the-art approach.

6. Conclusion
Aging leads to a time-dependent change of device parameters. Unlike other effects that
cause a variation of ICs, aging effects have not received much attention yet. However,
due to the ongoing miniaturization, the degradation of the circuit performance caused
by aging effects increases. Furthermore, the performance gain from moving from one
technology to the next decreases. Hence, generous safety margins are no longer affordable,
since they make the transition to the latest technology generations uneconomical. To
enable continued scaling, new design techniques are required that allow the safety
margins to be reduced. The contributions of this thesis are very accurate analysis and
monitoring methods to determine the timing degradation of aged circuits.
First, the two dominant drift-related aging effects were investigated. It was shown
how the parameter drift can be modeled and which impact those drifts have on the gate
performances. It turned out that the gate delay as well as the output slope is increased.
However, the power dissipation of a gate is not affected or even slightly reduced.
An aging analysis flow on gate level capable of determining the impact of the two
dominant drift-related aging effects on circuit timing was developed and implemented.
The centerpiece of the analysis flow is an aging-aware gate model called AgeGate.
AgeGate consists of a canonical gate model, technology-specific degradation equations, and
information about the internal gate structure. In contrast to existing aging-aware gate
models, AgeGate takes both NBTI and HCI into account, computes not just an aged
gate delay but an aged output slope as well, and, last but not least, considers individual
transistor drifts. The results show that both aging effects are relevant, that not calculating
an aged output slope underestimates the performance degradation by 24 % on average,
and that not computing individual transistor drifts overestimates the degradation by 20 % on
average.
The continued scaling requires that the design is done on higher and higher levels of
abstraction. Based on AgeGate, an aging-aware timing model for modules was proposed.
The basic idea of the timing model is to determine all paths of a module that might
become critical due to aging, the possible critical paths. This is done by removing all
elements of a timing graph that do not belong to a possible critical path. This way, the
timing model is as accurate as an aging-aware timing analysis on gate level, but a mean
speed-up of 30 × (maximum speed-up 96 ×) could be achieved.
Aging is an increasing reliability concern in advanced technologies. The timing
degradation of a circuit strongly depends on the workload and the operating conditions over
lifetime. However, these factors are often unknown during the design of a circuit. A
method that monitors the circuit by testing the delay of all possible critical paths was
introduced. This way, countermeasures need only be taken if the circuit degrades too
much. The circuit is more competitive, since it need not be designed for worst-case
conditions. Compared to a state-of-the-art approach, the number of possible critical paths
could be reduced by a factor of 2.7. Furthermore, process variation can be considered
for the identification of possible critical paths.

A. Constraints for NAND and NOR gates
First, the constraint for a NAND gate with two inputs is derived. The $SP_o$ at the output
of the NAND gate is:

\[ SP_o = 1 - SP_i \cdot SP_j \tag{A.1} \]

Solving Equation A.1 for the side input $SP_j$:

\[ SP_j = \frac{1 - SP_o}{SP_i} \tag{A.2} \]

Taking into account that $SP_j$ is between 0 and 1 gives the following constraint:

\[ 0 \le SP_j \le 1 \tag{A.3} \]
\[ 0 \le \frac{1 - SP_o}{SP_i} \le 1 \tag{A.4} \]
\[ SP_o \ge 1 - SP_i \tag{A.5} \]

Next, the constraint for a NAND gate with three inputs is derived:

\[ SP_o = 1 - SP_i \cdot SP_j \cdot SP_k \tag{A.6} \]
\[ SP_k = \frac{1 - SP_o}{SP_i \cdot SP_j} \tag{A.7} \]

$SP_k$ must also be between 0 and 1:

\[ 0 \le SP_k \le 1 \tag{A.8} \]
\[ 0 \le \frac{1 - SP_o}{SP_i \cdot SP_j} \le 1 \tag{A.9} \]
\[ SP_o \ge 1 - SP_i \cdot SP_j \tag{A.10} \]

By considering that $SP_j$ is between 0 and 1, the lower bound of the inequality for $SP_o$
of a three input NAND gate is equal to the constraint for a NAND gate with two inputs:

\[ SP_o \ge 1 - SP_i \tag{A.11} \]

The constraint for a NAND gate with n inputs is equivalent.


Finally, it is shown that the constraint for a NOR gate with 3 (or n) inputs is equivalent
to the constraint for a two input NOR gate.

\[ SP_o = (1 - SP_i) \cdot (1 - SP_j) \cdot (1 - SP_k) \tag{A.12} \]
\[ SP_k = 1 - \frac{SP_o}{(1 - SP_i) \cdot (1 - SP_j)} \tag{A.13} \]
\[ 0 \le SP_k \le 1 \tag{A.14} \]
\[ 0 \le 1 - \frac{SP_o}{(1 - SP_i) \cdot (1 - SP_j)} \le 1 \tag{A.15} \]
\[ 0 \le \frac{SP_o}{(1 - SP_i) \cdot (1 - SP_j)} \le 1 \tag{A.16} \]
\[ SP_o \le (1 - SP_i) \cdot (1 - SP_j) \tag{A.17} \]

By considering that $SP_j$ is between 0 and 1, the upper bound of the inequality for $SP_o$
of a three (or n) input NOR gate is equal to the constraint for a two input NOR gate:

\[ SP_o \le 1 - SP_i \tag{A.18} \]
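
The derived constraints can be checked numerically. The sketch below reads SP as the signal probability of a net and assumes independent gate inputs, which is what the product forms of Equations A.1, A.6 and A.12 imply; the function names, the sampling grid and the rounding tolerance are illustrative choices.

```python
from itertools import product

def nand_sp(inputs):
    """Output SP of a NAND gate with independent inputs (Equations A.1 and A.6)."""
    p = 1.0
    for sp in inputs:
        p *= sp
    return 1.0 - p

def nor_sp(inputs):
    """Output SP of a NOR gate with independent inputs (Equation A.12)."""
    p = 1.0
    for sp in inputs:
        p *= 1.0 - sp
    return p

grid = [k / 10 for k in range(11)]
eps = 1e-12  # tolerance for floating-point rounding

for n in (2, 3, 4):
    for sps in product(grid, repeat=n):
        sp_i = sps[0]
        assert nand_sp(sps) >= 1.0 - sp_i - eps   # constraints (A.5) and (A.11)
        assert nor_sp(sps) <= 1.0 - sp_i + eps    # constraint (A.18)

print("constraints hold on the sampled grid")
```

The check passes for gates with two, three and four inputs, in line with the statement that the constraint does not depend on the number of inputs.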

B. More detailed results for PCP
identification
Table B.1 shows the number of PCPs and the corresponding runtimes for all ISCAS’85
and ITC’99 circuits. The reduction steps, as discussed in Section 5.2, are applied to the
initial TG one after another. First, the slack reduction step is performed. Next, the
path delay reduction step is applied to the already reduced TG. Then, the arrival time
and the delay to sink reduction steps are performed. The column “All reduction steps
considering minimum aged circuit delay” shows the resulting number of PCPs when
Dcrit,fresh is replaced by Dcrit,aged,min (which is relevant for the slack, the path delay
and the path-based reduction steps) and all reduction steps are performed again. Finally,
the last column shows the number of PCPs and the runtime when the same reduction
steps as in the previous column are performed but process variations are considered as
well.

           Initial            Slack reduction     Path delay          Arrival time and    All reduction steps All reduction steps
                              step                reduction step      delay to sink       considering minimum considering minimum
                                                                      reduction steps     aged circuit delay  aged circuit delay
                                                                                                              and process variations
Circuit    # Gates  # Paths   # PCPs ([s])        # PCPs ([s])        # PCPs ([s])        # PCPs ([s])        # PCPs ([s])
c17        7        18        6 (0.00)            6 (0.00)            3 (0.00)            3 (0.00)            5 (0.19)
c432       226      123652    22378 (0.17)        13126 (0.01)        157 (0.05)          157 (0.08)          167 (24.62)
c499       534      452608    69517 (0.99)        29518 (0.05)        1487 (1.24)         375 (2.53)          696 (65.25)
c880       438      16956     1650 (0.58)         1007 (0.01)         98 (0.05)           74 (0.08)           118 (18.92)
c1355      589      522368    72216 (0.74)        40432 (0.05)        3376 (2.55)         2224 (4.90)         3074 (117.91)
c1908      431      1.5e+06   138412 (0.86)       130736 (0.02)       4596 (16.49)        2091 (23.44)        5407 (394.14)
c2670      708      31286     2402 (1.98)         1068 (0.01)         21 (0.02)           21 (0.04)           46 (23.42)
c3540      905      4.2e+06   923326 (0.80)       357372 (0.06)       15276 (0.45)        1345 (0.45)         2608 (173.93)
c5315      1484     738816    82572 (4.44)        75368 (0.02)        1568 (0.06)         899 (0.11)          2012 (94.40)
c6288 (a)  2601     5.1e+16   3.5e+16 (0.92)      2.3e+16 (0.33)      6.8e+12 (6.99)      4.1e+12 (10.92)     9.7e+12 (661.31)
c7552      2242     448564    18310 (6.08)        9480 (0.03)         3173 (0.12)         522 (0.26)          942 (79.07)
b02        28       72        4 (0.02)            3 (0.00)            3 (0.00)            3 (0.00)            3 (0.55)
b03        233      1632      164 (0.20)          131 (0.01)          80 (0.03)           80 (0.17)           96 (9.97)
b04        540      185324    17172 (1.45)        15336 (0.01)        120 (0.03)          120 (0.05)          378 (39.64)
b05        1156     189666    19653 (1.48)        12392 (0.04)        3295 (0.17)         981 (0.53)          1672 (61.59)
b06        74       238       23 (0.03)           18 (0.00)           10 (0.01)           7 (0.02)            16 (2.08)
b07        615      4046      37 (1.10)           35 (0.01)           10 (0.01)           10 (0.03)           24 (14.50)
b08        155      2632      176 (0.13)          91 (0.00)           9 (0.01)            9 (0.02)            6 (11.08)
b09        167      1858      270 (0.18)          200 (0.00)          83 (0.04)           59 (0.08)           40 (7.78)
b10        213      1790      183 (0.17)          143 (0.00)          45 (0.02)           42 (0.07)           55 (9.33)
b11        1050     7156      106 (2.17)          87 (0.00)           21 (0.02)           16 (0.05)           25 (21.77)
b12        1497     20020     708 (2.66)          523 (0.02)          237 (0.09)          144 (0.35)          247 (32.79)
b13        307      1216      3 (0.45)            3 (0.00)            2 (0.00)            2 (0.01)            5 (7.89)
b14        5718     2e+08     6.1e+06 (24.44)     5.2e+06 (0.14)      52170 (32.75)       10948 (46.50)       49004 (1340.81)
b15        10236    2e+07     1.7e+06 (28.39)     894146 (0.38)       42089 (2.29)        6387 (10.95)        14116 (649.80)
b17        24840    6.1e+07   10898 (212.04)      7608 (0.04)         275 (0.21)          173 (0.63)          238 (782.09)
b18 (a)    83679    2.6e+26   2.6e+07 (1001.56)   2.4e+07 (0.06)      236 (0.14)          236 (0.23)          608 (4642.54)
b19 (a)    144747   4.7e+22   2.2e+22 (2133.45)   1.8e+22 (2.48)      1e+19 (67.52)       1e+19 (228.69)      2.4e+19 (25182.16)
b20 (a)    13097    6.7e+12   2.7e+06 (82.07)     2.5e+06 (0.13)      6138 (0.99)         5283 (1.09)         9998 (890.82)
b21        13052    7.2e+12   7.6e+11 (61.67)     7.3e+11 (0.20)      3796 (1.05)         3452 (1.85)         5755 (771.66)
b22        19731    6.7e+12   519916 (114.18)     385166 (0.12)       2436 (0.69)         1489 (1.03)         2828 (871.60)

Table B.1.: Detailed results for the proposed reduction steps. Each reduction column gives
the number of PCPs and, in parentheses, the runtime in seconds.
(a) The number of PCPs for those circuits is determined without checking static
sensitizability because the BDDs for the circuits were too large and could not be set up.
However, static sensitizability checking should be possible with a SAT-solver based
approach (e.g. Drechsler et al. [2008]).

Bibliography
M. Agarwal, Bipul C. Paul, Ming Zhang, and Subhasish Mitra. Circuit Failure Prediction
and Its Application to Transistor Aging. In IEEE VLSI Test Symposium, pages 277–
286, May 2007.

M. A. Alam, H. Kufluoglu, D. Varghese, and S. Mahapatra. A comprehensive model for
PMOS NBTI degradation: Recent progress. Microelectronics Reliability, 47(6):853–862,
June 2007.

Charles J. Alpert, Anirudh Devgan, and Stephen T. Quay. Buffer Insertion for Noise
and Delay Optimization. IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, 18(11):1633–, November 1999.

Todd Austin, Valeria Bertacco, Scott Mahlke, and Yu Cao. Reliable Systems on Unreliable
Fabrics. IEEE Design and Test, 2008.

A. H. Baba and Subhasish Mitra. Testing for transistor aging. In IEEE VLSI Test
Symposium, pages 215 – 220, May 2009.

Thomas Baumann, Stefan Drapatz, Georg Georgakos, Karl Hofmann, and Christian
Pacha. Accelerating and Masking Properties of Transistor Degradation of Selected
Digital Circuit Topologies. Honey milestone report 3.1.2-q11, Infineon Technologies,
August 2010.

Manuel J. Bellido, Jorge Juan, and Manuel Valencia. Logic-Timing Simulation and the
Degradation Model. Imperial College Press, London, 2006.

D.R. Bild, G.E. Bok, and R. P. Dick. Minimization of NBTI performance degradation
using internal node control. In Design, Automation and Test in Europe (DATE), pages
148–153, 2009.

David Blaauw, Kaviraj Chopra, Ashish Srivastava, and Louis Scheffer. Statistical Timing
Analysis: From Basic Principles to State of the Art. IEEE Trans. on CAD of Integrated
Circuits and Systems, 4:589–607, 2008.

David T. Blaauw, Chanhee Oh, Vladimir Zolotov, and Aurobindo Dasgupta. Static
electromigration analysis for on-chip signal interconnects. IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, 22(1):39–48, January
2003.

Michael L. Bushnell and Vishwani D. Agrawal. Essentials of Electronic Testing. Kluwer
Academic Publisher, 2000.

Cadence. Reliability simulation in integrated circuit design. Technical report, Cadence
Design Systems, Inc., 2003.

Cadence. ECSM - Effective Current Source Model.
http://www.cadence.com/Alliances/languages/Pages/ecsm.aspx, 2007.

Lakshmi N. B. Chakrapani, Bilge E. S. Akgul, Suresh Cheemalavagu, Pinar Korkmaz,
Krishna V. Palem, and Balasubramanian Seshasayee. Ultra-efficient (embedded) SOC
architectures based on probabilistic CMOS (PCMOS) technology. In Design, Automation
and Test in Europe (DATE), pages 1110–1115, 2006.

A. P. Chandrakasan and R. W. Brodersen. Minimizing power consumption in digital
CMOS circuits. Proceedings of the IEEE, 83(4):498–523, April 1995.

Jifeng Chen, Shuo Wang, Nemat Bidokhti, and Mohammad Tehranipoor. A Framework
for Fast and Accurate Critical-Reliability Paths Identification. In IEEE North Atlantic
Test Workshop (NATW), May 2011.

Liang-Chi Chen, Sandeep K. Gupta, and Melvin A. Breuer. A new gate delay model
for simultaneous switching and its applications. In ACM/IEEE Design Automation
Conference (DAC), pages 289–294, New York, NY, USA, 2001. ACM.

Mihir Choudhury, Vikas Chandra, Kartik Mohanram, and Robert C. Aitken. Analytical
model for TDDB-based performance degradation in combinational logic. In Design,
Automation and Test in Europe (DATE), pages 423 – 428, 2010.

M. A. Cirit. Estimating dynamic power consumption of CMOS circuits. In IEEE/ACM
International Conference on Computer-Aided Design (ICCAD), 1987.

Philippe Coussy and Adam Morawiec. High-Level Synthesis from Algorithms to Digital
Circuits. Springer, 2008.

John Croix and Martin Wong. Blade and razor: cell and interconnect delay analysis
using current-based models. In ACM/IEEE Design Automation Conference (DAC),
pages 386–389, June 2003.

S. Das, C. Tokunaga, S. Pant, W. H. Ma, S. Kalaiselvan, K. Lai, D.M. Bull, and David T.
Blaauw. RazorII: In Situ Error Detection and Correction for PVT and SER Tolerance.
IEEE Journal of Solid-State Circuits, 44(1):32–48, January 2009.

Rolf Drechsler, Stephan Eggersglüß, Görschwin Fey, Andreas Glowatz, Friedrich
Hapke, Juergen Schloeffel, and Daniel Tille. On Acceleration of SAT-Based ATPG
for Industrial Designs. IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, 27(7):1329–1333, July 2008.

Robert Entner. Modeling and Simulation of Negative Bias Temperature Instability. PhD
thesis, Technische Universität Wien, 2007.


Thomas Fischer, E. Amirante, Karl Hofmann, M. Ostermayr, Peter Huber, and Doris
Schmitt-Landsiedel. A 65nm test structure for the analysis of NBTI induced statistical
variation in SRAM transistors. In European Solid-State Device Research Conference
(ESSDERC), pages 51–54, September 2008.

S. Garg and Diana Marculescu. System-Level Process Variation Driven Throughput
Analysis for Single and Multiple Voltage-Frequency Island Designs. In Design,
Automation and Test in Europe (DATE), pages 1–6, April 2007.

Tibor Grasser, B. Kaczer, W. Goes, T. Aichinger, P. Hehenberger, and M. Nelhiebel.
A two-stage model for negative bias temperature instability. In IEEE International
Reliability Physics Symposium (IRPS), pages 33–44, April 2009.

Stephan Henzler, Martin Wirnshofer, and Dominik Lorenz. Intrinsic time margin
monitoring for assessment of process variation and aging, 2009.

S. Herbert and D. Marculescu. Mitigating the impact of variability on
chip-multiprocessor power and performance. IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, 17(10):1520–1533, October 2009. ISSN 1063-8210.
doi: 10.1109/TVLSI.2009.2020394.

Karl Hofmann, Hans Reisinger, K. Ermisch, C. Schlunder, Wolfgang Gustin, T. Pompl,
Georg Georgakos, K.v. Arnim, J. Hatsch, T. Kodytek, Thomas Baumann, and Christian
Pacha. Highly accurate product-level aging monitoring in 40nm CMOS. In
Symposium on VLSI Technology (VLSIT), pages 27–28, June 2010.

Vincent Huard, CR Parthasarathy, Alain Bravaix, Chloe Guerin, and Emmanuel Pion.
CMOS device design-in reliability approach in advanced nodes. In IEEE International
Reliability Physics Symposium (IRPS), pages 624–633, 2009.

A. E. Islam, H. Kufluoglu, D. Varghese, S. Mahapatra, and M. A. Alam. Recent Issues
in Negative-Bias Temperature Instability: Initial Degradation, Field Dependence of
Interface Trap Generation, Hole Trapping Effects, and Relaxation. IEEE Transactions
on Electron Devices (TED), pages 2143–2154, September 2007.

ITRS. The International Technology Roadmap for Semiconductors: Process Integration,
Devices & Structures (PIDS). http://www.itrs.net/Links/2001ITRS/PIDS.pdf,
2001.

ITRS. The International Technology Roadmap for Semiconductors: Process Integration,
Devices & Structures (PIDS).
http://www.itrs.net/Links/2009ITRS/2009Chapters_2009Tables/2009Tables_FOCUS_C_ITRS.xls,
2009.

Yun-Cheng Ju and Resve A. Saleh. Incremental techniques for the identification of
statically sensitizable critical paths. In ACM/IEEE Design Automation Conference
(DAC), pages 541–546, New York, NY, USA, 1991. ACM.


Kunhyuk Kang, Sang Phill Park, Kaushik Roy, and Muhammad A. Alam. Estimation of
statistical variation in temporal NBTI degradation and its impact on lifetime circuit
performance. In IEEE/ACM International Conference on Computer-Aided Design
(ICCAD), pages 730–734, Piscataway, NJ, USA, 2007. IEEE Press.

Margot Karam, W. Fikry, H. Haddara, and H. Ragai. Implementation of hot-carrier
reliability simulation in ELDO. In IEEE International Symposium on Circuits and
Systems (ISCAS), volume 5, pages 515–518, 2001.

Christoph Knoth, Irina Eichwald, Petra Nordholz, and Ulf Schlichtmann. White-Box
Current Source Modeling Including Parameter Variation and Its Application in Timing
Simulation. In International Workshop on Power and Timing Modeling, Optimization
and Simulation (PATMOS), pages 200–210, September 2010.

Christoph Knoth, Carsten Uphoff, Sebastian Kiesel, and Ulf Schlichtmann. SWAT: Sim-
ulator for Waveform-Accurate Timing including Parameter Variations and Transistor
Aging. In International Workshop on Power and Timing Modeling, Optimization and
Simulation (PATMOS), September 2011. to appear.

Haldun Kufluoglu, V. Reddy, A. Marshall, J. Krick, T. Ragheb, C. Cirba, A. Krishnan,
and C. Chancellor. An Extensive and Improved Circuit Simulation Methodology For
NBTI Recovery. In IEEE International Reliability Physics Symposium (IRPS), pages
670–675, 2010.

Sanjay V. Kumar, Chris H. Kim, and Sachin S. Sapatnekar. An Analytical Model for
Negative Bias Temperature Instability. In IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pages 493–496, 2006.

Sanjay V. Kumar, Chris H. Kim, and Sachin S. Sapatnekar. NBTI-aware synthesis of
digital circuits. In ACM/IEEE Design Automation Conference (DAC), pages 370–375,
New York, NY, USA, 2007a. ACM.

Sanjay V. Kumar, Chris H. Kim, and Sachin S. Sapatnekar. NBTI-Aware Synthesis of
Digital Circuits. In ACM/IEEE Design Automation Conference (DAC), pages 370–375,
2007b.

Yung-Huei Lee, Neal Mielke, Marty Agostinelli, Sukirti Gupta, Ryan Lu, and William
McMahon. Prediction of Logic Product Failure Due To Thin-Gate Oxide Breakdown.
In IEEE International Reliability Physics Symposium (IRPS), pages 18 – 28, 2006.

Bing Li, Ning Chen, Manuel Schmidt, Walter Schneider, and Ulf Schlichtmann. On
Hierarchical Statistical Static Timing Analysis. In Design, Automation and Test in
Europe (DATE), April 2009.

Wing Ning Li, Sudhakar M. Reddy, and Sartaj K. Sahni. On Path Selection in Combi-
national Logic Circuits. IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, 8(1):56–63, January 1989.

Zhihong Liu, Bruce W. McGaughy, and James Z. Ma. Design tools for reliability analysis.
In ACM/IEEE Design Automation Conference (DAC), pages 182–187, 2006.

Dominik Lorenz, Georg Georgakos, and Ulf Schlichtmann. Aging Analysis of Circuit
Timing Considering NBTI and HCI. In IEEE International On-Line Testing Sympo-
sium (IOLTS), pages 3–8, June 2009a.

Dominik Lorenz, Georg Georgakos, and Ulf Schlichtmann. Alterungsanalyse digitaler
Schaltungen auf Gatterebene. In GMM/GI/ITG-Fachtagung Zuverlässigkeit und Entwurf,
pages 81–86. VDE Verlag GmbH, September 2009b.

Dominik Lorenz, Martin Barke, Daniel Mueller-Gritschneder, Georg Georgakos, and
Ulf Schlichtmann. Aging model for timing analysis at register-transfer-level. In
ACM/IEEE International Workshop on Timing Issues in the Specification and Synthesis
of Digital Systems, March 2010a.

Dominik Lorenz, Martin Barke, and Ulf Schlichtmann. Aging analysis at gate and
macro cell level. In IEEE/ACM International Conference on Computer-Aided Design
(ICCAD), pages 77–84, November 2010b.

Dominik Lorenz, Martin Barke, and Ulf Schlichtmann. Timing-Modell für Makrozellen
zur Alterungsanalyse. In GMM/GI/ITG-Fachtagung Zuverlässigkeit und Entwurf,
pages 41–47, September 2010c.

Dominik Lorenz, Georg Georgakos, and Ulf Schlichtmann. Aging-aware Timing Analysis
of Combinatorial Circuits on Gate Level. it - Information Technology, 4, August 2010d.

Dominik Lorenz, Martin Barke, and Ulf Schlichtmann. Efficiently analyzing the impact of
aging effects on large integrated circuits. Microelectronics Reliability, (0):–, 2012. ISSN
0026-2714. doi: 10.1016/j.microrel.2011.12.029. URL
http://www.sciencedirect.com/science/article/pii/S0026271411005622.

Xiang Lu, Zhuo Li, Wangqi Qiu, D. M. H. Walker, and Weiping Shi. Longest-path
selection for delay test under process variation. IEEE Transactions on Computer-
Aided Design of Integrated Circuits and Systems, pages 1924 – 1929, December 2005.

Yinghai Lu, Li Shang, Hai Zhou, Hengliang Zhu, Fan Yang, and Xuan Zeng. Statistical
reliability analysis under process variation and aging effects. In ACM/IEEE Design
Automation Conference (DAC), pages 514–519, July 2009.

Hong Luo, Yu Wang, Ku He, Rong Luo, Huazhong Yang, and Yuan Xie. A Novel
Gate-Level NBTI Delay Degradation Model with Stacking Effect. In Nadine Azemard
and Lars Svensson, editors, Integrated Circuit and System Design. Power and Timing
Modeling, Optimization and Simulation, volume 4644 of Lecture Notes in Computer
Science, pages 160–170. Springer Berlin / Heidelberg, 2007a.

Hong Luo, Yu Wang, Ku He, Rong Luo, Huazhong Yang, and Yuan Xie. Modeling
of PMOS NBTI Effect Considering Temperature Variation. In IEEE International
Symposium on Quality Electronic Design (ISQED), pages 139–144, Washington, DC,
USA, 2007b. IEEE Computer Society.
R. E. Lyons and W. Vanderkulk. The Use of Triple-Modular Redundancy to Improve
Computer Reliability. IBM Journal of Research and Development, 6(2):200–209, April
1962.
Elie Maricau and Georges Gielen. Efficient Variability-Aware NBTI and Hot Carrier Cir-
cuit Reliability Analysis. IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, 29(12):1884–1893, December 2010.
J. Greg Massey. NBTI: What We Know and What We Need to Know. In IEEE Inter-
national Integrated Reliability Workshop Final Report, pages 199–211, 2004.
Tobias Massier, Helmut Graeb, and Ulf Schlichtmann. The Sizing Rules Method for
CMOS and Bipolar Analog Integrated Circuit Synthesis. IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, 27(12):2209–2222, De-
cember 2008.
Evelyn Mintarno, Joelle Skaf, Rui Zheng, Jyothi Velamala, Yu Cao, Stephen P. Boyd,
Robert W. Dutton, and Subhasish Mitra. Optimized self-tuning for circuit aging. In
Design, Automation and Test in Europe (DATE), pages 586–591, 2010.
Natasa Miskov-Zivanov and Diana Marculescu. Modeling and Optimization for Soft-
Error Reliability of Sequential Circuits. IEEE Transactions on Computer-Aided De-
sign of Integrated Circuits and Systems, 27(5):803–816, May 2008.
Yoshio Miura and Yasuo Matukura. Investigation of Silicon-Silicon Dioxide Interface
Using MOS Structure. Japanese Journal of Applied Physics, page 180, 1966.
Gordon E. Moore. Cramming More Components onto Integrated Circuits. Electronics,
38(8), April 1965.
Farid N. Najm. Transition Density: A Stochastic Measure of Activity in Digital Circuits.
In ACM/IEEE Design Automation Conference (DAC), pages 644 – 649, June 1991.
Farid N. Najm. Transition Density: A New Measure of Activity in Digital Circuits.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 12
(2):310 – 323, February 1993.
Farid N. Najm. A survey of power estimation techniques in VLSI circuits. IEEE Transac-
tions on Very Large Scale Integration (VLSI) Systems, 2(4):446–455, December 1994.
Sani R. Nassif. Design for variability in DSM technologies. In IEEE International
Symposium on Quality Electronic Design, March 2000.
Bipul C. Paul, Kunhyuk Kang, Haldun Kufluoglu, M.A. Alam, and K. Roy. Temporal
Performance Degradation under NBTI: Estimation and Design for Improved Relia-
bility of Nanoscale Circuits. In Design, Automation and Test in Europe (DATE),
volume 1, pages 169–174, Los Alamitos, CA, USA, 2006. IEEE Computer Society.

Christian Piguet. Low-power electronics design. CRC Press, 2005.

Lawrence T. Pillage, Ronald A. Rohrer, and Chandramouli Visweswariah. Electronic
Circuit and System Simulation Methods. McGraw-Hill, Inc., 1995.

Jessica Qian, Satyamurthy Pullela, and Lawrence T. Pillage. Modeling the “Effective
capacitance” for the RC interconnect of CMOS gates. IEEE Transactions on Computer-
Aided Design of Integrated Circuits and Systems, 13(12), December 1994.

Stewart E. Rauch III. The statistics of NBTI-induced VT and beta mismatch shifts in
pMOSFETs. IEEE Transactions on Device and Materials Reliability, pages 89 – 93,
December 2002.

Hans Reisinger, O. Blank, Wolfgang Heinrigs, A. Muhlhoff, Wolfgang Gustin, and Chris-
tian Schlünder. Analysis of NBTI Degradation- and Recovery-Behavior Based on
Ultra Fast VT-Measurements. In IEEE International Reliability Physics Symposium
(IRPS), pages 448–453, March 2006.

Hans Reisinger, O. Blank, Wolfgang Heinrigs, Wolfgang Gustin, and Christian Schlün-
der. A Comparison of Very Fast to Very Slow Components in Degradation and Re-
covery Due to NBTI and Bulk Hole Trapping to Existing Physical Models. IEEE
Transactions on Device and Materials Reliability, 7(1):119–129, 2007.

Renesas. Semiconductor Reliability Handbook. Renesas Electronics, 2008.

T. Sakurai and A. R. Newton. Alpha-Power Law MOSFET Model and its Applications
to CMOS Inverter Delay and Other Formulas. IEEE Journal of Solid-State Circuits,
25(2):584–594, April 1990.

Sachin S. Sapatnekar. Timing. Kluwer Academic Publishers, 2004.

Louis Scheffer, Luciano Lavagno, and Grant Martin, editors. EDA for IC implemen-
tation, circuit design, and process technology. Electronic Design Automation for Inte-
grated Circuits Handbook. CRC Press, Boca Raton, 2006.

Christian Schlünder, J. M. Berthold, M. Hoffmann, J.-M. Weigmann, Wolfgang Gustin,
and Hans Reisinger. A New Smart Device Array Structure for Statistical Investiga-
tions of BTI Degradation and Recovery. In IEEE International Reliability Physics
Symposium (IRPS), May 2011.

Dieter K. Schroder and Jeff A. Babcock. Negative bias temperature instability: Road
to cross in deep submicron silicon semiconductor manufacturing. Journal of Applied
Physics, 94(1), 2003.

G. Semeraro, G. Magklis, R. Balasubramonian, D.H. Albonesi, Sandhya Dwarkadas,
and Michael A. Scott. Energy-efficient processor design using multiple clock domains
with dynamic voltage and frequency scaling. In Symposium on High-Performance
Computer Architecture, pages 29–40, February 2002.

Ellen M. Sentovich, Kanwar Jit Singh, Luciano Lavagno, Cho Moon, Rajeev Murgai,
Alexander Saldanha, Hamid Savoj, Paul R. Stephan, Robert K. Brayton, and Al-
berto L. Sangiovanni-Vincentelli. SIS: A System for Sequential Circuit Synthesis.
Memorandum UCB/ERL M92/41, Electronics Research Laboratory, University of
California, Berkeley, CA 94720, May 1992.
M. Sharma and J. H. Patel. Finding a small set of longest testable paths that cover
every gate. In IEEE International Test Conference (ITC), pages 974 – 982, December
2002.
Alexander Stempkovsky, Alexey Glebov, and Sergey Gavrilov. Calculation of Stress
Probability for NBTI-Aware Timing Analysis. In IEEE International Symposium on
Quality Electronic Design (ISQED), pages 714–718, March 2009.
Alvin W. Strong, Ernest Y. Wu, Rolf-Peter Vollertsen, Jordi Sune, Giuseppe La Rosa,
Stewart E. Rauch III, and Timothy D. Sullivan. Reliability Wearout Mechanisms in
Advanced CMOS Technologies. Series on Microelectronic Systems. IEEE Press, 2009.
Dennis Sylvester, David Blaauw, and Eric Karl. Elastic: An adaptive self-healing archi-
tecture for unpredictable silicon. IEEE Design & Test of Computers, 23(6):484–490,
2006.
Synopsys. Composite Current Source. http://www.synopsys.com/products/
solutions/galaxy/ccs/cc_source.html, 2006.
Synopsys. HSPICE User Guide: Simulation and Analysis, September 2008.
E. Talpes and D. Marculescu. Toward a multiple clock/voltage island design style for
power-aware processors. IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, 13(5):591–603, May 2005. ISSN 1063-8210. doi: 10.1109/TVLSI.2005.844305.
James Tschanz, Keith A. Bowman, Steve Walstra, Marty Agostinelli, Tanay Karnik,
and Vivek De. Tunable replica circuits and adaptive voltage-frequency techniques for
dynamic voltage, temperature, and aging variation tolerance. In Symposium on VLSI
Circuits, pages 112–113, June 2009.
Robert H. Tu, Elyse Rosenbaum, Wilson Y. Chan, Chester C. Li, Eric Minami, Khandker
Quader, Ping Keung Ko, and Chenming Hu. Berkeley Reliability Tools - BERT.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,
12:1524–1533, 1993.
John P. Uyemura. CMOS logic circuit design. Kluwer Academic Publishers, 2001.
Chandu Visweswariah, Kaushik Ravindran, Kerim Kalafala, Steven G. Walker, Samba-
sivan Narayan, Daniel K. Beece, Jeff Piaget, Natesan Venkateswaran, and Jeffrey G.
Hemmet. First-Order Incremental Block-Based Statistical Timing Analysis. IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25(10),
October 2006.

Wenping Wang, Zile Wei, Shengqi Yang, and Yu Cao. An efficient method to iden-
tify critical gates under circuit aging. In IEEE/ACM International Conference on
Computer-Aided Design (ICCAD), pages 735–740, Piscataway, NJ, USA, 2007a. IEEE
Press.

Wenping Wang, Shengqi Yang, Sarvesh Bhardwaj, Rakesh Vattikonda, Sarma Vrudhula,
Frank Liu, and Y. Cao. The impact of NBTI on the performance of combinational
and sequential circuits. In ACM/IEEE Design Automation Conference (DAC), pages
364–369, New York, NY, USA, 2007b. ACM.

Wenping Wang, Shengqi Yang, and Yu Cao. Node Criticality Computation for Cir-
cuit Timing Analysis and Optimization under NBTI Effect. In IEEE International
Symposium on Quality Electronic Design (ISQED), pages 763–768, March 2008.

Yu Wang, Xiaoming Chen, Wenping Wang, Varsha Balakrishnan, Yu Cao, Yuan Xie,
and Huazhong Yang. On the efficacy of input Vector Control to mitigate NBTI effects
and leakage power. In IEEE International Symposium on Quality Electronic Design
(ISQED), pages 19–26, March 2009a.

Yu Wang, Xiaoming Chen, Wenping Wang, Yu Cao, Yuan Xie, and Huazhong Yang.
Gate replacement techniques for simultaneous leakage and aging optimization. In
Design, Automation and Test in Europe (DATE), pages 328–333, April 2009b.

Wikipedia. Altern – Wikipedia, Die freie Enzyklopädie, 2011. URL
http://de.wikipedia.org/w/index.php?title=Altern&oldid=92273683. [Online; accessed
23 August 2011].

Kai-Chiang Wu and Diana Marculescu. Joint logic restructuring and pin reordering
against NBTI-induced performance degradation. In Design, Automation and Test in
Europe (DATE), pages 75–80, April 2009.

Lifeng Wu, Jingkun Fang, Hirokazu Yonezawa, Yoshiyuki Kawakami, Nobufusa Iwan-
ishi, Heting Yan, Ping Chen, Alvin I-Hsien Chen, Norio Koike, Yoshifumi Okamoto,
and Chune-Sin Ye. GLACIER: a hot carrier gate level circuit characterization and
simulation system for VLSI design. In IEEE International Symposium on Quality
Electronic Design (ISQED), pages 73–79, 2000.

Michael G. Xakellis and Farid N. Najm. Statistical Estimation of the Switching Activity
in Digital Circuits. In ACM/IEEE Design Automation Conference (DAC), pages 728
– 733, June 1994.

Gary Kok-Hoo Yeap. Practical Low Power Digital VLSI Design. Springer, 1998.

L. Zhang, W. Chen, Y. Hu, J. A. Gubner, and C. C.-P. Chen. Correlation-Preserved Non-
Gaussian Statistical Timing Analysis with Quadratic Timing Model. In ACM/IEEE
Design Automation Conference (DAC), 2005.

Vladimir Zolotov, Jinjun Xiong, Hanif Fatemi, and Chandu Visweswariah. Statistical
Path Selection for At-Speed Test. IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, pages 749 – 759, May 2010.

List of Figures

1.1. IC design flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2. Aging-aware timing analysis of a circuit. Aging effects degrade transistor
parameters, which results in increased gate delays over time. The critical
path delay increases as well and the timing specification might be violated
during the specified lifetime. . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.1. LUT-based gate model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2. Circuit and corresponding timing graph . . . . . . . . . . . . . . . . . . . 18
2.3. Computation of the arrival time (AT). . . . . . . . . . . . . . . . . . . . . 19
2.4. Example of the incremental timing algorithm. Arrival time at red (dark
grey) nodes is not valid. To update arrival time at node T, all invalid
arrival times are recursively updated (dashed arrows). . . . . . . . . . . . 22
2.5. Diagram of a sequential logic circuit. The timing constraints (setup and
hold time) of a flip-flop are given as well. . . . . . . . . . . . . . . . . . . 23
2.6. An example for calculating the branch slacks. . . . . . . . . . . . . . . . . 24
2.7. TG with branch slacks (arc between two edges) and delays to sink (number
next to the node) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.8. Aged LUT-based gate model as proposed in [Chen et al., 2011]. . . . . . . 29
2.9. Gate delay degradation as a linear function of ∆Vth . . . . . . . . . . . . 30
2.10. Transformation of arbitrary signals into periodic signals with same signal
probability and transition density. . . . . . . . . . . . . . . . . . . . . . . 31
2.11. Drawing of an NBTI threshold voltage drift caused by consecutive stress
and relaxation phases (thin black line) and the ∆Vth drift given by the
long term prediction model (thick orange line). . . . . . . . . . . . . . . . 31

3.1. 36 mV Vth drift due to NBTI at 1.2 V VDD (a). Sensitivity of the gate
delay degradation to a threshold voltage drift (b). Hence, NBTI causes
about 10 % degradation of the output delay for a rising input transition. . 36
3.2. Cross section of a PMOS transistor. . . . . . . . . . . . . . . . . . . . . . 37
3.3. Output characteristic of a PMOS transistor for altered values of ∆Vth . . . 39
3.4. Time dependence of Vth drift due to NBTI. . . . . . . . . . . . . . . . . . 40
3.5. Temperature dependence of ∆Vth for altered values of Vgs . . . . . . . . . . 40
3.6. Transistor width dependence. Marked is the minimal transistor width
used in the standard cell libraries. . . . . . . . . . . . . . . . . . . . . . . 41
3.7. Drift over time for an AC stress. . . . . . . . . . . . . . . . . . . . . . . . 42
3.8. Duty cycle dependence of NBTI. . . . . . . . . . . . . . . . . . . . . . . . 43
3.9. Drain avalanche hot carrier. . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.10. Channel hot carrier. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.11. Voltage, temperature and lifetime dependence of HCI. . . . . . . . . . . . 46
3.12. (a) HCI equivalent circuit for a degraded transistor. VDeg and IDeg depend
on ∆Ion . (b) Output characteristic of an NMOS transistor for altered
values of ∆Ion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.13. Inverter gate and waveform. . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.14. NOR gate with two inputs. . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.15. Fan-out-3 structure: All gates in the test structure are identical to the
DUT. The voltage source generates a step function. To have a realistic
input signal at the DUT, the step function has to propagate through two
gates before reaching the DUT. Those two gates and the DUT have to
drive three gates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.16. Supply voltage dependence. . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.17. Temperature dependence. . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.18. Dependence on driving strength and gate type. . . . . . . . . . . . . . . . 51
3.19. Dependence on transistor type and process corner. . . . . . . . . . . . . . 52
3.20. Dependence on input load and output slope. . . . . . . . . . . . . . . . . . 52
3.21. Dependence of output slope degradation on supply voltage and temperature. 53
3.22. Supply voltage and temperature dependence for HCI. . . . . . . . . . . . 53
3.23. Schematic of master-slave flip-flop. . . . . . . . . . . . . . . . . . . . . . . 54
3.24. Plot of sensitivities for setup and hold time. . . . . . . . . . . . . . . . . . 54
3.25. Sequential circuit with setup and hold time. . . . . . . . . . . . . . . . . . 55
3.26. (a) Change of Pshort−circuit by altering Vth . Pshort−circuit decreases for a
rising and a falling input transition. (b) Subthreshold current for a PMOS
transistor (with Vgs = 0 V and Vds = 1.2 V) for altered ∆Vth values. . . . 57
3.27. Vertical electrical field over technologies at nominal supply voltage. . . . . 58
3.28. Transistor drifts due to NBTI and for different technologies at nominal
supply voltage (a) and at a supply voltage of 1.2 V (b). . . . . . . . . . . . 59
3.29. HCI over technology nodes. . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.30. Sensitivity of the inverter delay for different technologies. . . . . . . . . . 60
3.31. Degradation of inverter delay for different technologies and use profiles. . 60

4.1. Aging analysis flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2. An example of calculating signal probabilities . . . . . . . . . . . . . . . . 68
4.3. Degradation of inverter delay by ∆Ion and ∆Vth , respectively. Solid lines
show dependencies calculated with sensitivities and dotted lines show de-
pendencies simulated on transistor level. Analyzing conditions are 27 ◦C,
1.2 V and 15 pF capacitive load. . . . . . . . . . . . . . . . . . . . . . . . 71
4.4. NOR gate with three inputs . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.5. Example explaining the signal dependence. . . . . . . . . . . . . . . . . . 73
4.6. OR gate with three inputs and an internal signal int. . . . . . . . . . . . . 75
4.7. Complex gate implementing the logic function z = a · (b + c). . . . . . . . 79

4.8. Ring oscillator waveforms of fresh (leading waveform in magenta) and
aged (shifted waveforms in red and blue) simulations. The transistor drifts
for the aged simulations were determined once by the fresh waveform
and once by the aged waveform. Independent of which waveform was taken to
determine the drifts, the aged waveforms are almost indistinguishable. . . 81
4.9. Frequency degradation of a 65 nm inverter ring oscillator stressed for 500 h
at defined stress conditions. . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.10. The five slowest output arrival times over lifetime for ISCAS’85 circuit
c880. Individual workloads for the gates were obtained for SP = 0.2 and
TD = 0.2 at primary inputs. Signals 866 and 874 change order with time. 83
4.11. Comparison of analysis with and without individual transistor drifts. . . . 84

5.1. TG annotated with arrival time and delay to sink at every node. . . . . . 87
5.2. Illustration of path delay reduction step. Edge (b, d) can be removed
because the delay of path P is less than the delay of path Pother . . . . . . 89
5.3. Illustration of arrival time reduction step. Edge (d, e) can be removed
because arrival time interval along edge (d, e) is less than the arrival time
at e after the max-operation. . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.4. Example for the common edge reduction step. . . . . . . . . . . . . . . . . 91
5.5. Example that shows difference between proposed and exact method for
common edges. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.6. Graphical representation of the common edge reduction step cases. Edge
(u, v) can be removed if aged delay of path U is smaller than fresh delay
of path V . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.7. Path delay of an inverter chain (10 inverters) with respect to SP at the
input. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.8. A general path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.9. Graphical representation of the constraints for the gate types. . . . . . . . 100
5.10. Basic idea for combining aging effects and process variations. . . . . . . . 104
5.11. The dotted circles indicate the aged performances. The circuit fails be-
cause the second adder needs the result before the first adder has finished
its calculation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.12. The timing graph of the ISCAS’85 circuit c17 is shown in (a). This is
a simplified TG because each net is just represented by one node. An
example of a reduced TG is shown in (b). . . . . . . . . . . . . . . . . . . 107
5.13. Distribution of delay degradation for 1000 workload samples. The dotted
line is the worst-case degradation when it is assumed that all PMOS
transistors degrade maximally. . . . . . . . . . . . . . . . . . . . . . . . . 108
5.14. Enhanced-scan design. The standard scan design is extended by hold
latches. Thereby, the first delay test vector V1 is latched by the hold
latches while the second delay test vector V2 is read into the scan chain. . 110
5.15. Path-based reduction step . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

List of Tables

2.1. Execution trace of the k most critical paths algorithm for the five slowest
paths. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2. Comparison of state-of-the-art gate models with the proposed aging-aware
gate model AgeGate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.1. An example for a temperature profile. The lifetime is 10 y and Veff is Vnom. 76
4.2. Degradation of critical path delays for different analyzer settings. . . . . . 83

5.1. Minimal aged circuit delay . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.2. Reduction ratios of nodes and edges . . . . . . . . . . . . . . . . . . . . . 115
5.3. Comparison of the proposed approach to the approach from Baba and
Mitra [2009]. Shown are the initial number of gates and paths in the
circuit, the number of PCPs, the improvement in the number of PCPs of
our approach compared to Baba and Mitra [2009] and the runtimes with
and without considering process variations. . . . . . . . . . . . . . . . . . 117
5.4. Number of PCPs over time of circuit c3540 . . . . . . . . . . . . . . . . . 117

B.1. Detailed results for the proposed reduction steps . . . . . . . . . . . . . . 124

List of Algorithms

-. Function reset_node(node) . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
-. Function reset_edge(u,v) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1. Circuit delay computation . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
-. Function update_node(node) . . . . . . . . . . . . . . . . . . . . . . . . . . 21
-. Function update_edge(u,v) . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2. k most critical paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

-. Function update_edge_aged(u,v) . . . . . . . . . . . . . . . . . . . . . . . 66

3. Slack reduction step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88


4. Path delay reduction step . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5. Arrival time reduction step . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6. Delay to sink reduction step . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7. Common edge reduction step . . . . . . . . . . . . . . . . . . . . . . . . . . 93

Acronyms
ASTA aging-aware static timing analysis

BDD binary decision diagram


BERT Berkeley reliability tools

CCSM composite current source model


CHC channel hot carrier
CPU central processing unit
CSM current source model

DAG directed acyclic graph


DAHC drain avalanche hot carrier
DUT device under test
DVFS dynamic voltage frequency scaling

ECSM effective current source model


EDA electronic design automation
EM electromigration

FF flip-flop

GDSII Graphic Database System II

HBD hard break down


HCI hot carrier injection
HDL hardware description language
HLS high-level synthesis

IC integrated circuit

LUT look-up table

MSFF master-slave flip-flop

NBTI negative bias temperature instability

PBTI positive bias temperature instability


PCP possible critical path
PI primary input
PO primary output

RD reaction diffusion
RTL register transfer level

SBD soft break down


SGHC secondary generated hot carrier
SHC substrate hot carrier
SPICE Simulation Program With Integrated Circuit Emphasis
SSTA statistical static timing analysis
STA static timing analysis

TA timing analysis
TDDB time-dependent dielectric breakdown
TG timing graph

VHDL very-high-speed integrated circuits hardware description language

List of Symbols
activation energy Ea
aged gate delay daged
aged path delay Daged
aged timing quantity qaged
arrival time AT

branch slack BS

clock frequency fCLK


clock period tCLK
clock-to-Q delay dCLK−to−Q
critical path Pcrit
critical path delay Dcrit
current supply voltage Vcurr
current temperature Tcurr

degradation of drain saturation current ∆Ion


degradation of gate delay ∆d
degradation of timing quantity ∆q
delay to sink D2S
drain current Id
drain saturation current Ion
drain source voltage Vds
duty factor DF

effective supply voltage Veff
effective temperature Teff

fresh critical path delay Dcrit,fresh
fresh gate delay dfresh
fresh path delay Dfresh
fresh timing quantity qfresh

gate current Ig
gate delay d
gate source voltage Vgs

hold time tHLD

input slope sIN

join slack JS

leakage current Ileakage


leakage power Pleakage
lifetime tlife

maximal circuit delay Dmax


minimal aged path delay Daged,min
minimal circuit delay Dmin
minimum transistor length Lmin

nominal supply voltage Vnom

output load CL
output slope sOUT
oxide thickness tox

parameter drift ∆p
path P
path delay D
probability that transistor is “on” Pon

required time REQT

setup time tSUP
short-circuit power Pshort−circuit
signal probability SP
silicon dioxide SiO2
sink node T
slack SLACK
slope s

source node S
stress probability Pstress
stress probability HCI Pstress,HCI
stress probability NBTI Pstress,NBTI
stress time tstress
substrate current Isub
supply voltage VDD
switching power Pswitching

temperature T
threshold voltage Vth
threshold voltage drift ∆Vth
timing quantity q
transistor length L
transistor width W
transition density TD

Index

k most critical paths problem, 24

AgeGate, 69
aging analysis, 27
aging effect, 35
aging-aware STA, 63
arrival time, 18

block-based, 18
branch slack, 24

canonical gate model, 69
circuit delay, 18
circuit level, 15, 27
clock gating, 31
combinational circuit, 17
controlling node, 19, 24
corner case, 9, 103
critical path, 18, 85
current source model, 17

degradation equation, 39, 45, 70
delay to sink, 21, 90
drift-related aging effect, 36

early mode, 15
effective capacitance, 17
effective supply voltage, 40
effective temperature, 40
electromigration, 27, 35
enhanced-scan design, 109

false path, 19
fan-out, 17
flip-flop, 53

gate level, 15, 28, 63
gate model, 12, 15, 69
gray-box model, 107

high-level synthesis, 106
hold time, 15
hot carrier injection, 44

incremental timing analysis, 18
integrated circuits, 9
interconnect network, 17

late mode, 15
layout, 15
logic synthesis, 11

module, 106
multi-stage gate, 75

negative bias temperature instability, 37

operating conditions, 9
output load, 16
output slope, 63

path, 18
path delay, 18
path enumeration, 23
path-based, 18
pipelining, 22
place & route, 11
positive bias temperature instability, 44
possible critical path, 85
process variations, 9, 102

radiation-induced soft errors, 27
reaction diffusion model, 37
required time, 21

scan design, 109


semi-custom design flow, 11
sensitizable, 19
sequential circuit, 22, 53
setup time, 15
sign-off, 11
signal probability, 66
single input switching assumption, 16
single-stage gate, 75
sink, 18
slack, 21
slope, 16
soft errors, 27
source, 18
spatial dependence, 67
SPICE, 15
standard cell library, 16, 77
static sensitization, 98
static timing analysis, 15
statistical static timing analysis, 103
storage elements, 22, 53
stress probability, 65, 71
stress time, 70
synthesis, 15

technology mapping, 11
temporal dependence, 67
time dependent dielectric breakdown, 35
time-complexity, 18
timing analysis, 12, 15
timing arc, 15
timing graph, 17
timing quantity, 19
timing sign-off, 15
transition density, 66
transition time, 16

use profile, 59

variability, 9

workload, 64, 66
