Professional Documents
Culture Documents
On On Line Testing Line Testing On On Line Testing Line Testing
On On Line Testing Line Testing On On Line Testing Line Testing
On On Line Testing Line Testing On On Line Testing Line Testing
CIRCUIT DESIGN TECHNIQUES
University of Ioannina
University of Ioannina
On‐‐Line Testing
On
Dept. of Computer Science and Engineering
Y. Tsiatouhas
CMOS Integrated Circuit Design Techniques
Overview
1. Reliability issues –
Reliability issues – Transient faults
2. Soft errors
3. Timing errors
4. Error tolerance design techniques
VLSI Systems
and Computer Architecture Lab
On-Line Testing 2
1
Reliability in Nanometer Technologies
Soft Error Rate (SER) increases
with technology scaling
SER vs VDD
On-Line Testing 3
Transient Faults and On
Transient Faults and On‐‐Line Testing
• Permanent Faults: faults that permanently affect the circuit operation.
• Temporary Faults: faults that do not affect permanently the circuit
operation and are discriminated to:
Transient Faults: due to random fault
f l generation mechanisms
h l k power
like
supply disturbance, electromagnetic interference, radiation e.t.c.
Intermittent Faults: due to the degradation of the circuit characteristics.
On-Line Testing 4
2
Requirements
We need design techniques and architectures that will guarantee the correct
operation under any circumstances. Techniques that will provide error resiliency
or error detection / correction capabilities.
We need self‐checking and concurrent testing mechanisms. We need self‐healing
architectures that will dynamically react to overcome technology and
environmental related variabilities and which will be capable to recover after an
error generation and will operate correctly even in the presence of failures.
Towards this direction,
direction error tolerance techniques have been proposed:
• error correction codes and self‐checking circuits and checkers
• error mitigation techniques aiming to reduce error rates
• error detection and correction methodologies
On-Line Testing 5
Radiation and Soft Errors (I)
Cosmic Rays Secondary
Cosmic Ray
Primaries
Low Energy
Disappear
Deflected
~25Km
Si
Secondary
Cosmic Rays ~1/cm2 ‐ s
(neutrons & pions)
at sea level Burst of
Electronic Charge
Earth 1M electrons/μm Si
Problem with Soft Errors in VLSI circuits due to Single Event Transients (SETs) and Single
Event Upsets (SEUs) generated by:
• emitted α‐particles by package impurities
• cosmic ray particles (neutrons, protons and pions)
On-Line Testing 6
3
Radiation and Soft Errors (II)
SET Transient pulse attenuation/masking!
0
1 1
1 D Q
0
1
0
0 1
1 0 CLK Transient Pulse
of Duration
SET CLK
Soft Error generation!
1
1 1
0 D
1 D Q
1
0 1
1 CLK
0
Q
Soft Error
On-Line Testing 7
Timing Errors (I)
• Transient faults due to crosstalk, power supply disturbance or ground bounce
and environmental variations (e.g. temperature gradients) contribute to
timing error generation. Device aging (BTI, HCI effects) is also an important
source off timing
ti i errors.
• Timing verification turns to be a hard task. Moreover, the increased path delay
deviations, due to process variations and the statistical behavior of nano‐
devices, as well as the manufacturing defects that affect circuit speed may
result in timing errors that are not easily detectable (in terms of test cost) in
high frequency and high device count ICs. Considering also the huge number
of paths in modern circuit designs along with the complexity of testing, it is
easy to
t realize
li that
th t a significant
i ifi t number b off defective
d f ti ICs IC may pass theth
fabrication tests.
• Modern systems running at multiple frequency and voltage levels may suffer
from timing errors due to environmental and process related (and also data
dependent) variabilities that can affect circuit performance.
On-Line Testing 8
4
Timing Errors (II)
1
Error free case
1 D Q
0
1 CLK
0
1 Timing Error generation! Delayed Pulse
by d
1 D Q
0
1 CLK
0
CLK
d
D
Q
On-Line Testing Timing Error 9
The Scan Design Topology of Intel
The S can Design Topology of Intel
CLK UPDATE
System
C1
C1 C2
Flip‐Flop Logic Stage
Logic Stage
g g Latch PH1
Latch PH2
Latch PH2
D2 Sj+1
Sj D1
D1
D2
CAPTURE D1 SO
C2 Latch Latch
LA LB
D1 C1
SI
C1
Scan Flip‐Flop
SCA SCB
Separated System and Scan Flip‐Flops
Extra cost of one Flip‐Flop
On-Line Testing 10
5
Intel’s Error Resilience Approach
CLK UPDATE
Main
C1
C1 C2
Flip‐Flop Logic Stage
Logic Stage
Logic Stage Latch PH1
Latch PH1
Latch PH2
D2 Sj+1
Sj D1
D1
XOR
XOR D2
D1 Scan_OUT
CAPTURE AND C2 Latch Latch
LA LB
D1 OR C1
Scan_IN C1
Error_L
Scan Flip‐Flop
SCA SCB
• Additional cost of: 1Flip‐Flop, 2 XOR, 1 AND and 1 OR per Flip‐Flop !
• The error detection latency is high ! Univ. of Stanford & Intel
IEEE Computer, 38 (2), 2005
On-Line Testing 11
The BISER Architecture
CLK
BISER Flip‐Flop
Main A
Logic Stage
Logic Stage D Flip‐Flop
Fli Fl C Logic Stage
Element
Sj Sj+1
Q
C
Secondary
Shadow B
Flip‐Flop
Latch
VDD Weak
Keeper
B
Q
B
Muller C‐Element
A
Univ. of Stanford & Intel
ICICDT, 2007
Gnd
On-Line Testing 12
6
The GRAAL Architecture
Φ1 Register
Latch
Logic Stage Logic Stage
0 Main
Sj MUX
Latch Sj+1
1
Error_j
Shadow XOR
Flip‐Flop
Φ2
Error_j+1
Φ1 Φ2
Φ2
Φ1
Φ1 TIMA & iRoc
Φ2 ITC, 2007
On-Line Testing 13
The Razor Topology
CLK
Razor Flip‐Flop
Logic Stage Logic Stage
0 Main
Sj MUX
Flip‐Flop
Fli Fl Sj+1
1
OR
Error_L
Shadow XOR
Latch
Error_Rj
D_CLK
E
Error
Error
Flip‐Flop
D_CLK
Univ. of Michigan & ARM
• Additional cost of: 1 Latch, 1 XOR, and 1 MUX per Flip‐Flop !
IEEE Computer, 37 (3), 2004
On-Line Testing 14
7
The Razor Operation (I)
CLK
CLK
Razor Flip‐Flop
Logic Stage Logic Stage
0 Main
Sj MUX
Flip‐Flop
Fli Fl Sj+1
1
OR 0
Shadow XOR 0 Error_L
Error_Rj
Latch
D_CLK
Error Free Case
0 E
Error
Error
Flip‐Flop
D_CLK
On-Line Testing 15
The Razor Operation (II)
CLK
CLK
Razor Flip‐Flop
Logic Stage Logic Stage
0 Main
Sj MUX
Flip‐Flop
Fli Fl Sj+1
1
OR 1
0
Shadow XOR 1
0 Error_L
Error_Rj
Latch
D_CLK
Erroneous Case
0
1 E
Error
Error
Flip‐Flop
D_CLK
D_CLK
• No error detection latency!
• One clock cycle cost for error correction!
On-Line Testing 16
8
The Time Dilation Topology
CLK TD Register
TD Flip‐Flop
Logic Stage D Logic Stage
0 Main
Sj MUX
M Flip‐Flop Q Sj+1
1
OR
Error_L
XOR
Error_Rj
Error
OR Error
Capture Flip‐Flop
C
Cap_CLK
CLK
Delay
Univ. of Ioannina & Athens
• Additional cost of: 1 XOR, and 1 MUX per Flip‐Flop.
IEEE ICECS, 2006
On-Line Testing 17
The Time Dilation Operation (I)
CLK
CLK TD Register
TD Flip‐Flop
Logic Stage D Logic Stage
0 Main
Sj MUX
M Flip‐Flop Q Sj+1
1
Error_L OR 0
XOR
Error_Rj
0
Error
0 OR Error
Capture Flip‐Flop
Error Free Case
C
Cap_CLK
CLK
Delay
On-Line Testing 18
9
The Time Dilation Operation (II)
CLK
CLK TD Register
TD Flip‐Flop
Logic Stage D Logic Stage
0 Main
Sj MUX
M Flip‐Flop Q Sj+1
1
Error_L OR 10
0
XOR
Error_Rj
1
0
Error
10 OR Error
Capture Flip‐Flop
Erroneous Case Cap
Cap_CLK
CLK
C
Cap_CLK
CLK
Delay
• No error detection latency!
• One clock cycle cost for error correction!
On-Line Testing 19
Timing Diagrams
Cycle i Cycle i+1 Cycle i+2 Cycle i+3
CLK
Valid Data
Timing Fault Delayed Arrival
Valid Data Correct
Arrival Cap_CLK
Capture
On-Line Testing 20
10
The Time Dilation Architecture
CLK
TD FF Reeg.
TD FF Reeg.
TD FF Reeg.
TD FF Reeg.
g
Stage g
Stage g
Stage g
Stage
IF ID EX MEM
Capture
OR
OR Error Error
FF
Cap_CLK
Delay
Pipeline Organization
On-Line Testing 21
Pipeline Recovery
Failing stage Erroneous stage
Time in cycles
IF ID EX MEM MEM
IF ID ~EX EX MEM
IF ID ID EX MEM
Instructions
IF IF ID EX
IF ID
Re‐execution with
correct values at
stage inputs
On-Line Testing 22
11
References
• “Making Typical Silicon Matter with Razor”, T. Austin, D. Blaauw, T. Mudge and K.
Flautner, IEEE Computers, vol. 37, no. 3, pp. 57‐65, 2004.
• “R
“Robust
b System
S D i with
Design i h Built‐In
B il I Soft‐Error
S f E R ili
Resilience”,
” S.
S Mitra,
Mi N Seifert,
N. S if M
M.
Zhang, Q. Shi and K‐S. Kim, IEEE Computers, vol. 38, no. 2, pp. 43‐52, 2005.
• “Built‐In Soft Error Resilience for Robust System Design,” S. Mitra et al,
International Conference on IC Design and Technology, pp. 263‐268, 2007.
• “GRAAL: A New Fault Tolerant Design Paradigm for Mitigating the Flaws of Deep
Nanometric Technologies,” M. Nicolaidis, International Test Conference, p. 4.3,
2007.
• “Testing and Reliable Design of CMOS Circuits”, N. Jha and S. Kundu, Kluwer
Academic Publishers, 1990.
On-Line Testing 23
12