
CMOS INTEGRATED CIRCUIT DESIGN TECHNIQUES

On-Line Testing

Y. Tsiatouhas
University of Ioannina
Dept. of Computer Science and Engineering

Overview
1. Reliability issues – Transient faults

2. Soft errors

3. Timing errors

4. Error tolerance design techniques

VLSI Systems and Computer Architecture Lab

Reliability in Nanometer Technologies
The evolution (scaling) of CMOS technology results in:
• the reduction of the transistor size
• the increase of the operating frequency
• the reduction of the power supply voltage
• the increase of the number of transistors in a die
which in turn affect the circuit noise margins and reduce circuit reliability.

[Figure: Soft Error Rate (SER) increases with technology scaling. Plot labels: SER vs VDD; 1MB SRAM; 64K Dynamic Logic. Source: IEDM, 1999.]


Transient Faults and On-Line Testing
• Permanent Faults: faults that permanently affect the circuit operation.
• Temporary Faults: faults that do not permanently affect the circuit
operation. They are classified as:
  - Transient Faults: due to random fault generation mechanisms such as power
    supply disturbance, electromagnetic interference, radiation, etc.
  - Intermittent Faults: due to the degradation of the circuit characteristics.
• On-Line Testing: testing procedures are performed during the circuit
operation.
  - Concurrent Testing: testing is performed concurrently with the normal
    circuit operation.
  - Periodic Testing: testing is performed periodically, when the circuit is in
    the idle mode of operation.
• Off-Line Testing: testing is performed when the circuit is not in use (e.g.
manufacturing testing).

Requirements
We need design techniques and architectures that guarantee correct operation
under any circumstances: techniques that provide error resiliency or error
detection / correction capabilities.
We need self-checking and concurrent testing mechanisms. We also need
self-healing architectures that react dynamically to overcome technology and
environment related variabilities, that can recover after an error is
generated, and that operate correctly even in the presence of failures.


Towards this direction, error tolerance techniques have been proposed:
• error correction codes and self‐checking circuits and checkers
• error mitigation techniques aiming to reduce error rates
• error detection and correction methodologies
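
As an illustration of the first item, a single-error-correcting code can be sketched in a few lines. The block below is a minimal Hamming(7,4) sketch that is not taken from the slides; the function names `hamming74_encode` and `hamming74_correct` are hypothetical and chosen only for this example.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword.

    Bit positions 1..7 hold: p1, p2, d1, p3, d2, d3, d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Correct at most one flipped bit; return (data_bits, error_position).

    error_position is 0 when no error is detected, else 1..7.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # syndrome read as a binary number
    if position:
        c[position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # emulate a single soft error in the stored word
    print(hamming74_correct(word))
```

The syndrome doubles as a concurrent error indication: a non-zero value signals that an error was detected, and for a single-bit error it also locates the faulty bit.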


Radiation and Soft Errors (I)
[Figure: cosmic ray primaries reach the Earth's atmosphere; low energy primaries are deflected or disappear (~25 km). Secondary cosmic rays (neutrons & pions) arrive at sea level with a flux of ~1/cm²·s. In Si, a strike produces a burst of electronic charge of 1M electrons/μm.]
Soft errors in VLSI circuits are caused by Single Event Transients (SETs) and
Single Event Upsets (SEUs) generated by:
• α-particles emitted by package impurities
• cosmic ray particles (neutrons, protons and pions)
Radiation and Soft Errors (II)
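[Figure, two panels. Panel 1, "SET: transient pulse attenuation/masking!": a transient pulse in the logic that feeds a D flip-flop (D, Q, CLK). Panel 2, "SET: soft error generation!": the same circuit with the CLK, D and Q waveforms; the transient pulse at D is captured and Q holds a soft error.]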
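
The difference between the two cases can be sketched with a simple latching-window model. This is an assumption made for illustration, not a model given in the slides: the transient is taken to be captured only when it covers the whole setup/hold window around the capturing clock edge. The function name and its parameters are hypothetical.

```python
def set_is_latched(pulse_start, pulse_width, clk_edge, setup, hold):
    """Return True if a transient pulse at D is captured by the flip-flop.

    Simplifying assumption: the pulse is latched only when it is present
    during the entire window [clk_edge - setup, clk_edge + hold].
    All arguments are times in the same (arbitrary) unit.
    """
    pulse_end = pulse_start + pulse_width
    window_start = clk_edge - setup
    window_end = clk_edge + hold
    return pulse_start <= window_start and pulse_end >= window_end


if __name__ == "__main__":
    # Pulse far from the clock edge: masked, no soft error.
    print(set_is_latched(0.50, 0.10, clk_edge=1.0, setup=0.02, hold=0.02))
    # Pulse spanning the clock edge: captured, soft error at Q.
    print(set_is_latched(0.95, 0.10, clk_edge=1.0, setup=0.02, hold=0.02))
```

Under this model, narrower pulses and longer clock periods both reduce the chance that a transient lands on the latching window.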

Timing Errors (I)
• Transient faults due to crosstalk, power supply disturbance or ground bounce,
and environmental variations (e.g. temperature gradients) contribute to
timing error generation. Device aging (BTI, HCI effects) is also an important
source of timing errors.
• Timing verification is becoming a hard task. Moreover, the increased path
delay deviations, due to process variations and the statistical behavior of
nano-devices, as well as manufacturing defects that affect circuit speed, may
result in timing errors that are not easily detectable (in terms of test cost) in
high frequency and high device count ICs. Considering also the huge number
of paths in modern circuit designs along with the complexity of testing, it is
easy to realize that a significant number of defective ICs may pass the
fabrication tests.
• Modern systems running at multiple frequency and voltage levels may suffer
from timing errors due to environmental and process related (and also data
dependent) variabilities that can affect circuit performance.

Timing Errors (II)
[Figure, two panels. Panel 1, "Error free case": logic feeding a D flip-flop (D, Q, CLK). Panel 2, "Timing error generation!": the pulse at D is delayed by d; the CLK, D and Q waveforms show the resulting timing error at Q.]
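
The condition behind the two cases can be written as a one-line check. The sketch below assumes the usual setup-time constraint (data must settle one setup time before the capturing clock edge); the function name and the numeric values are hypothetical, and `extra_delay` stands for the delay d in the figure.

```python
def captures_correct_value(path_delay, extra_delay, setup, period):
    """True if the flip-flop captures the new data at the next clock edge.

    path_delay:  nominal delay of the logic stage
    extra_delay: additional delay d caused by a fault, aging or variation
    setup:       flip-flop setup time
    period:      clock period
    """
    return path_delay + extra_delay + setup <= period


if __name__ == "__main__":
    print(captures_correct_value(0.70, 0.00, setup=0.05, period=1.0))  # error free
    print(captures_correct_value(0.70, 0.40, setup=0.05, period=1.0))  # timing error
```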

The Scan Design Topology of Intel
[Figure: a System Flip-Flop (Latch PH1, Latch PH2; clocked by CLK) between the logic stages Sj and Sj+1, and a separate Scan Flip-Flop (Latch LA, Latch LB) with signals SI, SO, SCA, SCB, CAPTURE and UPDATE.]

Separated System and Scan Flip‐Flops

Extra cost of one Flip‐Flop

Intel’s Error Resilience Approach
[Figure: the scan topology extended for error detection. A Main Flip-Flop (Latch PH1, Latch PH2; clocked by CLK) between the logic stages Sj and Sj+1, and a Scan Flip-Flop (Latch LA, Latch LB) with signals Scan_IN, Scan_OUT, SCA, SCB, CAPTURE and UPDATE. Two XOR gates, an AND gate and an OR gate generate the Error_L signal.]

• Additional cost of: 1 Flip-Flop, 2 XOR, 1 AND and 1 OR per Flip-Flop!
• The error detection latency is high!
Source: Univ. of Stanford & Intel, IEEE Computer, 38 (2), 2005


The BISER Architecture
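[Figure: the BISER Flip-Flop between the logic stages Sj and Sj+1, clocked by CLK. A main D Flip-Flop (output A) and a secondary shadow Flip-Flop / Latch (output B) drive a C-Element with a weak keeper, which produces Q. Inset: transistor-level Muller C-Element with inputs A and B and output Q, between VDD and Gnd.]
Source: Univ. of Stanford & Intel, ICICDT, 2007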
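
The filtering behavior of the Muller C-element can be sketched as follows. This is a behavioral model only: output polarity and the weak keeper circuit are abstracted away (the keeper is represented by `q_prev`), and the function name is hypothetical.

```python
def c_element(a: int, b: int, q_prev: int) -> int:
    """Behavioral Muller C-element.

    When both inputs agree the output follows them; when they disagree
    the output holds its previous value (the role of the weak keeper).
    """
    return a if a == b else q_prev


if __name__ == "__main__":
    q = c_element(1, 1, q_prev=0)   # both storage elements hold 1, so Q = 1
    # A particle strike flips only the main flip-flop output (A: 1 -> 0).
    q = c_element(0, 1, q_prev=q)   # inputs disagree, Q keeps its value
    print(q)
```

Because an upset in only one of the two storage elements makes the inputs disagree, the output holds its previous value and the soft error does not propagate to the next logic stage.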

The GRAAL Architecture
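[Figure: a GRAAL register between the logic stages Sj and Sj+1, with clocks Φ1 and Φ2. It contains a Main Latch, a Shadow Flip-Flop, a MUX (inputs 0/1) and an XOR that produces Error_j; the next register produces Error_j+1. The Φ1 and Φ2 clock waveforms are also shown.]
Source: TIMA & iRoc, ITC, 2007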


The Razor Topology
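[Figure: the Razor Flip-Flop between the logic stages Sj and Sj+1. A MUX (inputs 0/1) feeds the Main Flip-Flop, clocked by CLK; a Shadow Latch is clocked by the delayed clock D_CLK; an XOR produces Error_L. An OR gate combines the Error_L signals into Error_Rj, which is captured by the Error Flip-Flop (clocked by D_CLK) to produce Error.]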

• Additional cost of: 1 Latch, 1 XOR, and 1 MUX per Flip-Flop!
Source: Univ. of Michigan & ARM, IEEE Computer, 37 (3), 2004

The Razor Operation (I)
[Figure: the Razor Flip-Flop in the error free case. Error_L = 0, Error_Rj = 0 and Error = 0.]


The Razor Operation (II)
[Figure: the Razor Flip-Flop in the erroneous case. Error_L, Error_Rj and Error switch from 0 to 1.]

• No error detection latency! 
• One clock cycle cost for error correction!
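
The two Razor cases can be reproduced with a small behavioral model. It is a sketch under stated assumptions, not the published circuit: data at D switches from `old` to `new` at time `t_arrival`, the main flip-flop samples at the clock edge (`period`), the shadow latch samples `shadow_delay` later on D_CLK and is assumed to always see the settled value. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RazorResult:
    main_q: int       # value captured by the main flip-flop at CLK
    shadow_q: int     # value captured by the shadow latch at D_CLK
    error: int        # Error_L = main_q XOR shadow_q
    corrected_q: int  # value restored through the MUX after an error


def sample(old, new, t_arrival, t_sample):
    """Value seen at D when sampled at t_sample."""
    return new if t_arrival <= t_sample else old


def razor_flip_flop(old, new, t_arrival, period, shadow_delay):
    main_q = sample(old, new, t_arrival, period)
    shadow_q = sample(old, new, t_arrival, period + shadow_delay)
    error = main_q ^ shadow_q
    # On an error the MUX reloads the main flip-flop from the shadow latch.
    corrected_q = shadow_q if error else main_q
    return RazorResult(main_q, shadow_q, error, corrected_q)


if __name__ == "__main__":
    print(razor_flip_flop(0, 1, t_arrival=0.8, period=1.0, shadow_delay=0.3))
    print(razor_flip_flop(0, 1, t_arrival=1.2, period=1.0, shadow_delay=0.3))
```

In this model the detection only works while the late data arrives before the D_CLK sampling point; a delay larger than `shadow_delay` would corrupt the shadow value as well.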

The Time Dilation Topology
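[Figure: a TD Flip-Flop inside a TD Register, between the logic stages Sj and Sj+1. A MUX (inputs 0/1, output M) feeds the Main Flip-Flop (D, Q), clocked by CLK; an XOR produces Error_L. An OR gate combines the Error_L signals into Error_Rj; a second OR gate feeds the Error Capture Flip-Flop, clocked by Cap_CLK (CLK through a Delay element), which produces Error.]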

• Additional cost of: 1 XOR, and 1 MUX per Flip-Flop.
Source: Univ. of Ioannina & Athens, IEEE ICECS, 2006


The Time Dilation Operation (I)
[Figure: the TD Flip-Flop in the error free case. Error_L = 0, Error_Rj = 0 and Error = 0.]

The Time Dilation Operation (II)
[Figure: the TD Flip-Flop in the erroneous case. Error_L, Error_Rj and Error switch from 0 to 1; the error is captured on Cap_CLK.]

• No error detection latency!
• One clock cycle cost for error correction!
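
The detection step can be sketched behaviorally. The model below rests on assumptions that the slides do not spell out: the XOR is taken to compare the D input with the Q output of the main flip-flop, and the comparison is taken to be sampled on Cap_CLK, `cap_delay` after the CLK edge. Function and parameter names are hypothetical, and the MUX-based correction is not modeled.

```python
def td_flip_flop(old, new, t_arrival, period, cap_delay):
    """Return (q, error_l) for one TD flip-flop.

    Assumed behavior: the main flip-flop samples D at the CLK edge; the XOR
    compares Q with the value at D at the Cap_CLK edge.
    """
    q = new if t_arrival <= period else old
    d_at_cap = new if t_arrival <= period + cap_delay else old
    return q, q ^ d_at_cap


def td_register_error(error_l_bits):
    """Error_Rj: OR of the Error_L signals of all flip-flops in a register."""
    return int(any(error_l_bits))


if __name__ == "__main__":
    print(td_flip_flop(0, 1, t_arrival=0.8, period=1.0, cap_delay=0.3))
    print(td_flip_flop(0, 1, t_arrival=1.2, period=1.0, cap_delay=0.3))
    print(td_register_error([0, 1, 0]))
```

Unlike the Razor sketch, no shadow storage element appears here, which matches the stated cost of one XOR and one MUX per flip-flop.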


Timing Diagrams
[Figure: timing error correction over Cycle i to Cycle i+3. Waveforms: CLK, Cap_CLK, D (Data i, Data i+1), Error, Capture and Q (Data i, Data i, Data i+1). Annotations: correct arrival of valid data; timing fault with delayed arrival of valid data; MUX latch memory state and extended memory state; Q carries correct data, then erroneous data, the error is flagged, and Q carries correct data again.]

The Time Dilation Architecture
[Figure: pipeline organization. Five TD FF registers (D0/Q0 to D4/Q4), clocked by CLK, separate the logic stages IF, ID, EX and MEM. The register error signals Error_R0 to Error_R4 are combined by an OR gate and captured by the Error Capture FF, clocked by Cap_CLK (CLK through a Delay element), which produces Error.]


Pipeline Recovery

Time in cycles (left to right), instructions (top to bottom):

IF   ID   EX   MEM  MEM
     IF   ID   ~EX  EX   MEM
          IF   ID   ID   EX   MEM
               IF   IF   ID   EX
                         IF   ID

Figure labels: failing stage (~EX), erroneous stage, re-execution with
correct values at stage inputs.
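
The recovery schedule can be reproduced with a short simulation. It is a sketch that assumes a global one-cycle replay: in the cycle after the failure every stage repeats its operation and no new instruction is issued. The function name and its arguments are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM"]


def simulate(n_instr, fail_instr, fail_stage, n_cycles):
    """Return one row per instruction with the stage occupied in each cycle.

    fail_instr is a 0-based instruction index; the failing stage is marked
    with a leading '~' and is re-executed in the next cycle.
    """
    rows = [[""] * n_cycles for _ in range(n_instr)]
    pos = {}          # instruction index -> index of the stage it occupies
    next_issue = 0
    failed = False
    replay = False    # True when this cycle must repeat the previous one
    for cycle in range(n_cycles):
        if not replay:
            # Normal cycle: every instruction advances, a new one is fetched.
            pos = {i: s + 1 for i, s in pos.items() if s + 1 < len(STAGES)}
            if next_issue < n_instr:
                pos[next_issue] = 0
                next_issue += 1
        replay = False
        for i, s in pos.items():
            label = STAGES[s]
            if not failed and i == fail_instr and label == fail_stage:
                label = "~" + label
                failed = True
                replay = True  # one clock cycle spent on error correction
            rows[i][cycle] = label
    return rows


if __name__ == "__main__":
    for row in simulate(5, fail_instr=1, fail_stage="EX", n_cycles=7):
        print(" ".join(f"{cell:<4}" for cell in row))
```

With five instructions and a failure of the second instruction in EX, the rows match the pipeline diagram: the only cost is the single repeated cycle.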

References

• “Making Typical Silicon Matter with Razor”, T. Austin, D. Blaauw, T. Mudge and
K. Flautner, IEEE Computer, vol. 37, no. 3, pp. 57-65, 2004.
• “Robust System Design with Built-In Soft-Error Resilience”, S. Mitra, N. Seifert,
M. Zhang, Q. Shi and K-S. Kim, IEEE Computer, vol. 38, no. 2, pp. 43-52, 2005.
• “Built-In Soft Error Resilience for Robust System Design”, S. Mitra et al.,
International Conference on IC Design and Technology, pp. 263-268, 2007.
• “GRAAL: A New Fault Tolerant Design Paradigm for Mitigating the Flaws of Deep
Nanometric Technologies”, M. Nicolaidis, International Test Conference, p. 4.3,
2007.
• “Testing and Reliable Design of CMOS Circuits”, N. Jha and S. Kundu, Kluwer
Academic Publishers, 1990.
