Self Repair Technology For Logic Circuits: Architecture, Overhead and Limitations
Heinrich T. Vierhaus
BTU Cottbus
Computer Engineering Group
Outline
1. Introduction: Nano Structure Problems
2. The Problem of Wear-Out
3. Repair for Memory and FPGAs
4. Basic Logic Repair Strategies & Structures
5. Test and Repair Administration
6. De-Stressing Strategies
7. Cost, Overhead, Single Points of Failure
8. Summary and Conclusions
1. Introduction
Nanoelectronic Problems
Lithography:
The wavelength used to „map“ structural information from
masks to wafers is larger (4 times or more) than the minimum
structural features (193 nm versus 90 / 65 / 45 nm).
Adaptation of layouts for correction of mapping faults.
[Figure: light source; layout correction: modified layout for compensation of mapping faults]
[Figure: transistor cross-section with poly-Si gate, n regions in p-substrate, and individual n doping atoms]
Nanostructure Problems
Individual device characteristics such as Vth are more dependent
on statistical variations of underlying physical features such
as doping profiles.
Primary Relevance: Yield
A significant share of basic devices will be „out of specs“ and needs
a replacement by backup elements for yield improvement after
production.
[Figure: CMOS cross-section with field oxide, n-well, p and n regions, gate, metal 1, and high-k gate oxide]
Wear-Out Mechanisms
Metal Migration:
Metal atoms (Al, Cu) tend to migrate under high current
density and high temperature.
Stress migration:
Migration effects may be enhanced
under mechanical stress conditions.
Effect:
Metal lines and vias may actually be interrupted
(line interrupts). The effect is
partly reversible by changing current
directions.
Metal Migration
[Figure: metal wire under high current density; migration creates an open defect in the wire and a new short to a neighbor wire]
Vias are especially prone to such defects.
The effect is reversible by reversing the direction of current flow!
Transistor Degradation
Negative Bias Temperature Instability (NBTI): Reduced switching speed
for p-channel MOS transistors that have operated under long-time constant
negative gate bias. The effect is partly reversible.
Hot Carrier Injection (HCI): Reduced switching speed for n-channel MOS
transistors, induced by positive gate bias and frequent switching.
Not reversible.
Gate Oxide Deterioration: Induced by high field strength. Not reversible.
Management of Wear-Out by „Fault Tolerant Computing“?
Built-in fault tolerance and error compensation are needed in
nanotechnologies anyway for the management of transient faults.
Faults in synchronous circuits and systems are detected „by clock cycle“.
Hence, for many types of fault tolerant architecture, the detection does
not even recognize whether the fault is permanent or not.
[Figure: redundant units (Unit 3 shown); data transmission / storage with error correction]
Self Repair?
Handling of a fault event:
Software-based fault detection & compensation: Works only for transient faults! (specific)
HW logic & RT-level detection & compensation: Typically works for transient and permanent faults! (universal)
Self repair: For permanent faults!
[Figure: memory array with line address and spare columns]
CREDES / ZUSYS / DAAD Summer School 2011, Tallinn
Computer Engineering
[Figure: memory with spare columns and memory BIST controller]
... is already state-of-the-art!
[Figure: FPGA as an array of alternating L and W blocks, with memory holding application software & data and configuration software]
[Figure: CLB array with occupied CLBs, a row with a faulty CLB, and a redundant row]
[Figure: CLB internals: logic, SRAM, multiplexers (MUX), flip-flops (FF), inputs and outputs]
[Figure: array of CLB and WB blocks implementing a virtual CPU, with program memory, configuration scheme, and new configuration]
Mainframes
Granularity of Replacement
Levels of Repair
Transistors - Switch Level
Replace transistors or transistor groups
Losses by reconfiguration (switched-off „good“ devices): Potentially small (20 to 50 %) for transistor faults
Overhead for test and diagnosis: Very high
Gate Level
Replace gates or logic cells
Losses by reconfiguration: Medium (60 to 90 %) for single transistor faults
Overhead for test and diagnosis: High
Macro-Block Level
Replace functional macros (ALU, FPU, CPU)
Losses by reconfiguration: High, 99 % or more
Overhead for test and diagnosis: Maybe acceptable
Repair overhead will dominate reliability!
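The loss figures for the three levels of repair can be illustrated with a small calculation. The following is a minimal sketch, assuming that the "loss by reconfiguration" is the share of a replaced block's transistors that were still fault-free; the function name and the example block sizes are hypothetical and not taken from the slides.

```python
def reconfiguration_loss(block_size: int, faulty: int = 1) -> float:
    """Fraction of a replaced block's transistors that were still fault-free."""
    if not 0 < faulty <= block_size:
        raise ValueError("need 0 < faulty <= block_size")
    return (block_size - faulty) / block_size


# Hypothetical block sizes (in transistors), chosen only for illustration.
EXAMPLE_BLOCKS = {
    "transistor pair": 2,
    "2-input NAND gate": 4,
    "macro block": 1000,
}

for name, size in EXAMPLE_BLOCKS.items():
    loss = 100 * reconfiguration_loss(size)
    print(f"{name:18s} {loss:5.1f} % of good devices switched off")
```

Under this assumption, the loss grows quickly with block size, which is why coarse replacement units waste almost all of their fault-free devices for a single transistor fault.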
[Figure: driver with load 1 and load 2; gate short]
Block-Level Repair
[Figure: gate-level example with AND gates and switch elements (SE)]
[Figure: four configurations (1 to 4), each showing the functional blocks in operation and the replacement block connected to test in / test out]
Overhead
Transistors per RLB (3 functional units)
Basic Block     functional   backup   Switches min.   Switches ext.
3/4 2-NAND      12           4        18              24
3/4 2-AND       18           6        18              24
3/4 2-XOR       18           6        18              24
H-Adder         36           12       24              30
F-Adder         90           30       30              36
For small basic blocks, the switches make the essential overhead (200%)!
For larger basic blocks, the overhead can be reduced to about 30-50%
... not counting test and administration overhead!
Test Generator
RLB RLB
BIST BIST
System
Monitoring
Test Analyzer
Conf.-Unit Conf.-Unit
Global
Control-Unit
Global
Control-Unit
Output Switches
Input Switches
Test is done by comparison with reference outputs. The system is run
through states of re-configuration with the same input test pattern applied.
At test, a functional unit is always removed from normal operation and
connected to test I/Os.
In case of a „fault detect“, the system is fixed in the current status.
Such a procedure of self-test and self-reconfiguration can run at every
system start-up, avoiding a central „fault memory“.
[Figure: functional block n and replacement block between input and output switches, with test in / test out; self test circuit with reference, decoder, state register (fix at fault), test clock, fault indicator, fault flag, scan path, and BISR clock]
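The self-test and self-reconfiguration procedure described above can be sketched in software. This is only an illustration under stated assumptions: the units are modelled as Python functions, the reference behaviour is a hypothetical 2-input NAND, and all names (`reference`, `stuck_at_zero`, `self_test`, `active_units`) are invented for this example rather than taken from the slides.

```python
from typing import Callable, List, Optional

Unit = Callable[[int], int]


def reference(x: int) -> int:
    """Hypothetical fault-free behaviour: NAND of bit 0 and bit 1 of x."""
    return 1 - ((x & 1) & ((x >> 1) & 1))


def stuck_at_zero(x: int) -> int:
    """Hypothetical faulty unit whose output is stuck at 0."""
    return 0


def self_test(units: List[Unit], patterns: List[int]) -> Optional[int]:
    """Step through the configuration states. In state k, unit k is removed
    from normal operation and compared with the reference on every pattern.
    On the first mismatch the state is fixed and returned, so the faulty
    unit stays isolated. Returns None if no fault is detected."""
    for state, unit in enumerate(units):
        for pattern in patterns:
            if unit(pattern) != reference(pattern):
                return state  # "fix at fault": stop advancing the state
    return None


def active_units(n_units: int, isolated: int) -> List[int]:
    """Indices of the units that stay in normal operation."""
    return [i for i in range(n_units) if i != isolated]


# Three functional units plus one backup; unit 2 is assumed to be faulty.
units = [reference, reference, stuck_at_zero, reference]
fixed_state = self_test(units, patterns=[0, 1, 2, 3])
# Assumption: without a fault, the last unit (the backup) stays unused.
isolated = fixed_state if fixed_state is not None else len(units) - 1
print("isolated unit:", isolated, "active units:", active_units(len(units), isolated))
```

Because the result depends only on the units and the test patterns, the same sequence can be repeated at every start-up and no fault location has to be stored between runs.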
Local Interconnects
The block-based repair scheme described so far cannot cover faults on wires between
re-configurable blocks.
For small basic blocks (such as logic gates) the majority of
wiring is between re-configurable units and not covered.
For larger (RT-level) basic blocks the majority of wiring
is within basic blocks and covered.
Schemes that can also cover inter-block wiring are possible,
but require FPGA-like configurable switching and complex switching schemes.
6. De-Stressing
[Figure: component failure rates (10^-4 to 10^-1) over system life time (t1 to t4); failure curves without and with de-stressing]
[Figure: state diagram with states 1, 2, 3 and intermediate states 0/1, 1/2, 2/3; transitions depend on tr = 0 / tr = 1]
[Figure: control circuit with FSM, decoder, switch control signals, fault flag FF, test clock, FF reset, FSM reset, and tr]
Overhead
Overhead factors:
Cost / Overhead
(3 functional blocks plus 1 backup in RLB)
Basic Block   Trans. funct.   Trans. backup   Switch Trans.   Contr.* Unit Tr.   Overhead %
2-NAND        3 * 4           4               30              81 / 200           960 / 3600
H-Adder       3 * 12          12              40              81 / 200           369 / 700
F-Adder       3 * 30          30              50              81 / 200           179 / 311
2-bit ALU     3 * 352         352             140             81 / 200           54.2 / 65.5
4-bit ALU     3 * 699         699             180             81 / 200           45.8 / 51.5
8-bit ALU     3 * 1367        1367            260             81 / 200           41.6 / 44.5
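The overhead column can be cross-checked with a short calculation. The sketch below assumes that the overhead is the sum of backup, switch, and control unit transistors divided by the functional transistors (three copies of the basic block); the formula and all names are assumptions, and the slides do not state what distinguishes the two control unit sizes. Under this assumption the rows from H-Adder to 8-bit ALU are reproduced to within rounding, while the 2-NAND row comes out at about 958 % and 1950 %, so the table's 3600 entry is not reproduced.

```python
# Per basic block: (transistors per block, switch transistors), from the table.
ROWS = {
    "2-NAND": (4, 30),
    "H-Adder": (12, 40),
    "F-Adder": (30, 50),
    "2-bit ALU": (352, 140),
    "4-bit ALU": (699, 180),
    "8-bit ALU": (1367, 260),
}
# The two control unit sizes given in the table (their meaning is not stated).
CONTROL_VARIANTS = (81, 200)


def overhead_percent(block: int, switches: int, control: int) -> float:
    """Assumed formula: (backup + switches + control) / functional, in percent.
    The RLB holds 3 functional copies and 1 backup copy of the basic block."""
    functional = 3 * block
    backup = block
    return 100.0 * (backup + switches + control) / functional


for name, (block, switches) in ROWS.items():
    values = [overhead_percent(block, switches, c) for c in CONTROL_VARIANTS]
    print(f"{name:10s} " + " / ".join(f"{v:7.1f} %" for v in values))
```

The calculation makes the trend in the table explicit: the fixed control unit cost dominates for small blocks and is amortized for larger ones, where the overhead approaches the 33 % contributed by the backup block alone.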
Sources of Overhead
[Figure: overhead in % (log scale, marks at 10, 33, 100, 1000) versus basic block complexity (transistors), split into redundancy, switches, control, and control with de-stressing; curves for self repair and for self repair plus de-stressing]
[Figure: switches with switch control between signal wiring and the reconfigurable logic block (RLB); the arrangement compensates „always on“ and „always off“]
Short
A short condition between the signal input (Usign) and the control
input (Uctrl) may be solved by designing the gate input line (Rbr)
as a fuse. Then one additional transistor is needed as a „power sink“.
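Blowing Fuses
[Figure: switch with control input (CTL in), VDDhigh, fuse in the gate line, short between signal input (sin) and gate, signal output (sout), and power-sink transistor]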
[Figure: system composed of mixed signal / RF, DSP, and memory blocks]
... only a small fraction of the real system is truly irregular and needs
„expensive“ logic repair!
[Figure: processor with control logic (needs logic BISR), register file, adder and multiplier logic, and multiple parallel processing units]
[Figure: design flow: extract obvious regular blocks; find and extract regular entities; compose RT-RLBs; compose RLB control; compose gate-level RLBs from the remaining random logic circuitry; compose scheme; estimate reliability; done]