Embedded Deterministic Test For Low-Cost Manufacturing: IEEE Design and Test of Computers October 2003
All content following this page was uploaded by Janusz Rajski on 21 July 2015.
Embedded Deterministic Test for Low-Cost Manufacturing

Janusz Rajski, Mark Kassab, Nilanjan Mukherjee, and Nagesh Tamarapalli, Mentor Graphics
Jerzy Tyszer, Poznan University of Technology
Jun Qian, Cisco Systems
Editor’s note:
You have probably heard that BIST takes too long and its fault coverage is low, and that deterministic test requires too many patterns. This article shows how on-chip compression and decompression techniques can provide high fault coverage with low test times.
Rob Aitken, Artisan Components

DFT BASED ON SCAN AND ATPG is a reliable and widely accepted methodology that provides very high fault coverage. The automated process of scan insertion and test generation guarantees high predictability and quality of results. Conventional ATPG systems generate test sets that guarantee almost complete coverage of several types of fault models. Typically, when ATPG targets a fault, it determines the content of only a few scan cells and fills the remaining positions with random values. Such a fully specified pattern is more likely to detect additional faults and can be stored on a tester. As a result of random fill, however, the test patterns are grossly overspecified.

For large circuits, a growing volume of test data significantly increases test cost because of longer test time and elevated tester memory requirements. For a scan-based test, test data volume can be approximately expressed as

Test data volume ≈ scan cells × scan patterns

Assuming balanced scan chains, the relationship between test time and test data volume is then

Test time ≈ (scan cells × scan patterns)/(scan chains × frequency)

Consider an example circuit consisting of 10 million gates and 16 scan chains. Typically, the number of scan cells is proportional to the design size. Thus, assuming one scan cell per 20 gates, the total test time to apply 10,000 scan patterns at a 20-MHz scan shift frequency will be roughly 312 million test cycles or equivalently 15.6 seconds. As designs grow larger, maintaining high test coverage becomes increasingly expensive because the test equipment must store a prohibitively large volume of test data and the test application time increases. Moreover, a very high and continually increasing logic-to-pin ratio creates a test data transfer bottleneck at the chip pins. Accordingly, the overall efficiency of any test scheme strongly depends on the method employed to reduce the amount of test data. [1-4]

Several methods for compressing test data exploit the fact that test cubes frequently feature many unspecified positions. [5-7] Unfortunately, these techniques can incur significant area overhead and cause inefficient tester use. The so-called Illinois scan scheme, which divides the scan chains into partitions and shifts in the same test vector to each scan chain through a single scan input, can reduce test application time and test data volume. [8] This scheme’s performance strongly relies on the scan chain configuration and therefore is not easily scalable. Another scheme, using an on-product multiple-input signature register (OPMISR), accomplishes a test data reduction along with a twofold scan test time reduction. [9]
58 0740-7475/03/$17.00 © 2003 IEEE Copublished by the IEEE CS and the IEEE CASS IEEE Design & Test of Computers
This scheme replaces the test responses in conventional scan data with far more compact signatures and uses the tester’s repeat capability to shift the same values into the scan chains for several consecutive cycles to achieve additional test data compression. SmartBIST, a further refinement of the OPMISR architecture, uses tester channels to deliver stimuli, and signatures are observed only at the end of an unload cycle. [10] The test patterns, however, are delivered in a compressed form, and an on-chip decoder expands them into the actual data loaded into scan chains.

Since manufacturing test cost strongly depends on test data volume and test time, one of the key requirements of a next-generation DFT methodology is to dramatically reduce both factors. This new technology should provide a way to achieve high quality testing that generates and applies patterns cost effectively for any fault model used today, as well as for any new fault models, and for defect-based testing.

The staggering complexity of designs makes it crucial to test an embedded complex block with many internal scan chains by accessing it through a simple, narrow interface. A DFT solution should have minimal impact on present scan design flows and design styles. It should not require modification of the current scan and ATPG knowledge base and skill set or replacement of existing test equipment. In other words, the user should not have to make major changes as a precondition for obtaining the test data volume and test time reduction benefits. Finally, the new methodology must take long-term scalability into account. Thus, it must include compression technology that can deliver increasingly higher levels of compression and fit the compressed test data into the current tester memory for at least one decade, with no upgrades.

Here, we present an embedded deterministic test (EDT) scheme that offers a high-quality test, ease of use, and broad applicability, with no design impact. It builds on the solid foundation of scan and ATPG and, at the same time, fulfills all the requirements just outlined. (We presented an earlier version of this article at the 2002 IEEE International Test Conference. [11])

EDT architecture
Our EDT scheme consists of logic embedded on a chip and a new deterministic test pattern generation technique. As Figure 1 shows, the EDT logic, inserted along the scan path outside the design core, consists of two main blocks: an on-chip decompressor and an on-chip selective compactor.

Figure 1. Embedded deterministic test (EDT) architecture. (Diagram labels: ATE, compressed stimuli, decompressor with ring generator and phase shifters, scan chains, design core, compactor, compacted responses.)

The design core requires no additional inserted logic such as test points or X-bounding logic. Therefore, the EDT logic affects only scan channel inputs and outputs and not functional paths. The EDT scheme’s primary objective is to drastically reduce tester memory requirements, not to eliminate them altogether. Minimizing memory use reduces test time and increases tester throughput while maintaining test quality.

The ratio of internal scan chains to tester scan channels usually sets the maximum compression level. The design in Figure 1 has two scan channels and many short internal scan chains. From the tester’s viewpoint, however, the design appears to have two short scan chains. In every clock cycle, the tester applies 2 bits to the decompressor inputs (1 bit on each input channel), while the decompressor outputs load all internal scan
September–October 2003
59
Special ITC Section
chains. Compared with ATPG, this EDT design has the same number of scan channels interfacing with the tester but far more scan chains. Because the scan chains are balanced, they are also much shorter.

EDT pattern generation is truly deterministic. For a given testable fault, as with conventional ATPG, the EDT scheme generates a pattern to satisfy the ATPG constraints and to avoid bus contention. The patterns generated and stored on the tester are the stimuli applied to the decompressor and the responses observed on the compactor outputs. The application of one pattern involves sending a compressed stimulus from the ATE to the decompressor. The continuous-flow decompressor receives the data on its inputs every clock cycle, and it produces the necessary values in the scan chains to guarantee fault detection. The random fill is a decompression side effect that occurs after the compressed pattern goes through the decompressor.

As in conventional testing, the tester directly controls and observes the functional I/O pins. In fact, to the tester an EDT pattern looks exactly the same as an ATPG pattern except that it is much shorter. The responses captured in the scan cells are compacted by the selective compactor, shifted out, and compared with their golden references on the tester. To ensure that there is no aliasing and that unknown states do not mask the fault effects, the decompressor controls the compactor so that only selected scan chains can stream their contents to the compactor if necessary.

Figure 2 shows a real design that further illustrates EDT compression. For ATPG (Figure 2a), we configured the design into 16 scan chains with a maximum length of 11,292 scan cells. The test required 1,600 patterns. For EDT (Figure 2b), we configured the same design into 2,400 scan chains. The test required 2,238 patterns to achieve the same coverage as that of ATPG. The scan chain length decreased almost 150 times. The effective reduction of scan test data volume and scan test time was 101 times. The actual compression is lower than the target compression because the maximum number of specified bits is, in some cases, greater than a decompressor’s encoding capacity, and thus more test patterns are required. Moreover, handling a large number of X sources affects compression.

Figure 2. Design example: ATPG (a) versus EDT (b). (Diagram labels: (a) 16 channels, 16 scan chains, 1,600 patterns; (b) 16 channels, decompressor, 2,400 scan chains, compactor, 2,238 patterns.)

It is worth comparing how tester memory is used in the two cases. One ATPG scan pattern occupies 11,292 locations of tester memory, whereas one EDT pattern occupies 76 locations of the same memory. The memory required for one ATPG scan pattern can store 148 EDT patterns. Moreover, in the time required to apply one ATPG pattern, almost 148 EDT patterns can be applied. However, because EDT requires more patterns than ATPG does to achieve the same fault coverage, the total time to apply all EDT vectors is effectively 101 times less than that of ATPG.

Decompressor structure
The decompressor plays a crucial role in determining the effectiveness of EDT test data compression, so it must satisfy several requirements:

■ very low linear dependency in its outputs,
■ very high operation speed,
■ very low silicon area, and
■ high design modularity.

The EDT on-chip decompressor deploys a new and original architecture called a ring generator, a distinct form of linear finite-state machine. To perform on-chip decompression, the ring generator operates in conjunction with a segmented linear phase shifter. The phase shifter is necessary to drive a large number of scan chains and to reduce linear dependencies between sequences entering the scan chains. In addition, the phase shifter’s design guarantees balanced use of all memory elements in the ring generator.

Ring generator. Figure 3 shows an example 32-bit ring generator. We developed it by applying transformations to the external-feedback LFSR featuring feedback polynomial x^32 + x^18 + x^14 + x^9 + 1. [12] The structure has three main benefits over conventional linear-feedback shift registers or cellular automata. First, the propagation delay introduced by the feedback logic is significantly less. In the worst case, in fact, only one two-input XOR gate lies between any pair of memory elements. Second, the maximum internal fan-out is limited to only two devices fed by any stem in the ring generator. Finally, the structure can drastically reduce the total length of feedback lines.

The circuit in Figure 3, for instance, features only very short connections that cause no frequency degradation. Generally, a ring generator has fewer logic levels than a corresponding external-feedback LFSR and a smaller fan-out than the original internal-feedback LFSR. Hence, it can operate at higher speeds than conventional solutions, and it meets layout and timing requirements.

Injectors. The compressed test data go into the decompressor through input channels connected to ring generator taps by additional XOR gates, called injectors, placed between the memory elements. Each input channel usually splits into several internal injectors providing the same test data to multiple memory elements at the same time. For instance, four external channels feed the ring generator in Figure 3, each having two injectors. Their locations ensure that test data quickly reaches all the ring generator memory elements.

Phase shifter. The linear phase shifter, added to the outputs of the ring generator memory elements in the form of a segmented XOR network, allows the ring generator to mutually displace the produced sequences in various scan paths. The EDT phase shifter consists of XOR gates with a few inputs (called XOR taps) to reduce propagation delays. We obtained all results presented in this article assuming that the number of XOR taps equals three.

Decompression
The decompressor operates as follows. At the beginning of every pattern, the ATE shifts the first group of data, called initial variables, into the ring generator. Data shifted out of the decompressor into the scan chains during those clock cycles does not become part of the decompressed test pattern. Subsequently, the ATE scans the next group of variables for decompression. Loading the scan chains occurs in parallel with continuous injections of new variables into the ring generator. The total number of shift cycles equals the number of initial cycles plus the length of the longest scan chain. Comprehensive experiments indicate that we can maximize the probability of successful compression if we choose the number of initial variables to equal 0.75D, where D is the decompressor size (the number of memory elements comprising the ring generator).

Compression of test stimuli
The concept of continuous-flow decompression rests on the fact that deterministic test vectors typically have only 1% to 5% of their bits specified. In fact, the average number of specified bits is often well below 1%. The remaining bits are randomly filled with 0s and 1s. We define test vectors with partially specified bit positions as test cubes. Compressing test cubes significantly reduces the volume of test data stored on the tester. The fewer the specified bits, the better our ability to encode the information into a compressed form. We exploit this ability by having a few inputs drive the circuit, whereas the circuit’s memory elements are configured into a relatively large number of short scan chains. Consequently, a tester that [...]
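To make the starting point of the ring generator concrete, here is a minimal Python sketch of an external-feedback LFSR with the feedback polynomial quoted in the ring generator description, followed by a three-tap XOR phase shifter. It is not the ring generator itself: the transformations that produce it are not reproduced, and the sketch injects data at a single point rather than through distributed injectors. The tap-to-bit mapping, the `PHASE_TAPS` table, and all function names are illustrative assumptions, not taken from the article.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1

# Exponents of x^32 + x^18 + x^14 + x^9 + 1 (constant term omitted).
# Assumed convention: exponent k taps state bit k - 1.
FEEDBACK_EXPONENTS = (32, 18, 14, 9)

# Hypothetical phase shifter: each scan chain input is the XOR of three cells.
PHASE_TAPS = [(0, 5, 11), (3, 17, 29), (8, 20, 31), (1, 14, 26)]


def lfsr_step(state: int, injected: int = 0) -> int:
    """Advance the LFSR one clock; `injected` models one input variable bit."""
    feedback = injected & 1
    for k in FEEDBACK_EXPONENTS:
        feedback ^= (state >> (k - 1)) & 1
    return ((state << 1) | feedback) & MASK


def phase_shifter(state: int) -> list[int]:
    """Return one bit per scan chain for the current generator state."""
    return [
        ((state >> a) ^ (state >> b) ^ (state >> c)) & 1
        for a, b, c in PHASE_TAPS
    ]


def decompress(variables: list[int], cycles: int, state: int = 0) -> list[list[int]]:
    """Feed one variable bit per cycle and collect the scan chain inputs."""
    rows = []
    for t in range(cycles):
        bit = variables[t] if t < len(variables) else 0
        state = lfsr_step(state, bit)
        rows.append(phase_shifter(state))
    return rows


rows = decompress([1, 0, 1, 1], cycles=6)
print(rows[0])  # scan chain inputs produced in the first cycle
```

Because every operation in this sketch is an XOR, each value delivered to a scan chain is a linear combination over GF(2) of the injected variables, which is the property a linear finite-state machine provides.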
[...] with the functional logic so that the design requires no additional pins for EDT. Figure 6 shows a complete block diagram of the EDT logic along with the original core. The output-sharing logic consists of multiplexers that facilitate pin sharing.

Figure 6. EDT logic for a given core. (Diagram labels: scan channel inputs, scan channel outputs, shared pins, functional inputs, functional outputs, output sharing, EDT, decompressor, scan chain in, core, scan chain out, compactor, bypass.)

The EDT logic’s architecture depends primarily on the number of internal scan chains and the number of external scan channels in the given design. Only logic that lies at the interface of the EDT hardware and the scan chains depends on the clocking of the first and last scan cells in every scan chain. The EDT logic is therefore pattern independent and in most cases need not be regenerated if a design changes. Once the EDT components are instantiated, we can insert the boundary scan logic and I/O pads. On the other hand, if a netlist contains preinserted I/O pad cells, we place the EDT logic between the I/O pad cells and the original core. Subsequently, we synthesize the EDT logic at the gate level along with the boundary scan logic and I/O pad cells.

Pattern generation and EDT logic test
The next step is the generation of the final set of EDT test patterns. The final test pattern set also contains a few scan patterns (typically around 20 for a 10-time compression) that test the EDT logic and the scan chains’ integrity. In addition to guaranteeing very high test coverage of the EDT logic, these patterns help in debugging simulation mismatches. The tester thoroughly exercises and observes the ring generator as it shifts all the patterns through the decompressor into the scan chains. We test the phase shifter pseudoexhaustively. We also thoroughly test all the compactor’s masking modes, including the selection logic and the spatial compactors. We test the optional bypass logic, if synthesized, in the EDT mode with a sequential pattern, as we simulate the bypass mode of operation for several clock cycles. These additional test patterns guarantee more than 99% stuck-at coverage for all the EDT logic. Finally, before test pattern sign-off, we can verify the generated patterns with a timing-based simulator.

Experimental results
We tested the EDT scheme on many industrial designs. Here we present results for nine representative designs, ranging in size from 543,000 gates to 10.9 million gates. For all the designs, we performed conventional ATPG as well as EDT by fixing the number of scan channels and using internal scan chains. All the designs have up to several thousand sources of observable unknown states. They are also random-pattern resistant. Hence, none of them, without substantial modification, would work with logic BIST or any other scheme that uses a MISR for response compaction. The circuits represent different design styles and scan methodologies. The experiments used every available software pattern compression technique for both ATPG and EDT, including dynamic and static compaction. Thus, we compared EDT with the best conventional ATPG technology. We also included the patterns that tested the EDT logic in the EDT pattern set.

Table 1 summarizes the results of the experiments. For each circuit, the table presents the following information:

■ number of gates;
■ number of scan cells;
■ ATPG test coverage (percentage of testable faults detected by a conventional ATPG tool);
■ test coverage after generating compressed scan patterns using the EDT scheme;
■ effective compression (the ratio of scan data volume between the EDT and ATPG pattern sets, which also represents the ratio of the number of scan cycles between the two pattern sets and, therefore, the reduction in test application time);
■ number of scan channels and number of internal scan chains;
■ maximum number of specified bits;
■ number of sources of unknown states; and
■ on-chip decompressor size.

Table 1. Experimental results.
Parameter                 C1        C2        C3        C4        C5        C6        C7        C8        C9
Gates (millions)          0.543     0.576     1.2       1.2       1.5       2.1       2.6       3.8       10.9
Scan cells (thousands)    45        41        70        62        86        181       129       216       297
ATPG test coverage (%)    98.89     98.78     97.04     99.89     99.07     98.92     92.00     95.53     94.39
EDT coverage (%)          98.70     98.65     96.90     99.79     99.01     98.78     91.85     95.49     94.34
Compression factor        51        53        83        50        60        120       21        26        31
Channels/chains           16/1,600  16/1,600  16/1,600  16/1,600  8/1,600   12/2,400  2/157     2/146     1/104
Maximum specified bits    505       463       816       676       466       940       1,430     3,174     2,462
X state sources           139       61        376       259       1,943     556       14,483    2,611     33,432
Decompressor size (bits)  64        64        64        64        40        48        20        20        20

[...] gates, conventional ATPG required 14.2 million test cycles to obtain 99.2% stuck-at test coverage. For the same circuit, EDT required 3.5 million test cycles to obtain 99.2% test coverage for stuck-at faults and 93.6% for transition faults and generated 1,500 path delay patterns for testing critical paths. In other words, EDT enabled the use of two additional fault models and still provided a compression rate four times that of ATPG targeted for stuck-at faults only.
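One way to read Table 1 is to compare each design’s chain-to-channel ratio, which the article describes as usually setting the maximum compression level, with the effective compression actually achieved. The short Python sketch below does this using only values from Table 1; the data layout and names are ours.

```python
# (scan channels, internal scan chains, effective compression) from Table 1
TABLE_1 = {
    "C1": (16, 1600, 51),
    "C2": (16, 1600, 53),
    "C3": (16, 1600, 83),
    "C4": (16, 1600, 50),
    "C5": (8, 1600, 60),
    "C6": (12, 2400, 120),
    "C7": (2, 157, 21),
    "C8": (2, 146, 26),
    "C9": (1, 104, 31),
}


def chain_to_channel_ratio(channels: int, chains: int) -> float:
    """Ratio of internal scan chains to tester scan channels."""
    return chains / channels


for name, (channels, chains, achieved) in TABLE_1.items():
    target = chain_to_channel_ratio(channels, chains)
    print(f"{name}: ratio {target:6.1f}, achieved {achieved:3d} "
          f"({achieved / target:.0%} of ratio)")
```

In every design the achieved factor falls below the chain-to-channel ratio, which is consistent with the article’s remark that actual compression is lower than the target when the number of specified bits exceeds the decompressor’s encoding capacity or when many X sources must be handled.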