Professional Documents
Culture Documents
Scan Test Bandwidth Management For Ultralarge-Scale System-on-Chip Architectures
Scan Test Bandwidth Management For Ultralarge-Scale System-on-Chip Architectures
Scan Test Bandwidth Management For Ultralarge-Scale System-on-Chip Architectures
6, JUNE 2015
Abstract— This paper presents several techniques employed with different power requirements and multiple power-supply
to resolve problems surfacing when applying scan bandwidth voltage levels. Many SoC-based test schemes proposed so
management to large industrial multicore system-on-chip (SoC) far utilize dedicated instrumentation, including test access
designs with embedded test data compression. These designs pose
significant challenges to the channel management scheme, flow, mechanisms (TAMs) and test wrappers [7], [9], [22]. TAMs
and tools. This paper introduces several test logic architectures are typically used to transfer test data between the SoC
that facilitate preemptive test scheduling for SoC circuits with pins and embedded cores, whereas test wrappers form the
embedded deterministic test-based test data compression. The interface between the core and SoC environment. Solutions
same solutions allow efficient handling of physical constraints in involving both TAMs and wrappers accomplish such tasks as
realistic applications. Finally, state-of-the-art SoC test scheduling
algorithms are rearchitected accordingly by making provisions optimizing test interface architecture [26] or control logic [20]
for: 1) setting up time-effective test configurations; 2) optimiza- while addressing routing and layout constraints or hierarchy
tion of SoC pin partitions; 3) allocation of core-level channels of cores [2], [5], scheduling test procedures [1], [18], [19],
based on scan data volume; and 4) more flexible core-wise [21], [31], and minimizing power consumption [8], [16], [30].
usage of automatic test equipment channel resources. A detailed Techniques proposed in [4], [9], [10], and [25] attempt to
case study is illustrated herein with a variety of experiments
allowing one to learn how to tradeoff different architectures and minimize SoC test time. The integrated scheme of [9] reduces
test-related factors. the test time by optimizing dedicated TAMs and pin-count-
Index Terms— Bandwidth management, embedded determin- aware test scheduling. Packet-switched networks-on-chip [25]
istic test (EDT), scan-based test, test access mechanism (TAM), can replace dedicated TAMs in testing of SoC by delivering
test application time, test compression, test scheduling. test data through an on-chip communication infrastructure.
There are techniques addressing synergistically TAM and
I. I NTRODUCTION wrapper design as well as test data compression [3], [11],
[12], [27]–[29]. In fact, test compression is now becoming
T ODAY’S multicore chip architectures require no trivial
test solutions imposed by the relentless miniaturization
of semiconductor devices, which have become much faster
an integral part of SoC-based design-for-test (DFT) schemes.
For example, the approach of [17] encodes tests for
and less power hungry than their predecessors. This trend has each core separately through linear feedback shift
given rise to the growing popularity of system-on-chip (SoC) register (LFSR) reseeding with time-multiplexed automatic
designs because of their ability to encapsulate many disparate test equipment (ATE) channels delivering data to successive
types of complex IP cores running at different clock rates cores. Similarly, a scheme of [32] utilizes static reseeding to
test SoCs, where all cores share a single LFSR. The XOR
Manuscript received February 19, 2014; revised May 14, 2014; accepted test logic performs compression in [6], which further guides
June 9, 2014. Date of publication July 16, 2014; date of current version a TAM design process.
May 20, 2015. The work of J. Janicki and J. Tyszer was supported by the
Semiconductor Research Corporation under Grant 2011-TJ-2117. ATE channel bandwidth management for SoC designs can
W.-T. Cheng, Y. Huang, M. Kassab, G. Mrugalski, N. Mukherjee, play a key role in increasing test data compression with
and J. Rajski are with Mentor Graphics Corporation, Wilsonville, no visible impact on test application time. The approach
OR 97070 USA (e-mail: wu-tung_cheng@mentor.com; yu_huang@
mentor.com; mark_kassab@mentor.com; grzegorz_mrugalski@mentor.com; presented in [13] encompasses: 1) a solver capable of using
nilanjan_mukherjee@mentor.com; janusz_rajski@mentor.com). input and output channels dynamically; 2) test scheduling
Y. Dong and G. Giles are with Advanced Micro Devices Inc., Sunnyvale, algorithms; and 3) TAM design schemes, all devised for
CA 94088 USA (e-mail: yan.dong@amd.com; grady.giles@amd.com).
J. Janicki and J. Tyszer are with the Faculty of Electronics and Telecommu- the embedded deterministic test (EDT) environment [24].
nications, Poznan University of Technology, Poznan 60-965, Poland (e-mail: It is assumed that all cores in the SoC are either hetero-
jjanicki@et.put.poznan.pl; jerzy.tyszer@put.poznan.pl). geneous modules, or wrapped testable units, and they come
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org. with their individual EDT-based compression logic, which
Digital Object Identifier 10.1109/TVLSI.2014.2332469 is subsequently interfaced with ATE through an optimized
1063-8210 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
CHENG et al.: SCAN TEST BANDWIDTH MANAGEMENT FOR ULTRALARGE-SCALE SoC ARCHITECTURES 1051
The output DSR (Fig. 1) interfaces all core outputs with the cores. Let L be a sorted list of setup classes with a class x
ATE by reducing the number of test response streams so that having the largest value of c · P(x) seating in the front.
they fit into the number of output ATE channels. It is made We begin by assigning the first element of L to the current
up of a number of multiplexers such that each multiplexer result b of merging to form a base. Then, the algorithm iterates
serves to connect several cores with a designated ATE output over all the remaining setup classes and expands the base by
channel. Again, the address registers specify which cores are one class at a time, always taking the first class from L that
to be observed for a given group of test patterns. The output satisfies all constraints, including availability of ATE channels
channel mapping is carried out in a manner similar to that of and the ability of DSRs to serve the base.
the In DSR. Note that all core outputs feature a user-defined The process of forming the base class terminates when
fan-out in order to increase flexibility of selecting observation either there are no more setup classes complementary with
points connected to the multiplexers. It is also assumed that the base, or one of the constraints cannot be satisfied. The
all core outputs need to be observed when tested. merging algorithm removes then the first element from list L
It is worth noting that a flexible use of ATE channels results and attempts to form another base cluster until the list of
in increased compression and elevated encoding efficiency. setup classes becomes empty, in which case the scheduling
Moreover, testing many cores in parallel by dynamically algorithm returns a list of base classes that determines the
allocating ATE channels to different cores in accordance with actual schedule, i.e., an order according to which cores will be
their needs may shorten test application time. Finally, having tested, as well as actual channel allocations. It can be further
individual decompressors running in appropriate time slots reordered to group tests for the same core in contiguous time
makes it easier to reduce the number of external channels intervals, finally forming test configurations indicating which
compared with the total number of EDT In featured by all ATE channels are to connect with which EDT channels of
cores together. In preparation for the actual test session, the which cores and for how long.
following steps are carried out (a detailed description of this
procedure can be found in [13]).
III. C ONTROL DATA D ELIVERY
1) For each core, automatic test pattern generation (ATPG)-
produced test cubes are passed to a solver that merges The approach summarized in Section II does not make any
and encodes them to arrive with final compressed test specific provisions for the way control data is delivered to
patterns; furthermore, the solver determines the minimal SoC test logic in order to setup test configurations. It appears,
number of EDT In channels needed to compress a given however, that the number of test configurations, and hence
test pattern. the amount of control data one needs to employ and transfer
2) For each test pattern at the core level, information between the ATE and DSR address registers, may visibly
regarding the minimal number of EDT In channels is impact test scheduling and the resultant test time. Conse-
paired with data concerning the required EDT output quently, we begin this paper by analyzing three alternative
channels. schemes that can be used to upload control bits and show
3) All test patterns are clustered to form groups (classes) how they determine the final SoC test logic architecture.
of patterns having identical both In channels and obser-
vation points.
4) Upon completion of the above operations for each core, A. Using IJTAG
all classes are passed to a test scheduler; given architec- The IEEE 1687 is a proposed standard for accessing
ture of both test access networks and various constraints on-chip test and debug features via the IEEE 1149.1 test
(ATE channels, DSR architecture, power consumption, access port (TAP). The purpose of this internal Joint Test
and others), it yields the final allocation of ATE In/output Action Group (IJTAG) standard is to automate the way one
channels to selected cores when applying successive test can manage on-chip instruments, and to describe a language
patterns. for communicating with them via the IEEE 1149.1 test data
Given both DSRs, one can now schedule application of test registers (TDRs). If there is an IJTAG network available on the
patterns. Every test pattern t is characterized by its descriptor SoC, and the total number of test configurations is relatively
comprising module m that is to be exercised when applying small, one can use it to deliver the control data, as shown in
test t, and the channel capacity c, i.e., the number of EDT In Fig. 2.
and outputs needed for this purpose. All test patterns having The SoC design of Fig. 2 has a single TAP and three
the same descriptor form a setup class x represented by its different blocks: 1) two cores (C1 and C2 ) under test and
PC P(x). Every setup class can be split into multiple segments 2) the DSR interfacing ATE with C1 and C2 . TAP can be
such that test patterns from the same class are applied in instructed to enable a test path via the IEEE 1687 segment
disjoint time intervals. The ability to preempt a class may insertion bits (SIBs). Every SIB is used to either enable or
improve the ATE channel utilization, shorten the total test disable the inclusion of an instrument into the path from a
time, and reduce the volume of control data. test data In to a test data output. The TDR in C1 or C2 can
In principle, test scheduling merges complementary setup be either bypassed or loaded with data putting both cores into
classes to form clusters of cores that can be tested in parallel specific test modes. The TDR in DSR receives the control
given constraints imposed by ATE channels and DSRs. Setup data indicating which core and which of its test channels are
classes are complementary if they comprise disjoint subsets of connected to which ATE channels.
CHENG et al.: SCAN TEST BANDWIDTH MANAGEMENT FOR ULTRALARGE-SCALE SoC ARCHITECTURES 1053
TABLE I
T OP PC C ORES
The above analysis clearly indicates that adjusting DSR bottleneck, which prevents test scheduling from reaching solu-
networks in accordance with the SDV statistics can maintain tions close to a theoretical boundary. Fortunately, as resizing
the central role of test scheduling in reducing test application of a decompressor does not impact SDV [14], increasing the
time. Thus, our approach groups cores with similar SDV number of In channels can visibly and safely reduce individual
numbers in order to handle every group—through properly EDT-based PCs due to advanced dynamic compaction methods
adapted DSR networks—in an individual manner determined that allow one to produce very compact test sets scaling almost
by cores’ test data demands. As can be seen in Fig. 7, linearly with the number of In.
141 cores are partitioned into three groups, each comprising Clearly, the cost of ATPG recomputing can be high, espe-
47 blocks. The accumulated SDVs (represented by the area cially for designs with hundreds of cores. Furthermore, it
below the red curve) for groups A, B, and C are equal may be impossible to increase EDT interface for every single
to 73%, 25%, and 2% of the total data volume, respec- core. Therefore, choosing top cores with excessive PCs is the
tively. The core groups that emerge from this partitioning solution. Here, the idea is to exclusively rebuild EDT interface
get DSR wires proportionally to their SDV requests. As a of these cores in such a way that the PC of the resultant test
result, cores with larger SDVs will have access to more ATE set is below the lower bound determined by the optimal test
channels than cores with less demanding requests. Back to schedule. In particular, the number of decompressor In should
Fig. 7—the group of cores that needs 73% of the total SDV be increased as indicated by a ratio between the number of
is likely to be served by a part of DSR networks that offer patterns produced for a baseline configuration and number of
the most comprehensive connectivity within the ATE interface. desired vectors. Table I presents stuck-at test pattern data for
For example, this approach may let every single core of group the top seven cores of the SoC design analyzed earlier. Let
A use one out of three ATE channels per In, while having us assume that the number of channels feeding these cores
cores of groups B and C constrained to two and one ATE doubles. For each core, successive columns report the number
channel, respectively. of EDT In, test coverage (TC), the PC for the baseline (left
The use of SDV figures in designing a DSR is remarkable part of the table) and resized (right part) EDT interfaces, and
on its own, especially when all SoC cores have their ATPG finally the ratio between the resultant PCs. As can be seen,
patterns ready. Still, the exact PC for each core may not always all cores reduce their PCs approximately by half, whereas the
be available at the DSR design stage. In this scenario, one can TC remains unaffected. As a result, the PCs for all cores are
run a compression analyzer [23] to estimate the total PC and now below the lower bound and testing can be completed in
corresponding test data volume. A costly ATPG for a large an optimal time.
number of cores can deploy in this case a fault sampling
instead of working with a complete fault list. These actions V. I NDUSTRIAL T EST C ASE
are feasible as the key parameter needed here is just an SDV
The schemes proposed in this paper were tested on a large
distribution among all cores rather than the accurate individual
industrial SoC design comprising 281 isolated cores. This
PC for each of them.
number includes 121 unique cores plus 20 groups of identical
cores, each having eight modules (note that the number
D. Selective Elevation of Pin Counts at Core Level of functionally different cores is thus 141—some statistics
A large number of patterns that are needed by a small regarding this design have already been used in Section IV).
fraction of cores is another factor that may severely impact When testing identical cores in one group, test stimuli are
the actual test time. Consider, for example, the core-level broadcasted to the In of these eight identical cores while test
PCs (Fig. 8) for the SoC design of Fig. 7. The total responses from all outputs of eight cores are observed together.
SDV—here 3 269 098—and 151 available channels indicate The core gate counts vary from a few million to more than
that the best schedule possible could reduce the test time 10M, whereas the core PCs ranges from a few thousands to
to 3 269 098/151 = 21 650 patterns. However, there are sev- over 36k.
eral cores that require more than 21 650 patterns (including The trend of producing such complex designs adds another
module 1 coming with more than 35 000 tests). They form a constraint to the already complicated test scheduling process—
CHENG et al.: SCAN TEST BANDWIDTH MANAGEMENT FOR ULTRALARGE-SCALE SoC ARCHITECTURES 1057
TABLE II
S O C C HARACTERISTICS
TABLE IV
E XPERIMENTAL R ESULTS - I
TABLE V
E XPERIMENTAL R ESULTS - II
[26] A. Sehgal, V. Iyengar, and K. Chakrabarty, “SOC test planning using Grady Gilles (M’90) received the B.S. degree
virtual test access architectures,” IEEE Trans. Very Large Scale Integr. in physics from Texas A&M University, College
(VLSI) Syst., vol. 12, no. 12, pp. 1263–1276, Dec. 2004. Station, TX, USA, in 1978.
[27] M. Tehranipoor, M. Nourani, and K. Chakrabarty, “Nine-coded com- He has been privileged to practice design for
pression technique for testing embedded cores in SoCs,” IEEE Trans. testability (DFT) on some of the most commercially
Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 6, pp. 719–731, successful microprocessors, first with Motorola,
Jun. 2005. Schaumburg, IL, USA, and is currently with
[28] Z. Wang, K. Chakrabarty, and S. Wang, “Integrated LFSR reseeding, Advanced Micro Devices, Sunnyvale, CA, USA,
test-access optimization, and test scheduling for core-based system-on- where he has been since 1999. He holds numerous
chip,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 28, patents and publications related to DFT. His current
no. 8, pp. 1251–1263, Aug. 2009. research interests include di/dt mitigation techniques
[29] Q. Xu and N. Nicolici, “Multi-frequency test access mechanism design and multiple orthogonal strategies to better utilize tester bandwidth.
for modular SOC testing,” in Proc. 13th Asian Test Symp. (ATS), 2004, Mr. Gilles was a co-recipient of the 1991 and 2008 Best Paper Awards at
pp. 2–7. the IEEE International Test Conference. He helped to develop two test-related
IEEE Standards, 1149.1 and 1500.
[30] D. Zhao and S. Upadhyaya, “Dynamically partitioned test scheduling
with adaptive TAM configuration for power-constrained SoC testing,”
IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 24, no. 6,
pp. 956–965, Jun. 2005.
[31] Q. Zhou and K. J. Balakrishnan, “Test cost reduction for SoC using
a combined approach to test data compression and test schedul- Yu Huang (M’99–SM’12) received the B.S. degree
ing,” in Proc. Des., Autom. Test Eur. Conf. Exhibit. (DATE), 2007, in electronic science and the M.S. degree in
pp. 39–44. semiconductor devices and technology from Nankai
[32] W. Zou, S. M. Reddy, I. Pomeranz, and Y. Huang, “SOC test scheduling University, Tianjin, China and the Ph.D. degree in
using simulated annealing,” in Proc. 21st VLSI Test Symp. (VTS), electrical and computer engineering from the Uni-
May 2003, pp. 325–330. versity of Iowa, Iowa City, IA, USA.
He is a Senior Research and Development
Staff Member with the Division of Silicon Test
Systems, Mentor Graphics Corporation, Wilsonville,
OR, USA. He has authored more than 100 research
papers, and holds seven U.S. patents and 16 patents
pending. His current research interests include VLSI testing, test compression,
and fault diagnosis.
Grzegorz Mrugalski (M’06–SM’13) received the Janusz Rajski (A’87–SM’10–F’11) received the
M.S. and Ph.D. degrees in electrical engineering M.S. and Ph.D. degrees in electrical engineering
from Poznań University of Technology, Poznań, from the Technical University of Gdańsk, Gdańsk,
Poland, in 1995 and 2002, respectively. Poland, and Poznań University of Technology,
He joined Mentor Graphics Corporation, Poznań, Poland, in 1973 and 1982, respectively.
Wilsonville, OR, USA, in 2002. Previously, He was a Faculty Member of the Poznań Uni-
he was with the Institute of Electronics and versity of Technology from 1973 to 1984. In 1984,
Telecommunications, Poznań University of he joined McGill University, Montreal, QC, Canada,
Technology. Since 2009, he has been the Manager where he became an Associate Professor in 1989. In
of the Research and Development Laboratory 1995, he was a Chief Scientist at Mentor Graphics
for Design-for-Test products at Mentor Graphics Corporation, Wilsonville, OR, USA. He has authored
Polska, Poznań, Poland. He has co-authored over 40 technical papers in the more than 150 research papers in these areas, and is a co-inventor of 84 U.S.
area of test, and holds 21 U.S. patents. His current research interests include patents. He is also the Principal Inventor of the Embedded Deterministic
computer-aided design of digital circuits, design for testability, built-in Test Technology used in the first commercial test compression product,
self-test, and test compression. TestKompress. His current research interests include the testing of VLSI
Dr. Mrugalski was a co-recipient of the 2011 Best Paper Award at the systems, design for testability, built-in self-test, and logic synthesis.
IEEE European Test Symposium and the 2012 IEEE International Test Dr. Rajski was the Guest Co-Editor of the special issue of the IEEE C OM -
Conference Most Significant Paper Award. MUNICATIONS M AGAZINE devoted to testing of telecommunication hardware
in 1999. He was also the Associate Editor of the IEEE T RANSACTIONS ON
C OMPUTER -A IDED D ESIGN OF I NTEGRATED C IRCUITS AND S YSTEMS , the
IEEE T RANSACTIONS ON C OMPUTERS , and the IEEE D ESIGN AND T EST
OF C OMPUTERS M AGAZINE. He has served on technical program committees
of various conferences. He was a co-recipient of the 1993 Best Paper Award
for the paper on logic synthesis published in the IEEE T RANSACTIONS ON
C OMPUTER -A IDED D ESIGN OF I NTEGRATED C IRCUITS AND S YSTEMS , the
1995 and 1998 Best Paper Awards at the IEEE VLSI Test Symposium, the
1999 and 2003 Honorable Mention Awards at the IEEE International Test
Conference, the 2006 IEEE Circuits and Systems Society Donald O. Pederson
Outstanding Paper Award recognizing the paper on embedded deterministic
test published in the IEEE T RANSACTIONS ON C OMPUTER -A IDED D ESIGN
OF I NTEGRATED C IRCUITS AND S YSTEMS , the 2009 Best Paper Award at the
VLSI Design Conference, the 2011 Best Paper Award at the IEEE European
Test Symposium, and the 2012 IEEE International Test Conference Most
Significant Paper Award.