Scan Test Bandwidth Management For Ultralarge-Scale System-on-Chip Architectures

1050 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO.
6, JUNE 2015
Scan Test Bandwidth Management for

Ultralarge-Scale System-on-Chip
Architectures
Wu-Tung Cheng, Fellow, IEEE, Yan Dong, Grady Gilles, Member, IEEE, Yu Huang, Senior Member, IEEE,
Jakub Janicki, Mark Kassab, Senior Member, IEEE, Grzegorz Mrugalski, Senior Member, IEEE,
Nilanjan Mukherjee, Senior Member, IEEE, Janusz Rajski, Fellow, IEEE,
and Jerzy Tyszer, Fellow, IEEE
Abstract— This paper presents several techniques employed with different power requirements and multiple power-supply
to resolve problems surfacing when applying scan bandwidth voltage levels. Many SoC-based test schemes proposed so
management to large industrial multicore system-on-chip (SoC) far utilize dedicated instrumentation, including test access
designs with embedded test data compression. These designs pose
significant challenges to the channel management scheme, flow, mechanisms (TAMs) and test wrappers [7], [9], [22]. TAMs
and tools. This paper introduces several test logic architectures are typically used to transfer test data between the SoC
that facilitate preemptive test scheduling for SoC circuits with pins and embedded cores, whereas test wrappers form the
embedded deterministic test-based test data compression. The interface between the core and SoC environment. Solutions
same solutions allow efficient handling of physical constraints in involving both TAMs and wrappers accomplish such tasks as
realistic applications. Finally, state-of-the-art SoC test scheduling
algorithms are rearchitected accordingly by making provisions optimizing test interface architecture [26] or control logic [20]
for: 1) setting up time-effective test configurations; 2) optimiza- while addressing routing and layout constraints or hierarchy
tion of SoC pin partitions; 3) allocation of core-level channels of cores [2], [5], scheduling test procedures [1], [18], [19],
based on scan data volume; and 4) more flexible core-wise [21], [31], and minimizing power consumption [8], [16], [30].
usage of automatic test equipment channel resources. A detailed Techniques proposed in [4], [9], [10], and [25] attempt to
case study is illustrated herein with a variety of experiments
allowing one to learn how to tradeoff different architectures and minimize SoC test time. The integrated scheme of [9] reduces
test-related factors. the test time by optimizing dedicated TAMs and pin-count-
Index Terms— Bandwidth management, embedded determin- aware test scheduling. Packet-switched networks-on-chip [25]
istic test (EDT), scan-based test, test access mechanism (TAM), can replace dedicated TAMs in testing of SoC by delivering
test application time, test compression, test scheduling. test data through an on-chip communication infrastructure.
There are techniques addressing synergistically TAM and
I. I NTRODUCTION wrapper design as well as test data compression [3], [11],
[12], [27]–[29]. In fact, test compression is now becoming
T ODAY’S multicore chip architectures require no trivial
test solutions imposed by the relentless miniaturization
of semiconductor devices, which have become much faster
an integral part of SoC-based design-for-test (DFT) schemes.
For example, the approach of [17] encodes tests for
and less power hungry than their predecessors. This trend has each core separately through linear feedback shift
given rise to the growing popularity of system-on-chip (SoC) register (LFSR) reseeding with time-multiplexed automatic
designs because of their ability to encapsulate many disparate test equipment (ATE) channels delivering data to successive
types of complex IP cores running at different clock rates cores. Similarly, a scheme of [32] utilizes static reseeding to
test SoCs, where all cores share a single LFSR. The XOR
Manuscript received February 19, 2014; revised May 14, 2014; accepted test logic performs compression in [6], which further guides
June 9, 2014. Date of publication July 16, 2014; date of current version a TAM design process.
May 20, 2015. The work of J. Janicki and J. Tyszer was supported by the
Semiconductor Research Corporation under Grant 2011-TJ-2117. ATE channel bandwidth management for SoC designs can
W.-T. Cheng, Y. Huang, M. Kassab, G. Mrugalski, N. Mukherjee, play a key role in increasing test data compression with
and J. Rajski are with Mentor Graphics Corporation, Wilsonville, no visible impact on test application time. The approach
OR 97070 USA (e-mail: wu-tung_cheng@mentor.com; yu_huang@
mentor.com; mark_kassab@mentor.com; grzegorz_mrugalski@mentor.com; presented in [13] encompasses: 1) a solver capable of using
nilanjan_mukherjee@mentor.com; janusz_rajski@mentor.com). input and output channels dynamically; 2) test scheduling
Y. Dong and G. Giles are with Advanced Micro Devices Inc., Sunnyvale, algorithms; and 3) TAM design schemes, all devised for
CA 94088 USA (e-mail: yan.dong@amd.com; grady.giles@amd.com).
J. Janicki and J. Tyszer are with the Faculty of Electronics and Telecommu- the embedded deterministic test (EDT) environment [24].
nications, Poznan University of Technology, Poznan 60-965, Poland (e-mail: It is assumed that all cores in the SoC are either hetero-
jjanicki@et.put.poznan.pl; jerzy.tyszer@put.poznan.pl). geneous modules, or wrapped testable units, and they come
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org. with their individual EDT-based compression logic, which
Digital Object Identifier 10.1109/TVLSI.2014.2332469 is subsequently interfaced with ATE through an optimized
1063-8210 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
CHENG et al.: SCAN TEST BANDWIDTH MANAGEMENT FOR ULTRALARGE-SCALE SoC ARCHITECTURES 1051
number of channels. As a result, test scheduling and TAMs

can assign a fraction of the ATE interface capacity to each
core. It increases compression ratios and allows tradeoffs
between the test application time, volume of test data, test
pin count, and interface design complexity. The scheme of
[13] is applicable to any SoC-based test data reduction scheme
capable of working with a varying number of In and output
channels.
Implementing a hierarchical DFT methodology for designs
with a large number of cores poses significant challenges.
First of all, the number of chip-level pins is limited and
does not suffice to drive all cores in parallel. Given the pin
limitations, it is impossible to determine the optimal allocation
of pins to cores for the best compression. Furthermore, since
a particular core can be reused in multiple designs, an optimal
number of channel pins for this core when embedded in one
design may invalidate test reuse in other designs. Under such
circumstances, the chip integrators collect data for all indi-
vidual cores, examine the data along with all constraints for Fig. 1. SoC test environment with on-chip compression.
the design, and then manually determine test schedules. This
may result in suboptimal test data volume and compromised
test application time, especially because of some outlier blocks based on the control data produced by a test scheduler. Since
having large pattern counts (PCs). the scan routing paths from the chip-level test pins to the
Bandwidth management mitigates the dependence of core core-level test pins are dynamically selected by patterns, this
channels on the number of available chip-level pins, allows interconnection network is also referred to as a dynamic scan
automatic scheduling of tests by making it transparent to the router (DSR). Identical modules may share the same test
users, and significantly improves test planning at the core- data in the broadcast mode. In addition to individual EDT
level. It also arbitrates the sharing of the chip-level channel decompressors, each core features X-masking logic protecting
pins, thereby guaranteeing the best data volume and test time its response compactor against unknown states and connecting
reductions for the overall design. In this paper, we present a the core with an output-switching network. This network
bandwidth management scheme for hierarchical designs that allows the compressed output streams from successive cores
lets a designer tradeoff fixed and flexible channel allocations to reach an output channel i (OCi ), and to be sent back
per core as well as physical constraints to minimize the to the ATE. In order to facilitate test pattern reuse, test
routing overhead of the TAM-based networks. Furthermore, wrappers isolate all cores so that they are independent of
several techniques to deliver the control data during test are each other.
examined altogether with a new scheduling algorithm that As shown in Fig. 1, the In DSR consists of demultiplexers
allows changing the In and output channel allocations when whose number matches the number of ATE In channels. Given
switching the channel configurations. a group of test patterns, each demultiplexer connects the
The remainder of this paper is organized as follows. After corresponding channel to one of several destination cores, as
recalling principles of the EDT-based bandwidth management indicated by the content of address register. The number of
in Section II, Section III describes cost-effective schemes used ATE In channels cannot be smaller than the capacity of the
to deliver control data, and subsequently to setup SoC test con- largest single core in terms of its EDT In. Clearly, in the
figurations. Section IV introduces a conditional merging-based worst case, we can still test the largest cores, one at a time.
test scheduler. It paves the way for more advanced test time Typically, low-order In of each core are used more frequently
reduction techniques based on balanced I/O pins partitioning, than others [13]. Hence, OR gates are deployed to assure that
priority-based routing, and selective adjusting of interface these In can receive data from more than a single ATE channel
throughput. Partitioning of SoC designs and handling their pin to increase flexibility of a test scheduler.
layout constraints are among key factors characterizing a large Given the ATE In channels, the associated demultiplexers,
industrial SoC test case introduced in Section V. Section VI and all cores with OR gates driving their EDT In, the actual
details experimental results obtained for this design. This paper connections between these terminals are arranged as follows.
concludes with Section VII. The EDT In (alternatively, OR gate In, if any) are connected
with the demultiplexers in such a way that n EDT In of a given
II. P RIOR W ORK core are linked with n different ATE In channels, and each
The SoC test environment (Fig. 1) of this paper comprises ATE channel serves approximately the same number of cores.
two switching networks, as introduced in [13]. An external This method yields the actual size of the In demultiplexers and
ATE In channel i (ICi ) feeds an In-switching network that their control registers. Some final adjustments within a single
reroutes compressed test data to different cores (in the remain- module are also possible to simplify the resultant DSR layout
ing parts of this paper a given core k is denoted as Ck ) and avoid costly and long connections.
1052 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO. 6, JUNE 2015
The output DSR (Fig. 1) interfaces all core outputs with the cores. Let L be a sorted list of setup classes with a class x
ATE by reducing the number of test response streams so that having the largest value of c · P(x) seating in the front.
they fit into the number of output ATE channels. It is made We begin by assigning the first element of L to the current
up of a number of multiplexers such that each multiplexer result b of merging to form a base. Then, the algorithm iterates
serves to connect several cores with a designated ATE output over all the remaining setup classes and expands the base by
channel. Again, the address registers specify which cores are one class at a time, always taking the first class from L that
to be observed for a given group of test patterns. The output satisfies all constraints, including availability of ATE channels
channel mapping is carried out in a manner similar to that of and the ability of DSRs to serve the base.
the In DSR. Note that all core outputs feature a user-defined The process of forming the base class terminates when
fan-out in order to increase flexibility of selecting observation either there are no more setup classes complementary with
points connected to the multiplexers. It is also assumed that the base, or one of the constraints cannot be satisfied. The
all core outputs need to be observed when tested. merging algorithm removes then the first element from list L
It is worth noting that a flexible use of ATE channels results and attempts to form another base cluster until the list of
in increased compression and elevated encoding efficiency. setup classes becomes empty, in which case the scheduling
Moreover, testing many cores in parallel by dynamically algorithm returns a list of base classes that determines the
allocating ATE channels to different cores in accordance with actual schedule, i.e., an order according to which cores will be
their needs may shorten test application time. Finally, having tested, as well as actual channel allocations. It can be further
individual decompressors running in appropriate time slots reordered to group tests for the same core in contiguous time
makes it easier to reduce the number of external channels intervals, finally forming test configurations indicating which
compared with the total number of EDT In featured by all ATE channels are to connect with which EDT channels of
cores together. In preparation for the actual test session, the which cores and for how long.
following steps are carried out (a detailed description of this
procedure can be found in [13]).
III. C ONTROL DATA D ELIVERY
1) For each core, automatic test pattern generation (ATPG)-
produced test cubes are passed to a solver that merges The approach summarized in Section II does not make any
and encodes them to arrive with final compressed test specific provisions for the way control data is delivered to
patterns; furthermore, the solver determines the minimal SoC test logic in order to setup test configurations. It appears,
number of EDT In channels needed to compress a given however, that the number of test configurations, and hence
test pattern. the amount of control data one needs to employ and transfer
2) For each test pattern at the core level, information between the ATE and DSR address registers, may visibly
regarding the minimal number of EDT In channels is impact test scheduling and the resultant test time. Conse-
paired with data concerning the required EDT output quently, we begin this paper by analyzing three alternative
channels. schemes that can be used to upload control bits and show
3) All test patterns are clustered to form groups (classes) how they determine the final SoC test logic architecture.
of patterns having identical both In channels and obser-
vation points.
4) Upon completion of the above operations for each core, A. Using IJTAG
all classes are passed to a test scheduler; given architec- The IEEE 1687 is a proposed standard for accessing
ture of both test access networks and various constraints on-chip test and debug features via the IEEE 1149.1 test
(ATE channels, DSR architecture, power consumption, access port (TAP). The purpose of this internal Joint Test
and others), it yields the final allocation of ATE In/output Action Group (IJTAG) standard is to automate the way one
channels to selected cores when applying successive test can manage on-chip instruments, and to describe a language
patterns. for communicating with them via the IEEE 1149.1 test data
Given both DSRs, one can now schedule application of test registers (TDRs). If there is an IJTAG network available on the
patterns. Every test pattern t is characterized by its descriptor SoC, and the total number of test configurations is relatively
comprising module m that is to be exercised when applying small, one can use it to deliver the control data, as shown in
test t, and the channel capacity c, i.e., the number of EDT In Fig. 2.
and outputs needed for this purpose. All test patterns having The SoC design of Fig. 2 has a single TAP and three
the same descriptor form a setup class x represented by its different blocks: 1) two cores (C1 and C2 ) under test and
PC P(x). Every setup class can be split into multiple segments 2) the DSR interfacing ATE with C1 and C2 . TAP can be
such that test patterns from the same class are applied in instructed to enable a test path via the IEEE 1687 segment
disjoint time intervals. The ability to preempt a class may insertion bits (SIBs). Every SIB is used to either enable or
improve the ATE channel utilization, shorten the total test disable the inclusion of an instrument into the path from a
time, and reduce the volume of control data. test data In to a test data output. The TDR in C1 or C2 can
In principle, test scheduling merges complementary setup be either bypassed or loaded with data putting both cores into
classes to form clusters of cores that can be tested in parallel specific test modes. The TDR in DSR receives the control
given constraints imposed by ATE channels and DSRs. Setup data indicating which core and which of its test channels are
classes are complementary if they comprise disjoint subsets of connected to which ATE channels.
selector pin of each demultiplexer is controlled by a signal

CFS. It is worth noting that if n ≥ m, then n control chains
suffice to supervise both In and output sides. If n < m, one
chain can be used to control one In channel and multiple output
channels for the number of control bits per channel is typically
much smaller than the pattern shift cycles.
When CFS is set to 0, a test pattern is only used to deliver
the control data for demultiplexers and multiplexers of the In
and output DSR, respectively. This particular pattern does not
have to be observed, and hence it only features an upload
phase. Typically, however, the number of shift cycles for
control patterns has to match the shift cycles of conventional
tests as many ATEs may not distinguish between control
patterns and regular test vectors. If CFS is equal to 1, patterns
are applied as regular test vectors. The DSR, which has already
Fig. 2. Using IJTAG network to transfer control data. been configured, connects then the respective ATE channels
with the core test channels as indicated by the content of the
address registers.
As can be seen in Fig. 3, CF is the pipelining flip-flop at the
beginning of one of the In channels. CFS is the shadow register
of CF, and it is updated at the end of each test pattern upload.
CF and CFS are initialized to 0, so the first test pattern must be
a control pattern. Once the control data is shifted in, flip-flop
CF becomes asserted, so does CFS. As a result, the regular
scan test patterns can be applied. Given a test configuration,
all patterns but the last one set CF and CFS to 1. The last
scan pattern of a test configuration sets CF back to 0 and
subsequently CFS is deasserted. In the next pattern, it uses the
deasserted CFS to upload the new control data into the control
chain and to setup a new test configuration. The remaining
configurations and test patterns are applied in a similar manner
Fig. 3. Dedicated control chain-based architecture.
by toggling the values of CF and CFS. The advantage of this
architecture is its ability to support as many test configurations
as desired since the control chain shift frequency is the same
The advantage of using the IJTAG network to deliver the
as that of conventional scan chains.
DSR control data is a simple and easy way to implement
flow as the network is frequently used to set the cores TDRs.
However, such an approach can only support a limited number
of configurations. This is because the IJTAG shift clock is C. Using Pipelines
typically 10 to 20 times slower than the scan shift clock. Deliv-
One can also use the regular scan channels to deliver
ering a large volume of control data can incur an unacceptable
controls through pipelining stages, as shown in Fig. 4. For
total test time overhead. Consequently, this architecture can
each channel, this approach concatenates n + m control bits,
be used for a relatively small number of test configurations.
where n and m are the numbers of control bits used by
If the TDRs work with parallel update registers and many
the In demultiplexers and output multiplexers, respectively.
patterns use the same configuration, a low throughput IJTAG
Moreover, each control bit is shadowed to avoid distorting
control can be mitigated. If one changes the control state
test configurations in the middle of test data shifting. The
rather seldom, the next configuration vector can be shifted
shadow registers are updated at the end of each pattern
in coincidently with application of test patterns, followed by
upload. Thus, when a test pattern launches a new test con-
updating TDR when ready to switch to the new configuration.
figuration, the corresponding control data need to be loaded
It requires more DFT logic, though.
with its predecessor. Clearly, the first vector is exclusively a
setup one.
B. Dedicated Control Chain The architecture of Fig. 4 supports as many test configura-
The SoC design of Fig. 3 uses two dedicated control chains tions as required. However, the control data is always uploaded
(shown as red squares) to deliver the control data. In princi- through the ATE channels as an integral part of a test vector.
ple, these structures are obtained by daisy chaining address Hence, given a test configuration, the same control data is
registers of both DSRs. Let the design have n In test pins and repeated for all test patterns. The amount of control data
m output test pins at the chip-level. One can insert n control is small, though, as the number of control bits per channel
chains driven by n In test pins through n demultiplexers. The is typically a tiny fraction of the test pattern shift cycles.
Fig. 4. Pipeline architecture.
IV. T EST T IME R EDUCTION

The tester bandwidth management in present large SoC
designs and future kilo-core architectures with considerable
diversity in the design styles and test needs of individual
modules requires solutions beyond the capabilities of state-
of-the-art DFT schemes. In particular, a test scheme will have
to dynamically adapt to particulars of a given SoC architecture
to assure an optimal utilization of its interface bandwidth.
In this section, we identify the key reasons for existing Fig. 5. Conditional merging cases. (a) No merging, Tc < t1 , t2 , t3 .
(b) Merging, Tc < t1 , t2 , t3 . (c) No merging, Tc > t1 , t2 , t3 . (d) Merging,
solutions’ inability to further reduce test application time and Tc > t1 , t2 , t3 .
demonstrate how rearchitecting a test scheduler and DSR
networks can mitigate it. Finally, we show how these methods
reduce the test time in a global bandwidth management one may observe performance degradation due to unutilized
scheme. (idle) ATE channels.
Consider a bin-packing diagram representing a part of a
hypothetical test schedule, as shown in Fig. 5(a). Here, two
A. Conditional Merging cores, C1 (white boxes) and C2 (gray boxes), are to be tested.
The test-scheduling algorithm of Section II attempts (in a Core C1 requires three channels to apply t1 test patterns
greedy fashion) to optimize the total test application time, (it forms a setup class a), two channels to deliver t2 tests
whereas the number of test configurations remains virtually (setup class b), and then a single channel for the remaining t3
unlimited. Under certain circumstances, however, minimizing patterns (class c). Core C2 needs two channels to deliver t3
the number of test configurations may decrease the actual test test patterns (class d), and a single channel to apply t2 patterns
time. Indeed, considering one would need to transfer control (setup class e). Let t1 , t2 , and t3 be much longer than the
data necessary to reconfigure DSRs many times, reducing duration Tc of test configuration setup. If classes b and e as
this activity may translate into some significant test time well as classes c and d are merged as shown in Fig. 5(b),
savings. The additional test-scheduling step taking care of this then extra time is needed to reconfigure the network three
phenomenon will be further referred to as conditional merging, times. More importantly, however, we reduce the actual test
and it can be detailed as follows. time since C2 is tested in parallel with C1 . On the other hand,
Assume we know the total number of control bits needed if these classes are not merged, then two test configurations
to reconfigure the In DSR(s). Let us also assume that we suffice to complete tests of C1 and C2 . Unfortunately, test of
know the frequency of a shift clock used when moving C2 is delayed, thus increasing visibly the total test application
the control data. As a result, one can easily assess time time. As shown in Fig. 5(a), this scenario can be characterized
Tc spent on updating test configurations. The resulting new by some unused space (in the schedule diagram) whose area
test scheduling comprises an additional verification step that A can be easily computed. Let k be the total number of SoC
precedes a possible merging (or bin-packing) of a new setup In channels. Then, the total test time increase τ caused by
class with the existing base. It estimates the resultant impact not merging some of the setup classes can be estimated as
on test time, if we decide to proceed with merging. Clearly, if τ = A/k. If τ < Tc , then one can reduce the total test time
the setup class is not merged with the base, it may reduce test by actually not merging some of the classes. Consequently, the
time as there is no need to send extra control data forming a above condition can be used to dynamically decide whether
new test configuration. On the other hand, as part of tradeoff, given setup classes should be merged when running test
Fig. 7. Core-level test data volume statistics.
Fig. 6. Test data volume for industrial design.

estimate SDV for the In and output side, as indicated in the
figure. SDV for the remaining cores of a given design can be
scheduling. This is illustrated in Fig. 5(c), where we assume computed based on similarly shaped curves. By summing up
that t1 , t2 , and t3 are shorter than reconfiguration time Tc . over all individual SDVs, one can arrive with the final ratio
Without merging [Fig. 5(c)], test of C2 is going to be delayed. of data volumes on the In and output sides. For example, the
Nevertheless, this approach is still recommended as setting up ratio for one of examined industrial designs was equal to 0.44.
an additional test configuration [Fig. 5(d)] will take longer Subsequently, all SoC-level channels are assigned to the In and
than testing C2 separately. output DSRs in accordance with the ratio determined earlier.
In the aforementioned test case, we would have two outputs
per one In, on the average. Occasionally, the desired ratio can
B. SoC In–Output Pin Partition also be determined using the upper bound on the number of
Optimizing SoC pin allocation based on scan data In channels assigned to a given core, especially when channel
volume (SDV) at the In and output sides can further improve profiles similar to those of Fig. 6 are not readily available. The
test scheduling. As typical SoC designs pack hundreds of same design would have then the ratio 0.51.
cores—interfaced by strictly limited channel resources—into
a space of a single chip, it is not unusual that the number
of channels assigned to a single core as well as the ratio of C. Priority-Based Routing
EDT In to EDT outputs vary significantly. The same ratio It is worth recalling that DSR networks are generic and
also depends on a test pattern as the solver always selects regular structures that remain independent of optimized SoC
the minimal number of In that suffice to encode a given pin partitions and SDV requirements of individual cores.
vector while still observing all outputs. This disproportion is In particular, their OR gates (Fig. 1) help increase flexibility
even more pronounced when several identical cores receive of a test scheduler as proportionally more ATE channels can
the same data in the broadcast mode, whereas their outputs be deployed to feed cores with larger fan-ins. However, the
must be observed independently to guarantee high quality of number of EDT In channels and SDV do not correlate well.
fault detection and diagnosis. Consider, for example, a circuit For example, test PCs of two cores with the same number
comprising eight identical cores with five In and outputs. Such of EDT channels may differ significantly. Consequently, the
a design would require supplying different subsets of five In number of EDT channels may not suffice to accurately guide
channels while uninterruptedly monitoring all 40 outputs. test resource partitioning. It applies primarily to those indus-
The above observations clearly indicate that the DSRs trial multicore designs whose crucial components (typically
should provide each core with channel resources adequate to cores running key functions) have aggressive data and test
its bandwidth requirements. Although the total number of ATE time requests compared with the remaining modules of the
channels is typically fixed, most of SoC pins are bidirectional same chip.
and can be configured either as In or outputs. One can exploit Consider, for example, core-level SDV statistics for a single
this flexibility; ultimately, it is the optimal proportion between SoC design with 141 heterogeneous cores, as shown in Fig. 7.
the In and outputs that will count in reducing test time or These numbers clearly follow the law of the vital few—the
improving bandwidth management. major part of SDV is used to test only a few cores, whereas
According to our findings, the most suitable partitions of the minority of tests is delivered to a wide spectrum of the
ATE channels can be obtained by comparing the corresponding remaining blocks. It is, therefore, unreasonable to arrange
SDVs on both sides. Consider a channel profile for a single the same DSR connections—with many wires and inflated
industrial core, as shown in Fig. 6. It gives, for every test control data—for both groups of cores. On the other hand, if
pattern, the number of In (black curve) and output (red line) routing constraints permit, these additional wires can be used
channels required to apply this test. Areas below both curves to increase connectivity of the most demanding cores.
TABLE I
T OP PC C ORES
Fig. 8. Core-level PC statistics.
The above analysis clearly indicates that adjusting DSR bottleneck, which prevents test scheduling from reaching solu-
networks in accordance with the SDV statistics can maintain tions close to a theoretical boundary. Fortunately, as resizing
the central role of test scheduling in reducing test application of a decompressor does not impact SDV [14], increasing the
time. Thus, our approach groups cores with similar SDV number of In channels can visibly and safely reduce individual
numbers in order to handle every group—through properly EDT-based PCs due to advanced dynamic compaction methods
adapted DSR networks—in an individual manner determined that allow one to produce very compact test sets scaling almost
by cores’ test data demands. As can be seen in Fig. 7, linearly with the number of In.
141 cores are partitioned into three groups, each comprising Clearly, the cost of ATPG recomputing can be high, espe-
47 blocks. The accumulated SDVs (represented by the area cially for designs with hundreds of cores. Furthermore, it
below the red curve) for groups A, B, and C are equal may be impossible to increase EDT interface for every single
to 73%, 25%, and 2% of the total data volume, respec- core. Therefore, choosing top cores with excessive PCs is the
tively. The core groups that emerge from this partitioning solution. Here, the idea is to exclusively rebuild EDT interface
get DSR wires proportionally to their SDV requests. As a of these cores in such a way that the PC of the resultant test
result, cores with larger SDVs will have access to more ATE set is below the lower bound determined by the optimal test
channels than cores with less demanding requests. Back to schedule. In particular, the number of decompressor In should
Fig. 7—the group of cores that needs 73% of the total SDV be increased as indicated by a ratio between the number of
is likely to be served by a part of DSR networks that offer patterns produced for a baseline configuration and number of
the most comprehensive connectivity within the ATE interface. desired vectors. Table I presents stuck-at test pattern data for
For example, this approach may let every single core of group the top seven cores of the SoC design analyzed earlier. Let
A use one out of three ATE channels per In, while having us assume that the number of channels feeding these cores
cores of groups B and C constrained to two and one ATE doubles. For each core, successive columns report the number
channel, respectively. of EDT In, test coverage (TC), the PC for the baseline (left
The use of SDV figures in designing a DSR is remarkable part of the table) and resized (right part) EDT interfaces, and
on its own, especially when all SoC cores have their ATPG finally the ratio between the resultant PCs. As can be seen,
patterns ready. Still, the exact PC for each core may not always all cores reduce their PCs approximately by half, whereas the
be available at the DSR design stage. In this scenario, one can TC remains unaffected. As a result, the PCs for all cores are
run a compression analyzer [23] to estimate the total PC and now below the lower bound and testing can be completed in
corresponding test data volume. A costly ATPG for a large an optimal time.
number of cores can deploy in this case a fault sampling
instead of working with a complete fault list. These actions V. I NDUSTRIAL T EST C ASE
are feasible as the key parameter needed here is just an SDV
The schemes proposed in this paper were tested on a large
distribution among all cores rather than the accurate individual
industrial SoC design comprising 281 isolated cores. This
PC for each of them.
number includes 121 unique cores plus 20 groups of identical
cores, each having eight modules (note that the number
D. Selective Elevation of Pin Counts at Core Level of functionally different cores is thus 141—some statistics
A large number of patterns that are needed by a small regarding this design have already been used in Section IV).
fraction of cores is another factor that may severely impact When testing identical cores in one group, test stimuli are
the actual test time. Consider, for example, the core-level broadcasted to the In of these eight identical cores while test
PCs (Fig. 8) for the SoC design of Fig. 7. The total responses from all outputs of eight cores are observed together.
SDV—here 3 269 098—and 151 available channels indicate The core gate counts vary from a few million to more than
that the best schedule possible could reduce the test time 10M, whereas the core PCs ranges from a few thousands to
to 3 269 098/151 = 21 650 patterns. However, there are sev- over 36k.
eral cores that require more than 21 650 patterns (including The trend of producing such complex designs adds another
module 1 coming with more than 35 000 tests). They form a constraint to the already complicated test scheduling process—
TABLE II
S O C C HARACTERISTICS
Fig. 9. Pin layout constraints. (a) Baseline. (b) Partitioned.
certain cores can only use a subset of I/O pins because of

layout congestions. As shown in Fig. 9(a), all design pins and
cores can be then partitioned into a number (here four) of
disjoint subsets. A single core can only be interfaced through
a subset of pins hosted by the corresponding group. This
approach alleviates many practical problems, such as routing
complexity, aggressive timing parameters, signal integrity, or
power dissipation requirements. However, it impacts test logic
architecture and test scheduling algorithms. In particular, one
has to employ a number of routers [Fig. 9(b)] driven by subsets
of In test channels and feeding subsets of output test channels. TABLE III
Although the test scheduler is aware of these constraints, it S O C G ROUP S ETUP
may not produce as optimal test scenarios as those of single-
partition SoC designs due to their reduced flexibility (not
every channel can be connected freely to a desired core) and
unbalanced test requirements between successive groups. The
latter feature represents scenarios when, for example, the ratio
of In and output channels allocated to a given group does
not match the corresponding ratio of In and output test data
volumes, as discussed earlier, leading to underutilization of
some test resources within one group while compromising test
time within another partition.
Following the general scheme presented above, each core
or a group of identical cores in our design employ only a
subset of pins. There are 477 out of 497 SoC digital pins
that were used for testing. Similar to Fig. 9, the design was
partitioned into four quadrants Q 1 –Q 4 with the corresponding
allocation of cores and I/O pins into individual quadrants, as For example, the cores of Fig. 9(a) could be tested as follows:
listed in Table II. The same table provides some vital statistics 1) G 1 = {C1 , C5 , C9 , C13 }; 2) G 2 = {C2 , C6 , C10 , C14 };
of the design that include the gate count, number of scan cells, 3) G 3 = {C3 , C7 , C11 , C15 }; and 4) G 4 = {C4 , C8 , C12 , C16 }.
total number of EDT In (it would define the interface size if Once test scheduling arrives at the four test configurations,
no scheduling was used), total number of EDT outputs, and we use JTAG to setup each configuration. The results obtained
corresponding ratios of In to outputs: the actual one and the this way (Table III) form a baseline that can be compared
most preferable, obtained as shown in Section IV based on against our dynamic SoC bandwidth management outcomes.
the In test data volume and the output test data volume given The second row of Table III indicates the number of cores
in the table for both stuck-at and transition fault tests, with a that ended up within each configuration while the next row
further data breakdown with respect to successive quadrants. provides the number of scan shift cycles per pattern (pattern
All cores were manually partitioned into four groups length). The next three rows show, for each group, the number
G 1 –G 4 . One group corresponds to one configuration where of EDT In, the number of EDT outputs, and their sum,
all cores are tested in parallel, whereas different configurations respectively. Finally, the table reports the total number of
are tested in series. The core partitioning procedure picks one test patterns (determined by the largest PC of the largest
core from each quadrant and schedules these four cores to be core in a group) and the total number of shift cycles. The
tested in parallel within one configuration. If there are still SoC JTAG shift cycles deployed to setup each configuration are
pins available, it picks more cores from each quadrant until ignored because of only four configurations. The resultant
either there are no more SoC pins or all cores are selected. test schedule for patterns targeting stuck-at faults is illustrated
TABLE IV
E XPERIMENTAL R ESULTS - I
run our dynamic SoC bandwidth management. Given stuck-at

fault and transition fault test sets, the experimental results are
summarized in Table IV for eight test cases, out of which four
correspond to a fixed core-level channel count scenario where
a given core is always assigned SoC pins to all of its EDT
channels regardless of the actual test pattern requirements,
whereas the next four cases represent a scalable core-level
channel count scenario with the minimal number of SoC
pins allocated to a given core for a particular test pattern.
Furthermore, we consider four test setups, as described in
Section III: two of them use the IJTAG architecture with a shift
clock being 10 and 20 times slower than the scan shift clock,
whereas the remaining two solutions deploy dedicated control
chains and control bits attached to test patterns, respectively.
In all experiments, the total number of internal wires within
all (eight) DSRs, as shown in Fig. 1, was equal to 1795 (In)
Fig. 10. Baseline bin-packing diagram for stuck-at patterns. and 4200 (outputs). Furthermore, the same DSRs deployed
exclusively In demultiplexers and output multiplexers driven
in Fig. 10 as a bin-packing diagram for both the In (the
by 4-bit address registers. In other words, the amount of
upper part) and output (the lower part) sides. Different colors
control data per either In or output channel, whenever it needs
correspond to different cores or identical core groups. White
to be provided, is equal to 4 bit.
areas within eight boxes represent time slots when certain
For each test case, successive columns of Table IV report
channels (oriented horizontally) are unused. It is worth noting
the following performance-related statistics of the proposed
that the baseline is a fully static channel assignment, as
schemes:
connections between core test channels and SoC test pins
(within each configuration) are fixed. 1) the total number of test patterns and the resultant number
of base classes;
VI. E XPERIMENTAL R ESULTS 2) the channel fill rate indicating the actual usage of the
The test architectures of Sections III and IV as well as ATE channels; it assumes the value of 1 only if all
the SoC design of Section V were subsequently employed to channels are used uninterruptedly till the end of test;
TABLE V
E XPERIMENTAL R ESULTS - II
Fig. 11. Test time reduction.

Fig. 12. SoC complexity versus test time reduction.
3) the test application time reported as the total number of
shift cycles needed to deliver all test data in terms of the by means of demultiplexers to the same number (given by
actual test patterns and the accompanying control bits; the Setup column) of different ATE channels. The next three
moreover the data to control ratio of both quantities is cases correspond to a priority-based configuration with the
also reported here; connecting wires allocated individually per core In and for
4) the test time reduction ratio relative to the original shift every group of cores (as shown again by the Setup column,
cycles given in Table III. which lists the number of wires used by each core In of
As indicated in Table IV, it appears that one can reduce up to each group). For example, in the first case (3, 2, and 1),
3.26 times the test application time compared with the baseline there are three groups in total (A, B, and C), and each
and still be able to deliver all necessary test patterns, thus core In in A, B, and C is connected with 3, 2, and 1 ATE
preserving the original test quality. In particular, the largest test channels, respectively. For each test case, Table V reports
application time reduction is observed for test scenarios based performance statistics similar to those of Table IV. As can be
on per pattern dynamic channel allocation while employing seen, the test time can be significantly reduced if appropriate
dedicated chains to deliver all required control test data. These interconnection networks are used. For example, compare the
results are even more evident for transition fault test sets. last test cases reported for both scenarios. It appears that the
Further experimental results, not presented in Table IV, reveal test application time for the priority-based configuration is
an intrinsic relation between the ratio of In and output channels reduced up to 3.41 times, i.e., it outperforms the corresponding
and the resultant test time reduction. This is illustrated in solution for the baseline configuration by 12% with the number
Fig. 11. As can be seen, the test time reduction clearly of connecting wires reduced from 2760 to 2382.
depends on the aforementioned ratio, and hence its careful An interesting tradeoff between the SoC complexity and
selection, as shown in Section IV, may guarantee the shortest test time reduction is presented in Fig. 12. Each group of
test application time. bars corresponds to a given number of control bits used per a
Additional results, reported in Table V, try to capture single ATE output, whereas each color indicates the number
the impact of a priority-based routing (as described in of control bits per an ATE In. As can be seen, one can reduce
Section IV-C) on test time reduction. The experiments were the test time by increasing the amount of either In or output
run for stuck-at fault test sets by deploying scalable channel control bits. However, the most prominent time savings occur
counts and dedicated control scan chains. The proposed test when both factors go up in parallel. For example, with two
scheduling techniques have produced test scenarios for six control bits per output and a single control bit per In, one
test cases. In the first three ones, all core In are connected can reduce test duration 1.57 times. The same two output bits
[3] A. Chandra and K. Chakrabarty, “A unified approach to reduce SOC test

data volume, scan power and testing time,” IEEE Trans. Comput.-Aided
Design Integr. Circuits Syst., vol. 22, no. 3, pp. 352–362, Mar. 2003.
[4] S. K. Goel and E. J. Marinissen, “Effective and efficient test architecture
design for SOCs,” in Proc. Int. Test Conf. (ITC), 2002, pp. 529–538.
[5] S. K. Goel, E. J. Marinissen, A. Sehgal, and K. Chakrabarty, “Testing
of SoCs with hierarchical cores: Common fallacies, test access opti-
mization, and test scheduling,” IEEE Trans. Comput., vol. 58, no. 3,
pp. 409–423, Mar. 2009.
[6] P. T. Gonciari and B. M. Al-Hashimi, “A compression-driven test access
mechanism design approach,” in Proc. 9th IEEE Eur. Test Symp. (ETS),
May 2004, pp. 100–105.
[7] Y. Huang et al., “Optimal core wrapper width selection and SOC test
scheduling based on 3-D bin packing algorithm,” in Proc. Int. Test Conf.
(ITC), 2002, pp. 74–82.
[8] V. Iyengar and K. Chakrabarty, “System-on-a-chip test scheduling with
precedence relationships, preemption, and power constraints,” IEEE
Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 9,
Fig. 13. Optimized test schedule. pp. 1088–1094, Sep. 2002.
[9] V. Iyengar, K. Chakrabarty, and E. J. Marinissen, “Test wrapper and
test access mechanism co-optimization for system-on-chip,” J. Electron.
paired with three control bits per In yield about a 2.55x test Test., Theory Appl., vol. 18, pp. 213–230, Apr. 2002.
time reduction. Interestingly, adding more control bits on the [10] V. Iyengar, K. Chakrabarty, and E. J. Marinissen, “Efficient test access
In side has virtually no performance impact. Eventually, the mechanism optimization for system-on-chip,” IEEE Trans. Comput.-
best test time reduction is observed when five control bits per Aided Design Integr. Circuits Syst., vol. 22, no. 5, pp. 635–643,
May 2003.
every In and output ATE channel are used. [11] V. Iyengar, K. Chakrabarty, and E. J. Marinissen, “Test access mecha-
Fig. 13 pictures the complete and optimized test schedule nism optimization, test scheduling, and tester data volume reduction for
for In and output sides obtained by allocating test channels in system-on-chip,” IEEE Trans. Comput., vol. 52, no. 12, pp. 1619–1632,
Dec. 2003.
a dynamic manner for 145 In and 332 output test channels.
[12] V. Iyengar, A. Chandra, S. Schweizer, and K. Chakrabarty, “A unified
As can be seen, there is a well-pronounced difference between approach for SoC testing using test data compression and TAM opti-
the test schedules of Figs. 10 and 13—note that the time axis mization,” in Proc. Des., Autom. Test Eur. Conf. Exhibit. (DATE), 2003,
of Fig. 10 has been largely squeezed compared with the same pp. 1188–1189.
[13] J. Janicki, M. Kassab, G. Mrugalski, N. Mukherjee, J. Rajski, and
axis of Fig. 13. J. Tyszer, “EDT bandwidth management in SoC designs,” IEEE
Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 31, no. 12,
pp. 1894–1907, Dec. 2012.
VII. C ONCLUSION [14] J. Janicki, G. Mrugalski, J. Rajski, and J. Tyszer, “Bandwidth-aware test
compression logic for SoC designs,” in Proc. 17th IEEE Eur. Test Symp.
As Moore’s law continues to provide smaller devices, (ETS), May 2012, pp. 14–19.
designs with a range of core counts, capability per core, and [15] J. Janicki et al., “EDT bandwidth management—Practical scenarios for
energy per core make a dramatic impact on SoC design and large SoC designs,” in Proc. IEEE Int. Test Conf. (ITC), Sep. 2013,
test procedures. As shown in this paper, the I/O resources no. 4.3, pp. 1–10.
[16] X. Kavousianos, K. Chakrabarty, A. Jain, and R. Parekhji, “Test schedule
provided by a tester can be dynamically allocated to selected optimization for multicore SoCs: Handling dynamic voltage scaling and
cores, whereas the total number of channels in use may remain multiple voltage islands,” IEEE Trans. Comput.-Aided Design Integr.
unchanged. This paradigm clearly calls for efficient schemes Circuits Syst., vol. 31, no. 11, pp. 1754–1766, Nov. 2012.
minimizing the overall test application time, while taking into [17] A. B. Kinsman and N. Nicolici, “Time-multiplexed test data decom-
pression architecture for core-based SOCs with improved utilization of
account physical constraints, in particular, SoC pin allocations. tester channels,” in Proc. Eur. Test Symp. (ETS), 2005, pp. 196–201.
Assuming that all SoC cores are wrapped testable [18] S. Koranne, “Formulation of SOC test scheduling as a network trans-
units, this paper studies several practical issues regarding portation problem,” IEEE Trans. Comput.-Aided Design Integr. Circuits
Syst., vol. 21, no. 12, pp. 1517–1525, Dec. 2002.
SoC-based testing that deploys on-chip test data compres-
[19] E. Larsson and H. Fujiwara, “System-on-chip test scheduling with
sion with the ability to dynamically use ATE channels. The reconfigurable core wrappers,” IEEE Trans. Very Large Scale Integr.
proposed solutions include methods used to deliver control (VLSI) Syst., vol. 14, no. 3, pp. 305–309, Mar. 2006.
data and test scheduling algorithms minimizing the overall [20] E. Larsson and Z. Peng, “An integrated framework for the design and
optimization of SOC test solutions,” J. Electron. Test., Theory Appl.,
test application time. Experimental results obtained for a large vol. 18, pp. 385–400, Aug./Oct. 2002.
industrial SoC design confirm feasibility of the proposed [21] E. Larsson, J. Pouget, and Z. Peng, “Defect-aware SOC test schedul-
schemes and their ability to trade-off the number of test pins, ing,” in Proc. 22nd IEEE VLSI Test Symp. (VTS), Apr. 2004,
pp. 359–364.
design complexity of the TAM, and test application time.
[22] E. J. Marinissen, S. K. Goel, and M. Lousberg, “Wrapper design for
embedded core test,” in Proc. Int. Test Conf. (ITC), 2000, pp. 911–920.
R EFERENCES [23] Scan Insertion, ATPG, and Diagnosis—Reference Manual, Mentor
Graphics Corporation, Wilsonville, OR, USA, 2014.
[1] K. Chakrabarty, “Test scheduling for core-based systems using mixed- [24] J. Rajski, J. Tyszer, M. Kassab, and N. Mukherjee, “Embedded deter-
integer linear programming,” IEEE Trans. Comput.-Aided Design Integr. ministic test,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.,
Circuits Syst., vol. 19, no. 10, pp. 1163–1174, Oct. 2000. vol. 23, no. 5, pp. 776–792, May 2004.
[2] K. Chakrabarty, V. Iyengar, and M. D. Krasniewski, “Test planning [25] M. Richter and K. Chakrabarty, “Test pin count reduction for NoC-based
for modular testing of hierarchical SOCs,” IEEE Trans. Comput.-Aided test delivery in multicore SOCs,” in Proc. Design, Autom. Test Eur. Conf.
Design Integr. Circuits Syst., vol. 24, no. 3, pp. 435–448, Mar. 2005. Exhibit. (DATE), 2012, pp. 787–792.
[26] A. Sehgal, V. Iyengar, and K. Chakrabarty, “SOC test planning using Grady Gilles (M’90) received the B.S. degree
virtual test access architectures,” IEEE Trans. Very Large Scale Integr. in physics from Texas A&M University, College
(VLSI) Syst., vol. 12, no. 12, pp. 1263–1276, Dec. 2004. Station, TX, USA, in 1978.
[27] M. Tehranipoor, M. Nourani, and K. Chakrabarty, “Nine-coded com- He has been privileged to practice design for
pression technique for testing embedded cores in SoCs,” IEEE Trans. testability (DFT) on some of the most commercially
Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 6, pp. 719–731, successful microprocessors, first with Motorola,
Jun. 2005. Schaumburg, IL, USA, and is currently with
[28] Z. Wang, K. Chakrabarty, and S. Wang, “Integrated LFSR reseeding, Advanced Micro Devices, Sunnyvale, CA, USA,
test-access optimization, and test scheduling for core-based system-on- where he has been since 1999. He holds numerous
chip,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 28, patents and publications related to DFT. His current
no. 8, pp. 1251–1263, Aug. 2009. research interests include di/dt mitigation techniques
[29] Q. Xu and N. Nicolici, “Multi-frequency test access mechanism design and multiple orthogonal strategies to better utilize tester bandwidth.
for modular SOC testing,” in Proc. 13th Asian Test Symp. (ATS), 2004, Mr. Gilles was a co-recipient of the 1991 and 2008 Best Paper Awards at
pp. 2–7. the IEEE International Test Conference. He helped to develop two test-related
IEEE Standards, 1149.1 and 1500.
[30] D. Zhao and S. Upadhyaya, “Dynamically partitioned test scheduling
with adaptive TAM configuration for power-constrained SoC testing,”
IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 24, no. 6,
pp. 956–965, Jun. 2005.
[31] Q. Zhou and K. J. Balakrishnan, “Test cost reduction for SoC using
a combined approach to test data compression and test schedul- Yu Huang (M’99–SM’12) received the B.S. degree
ing,” in Proc. Des., Autom. Test Eur. Conf. Exhibit. (DATE), 2007, in electronic science and the M.S. degree in
pp. 39–44. semiconductor devices and technology from Nankai
[32] W. Zou, S. M. Reddy, I. Pomeranz, and Y. Huang, “SOC test scheduling University, Tianjin, China and the Ph.D. degree in
using simulated annealing,” in Proc. 21st VLSI Test Symp. (VTS), electrical and computer engineering from the Uni-
May 2003, pp. 325–330. versity of Iowa, Iowa City, IA, USA.
He is a Senior Research and Development
Staff Member with the Division of Silicon Test
Systems, Mentor Graphics Corporation, Wilsonville,
OR, USA. He has authored more than 100 research
papers, and holds seven U.S. patents and 16 patents
pending. His current research interests include VLSI testing, test compression,
and fault diagnosis.
Wu-Tung Cheng (S’84–M’85–SM’90–F’00)

received the B.S. and M.S. degrees in electrical
engineering from National Taiwan University,
Taipei, Taiwan, in 1978 and 1982, respectively, Jakub Janicki received the M.S. and Ph.D. degrees
and the Ph.D. degree in computer science from in telecommunications from Poznań University of
the University of Illinois at Urbana-Champaign, Technology, Poznań, Poland, in 2009 and 2014,
Champaign, IL, USA, in 1985. respectively.
He was with AT&T Bell Laboratories, Princeton, He is currently with Mentor Graphics Polska,
NJ, USA, from 1985 to 1990, where he was Poznań. He has authored several technical papers
developing a test pattern generation system for in the IEEE journals and conferences. He is a
digital circuits. From 1990 to 1993, he was the co-inventor of two U.S. patent applications. His
Co-Founder of CheckLogic System, Inc., Dayton, OH, USA, and the current research interests include the design for
Vice President of Engineering, where he was leading a team to develop testability, testing of system-on-chip designs, low-
commercial automatic test-generation products, which included FastScan, power embedded test, and test compression.
FlexTest, and DFTAdvisor. In 1993, CheckLogic was merged with Mentor
Graphics Corporation, Wilsonville, OR, USA, where he has held various
technical and management positions in design for test area since 1993. He is
currently a Chief Scientist and the Director of Advanced Test Research,
involved in research on new design-for-test solutions for future semiconductor Mark Kassab (M’91–SM’14) received the M.S.
quality and yield issues. He has authored over 130 research papers in these and Ph.D. degrees in electrical engineering from
areas, and is the co-inventor of 40 patents. McGill University, Montreal, QC, Canada, in 1994
and 1996, respectively.
He joined the Design-for-Test Group at Mentor
Graphics Corporation, Wilsonville, OR, USA, in
1996, where he is currently the Engineering Direc-
tor. He is a co-inventor of the EDT compression
technology, the Lead Developer and Architect of
TestKompress, and a Co-Designer of the Tessent
platform. He has authored several papers, and is a
Yan Dong received the B.S. and M.S. degrees co-inventor of 37 patents. His current research interests include the design for
in electrical engineering from Zhejiang University, testability, improving test quality, test cost reduction, design automation, and
Hangzhou, China, in 1984 and 1987, respectively. design of high-performance CAD tools.
He has started his career as an ASIC Design Engi- Dr. Kassab was a co-recipient of the Best Paper Award at the VLSI Test
neer. His current research interests include network Symposium in 1995 and the Honorable Mention Award at the International
computing processors, embedded CPUs, and graphic Test Conference in 1999. Two of the papers he published were included by
chip designs. He has been involved in DFX features ITC 2004 among the most significant papers published in the 35 years of
design, including scan, memory BIST, and built-in ITC. He was also a co-recipient of the IEEE Circuits and Systems Society
functional tests since joining ATI, Markham, ON, Donald O. Pederson Outstanding Paper Award recognizing the paper on
Canada, in 2004, which became a part of Advanced embedded deterministic test in the IEEE T RANSACTIONS ON C OMPUTER -
Micro Devices, Sunnyvale, CA, USA, in 2006. He is A IDED D ESIGN OF I NTEGRATED C IRCUITS AND S YSTEMS in 2006. He was
currently a Senior Design Manager leading DFX IP design, flow, and also a recipient of the 2012 Most Significant Paper Award at the International
methodology development for GPU and APU. Test Conference.
Grzegorz Mrugalski (M’06–SM’13) received the Janusz Rajski (A’87–SM’10–F’11) received the
M.S. and Ph.D. degrees in electrical engineering M.S. and Ph.D. degrees in electrical engineering
from Poznań University of Technology, Poznań, from the Technical University of Gdańsk, Gdańsk,
Poland, in 1995 and 2002, respectively. Poland, and Poznań University of Technology,
He joined Mentor Graphics Corporation, Poznań, Poland, in 1973 and 1982, respectively.
Wilsonville, OR, USA, in 2002. Previously, He was a Faculty Member of the Poznań Uni-
he was with the Institute of Electronics and versity of Technology from 1973 to 1984. In 1984,
Telecommunications, Poznań University of he joined McGill University, Montreal, QC, Canada,
Technology. Since 2009, he has been the Manager where he became an Associate Professor in 1989. In
of the Research and Development Laboratory 1995, he was a Chief Scientist at Mentor Graphics
for Design-for-Test products at Mentor Graphics Corporation, Wilsonville, OR, USA. He has authored
Polska, Poznań, Poland. He has co-authored over 40 technical papers in the more than 150 research papers in these areas, and is a co-inventor of 84 U.S.
area of test, and holds 21 U.S. patents. His current research interests include patents. He is also the Principal Inventor of the Embedded Deterministic
computer-aided design of digital circuits, design for testability, built-in Test Technology used in the first commercial test compression product,
self-test, and test compression. TestKompress. His current research interests include the testing of VLSI
Dr. Mrugalski was a co-recipient of the 2011 Best Paper Award at the systems, design for testability, built-in self-test, and logic synthesis.
IEEE European Test Symposium and the 2012 IEEE International Test Dr. Rajski was the Guest Co-Editor of the special issue of the IEEE C OM -
Conference Most Significant Paper Award. MUNICATIONS M AGAZINE devoted to testing of telecommunication hardware
in 1999. He was also the Associate Editor of the IEEE T RANSACTIONS ON
C OMPUTER -A IDED D ESIGN OF I NTEGRATED C IRCUITS AND S YSTEMS , the
IEEE T RANSACTIONS ON C OMPUTERS , and the IEEE D ESIGN AND T EST
OF C OMPUTERS M AGAZINE. He has served on technical program committees
of various conferences. He was a co-recipient of the 1993 Best Paper Award
for the paper on logic synthesis published in the IEEE T RANSACTIONS ON
C OMPUTER -A IDED D ESIGN OF I NTEGRATED C IRCUITS AND S YSTEMS , the
1995 and 1998 Best Paper Awards at the IEEE VLSI Test Symposium, the
1999 and 2003 Honorable Mention Awards at the IEEE International Test
Conference, the 2006 IEEE Circuits and Systems Society Donald O. Pederson
Outstanding Paper Award recognizing the paper on embedded deterministic
test published in the IEEE T RANSACTIONS ON C OMPUTER -A IDED D ESIGN
OF I NTEGRATED C IRCUITS AND S YSTEMS , the 2009 Best Paper Award at the
VLSI Design Conference, the 2011 Best Paper Award at the IEEE European
Test Symposium, and the 2012 IEEE International Test Conference Most
Significant Paper Award.
Jerzy Tyszer (M’91–SM’96–F’13) received the

M.S. and Ph.D. degrees in electrical engineering
from Poznań University of Technology, Poznań,
Poland, in 1981 and 1987, respectively, and the
Nilanjan Mukherjee (S’87–M’89–SM’14) received Dr. Hab. degree in telecommunications from the
the B.Tech. (Hons.) degree in electronics and electri- Technical University of Gdańsk, Gdańsk, Poland, in
cal communication engineering from IIT Kharagpur, 1994.
Kharagpur, India, in 1989 and the Ph.D. degree He was a Faculty Member of the Poznań Uni-
from McGill University, Montreal, QC, Canada, in versity of Technology from 1982 to 1990. In 1990,
1996. he joined McGill University, Montreal, QC, Canada,
He is currently the Software Development Director where he was a Research Associate and an Adjunct
of the Design-to-Silicon Division at Mentor Graph- Professor. In 1996, he was a Professor with the Faculty of Electronics and
ics Corporation, Wilsonville, OR, USA. Prior to Telecommunications, Poznań University of Technology. He has authored eight
joining Mentor Graphics Corporation, he was with books, more than 120 research papers in the above areas, and is a co-
Lucent Bell, Holmdel, NJ, USA. He is a co-inventor inventor of 55 U.S. patents. His current research interests include the design
of the EDT technology, and was a Lead Developer for the leading test automation and testing of very large-scale integration (VLSI) systems, design
compression tool in the industry, TestKompress. He has authored more than 45 for testability, built-in self-test, embedded test, and computer simulation of
technical articles, and is a co-inventor of 39 U.S. patents. His current research discrete event systems.
interests include next-generation test methodologies for deep submicrometer Dr. Tyszer was the Guest Co-Editor of the special issue of the IEEE
designs, test data compression, test synthesis, memory testing, and fault C OMMUNICATIONS M AGAZINE devoted to testing of telecommunication
diagnosis. hardware in 1999. He has served on technical program committees of various
Dr. Mukherjee was a co-recipient of the Best Paper Award at the 1995 conferences. He was a co-recipient of the 1995 and 1998 Best Paper Awards
IEEE VLSI Test Symposium, the Best Paper Award at the 2009 VLSI Design at the IEEE VLSI Test Symposium, the 2003 Honorable Mention Award at
Conference, the Best Student Paper Award at the 2001 Asian Test Symposium, the IEEE International Test Conference, the 2006 IEEE Circuits and Systems
the 2006 IEEE Circuits and Systems Society Donald O. Pederson Outstanding Society Donald O. Pederson Outstanding Paper Award recognizing the paper
Paper Award recognizing the paper on embedded deterministic test in the on embedded deterministic test published in the IEEE T RANSACTIONS ON
IEEE T RANSACTIONS ON C OMPUTER -A IDED D ESIGN OF I NTEGRATED C OMPUTER -A IDED D ESIGN OF I NTEGRATED C IRCUITS AND S YSTEMS , the
C IRCUITS AND S YSTEMS , and the 2012 IEEE International Test Conference 2009 Best Paper Award at the VLSI Design Conference, the 2011 Best
Most Significant Paper Award. He served on the program committees of Paper Award at the IEEE European Test Symposium, and the 2012 IEEE
several IEEE conferences. International Test Conference Most Significant Paper Award.

Scan Test Bandwidth Management For Ultralarge-Scale System-on-Chip Architectures

Uploaded by

Copyright:

Available Formats

You might also like

Scan Test Bandwidth Management For Ultralarge-Scale System-on-Chip Architectures

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Scan Test Bandwidth Management For Ultralarge-Scale System-on-Chip Architectures

Uploaded by

Copyright:

Available Formats

1050 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO.

Scan Test Bandwidth Management for

number of channels. As a result, test scheduling and TAMs

selector pin of each demultiplexer is controlled by a signal

Fig. 4. Pipeline architecture.

IV. T EST T IME R EDUCTION

Fig. 7. Core-level test data volume statistics.

Fig. 6. Test data volume for industrial design.

Fig. 8. Core-level PC statistics.

Fig. 9. Pin layout constraints. (a) Baseline. (b) Partitioned.

certain cores can only use a subset of I/O pins because of

run our dynamic SoC bandwidth management. Given stuck-at

Fig. 11. Test time reduction.

[3] A. Chandra and K. Chakrabarty, “A unified approach to reduce SOC test

Wu-Tung Cheng (S’84–M’85–SM’90–F’00)

Jerzy Tyszer (M’91–SM’96–F’13) received the

You might also like