Xstat: Statistical X-Filling Algorithm For Peak Capture Power Reduction in Scan Tests
Satya Trinadh Adireddy1, Seetal Potluri2, Shankar Balachandran3, Ch. Sobhan Babu4 and
V. Kamakoti5*
1 Indian Institute of Technology Hyderabad, Hyderabad – 502 205, India, Email: cs11p1001@iith.ac.in
2 Indian Institute of Technology Madras, Chennai – 600 036, India, Email: potluri6@gmail.com
3 Indian Institute of Technology Madras, Chennai – 600 036, India, Email: shankar@cse.iitm.ac.in
4 Indian Institute of Technology Hyderabad, Hyderabad – 502 205, India, Email: sobhan@iith.ac.in
5 Indian Institute of Technology Madras, Chennai – 600 036, India, Email: kama@cse.iitm.ac.in, Telephone:
+91-44-22574368, Fax: +91-44-22574352
Address:
Indian Institute of Technology Madras
Department of Computer Science and Engineering
Chennai – 600 036.
Office: +91-44-22574368
Fax : +91-44-22574352
Email : kama@cse.iitm.ac.in
Date of Receiving: to be completed by the Editor
Date of Acceptance: to be completed by the Editor
Abstract: Excessive power dissipation can cause high voltage droop on the power grid, leading
to timing failures. Since test power dissipation is typically higher than functional power, test
peak power minimization becomes very important in order to avoid test-induced timing
failures. Test cubes for large designs are usually dominated by don't-care bits, making X-leveraging
algorithms promising for test power reduction. In this paper, we show that X-bit
statistics can be used to reorder test vectors on scan-based architectures realized using
toggle-masking flip-flops. Based on this, the paper also presents an algorithm, namely balanced
X-filling, that, when applied to ITC'99 circuits, reduced the peak capture power by 7.4% on
average and 40.3% in the best case. Additionally, XStat improved the running time for the
Test Vector Ordering and X-filling phases compared to the best known techniques.
Keywords: Peak Capture-Power, X-bit Statistics, Test Vector Ordering (TVO), Scan-based
According to Dennard’s scaling, power density should remain constant even with increasing device
densities. However, the exponential increase in sub-threshold leakage with threshold voltage scaling
caused leakage power to dominate total power consumption. As a result, threshold voltage scaling and
Dennard’s scaling came to an end below 100nm [1], causing power density to rise exponentially with
successive technology generations. Today, aggravated power densities and hot spots have become one
of the most important concerns in nanoscale circuit design. Additionally, power dissipation for test
patterns is several times higher than that for functional patterns [2]. Since the power grid is designed
for functional patterns, the excessive test power dissipation can cause excessive voltage droop [3], [4],
leading to timing failures. Since such elevated power levels are not observed during regular operation,
such timing failures are categorized as false failures, which reduce the yield of a product.
When using a normal scan flip-flop¹, the transitions at the flip-flop outputs enter the combinational
logic during scan-shift. This causes huge power dissipation in the combinational logic, which contributes
more than 60% of the total power dissipation in the circuit during scan-shift [2]. In [2], [5], [6], a
novel scan flip-flop was proposed that masks the scan-shifting activity from entering the
combinational logic. The slave latch is disabled during scan-shift and holds the corresponding bit of
the previous scan vector, while an alternate slave latch is provided for scanning bits during
scan-shift, so that the state of the combinational logic is unaltered during scan-shift. However, the
response will alter the state of the combinational logic during the capture phase. To ensure that the
state of combinational logic is
¹ By normal scan flip-flop, we mean a flip-flop converted to a scan flip-flop by adding a multiplexer at its data input.
used. However, this enhanced scan flip-flop comes with a large area and physical design overhead, due to
the extra clocks that need to be routed to each flip-flop. In [22], a technique called the First Level
Holding (FLH) scheme was introduced, which requires 33% less area than the enhanced scan scheme while
preserving the advantages of enhanced scan. Although these techniques were originally proposed for
the purpose of delay test, they are equally applicable to stuck-at test. In this paper, we assume that
the FLH scheme is used; under this scenario, the circuit states are preserved from one scan vector
to another. Thus, if we consider the successive capture cycles, it is as good as applying the scan
Several prior works address peak power minimization during the shift [7], [8] as well as the
capture [10]–[17] phases of scan-based testing. If scan-shift toggles are blocked from entering the
combinational logic, about 70% of scan-shift power consumption [2] is saved. Thus the peak power
problem primarily occurs in the capture cycles [12], [19]. As already described, for the flip-flop
under consideration, capture power depends on the application of a pair of test vectors: the
previous test vector followed by the current test vector. In [7], it was shown how Test Vector
Ordering (TVO) for average capture power minimization maps to the well known Least Cost
Hamiltonian Path Problem, which is NP-Hard. In the same paper, a 2-approximation algorithm for
TSP was used to achieve reasonably good solutions. Similarly, in [19], it was shown how test vector
ordering for peak capture power minimization maps to the Bottleneck Hamiltonian Path Problem,
which is also NP-Hard. In the same paper, an efficient heuristic was deployed to get optimal solutions
As far as running time is concerned, apart from the difficulty of searching for optimal Hamiltonian
cycles, it is very time consuming to compute all edge weights and create the input complete graph. This
is because the computational complexity of creating the complete graph is O(n²), and each edge weight
computation requires a full-blown power simulation, which is very time consuming. Thus, running power
simulations and then solving the test vector ordering problem is practically infeasible for large circuits.
In [7], Input Switching Activity (ISA) was shown to correlate well with power dissipation inside the
circuit. Since primary input count is typically much smaller than total gate count, ISA cost function can
Although ISA can be computed quickly, O(n²) computations are still necessary. Additionally, it is well
known that test cubes for large circuits contain more than 70-80% X-bits [14]. Both of these reasons
motivate a more efficient, X-bit leveraging algorithm for test vector ordering. In this paper,
we propose XStat, which uses X-bit statistics to meet both of these objectives. XStat utilizes X-bits
very effectively to order the test vectors for reducing peak capture power, and at the same
time its computational complexity is O(n × log(n)), as opposed to O(n²) for the earlier method. For
the largest ITC’99 circuit, XStat achieved a 34.2% peak capture power reduction and a 1.6× reduction in
runtime. The details of the algorithm and the results thus obtained are explained in the following
sections.
The impact of any X-filling algorithm depends on the percentage of X-bits in the test cubes. Table
IV shows the average percentage of X-bits (over all the test cubes) for each benchmark circuit. It can
be seen that as circuit size increases, the share of X-bits also increases, motivating the need
for an effective and efficient X-filling algorithm to reduce the peak capture power consumption during
scan test. The benchmarks b15, b17, b18 and b19 have more than 80% X-bits in their test cubes, on
average. To exploit this properly, it is also interesting to see how many of the
test vectors have a large number of X-bits and how many have very few X-bits. This distribution is
shown in Fig. 1. By interspersing test vectors that have many X-bits with test vectors that have few
X-bits, the scope for X-filling increases. Thus test vector ordering can improve the impact of
X-filling in reducing the peak capture switching activity. The steps involved in test vector ordering
A. Motivation
By definition, the X-bits (don’t-care bits) in the test cubes generated by the ATPG tool can be filled
with 0 or 1 without affecting fault coverage. Usually these X-bits are filled in such a way that capture
power is minimized [14]. We also know that test vector ordering has the potential to effectively
reduce peak capture power [19]. But none of the existing methods address the simultaneous filling of
X-bits and test vector ordering in a single framework. In this paper, we develop a technique called
XStat, which uses the X-bit statistics to order the test vectors for capture power minimization. Unlike
[7], where X-bits need to be filled prematurely to perform the ordering process, XStat preserves
Given a set consisting of non-specified test cubes and an ordering given by the tool, we form a
binary matrix by placing the test cubes in columns, as shown in Fig. 2. Each row corresponds to a
Primary Input (PI) or Pseudo Primary Input (PPI), i.e., output of a scan flip flop. Hence each row is
denoted with the label (P)PI, and the corresponding index as subscript.
For a given test vector ordering, peak power is determined by one of the test vector pairs in the
ordering. Since we have non-specified test cubes, we need to analyze all the test cube pairs in the
ordering before performing a detailed X-filling to achieve optimal reduction in peak capture power.
We introduce a new statistic called X-base to analyze the adjacency X-bit distribution in test cube
pairs, for a given test vector ordering.
Fig. 2 shows how to compute the X-base for adjacent test cubes TC1 and TC2. For every pair of test
cubes, the X-base is initialized to zero. Each row in the sliding window contains 2 bits, and all rows are
visited sequentially to increment the X-base. When a row is visited, if at least one of the 2 bits is an
X-bit, the X-base is incremented by 1 before visiting the next row. For the TC1, TC2 pair shown in Fig. 2,
we encounter 4 cases of XX and 1 case of X0, making the X-base settle at 4+1=5. If we analyze the same
for all adjacent column pairs in the binary matrix, we get a distribution for X-base.
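The X-base computation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each test cube is represented as a string over the characters 0, 1 and X with one character per (P)PI row. The function names and the example cubes are hypothetical and are not the contents of Fig. 2; the first pair is merely chosen to contain 4 XX rows and 1 X0 row.

```python
from typing import List


def x_base(cube_a: str, cube_b: str) -> int:
    """Count the rows in which at least one of the two adjacent cubes has an X-bit."""
    if len(cube_a) != len(cube_b):
        raise ValueError("test cubes must have the same number of (P)PI rows")
    return sum(1 for a, b in zip(cube_a, cube_b) if a == "X" or b == "X")


def x_base_distribution(cubes: List[str]) -> List[int]:
    """X-base of every adjacent pair (TC_i, TC_i+1) in the given ordering."""
    return [x_base(cubes[i], cubes[i + 1]) for i in range(len(cubes) - 1)]


# Hypothetical cubes (assumption, not taken from Fig. 2).
tc1 = "XXXXX10"
tc2 = "XXXX001"
tc3 = "1101001"

pair_value = x_base(tc1, tc2)                       # 4 rows of XX + 1 row of X0
distribution = x_base_distribution([tc1, tc2, tc3])
print(pair_value, distribution, min(distribution), max(distribution))
```

The minimum and maximum of the returned list correspond to the MIN and MAX statistics discussed next.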
Fig. 3 shows the minimum (denoted by MIN) and maximum (denoted by MAX) statistics of X-base
for different benchmarks. Note that the higher the X-base value, the greater the
scope for X-filling to reduce capture power. Since peak capture power computation requires
consideration of all the test vector pairs in the ordering, the MIN column is especially significant:
it creates a bottleneck on how much the peak power can be reduced by X-filling. It can be seen that
for larger circuits, MIN is an order of magnitude smaller than MAX, leaving very little
scope for minimizing peak power. This motivates the need to reorder the test vectors to bring down
the difference between MIN and MAX, thereby reducing peak power. This concept is similar to
balancing logic between pipeline stages to maximize the processor clock frequency.
We first sort the test cube set {TC1, TC2, TC3, ..., TCn} in increasing order of the number of
X-bits they contain. Let the new ordering be π = {TC1, TC2, TC3, ..., TCn}. Since TC1 contains the
fewest X-bits, we bring TCn in between TC1 and TC2 to increase the scope for X-filling. In the next
iteration, we bring TCn−1 in between TC2 and TC3. We continue this process until all the test cubes are
exhausted. It is observed that by performing test vector ordering in this fashion, the MAX value
remains as high as before but the MIN value comes very close to the MAX value. This is a very good
sign, since bringing MIN close to MAX maximizes the scope for peak power reduction. As
explained before, this is similar to balancing pipeline stages in a processor, which translates to making
the longest pipeline stage equal to the shortest pipeline stage, which maximizes the clock frequency.
Fig. 3 shows the gap before reordering. Fig. 4 shows how the gap between MIN and MAX
is greatly reduced after reordering, as desired. This motivates that X-based test vector ordering² has the potential to
After performing test cube ordering, we need to perform X-filling to convert test cubes to test
vectors. The X-filling is done in such a way that peak capture power is reduced. The MT-Fill technique
[20] attempts to reduce adjacent toggles between patterns. As a result, it achieves total toggle
reduction. But Fig. 5 shows a scenario where MT-Fill performs poorly in reducing peak toggles. This
example shows how this situation occurs due to the greedy strategy of MT-Fill. On the contrary, B-Fill
reduces peak toggles effectively because it keeps track of 1→0 and 0→1 transitions and performs
X-filling to balance toggles on both sides of the test cube. Table I shows the look-up table for
performing the B-Fill technique. In this table, the first seven rows, which resemble MT-Fill, are applied
to the test cubes in the first iteration. After the first iteration is over, the last four rows in the table
are applied to the test cubes in such a way that the toggles on both sides of the test cube are equal or
almost equal. This is done by filling 50% of the rows with the left side values and the remaining 50% of
the rows with right side values. This second iteration ensures that peak switching activity at the inputs
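The balancing idea can be illustrated with a deliberately simplified sketch. It does not reproduce the look-up table of Table I. It assumes that the left and right neighbouring patterns are already fully specified and that only the middle cube contains X-bits, and it contrasts the balanced fill with a hypothetical greedy fill that always copies the left neighbour. All function names are illustrative assumptions.

```python
def toggles(a: str, b: str) -> int:
    """Number of rows whose value changes between two fully specified patterns."""
    return sum(1 for x, y in zip(a, b) if x != y)


def greedy_left_fill(left: str, cube: str) -> str:
    """Assumed greedy baseline: every X copies the value of the left neighbour."""
    return "".join(l if c == "X" else c for l, c in zip(left, cube))


def balanced_fill(left: str, cube: str, right: str) -> str:
    """Simplified balanced fill (assumption, not Table I).

    If both neighbours agree on a row, the X takes that value (no toggle).
    If they disagree, one toggle is unavoidable, so such rows alternate
    between the left value and the right value, splitting them about 50/50.
    """
    filled = []
    use_left = True
    for l, c, r in zip(left, cube, right):
        if c != "X":
            filled.append(c)
        elif l == r:
            filled.append(l)
        else:
            filled.append(l if use_left else r)
            use_left = not use_left
    return "".join(filled)


left, cube, right = "000011", "XXXXXX", "111111"

greedy = greedy_left_fill(left, cube)
balanced = balanced_fill(left, cube, right)

greedy_peak = max(toggles(left, greedy), toggles(greedy, right))
balanced_peak = max(toggles(left, balanced), toggles(balanced, right))
print(greedy, greedy_peak, balanced, balanced_peak)
```

In this example both fills produce the same total number of toggles, but the greedy fill places all of them on one side of the cube, whereas the balanced fill splits them evenly and so halves the peak.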
The details of the entire procedure of X-based ordering, integrated with the B-Fill technique, are given
in Algorithm 1. In Section V, we motivated how X-based test vector ordering has the potential to reduce
peak power. In this section, we have motivated how, after the ordering is decided, the B-Fill technique
is effective in further minimizing peak power. The details of the experimental setup and effectiveness
² It is to be noted that Figures 3 and 4 have the Y-axis in logarithmic scale, making the reductions very significant.
We have considered the ITC’99 benchmark suite to validate our algorithms. A 45nm standard library
is used for synthesis and placement. DesignCompiler™, TetraMax™ and SoCEncounter™ are used
for the Synthesis, ATPG and Place-And-Route (PAR) phases respectively. After PAR, interconnect
capacitances are extracted using SoCEncounter™ to compute actual power values. The peak
switching activity and peak power reductions obtained using X-based test vector ordering are
summarized in Tables II and III. It can be seen that X-based test vector ordering performs better
as circuit size increases, although there are some outliers. In Tables II and III, the best fill for each
type of ordering is shaded, and the best among all of them is marked with a circle. For example,
considering the b19 benchmark in Table II, the best fills for least peak switching activity with
TetraMax™ ordering, ISA ordering and X-based ordering are MT-Fill, 0-Fill and MT-Fill respectively, with
the corresponding peak toggle counts being 42663, 39820 and 29157 respectively. Thus for b19’s row in
this table, MT-Fill under TetraMax™ ordering, 0-Fill under ISA ordering and MT-Fill under X-based
ordering are shaded. Further, X-based ordering had the least peak switching activity amongst the
three. Thus for b19’s row, the cell for MT-Fill under X-based ordering is not only shaded but also circled.
Similar markings can be observed for all the remaining benchmarks. Thus circled ones are the final best
Tables IV and V show all the orderings validated with different types of X-filling techniques. It is
interesting to note that the X-based test vector ordering supplemented with B-Fill outperforms all the
other possibilities for circuits with gate count > 1K.
The only comparison left to verify is the performance of B-Fill with different
orderings. Figures 6(a) and 6(b) show the normalized peak switching activity and normalized peak
power of the B-Fill technique with the different orderings discussed previously. The normalization is done
with respect to TetraMax™ ordering + B-Fill. It can be seen that both peak switching activity as well
It can be noted that Tables II and III give different results in many cases. This is due to the effect of
interconnect capacitances taken into account to compute actual power in Table III. The difference seen
between Tables IV and V is also understood as the effect of interconnect being considered
In Algorithm 1, step 6 shows how the ordering is done by placing the last vector (TCn) between the first
two (TC1 and TC2), the second last vector (TCn−1) between the second and third (TC2 and TC3), etc., to
produce the ordering π = {TC1, TCn, TC2, TCn−1, TC3, ...}. It is to be noted that in step 6, we
chose to insert a test vector with a large number of X-bits (TCn) between two test vectors (TC1 and
TC2), both of which have relatively few X-bits, so that the chances for
X-filling are maximized. Let us call this the Forward Insertion strategy. We then reverse the strategy and
insert a test vector with few X-bits (TC1) between two test vectors (TCn−1 and TCn), both
of which have a relatively large number of X-bits, so that the chances for X-filling are maximized. We call
this the Reverse Insertion strategy. The peak toggles and peak power values obtained while using
XStat with the Forward Insertion and Reverse Insertion strategies are compared in
Table VI. It can be seen that for some benchmarks Forward Insertion is better and for others
Reverse Insertion is better. Overall, both of these techniques perform similarly.
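The two insertion strategies can be sketched as follows. This is an illustrative sketch, not Algorithm 1 itself: test cubes are assumed to be strings over 0, 1 and X, the function names are hypothetical, and Reverse Insertion is assumed to be the mirror image of Forward Insertion, i.e. π = {TCn, TC1, TCn−1, TC2, ...}. The sort dominates the cost, giving O(n × log(n)) comparisons for n test cubes.

```python
from typing import List


def x_base(a: str, b: str) -> int:
    """Rows in which at least one of the two cubes has an X-bit."""
    return sum(1 for p, q in zip(a, b) if p == "X" or q == "X")


def min_x_base(order: List[str]) -> int:
    """MIN statistic of X-base over all adjacent pairs of an ordering."""
    return min(x_base(order[i], order[i + 1]) for i in range(len(order) - 1))


def interleave(sorted_cubes: List[str], start_with_fewest: bool) -> List[str]:
    """Alternately take cubes from the two ends of a list sorted by X-count."""
    lo, hi = 0, len(sorted_cubes) - 1
    take_low = start_with_fewest
    order = []
    while lo <= hi:
        if take_low:
            order.append(sorted_cubes[lo])
            lo += 1
        else:
            order.append(sorted_cubes[hi])
            hi -= 1
        take_low = not take_low
    return order


def forward_insertion(cubes: List[str]) -> List[str]:
    """TC1, TCn, TC2, TCn-1, TC3, ... (cubes sorted by increasing X-count)."""
    return interleave(sorted(cubes, key=lambda c: c.count("X")), True)


def reverse_insertion(cubes: List[str]) -> List[str]:
    """Assumed mirror image: TCn, TC1, TCn-1, TC2, ..."""
    return interleave(sorted(cubes, key=lambda c: c.count("X")), False)


# Hypothetical cubes with 0, 1, 2, 3 and 4 X-bits.
cubes = ["0000", "X000", "XX00", "XXX0", "XXXX"]
fwd = forward_insertion(cubes)
rev = reverse_insertion(cubes)
print(fwd, min_x_base(cubes), min_x_base(fwd))
print(rev, min_x_base(rev))
```

On this small example, the MIN X-base rises from 1 in the sorted order to 3 after Forward Insertion, which is the balancing effect described earlier.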
Since the computational complexity of the XStat-based TVO process is O(n × log(n)), while X-fill is O(n),
the runtime of XStat is primarily dependent on the ordering process. Table VII shows that for all
the benchmarks considered, X-based ordering consistently runs faster than ISA-based ordering. This is
understandable because the computational complexity of XStat-based TVO is O(n × log(n)), which is better
than that of ISA-based TVO, which is O(n²). Thus, apart from reducing peak capture power, XStat is also
computationally efficient. For different ITC’99 circuits, the achieved speed-up is in the range 1.6× to
542.8×. Although the running time of XStat increases with input size in almost all cases, the
speed-up varies over a wide range and does not increase with input size. This variation arises from the
variable cost of searching for the least cost Hamiltonian cycle as part of ISA-based test vector ordering [7]³.
VIII. CONCLUSIONS
In this paper, we showed that X-bit statistics can be used to obtain peak capture-power efficient
test-cube orderings. The X-based test vector ordering method supplemented with balanced X-filling
technique is shown to be very effective in reducing peak capture power, compared to other existing
test vector ordering and X-filling techniques. The proposed algorithm XStat performs better with
increase in circuit size, in terms of power reduction. XStat is also computationally more efficient
³ Note that the least cost Hamiltonian cycle computation is not only dependent on complete graph size but also on the
edge weight distribution. For example, if all edge weights are equal, then any arbitrary Hamiltonian cycle is a least cost
Hamiltonian cycle and the running time is very short.
References
[1] R. H. Dennard, F. H. Gaensslen, V. L. Rideout, E. Bassous, A. R. LeBlanc, “Design of ion-
implanted MOSFET’s with very small physical dimensions”, IEEE Journal of Solid-State Circuits,
[2] S. Gerstendorfer and H. J. Wunderlich, “Minimized Power Consumption for Scan-based BIST”,
Filling for Effective IR-Drop Reduction in At-Speed Scan Testing”, Design Automation Conference,
Optimization Framework for Power-Safe Scan Test”, VLSI Test Symposium, IEEE , 2007, pp. 167-
172.
[5] C. Thomas Glover and M. Ray Mercer. “A Method of Delay Fault Test Generation”, Design
[6] N. Parimi and Sun Xiaoling, “Toggle-masking for test-per-scan VLSI circuits”, International
Symposium on Defect and Fault Tolerance in VLSI Systems, IEEE, 2004, pp. 332-338.
test application by test vector ordering”, International Symposium on Circuits and Systems, IEEE,
Dissipation in Scan and Combinational Circuits During Test Application”, IEEE Transactions on
Computer Aided Design of Integrated Circuits and Systems, Vol. 17, No. 2, 1998, pp. 1325-1333.
[9] K. J. Lee, T. C. Haung and J. J. Chen, “Peak-power reduction for multiple scan circuits during test
BIST design with minimized peak power consumption”, Asian Test Symposium, IEEE, 1999, pp. 89-
94.
[11] R. Sankaralingam, N. A. Touba, “Controlling peak power during scan testing” VLSI Test
[12] J. Saxena, K. Butler, V. Jayaram, S. Kundu, N. Arvind, P. Sreeprakash and M. Hachinger, “A case
study of IR-drop in structured at-speed testing”, International Test Conference, IEEE, 2003, pp. 1098-
1104.
[13] W. Li, S. M. Reddy, I. Pomeranz, “On reducing peak current and power during test”, Annual
[14] S. Remersaro, X. Lin, Z. Zhang, S. M. Reddy, I. Pomeranz and J. Rajski, “Preferred Fill: A
Scalable Method to Reduce Capture Power for Scan Based Designs”, International Test Conference,
[15] S. Almukhaizim and O. Sinanoglu, “Peak Power Reduction Through Dynamic Partitioning of
[16] J. Tudu, E. Larsson, V. Singh, H. Fujiwara, “Scan Cells Reordering to Minimize Peak Power
during Scan Testing of SoC”, Workshop on RTL & High Level Testing, IEEE, 2009.
[17] J. Tudu, E. Larsson, V. Singh, “Graph Theoretic Approach for Scan Cell Reordering to Minimize
Peak Shift Power”, Great Lakes Symposium on VLSI, IEEE, 2010, pp. 73-78.
[18] P. Shanmugasundaram, V. D. Agrawal, “Dynamic scan clock control for test time reduction
maintaining peak power limit”, VLSI Test Symposium, IEEE, 2011, pp. 248-253.
[19] S. T. Adireddy, S. Potluri, Ch. S. Babu and V. Kamakoti, “An Efficient Heuristic for Peak
Capture-Power Minimization during Scan-based Test”, ASP Journal of Low Power Electronics, Vol. 9,
No. 2, 2013.
scan vector power dissipation”, VLSI Test Symposium, IEEE, 2000, pp. 35-40.
overhead Delay Testing Technique for Arbitrary Two-Pattern Test Application”,
Design Automation and Test in Europe, IEEE, 2005, pp. 1136 – 1141.
FIGURES AND TABLES
Fig. 1 X axis: Percentage of X-bits in Test Cubes, Y axis: Number of Test Vectors
A. Satya Trinadh1 is a PhD student in Computer Science and Engineering at the Indian Institute of
Technology Hyderabad. His areas of specialization include Low Power Digital VLSI Design and Test.
Seetal Potluri2 is a PhD student in Electrical Engineering at the Indian Institute of Technology
Madras. His areas of specialization include Low Power Digital VLSI Design and Test.
Institute of Technology Madras. His areas of specialization include VLSI Design Automation,
Ch. Sobhan Babu 3 is an Assistant Professor in the Department of Computer Science and Engineering
at Indian Institute of Technology Hyderabad. His areas of specialization include Graph Theory, Game
Institute of Technology Madras. His areas of specialization include Computer Architecture, VLSI