Thermal-Aware Testing of Network-on-Chip Using Multiple-Frequency Clocking
Computer and Electronic Engineering University of Nebraska-Lincoln Omaha, NE 68182, USA chunshengliu@unlnotes.unl.edu
IBM Microelectronics 1000 River Road, Building 863B Essex Jct, VT 05452, USA vikrami@us.ibm.com
Abstract
Chip overheating due to excessive and unbalanced power dissipation has become a critical problem during test of complex core-based systems. In this paper, we address the overheating problem in network-on-chip systems by using on-chip multiple-frequency clocking. We control the core temperatures during test scheduling by varying the test clock frequency assigned to each core, so that the power dissipation of each core during test can be adjusted individually and thermal balance is achieved. We present a heuristic where the optimization process can be integrated with test scheduling. Experimental results for NoC benchmarks show that the proposed method can guarantee thermal safety and yield better thermal balance.
1 Introduction
Network-on-chip (NoC) has been proposed as the preferred interconnection scheme for next-generation complex VLSI systems [1, 17], replacing the traditional system-on-chip (SoC) methodology. This new paradigm relies on a packet-switching network implemented on the chip to provide high-performance interconnection to embedded cores. Compared to traditional SoC, testing NoC-based systems poses considerable challenges due to the existence of highly complex network components [17]. Figure 1 shows the implementation of the system d695 [12] in an NoC architecture.

Thermal management during the design process has been studied using layout redesign and thermal placement [2, 4, 6]. However, efficient thermal management during test remains a challenge. High power dissipation during test can cause high power density, which forms hot spots. The problem becomes even more acute for core-based systems, since embedded cores can have a large variation in die size and power dissipation. In an ad hoc test schedule, cores having lower power dissipation or larger die size may remain cool, while those having higher power dissipation or smaller die size can overheat and cause damage. In addition, thermal re-placement of cores is impossible because the layout is optimized for functional operation and is already fixed at the time of test. Finally, switching activity across the chip differs considerably between functional operation and test. For example, cores that operate concurrently in functional mode may not be tested in parallel, which further aggravates thermal imbalance on the chip.

A simple strategy is to use a slower test clock to reduce power dissipation and to guarantee thermal safety. However, it is inefficient because thermal balance among cores cannot be achieved. As a result, the cores generating less heat are unnecessarily cooled down, which adversely affects test time and increases test cost. Prior work attempts to reduce excessive test power by using power-constrained scheduling. However, it has been shown that power constraints cannot guarantee thermal safety [11, 14].

Recent work has attempted to achieve thermal safety for traditional SoC testing through test scheduling [11, 14, 15]. These methods are based on the use of dedicated test access mechanisms (TAMs), and the results are tightly related to the TAM design. In an NoC-based system, however, the implementation of network components (routers, channels, etc.) has already imposed a considerable amount of area overhead. Therefore, recent advances tend to reuse the existing on-chip network for test data transportation without introducing new overhead [5, 10]. Consequently, existing thermal-aware testing methods for SoCs are not directly applicable to reuse-based NoC testing.

In this paper, we propose a new method for thermal-aware test scheduling in NoC-based systems. It is based on the use of multiple-frequency on-chip clocking [10]. This is specifically designed for NoC-based systems because here cores are globally asynchronous, and they communicate by sending and receiving messages in the form of packets via the network. In the proposed method, each core can receive one of several test clock frequencies generated by on-chip logic. During test application, a core can vary its power dissipation, and hence temperature, by choosing a different clock frequency based on the test control information carried in test packets. Slower clocks are used to reduce temperature while faster clocks are used to reduce test time.
This dynamic clock frequency scaling scheme can not only guarantee thermal safety but also achieve thermal balance and optimized test time. We present a heuristic algorithm by which the assignment of variable clock frequencies to each core can be optimized to achieve thermal balance. This can eventually reduce hot spot temperature and also reduce test time. The effectiveness of the method is corroborated by experimental results on several NoC benchmarks. Note that this scheme is different from the one proposed in [15], because the proposed method does not require variable tester clock frequencies during test application and cores scheduled simultaneously can use different on-chip clock frequencies. Moreover, the test clock frequency of a core at a specific time is selected from several available on-chip clock frequencies but cannot be scaled arbitrarily as in [15].

In this paper we use the term NoC to denote an on-chip interconnection network of routers and channels that may be implemented on an SoC, and the term NoC-based system to denote the system including the NoC and embedded cores.

The rest of the paper is organized as follows. In Section 2, we review some related prior work. In Section 3, we introduce the use of multiple-frequency on-chip clocking in testing NoC-based systems and the use of thermal constraints in this work. In Section 4, we present a heuristic that integrates the assignment of test clocks to cores in test scheduling. Finally, in Section 5, we present experimental results on several NoC benchmarks.

Figure 1. System d695 implemented in NoC architecture.

Proceedings of the 24th IEEE VLSI Test Symposium (VTS06) 0-7695-2514-8/06 $20.00 2006
IEEE

2 Prior work
Thermal management and hot spot removal during the design process for functional operation have been proposed [2, 4, 6]. These methods rely on either layout replacement or task management to achieve an even thermal distribution on the chip. However, they are optimized for functional operation, not for testing, and are therefore not suitable for thermal management during test.

Most prior work deals with the overheating problem by using power constraints. Test scheduling algorithms for SoCs with power constraints have been extensively investigated [8, 9, 13]. More recently, power-aware test scheduling algorithms have been presented for NoC-based systems [5, 10]. However, it has been shown that using power constraints cannot guarantee thermal safety [11, 14]. This will be further corroborated in our experimental results.

Recently, some thermal-aware test scheduling methods have been proposed [11, 14, 15, 18]. In [14], a test session thermal model is used to determine heat transfer characteristics during test, and a heuristic is used to obtain a schedule without violating a temperature constraint. In [15], variable tester clock frequencies are used to control the power dissipation in different test sessions to guarantee thermal safety. A disadvantage of this approach is the requirement of variable-frequency clocks on the tester. In [11], heuristics using layout information and progressive weighting are proposed to achieve thermal balance on the chip and reduced hot spot temperatures. All these approaches are designed for SoC systems based on dedicated TAMs and cannot be directly applied to NoC-based systems. More recent work [18] attempts to obtain thermal safety and reduced test time simultaneously by using both faster and slower on-chip clocking. However, the thermal optimization process is applied on top of the scheduling, and the test clock frequency assigned to a core cannot be varied during test application.

3 Multiple-frequency clocks
Here, we introduce the use of multiple-frequency on-chip clocking in testing NoC-based systems and the use of thermal constraints.
Figure 2. Test architecture using on-chip clocking [10].

… on the chip. Using a slower clock can reduce hot spot temperature while using a faster clock can reduce test time, as long as thermal safety is guaranteed. If the original tester clock frequency is […], we can assume there are a total of […] different on-chip clock frequencies. E.g., for […], the set of available clock frequencies can be […]. However, using a fixed test clock frequency (either lower or higher than the original clock frequency) for a core throughout its test application may not be efficient in all cases. This is illustrated in Figure 4, where three cores are being scheduled. Cores are represented by rectangles, and the heights of the rectangles correspond to their test clock frequency (and hence power dissipation and temperature). In Figure 4(a), after the test of Core 1 is finished, Core 3 can be scheduled on the available network resource (details shown in Section 4). However, it is possible that Core 3 is a relatively cool core and the overall chip temperature is well under the thermal safe constraint. Therefore, we can increase the test clock frequency of Core 2 from this time, as long as the thermal safe constraint is not violated. As a result, the overall test time is reduced from […] to […]. In Figure 4(c), after Core 1 is finished, scheduling Cores 2 and 3 simultaneously will cause a high core temperature (hot spot) that violates the thermal constraint. Therefore, we decrease the test clock frequency of Core 2. Note that although the test time of Core 2 is increased accordingly, the overall test time may not be compromised, in this example […]. Note that this change may occur whenever the schedule is changed, hence the test clock frequency of a core may vary several times during its test. It can be seen that this variable-rate clocking scheme (or clock scaling) can reduce test time and achieve thermal safety.

A possible hardware architecture is shown in Figure 5. A test packet routed to the core is first unpacked to obtain test control information (carried in the packet header) and test data (carried in the payload). Note that this does not require additional hardware, since all hardware is implemented for functional operation and reused in testing [5, 10]. One field in the test control information can be specified as test clock frequency selection, which is used to select one of the on-chip clock frequencies for testing the core. In essence, each test vector can be applied using a different clock frequency. However, test scheduling in NoC is NP-complete [10], and frequently switching test clock frequencies can significantly affect the efficiency of the test scheduling algorithm and compromise test time. Therefore, in this paper we only switch test clock frequencies when a change occurs in scheduling, i.e., whenever the test of a core is finished or a new test is started.

Figure 5. Clock frequency selection in NoC for core testing.

It can be concluded that in an NoC-based system, cores that are scheduled simultaneously can be tested using different clock frequencies. Moreover, the test clock frequency of each core can be varied during the scheduling process. Note that this scheme is different from the clock scaling scheme used in [15], which is designed for traditional SoC systems and requires a variable tester clock. Here we do not require variable-rate tester clocks during test. Instead, variable-rate clocks are generated on chip. Moreover, in [15] all cores being tested use the same test clock frequency, while the proposed scheme takes advantage of the NoC architecture and can apply different clock frequencies to different cores. Finally, in [15] the tester clock frequency can be scaled arbitrarily, whereas in this paper the clock frequency only needs to be selected from the available on-chip clock frequencies.
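The trade-off described in this section, where a slower clock lowers power and a faster clock shortens the test, and where clocks switch only at schedule changes, can be illustrated with a minimal sketch. It assumes that dynamic power is proportional to clock frequency and that a test needs a fixed number of clock cycles; neither the function names nor the example scale factors come from the paper.

```python
def scaled_power_and_time(power, cycles, scale):
    """Power and test time of a core when its test clock is scaled by `scale`.

    Assumption (not from the paper): dynamic power is proportional to clock
    frequency, and the number of test clock cycles is fixed.
    """
    return power * scale, cycles / scale


def finish_time(cycles, switches):
    """Completion time of a test whose clock changes only at schedule changes.

    `switches` is a list of (start_time, scale) pairs sorted by start_time;
    each scale applies until the next switch. Times are measured in periods
    of the original tester clock.
    """
    remaining = cycles
    for i, (start, scale) in enumerate(switches):
        end = switches[i + 1][0] if i + 1 < len(switches) else float("inf")
        capacity = (end - start) * scale  # cycles that fit in this interval
        if capacity >= remaining:
            return start + remaining / scale
        remaining -= capacity
    raise ValueError("switch list must not be empty")


# Halving the clock halves the power and doubles the test time.
print(scaled_power_and_time(4.0, 1000, 0.5))     # (2.0, 2000.0)
# A core starts at the nominal clock and is sped up 2x at time 400.
print(finish_time(1000, [(0, 1.0), (400, 2.0)]))  # 700.0
```

In the second call the core finishes at time 700 instead of 1000, which mirrors the situation where a core is sped up once a neighbouring test ends and the chip has thermal headroom.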
… of the system d695 in such a network, where Cores 8 and 10 are being tested using dedicated routing paths. We assume in this work that the NoC itself (routers, channels, etc.) has already been tested as fault free, and we focus on testing the cores. We handcraft a floorplan for each benchmark. We also assume that the router associated with each core is integrated with the core in layout, so that the floorplan of a core represents the core, the router and the interconnections between them. The power generated by a core during testing is calculated based on its complexity, e.g., the number of flip-flops, I/Os, test vectors, etc. Router power is assumed to be […] of the average core power. We omit the details due to lack of space.
[…], F, CLK)
 1. Thermal integrity check(C, P […], F, CLK); /*Create ordered list of various on-chip clocks*/
 2. Set initial clock assignment to cores;
 3. Sort cores in decreasing order of test time under current clock assignment;
 4. Permute the combinations of I/O pairs;
 5. For each permutation
 6.   While there are unscheduled cores
 7.     For each unscheduled core
 8.       Find a free I/O pair;
 9.       If no free I/O pair
10.         Clock adjustment(C, N, P […], F, CLK);
11.         Update temperatures and current time, repeat from Line 6;
12.       Else
13.         If NoC check path=PATH BLOCK
14.           If all cores have been attempted
15.             Clock adjustment(C, N, P […], F, CLK);
16.             Update temperatures and current time, repeat from Line 6;
17.           Else
18.             Try next core in the list, repeat from Line 13;
19.         Else
20.           Assign core to path, update time tags.
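The path check and time-tag bookkeeping used in the scheduling loop can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes each core has one fixed candidate path (a list of I/O and channel names), whereas the procedure above searches over I/O pairs and permutations. All identifiers and the example data are hypothetical.

```python
def path_is_free(path, time_tags, now):
    """A path is usable if every resource on it is free at time `now`.

    `time_tags` maps a resource name to the time at which it becomes free;
    resources without a tag have never been reserved.
    """
    return all(time_tags.get(resource, 0) <= now for resource in path)


def reserve(path, time_tags, until):
    """Mark every resource on the path as busy until time `until`."""
    for resource in path:
        time_tags[resource] = until


def schedule_step(test_times, paths, time_tags, now):
    """Start as many unscheduled cores as possible at time `now`.

    Cores are tried in decreasing order of test time; a core whose path is
    blocked is skipped and the next core in the list is attempted.
    Returns the names of the cores that were started.
    """
    started = []
    for core in sorted(test_times, key=test_times.get, reverse=True):
        if path_is_free(paths[core], time_tags, now):
            reserve(paths[core], time_tags, now + test_times[core])
            started.append(core)
    return started


# Hypothetical example: c2 shares channel "ch_a" with c1 and is blocked.
test_times = {"c1": 100, "c2": 80, "c3": 50}
paths = {
    "c1": ["in0", "ch_a", "out0"],
    "c2": ["in1", "ch_a", "out1"],
    "c3": ["in1", "ch_b", "out1"],
}
tags = {}
print(schedule_step(test_times, paths, tags, now=0))  # ['c1', 'c3']
```

Once no further core can be started, the schedule at the current time is fixed, which is the point at which the clock adjustment step would be invoked.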
… in Line 5 for each permutation. We try to assign the first core in the list to the first available I/O pair. If no I/O pair is available, all network resources have been utilized and no more cores can be scheduled at the current time, i.e., the current schedule is determined. We then invoke the Clock adjustment procedure in Line 10, which tries to obtain thermal safety and reduced test time by adjusting the clock frequency assignment of the cores currently being scheduled; this procedure is shown in Figure 7. After the clock frequencies are adjusted, we update the temperatures of cores by thermal simulation, as well as the current time, in Line 11. The scheduling then repeats at the new time. Otherwise, if a free I/O pair is found, a tentative routing path is created, and the subroutine NoC check path is used to check whether there is any resource (network channels and I/Os) conflict on this path, see Line 13. We omit the details of path checking due to lack of space. A time tag is maintained on every network resource to indicate its availability for routing. If no core can be scheduled due to resource conflicts, Clock adjustment is invoked again in Line 15, and the temperatures and current time are updated in Line 16. If no path conflict is detected in Line 13, the core is scheduled at the current time and the next core is attempted.

The Clock adjustment routine is shown in Figure 7. It is invoked when the schedule at a specific time is determined, hence we only consider the cores that are currently being scheduled, included in set […]. We first run thermal simulation to determine the core temperatures under the current clock frequency assignment in Line 3. If the hottest core temperature exceeds […], the current clock frequency assignment violates the thermal safe constraint. We save the assignment in a list L so that it will not be attempted again in the subsequent optimization process, which saves simulation time. We then invoke the Adjust clk process to slow down the clock frequency on a hot core to obtain thermal safety in Line 9. If […] is not
[…], F, CLK);
 1. Find set of cores currently being tested […];
 2. While 1 /*adjust clock assignment*/
 3.   Thermal simulation to obtain core temperatures;
 4.   Find core with the highest temperature […];
 5.   If […] and […]
 6.     break; /*adjustment finished*/
 7.   If […] /*Thermal violation*/
 8.     Save assignment in list L;
 9.     If Adjust clk([…], slow, L, CLK)=FAIL
10.       Return FAIL;
11.   Else /*Thermal safety met*/
12.     If Adjust clk([…], fast, L, CLK)=FAIL
13.       Return FAIL;
14.   Update test time and all time tags, […].
15. […]
because they require many optimization runs and the current HotSpot tool for thermal simulation is still computationally intensive. The proposed heuristic is fast and efficient, as corroborated by the experimental results in Section 5.
5 Experimental Results
In this section, we present experimental results for NoC benchmarks: four are crafted from four ITC'02 SoC Test Benchmarks [12], and three are created using complex industrial cores from IBM [7]. The IBM benchmarks include processor cores, digital logic cores, embedded PowerPC register arrays and fast serial links. We created a hypothetical layout for each of them. We set […], and the ambient temperature is set to […]. All simulations complete on a Sun Blade 2000 workstation with a 1.2 GHz CPU in less than 5 minutes.

Before the experiments, we set up the parameters for thermal simulation, including the layers (silicon, interface material, heat spreader, heat sink), the thickness and thermal resistance of all layers and materials, the chip dimensions, the convection capability of the heat sink, etc. We omit the details due to lack of space. Note that the HotSpot thermal model takes into account both lateral and vertical heat conduction and convection. In the experiments, we compare the proposed scheme with the method using power constraints in [10].

We first show that power constraints cannot guarantee thermal safety. We perform test scheduling with thermal simulation using the power-constrained test scheduling in [10]. We set the power constraint to be […] and […] of the sum of all cores' power, corresponding to loose and tight power constraints, respectively. We obtain the corresponding highest core temperature (hot spot) during the schedule and the system test time (in clock cycles). We also calculate the average of core temperatures, the maximum variation of core temperatures (temperature difference between the hottest and coolest cores) and the average variation of core temperatures (average difference between core temperature and […]). These values reflect the thermal balance characteristic on the chip. Results are shown in Tables 1 and 2.
Table 1.
NoCs                  d695    g1023   p22810   p93791   IBM-1     IBM-2    IBM-3
Time                  12917   11624   144802   523224   7873606   781424   3277504
Highest temperature   264.0   298.9   311.0    256.9    292.0     276.3    241.0
Average temperature   134.8   158.2   184.1    130.3    192.6     158.5    148.0
Maximum variation     191.8   201.4   221.3    191.3    220.7     215.0    148.5
Average variation     129.2   140.6   126.8    126.6    99.4      117.8    93.0
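The thermal balance metrics reported in the results can be computed from a list of core temperatures as in the sketch below. The reference point for the average variation is lost in the text; this sketch assumes it is the hottest core temperature, a reading that is at least consistent with the first benchmark's values above (264.0 - 134.8 = 129.2). The function and key names are hypothetical.

```python
def balance_metrics(temps):
    """Thermal balance metrics for a list of core temperatures.

    Assumption: "average variation" is the mean difference between each core
    temperature and the hottest core temperature.
    """
    t_max = max(temps)
    t_min = min(temps)
    t_avg = sum(temps) / len(temps)
    return {
        "highest": t_max,
        "average": t_avg,
        "max_variation": t_max - t_min,  # hottest minus coolest core
        "avg_variation": sum(t_max - t for t in temps) / len(temps),
    }


# Hypothetical temperatures for three cores.
print(balance_metrics([120.0, 100.0, 80.0]))
# {'highest': 120.0, 'average': 100.0, 'max_variation': 40.0, 'avg_variation': 20.0}
```

Under this assumption the average variation always equals the highest temperature minus the average temperature, so a small value indicates that most cores run close to the hot spot temperature.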
 1. If […] /*use slower clock*/
 2.   Sort cores in […] in decreasing order of temperatures;
 3.   For each core in […]
 4.     If slower clock in CLK and assignment not in L
 5.       Return clock frequency assignment;
 6.   Return FAIL;
 7. Else /*use faster clock*/
 8.   Sort cores in […] in increasing order of temperatures;
 9.   For each core in […]
10.     If faster clock in CLK and assignment not in L
11.       Return clock frequency assignment;
12.   Return FAIL.
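The interplay between the clock adjustment loop and the single-step clock change above can be sketched in Python. This is an illustrative approximation under stated assumptions, not the authors' code: clocks are represented as indices into an ascending list of available frequencies, the thermal simulator is passed in as a callable, and the loop returns the last thermally safe assignment it saw (or `None`) rather than reproducing the exact exit condition. All names and the toy thermal model are hypothetical.

```python
def adjust_clk(current, order, delta, n_clocks, rejected):
    """Move one core to the next slower (delta=-1) or faster (delta=+1) clock.

    Cores are tried in the given order; a candidate is skipped if the clock
    does not exist or the resulting assignment is already in `rejected`.
    Returns the new assignment, or None on failure.
    """
    for core in order:
        idx = current[core] + delta
        if 0 <= idx < n_clocks:
            candidate = {**current, core: idx}
            if frozenset(candidate.items()) not in rejected:
                return candidate
    return None


def clock_adjustment(active, assignment, n_clocks, simulate, t_max, max_runs=20):
    """Slow down hot cores on a violation, otherwise speed up cool cores."""
    rejected = set()  # plays the role of list L
    current = dict(assignment)
    best_safe = None
    for _ in range(max_runs):
        temps = simulate(current)
        hot_first = sorted(active, key=lambda c: temps[c], reverse=True)
        if temps[hot_first[0]] > t_max:  # thermal violation
            rejected.add(frozenset(current.items()))
            nxt = adjust_clk(current, hot_first, -1, n_clocks, rejected)
        else:  # thermally safe: remember it, then try a faster clock
            best_safe = current
            nxt = adjust_clk(current, hot_first[::-1], +1, n_clocks, rejected)
        if nxt is None:
            break
        current = nxt
    return best_safe


# Toy thermal model (an assumption): ambient 40 plus heat proportional to
# clock scale, with no coupling between cores.
scales = [0.5, 1.0, 2.0]
heat = {"A": 60.0, "B": 20.0}
toy_sim = lambda a: {c: 40.0 + heat[c] * scales[a[c]] for c in a}

print(clock_adjustment(["A", "B"], {"A": 1, "B": 1}, len(scales), toy_sim, 120.0))
# {'A': 1, 'B': 2}
```

In this toy run the cool core B is moved to the fastest clock, the attempt to speed up the hot core A is rejected after it violates the limit, and the search stops once no untried faster assignment remains.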
exceeded, we use Adjust clk to apply a faster clock to a cool core to reduce test time in Line 12. If Adjust clk succeeds, we need to update the test time of the core and all time tags in the NoC. This is because after clock frequency adjustment, not only does the test time of the core need to be updated, but the time tags on all network resources and the bandwidth in the time division scheme (if a slower clock is used) may also need to be updated. These changes affect all subsequent scheduling. The process quits when a maximum number of optimization runs is reached and thermal safety is obtained.

The Adjust clk process is outlined in Figure 8. We simply check whether the new slower or faster clock is available in CLK; note that this new clock frequency should have passed the thermal integrity check to guarantee that the core can be scheduled. In addition, the new clock frequency assignment should not appear in L, otherwise thermal safety will be violated.

We note that the optimization process at each point of the schedule is similar to a simulated annealing process, where hot spot temperatures are gradually reduced below […] while test time is reduced by increasing the test clock frequency on cool cores. The optimization of multiple-frequency clocks during scheduling will eventually achieve thermal balance over the chip. We do not directly apply simulated annealing algorithms
It can be seen that although stringent power constraints ([…]) are used to restrict power dissipation, the maximum temperature is still far over the thermal safe threshold of […]. Note that under the […] power constraint, no schedule can be generated for d695 and p93791 because the schedule of a single core will violate the constraint. We also observe that using a tighter constraint will cause a significant increase
[2] G. Chen and S. Sapatnekar. Partition-driven standard cell thermal placement. Proc. Int. Symp. on Physical Design, pp. 75-80, 2003.
[3] V. Chickermane, P. Gallagher, S. Gregor and T. St.Pierre. A building block BIST methodology for SOC designs: A case study. Proc. Int. Test Conf., pp. 111-120, 2001.
[4] C. N. Chu and D. F. Wong. A matrix synthesis approach to thermal placement. IEEE Trans. on CAD, vol. 17, pp. 1166-1174, Nov 1998.
[5] E. Cota, L. Carro and M. Lubaszewski. Reusing an On-Chip Network for the Test of Core-based Systems. ACM Trans. on Design Automation of Electronic Systems, vol. 18, pp. 471-499, 2004.
[6] W. Hung et al. Thermal-aware IP virtualization and placement for networks-on-chip architecture. Proc. Int. Conf. on Computer Design, pp. 430-437, 2004.
[7] ASICs Test Methodology, IBM Corporation, Essex Junction, VT 05403.
[8] V. Iyengar and K. Chakrabarty. System-on-a-chip test scheduling with precedence relationships, preemption, and power constraints. IEEE Trans. on CAD, vol. 21, pp. 1088-1094, Sep 2002.
[9] E. Larsson, K. Arvidsson, H. Fujiwara, and Z. Peng. Efficient test solutions for core-based designs. IEEE Trans. on CAD, vol. 23, pp. 758-775, 2004.
[10] C. Liu, V. Iyengar, J. Shi, and E. Cota. Power-aware test scheduling in network-on-chip using variable-rate on-chip clocking. Proc. VLSI Test Symp., pp. 349-354, 2005.
[11] C. Liu, K. Veeraraghavan and V. Iyengar. Thermal-aware test scheduling and hot spot temperature minimization for core-based systems. Proc. Int. Symp. DFT, pp. 552-560, 2005.
[12] E. J. Marinissen, V. Iyengar, and K. Chakrabarty. A set of benchmarks for modular testing of SOCs. Proc. Int. Test Conf., pp. 521-528, 2002.
[13] M. Nourani and J. Chin. Test scheduling with power-time tradeoff and hot-spot avoidance using MILP. Proc. IEE Computer and Digital Techniques, vol. 151, pp. 341-355, 2004.
[14] P. Rosinger, B. Al-Hashimi and K. Chakrabarty. Rapid generation of thermal-safe test schedules. Proc. Design, Automation and Test in Europe (DATE) Conf., pp. 840-845, 2005.
[15] E. Tafaj, P. Rosinger and B. Al-Hashimi. Improving thermal-safe test scheduling for core-based system-on-chip using shift frequency scaling. Proc. Int. Symp. DFT, pp. 544-551, 2005.
[16] K. Skadron et al. Temperature-aware microarchitecture. Proc. Int. Symp. on Computer Architecture, pp. 2-13, 2003.
[17] B. Vermeulen, J. Dielissen, K. Goossens, and C. Ciordas. Bringing communication networks on-chip: the test and verification implications. IEEE Communications Mag., vol. 41, pp. 74-81, 2003.
[18] C. Liu and V. Iyengar. Test scheduling with thermal optimization for network-on-chip systems using variable-rate on-chip clocking. Proc. Design Automation and Test in Europe Conf., 2006, to appear.
Table 3 (changes relative to Table 2 in parentheses).
NoCs                  d695    g1023            p22810           p93791   IBM-1            IBM-2           IBM-3
Highest temperature   121.3   126.7 (-161.3)   124.9 (-178.6)   126.6    126.6 (-102.5)   124.1 (-77.3)   120.0 (-120.0)
Average temperature   101.7   112.8 (-25.9)    111.3 (-54.4)    116.1    116.6 (-32.9)    91.3 (-28.4)    109.9 (-39.4)
Maximum variation     50.1    35.0 (-168.7)    56.5 (-180.7)    26.7     34.8 (-129.3)    61.6 (-83.1)    29.6 (-118.5)
Average variation     19.6    13.9 (-135.4)    13.6 (-124.2)    10.4     10.0 (-69.7)     32.7 (-49)      10.1 (-80.3)
on test time, but does not necessarily reduce temperatures. Therefore, a thermal constraint must be used instead of a power constraint to guarantee thermal safety.

In Table 3, we present the results when the proposed method is used for thermal safety and thermal optimization. We also show the reduction in test time (in percentage) and in temperatures, compared to the results in Table 2. It can be seen that, compared to Tables 1 and 2, the proposed algorithm can significantly reduce core temperatures and achieve thermal safety. Meanwhile, the temperature variations are also substantially reduced, indicating that a much better thermal balance is achieved. Moreover, in most cases (with the only exception of IBM-2) the test times are also significantly reduced (by up to […]). These results corroborate the effectiveness of the proposed thermal-aware scheduling method.
6 Conclusions
We have addressed thermal-aware test scheduling in NoC-based systems using on-chip multiple-frequency clocking. We proposed to assign test clock frequencies to cores during test scheduling to dynamically adjust core temperatures. We presented a heuristic in which the thermal optimization process is integrated with test scheduling. Experimental results for NoC benchmarks show that the proposed method can guarantee thermal safety, yield better thermal balance and reduce test time.
References
[1] L. Benini and G. D. Micheli. Networks on chips: a new SoC paradigm. IEEE Computer, vol. 35, pp. 70-78, 2002.