
Dual-Vt Design of FPGAs for Subthreshold Leakage Tolerance

Akhilesh Kumar and Mohab Anis
Department of ECE, University of Waterloo
200 University Ave. W., Waterloo, ON, Canada N2L 3G1
email: {a5kumar,manis}@vlsi.uwaterloo.ca

Abstract

In this paper we propose a dual-Vt FPGA architecture for reducing subthreshold leakage power. We also propose a CAD flow, based on dual-Vt assignment and placement, for realizing this architecture. The logic elements within the logic blocks are the candidates for dual-Vt assignment. The proposed architecture has two kinds of logic blocks: one in which all logic elements are high-Vt, and another in which a fixed percentage of the logic elements are high-Vt. The two kinds of logic blocks are placed so that the FPGA architecture remains regular. Results indicate that, in the ideal case of dual-Vt assignment, over 95% of the logic elements can be assigned high-Vt, and that leakage savings of 55% can be achieved. Design tradeoffs for various ratios of the two kinds of logic blocks are investigated. The dual-Vt FPGA CAD flow is intended for the development and evaluation of dual-Vt FPGA architectures.

Figure 1: Targeted FPGA Architecture (logic blocks, connection blocks and switch blocks; short and long wire segments; programmable routing switches and connection switches)


Introduction

As FPGAs evolved, more attention was given to dynamic power reduction and to performance and area improvement than to leakage power reduction. It was shown in [3] that a 90nm FPGA consumes too much leakage power to be used successfully in mobile applications. Very few works have targeted leakage power reduction in FPGAs. The work in [2] exploited the property that the leakage power of a CMOS circuit depends on the state of its inputs, and used signal statistics to alter the input states so as to reduce leakage power without changing the functionality of the circuit. However, this work addressed only active leakage, and only in the used parts of the FPGA. The work in [4] divided the FPGA fabric into small regions, each controlled by a sleep transistor that is turned on or off depending on whether the region is being used; this reduces only standby leakage power. The work in [5] explored dual-Vt, body biasing, and gate biasing of nMOS pass transistors for reducing leakage power. The dual-Vt technique in [5] varied the percentage of high-Vt elements in the routing resources of the FPGA and studied the effect on a number of benchmarks; it did not report the leakage savings obtained, and no detailed evaluation of alternative routing architectures was presented. Body biasing requires control circuitry and the generation of body-bias voltages, and its leakage savings decrease with technology scaling [9]. Negative gate biasing of nMOS pass transistors has implementation issues and also introduces an additional leakage component, called gate-induced drain leakage.

Motivated by these limitations of previous works, this paper proposes: (1) a dual-Vt FPGA architecture, (2) a dual-Vt FPGA CAD flow for designing the dual-Vt FPGA, and (3) the algorithms associated with this CAD flow. The dual-Vt technique uses two types of transistors in the same design: one with a high threshold voltage and the other with a low threshold voltage. High-Vt transistors have less subthreshold leakage but larger delay than low-Vt transistors. The proposed dual-Vt FPGA design has the following advantages: 1) it reduces both active and standby leakage power; 2) it provides a CAD framework for developing and evaluating a dual-Vt FPGA implementation; 3) it avoids the area penalty inherent in using sleep transistors; and 4) from the user's perspective, it does not require any modification of the existing placement and routing tools.

Targeted FPGA Architecture

The FPGA architecture has two main components, logic blocks (CLBs) and routing resources, as shown in Fig. 1. The logic blocks implement the functionality of the given circuit, while the routing resources provide the connectivity needed to implement that logic. We use the SRAM-based FPGA architecture proposed in [7]. The logic block of this SRAM-based FPGA is built from lookup tables (LUTs), which form the core of the basic logic elements (BLEs); a LUT is an array of SRAM cells. Each BLE consists of a k-input LUT, a flip-flop, and a multiplexer that selects the output either directly from the LUT or from the registered LUT value stored in the flip-flop. In the cluster-based logic block, each logic block is made up of N BLEs and has I inputs, each of which can connect to all the BLEs. We consider each subblock to consist of a BLE and its corresponding LUT input multiplexers.
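To make the BLE description concrete, the following is a minimal behavioral sketch of a single BLE in Python; it is an illustration, not a model taken from the paper. It assumes a LUT whose 2^k SRAM bits are indexed with input 0 as the least significant bit, and a single configuration bit that selects the registered or the combinational output. The class and method names are hypothetical, and the cluster level (N BLEs, I inputs, LUT input multiplexers) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class BLE:
    """Hypothetical behavioral model of a basic logic element (BLE)."""

    k: int                    # number of LUT inputs
    truth_table: list         # 2**k configuration SRAM bits of the LUT
    registered: bool = False  # configuration bit driving the output multiplexer
    ff: int = 0               # current flip-flop state

    def __post_init__(self):
        if len(self.truth_table) != 2 ** self.k:
            raise ValueError("a k-input LUT needs exactly 2**k SRAM cells")

    def lut(self, inputs):
        # The k input bits select one SRAM cell; input 0 is the LSB (assumption).
        if len(inputs) != self.k:
            raise ValueError("expected exactly k input bits")
        index = sum(bit << i for i, bit in enumerate(inputs))
        return self.truth_table[index]

    def output(self, inputs):
        # Output multiplexer: registered value from the flip-flop, or the LUT output directly.
        return self.ff if self.registered else self.lut(inputs)

    def clock_edge(self, inputs):
        # On a clock edge the flip-flop captures the current LUT output.
        self.ff = self.lut(inputs)
```

For example, a 4-input AND gate is the 16-bit truth table with only the last bit set; with `registered=True`, the BLE output reflects a new input pattern only after `clock_edge` has been called.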

Proceedings of the 7th International Symposium on Quality Electronic Design (ISQED06) 0-7695-2523-7/06 $20.00 2006

IEEE

Figure 2: Proposed FPGA architecture with two kinds of logic blocks (CLB type 1: all subblocks high-Vt; CLB type 2: a mix of high-Vt and low-Vt subblocks)


The routing resources follow an island-style architecture: horizontal and vertical routing channels form a mesh-like structure. The channels are connected by programmable switch boxes, which provide flexibility in making connections. The logic blocks are connected to the routing channels through connection boxes. The clock is assumed to be distributed on a dedicated network.

Proposed Dual-Vt FPGA Architecture

In this work we consider the subblocks within the CLBs as the candidates for high-Vt assignment to reduce leakage power. Since the routing wire and switch delays are much greater than the logic block delays, a high percentage of the logic elements can be assigned high-Vt; hence we selected the logic blocks as the candidates for high-Vt assignment. The proposed architecture has two kinds of CLBs throughout the array. In the first type all subblocks use high-Vt transistors, while the second type has a certain number of high-Vt subblocks. The two kinds of logic blocks are distributed uniformly throughout the array so that the array remains regular. If the cluster size is N (N subblocks per logic block), one kind of logic block has all N subblocks high-Vt (type1), whereas the other has M high-Vt subblocks (M < N) and N - M low-Vt subblocks (type2). For example, Fig. 2 shows a distribution in which 50% of the logic blocks are type1 and the other 50% are type2. We selected this architecture because the number of subblocks assigned high-Vt was very high, so a large percentage of logic blocks would have all subblocks high-Vt. We also investigate architectures in which all the CLBs are type1, and in which all the CLBs are type2; these have the advantage that only one kind of CLB is required in the FPGA.

3.1 Proposed Dual-Vt FPGA CAD Flow

Developing the FPGA architecture outlined above requires a CAD flow that can perform dual-Vt assignment to the subblocks, re-cluster the subblocks, and place the two kinds of logic blocks. The typical FPGA CAD flow within the VPR framework for placing and routing a given netlist, and the proposed generic dual-Vt FPGA CAD flow, are shown in Fig. 3. The proposed flow has been developed using the widely used academic research tools VPR and T-Vpack [7], together with a power model for FPGAs [8]. The idea behind the flow is to find suitable FPGA architectures for leakage reduction. These architectures are developed and evaluated by running a number of benchmarks through the proposed dual-Vt FPGA CAD flow and finding the number of subblocks that can be high-Vt so as to maximize leakage savings while limiting the delay penalty. This is the philosophy of the dual-Vt FPGA CAD flow.

It should be noted that the physical resources of an FPGA are fixed during fabrication and applications are mapped later. The dual-Vt FPGA CAD flow outlined above is meant for the physical design of the FPGA and, once the physical design is complete, for the design of the associated CAD tools. It is to be used, along with other CAD tools, during the design of the FPGA, when a number of benchmarks are evaluated before the physical design is finalized. The flow is not meant for mapping an application onto an FPGA chip: at that point all the logic blocks and routing resources are fixed, and mapping is a routine job for the FPGA CAD tools, although those tools might be enhanced for the dual-Vt FPGA architecture. The dual-Vt FPGA CAD flow is therefore meant for evolving a dual-Vt FPGA architecture, and its steps might need small changes to suit a specific architecture. For comparing the results obtained from the proposed CAD flow, the baseline is a single low-Vt implementation.

3.2 Implementation

In this section we describe the implementation of the different stages of the proposed dual-Vt FPGA CAD flow.

Stage 1: This stage generates the input files required for placement, routing, and power estimation of the circuit.

Stage 2: This stage estimates the delay through all the paths of the circuit. The delay estimation is based on the actual placement and routing of the benchmark, and hence gives an accurate estimate of the delays. In this stage all subblocks are assumed to be low-Vt, and the delays are computed from the low-Vt delay values of the subblocks. Power estimation for the single low-Vt benchmark is also done in this stage. We used an industrial 0.13µm technology for the technology-dependent data required by VPR for placement and routing and for power estimation. The methodology proposed in [7] was used to extract the required technology-dependent data, and HSPICE simulation was used to extract the delay data for the logic blocks. We assumed that metal 3 is used for the routing wires. The regular MOS models were used for the low-Vt implementation. For this technology Vdd is 1.2V and Vt is 0.2V; the high threshold voltage was chosen to be 100mV above the threshold voltage of the regular MOS models.

Stage 3: Initially all subblocks are assumed to be low-Vt. Based on the path delays computed in the previous stage, the dual-Vt assignment algorithm assigns high-Vt to subblocks. The dual-Vt assignment algorithm is shown in Algorithm 1 [1].

Algorithm 1 Dual-Vt assignment [1]
  Use the timing graph in VPR (after place and route)
  for each subblock do
    Compute the slack available
    if slack > 0 then
      Assign high-Vt to the subblock
      Recompute the slack
      if slack < 0 then
        Re-assign low-Vt to the subblock
      end if
    end if
  end for

The algorithm works on the timing graph created by VPR for placement and routing. High-Vt assignment to a particular subblock is based on the slack available at the output pin of the subblock, since assigning high-Vt changes the delay through the subblock. All the configuration SRAM cells are also assigned high-Vt because they do not contribute to run-time delays [6]. Note that this dual-Vt assignment is not constrained to keep the CLBs identical in terms of their numbers of high-Vt and low-Vt subblocks. This ideal assignment would yield a dual-Vt FPGA implementation without any performance degradation, and it gives a theoretical upper limit on the number of subblocks that can be assigned high-Vt. Its output, however, has different numbers of high-Vt and low-Vt subblocks in different CLBs, which is not a feasible FPGA architecture, since the FPGA structure needs to be regular.

Figure 3: (a) Typical FPGA CAD flow within VPR and T-Vpack framework. (b) Proposed generic dual-Vt FPGA CAD flow (stages: estimating the delays of the circuit paths; assigning high/low threshold voltage to subblocks; re-clustering the BLEs into two kinds of logic blocks; constrained placement and routing; estimating the leakage power savings)

Stage 4: This stage works within the framework of T-Vpack. The methodology is shown in Fig. 4, and the re-clustering algorithm is shown in Algorithm 2. Clustering first proceeds normally, using the T-Vpack algorithm. After clustering, all the logic blocks are divided into two groups: one group has logic blocks made up entirely of high-Vt subblocks (type1), while the other has logic blocks in which some subblocks are high-Vt and the others are low-Vt (type2). For a given benchmark, the ratio of high-Vt to low-Vt subblocks in type2 logic blocks is determined as follows. We find the CLB with the maximum number of low-Vt subblocks, and assign low-Vt to subblocks of the other type2 CLBs so that every type2 CLB has that same number of low-Vt subblocks. This leads to only a slight increase in the total number of low-Vt subblocks, since type1 CLBs are significantly more numerous than type2 CLBs.
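Stages 3 and 4 can be illustrated with a small, self-contained Python sketch of Algorithm 1 and Algorithm 2. It is not the authors' implementation and does not use VPR. It assumes a combinational timing graph given as a hypothetical `fanin` mapping (each subblock to the subblocks that drive it), folds routing delay into one integer delay per subblock (for example in ps, which keeps the slack comparisons exact), visits subblocks in sorted name order, and, during re-clustering, picks arbitrarily which high-Vt subblocks return to low-Vt. All function names are hypothetical.

```python
from graphlib import TopologicalSorter


def arrival_times(fanin, delay):
    """Topological order and arrival time at each subblock output.

    Every subblock must be a key of `delay`; `fanin` lists each subblock's drivers.
    """
    graph = {n: fanin.get(n, ()) for n in delay}
    order = list(TopologicalSorter(graph).static_order())
    arrival = {}
    for n in order:
        # an output is ready after the latest input arrives plus the subblock's own delay
        arrival[n] = delay[n] + max((arrival[p] for p in graph[n]), default=0)
    return order, arrival


def slacks(fanin, delay, t_target):
    """Slack at each subblock output against a target critical-path delay."""
    order, arrival = arrival_times(fanin, delay)
    fanout = {n: [] for n in order}
    for n in order:
        for p in fanin.get(n, ()):
            fanout[p].append(n)
    required = {}
    for n in reversed(order):
        # primary outputs must be ready by t_target; others early enough for every fanout
        required[n] = min((required[s] - delay[s] for s in fanout[n]), default=t_target)
    return {n: required[n] - arrival[n] for n in order}


def assign_dual_vt(fanin, low_delay, high_delay):
    """Stage 3 sketch (Algorithm 1): greedy, slack-driven high-Vt assignment."""
    delay = dict(low_delay)
    t_target = max(arrival_times(fanin, delay)[1].values())  # all-low-Vt critical path
    high = set()
    for n in sorted(delay):  # visiting order is an assumption
        if slacks(fanin, delay, t_target)[n] <= 0:
            continue  # no slack: the subblock stays low-Vt
        delay[n] = high_delay[n]  # tentatively assign high-Vt
        if min(slacks(fanin, delay, t_target).values()) < 0:
            delay[n] = low_delay[n]  # negative slack: re-assign low-Vt
        else:
            high.add(n)
    return high


def recluster(clusters, high):
    """Stage 4 sketch (Algorithm 2): label CLBs, equalize low-Vt counts of type2 CLBs."""
    low_counts = [sum(s not in high for s in clb) for clb in clusters]
    max_low = max(low_counts)
    types, high_after = [], set(high)
    for clb, n_low in zip(clusters, low_counts):
        if n_low == 0:
            types.append(1)  # all subblocks high-Vt
            continue
        types.append(2)
        # return high-Vt subblocks to low-Vt until this CLB has max_low of them;
        # the choice is arbitrary here, and going back to low-Vt never hurts timing
        for s in sorted(x for x in clb if x in high)[: max_low - n_low]:
            high_after.discard(s)
    return types, max_low, high_after
```

As a hypothetical example, take subblocks a, b and c in series (100 ps each), with side inputs d (50 ps) and e (180 ps) also driving c, and let high-Vt add 30 ps to any subblock. Only d ends up high-Vt: a, b and c have zero slack, and e has 20 ps of slack, which the 30 ps increase would exceed, so it is re-assigned low-Vt.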
Figure 4: Re-clustering of subblocks into two types of logic blocks (panels: before and after re-clustering; CLBs containing low-Vt and high-Vt subblocks)

Stage 5: In this stage constrained placement is performed, followed by routing. The constrained placement algorithm is shown in Algorithm 3. For all the benchmarks we consider the minimum-sized square array required to place all the logic blocks. Since the restrictions imposed by the physical locations of the logic blocks


would place constraints on the placement algorithm, the placement quality would not be as good as that of the original placement. The motivation behind the new placement strategy is therefore to minimize the effect of these constraints and to give the original placement algorithm as much flexibility as possible, while accounting for the leakage power and delay tradeoffs. The constrained placement algorithm first determines the allowed physical locations for the two kinds of logic blocks, based on the numbers of logic blocks in the first and second groups. To give the placement algorithm flexibility, type2 logic blocks are allowed extra physical locations: a type2 logic block may also be placed at a location meant for type1 logic blocks (but not vice versa), and it is then converted to type1. This provides a small margin for converting logic blocks from type2 to type1 when that improves placement quality.

Stage 6: After placement and routing in the previous stage, this stage computes the leakage power savings and the delay penalty for the benchmark.

Evaluation of different architectures: Based on the above flow, two kinds of architectures can be realized, viz., heterogeneous and homogeneous architectures. The associated CAD flows are outlined below.

Heterogeneous architecture, type1 and type2 CLBs: This architecture has both kinds of CLBs, and its CAD flow includes all the stages outlined above. The leakage savings and delay penalty vary with the numbers of type1 and type2 CLBs and with the placement.

Homogeneous architecture, type1 CLBs: This architecture has only type1 (all high-Vt) CLBs and is used for evaluation purposes. The leakage savings are maximum in this case, but so are the delay penalties, which come entirely from the high-Vt logic blocks; benchmarks with many subblocks on the critical path therefore incur a large delay penalty. The re-clustering stage of the CAD flow is not required to evaluate this architecture.

Homogeneous architecture, type2 CLBs: In this architecture every CLB has a fixed ratio of high-Vt to low-Vt subblocks. It is realized by converting all the CLBs to type2 during the re-clustering stage. Since placement and routing are unaffected, and the re-clustering automatically keeps all subblocks on the critical path(s) low-Vt, there is no delay penalty associated with this architecture.

Algorithm 2 Recluster
  Do clustering based on the T-Vpack algorithm
  for each logic block do
    if all subblocks are high-Vt then
      Mark the logic block as type1
    else
      Mark the logic block as type2
    end if
  end for
  Find the logic block containing the most low-Vt subblocks; call this number Max-Low-Vt-subblocks
  for each type2 logic block do
    if No. of low-Vt subblocks < Max-Low-Vt-subblocks then
      while No. of low-Vt subblocks < Max-Low-Vt-subblocks do
        Re-assign a high-Vt subblock to low-Vt
      end while
    end if
  end for

Algorithm 3 Constrained Placement
  P-ratio = (Num type1 CLBs) / (Num type2 CLBs)
  Determine physical locations for the two types of CLBs based on P-ratio
  Initial random placement of all the CLBs with physical constraints
  Start placement based on the VPR placement algorithm
    Allow type2 CLBs to be placed on physical locations meant for type1 CLBs, but not vice versa
  End placement
  for each type2 CLB do
    if the CLB is placed on a type1 physical location then
      Convert the CLB to type1
    end if
  end for

4 Results and Discussions

4.1 Estimating Power Savings

The total power consumption can be divided into two parts, active power and standby power. The total average power can be written as

$$P_{avg} = \frac{t_{act} P_{act} + t_{off} P_{off}}{t_{act} + t_{off}} \tag{1}$$

$$T = t_{act} + t_{off} \tag{2}$$

where $P_{avg}$, $P_{act}$, and $P_{off}$ are the average, active, and standby (off) power, respectively. For personal wireless communication systems, the standby (off) time $t_{off}$ is typically 90% of the total time $T$, and the active time $t_{act}$ is 10% of the total time. During the active time, the components of power dissipation can be written as

$$P_{act} = \left[P_{dyn} + P_{sckt} + P_{actleak}\right]_{used} + \left[P_{actleak}\right]_{unused} \tag{3}$$

where $P_{dyn}$, $P_{sckt}$, and $P_{actleak}$ are the dynamic, short-circuit, and active leakage power, respectively. $P_{off}$ is the standby leakage power of the FPGA, because only leakage power is dissipated in standby mode.

4.2 Evaluation Methodology

For obtaining the results we used a LUT size of 4 and a cluster size of 12; it was shown in [6] that this combination minimizes total power. We used the default routing architecture in the VPR FPGA architecture file. Routing was done with 1.2 times the minimum channel width required to route each benchmark successfully, and the same channel width was used for the single low-Vt and dual-Vt implementations. The reported leakage savings are overall leakage savings, including the routing resources.

4.3 Evaluation of Leakage Savings

Figure 5: Delay penalty vs. leakage savings for seq. The single low-Vt circuit was routed with a channel width of 70.

Table 1 shows the leakage savings obtained for the different benchmarks and for the different architectures considered. We discuss each implementation below.

Heterogeneous architecture: The leakage savings here depend on the number of logic blocks of each type and on the number of low-Vt subblocks in type2 logic blocks. The average delay penalty (5.98%) in this case is primarily due to the altered placement, which affects the final routing. Note, however, that the delay penalty is quite sensitive to the routing channel width. To illustrate this, consider benchmark seq: Fig. 5 shows the variation of delay penalty and leakage savings with routing channel width. The single low-Vt implementation was routed with a channel width of 70, and there is almost no delay penalty when the channel width for the dual-Vt implementation is increased to 72.

Homogeneous architecture, type1 CLBs: This architecture serves as a basis for evaluating the other dual-Vt FPGA implementations, as it gives the leakage savings obtained when all subblocks are assigned high-Vt; the leakage savings are maximum in this case. The average delay penalty is 12%. It comes from the high-Vt subblocks on the critical path, and it is not very significant because routing delay dominates the total delay.

Homogeneous architecture, type2 CLBs: The leakage savings here are the lowest of the three cases, because more subblocks are low-Vt. The savings depend on the number of high-Vt subblocks per CLB. This architecture shows the leakage savings that can be obtained without any extra placement effort, without any delay penalty, and without extra routing resources.

Table 1: Leakage savings with logic blocks assigned dual-Vt, for a cluster size of 12 and three different implementations (type1 CLBs have all subblocks high-Vt, i.e. 12 high-Vt subblocks)

| Benchmark | No. of subblocks (BLEs) | % of type1 CLBs | Low-Vt subblocks per type2 CLB (of 12) | Leakage savings (heterogeneous) | Leakage savings (homogeneous: type1) | Leakage savings (homogeneous: type2) |
|---|---|---|---|---|---|---|
| alu4 | 1522 | 88.3% | 2 | 60% | 61.8% | 52.3% |
| apex2 | 1878 | 97% | 3 | 58.5% | 59.6% | 42% |
| apex4 | 1262 | 87% | 4 | 44.3% | 56.7% | 43.4% |
| bigkey | 1707 | 97% | 3 | 63% | 64% | 34.2% |
| clma | 8393 | 96.7% | 3 | 56% | 57.2% | 25.2% |
| des | 1591 | 91.7% | 4 | 60% | 61.5% | 33% |
| diffeq | 1497 | 84.3% | 5 | 58.9% | 63% | 26.3% |
| dsip | 1370 | 96.5% | 4 | 64.6% | 65.3% | 16% |
| elliptic | 3604 | 96.4% | 5 | 58% | 59.3% | 38% |
| ex1010 | 4598 | 98.7% | 2 | 58% | 58.3% | 45% |
| frisc | 3556 | 89.6% | 4 | 56.6% | 59.2% | 33.6% |
| misex3 | 1397 | 97.5% | 3 | 59.5% | 60.7% | 42.7% |
| seq | 1750 | 96% | 2 | 58.7% | 60.2% | 33.8% |
| spla | 3690 | 93% | 3 | 55.7% | 57.2% | 40.5% |
| tseng | 1047 | 89.9% | 6 | 59.6% | 63.5% | 26% |
| Avg. | - | 93.3% | 3.53 (4) | 58% | 60.5% | 35.5% |
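The Avg. entry of the low-Vt column in Table 1 is the kind of averaged parameter that a homogeneous type2 architecture could adopt. A minimal Python check recomputes it from the per-benchmark values; rounding the average to the nearest integer is an assumption, since the table only shows 3.53 next to 4.

```python
from statistics import mean

# Low-Vt subblocks per type2 CLB (out of 12) for each benchmark, as listed in Table 1
low_vt_per_type2 = {
    "alu4": 2, "apex2": 3, "apex4": 4, "bigkey": 3, "clma": 3,
    "des": 4, "diffeq": 5, "dsip": 4, "elliptic": 5, "ex1010": 2,
    "frisc": 4, "misex3": 3, "seq": 2, "spla": 3, "tseng": 6,
}

avg = mean(list(low_vt_per_type2.values()))
# A physical CLB needs an integer count; nearest-integer rounding is an assumption.
per_clb = round(avg)
print(f"average {avg:.2f} -> {per_clb} low-Vt and {12 - per_clb} high-Vt subblocks per type2 CLB")
```

It reports an average of 3.53, which rounds to 4 low-Vt and 8 high-Vt subblocks per 12-subblock CLB, matching the Avg. row.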

Table 2: Design tradeoffs for the dual-Vt heterogeneous architecture: delay penalty, and dynamic and leakage power comparison, for a cluster size of 12

| Benchmark | Delay penalty | Single low-Vt: dynamic energy/clock cycle (pJ) | Single low-Vt: Pleak (mW) | Dual-Vt: dynamic energy/clock cycle (pJ) | Dual-Vt: Pleak (mW) |
|---|---|---|---|---|---|
| alu4 | -2.9% | 156.5 | 3 | 157.1 | 1.2 |
| apex2 | 9.2% | 162.1 | 3.8 | 161 | 1.6 |
| apex4 | 20.9% | 72.8 | 2.8 | 73.4 | 1.6 |
| bigkey | 26.9% | 226.5 | 10.9 | 226.5 | 4 |
| clma | -0.9% | 840 | 17.4 | 837.6 | 7.6 |
| des | -1.14% | 319.6 | 15.5 | 318.5 | 6.1 |
| diffeq | 9.4% | 88 | 2.9 | 88 | 1.2 |
| dsip | -10.4% | 180.7 | 10.4 | 186.6 | 3.7 |
| elliptic | 3.4% | 299.4 | 7.2 | 298 | 3 |
| ex1010 | 1.84% | 173.8 | 9.2 | 175.8 | 3.9 |
| frisc | 6.3% | 136.3 | 7.2 | 138.3 | 3.1 |
| misex3 | 4.1% | 126.7 | 2.7 | 127.9 | 1.1 |
| seq | 0.78% | 152.6 | 3.6 | 152.8 | 1.5 |
| spla | 10.1% | 176.6 | 7.6 | 173.5 | 3.4 |
| tseng | 11.6% | 96.8 | 2.3 | 96.7 | 0.92 |
| Avg. | 5.98% | 214.1 | 7.1 | 211.8 | 3.2 |

4.4 Design tradeoffs and overall power savings

Before deciding upon any dual-Vt FPGA architecture, the design tradeoffs must be considered: the delay penalty and the impact on dynamic and overall power. Table 2 shows the tradeoffs associated with this methodology. The dynamic energy dissipation remains almost the same for the single low-Vt and dual-Vt implementations, so this dual-Vt implementation incurs no penalty in dynamic power. A dual-Vt custom VLSI design does not incur any delay penalty; however, because of the programmability of the FPGA, a circuit implemented on a dual-Vt FPGA incurs some delay penalty compared with a single low-Vt FPGA, and the penalty varies with the benchmark. Essentially, the delay penalty occurs because the critical path of a circuit in the dual-Vt FPGA differs from that in the single low-Vt FPGA. There are two sources of delay penalty in a dual-Vt FPGA: 1) the increased delays of the high-Vt subblocks, some of which might lie on the critical path, and 2) the altered placement and routing in the dual-Vt FPGA, because the different subblock delays affect placement and subsequently routing. Table 2 shows the delay penalties of the various benchmarks for the heterogeneous architecture. The impact of placement and routing is more pronounced for this architecture because of the additional placement constraints in the CAD framework. For some benchmarks the performance actually improves in the dual-Vt implementation (negative delay penalty). We attribute this to the way VPR performs placement: VPR placement is based on simulated annealing, which tries to reach the global minimum of the placement cost function. In the dual-Vt constrained placement, the additional constraints sometimes prevent VPR from settling on a particular placement and force it to explore new placements, which may lead to a better final placement. There is no area penalty, since the dual-Vt technique inherently does not incur one. Fig. 6 shows, for illustration, the overall average power savings of benchmark alu4 for different fractions of active time. Overall power savings of 40% can be achieved for an active time of 10%, which is typical of wireless personal communication systems.
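Equations (1) to (3) can be turned into a small calculator for overall power savings as a function of active time, of the kind plotted in Fig. 6. The Python sketch below is illustrative only: it takes the alu4 leakage values from Table 2 (3 mW for single low-Vt, 1.2 mW for dual-Vt), assumes a hypothetical 10 mW of dynamic plus short-circuit power that is identical for both implementations, and treats standby power as that same leakage. Table 2 reports dynamic energy per clock cycle rather than power, and no clock frequency is given, so this sketch does not attempt to reproduce the 40% result.

```python
def average_power(p_act, p_off, active_fraction):
    """Eq. (1) with Eq. (2): time-weighted average power, active_fraction = t_act / T."""
    return active_fraction * p_act + (1.0 - active_fraction) * p_off


def overall_savings(p_dyn_sc, leak_single, leak_dual, active_fraction):
    """Fractional average-power savings of dual-Vt over single low-Vt.

    Assumptions: dynamic plus short-circuit power (p_dyn_sc) is the same for both
    implementations, the leakage of used and unused resources in Eq. (3) is lumped
    into one number per implementation, and standby power P_off equals that leakage.
    """
    single = average_power(p_dyn_sc + leak_single, leak_single, active_fraction)
    dual = average_power(p_dyn_sc + leak_dual, leak_dual, active_fraction)
    return 1.0 - dual / single


if __name__ == "__main__":
    # Leakage values for alu4 from Table 2; the 10 mW dynamic power is hypothetical.
    for frac in (0.0, 0.1, 0.5, 1.0):
        print(f"active {frac:.0%}: savings {overall_savings(10.0, 3.0, 1.2, frac):.1%}")
```

With these assumed numbers the savings are 60% with no active time and 45% at 10% active time, and they fall as the active fraction grows, because dynamic power, which dual-Vt does not reduce, takes a larger share of the average.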

4.5 Distribution of Leakage Savings

In the single low-Vt implementation, the majority of leakage comes from the logic part. In the dual-Vt implementation, the leakage contribution of the logic blocks drops considerably. To evaluate the contributions of the routing resources and the logic elements to leakage before and after dual-Vt implementation, we evaluated the heterogeneous architecture. Fig. 7(a) shows the leakage contributions of the routing resources and the logic elements, and Fig. 7(b) shows the shares of the savings that come from the high-Vt SRAM cells and from the high-Vt logic part. Here the SRAM cells include those of both the routing resources and the logic blocks. The majority of the savings come from the logic part of the logic blocks; SRAM cells contribute only 15% of the total leakage savings.

Figure 6: Overall power savings for different active times for the benchmark alu4

Figure 7: (a) Leakage contributions of routing resources and logic resources before and after dual-Vt implementation. (b) % of leakage savings obtained from SRAM cells and logic elements.

Recommendations for Dual-Vt FPGA design

A typical design flow for a dual-Vt FPGA implementation is shown in Fig. 8. Dual-Vt FPGA design involves determining the physical parameters of the FPGA that give the best leakage savings and delay tradeoffs. This can be done by evaluating a number of benchmarks and choosing a single set of parameters; a simple method is to average the parameters obtained for the individual benchmarks. Determining the best ratio of type1 to type2 logic blocks requires experimentation. To reduce complexity, however, a homogeneous architecture with only type2 logic blocks can be implemented; in this case the numbers of high-Vt and low-Vt subblocks can be the average over all the experimental benchmarks. One possible refinement is post-processing during architecture evaluation after placement and routing: looking for high-Vt subblocks on the critical path(s) and assigning them low-Vt when their CLB has low-Vt subblocks that do not lie on the critical path(s); such a low-Vt subblock would then be assigned high-Vt to keep the ratio intact. Since this work considered the minimum square array required to place and route the benchmarks, which would not be the case in practice, allocating some extra type2 CLBs would improve the placement: type2 CLBs are few in number, which makes the allowed physical locations for type2 logic blocks rigid. Further, the methodology that we used for evaluating FPGA architectures can be modified to suit specific FPGA architectures. Once the architecture has been finalized, the placement and routing CAD tools need not be modified, since they should automatically take care of the increased delay of the high-Vt subblocks.

Figure 8: Realizing a dual-Vt FPGA design (the dual-Vt FPGA CAD flow is used during development of the physical FPGA architecture to optimize the FPGA parameters; the FPGA is fabricated with the optimized dual-Vt parameters; the user then maps applications onto the FPGA, and the placement and routing CAD tools need not change)

Conclusions

The dual-Vt FPGA architecture explored above indicates that, on average, leakage savings of 55% can be obtained. We proposed a dual-Vt FPGA CAD flow for the implementation and evaluation of dual-Vt FPGA architectures. The results show that, because a high percentage of subblocks is assigned high-Vt, re-clustering yields logic blocks the majority of which have all subblocks high-Vt. In future work we intend to consider the routing resources and the logic blocks together for dual-Vt FPGA architecture evaluation.

References
[1] L. Wei et al., "Design optimization of dual-threshold circuits for low-voltage low-power applications," IEEE Trans. VLSI, vol. 7, pp. 16-24, March 1999.
[2] J. H. Anderson et al., "Active leakage power optimization for FPGAs," FPGA, pp. 33-41, 2004.
[3] T. Tuan et al., "Leakage power analysis of a 90nm FPGA," IEEE Custom Integrated Circuits Conf., pp. 57-60, 2003.
[4] A. Gayasen et al., "Reducing leakage energy in FPGAs using region-constrained placement," FPGA, pp. 51-58, 2004.
[5] A. Rahman et al., "Evaluation of low-leakage design techniques for Field Programmable Gate Arrays," FPGA, pp. 23-30, 2004.
[6] F. Li et al., "Architecture evaluation for power-efficient FPGAs," FPGA, pp. 175-184, 2003.
[7] V. Betz et al., Architecture and CAD for Deep-Submicron FPGAs, Kluwer Academic Publishers, MA, 1999, ISBN: 0792384601.
[8] K. Poon et al., "A flexible power model for FPGAs," International Conf. on Field Programmable Logic and Applications, pp. 312-321, 2002.
[9] Y-F. Tsai, D. Duarte, N. Vijaykrishnan, M. J. Irwin, "Implications of technology scaling on leakage reduction techniques," DAC, pp. 187-190, 2003.
[10] Fei Li et al., "Low-power FPGA using predefined dual-Vdd/dual-Vt fabrics," FPGA, pp. 42-50, 2004.
