A Low-Power 8-Read 4-Write Register File Design



Hao YAN1,2, Yan LIU1, Dong-hui WANG1, Chao-huan HOU1


1 Digital System Integration Lab, Institute of Acoustics, Chinese Academy of Sciences
2 Graduate University of Chinese Academy of Sciences
Beijing, China
yanhaobuaa@126.com
Abstract: This paper studies leakage in a 90nm logic CMOS technology. Based on an analysis of the power components in multi-ported register files and of the leakage of different nMOS transistors under different supply voltages, a low-swing strategy for the bit lines is used to save power. An 8-read / 4-write port write-through register file is designed; it dissipates 10.04mW at 500MHz, saving 33.36% of the power in read operations and 24.8% of the energy on average. The leakage power on the bit lines is also reduced by 1.7% by using high threshold voltage transistors in the low-swing scheme.

I. INTRODUCTION

For mobile and handheld microprocessors, lowering power consumption for longer battery life is always a key design issue. Most of the energy in a microprocessor's core is consumed in the ALU and the on-chip memories, which also contain the most critical speed paths and therefore limit the microprocessor frequency. Modern microprocessor architectures use various types of memories, such as caches, register files (RF), and sometimes content addressable memories (CAM). Among these on-chip memories, register files sit at the top of the memory hierarchy, connect directly to the execution units, and require very fast access times. Multi-ported register files are a basic block of superscalar microprocessors [1]: they enable concurrent execution of multiple instructions in a single cycle and consume one of the largest shares of power, about 25%.

The architecture of a multi-ported register file consists of a decode block and a memory array block. Its energy cost is proportional to the number of ports, and the main power contributions are the address decoding steps and the charging and discharging of the bit lines during read operations. The power consumption also changes with the decoder type. Generally, a NAND-structure decoder consumes less power, and pre-decoding also reduces the total switching capacitance.

The other part of the power consumption becomes even more serious as technology progresses. With CMOS technology scaling, aggressively low threshold devices result in an exponential increase in bit line active leakage currents and in poor bit line noise immunity [2]. To make the bit line robust, an additional circuit called a keeper is added to maintain the charge on the bit lines. However, besides the die cost, this comes at the cost of increased bit line delay due to keeper contention.

According to reference [3], leakage increases 2 or 3 times with each technology generation and consumes 30~50% of the power in the core. To reduce the energy lost to leakage, several modified memory core cells have been proposed [4] [5]. In references [6] [7], substrate/well bias schemes have also been proposed to enhance bit line leakage tolerance. However, a general N-well CMOS process cannot provide the special bias for the substrate, which restricts their application. Other leakage-tolerant techniques, such as reference [8], add a special control signal before a read operation occurs, and this pre-conditioning signal is always complex to control.

In this paper, a low-swing strategy is adopted that not only reduces the leakage power on the bit lines but also limits the total voltage swing to save power. Unlike the differential low-swing circuits of reference [9], this paper introduces a low-swing scheme for multi-ported register files. The scheme is very easy to use and gives about 24.8% power saving in active mode. Section II gives a detailed description of the low-swing strategy, Section III presents the implementation and the simulation results, and Section IV concludes the paper.

II. LOW-SWING STRATEGY IN BIT LINE

This section covers the details of the low-swing method for saving power. Typically, lowering the power supply is a powerful way to reduce both active and leakage power [10] [11]. Since active power scales with the square of VDD and leakage power is proportional to VDD, the total power can be cut by decreasing VDD. However, lowering VDD increases gate delays, which conflicts with the high speed demanded of register files. To keep the gate delays unchanged, low threshold voltage transistors can be used to compensate for the lowered VDD. But taking 90nm CMOS logic technology as an example, VDD has already fallen to 1 volt, so adopting low Vth transistors leaves little room for lowering VDD further, and in more advanced processes lowering VDD becomes even more difficult. Moreover, even where low threshold transistors allow a lower VDD, the leakage they cause becomes prominent. In the following part, leakage in 90nm CMOS technology is studied as a guide to total power reduction in register files.

A. The Leakage in 90nm Logic CMOS Process

The leakage power component is the product of the off-current (sub-threshold, gate, etc.) and the supply voltage. Lowering the supply voltage VDD exponentially reduces the leakage power. In this part, the leakage of various MOS transistors is studied using 90nm logic CMOS technology. There are mainly three types of nMOS transistors in 90nm CMOS technology. Fig. 1 shows the simulated leakage currents of the three types of nMOS transistors as VDD changes from 0 to 1 volt. These transistors are all minimum size. Of the three, the high threshold nMOS transistor has the lowest channel leakage current, while the low Vth transistor has an extremely high leakage current, about 52 times that of the high Vth transistor when VDD equals 1 volt. So, to reduce leakage power, low Vth MOS transistors should be avoided. Figure 1 also shows that VDD must be reduced to about 0.15 volt before the leakage current of the low Vth transistor equals that of a normal nMOS transistor at a VDD of 1 volt. Based on these observations, combining low threshold MOS transistors with a lowered VDD does not work well for reducing leakage power.
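The supply scaling argument at the start of this section can be illustrated with a minimal sketch. It assumes the first-order models named in the text (active power proportional to the square of VDD, leakage power proportional to VDD with a constant off-current); the function names are hypothetical and the sketch ignores the gate delay penalty.

```python
def relative_active_power(vdd_ratio: float) -> float:
    """Active power relative to nominal, assuming P_active ~ C * VDD^2 * f."""
    return vdd_ratio ** 2


def relative_leakage_power(vdd_ratio: float) -> float:
    """Leakage power relative to nominal, assuming P_leak ~ I_off * VDD
    with I_off held constant (a first-order simplification)."""
    return vdd_ratio


# Sweep a few scaled supplies relative to the nominal 1 volt of the 90nm process.
for ratio in (1.0, 0.9, 0.8, 0.7):
    print(
        f"VDD x{ratio:.1f}: "
        f"active x{relative_active_power(ratio):.2f}, "
        f"leakage x{relative_leakage_power(ratio):.2f}"
    )
```

In practice the off-current itself also depends on VDD, so the linear leakage model here should be read only as the simplification used in the opening argument.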

Figure 1. Simulated channel leakage currents of nMOS transistors with different threshold voltages as the supply voltage changes.

Figure 2. Simulated gate leakage currents of the three types of nMOS transistors with minimum geometry.

Figure 2 depicts how the currents that leak through the gates of the different nMOS transistor types change as the input voltage rises from ground to VDD. The results show that these currents are very close to each other and can be neglected compared with the channel leakage currents in figure 1.

B. The Low-swing Strategy in Saving Power

This part presents a low-swing scheme on the bit lines that reduces the total charging power consumed on the bit lines; to counter the energy lost through leakage, low channel leakage transistors are used.

Nowadays, multi-ported register files usually use a single-ended implementation to achieve high area density, and time-borrowable domino logic to enhance performance. Because the bit lines are segmented, additional power is consumed on the global bit lines, and parasitic effects and leakage degrade the whole chip's performance as technology scales. In multi-ported register files, the main power consumption is in read operations. During a read, the bit lines must be charged and discharged frequently, so the power consumed on the bit lines accounts for a large part of the total. Traditionally, a pMOS transistor is used in the precharge step. The advantage of pMOS precharge is a full swing on the bit lines with high noise immunity. However, a full swing on the bit lines is not mandatory in a register file, so a low-swing method can be used in register file design if the bit line is robust enough.

There are several ways to realize a low swing on the bit lines, such as precharging the bit line for a carefully calculated time with a special current source, or using feedback circuits to detect the level on the bit line. This paper uses a very simple method: nMOS transistors instead of pMOS transistors charge the bit lines. Because nMOS transistors pass a logic high poorly, the bit line can only be charged to VDD - Vth. Unlike a multi-voltage strategy, this method uses a single supply voltage, and the low-swing scheme focuses mainly on the voltage level of the bit lines. By switching the type of nMOS transistor, the voltage level on the bit lines can be adjusted slightly. The power consumed on the bit lines is then proportional to the square of VDD - Vth and is cut directly. Suppose VDD is 1 volt and VDD - Vth is 0.5 volt; this gives about a 75% improvement in bit line power, which is very attractive for reducing energy.

In full-swing bit line register files, the sense parts may use inverters, whose threshold voltage is one half of VDD, to pass the results. After lowering the swing on the bit lines, the threshold voltage of the sense parts must be lowered as well. Figure 3(a) shows one possible sense logic circuit and figure 3(b) its voltage transfer characteristic curve. In figure 3(b), the threshold of this circuit is about 0.25 volt; in actual implementations, the threshold voltage can be designed according to the exact specification. There are many other ways to detect the bit line voltage. For example, a sense amplifier can be obtained simply by carefully sizing the transistors in an inverter to get the proper threshold voltage. Precharged domino logic or a differential sense amplifier can also be used to detect the voltage change on the bit lines; the latter needs a lower reference voltage generated on-chip. With the sense logic in figure 3, the low-swing charging method also does not affect bit line segmentation, so the scheme brings no trade-off in timing. On the contrary, with a reasonable sensing threshold, the charging speed also improves. But too much sensitivity would worsen the register file's noise immunity. Thus, for bit line robustness, less sensitive amplifiers are recommended to compensate for the poor noise immunity brought by the low swing on the bit lines.
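The 75% figure quoted for the low-swing bit lines in this part can be checked with a short sketch. It follows the quadratic model stated in the text, in which bit line power scales with the square of the precharge level; the function name is hypothetical, and VDD = 1 volt is taken from the 90nm process described earlier.

```python
def bitline_power_saving(vdd: float, vth: float) -> float:
    """Fractional bit line power saving of an nMOS precharge to VDD - Vth
    relative to a full-swing pMOS precharge to VDD, assuming bit line power
    scales with the square of the precharge level (the model in the text)."""
    swing = vdd - vth
    return 1.0 - (swing / vdd) ** 2


# VDD = 1 V and VDD - Vth = 0.5 V, the example values used in the text.
saving = bitline_power_saving(vdd=1.0, vth=0.5)
print(f"bit line power saving: {saving:.0%}")
```

With a smaller threshold drop the saving shrinks quickly, which is why the choice of nMOS transistor type in the precharge path adjusts the result.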

Figure 3. (a) The sense logic circuit in the low-swing scheme. (b) The voltage transfer characteristic curve of this logic.

Figure 4. The leakage on the bit line through the read-out logic.

Because the off current of an nMOS transistor falls with the voltage on the bit line, the low-swing scheme cuts leakage power along with the bit lines' active power. From figure 1, when the power supply is about half of VDD, the leakage current of low Vth MOS transistors is 4.76 times that of normal nMOS transistors and 34.5 times that of high Vth MOS transistors. This result implies that the energy lost to leakage could be cut by about 34%, which is very valuable in situations where the influence of leakage is severe.

Figure 4 shows a leakage situation on the bit lines in a single-ended application. After the precharge step, the voltage on the bit line goes high (to VDD with pMOS precharge, and to VDD - Vth in the nMOS precharge scheme). If the data on the bit line is 0, the charged bit line does not need to be discharged, but the charge stored on it is continually lost through the leakage of the nMOS transistors. In figure 4 the bit line is also connected to the gates of the sense amplifiers, and there is always a leakage current through these gates. Fortunately, this gate leakage no longer needs to be considered, because in 90nm CMOS technology it is negligible compared with the leakage currents in the read-out logic. As technology advances, however, the channel leakage currents cannot be ignored. In addition, to make up for the voltage drop caused by leakage, voltage keepers are usually introduced for bit line robustness. But a keeper holds the bit line voltage at the supply voltage, which maximizes the leakage currents of the nMOS transistors. Therefore, replacing the transistors in the read-out logic path with high Vth nMOS transistors controls the leakage currents when these transistors are turned off.

From figure 1, the leakage current of high Vth nMOS transistors is below 0.3 nA at the smallest geometry. Combined with the low-swing method, the leakage currents can be reduced greatly, and as a result the voltage keeper is no longer needed. This holds in 90nm CMOS technology with smallest-size transistors when the bit line segment is less than 32 bits; below 65nm, however, leakage becomes extremely serious, and a keeper may have to be added to increase bit line stability. In other words, fine bit line segmentation is needed to alleviate leakage.
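A rough estimate suggests why a keeper can be omitted under these conditions. The sketch below computes the voltage a floating precharged bit line loses to leakage, dV = N * I_leak * t / C. The 0.3 nA bound and the 32 cells per bit line come from the text, and the hold time is taken as one cycle at the 500MHz operating frequency; the 50 fF bit line capacitance is a purely hypothetical value chosen for illustration, since the paper does not state it.

```python
def bitline_droop(
    n_cells: int,
    leak_per_cell_a: float,
    hold_time_s: float,
    bitline_cap_f: float,
) -> float:
    """Voltage lost by a floating precharged bit line: dV = N * I_leak * t / C."""
    return n_cells * leak_per_cell_a * hold_time_s / bitline_cap_f


CYCLE_S = 1 / 500e6        # one clock period at 500MHz
LEAK_A = 0.3e-9            # upper bound for a high Vth nMOS transistor (figure 1)
ASSUMED_CAP_F = 50e-15     # hypothetical bit line capacitance, not from the paper

droop_v = bitline_droop(32, LEAK_A, CYCLE_S, ASSUMED_CAP_F)
print(f"estimated droop over one cycle: {droop_v * 1e3:.3f} mV")
```

Under these assumptions the droop is well under a millivolt, far below a sense threshold of about 0.25 volt. A larger leakage per cell, as expected below 65nm, or a longer unrefreshed hold time would change this picture.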

III. IMPLEMENTATION AND SIMULATION RESULTS

To show how well the low-swing strategy saves power, an 8-read / 4-write write-through register file is designed. This multi-ported register file is organized as 32 words × 32 bits. Figure 5 depicts the timing of this register file and figure 6 gives its layout. In this register file, the bit lines are not segmented, and the 32-cell bit lines are charged or discharged simultaneously.

Figure 5. The timing waveform of this register file.

Figure 6. The layout of this register file. (Labeled blocks: read ports, read control logic, pre-charge logic, two 16-bit memory arrays, decoder block, write ports, address ports and priority encoder.)

In this register file, each port has its own gated clock; when a port's enable signal is inactive, the port is shut down to save power. The decoder is implemented with a source coupled logic circuit.
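The organization described in this section can be summarized by a small behavioral sketch. It is not the authors' circuit: it only models 32 words × 32 bits with 8 read and 4 write ports, treats a disabled port as `None`, and takes "write-through" to mean that a read in the same cycle as a write to the same address returns the new data. The class name, the `cycle` method, and the rule that the lowest-numbered write port wins an address conflict are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

WORDS, WIDTH = 32, 32
READ_PORTS, WRITE_PORTS = 8, 4
MASK = (1 << WIDTH) - 1

WriteReq = Optional[Tuple[int, int]]  # (address, data), or None if the port is disabled
ReadReq = Optional[int]               # address, or None if the port is disabled


class RegisterFile:
    """Behavioral model of an 8-read / 4-write write-through register file."""

    def __init__(self) -> None:
        self.cells = [0] * WORDS

    def cycle(
        self, writes: Sequence[WriteReq], reads: Sequence[ReadReq]
    ) -> List[Optional[int]]:
        if len(writes) != WRITE_PORTS or len(reads) != READ_PORTS:
            raise ValueError("expected 4 write requests and 8 read requests")
        # Apply writes from the highest port down so that port 0 lands last
        # and wins an address conflict (assumed priority, not from the paper).
        for req in reversed(writes):
            if req is not None:
                addr, data = req
                if not 0 <= addr < WORDS:
                    raise ValueError(f"write address out of range: {addr}")
                self.cells[addr] = data & MASK
        # Write-through: reads observe the data written in this same cycle.
        results: List[Optional[int]] = []
        for addr in reads:
            if addr is None:
                results.append(None)  # disabled port: clock gated, no access
            elif not 0 <= addr < WORDS:
                raise ValueError(f"read address out of range: {addr}")
            else:
                results.append(self.cells[addr])
        return results


rf = RegisterFile()
out = rf.cycle(
    writes=[(3, 0xDEADBEEF), None, (3, 1), None],
    reads=[3, None, 0, None, None, None, None, None],
)
print([hex(v) if v is not None else None for v in out[:3]])
```

Modeling a disabled port as `None` mirrors the per-port clock gating: a port without a valid enable performs no access in that cycle.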

The power consumption of this register file is measured with all 12 ports working and the switching activity of each port above 1/2. Table I shows the post-simulation results of this register file at 500MHz in the typical corner. The sense logic in this register file is an inverter with a designed threshold, chosen simply for comparison with the inverter sensing in the non-low-swing register file. The timing parameters of the register files are almost the same. The worst-case read access time of this register file is 1.4ns.

From table I, the low-swing scheme plays a large role in reducing power dissipation and saves 24.8% energy compared with the design that does not use low swing. When the transistors in the read-out logic of the memory core cell are replaced, the high Vth nMOS transistors consume the least energy, about a 1.7% power improvement within the low-swing scheme. These results show that bit line leakage in 90nm CMOS technology is not very severe. However, a lot of leakage takes place in the memory core cells, and it accounts for a large part of the leakage power. If the transistors in the cross-coupled inverters of the core cell are all changed to high threshold voltage transistors, the power is reduced to 10.02mW. This small additional improvement shows that leakage in the cross-coupled inverters is not significant. In fact, leakage through the write and read logic of the core cells is the main source of core cell leakage, especially in a multi-ported register file.

In read cycles, the low-swing scheme saves 33.36% power, and using high Vth nMOS transistors reduces the energy consumed by leakage by about 2%. This 2% saving is within the low-swing scheme and is typically above 2% compared with non-low-swing applications. In this multi-ported register file, the source coupled logic decoder block also dissipates a large share of the power because of frequent charging and discharging. If the decoder is changed to the NAND structure, the power saving will be even more prominent.
TABLE I. THE POWER SUMMARY.

                Non low-swing   Low-swing
                                Bit lines                     Core
Power / mW      N10             Nlvt10    N10       Nhvt10    Nhvt10
Write           3.65            3.63      3.59      3.57      3.55
Read            9.71            6.59      6.57      6.47      6.47
Total           13.36           10.22     10.16     10.04     10.02
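The percentages quoted in this section can be recomputed from the Table I figures with a short sketch. The dictionary keys and the function name are hypothetical labels. Which pair of columns the paper uses for its 1.7% and 2% figures is not stated; the sketch assumes the low Vth and high Vth low-swing columns, which give values close to those quoted.

```python
# Power figures in mW transcribed from Table I (500MHz, typical corner).
TABLE_I = {
    "non_low_swing_N10":     {"write": 3.65, "read": 9.71, "total": 13.36},
    "low_swing_Nlvt10":      {"write": 3.63, "read": 6.59, "total": 10.22},
    "low_swing_N10":         {"write": 3.59, "read": 6.57, "total": 10.16},
    "low_swing_Nhvt10":      {"write": 3.57, "read": 6.47, "total": 10.04},
    "low_swing_core_Nhvt10": {"write": 3.55, "read": 6.47, "total": 10.02},
}


def saving_percent(baseline: float, improved: float) -> float:
    """Relative saving of `improved` over `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


base = TABLE_I["non_low_swing_N10"]
hvt = TABLE_I["low_swing_Nhvt10"]
lvt = TABLE_I["low_swing_Nlvt10"]

total_saving = saving_percent(base["total"], hvt["total"])  # quoted as 24.8%
read_saving = saving_percent(base["read"], hvt["read"])     # quoted as 33.36%
# Assumed baseline for the high Vth gain: the low Vth low-swing column.
hvt_gain_total = saving_percent(lvt["total"], hvt["total"])  # quoted as about 1.7%
hvt_gain_read = saving_percent(lvt["read"], hvt["read"])     # quoted as about 2%

print(f"total saving:            {total_saving:.2f}%")
print(f"read saving:             {read_saving:.2f}%")
print(f"high Vth gain (total):   {hvt_gain_total:.2f}%")
print(f"high Vth gain (read):    {hvt_gain_read:.2f}%")
```

In every column the write and read figures sum to the total, which supports reading the table as five configurations with three rows each.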

IV. CONCLUSION

This paper proposed a low-swing strategy for the bit line charging step. Based on a study of the leakage of nMOS transistors with different threshold voltages in a 90nm logic CMOS process, a multi-ported register file that uses the low-swing scheme to counter bit line leakage is implemented. It is an 8-read / 4-write write-through register file organized as 32 words × 32 bits without bit line segmentation. The post-simulation results show that the low-swing strategy reduces the total power consumption by 24.8% and the power in read operations by 33.36%. The energy saving in leakage is about 1.7% in the low-swing scheme in the active state, and in non-active mode the leakage current can be reduced greatly by the high Vth transistors.

REFERENCES

[1] Wei Hwang, Rajiv V. Joshi, Walter H. Henkels, "A 500-MHz, 32-Word × 64-Bit, Eight-Port Self-Resetting CMOS Register File," IEEE J. Solid-State Circuits, 1999, 34:56. A. N. Netravali and B. G. Haskell, Digital Pictures, 2nd ed., Plenum Press: New York, 1995, pp. 613-651.
[2] R. Krishnamurthy, et al., "A 130-nm 6-GHz 256x32b leakage-tolerant register file," IEEE Journal of Solid-State Circuits, vol. 37, pp. 624-632, May 2002.
[3] V. De and S. Borkar, "Technology and Design Challenges for Low Power and High Performance," in Proceedings International Symposium Low Power Electronics Design, Aug. 1999, pp. 163-168.
[4] Shengqi Yang, "Low-leakage robust SRAM cell design for sub-100nm technologies," Proceedings of the ASP-DAC 2005, Asia and South Pacific, vol. 1, pp. 539-544, 2005.
[5] S. K. Jain, P. Agarwal, "A low leakage and SNM free SRAM cell design in deep sub micron CMOS technology," VLSI Design, 2006.
[6] H. Kawaguchi, et al., "Dynamic leakage cut-off scheme for low-voltage SRAMs," in Symposium VLSI Circuits Digest Technical Papers, June 1998, pp. 140-141.
[7] T. Kuroda, et al., "A 0.9V 150MHz 10mW 4mm2 2-D discrete cosine transform core processor with variable threshold voltage scheme," IEEE Journal Solid-State Circuits, vol. 31, pp. 1770-1779, Nov. 1996.
[8] K. Agawa, et al., "A bitline leakage compensation scheme for low-voltage SRAMs," IEEE Journal Solid-State Circuits, vol. 36, pp. 726-734, May 2001.
[9] D. Deleganes, et al., "Low-voltage swing logic circuits for a Pentium 4 processor integer core," IEEE Journal Solid-State Circuits, vol. 40, pp. 36-43, Jan. 2005.
[10] D. Liu and C. Svensson, "Trading speed for low power by choice of supply and threshold voltage," IEEE Journal Solid-State Circuits, vol. 28, pp. 10-17, January 1993.
[11] R. Gonzalez, et al., "Supply and threshold voltage scaling for low power CMOS," IEEE Journal Solid-State Circuits, vol. 32, pp. 1210-1216, Aug. 1997.
