
2012 24th International Conference on Microelectronics (ICM)

Comparative Study of Current Mode and Voltage Mode Sense Amplifier used for 28nm SRAM
Baker Mohammad (1,2), Percy Dadabhoy (2), Ken Lin (2), Paul Bassett (2)
(1) Khalifa University of Science, Technology and Research, Abu Dhabi, UAE
(2) Qualcomm Incorporated, Austin, TX, USA

Abstract—Increased process variation and reduced operating voltage are two of the main challenges in using sense amplifiers in small-geometry bulk CMOS process technology. This, coupled with the need to increase on-chip memory to reduce bus traffic and increase performance, creates the need for a robust and reliable sense amplifier, which is a differentiating factor in memory area, power, and speed. We present a detailed study of the Voltage Latched Sense Amplifier (VLSA) and the Current Latched Sense Amplifier (CLSA) designs in a 28nm industry-standard process technology [1][2]. We present results on how the two sense amplifier topologies behave in two process variants: a low power (LP) process optimized for mobile, low-leakage applications, and a high performance (HP) process. Detailed Spice simulations with statistical models and Monte Carlo analysis are used to compare the two designs for active power, leakage power, speed, and area. Our study shows that the VLSA performs better than the CLSA in the LP technology, being 67% faster and 35% smaller in area, with similar active power. The VLSA also performed better than the CLSA in the HP technology. In the LP technology the sensitivity to temperature was greater at high voltage, whereas in the HP process it was greater at low voltage.

Index Terms—Cache memory, sense amplifier, small signal array, SRAM array

I. INTRODUCTION

EMBEDDED memory forms an integral part of the design of today’s processors and SoCs. The main purpose of embedded memory in general, and cache in particular, is to improve instruction and data access time. Larger caches are favored since they can potentially reduce the number of time-expensive external memory accesses. The propagation delay of each cache datapath element contributes directly to the total access time. A reduction in the propagation delay of the datapath can aid in building larger caches, as large caches may have several levels of multiplexing.

One of the elements of the datapath in an SRAM design is the sense amplifier. The sense amplifier, activated during a read operation, senses the voltage differential on the bitlines at its input and generates a full-rail voltage swing at its output (Figure 1). The reduced swing on the memory bitlines improves performance and also reduces power consumption [3].

Figure 1. Basic schematic of one column for small signal memory array using 6T cell

In this paper we explore the design robustness of sense amplifiers for low-voltage operation by comparing two sense amplifier designs in two variants of 28nm process technology: one optimized for mobile applications (LP) and the other targeted at high-performance circuits (HPM) [1][2]. We compare the two designs on factors such as power, delay, sensitivity, and area. The two sense amplifier topologies compared are the current latched sense amplifier (CLSA) and the voltage latched sense amplifier (VLSA). The methods used to measure the different comparison metrics are explained, along with a presentation of the results.

The paper is organized as follows: Section II presents an overview of the design principles of a small signal array; Section III briefly explains the topology and operation of the sense amplifiers; Section IV explains the measurement of the different comparison metrics; and Section V concludes the paper.

Manuscript received February 14, 2012. This work was supported by Qualcomm Inc.
Baker Mohammad is Assistant Professor in ECE at Khalifa University of Science, Technology, and Research, Abu Dhabi, UAE (e-mail: baker.mohammad@kustar.ac.ae)

978-1-4673-5292-5/12/$31.00 ©2012 IEEE

II. BASIC DESIGN PRINCIPLE OF SMALL SIGNAL ARRAY

Six-transistor (6T) cell based memory is widely used for embedded memory due to its small area [3] and relatively fast access time. One important design parameter for memory, especially when using 6T cells, is area utilization, which compares the actual memory cell area to the total area of the memory. The total area includes the memory cell area, all input, output, and wordline drivers, and any other control or multiplexing circuitry. Consider a 32KB memory design where a single 6T memory cell occupies 0.2um2; the total cell area is then the number of bits multiplied by the cell area. If the complete 32KB design occupies 150000 um2, then the utilization is 33%. Typical values for level 1 (L1) caches are 25-40%, and for L2 caches 60-70%. The reason for this difference is that L1 is built for performance with smaller capacity, while for L2 the emphasis is on density and less on speed. Using a sense amplifier makes it possible to have many cells in the same column (Figure 1) and hence to share the pre-charge, write driver, and sense amplifier circuits, which reduces the overhead. With a sense amplifier, the bitline does not need a full voltage swing to transfer the data value from the cell to the output, but only a small differential (0.2*Vdd), which provides low power during read access.

Figure 2. Timing relationship between main control signals of a 6T memory array (signals shown: Addr, enable, clk, WL, Trigger, Vbl, BL & BLB, Ta, SA_en, dout)

The main timing signals and their relative trigger times are shown in Figure 2. The access is normally triggered from a clock edge, which de-asserts the pre-charge (off) and asserts the WL signal (on) to access the cell. The self-timed Sa_en signal is asserted through tracking circuitry after enough voltage difference has developed between BL and BLB. The Sa_en signal enables the sense amplifier to sense the voltage difference, which depends on the data stored in the 6T cell. The sense amplifier also stores the data value in a latch to make it available for downstream logic. The time from WL to Sa_en (Ta) is the array access time and is programmable through a tracking circuit. This programmability enables a tradeoff between timing, power, and yield.
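The area-utilization example in this section can be checked with a few lines of arithmetic. The Python sketch below uses the cell area and total design area quoted in the text; it assumes one 6T cell per bit and 1024 bytes per KB, neither of which the text states, and the function name is purely illustrative. Under those assumptions the result comes out at about 35%, close to the roughly one-third figure quoted.

```python
# Worked check of the area-utilization example.
# Assumptions (not stated in the text): 1 KB = 1024 bytes, one 6T cell per bit.
KB = 1024
capacity_bits = 32 * KB * 8      # 32KB array expressed in bits
cell_area_um2 = 0.2              # single 6T cell area quoted in the text
total_area_um2 = 150_000         # complete 32KB design area quoted in the text


def area_utilization(bits, cell_area, macro_area):
    """Return the fraction of the macro area occupied by bit cells."""
    return bits * cell_area / macro_area


cell_array_area_um2 = capacity_bits * cell_area_um2
utilization = area_utilization(capacity_bits, cell_area_um2, total_area_um2)

print(f"cell array area: {cell_array_area_um2:.0f} um^2")
print(f"utilization: {utilization:.1%}")
```

The same function can be reused to see how the overhead of drivers and control circuitry shifts the result toward the L1 or L2 ranges mentioned in the text.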

Figure 3: a) CLSA sense amplifier schematic and layout showing matching transistors b) VLSA sense amplifier schematic and layout

III. CLSA & VLSA SENSE AMPLIFIERS

The CLSA and VLSA sense amplifier topologies are shown in Figure 3a and 3b respectively. Both sense amplifiers have the ‘bit’ and ‘bitb’ inputs connected to the column bitlines of the SRAM. The sense input is actuated once the requisite voltage differential has developed on the bitlines. Each sense amplifier has cross-coupled inverters that convert the voltage differential at their inputs on the bitlines to a full swing at the outputs. The output inverters in each sense amplifier drive the downstream logic and also isolate the internal nodes of the sense amplifier from the external load.

Figure 3a shows the CLSA sense amplifier design. Since this is a current latched design, the bitlines drive the gates of transistors M9 and M10. Transistors M1, M4, M5 and M8 are the precharge transistors. Transistors M2,M6 and M3,M7 form the inverter pair that resolves the bitline differential voltage. Traditionally this topology has been used because the memory bitlines drive a high impedance (gate), so full discharge of the array bitline due to timing mismatch is not a concern.

Figure 3b shows the VLSA sense amplifier design (schematic and layout in the 28nm design rules). M2-M5 and M3-M6 form the inverters that resolve the differential voltage on the bitlines to a full swing at the output. The internal nodes of this design are precharged through the bitlines. The obvious advantage of this topology over the CLSA is the lower number of transistors needed, which means faster access and a smaller footprint. The challenge in using this topology has been the race condition on the isolation signal that decouples the sense amplifier bitlines (sol, sor) from the array bitlines (bit, bitb). If the sense amplifier is enabled while M1 and M4 are on, the memory bitline (bit or bitb) could be discharged to logic 0. In traditional designs, a signal other than sen (isolate) is used to control M1 and M4, which makes it hard to match the sense and isolate operations; in our design we use the same signal that enables the sense amplifier to isolate the array bitlines.

Operation

1) CLSA:
The operation of this sense amplifier design is based on the current differential produced by M9 and M10 in the two pull-down branches of the sense amplifier. At the commencement of the read operation, one of the ‘bit’ or ‘bitb’ inputs is lowered, depending on the data stored in the cell. When the sense amplifier is triggered by the lowering of the input ‘saenb’, M11 turns on and the precharge transistors are simultaneously turned off. Since the gate voltages of M9 and M10 differ by the generated bitline differential, their channel currents are unequal. The currents flowing in the two branches of the sense amplifier are therefore unequal, and the voltage at one of ‘out’ or ‘outb’ falls faster than at the other node. This difference in voltage is resolved by the cross-coupled inverters formed of M2,M6 and M3,M7.
2) VLSA:
This design operates directly on the voltage differential developed on its internal nodes by the input bitlines. When the wordline is turned on and prior to the triggering of the sense amplifier, transistor M7 is off and pass transistors M1 and M4 are on. As the differential develops on the bitlines, it does so too on the internal nodes of the sense amplifier, ‘sol’ and ‘sor’. When the sense signal ‘saenb’ is asserted, the cross-coupled inverters formed of M2-M5 and M3-M6 amplify this differential voltage to a full-swing output.

IV. COMPARISON METRICS

Simulations are performed on the two sense amplifier designs using a full post-layout netlist and the same design flow used to qualify production-level designs. Both designs have the same stimuli and drive the same load for comparison. The metrics chosen for comparison help to describe all aspects of operation of the sense amplifier that are useful from a design perspective. Table I shows the comparative results of these simulations.

TABLE I: LP TECHNOLOGY SENSE AMPLIFIER METRIC COMPARISON
Metrics                      VLSA/CLSA
Area                         0.65x
Bitline input capacitance    3.2x
Sense enable capacitance     1.5x

TABLE II: DELAY AND LEAKAGE COMPARISON FOR CLSA AND VLSA FOR BOTH LP AND HPM
Delay      CLSA    VLSA
LP         3.03    1
HPM        1.56    0.63

Leakage    CLSA    VLSA
LP         0.28    1
HPM        3.36    1.79

A. Speed of Operation
Spice simulation with a fully extracted post-layout netlist was performed to compare the delay of both sense amplifiers. The same input waveforms for both bitlines and sense enable were used. The speed of operation is determined by measuring the 50% delay of the internal nodes of the sense amplifier once the sense amplifier is triggered. In LP technology, the VLSA is 67% faster than the CLSA.

B. Area
Figure 3 (a) and (b) show the layouts for the CLSA and VLSA designs respectively. The transistor area for the VLSA design is 35% less than that of the CLSA design. The transistors that form the inverter pair in each sense amplifier are also shown in the figure. The NMOS transistors are highlighted in red, and the PMOS transistors are circled in black.

Figure 4: Normalized required offset voltage for LP process a) CLSA and b) VLSA sense amplifiers across process corners, voltages and temperatures (VLSA requires half the offset compared to CLSA at the same PVT corner)

Figure 5: Normalized required offset voltage for HPM process a) CLSA and b) VLSA sense amplifiers across process corners, voltages and temperatures (VLSA requires half the offset compared to CLSA at the same PVT corner)

C. Leakage
Contemporary aggressive low power designs continuously
demand increasing performance within constant or smaller power budgets. With advancing process nodes usually characterized by higher leakage, keeping leakage power to an acceptable portion of the total power budget is a perpetual challenge.

High-accuracy simulations are performed using Spice to obtain the leakage of the sense amplifiers. The simulations are run for a long duration in order to ensure settling and high-accuracy results. The leakage profile of the sense amplifier is also analyzed to identify transistor channel lengths and threshold voltage types that can be modified to improve the leakage power consumption of the design. These changes must be made with minimal impact on the operating speed of the sense amplifier. Table II lists the leakage ratio for the CLSA and VLSA and shows that the CLSA leaks less than the VLSA due to the stacking effect. As expected, the HPM process exhibits higher leakage than the LP process.

D. Dynamic Energy
The energy consumed by the sense amplifier in operation is determined by measuring the work done by the supply voltage source. In Spice, this can be accomplished by measuring the charge that is supplied by a voltage source during the operation interval of the sense amplifier. The operation interval commences once the sense amplifier is triggered and lasts until the internal node of the sense amplifier changes state to store the new data present at its input. Spice was used to measure the charge supplied by the voltage source during this operation interval, and the corresponding energy is then computed mathematically.

E. Input Capacitance
As mentioned earlier, the sense amplifier is one element of the SRAM datapath and hence its compatibility with upstream blocks is necessary. The input capacitance of the sense amplifier is measured using a Spice AC analysis as well as through a transient simulation.
1) AC analysis: In this method a small-amplitude AC source is applied to the input of the sense amplifier, and Spice is used to measure the imaginary part of the current flowing through the source. This is then used to mathematically compute the impedance of the sense amplifier inputs.
2) Transient simulation: A small-amplitude voltage differential is applied to the inputs of the sense amplifier, and the charge transferred from the source is measured. The measured magnitude of the charge transferred is used to mathematically determine the input capacitance of the sense amplifier.

F. Sensitivity
The function of the sense amplifier is to resolve the voltage differential applied at its inputs to a full swing at the output. This important characteristic of the sense amplifier determines the minimum input voltage differential between sa_BL and sa_BLB (as shown in Figure 1) required to ensure reliable and proper operation of the sense amplifier. This minimum voltage will be referred to as the offset voltage (Vdiff). During a read operation, one cell of each column in the array is turned on and it discharges a bitline depending on the data stored in the cell. The bitline voltage differential thus developed is applied to the input of the sense amplifier. A sense amplifier with low sensitivity will require a higher voltage differential. For large cache
designs that have long bitlines with many bitcells, this results in higher power dissipation and longer access time. The sensitivity of the sense amplifier thus plays an important role in determining the dynamic power dissipation of the SRAM design. While it is important to use a lower bitline differential to improve power characteristics, it should be sufficiently high to ensure reliable operation.

Monte Carlo simulations with full transistor Spice mismatch models are performed to determine the sensitivity in the presence of process variation. A 1000-sample Monte Carlo simulation is performed at each process corner. The industry-standard five corners (SS, SF, FS, TT, FF) are used. The two letters indicate the speed of the NMOS and PMOS devices respectively; for example, SS refers to slow NMOS and slow PMOS, while SF indicates slow NMOS and fast PMOS. The number of functional failures for a range of bitline differential voltages (Vdiff) is observed to determine the sensitivity. We take the required minimum offset voltage (Vdiff) to be the one at which we achieve zero errors across all 1000 samples. We simulated both design topologies in the two technologies, LP and HPM, and show that the VLSA design is more robust and less sensitive to process and temperature variation.

Figures 4 and 5 show the variation in the minimum input offset voltage for sense amplifier operation for the CLSA and VLSA designs. Note that the Y-axis shows the normalized offset voltage for the CLSA and VLSA designs. We used Monte Carlo simulations with full mismatch models at each process corner to determine the input offset voltage, performed for four different combinations of operating voltage and temperature as indicated in the figures; each line in a plot corresponds to one particular combination of voltage and temperature. Simulations are done across the different process corners (SS, SF, FS, TT, and FF) [4]–[5], as also indicated in the figures. The CLSA design shows large variation in the input offset voltage across PVT corners compared to the VLSA design. Lower temperature gives the worst performance, with the highest offset voltage required in both cases, and the CLSA topology requires almost twice the offset voltage of the VLSA at the same PVT corner.

Figure 6. Increase in offset voltage for LP tech sense amp compared to HPM (a) CLSA (b) VLSA

Figure 7. Increase in offset voltage for cold vs. hot temperature (a) LP CLSA (b) LP VLSA

G. Technology advantage
Advanced technology nodes enable higher transistor density on the chip, which allows the implementation of more functions on the chip. Smaller transistors also operate faster, which enables higher operating frequencies. Simulations show that the HPM technology CLSA sense amplifier is 50% faster than the same design in LP technology. Similarly, the VLSA sense amplifier is 37% faster. Monte Carlo simulations show that the advanced process technology node also improves the minimum required offset voltage of the sense amplifier. Lower offset voltage requirements again translate into higher operating speed. Figure 6 shows the ratio by which the offset voltage of each sense amplifier topology in the LP technology is
greater than that in the HPM technology. Each plot in the
figure is for a particular combination of supply voltage and
temperature, across the five different process corners. As can
be seen, the required offset voltage for the CLSA topology is
up to 30% lower in the HPM technology than in
the LP technology. Similarly, the required offset voltage is
about 45% greater in the LP technology for the VLSA design
than for the same design in HPM technology.
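The speed figures quoted in Sections IV-A and IV-G can be cross-checked against the normalized delays in Table II. The Python sketch below assumes that "X% faster" means a fractional reduction in delay, which the text does not define explicitly; the dictionary layout and function name are illustrative only.

```python
# Normalized delays copied from Table II (VLSA in LP technology is the 1.0 reference).
delay = {
    ("LP", "CLSA"): 3.03,
    ("LP", "VLSA"): 1.00,
    ("HPM", "CLSA"): 1.56,
    ("HPM", "VLSA"): 0.63,
}


def delay_reduction(slow, fast):
    """Fractional reduction in delay when moving from `slow` to `fast`.

    Assumption: this is what the text means by "faster".
    """
    return 1.0 - fast / slow


vlsa_vs_clsa_lp = delay_reduction(delay[("LP", "CLSA")], delay[("LP", "VLSA")])
clsa_hpm_vs_lp = delay_reduction(delay[("LP", "CLSA")], delay[("HPM", "CLSA")])
vlsa_hpm_vs_lp = delay_reduction(delay[("LP", "VLSA")], delay[("HPM", "VLSA")])

print(f"VLSA vs CLSA in LP: {vlsa_vs_clsa_lp:.1%}")
print(f"CLSA, HPM vs LP:    {clsa_hpm_vs_lp:.1%}")
print(f"VLSA, HPM vs LP:    {vlsa_hpm_vs_lp:.1%}")
```

Under this reading the table gives about 67% for VLSA versus CLSA in LP and 37% for the VLSA across technologies, matching the text, and about 49% for the CLSA across technologies, which the text rounds to 50%.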

H. Temperature Sensitivity
Increased chip transistor density in advanced process technology nodes also leads to higher chip operating temperatures. It is thus pertinent to know the impact that temperature has on the required minimum offset voltage. Figure 7 shows the extent to which the offset voltage increases at cold as compared to hot temperatures for the LP technology CLSA and VLSA designs. As can be seen, at the higher voltage of 1.26 V, the required offset voltage is higher by 15%-20% for the CLSA design. In the HPM technology, this trend is reversed, with larger increases in offset voltage seen at lower operating voltages (see Fig. 8): at low voltages, the CLSA design offset voltage is more than 35% higher at cold temperatures than at hot. There is not much variation in required offset voltage with temperature for the VLSA design in HPM technology.

Figure 6: Percentage increase of required offset voltage at -30C compared to 125C (hot) at the same process and voltage (temperature sensitivity)

V. SUMMARY & CONCLUSION

This publication describes the simulation methodology used to compare the VLSA and CLSA sense amplifier designs. The simulation results clearly show the advantage of the VLSA design over the CLSA design. The faster speed of operation and lower input differential required by the VLSA design make it an ideal choice for high-speed, low-power datapath design. The traditional design complexity arising from using the VLSA has also been addressed by using one signal both to enable the sense amplifier and to isolate the array bitline from the sense amplifier bitline (as shown in Figure 3b).

REFERENCES
[1] J. Yuan et al., “Performance Elements for 28nm Gate Length Bulk Devices with Gate First High-k Metal Gate”, Solid-State and Integrated Circuit Technology, 2010, 10th IEEE International Conference on, pp. 66–69.
[2] Wu et al., “A Highly Manufacturable 28nm CMOS Low Power Platform Technology with Fully Functional 64Mb SRAM Using Dual/Tripe Gate Oxide Process”, VLSI Technology, 2009 Symposium on, pp. 210–211.
[3] N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, Addison-Wesley, 2005.
[4] International Technology Roadmap for Semiconductor (ITRS), itrs.net
[5] Baker Mohammad, Martin Saint-Laurent, Paul Bassett, and Jacob Abraham, “Cache Design for Low Power and High Yield”, IEEE International Symposium on Quality Electronic Design (ISQED), March 2008, pp. 103-107, San Jose, CA, USA.
[6] Baker Mohammad and Jacob Abraham, “A reduced Voltage Swing Circuit Using A single Supply to Enable Lower Voltage Operation for SRAM-based Memory”, Microelectronics Journal, Elsevier, December 2011.

Figure 8: Increase in offset voltage for cold vs. hot temperature for LP technology (a) CLSA design (b) VLSA design
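The zero-failure criterion used in Section IV-F to define the required offset voltage can be illustrated with a toy model. The Python sketch below is not the Spice flow used in the paper: it replaces the transistor-level mismatch simulation with a hypothetical Gaussian input-referred offset, and the sigma value, candidate voltage grid, and function names are all assumptions made for illustration. It only shows the selection rule: sweep Vdiff upward and keep the smallest value for which none of the 1000 samples fails.

```python
import random


def count_failures(offsets_mv, vdiff_mv):
    """Count samples whose offset magnitude is not overcome by the applied differential."""
    return sum(1 for off in offsets_mv if off >= vdiff_mv)


def min_vdiff_zero_fail(offsets_mv, candidates_mv):
    """Smallest candidate differential with zero failures, or None if none qualifies."""
    for vdiff in sorted(candidates_mv):
        if count_failures(offsets_mv, vdiff) == 0:
            return vdiff
    return None


rng = random.Random(0)          # fixed seed so the sketch is repeatable
SIGMA_MV = 10.0                 # hypothetical offset spread, not taken from the paper
N_SAMPLES = 1000                # sample count used in the paper's Monte Carlo runs

# Toy stand-in for one PVT corner: offset magnitude per Monte Carlo sample, in mV.
offsets = [abs(rng.gauss(0.0, SIGMA_MV)) for _ in range(N_SAMPLES)]

STEP_MV = 5
candidates = range(STEP_MV, 105, STEP_MV)   # hypothetical Vdiff sweep, 5 mV to 100 mV
required = min_vdiff_zero_fail(offsets, candidates)
print(f"required Vdiff for zero failures: {required} mV")
```

Repeating the sweep with a different offset list per process corner, voltage, and temperature would give one point per PVT combination, which is the kind of data plotted in Figures 4 and 5.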
