710 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 64, NO. 6, JUNE 2017

Energy-Efficient Adaptive Match-Line Controller for Large-Scale Associative Storage
Sandeep Mishra, Member, IEEE, and Anup Dandapat, Senior Member, IEEE

Abstract—Ternary content-addressable memory (TCAM) is a hardware search engine that speeds up searching by comparing against prestored contents rather than addresses. A supplementary don't-care (X) state makes TCAM suitable for many network applications, but TCAM requires a large design area and consumes high power. This brief reports a state-of-the-art TCAM architecture that reduces the search (compare) energy dissipation through an adaptive match-line (ML) controller, which blocks the evaluation in the subsequent cells when a mismatch or invalid ML data originates in the evaluated cell. The proposed 128 bit × 32 bit low-energy adaptive associative memory has been implemented using a predictive 45-nm CMOS process and simulated in Spectre at a supply voltage of 1 V. The macro area of the proposed architecture is reduced by 48% compared with the traditional TCAM design, at the cost of a modest 13.2% increase in the energy-delay product.
Index Terms—Content-addressable memory (CAM), low-power
design, search engines, ternary CAM (TCAM).

I. INTRODUCTION

TERNARY content-addressable memory (TCAM) provides an accelerated data search medium by comparing the search value with the prestored contents in a single clock cycle. A TCAM performs three primary operations, i.e., WRITE, COMPARE, and READ. During the search operation, the input is prefetched to the match index, and a simultaneous comparison is carried out with the previously loaded information. This makes TCAM an efficient search engine, suitable for asynchronous transfer mode switching and fast network routing lookup. Beyond plain data matching, these applications search for associated pieces of data, using the TCAM as an associative memory. Besides enabling faster searching, however, the large number of storage cells occupies a substantial design area and makes TCAM power hungry [1]–[4].

Fig. 1. DBAM array structure (SL and its complement: search lines. BL and its complement: data lines. P: mask input).

Manuscript received June 14, 2016; accepted July 24, 2016. Date of publication July 28, 2016; date of current version May 26, 2017. This work was supported in part by the Ministry of Human Resource Development, Government of India, under Grant 25-2/2010-TS.II. This brief was recommended by Associate Editor M. Alioto.
The authors are with the Department of Electronics and Communication Engineering, National Institute of Technology Meghalaya, Shillong 793 003, India (e-mail: ssandeep.mmishra@nitm.ac.in; anup.dandapat@nitm.ac.in).
Color versions of one or more of the figures in this brief are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSII.2016.2595598
1549-7747 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Fig. 2. Low-level architecture of the LAAM [N(E): test node signal of the erstwhile CAM cell].

TABLE I
DECISION TABLE OF THE PROPOSED AMC

Simultaneous search of the entire TCAM bank leads to high search-line (SL) switching activity and concurrent match-line (ML) evaluation. Numerous attempts have been made to reduce power consumption based on these considerations, i.e., low SL switching and low ML current [5]–[14]. Primarily, the following power-saving architecture types are followed: 1) performance-aware design (based on the TCAM storage pattern, matching probability, and area utilization); 2) precomputation-based design (mismatch filtering and ML evaluation blocking); 3) segmented design (ML segmentation, hybrid ML, and dynamic voltage supply); and 4) selective ML precharging and cell gating (selective ML evaluation and self-gating). Among these, all nonsegmented architectures face the challenge of high leakage power consumption. A segmented architecture based on charge sharing has been presented in [11], where the MLs are divided into several segments. The initial segment is precharged, and the remaining segments are charged through charge sharing. An electrical separation has been created such that a mismatch in one segment does not drain the ML charge in the others. Ruan et al. partitioned the input bit stream into several groups and derived the output using a block XOR approach [12]. These designs reduce leakage but do not reduce the cell area. High-density implementations have been achieved by unique storage approaches in [15]–[18], yet [15], [17], and [18] do not serve the purpose of a TCAM: a TCAM finds its application in networking and compression, where the masking trait plays a pivotal role. The designs presented in [15] and [16] provide a way of ML error detection but do not prevent unnecessary comparisons. Minimizing the switching activity and reducing the search (comparison) power consumption therefore greatly alleviates TCAM power dissipation.
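The cost these schemes attack can be made concrete with a small behavioral model. The sketch below is our own illustration, not part of the proposed design. It assumes that each stored word is a (value, care-mask) pair in which a 0 mask bit is the don't-care (X) state, and it counts one comparison per cell, because a conventional TCAM evaluates every cell of every ML on each search. The names `tcam_search` and `Entry` are hypothetical.

```python
from typing import List, Tuple

# A stored TCAM entry: (value, care_mask). A 0 bit in care_mask is a
# don't-care (X) position that matches either search bit.
Entry = Tuple[int, int]


def tcam_search(entries: List[Entry], key: int, width: int) -> Tuple[List[int], int]:
    """Return (indices of matching entries, number of cell comparisons).

    Every bit of every entry is compared, mirroring a conventional TCAM in
    which all match lines are evaluated in parallel on each search.
    """
    matches: List[int] = []
    comparisons = 0
    for index, (value, care) in enumerate(entries):
        comparisons += width  # all cells on this match line are evaluated
        # A bit mismatches only where it is cared about and differs from the key.
        if ((key ^ value) & care) == 0:
            matches.append(index)
    return matches, comparisons


if __name__ == "__main__":
    table = [
        (0b1010, 0b1111),  # exact entry 1010
        (0b1000, 0b1100),  # 10XX
        (0b0000, 0b0000),  # XXXX: matches every key
    ]
    print(tcam_search(table, 0b1011, width=4))  # ([1, 2], 12)
```

The comparison count grows as words × bits regardless of how early a word is known to mismatch, which is the redundancy that selective ML evaluation targets.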
This brief presents a novel TCAM design that adopts a twofold (dual-bit) CAM architecture, with the checking performed by a single adaptive ML controller (AMC). The primary objective of the proposed design is to block redundant comparisons through the AMC. It falls under the type-4 power reduction scheme (selective ML evaluation), because a mismatch or invalid ML data blocks the evaluation in the subsequent cells. The rest of this brief is structured as follows. Section II describes the low-energy adaptive associative memory (LAAM) and introduces the AMC and the LAAM bank. Section III presents the design analysis and performance comparison, and Section IV concludes the brief.
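The blocking idea behind the AMC can be illustrated in software. This is a hedged analogue of selective ML evaluation, not the circuit itself: the hypothetical function `search_with_blocking` walks each ML cell by cell, MSB first, and stops comparing once a cared-about bit mismatches, so fewer comparisons are performed than the width × words of an unblocked search. The bit order and the one-comparison-per-cell cost are assumptions made for illustration.

```python
from typing import List, Tuple

# Stored entry: (value, care_mask); a 0 in care_mask marks a don't-care (X) bit.
Entry = Tuple[int, int]


def search_with_blocking(entries: List[Entry], key: int,
                         width: int) -> Tuple[List[int], int]:
    """Return (matching indices, cell comparisons actually performed).

    Each match line is walked cell by cell from the MSB; after the first
    mismatching cell, the remaining cells on that line are not compared.
    """
    matches: List[int] = []
    comparisons = 0
    for index, (value, care) in enumerate(entries):
        matched = True
        for bit in range(width - 1, -1, -1):
            comparisons += 1
            cared = (care >> bit) & 1
            differs = ((key ^ value) >> bit) & 1
            if cared and differs:
                matched = False
                break  # block evaluation of the subsequent cells
        if matched:
            matches.append(index)
    return matches, comparisons


if __name__ == "__main__":
    table = [(0b1010, 0b1111), (0b1000, 0b1100), (0b0000, 0b0000)]
    hits, used = search_with_blocking(table, 0b0111, width=4)
    print(hits, used, "of", 4 * len(table))  # [2] 6 of 12
```

With the key 0111, only the all-X entry matches and 6 of the 12 cell comparisons are performed; a key that mismatches late in each word would save less, so the benefit depends on where mismatches originate.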

II. LOW-ENERGY ADAPTIVE ASSOCIATIVE MEMORY


The proposed design begins with the dual-bit associative memory (DBAM), as depicted in Fig. 1. It features a twofold CAM architecture arranged in an up–down manner. A coupled WRITE/MASK/SEARCH control word line (WL) is applied to both (upper and lower) single-bit CAMs, as well as to the mask storage. Complementary data ("1" and "0") are applied through the data lines (Data and its complement), and WRITE or SEARCH is carried out according to the WL status. A common decoupled SL pair (Search and its complement) is provided to the pair of CAMs (CAM1 and CAM2), such that, in the absence of masking, either of the stored values can match the search key. The design is active high: the output "U" becomes "1" for a match and is otherwise set to "0," which suits NAND-type ML sensing. A traditional eight-transistor prefixing circuit (SRAM) is used for mask storage, with input "P."

The mask inputs (P1 and P2) can be masked together or individually. Setting the output "M" to "1" results in a wild match (local masking) for all search keys. For global masking, both SLs (Search and its complement) are set to "1"; the same effect can be obtained by locally masking both CAM cells together. When the compare operation is carried out, data are prefetched to the match index, followed by a parallel comparison with the contents stored in all the words. Most of the energy dissipation occurs during this comparison of the searched value with all prestored data. The primary motive is therefore to block unnecessary comparisons when the MLs of the erstwhile cells or the comparator outputs of the present cell carry invalid data. To this end, we have introduced the AMC prior to the comparator of the subsequent cell.

Fig. 3. Critical ML path of the AMC.

Fig. 4. LAAM bank and extracted layout with macro performance.

A. AMC

The AMC presented in Fig. 2 considers the comparison outputs of the present cells (Cmp1 and Cmp2), as well as the ML outputs of the erstwhile cells (E1 and E2), to provide a correct ML value in the present cell. In the case of masking, the valid bit (VB) is set to "0," thereby making the AMC dependent only upon the previous ML values (E1 and E2). In the no-masking state, both the Cmp and E values are taken into consideration. A property of the NAND-based ML scheme is that the ML discharges for a match result. When a mismatch occurs, the ML remains charged at VDD, thereby reducing the transient energy dissipation.
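To make the interplay of VB, E1/E2, Cmp1/Cmp2, and the test node N easier to follow, the following is a minimal behavioral sketch of one possible reading of the AMC. It is not a transcription of Table I, whose entries are not reproduced here. The assumptions are ours: an ML value of 1 means mismatch (ML held at VDD) and 0 means match, following the NAND-type convention above; comparator outputs are active high; CAM1 and CAM2 feed ML1 and ML2, respectively; a mismatch already on an ML is carried forward; a single local mask per cell is used; and VB is never invalid during the row search, because the VB generator is not described at logic level. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AmcOutput:
    ml1: int        # 1 = mismatch (ML held at VDD), 0 = match (ML discharged)
    ml2: int
    compared: bool  # True if the present cell's comparators were evaluated


def amc_step(e1: int, e2: int, cmp1: int, cmp2: int,
             masked: bool, vb_invalid: bool) -> AmcOutput:
    """Hypothetical behavioral reading of the AMC decision function.

    e1, e2     -- ML values from the erstwhile cells (1 = mismatch)
    cmp1, cmp2 -- present-cell comparator outputs (1 = match, active high)
    masked     -- present cell locally masked (wild match)
    vb_invalid -- VB carries the invalid value "1"
    """
    if vb_invalid:
        # Invalid VB: both MLs signal mismatch irrespective of evaluation.
        return AmcOutput(1, 1, False)
    if e1 == 1 and e2 == 1:
        # Both erstwhile cells mismatched: N discharges, comparison blocked.
        return AmcOutput(1, 1, False)
    if masked:
        # Masking: the result depends only on the previous ML values.
        return AmcOutput(e1, e2, False)
    # N high: each ML follows its comparator; an earlier mismatch on the
    # same ML is assumed to be carried forward (logical OR).
    return AmcOutput(e1 | (1 - cmp1), e2 | (1 - cmp2), True)


def search_row(cells: List[Tuple[int, int, int]],
               key_bits: List[int]) -> Tuple[int, int, int]:
    """Chain amc_step along a row of DBAM cells.

    cells    -- (d1, d2, mask) per position: CAM1 bit, CAM2 bit, local mask
    key_bits -- search key, one bit per position
    Returns (ML1, ML2, number of cells whose comparators were evaluated).
    """
    ml1, ml2 = 0, 0  # assumed "match so far" before the first cell
    evaluated = 0
    for (d1, d2, mask), k in zip(cells, key_bits):
        # Computed unconditionally here; amc_step decides whether the
        # hardware comparator would actually have been exercised.
        cmp1, cmp2 = int(d1 == k), int(d2 == k)
        out = amc_step(ml1, ml2, cmp1, cmp2, masked=bool(mask), vb_invalid=False)
        ml1, ml2 = out.ml1, out.ml2
        evaluated += int(out.compared)
    return ml1, ml2, evaluated


if __name__ == "__main__":
    row = [(0, 0, 0), (1, 1, 0), (1, 1, 0)]
    print(search_row(row, [1, 1, 1]))  # (1, 1, 1): blocked after cell 0
```

In the example row, both MLs mismatch at the first cell, so the remaining two cells are blocked and only one cell's comparators are evaluated; this mirrors, under the stated assumptions, the role of N(E) described for the AMC.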
Based on the decision function presented in Table I, when VB carries the invalid value "1," the MLs carry the mismatch signal (VDD) irrespective of the evaluation result. A high charge on the test node (N) makes ML1 and ML2 dependent upon the evaluation results Cmp1 and Cmp2, respectively. A mismatch result in both erstwhile CAM cells (E1 = E2 = 1) discharges the N voltage. In this state, comparison in the subsequent cells is blocked by the signal N(E) to avoid the undesired energy consumption of the comparison.

The critical ML path is determined by the VB generator of the present cell and the test node of the erstwhile cells. The transistors in this critical path of the AMC have been sized for optimized energy and speed performance. The modeled AMC, depicted in Fig. 3, uses low-threshold transistors on the critical path and standard-threshold transistors elsewhere. Except for the AMC, all other modules were left unsized to ease reconstruction of the design.

B. LAAM Bank

The LAAM bank is presented in Fig. 4 with support circuitry for WRITE/SEARCH control. One of the major constraints in the AMC is the VB decision time. The results of the previous stages are used for ML control, which effectively turns the ML control of the present cell into a virtual pipeline. The AMC has therefore been sized for minimum controller delay. Another issue is the intercell routing space: each AMC has many I/O ports, which take up almost 40% of the layout area. However, most of these are internal connections; thus, the area taken by the AMC is utilized to minimize the routing space.

When a match occurs in all the CAM cells in a row, the ML is discharged to 0. In the presented design, however, the MLs are not precharged; rather, they are controlled by the WL. We have used an improved current-race ML sensing scheme as presented in [16]. Instead of enabling the ML sources through IBIAS, the nets ML1 and ML2 are discharged to GND during WRITE as well as during the precharge phase.

Fig. 5. Measured waveforms of the 128 × 32 LAAM.

III. RESULTS AND ANALYSIS

The design presented in this brief has been implemented using the predictive generic process design kit 45-nm CMOS process. The architecture depicted in Fig. 4 has been simulated using Spectre for supply voltages scaled from 1.2 to 0.5 V and temperatures ranging from −20 °C to 100 °C. The waveforms of the extracted layout netlist are shown in Fig. 5. To assess the proposed architecture, it has been compared with the swapped-XOR TCAM [7] and N-CAM [10] for various TCAM macros under the same conditions.

A. Performance Comparison at Various TCAM Macros

Due to the dual-bit structure, the proposed architecture performs better for large TCAM sizes, because the number of comparisons is reduced, as shown in Fig. 6. The ML delay variation, ranging from 123.7% for the 128-bit macro to 81.2% for the 32-bit 2-Kb macro, yields a better energy-delay product (EDP) metric at larger TCAM sizes. As the macro size increases, the proposed design clearly outperforms the referred TCAMs in speed, as presented in Fig. 6(b), which supports the cascading capability of the proposed design for forming large TCAM structures.

Fig. 6. Area, ML delay, and energy dissipation comparison of the proposed LAAM with referred designs at various TCAM macros at 27 °C and a VDD of 1 V.

B. LAAM Sensitivity to Process Corner Variation

The proposed LAAM has been analyzed at various process corners (FF, FS, SF, SS, and TT), as shown in Fig. 7. The architecture performs best at the fast corner (FF). A modest 17.3% variation in the ML delay of the presented design demonstrates the robustness of the TCAM across process corners. The SL comparison control in the swapped-XOR TCAM helps maintain reliability at all corners, whereas the N-CAM is vulnerable to these variations. The proposed design works at a low supply voltage of 0.6 V with an EDP of 2203.2 fJ × ps at the typical corner (TT).

Fig. 7. ML delay and energy performance at various process corners (FF: Fast corner. FS: Fast nMOS, slow pMOS. SF: Slow nMOS, fast pMOS. SS: Slow corner. TT: Typical corner).

C. Low-Voltage Analysis

During the frequency assessment, the ML settling time directly limits how frequently search operations can be performed, so the settled ML voltage has been used as the assessment criterion. Table II presents the ML voltage analysis for a match case (discharge) over an operating frequency range of 50–500 MHz. Results show that the design functions well at supply voltages down to 0.8 V at all frequencies and works even at a low supply voltage of 0.6 V at 50 MHz. The maximum operating frequency of the proposed LAAM (not presented in Table II) is 625 MHz (search duration = 0.8 ns). Fig. 8(a) and (b) depicts the peak current and the ML delay comparison of the proposed LAAM with the referred TCAM architectures at various supply voltages for a 2-Kb macro. Among the compared designs, the proposed design is the fastest at all supply voltages below 1.2 V.

TABLE II
FREQUENCY ASSESSMENT OF THE PROPOSED LAAM CONSIDERING THE ML LOW-LOGIC VOLTAGE (VOLTAGES ARE MEASURED IN VOLTS)

D. Performance Comparison Summary

Performance comparison with recently proposed designs is summarized in Table III. The binary CAMs (BCAMs) implemented in [1], [7], and [14] provide a better EDP metric, but the masking feature of TCAM is essential in many applications, as discussed in Section I. The smallest macro area with a comparable delay metric makes the proposed design a strong candidate for implementing modern memories.

IV. CONCLUSION

In this brief, a state-of-the-art TCAM architecture that reduces energy consumption by using an adaptive ML controller has been presented. The proposed LAAM avoids undesired comparisons for invalid ML data based on the decision function. The presented results show that the proposed LAAM functions at a supply voltage as low as 0.6 V and can reduce the macro area by 48% with a comparable ML delay metric, making it suitable for high-density design requirements.

Fig. 8. Peak current, ML delay, and energy dissipation comparison at various supply voltages.

TABLE III
FEATURE SUMMARY AND COMPARISON

REFERENCES
[1] N. Onizawa, S. Matsunaga, V. C. Gaudet, and T. Hanyu, "High-throughput low-energy content-addressable memory based on self-timed overlapped search mechanism," in Proc. IEEE 18th Int. Symp. ASYNC, 2012, pp. 41–48.
[2] B. Wang, T. Q. Nguyen, A. T. Do, J. Zhou, M. Je, and T. T. H. Kim, "Design of an ultra-low voltage 9T SRAM with equalized bitline leakage and CAM-assisted energy efficiency improvement," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 62, no. 2, pp. 441–448, Feb. 2015.
[3] Y.-J. Chang, "A high-performance and energy-efficient TCAM design for IP-address lookup," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 6, pp. 479–483, Jun. 2009.
[4] K. Pagiamtzis and A. Sheikholeslami, "Content-addressable memory (CAM) circuits and architectures: A tutorial and survey," IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 712–727, Mar. 2006.
[5] S. H. Yang, Y. J. Huang, and J. F. Li, "A low-power ternary content addressable memory with pai-sigma matchlines," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 10, pp. 1909–1913, Oct. 2012.
[6] C. Wang, C. Hsu, C. Huang, and J. Wu, "A self-disabled sensing technique for content-addressable memories," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 1, pp. 31–35, Jan. 2010.
[7] A. Agarwal et al., "A 128 × 128 b high-speed wide-and ML content addressable memory in 32nm CMOS," in Proc. ESSCIRC, 2011, pp. 83–86.
[8] J. W. Zhang, Y. Z. Ye, and B. D. Liu, "A current-recycling technique for shadow-ML sensing in content-addressable memories," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 6, pp. 677–682, Jun. 2008.
[9] Y. J. Chang, "Using the dynamic power source technique to reduce TCAM leakage power," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 11, pp. 888–892, Nov. 2010.
[10] P. T. Huang et al., "0.339 fJ/bit/search energy-efficient TCAM macro design in 40nm LP CMOS," in Proc. IEEE A-SSCC, 2014, pp. 129–132.
[11] S. Baeg, "Low-power ternary content-addressable memory design using a segmented match line," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 6, pp. 1485–1494, Jul. 2008.
[12] S. J. Ruan, C. Y. Wu, and J. Y. Hsieh, "Low power design of precomputation-based content-addressable memory," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 3, pp. 331–335, Mar. 2008.
[13] I. Hayashi et al., "A 250-MHz 18-Mb full ternary CAM with low-voltage matchline sensing scheme in 65-nm CMOS," IEEE J. Solid-State Circuits, vol. 48, no. 11, pp. 2671–2680, Nov. 2013.
[14] A. T. Do, S. Chen, Z. H. Kong, and K. S. Yeo, "A high speed low power CAM with a parity bit and power-gated ML sensing," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 1, pp. 151–156, Jul. 2013.
[15] D. Kayal, A. Dandapat, and C. Sarkar, "Design of a high performance memory using a novel architecture of double bit CAM and SRAM," Int. J. Electron., vol. 99, no. 12, pp. 1691–1702, Jun. 2012.
[16] S. Mishra and A. Dandapat, "EMDBAM: A low-power dual bit associative memory with match error and mask control," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 6, pp. 2142–2151, Jun. 2016.
[17] M. Chae, J. W. Lee, and S. H. Hong, "Decoupled 4T dynamic CAM suitable for high density storage," Electron. Lett., vol. 47, no. 7, pp. 434–436, Mar. 2011.
[18] V. Vinogradov, J. Ha, C. Lee, A. Molnar, and S. H. Hong, "Dynamic ternary CAM for hardware search engine," Electron. Lett., vol. 50, no. 4, pp. 256–258, Feb. 2014.
