
Embedded Memories for SoC

Harold Pilo
IBM Systems and Technology Group
Essex Junction, Vermont, USA

February 20, 2011

© 2011 IEEE IEEE International Solid-State Circuits Conference © 2011 IEEE


Outline
• Introduction
• 6T SRAM: Industry’s Most Pervasive Embedded Memory
– Stability, Write-ability, Read-ability
– Device Variability Effects on SRAM
– Design Margining, Redundancy and Yield

• SRAM Assist and Power Reduction


• Static Memories Beyond the 6T: Multi-Port, TCAM
• eDRAM
• SOI Considerations
• Test of SoC Memories
• Summary / Q&A



Typical SoC Memory Content
Memory takes up 56% of active area
[Figure: chip floorplan, 18.22 x 17.85 (units not given in the source). Legend: 6T SRAM, 6T Register File, Dual-Port SRAM, 2-Port Register Array, eDRAM. Area callouts: 20% SRAM, 36% eDRAM.]

Memory consumes 41% of the total chip power


SoC Memory Types: Density Rationale

[Figure: eDRAM / SRAM / Register File area density at the 32nm technology node. X-axis: Instance Size (Kb), log scale from 0.1 to 100000. Y-axis: Density (Kb/mm2), 0 to 12000. Series: SRAM, eDRAM_F, eDRAM_A, RF.]


SoC Memory Types: Function/Performance

[Figure: memory types at the 32nm technology node, plotted by function (x-axis) and frequency (y-axis, grouped as 1-pipe, 2-pipe, 4-pipe). Legend: low leak, mid-range, high perf. Function categories: eDRAM (Area-Opt), eDRAM (Perform. Opt.), Reg-Array (2RW/2R2W), TCAM, Dual-Port (2RW), 1R1W (Time-Multiplexed), SRAM/RF (1RW). Macro labels recovered: SRAM2SP, SRAM1D, RA2, RA4, TCAM, SRAM2SF, SRAM2D, RF2D, RF. Density callouts range from 0.2 Mb/mm2 to 11 Mb/mm2; the mapping of each callout to a macro is not recoverable from the source.]


The SoC Memory Compiler

• SoC design requires hundreds of custom instance configurations (1000s of instances)

Grow-ability for 6T SRAM:
[Figure: compiler grow-ability range over Word Depth and Data Width. Labels recovered: 256b, 288b, 32Kb, 2b; granularity annotations: 64b, 2Kb.]


Compiler Also Used to Select Best Fit

• These are some 4K x 32 configurations built by the memory compiler
– Area variation: +/- 13%
– Power variation: +/- 19%
– Performance variation: +/- 16%
[Figure: five 4K x 32 instance layouts.]




6T-SRAM: The Industry’s Most Pervasive

• Fastest Random Cycle Time


• Architecture flexibility
– Alterations in Word / Bit dimensions and I/O width
• Design simplicity
– Compared to alternatives (ex. eDRAM)
• Compatibility to logic
• Area (small bit-density macros)
• No refresh requirements



SRAM Bitcell Area and VDD Scaling

[Figure: SRAM bitcell area (μm2, log scale from 4.0 down to 0.1) and VDD (V, 1.5 down to 0.8) versus technology node (180, 130, 90, 65, 45, 32, 22, 14 nm). Annotations: 50% Area Scaling, 30-40% Scaling, Thin Bitcell, High-K MG, and a "?" at the end of the VDD trend.]


6T SRAM Bitcell



6T-SRAM Array Basics – Read Operation



Data-Path – Read Operation
[Figure: read data path. Blocks recovered: DECODE, WORDLINE, 6T SRAM Cell, BITLINE, PRECHARGE, RESTORE, Local Domino/Sense-Amp, GLOBAL BITLINE, column select, redundancy mux (array specific), data mux, cross-couple NAND (dynamic-to-static conversion), output latch, CLK.]
[1] J. Pille, ISSCC 2010


6T-SRAM Array Basics – Write Operation



Column Interleaving / Half-Select
• Most SRAM architectures use Column Interleaving
• Only selected columns are written as a result of:
– 1-of-n column decode
– Bit-Write
• Remaining columns remain in Half-Select Mode
– Exposed to Stability Disturb
• Similar effect occurs during read operations
• Stability Disturb occurs during both Write and Read Operations
[Figure: column-interleaved array with word-lines WL0 to WLn, bit-line pairs BLT0/BLC0, BLT1/BLC1 to BLTn/BLCn, cell nodes NT/NC, and a write driver. One column is selected for write; the other columns on the same word-line are half-selected.]


Stability Disturb: Read vs. Write
Half-Select Write
• Wider Wordline
• No Additional Load beyond Bit-Line

Read Operation
• Narrower Wordline
• Additional Load beyond Bit-Line (Sense-Amp / Data-Line Load)

[Figure: WL, BLT/BLC and NC/NT waveforms comparing the stability disturb for a read operation with that for a write (half-select) operation.]


Static Noise Margin Simulation

[Figure: butterfly curve; traces labeled V(NT,NC) and V(NC,NT).]


Stability – Static Noise Margin

• Measure of how much “noise” the cell can tolerate before it loses its data
• Basic metric on how high the low-node can rise against the feedback inverter’s unity gain
• Butterfly curve illustrates DC stability margin
– Pessimistic prediction of VMIN
• Ideal for cell-to-cell comparisons, but does not take architecture into account


Stability – Architecture Effects

• Bit-Line Load: 128 bits/BL design
– Design made robust enough to meet VMIN target
– Same design at 256 bits/BL or greater fails for stability
– Going from 256 to 64 bits/BL can provide VMIN benefit of 25mV-50mV
• Clamped BL during Half-Select
– Has similar effect as very heavily loaded BL (ex. 512 bits/BL)
• Word-Line Pulse Width
– A shorter WL will reduce fail rate, but…
– WL must be long enough to meet WRITE/READ targets
[Figure: curves for 64 b/BL, 128 b/BL, 256 b/BL and 512 b/BL.]

Yield analysis for large SoC must include all variations across all memory types and configurations


Conventional Write Margin Definition
• Metric for relative DC write margin comparisons between cells, but unlike SNM, may not represent worst case design point for VMIN predictions
– Ex: Bit-Line low level may not be completely at GND level
[2] K. Takeda, ISSCC 2006




Technology Scaling and Stability Margin

• Increasing device variations eroding stability margins
• Advanced processes (e.g. HKMG) and circuit assist features required to maintain scaling trend
[Figure: normalized voltage level versus technology node (130nm, 90nm, 65nm, 45nm, 32nm). Curves: bitcell switch-point, bitcell 0-level “bump”, SNM, SNM with assist features.]


Source of Device Variations
• Chip-to-chip Variations
– Chip Mean: process line centering
– Systematic Variations: ACV, lithography-driven, PC density
• Within chip variations
– Random Dopant Fluctuations (RDF)
– Line Edge Roughness (LER)
– Aging effects: NBTI, PBTI, Hot Carriers
[Figure: chip-to-chip variations. [3] Y. Wang, ISSCC 2009]


Random Dopant Fluctuations
• Statistical variation in the number of dopant atoms resulting from small geometry transistors
• One σ Vth Variation:

$\sigma_{V_{th}} \propto T_{EOT} / \sqrt{L_g \cdot W}$

– Stability fail rate increases by 7x if Lg*W reduced by 10% [4]
• 700x for 30% reduction
• 5 orders of magnitude for 50% reduction
• Why SRAM more impacted than Logic?
– Small geometry transistors
– Bi-stable, non-broken feedback latch
– Reliance on Vth matching
– Highly repetitive (>10^6, 5σ)
– Higher base doping for increased base Vth
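The area dependence on this slide can be turned into a small worked calculation. The sketch below assumes the usual random-dopant relation in which sigma(Vth) grows as 1/sqrt(Lg*W), and it assumes a Gaussian tail with a fixed absolute margin. The 6-sigma baseline and all function names are hypothetical, so the multipliers it prints only show the trend and will not reproduce the 7x / 700x values quoted from [4].

```python
import math

def sigma_vth_scale(area_ratio):
    """Relative change in sigma(Vth) when the gate area Lg*W is scaled by area_ratio."""
    return 1.0 / math.sqrt(area_ratio)

def tail_prob(z):
    """One-sided Gaussian tail probability P(X > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def fail_rate_increase(area_ratio, baseline_sigma=6.0):
    """Fail-rate multiplier for a fixed absolute margin.

    baseline_sigma is a hypothetical starting margin, not a value from the slides.
    """
    z_new = baseline_sigma / sigma_vth_scale(area_ratio)
    return tail_prob(z_new) / tail_prob(baseline_sigma)

for reduction in (0.10, 0.30, 0.50):
    ratio = 1.0 - reduction
    print(f"{reduction:.0%} smaller Lg*W: sigma x{sigma_vth_scale(ratio):.3f}, "
          f"fail rate x{fail_rate_increase(ratio):.3g}")
```

A modest loss of gate area costs only a few percent in sigma(Vth), but because the fail rate sits far out in the Gaussian tail, that small change is amplified by orders of magnitude.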


VMIN (VCCMIN or VDDMIN): Failure Categories

• Predominant causes of yield loss at VMIN
1. Failure to write
2. Stability fails from read/half-select
3. Readability or signal margin fails
4. Retention
• VMIN fails occur when the Vth operating window exceeds the limits imposed by the random Vth variations
[5] M. Yamaoka, VLSI 2004




Bitcell Fail Probability
• SRAM integration in 32nm has reached > 32MB [6]
• To yield high density products, the bitcell fail probability (# of expected cell failures) is required to be in the 5.2σ range (0.2 fails/1Mb, i.e. 99.99998% bitcell yield within a 1Mb array)
[Figure: Array Yield % (0 to 100) versus # Cells per Array (10 to 1000000, log scale), one curve per cell fail probability from 1% down to 0.000001%. [7] R. Joshi, ESSDERC 2006]
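The numbers on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes a Gaussian distribution with a two-sided tail (an assumption that happens to land close to the quoted 0.2 fails/1Mb) and assumes independent cell failures with no redundancy. The function names are illustrative only.

```python
import math

def two_sided_tail(n_sigma):
    """Probability that a Gaussian variable falls outside +/- n_sigma."""
    return math.erfc(n_sigma / math.sqrt(2.0))

def array_yield(p_cell_fail, n_cells):
    """Yield of an array with no repair, assuming independent cell failures."""
    return (1.0 - p_cell_fail) ** n_cells

cells_per_mb = 2 ** 20
p_fail = two_sided_tail(5.2)

print(f"cell fail probability at 5.2 sigma: {p_fail:.3e}")
print(f"expected fails per 1Mb:            {p_fail * cells_per_mb:.3f}")
print(f"1Mb array yield without repair:    {array_yield(p_fail, cells_per_mb):.1%}")
```

The same `array_yield` function traces out the family of curves in the figure when it is evaluated over a range of array sizes for each cell fail probability.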


How Much Variation to Yield SRAM?

[Figure: process window plotted as PFET Strength versus NFET Strength. Corners: Fast/Fast (FF), Slow/Fast (SF), Typ/Typ (TT), Fast/Slow (FS), Slow/Slow (SS). Chip-to-chip variations and RDF variations are shown against four boundaries: Leakage Limit, Write Margin Limit, Stability Limit, Signal Margin Limit.]


Stability Sensitivity to Vth
• 10% Vth error causes more than one order of magnitude of fail bit count error
• Vth accuracy is key to fail prediction
[Figure: SNM (mV) versus σ number for Vth errors of 0%, 5%, 10%, 15%, 20% and 25%, with fail-rate markers at 1 Fail / 32Kb, 1 Fail / 294Kb, 1 Fail / 3.4Mb and 1 Fail / 10Mb. [8] H. Yamauchi, ISSCC 2009]


Determining Bitcell Fail Probability

• 3 unique Vth, one for each transistor type (PU/PD/PG),
– But 6 independent variables
• Once Vth is derived for all bitcell devices, determination requires bitcell interface signals for actual product
– Statistical analysis of the 6 bitcell devices is performed and requires architecture components:
• Bit-line length and parasitics
• Word-line duration
• Sense-amplifier and write drivers
[Figure: 6T bitcell annotated with PU Vth, PG Vth and PD Vth.]


Statistical Analysis:
Importance Sampling Methodology
• Monte Carlo (MC) works well, but it requires millions of samples across the full distribution to estimate such low fail probabilities
– For 16Mb, required sampling ~ 10^9
• Importance Sampling is a technique which gets around this problem by distorting the (natural) sampling function to produce more samples in the important regions (the tails)
– Speed-up over MC ~ 10^5 for a 5σ failure rate
– Samples are weighted with their occurrence probability
[Figure: two probability distributions. Natural sampling: too many samples near the center, too few samples in the tail where the failed samples lie. Importance sampling: distorted sampling function that produces more failed samples.]
[9] R. Kanj, DAC 2006
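The idea can be illustrated on a one-dimensional toy problem: estimating the probability that a standard Gaussian variable exceeds a 5-sigma threshold. The sketch below uses a simple mean-shifted sampling distribution, which is one common choice and not necessarily the method of [9]. The threshold, sample count and function names are assumptions made for illustration; a real bitcell analysis would replace the `x > t` test with a circuit simulation over six Vth variables.

```python
import math
import random

def exact_tail(t):
    """Exact one-sided Gaussian tail P(X > t), used only as a reference."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def importance_sampling_tail(t, n_samples, seed=0):
    """Estimate P(X > t) for X ~ N(0, 1) by sampling from N(t, 1) instead.

    Each failing sample is weighted by the likelihood ratio
    phi(x) / phi(x - t) = exp(-t*x + t*t/2), so the estimate stays unbiased.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(t, 1.0)  # draw from the shifted (distorted) distribution
        if x > t:              # "fail" region
            total += math.exp(-t * x + 0.5 * t * t)
    return total / n_samples

estimate = importance_sampling_tail(5.0, 20000)
print(f"importance sampling estimate: {estimate:.3e}")
print(f"exact tail probability:       {exact_tail(5.0):.3e}")
```

Plain Monte Carlo with the same 20000 samples would almost certainly observe no failures at all at this threshold, which is the problem the slide describes.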


Redundancy Usage and VMIN
• Redundancy plays a large role in determining yield
– Increasingly used to improve VMIN yield rather than hard defects
– Number of elements and domain-coverage matter
[Figure, left: Density vs. Yield for an X-σ Bitcell. Axes: Yield versus Density (Kb). Labels recovered: # of Repairs / Macro, 2, 3.]
[Figure, right: Yield vs. Bitcell Margin for Large x-Mb SRAM. Axes: Yield versus Bitcell Margin.]
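A rough feel for why a few repairs matter so much at VMIN can be had from a Poisson model. The sketch below assumes independent cell failures and, as a deliberate simplification, that each repair element fixes exactly one failing cell wherever it is located. It therefore ignores the domain-coverage effects the slide calls out. The function name and the example numbers are hypothetical.

```python
import math

def yield_with_repairs(n_cells, p_cell_fail, n_repairs):
    """Probability that an array has at most n_repairs failing cells.

    Uses a Poisson approximation with mean n_cells * p_cell_fail.
    """
    lam = n_cells * p_cell_fail
    term = math.exp(-lam)  # probability of exactly zero failing cells
    total = term
    for k in range(1, n_repairs + 1):
        term *= lam / k    # Poisson recurrence: P(k) = P(k-1) * lam / k
        total += term
    return total

n_cells = 16 * 2 ** 20     # hypothetical 16Mb macro
p_fail = 2e-7              # hypothetical cell fail probability
for repairs in range(0, 4):
    print(repairs, f"{yield_with_repairs(n_cells, p_fail, repairs):.3f}")
```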


Column Redundancy: Static Column Shifting

Static column shifting with integrated column address decode

[10] L. Wissel, CICC 2007
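The steering logic behind static column shifting can be sketched as a simple mapping. The example below assumes a single spare column placed at the end of the array and a shift toward higher physical column numbers. Both choices, and the function names, are assumptions for illustration rather than details taken from [10].

```python
def physical_column(logical_col, bad_col=None):
    """Map a logical column to a physical column, skipping one bad column.

    Columns below the bad column are unchanged; columns at or above it
    shift up by one, so the last logical column lands on the spare.
    """
    if bad_col is None or logical_col < bad_col:
        return logical_col
    return logical_col + 1

def shift_map(n_logical, bad_col=None):
    """Full logical-to-physical map for an array with n_logical data columns."""
    return [physical_column(c, bad_col) for c in range(n_logical)]

print(shift_map(8))             # no repair programmed
print(shift_map(8, bad_col=2))  # physical column 2 is skipped
```

Because the shift is static (set once from fuses), the mapping adds no per-access decision beyond a 2:1 mux per column.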


Row Redundancy Implementation

[Figure: two row-redundancy implementations compared side by side; the schematics are not recoverable from the source.]

First implementation:
+ Large Repair Domain
- Setup/Access penalty
- Timing complexity

Second implementation:
+ Best Possible area
+ Smallest fuse count
+ Simplest design
- Small repair domain




The Conflict Between Stability and Write
• Stability and Write-ability in conflict with each other w.r.t. PG and PD/PU strength
– Slow-NFET, Fast-PFET (SF) is very stable, but hard to write
– Fast-NFET, Slow-PFET (FS) is writeable, but unstable
• Bitcell normally optimized to strike balance between stability and write-ability


Stability vs. Write-ability Trade-off
[Figure: Pass-Gate (PG) Vth vs. Fail Probability. X-axis: PG Vth Adder. Y-axis: Sigma. Annotation: Target sigma.]
• Bitcell centered to maximize the worst of stability and write-ability; the overall sigma is too low (with just TYPICAL devices)
• ~ 1:1 trade off in stability and write-ability vs. PG strength


Write Assist By Negative Bit-Line Boost
• Most effective in reducing failure rate
• Little intrusion to the design
• Small area overhead
• Negative BL increases PG VGS and VDS to improve write margin
• Too much negative boost voltage can create the following hazards:
– Reliability fails from increased VGS, VDS for transistors in the write path
– Inadvertent write to unselected bitcells
[Figure: cell node waveforms during a negative bit-line write.]
[Figure: 0.8V yield comparison with and without write assist (W.A.), yield per SRAM instance.]

[11] D.P. Wang, SOCC 2007
[12] K. Nii, VLSI 2008
[13] Y. Fujimura, ISSCC 2010
[14] H. Pilo, ISSCC 2011


Write Assist By Lowering Supply

• Very effective technique for reducing write failure rate
• However…
– Lower VWR write assist voltage may be in contention with SRAM retention/hold ceiling voltage for inactive bitcells sharing VWR voltage
– Technique not suitable for all memory types, e.g. Dual-Port Memory

[15] K. Zhang, ISSCC 2005
[16] M. Yamaoka, ISSCC 2005
[17] H. Pilo, VLSI 2006


Stability Assist: Dynamic Word-Line Up Level

[Figure: dynamic word-line up-level scheme with Controller, Vsensor and WLUD Sensor.]
• 130mV Improvement in VMIN for 32nm SRAM

[18] P. Kolar, ISSCC 2010


Stability Assist: Bit-Line Voltage Supply

• Decreased Bit-Line restore voltage during read / half-select
• Current injection into the bitcell is decreased as BL voltage is reduced
• Decreases “low-node” bump voltage
[Figure: NC/NT waveforms.]

[19] A. Bhavnagarwala, VLSI 2007
[20] M. Khellah, VLSI 2006
[14] H. Pilo, ISSCC 2011


Cell-Terminal Biasing: VCS
+ Improved Stability
+ Improved Write Margin
+ Improved Performance
+ Supports aggressive VDD scaling
- Increased Leakage
- VCS generation


Impact of Stability Assist Methods on SNM

[21] R. Mann, SSE 2010



Impact of Write Assist Methods on WRM

[21] R. Mann, SSE 2010



What Can Technology Do?
[Figure: Vth (mV), 20 to 140, versus process generation (15n, 22n, 32n, 45n, 65n), with the HKMG introduction marked. EOT: Equivalent Oxide Thickness.]
[8] H. Yamauchi, ISSCC 2009


6T-SRAM w/ Asymmetric MOSFET (1)
[22] K. Nii, ISSCC 2010



Asymmetric MOSFET (2)
[22] K. Nii, ISSCC 2010



Read-ability and Signal Margin

• Signal Margin and read-ability VMIN affected by the following issues:
– Bitcell IREAD and variation within Sense-Amplifier complex
– Sense-Amplifier mismatch
– Mismatch in delay generation
– Other components (IOFF, SA timing relationships)


IREAD, Bitcell and Sense-Amp (SA) Mismatch
• SA set time controls separation of two distributions
• Effective Signal Distribution includes:
– Bitcell IREAD variability
– SA SET delay variation
– IOFF from unselected cells
– Other SA timing sensitivities
• An SA misread occurs when the effective signal is smaller than the Vth mismatch of the SA
[Figure: SA Vth Mismatch Distribution and Effective Signal Distribution on a common axis, -40 to 60 mV.]

[23] R. Houle, CICC 2007
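The overlap of the two distributions can be estimated in closed form under simplifying assumptions. The sketch below treats the effective signal and the SA offset as independent Gaussian variables, with the offset centered at zero, so a misread is the event "signal minus offset is negative". The function name and all numeric values are hypothetical; real distributions are typically not Gaussian in the tails.

```python
import math

def misread_probability(signal_mean_mv, signal_sigma_mv, sa_offset_sigma_mv):
    """P(effective signal < SA offset) for independent Gaussian variables.

    The difference (signal - offset) is Gaussian with mean signal_mean_mv
    and standard deviation sqrt(signal_sigma^2 + offset_sigma^2).
    """
    combined_sigma = math.hypot(signal_sigma_mv, sa_offset_sigma_mv)
    z = signal_mean_mv / combined_sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# A later SA set time develops more signal and separates the distributions.
for mean_mv in (20.0, 40.0, 60.0):
    print(mean_mv, f"{misread_probability(mean_mv, 6.0, 8.0):.3e}")
```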


Bitcell Tracking Circuits
• Sense-Amplifier delay is critical to overall yield
• Logic-based delay may not capture bitcell IREAD characteristics
– Delay requires close tracking to bitcell

[14] H. Pilo, ISSCC 2011


SRAM Leakage and Leakage Reduction
• Many sources of SRAM Leakage, the most influential are:
– SRAM Bitcell
– Large repetitive SRAM Periphery Circuits
• Word-Line Driver
• Read / Write Data Path
• HKMG eliminates most of the Gate-tunnel component
– Sub-threshold Leakage the predominant source
• Techniques most commonly grouped into two categories: VSS terminal or VDD terminal
• Many techniques exist to limit leakage as much as possible without impacting retention
[24] K. Osada, JSSC 2003


Reducing Bitcell Leakage: VSS Terminal
• VSS terminal gating captures all leakage components of bitcell, however,
– Pull-down NFET must be made large to not limit the IREAD of all selected bitcells
• Results in large area impact
– Large AC switching current when turning on/off sleep device
[Figure: cell and sub-array schematics with a gated ground. Signals recovered: BLT, BLC, WL, VDD, VGND, GND, SLP, SRB.]

[25] A. Bhavnagarwala, VLSI 2004
[26] J. Chang, VLSI 2006


Reducing Bitcell Leakage: VDD Terminal
• VDD terminal scheme most commonly used in last few technologies
• PFET Sleep device only needs to be large enough to charge subarray internal supply
– Area benefit compared to VSS scheme
• This scheme does not capture leakage across PG device (10-25% of total leakage)
• Active feedback control to maximize leakage savings without impacting data retention

[27] F. Hamzaoglu, ISSCC 2008


Reducing Bitcell Leakage: VDD Terminal (2)
• This scheme has other advantages:
– Captures Word-Line Leakage
– Transparent to user
– Transition between active and low-power mode does not require wake-up cycles
– Fast Subarray VDD charge-up enabled by Deep-Trench capacitor

[28] V. Ramadurai, JSSC 2009




8T Bitcell: 2-Port or Fast Register-File
• ~30% larger area compared to 6T
• Dedicated Read Port typically faster than 6T, but leakage from unselected bitcells limits # of cells/BL
[Figure: 8T bitcell with differential write operation and single-ended domino read operation.]

Applications
• 1-Port RF (1RW) with Improved VMIN
– Dedicated read port for a non-disturb read operation
– No half-select nor bit-write allowed (strong PG, weak PU for improved write-ability)
• 2-port RF (1R1W)
– 6T transistors designed as typical single-port SRAM
– Concurrent Write/Read Operation
– Dedicated Write-only Port
– Dedicated Read-only Port


8T Bitcell: Dual Port (2RW)
Application
• Extremely versatile memory
• Dual-Port (2RW) allows parallel access to the memory for writing, reading (or both)
• A/B port configured as both read and write and can be accessed asynchronously w.r.t. each other
• Versatility comes with many challenges
– Different row and different/common column address: very robust operation
– Common row and different/common column address: bitcell requires modifications
• Stability challenged → PD made very large to compensate β-ratio when both PGs active
• Write-ability challenged as writing port is loaded down by reading port
• Readability challenged when both ports discharge BLs
• Closely spaced Port A/B Word-Lines couple asynchronously to further reduce margins

[29] K. Nii, JSSC 2009


Content Addressable Memory (TCAM)
• Data is stored in array similar to SRAM
• Searches are performed concurrently across all entries via search lines
• If Search data = Stored data → HIT
• Priority encoder outputs address from HIT entry(s) with highest priority, typically LSB HIT address
• Writing a “0” to both cells masks bit (X)
[Figure, left: TCAM array with Search Line Driver, Search Lines, Match Lines (match / mismatch) and Priority Encoder producing Match Address and Hit Flag; data to search: 0 1 1 0 1.]
[Figure, right: XY Encoded TCAM Bitcell with WLX, WLY, SLX, SLY, ML, NTx, NTy, BLT, BLC.]
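The search behavior described above can be modeled in a few lines of software. The sketch below represents each stored entry as a string over "0", "1" and "X" (masked bit), compares the search key against every entry, and returns the lowest matching address to mimic a priority encoder that favors the LSB hit address. The string representation and function names are assumptions for illustration; in hardware all entries are compared concurrently on the match lines rather than in a loop.

```python
def entry_matches(stored, key):
    """True when every stored bit equals the key bit or is masked ('X')."""
    return all(s == "X" or s == k for s, k in zip(stored, key))

def tcam_search(entries, key):
    """Return (hit_flag, match_address) for the lowest-address matching entry."""
    for address, stored in enumerate(entries):
        if len(stored) == len(key) and entry_matches(stored, key):
            return True, address
    return False, None

entries = ["01101", "0110X", "1XXXX"]
print(tcam_search(entries, "01101"))  # exact entry at address 0 wins
print(tcam_search(entries, "01100"))  # only the masked entry at address 1 matches
print(tcam_search(entries, "00000"))  # no entry matches
```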


TCAM Circuit-Level Power Reduction:
Self-Referencing ML Sensing
[Figure: self-referenced match-line (ML) sense circuit and its waveforms versus time, comparing conventional and self-referenced ML sensing. Signals recovered: Search-Lines (SL), Match Lines ML0 to MLn, PRE, RST, CNTL, RML, SN, CS, S1, MLOUT, keeper, latch. Annotations: ML voltage-swing reduction ~ 60%; sensing-delay reduction ~ 90%.]
[30] I. Arsovski, CICC 2006




Why eDRAM?

• SoC performance and area limited by SRAM cache size
– eDRAM has 2x-4x area advantage
• Large improvement in eDRAM performance enabled by new circuits and technology
• SRAM Leakage power dominating over active power
– eDRAM has 2x-10x power advantage
• SER 250x advantage over SRAM
• Trench-based eDRAM is highly compatible with high-performance logic process
• When can eDRAM be substituted?
– When large contiguous memory structures are needed (>2Mb)
And…
– In applications where fast random cycle-time is not required


eDRAM Size - Latency Advantage
[Figure: 45nm eDRAM vs. SRAM latency, with the region marked where eDRAM is faster than SRAM.]

[31] H. Pilo, ISSCC 2008
[32] J. Barth, ISSCC 2010


eDRAM Cycle-Time vs. SRAM
[Figure: cell node voltage (0V to 1.2V) versus time over one eDRAM cycle. Phases: Develop Signal, Sense Amp Set, Write-Back, Pre-Charge. Annotations: two phases marked "Similar to SRAM" and one marked "Not required in SRAM".]


Typical 1T DRAM Cell Terminals
[Figure: 1T DRAM array segment with word-lines WL<0> to WL<3>, reference word-lines RFWL<0> and RFWL<1>, and sense amplifiers (SA).]

• VWL: WL “low” level (0 to -1V)
– Suppresses access device leakage → improved retention
• VPP: WL “up” level (> VDD + VT)
– Increases access device overdrive to write full VDD to cell
• VBB: access device back-bias (0 to -1V)
– Increases access device VT → improved retention (N/A in SOI)


Reading the 1-T DRAM Cell

• Data is stored as QCELL (VCELL * CCELL)
– CCELL: 7fF to 20fF
• WL activation merges QCELL into CBIT-LINE
– VCELL is diluted after charge-share (destructive read)
• Needs to be restored to a full level (write-back) by sense-amplifier
– Transfer Ratio (TR) = CCELL / ( CBIT-LINE + CCELL )
– Access and cycle time affected by charge transfer rate
• Access Transistor read current, and effective resistance to the CCELL
• Two distinct Architectures to read signal:
– Conventional Sense-amp: Long BLs and Low TR (15% to 35%)
– Domino-style (μ-Sense-amp): Short BLs and High TR (83%)
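The transfer-ratio formula above follows directly from charge conservation, and a short calculation shows how bit-line length sets the read signal. In the sketch below the bit-line capacitance values, the grounded pre-charge level and the function names are assumptions chosen for illustration; only the TR definition and the 7fF to 20fF cell range come from the slide.

```python
def transfer_ratio(c_cell, c_bitline):
    """TR = CCELL / (CBIT-LINE + CCELL)."""
    return c_cell / (c_bitline + c_cell)

def bitline_after_share(v_cell, v_precharge, c_cell, c_bitline):
    """Bit-line voltage after the cell charge merges into the bit-line.

    Charge conservation gives V = (Cc*Vc + Cb*Vp) / (Cc + Cb),
    which equals Vp + TR * (Vc - Vp).
    """
    tr = transfer_ratio(c_cell, c_bitline)
    return v_precharge + tr * (v_cell - v_precharge)

c_cell = 20e-15  # 20fF cell, upper end of the range on the slide
for c_bitline in (80e-15, 4e-15):  # hypothetical long and short bit-lines
    tr = transfer_ratio(c_cell, c_bitline)
    v_bl = bitline_after_share(1.0, 0.0, c_cell, c_bitline)
    print(f"CBL = {c_bitline * 1e15:.0f}fF: TR = {tr:.0%}, bit-line signal = {v_bl:.2f} V")
```

A long bit-line leaves only a fraction of the stored level for the sense amplifier to resolve, while a short bit-line transfers most of it, which is what makes a domino-style read practical.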


Grounded, Differential Sense
– Direct Write Scheme
• BL driven to full rail before WL activation
• Enabled by isolated Sense-amp SET and BL twist scheme
[Figure: sense-amp schematic with SET, EQ, BT and BC.]
[Figure: write and read waveforms, Voltage [V] (0.0 to 1.6) versus Time [ns] over one tRC. Traces: WL, BT, BC, SET, Node, Reference Bit-Line. Annotations: Direct Write, Write Back.]

[33] J. Barth, ISSCC 2004
[34] M. Jacunski, CICC 2010


Refresh Operations
• Cell charge gradually leaks to ground
– Primary mechanism: subthreshold leakage (1.5X increase per 10 °C)
• Refresh operation = Read operation
– Refresh address internally generated by refresh counter
– Data-bus not activated
• Refresh power ~ ½ read power
• Each row selected within specific time interval
– Data Retention specification at highest application temperature
• Example: 8K-Word-Line macro, operating at 4ns cycle time and 640 μs data retention spec
– One Word-Line must be refreshed every 80ns (640 μs / 8K)
– At 4ns cycle, availability is degraded to 95%
• 5% of operations consumed by refresh
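The example on this slide (8K word-lines, 4ns cycle, 640 microsecond retention) can be reproduced with simple arithmetic. The sketch below assumes one word-line is refreshed per refresh cycle and that each refresh steals exactly one access cycle; the function names are illustrative.

```python
def refresh_interval_ns(retention_us, n_wordlines):
    """Time between successive word-line refreshes, in nanoseconds."""
    return retention_us * 1000.0 / n_wordlines

def availability(retention_us, n_wordlines, cycle_ns):
    """Fraction of cycles left for user accesses after refresh is served."""
    return 1.0 - cycle_ns / refresh_interval_ns(retention_us, n_wordlines)

interval = refresh_interval_ns(640.0, 8 * 1024)
print(f"refresh one word-line every {interval:.1f} ns")
print(f"availability at 4 ns cycle: {availability(640.0, 8 * 1024, 4.0):.1%}")
```

With 8K taken as 8192 rows the interval comes out slightly under the rounded 80ns on the slide, and the availability rounds to the quoted 95%.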


Concurrent Refresh
• Developed to improve eDRAM availability
• Separate bank address and independent Refresh Address Counter (RAC)
• Multiple banks accessed simultaneously
• User may select banks to refresh
• eDRAM availability improved to > 99%
[Figure: four banks addressed by the User Row Address, with a refresh request (RREQ) input and refresh address counter (RAC, RAC+1).]

[35] T. Kirihata, JSSC 2003




SOI in Embedded Memory
• SOI Advantages
– Latch-up eliminated, no well connections required, smaller foot-print
– Easier implementation of SRAM Dual Power supply design
– 5-7x reduction of soft errors
– Decreased junction capacitance
• Reduction in Bitline load for improved signal margin
• Challenges
– Body potential modulated by transistor G/D/S coupling and leakage
which affects transistor Vth
• Increased Sense-Amp mis-match
• Increased variability reduces bitcell margin
• Pattern dependency on stability margin
• Variations in timing (1st switch vs. nth switch)



Sense-Amplifier Mismatch
• Body contacted (BC) device eliminates mismatch
– Area from larger BC device compensated by the elimination of array Well/Substrate ties not required in SOI
[Figure: body-contacted device layout with SET gate, N+ S/D, PC bridge and P+ body contact.]


Pattern Dependency: Write to Half-Select
• SOI bitcell bodies are modulated during write cycle from WL/BL/NC/NT coupling
– Stability after a write operation can be affected
– Faster cycle time is worse as bodies have not had chance to discharge to quiescent state
• T4/T6 bodies increase, lowering the trip point of the feedback inverter
• T2 body increases with respect to T3 body, reducing the β-ratio
[Figure: 6T bitcell (T2 PG, T3 PD, T4 PD, T6 PU; nodes NT, NC; WL, BLT, BLC) and waveforms of WL, NT, NC and the T2, T3, T4, T6 bodies for a write followed by a half-select.]


DRAM in SOI
• Body potential modulated by coupling and leakage
• Better source follower vs. bulk during write back (body coupling)
– Improved write ‘1’ cell voltage
• Degraded Ioff / Retention if body floats high (body leakage)
– GND pre-charge keeps body low
– Eliminate long periods with BL high (limit page mode)
[Figure: SOI access device cross-section (CA, WL at 1 Volt, BL, Body, Node, BOX, DT, NB) showing forward (FWD) and reverse (REV) junction leakage. When BL = GND, ILeak FWD > ILeak REV and Body → GND.]

[36] J. Barth, ISSCC 2007


Built In Self Test (BIST)
• On chip circuitry to help streamline structured test of a core
• Eliminates Customer Burden of Macro Test
• Common Test Flow Development
– Eliminates Part Specific Test Development
• Enables:
– Low Cost Automated Test Equipment (ATE)
– At-Speed Test
– Self Repair
• One BIST engine amortized across several memory macros
– Area overhead < 5%

Why BIST? → Testing through pads would require complicated address/data/command muxing for each memory and would be difficult to run at-speed


Total Built-In Self Repair Solution

• Repair Solution: processing failures and determining a way to repair them, allocating redundancy and storing results in fuses
– Computed on-chip and stays on-chip
• Macros tested in parallel and at speed
• Unified Chip Level Fusing
• BIST Re-used for Burn-in
– Dynamic, static, In-Situ Fail Accumulation


Memory BIST Architecture
• < 15% of test circuitry (FBIO) runs at speed
• Moving Slow/Fast interface as close to memory as possible
• Majority of test logic is slow, resulting in area reduction and improved timing closure
[Figure: memory BIST block diagram divided into Slow and At Speed regions. Blocks recovered: PLL, ABIST, BC Interface, BISTCNTL, BIST Engine, BIO, FBIO, FARR, RAM. Clocks: SCLK/SC1 (50MHz), CNTLCLK/CNTLC1 (50MHz), MEMCLK/CCC1 (at-speed).]
[37] M. Ouellette, SOI 2008

• BISTCNTL: common control interface for BIST operations, fuse/redundancy operations and clocking/test modes
• BIST: self test engine that generates test stimulus
• FARR: Failing Address and Repair Register: stores/allocates redundancy
• BIO/FBIO: BIST I/O, provides interface to memories


Summary
• 6T SRAM continues to provide the most flexible and highest performance solution for SoC memory
• Traditional 6T SRAM designs will continue to scale, but at a reduced rate compared to the last several technology nodes
• SRAM circuit and process innovations are key to prolonging area, power and performance scaling
• For large, contiguous memory blocks, eDRAM can provide a density and cost advantage over embedded SRAM given the improvements in performance and integration
