Reg File
By Joong-Seok Moon
Register File
A set of registers that store data
Consists of a small array of static memory cells
Smallest size and fastest access time in the memory hierarchy (Register File -> On-chip Cache -> Off-chip Cache -> Main Memory -> Disk)
Frequently used by microprocessors and DSPs
Permits multiple read and write ports
2-read/1-write: scalar microprocessor (e.g. DLX)
8-read/4-write: superscalar microprocessor (often more than that), VLIW
1-read/1-write: DSP data/coefficient memory
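A minimal behavioral sketch of the 2-read/1-write configuration may help fix the idea of ports. The class name, method name, and the choice that reads return the old contents in the same access are all assumptions made for illustration; they are not taken from the slides.

```python
class RegisterFile:
    """Behavioral model of a 2-read/1-write register file (illustrative names)."""

    def __init__(self, num_regs=32, width=32):
        self.mask = (1 << width) - 1      # keep stored values within the word width
        self.regs = [0] * num_regs

    def cycle(self, rd_addr1, rd_addr2, wr_en=False, wr_addr=0, wr_data=0):
        """One access: both read ports sample the old contents, then the write lands."""
        out1 = self.regs[rd_addr1]
        out2 = self.regs[rd_addr2]
        if wr_en:
            self.regs[wr_addr] = wr_data & self.mask
        return out1, out2


rf = RegisterFile(num_regs=32, width=32)
rf.cycle(0, 0, wr_en=True, wr_addr=5, wr_data=0x1234)
rf.cycle(0, 0, wr_en=True, wr_addr=7, wr_data=0xABCD)
a, b = rf.cycle(5, 7)                     # two independent reads in one access
print(hex(a), hex(b))
```

A superscalar or VLIW configuration would simply take more read and write addresses per call; the hardware cost of the extra ports is not captured by this model.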
Single-ended 2-read/1-write ports (slow write)
Fully static, no precharge required
NMOS of I1 should be sized bigger because node A will be at Vdd-Vth during the write operation
I2 should be weak (N1-N2 change the data)
I3: buffer for the storage node
But actually single-ended operation (this is acceptable: write is usually much faster than read)
[Figure: register cell schematic; labels: wordRD2, A, B, I1, N1, N2, N3]
B=1: discharge bitRD (slow read for large bitline capacitance)
B=0: hold precharge value
[Figure: write path schematic; labels: wrEN, N5, N6, bitWR/bitWR]
Further optimization
bitWR=1:
N4, N6 on: node A pulled down
N5 on: node B pulled up
True dual-ended write
bitWR=0:
N5 on: node B pulled down
One transistor on the pull-down path
Single-ended write with enhanced speed
Write Operation
Static N-to-2^N decoder
wordline0 = A0b · A1b · A2b · ... · A(N-1)b (b denotes the complemented input)
More than 32 registers: a multi-level decoder is desirable
Works well with edge-triggered flip-flops for address inputs
Can we connect the decoder output directly to drive the wordline?
Extremely dangerous. Why?
Glitches
Read might be OK, but write can be problematic
Put latches at the decoder output
[Figure: static decoder schematic; labels: A0, A1, A2, A(N-2), A(N-1), wordlineN-1]
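The glitch hazard can be illustrated with a small behavioral sketch. The function below models the static decoder as a pure AND of true or complemented address bits; the assumption that the address bits arrive one at a time (A0 first, then A1, then A2) is hypothetical and only serves to expose the intermediate wordlines that fire during a transition.

```python
def decode(addr_bits):
    """Static N-to-2^N decoder. addr_bits[i] is address bit A_i (0 or 1).
    Returns a one-hot list of 2^N wordline values."""
    n = len(addr_bits)
    wordlines = []
    for row in range(2 ** n):
        # wordline `row` is the AND of every address bit, taken true or complemented
        selected = all(addr_bits[i] == ((row >> i) & 1) for i in range(n))
        wordlines.append(int(selected))
    return wordlines


# Address changes from 3 (A2 A1 A0 = 011) to 4 (100).
# Hypothetical skew: bits settle in the order A0, A1, A2 instead of together.
bits = [1, 1, 0]                          # A0=1, A1=1, A2=0 -> address 3
seen = [decode(bits).index(1)]            # wordline active before the change
for i, new_value in enumerate([0, 0, 1]):
    bits[i] = new_value
    seen.append(decode(bits).index(1))    # wordline active after each bit settles
print(seen)                               # [3, 2, 0, 4]
```

Wordlines 2 and 0 are briefly selected on the way from 3 to 4. For a read this only disturbs a bitline that can recover, but for a write it could corrupt registers 2 and 0, which is why the decoder output is latched or qualified before it drives the wordline.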
Dynamic N-to-2^N decoder
Domino N-input AND gate
Charge-sharing problem for large N
Gate keeper may be required
Long NMOS chain for large N
No glitch at the output
Needs qualified address inputs
Make NMOS half size
Reverse input sequence
Same active strength
Charge sharing reduced
[Figure: domino AND gate; labels: A0, A1, A2, A3, wordEN]
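To see why charge sharing grows with N, a first-order estimate can be sketched as follows. It assumes the precharged output node starts at Vdd, the internal stack nodes start at 0 V, and the worst case connects N-1 internal nodes to the output while the bottom transistor stays off. The capacitance values are hypothetical, chosen only for illustration, and the half-size/reversed-input technique listed above is not modeled.

```python
def shared_voltage(vdd, c_out, c_internal):
    """Output voltage after the precharged output node (c_out, at vdd) shares its
    charge with initially discharged internal nodes (list of capacitances)."""
    return vdd * c_out / (c_out + sum(c_internal))


# Hypothetical values: 10 fF output node, 2 fF per internal stack node, Vdd = 1.0.
for n in (2, 4, 8, 16):
    v = shared_voltage(1.0, 10.0, [2.0] * (n - 1))
    print(f"N={n:2d}: output droops to {v:.3f} x Vdd")
```

Under these assumptions the droop worsens steadily as the stack gets longer, which is consistent with the need for a gate keeper or a restructured stack at large N.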
Write Driver
Tri-state Buffer
Read-Out Circuitry
Small bitline capacitance
Single-ended sensing
May not need a sense amplifier
Skewed buffer is fine for a precharged scheme
Senses a value only when the bitline goes to 0
Latches the old value (latch and sensing)
Read-Out Circuitry
Nl is off
Pf is on only if Vdd-Vth (read 1)
Pf charges back to Vdd
I1 must be sized with a higher beta
After read
Architectural Consideration
[Figure: pipeline diagram of overlapping instructions; stage labels D, E, M, W]
Pipelined processor
DLX assumes write in the high phase of the clock and read in the low phase: implicit bypassing
But only half of the clock cycle is available for the read
Explicit bypassing: compare read and write addresses
If same: bypass the write data directly to the read output, either skipping the read or discarding the read value
If different: normal read
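Explicit bypassing can be sketched as a single address comparison in front of the read port. The function and parameter names are hypothetical; the sketch assumes one write port and models the variant that skips the array read when the addresses match.

```python
def read_with_bypass(regs, rd_addr, wr_en, wr_addr, wr_data):
    """Return the value seen on a read port in a cycle that may also write.
    regs holds the contents before this cycle's write has landed."""
    if wr_en and rd_addr == wr_addr:
        return wr_data            # same address: forward the write data directly
    return regs[rd_addr]          # different address: normal read


regs = [0] * 32
regs[4] = 111                     # stale value still in the array
print(read_with_bypass(regs, 4, True, 4, 999))   # 999, forwarded write data
print(read_with_bypass(regs, 4, True, 9, 999))   # 111, normal read
```

The comparator is the cost of this scheme; in exchange, the read gets the full clock cycle instead of half of it.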
Architectural Consideration
Read caching
Compare read addresses
If same, do not read; deliver the cached value directly
As with write-read bypass, comparators are required
Makes sense only if the comparators consume less power than the register file
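Read caching can be sketched as a read port that remembers its last address and value. The class below is an assumption-laden illustration: it compares against only the immediately preceding read address, and it refreshes the cached value on a write to that address so the cache stays coherent (the slides do not say how coherence is handled).

```python
class CachedReadPort:
    """Read port that skips the array access when the address repeats."""

    def __init__(self, regs):
        self.regs = regs
        self.last_addr = None
        self.last_value = None
        self.array_reads = 0          # counts real (power-consuming) array accesses

    def read(self, addr):
        if addr == self.last_addr:
            return self.last_value    # same address: deliver the cached value
        self.array_reads += 1
        self.last_addr = addr
        self.last_value = self.regs[addr]
        return self.last_value

    def write(self, addr, data):
        self.regs[addr] = data
        if addr == self.last_addr:
            self.last_value = data    # assumption: keep the cached copy coherent


regs = [0] * 8
regs[3] = 42
port = CachedReadPort(regs)
values = [port.read(a) for a in (3, 3, 3, 5)]
print(values, port.array_reads)       # [42, 42, 42, 0] 2
```

Four reads cost only two array accesses here; whether that is a net saving depends on the comparator power, as noted above.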
In DSP, quantitative studies show that values contain more 0s than 1s
For a precharged register file design:
Value in memory = 0: preserve precharge
Value in memory = 1: discharge the precharged bitlines
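The effect of this mapping can be estimated by counting how many bitlines discharge per read. The helper name and the sample values below are hypothetical and chosen only to illustrate small-magnitude operands; they are not measured data.

```python
def bitline_discharges(values, width=16):
    """Number of bitlines discharged when reading each value once,
    assuming a stored 1 discharges its bitline and a stored 0 does not."""
    mask = (1 << width) - 1
    return sum(bin(v & mask).count("1") for v in values)


sample = [0x0003, 0x0010, 0x0000, 0x00FF]     # hypothetical 16-bit operands
discharged = bitline_discharges(sample, width=16)
total_bits = 16 * len(sample)
print(discharged, total_bits - discharged)    # 11 discharged, 53 left precharged
```

With data dominated by zeros, most bitlines keep their precharge and need no recharging before the next read.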
Some comments
Skewed inverter for the read-out circuit burns a lot of power (slow slew rate, reduced voltage level)
Precharge time and reading time should not overlap, to avoid short-circuit currents
Precharge on -> request read -> precharge off -> ack read -> request precharge -> read off -> ...
Asynchronous concepts are widely used in register file design