
Low-Latency Interfaces for

Mixed-Timing Domains
[in DAC-01]

Tiberiu Chelcea Steven M. Nowick

Department of Computer Science


Columbia University

{tibi,nowick}@cs.columbia.edu
Introduction
Key Trend in VLSI systems: systems-on-a-chip (SoC)
Two fundamental challenges:
 mixed-timing domains
 long interconnect delays

Our Goal: design of efficient interface circuits

Desirable Features:
 arbitrarily robust
 low-latency, high-throughput
 modularity, scalability
Few satisfactory solutions to date….
Timing Issues in SoC Design

[Figure: (a) single-clock: Domain #1 and Domain #2 joined by a long interconnect; (b) mixed-timing domains: Domain #1 (sync or async) and Domain #2 (sync or async) joined by a long interconnect]
Timing Issues in SoC Design (cont.)
Solution: provide interface circuits
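[Figure: (a) single-clock: Domain #1 and Domain #2 joined by a long interconnect; (b) mixed-timing domains: Domain #1 (sync or async) and Domain #2 (sync or async) joined by a long interconnect]

Interface circuits:
- single-clock, long interconnect: “relay stations” (Carloni et al.)
- mixed-timing domains: NEW “mixed-timing FIFO’s”
- mixed-timing domains, long interconnect: NEW “mixed-timing relay stations”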
Contributions
Complete set of mixed-timing interface circuits:
 sync-sync, async-sync, sync-async, async-async
Features:
 Arbitrary Robustness: wrt synchronization failures
 High-Throughput:
 in steady-state operation: no synchronization overhead
 Low-Latency: “fast restart”
 in empty FIFO: only synchronization overhead
 Reusability:
 each interface partitioned into reusable sub-components

Two Contributions:
 Mixed-Timing FIFO’s
 Mixed-Timing Relay Stations
Contribution #1: Mixed-Timing FIFO’s
Addresses issue of interfacing mixed-timing domains
Features: token ring architecture
 circular array of identical cells
 shared buses: data + control
 data: “immobile” once enqueued
 distributed control: allows concurrent put/get operations

2 circulating tokens: define tail & head of queue
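The token-ring idea can be sketched as a small behavioral model. This is only an illustration under simplifying assumptions (a single thread, no clocks, no synchronizers); the class and method names are hypothetical and do not come from the slides.

```python
class TokenRingFifo:
    """Behavioral sketch: circular array of cells plus two circulating tokens."""

    def __init__(self, capacity):
        self.cells = [None] * capacity       # data never moves once enqueued
        self.full_bits = [False] * capacity  # per-cell status bit
        self.tail = 0  # index of the cell holding the put token
        self.head = 0  # index of the cell holding the get token

    def put(self, item):
        """Enqueue into the TAIL cell; return False if that cell is occupied."""
        if self.full_bits[self.tail]:
            return False
        self.cells[self.tail] = item
        self.full_bits[self.tail] = True
        self.tail = (self.tail + 1) % len(self.cells)  # pass the put token
        return True

    def get(self):
        """Dequeue from the HEAD cell; return None if that cell is empty."""
        if not self.full_bits[self.head]:
            return None
        item = self.cells[self.head]
        self.full_bits[self.head] = False
        self.head = (self.head + 1) % len(self.cells)  # pass the get token
        return item
```

Only the two token positions change on each operation; the stored data stays in its cell until it is read out.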


Potential benefits:
 low latency
 low power
 scalability
Contribution #2: Mixed-Timing Relay Stations
Addresses issue of long interconnect delays
“Latency-Insensitive Protocols”: safely tolerate long
interconnect delays between systems

Prior Contribution: introduce “relay stations”


 single-clock domains (Carloni et al., ICCAD-99)

Our Contribution: introduce “mixed-timing relay stations”


 mixed-clock (sync-sync)
 async-sync

First proposed solutions to date….


Related Work
Single-Clock Domains: handling clock discrepancies
 clock skew and jitter (Kol98, Greenstreet95)
 long interconnect delays (Carloni99)

Mixed-Timing Domains: 3 common approaches


 Use “Wrapper Logic”:
 add logic layer to synchronize data/control
(Seitz80, Seizovic94)
 drawback: long latencies in communication

 Modify Receiver’s Clock:


 stretchable and pausible clocks
(Chapiro84, Yun96, Bormann97, Sjogren/Myers97)
 drawback: penalties in restarting clock
Related Work: Closer Approaches
Mixed-Timing Domains (cont.):
 Interface Circuits: Mixed-Clock FIFO’s (Intel, Jex et al. 1997):
 drawback: significant area overhead =
synchronizer for each cell

Our approach: mixed-clock FIFO’s


… only 2 synchronizers for entire FIFO
Outline
 Mixed-Clock Interfaces
 FIFO
 Relay Station

• Async-Sync Interfaces
 FIFO
 Relay Station

• Results

• Conclusions
Mixed-Clock FIFO: Block Level
[Figure: Mixed-Clock FIFO block with a synchronous put interface and a synchronous get interface]

Synchronous put interface:
- CLK_put: controls put operations
- req_put: initiates put operations
- data_put: bus for data items
- full: indicates when FIFO full

Synchronous get interface:
- CLK_get: controls get operations
- req_get: initiates get operations
- data_get: bus for data items
- valid_get: indicates data items validity (always 1 in this design)
- empty: indicates when FIFO empty
Mixed-Clock FIFO: Steady-State Simulation
Steady state: FIFO neither full, nor empty
- Sender starts a put operation
- Put Controller enables a put operation
- At the end of clock cycle: TAIL cell enqueues data

[Figure: FIFO architecture. Put side: Put Controller and Full Detector with full, req_put, data_put, CLK_put. Get side: Get Controller and Empty Detector with empty, req_get, data_get, valid_get, CLK_get. TAIL and HEAD mark the token positions.]
Mixed-Clock FIFO: Steady-State Simulation
Passes the put token
[Figure: FIFO architecture with Put/Get Controllers, Full/Empty Detectors, and TAIL and HEAD markers]
Mixed-Clock FIFO: Steady-State Simulation
Get Operation
[Figure: FIFO architecture with Put/Get Controllers, Full/Empty Detectors, and TAIL and HEAD markers]
Mixed-Clock FIFO: Steady-State Simulation
[Figure: FIFO architecture with Put/Get Controllers, Full/Empty Detectors, and TAIL and HEAD markers]

Steady state operation: zero synchronization overhead
Steady state operation, Puts and Gets “reasonably spaced”: zero probability of synchronization failure
Mixed-Clock FIFO: Steady-State Simulation
[Figure: FIFO architecture with Put/Get Controllers and Full/Empty Detectors; TAIL shown at three successive cell positions, HEAD marker]
Mixed-Clock FIFO: Full Scenario
FIFO FULL: put interface stalled
[Figure: FIFO architecture with Put/Get Controllers, Full/Empty Detectors, and TAIL and HEAD markers]
Mixed-Clock FIFO: Full Scenario
[Figure: FIFO architecture with Put/Get Controllers, Full/Empty Detectors, and TAIL and HEAD markers]
Mixed-Clock FIFO: Full Scenario
FIFO NOT FULL
[Figure: FIFO architecture with Put/Get Controllers, Full/Empty Detectors, and TAIL and HEAD markers]
Mixed-Clock FIFO: Full Scenario
[Figure: FIFO architecture with Put/Get Controllers, Full/Empty Detectors, and TAIL and HEAD markers]
Mixed-Clock FIFO: Cell Implementation
[Figure: FIFO cell]

Synchronous Put Part (reusable):
- CLK_put
- en_put: enables a put operation
- req_put: validity bit in
- data_put: data item in
- ptok_in, ptok_out: put token

Cell core:
- REG (with enable En)
- Data Validity Controller, SR
- Status bits: f_i (cell FULL), e_i (cell EMPTY)

Synchronous Get Part (reusable):
- CLK_get
- en_get: enables a get operation
- valid: validity bit out
- data_get: data item out
- gtok_in, gtok_out: get token
Mixed-Clock FIFO: Architecture

[Figure: FIFO architecture. Put side: Put Controller and Full Detector with full, req_put, data_put, CLK_put. Get side: Get Controller and Empty Detector with empty, req_get, data_get, valid_get, CLK_get.]
Synchronization Issues
Challenge: interfaces are highly-concurrent
 Global “FIFO state”: controlled by 2 different clocks

Problem #1: Metastability


 Each FIFO interface needs clean state signals

Solution: Synchronize “full” & “empty” signals


 “full” with CLK_put
 “empty” with CLK_get

Add 2 (or more) synchronizing latches to each signal


Observable “full”/“empty” safely approximate true
FIFO state
Synchronization Issues (cont.)
Problem #2: FIFO now may underflow/overflow!
 synchronizing latches add extra latency
Solution: Modify definitions of “full” and “empty”
New FULL: 0 or 1 empty cells left
New EMPTY: 0 or 1 full cells left
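New Full Detector
[Figure: adjacent pairs of cell empty bits (e_0,e_1), (e_1,e_2), (e_2,e_3), (e_3,e_0) are combined, and the result passes through synchronizing latches clocked by CLK_put to produce full]
- Two consecutive empty cells = FIFO not full
- NO two consecutive empty cells = FIFO full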
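The new definition of “full” can be sketched as a function of the per-cell empty bits. This sketch leaves out the synchronizing latches and assumes a ring of at least two cells whose empty cells are contiguous; the function name is hypothetical.

```python
def new_full(empty_bits):
    """Return True when no two adjacent cells (in ring order) are both empty.

    empty_bits[i] plays the role of e_i. With contiguous empty cells this is
    the "0 or 1 empty cells left" condition.
    """
    n = len(empty_bits)
    return not any(empty_bits[i] and empty_bits[(i + 1) % n] for i in range(n))
```

The wrap-around pair (e_3, e_0 in a 4-place FIFO) is covered by the modulo index.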


Synchronization Issues (cont.)
Problem #3: Potential for deadlock
Scenario: suppose only 1 data item in quiescent FIFO
 FIFO still considered “empty” (new definition)
 Get interface: cannot dequeue data item!

Solution: bi-modal “empty detector”, combines:


 “New empty” detector (0 or 1 data items)
 “True empty” detector (0 data items)

Two results folded into single global “empty” signal


Synchronization Issues: Avoiding Deadlock
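Bi-modal empty detection: select either ne or oe, and combine into global “empty”

[Figure: bi-modal empty detector. One path combines adjacent pairs of cell full bits (f_0,f_1), (f_1,f_2), (f_2,f_3), (f_3,f_0) and passes through latches clocked by CLK_get to give ne. The other path combines f_0, f_1, f_2, f_3 and passes through latches clocked by CLK_get to give oe. en_get and req_get are also inputs. ne and oe are combined into empty.]

- ne: detects “new empty” (0 or 1 data items)
- oe: detects “true empty” (0 data items)
- Reconfigure whenever the get interface is active
- When reconfigured, use “ne”: FIFO active => avoids underflow
- When NOT reconfigured, use “oe”: FIFO quiescent => avoids deadlock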
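A purely combinational sketch of the bi-modal selection follows. It ignores the synchronizing latches and the exact gating on en_get and req_get, and the `get_active` flag is a hypothetical stand-in for “reconfigured”; all function names are assumptions.

```python
def new_empty(full_bits):
    """ne: True when no two adjacent cells are both full (0 or 1 data items)."""
    n = len(full_bits)
    return not any(full_bits[i] and full_bits[(i + 1) % n] for i in range(n))


def true_empty(full_bits):
    """oe: True only when no cell holds data (0 data items)."""
    return not any(full_bits)


def global_empty(full_bits, get_active):
    """Use ne while the get interface is active, oe while it is quiescent."""
    return new_empty(full_bits) if get_active else true_empty(full_bits)
```

With one item left in a quiescent FIFO, `global_empty` reports not empty, so the item can still be dequeued. While gets are in progress it stays conservative and reports empty one item early.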
Mixed-Clock FIFO: Architecture

[Figure: FIFO architecture. Put side: Put Controller and Full Detector with full, req_put, data_put, CLK_put. Get side: Get Controller and Empty Detector with empty, req_get, data_get, valid_get, CLK_get.]
Put/Get Controllers

[Figure: Put Controller (inputs req_put, full; output en_put) and Get Controller (inputs req_get, empty, valid; outputs en_get, valid_get)]

Put Controller:
- enables put operation
- disabled when FIFO full

Get Controller:
- enables get operation
- indicates when data valid
- disabled when FIFO empty
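One plausible reading of the two controllers as Boolean functions is sketched below. The gating of valid_get by en_get is an assumption, not something the slide states, and the function names are hypothetical.

```python
def put_controller(req_put, full):
    """en_put: a put is enabled when requested and the FIFO is not full."""
    return req_put and not full


def get_controller(req_get, empty, valid):
    """Return (en_get, valid_get).

    en_get: a get is enabled when requested and the FIFO is not empty.
    valid_get: assumed here to be the cell's validity bit gated by en_get.
    """
    en_get = req_get and not empty
    valid_get = en_get and valid
    return en_get, valid_get
```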
Outline
 Mixed-Clock Interfaces
 FIFO
 Relay Station

• Async-Sync Interfaces
 FIFO
 Relay Station

• Results

• Conclusions
Relay Stations: Overview
Proposed by Carloni et al. (ICCAD’99)

Delay = 1 cycle: system 1 sends “data items” to system 2
Delay > 1 cycle: system 1 now sends “data packets” to system 2

[Figure: System 1 -> RS -> RS -> RS -> RS -> System 2, all driven by a single CLK]

Data Packet = data item + validity bit (either valid or invalid)
“stop” control = stopIn + stopOut
- apply counter-pressure
- result: stall communication
Steady State: pass data on every cycle

Problem: Works only for single-clock systems!


Relay Stations: Implementation
[Figure: relay station datapath. packetIn feeds registers MR and AR through a switch; a mux selects packetOut; a Control block handles stopIn and stopOut.]

• In normal operation:
 packetIn copied to MR and forwarded on packetOut

• When stopped (stopIn=1):


 stopOut raised on the next clock edge
 extra packet copied to AR
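The two operating modes can be sketched as a cycle-level model. It is an illustration under assumptions: validity bits are not modeled, and the upstream sender is assumed to hold its packet while it observes stopOut. The class and attribute names are hypothetical.

```python
class RelayStation:
    """Cycle-level sketch with a main register (MR) and an auxiliary register (AR)."""

    def __init__(self):
        self.mr = None
        self.ar = None
        self.stop_out = False

    def clock(self, packet_in, stop_in):
        """Apply one clock edge; return (packetOut, stopOut) after the edge."""
        if not self.stop_out:
            if not stop_in:
                self.mr = packet_in   # normal operation: copy packetIn to MR
            else:
                self.ar = packet_in   # extra packet already in flight goes to AR
                self.stop_out = True  # counter-pressure seen upstream next cycle
        elif not stop_in:
            self.mr = self.ar         # resume: drain AR into MR
            self.ar = None
            self.stop_out = False
        # stalled with stop_in still high: hold MR, AR and stopOut
        return self.mr, self.stop_out
```

In this model the station needs one extra register because the sender learns about the stall one cycle late, so exactly one additional packet arrives after stopIn is seen.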
Relay Station vs. Mixed-Clock FIFO
                      Relay Station              Mixed-Clock FIFO
Interface signals:    validIn, validOut,         full, empty,
                      stopOut, stopIn,           req_put, req_get,
                      dataIn, dataOut            dataIn, dataOut
Steady state:         always pass data           only pass data when requested
Data items:           both valid & invalid       only valid data
Stopping mechanism:   stopIn & stopOut           none (only full/empty)
Mixed-Clock Relay Stations (MCRS)
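[Figure: System 1 -> RS -> RS -> MCRS -> RS -> RS -> System 2; the single CLK is replaced by CLK1 on the System 1 side of the MCRS and CLK2 on the System 2 side]

Mixed-Clock Relay Station derived from the Mixed-Clock FIFO
Change ONLY Put and Get Controllers

[Figure: interface mapping.
 Mixed-Clock FIFO: full, req_put, data_put, CLK_put (put side); req_get, valid_get, empty, data_get, CLK_get (get side).
 Mixed-Clock Relay Station: stopOut, packetIn (valid_put, data_put), CLK1 (put side); stopIn, packetOut (valid_get, data_get), CLK2 (get side).]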
Mixed-Clock Relay Station: Implementation
Mixed-Clock Relay Station vs. Mixed-Clock FIFO
Identical:
- FIFO cells
- Full/Empty detectors (...or can simplify)
Only modify: Put & Get Controllers

Put Controller: always enqueue data (unless full)

[Figure: Put Controller (inputs full, validIn; outputs en_put, valid to cells) and Get Controller (inputs stopIn, empty, valid; outputs en_get, validOut)]
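The modified controllers might be sketched as follows. Everything beyond “always enqueue data (unless full)” is an assumption: in particular, deriving stopOut directly from full and gating validOut by en_get are guesses based on the signal names in the figures, and the function names are hypothetical.

```python
def rs_put_controller(valid_in, full):
    """Return (en_put, valid_to_cells, stop_out).

    Every packet, valid or invalid, is enqueued unless the FIFO is full;
    the incoming validity bit is stored with the data.
    """
    en_put = not full
    stop_out = full          # assumption: back-pressure mirrors "full"
    return en_put, valid_in, stop_out


def rs_get_controller(stop_in, empty, valid):
    """Return (en_get, valid_out).

    A packet is dequeued on every cycle unless stopped or empty;
    otherwise the outgoing packet is marked invalid.
    """
    en_get = not stop_in and not empty
    valid_out = en_get and valid   # assumption: validity gated by en_get
    return en_get, valid_out
```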


Outline
• Mixed-Clock Interfaces
 FIFO
 Relay Station

• Async-Sync Interfaces
 FIFO
 Relay Station

• Results

• Conclusions
Async-Sync FIFO: Block Level
[Figure: side-by-side block diagrams.
 Mixed-Clock FIFO: full, req_put, data_put, CLK_put (put side); req_get, valid_get, empty, data_get, CLK_get (get side).
 Async-Sync FIFO: put_req, put_ack, put_data (Async Domain); req_get, valid_get, empty, data_get, CLK_get (Sync Domain).]

Asynchronous put interface: uses handshaking communication


 put_req: request operation
 put_ack: acknowledge completion
 no “full” signal
Synchronous get interface: no change
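The effect of replacing “full” with a withheld acknowledgement can be sketched at the transaction level. This is an abstraction, not a circuit model: it ignores the individual handshake phases and the get clock, and the class and method names are hypothetical.

```python
from collections import deque


class AsyncPutInterface:
    """Transaction-level sketch of a put port with put_req / put_ack."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.pending = None  # data of a request whose put_ack is withheld

    def put_req(self, data):
        """Return True if put_ack is issued now, False if it is withheld."""
        assert self.pending is None, "sender must wait for put_ack"
        if len(self.queue) < self.capacity:
            self.queue.append(data)
            return True
        self.pending = data  # no "full" signal: the sender simply waits
        return False

    def get(self):
        """Dequeue one item; return (item, ack_released)."""
        item = self.queue.popleft()
        ack_released = self.pending is not None
        if ack_released:
            self.queue.append(self.pending)  # withheld put completes now
            self.pending = None
        return item, ack_released
```

The sender never polls a status flag; it is paced only by when put_ack arrives.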
Async-Sync FIFO: Architecture
Asynchronous put interface:
- No Full Detector or Put Controller
- When FIFO full, acknowledgement withheld until safe to perform the put operation

[Figure: put_req, put_ack and put_data connect directly to the ring of cells; the get side has the Get Controller and Empty Detector with CLK_get, data_get, req_get, valid_get, empty]

Get interface: exactly as in Mixed-Clock FIFO


Async-Sync FIFO: Cell Implementation
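[Figure: Async-Sync FIFO cell]

Asynchronous Put Part: reusable, from async FIFO (Async00)
- signals: put_req, put_data, put_ack, we, we1
- components labeled C, + and OPT

Data Validity Controller (DV): new
- status bits: e_i, f_i

REG (with enable En)

Synchronous Get Part: reusable (from mixed-clock FIFO)
- signals: CLK_get, en_get, get_data, gtok_in, gtok_out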
Async-Sync Relay Stations (ASRS)

[Figure: System 1 (async) -> ARS -> ARS -> ASRS -> RS -> System 2 (sync, CLK2); the ARS chain is a micropipeline and is optional]
Outline
• Mixed-Clock Interfaces
 FIFO
 Relay Station

• Async-Sync Interfaces
 FIFO
 Relay Station

• Results

• Conclusions
Results
Each circuit implemented:
 using both academic and industry tools
 MINIMALIST: Burst-Mode controllers [Nowick et al. ‘99]
 PETRIFY: Petri-Net controllers [Cortadella et al. ‘97]

Pre-layout simulations: 0.6 µm HP CMOS technology

Experiments:
 various FIFO capacities (4/8/16 cells)
 various data widths (8/16 bits)
Results: Latency
Experimental Setup:
- 8-bit data items
- various FIFO capacities (4, 8, 16)

Latency = time from enqueuing a data item into an empty FIFO until it is dequeued

Design            4-place        8-place        16-place
                  Min    Max     Min    Max     Min    Max
Mixed-Clock       5.43   6.34    5.79   6.64    6.14   7.17
Async-Sync        5.53   6.45    6.13   7.17    6.47   7.51
Mixed-Clock RS    5.48   6.41    6.05   7.02    6.23   7.28
Async-Sync RS     5.61   6.35    6.18   7.13    6.57   7.62

For each design, latency not uniquely defined: Min/Max


Results: Maximum Operating Rate
Synchronous interfaces: MegaHertz
Asynchronous interfaces: MegaOps/sec

Design            4-place        8-place        16-place
                  Put    Get     Put    Get     Put    Get
Mixed-Clock       565    549     544    523     505    484
Async-Sync        421    549     379    523     357    484
Mixed-Clock RS    580    539     550    517     509    475
Async-Sync RS     421    539     379    517     357    475

Put vs. Get rates:
- sync put faster than sync get
- async put slower than sync get
Conclusions
Introduced several new low-latency interface circuits
Address 2 major issues in SoC design:
 Mixed-timing domains
 mixed-clock FIFO
 async-sync FIFO

 Long interconnect delays


 mixed-clock relay station
 async-sync relay station

Other designs implemented and simulated:


 Sync-Async FIFO + Relay Station
 Async-Async FIFO + Relay Station

Reusable components: mix & match to build circuits


Provide useful set of interface circuits for SoC design
