Flattened Butterfly Topology for On-Chip Networks

John Kim, James Balfour, and William J. Dally
Presented by Jun Pang

Motivation & Goal

Most on-chip networks (2D mesh): low-radix
  Pros: simple design & short wires
  Cons: long network diameter & energy inefficiency (many hops)
High-radix networks
  Far fewer intermediate routers
  Lower latency & lower power

Goal: show how on-chip networks can use high-radix routers to reduce latency & energy

On-chip network

Plentiful bandwidth due to inexpensive wires, while buffers are expensive
  Wires are cheap because on-chip distances are small
Concentration: several terminal nodes share resources (routers)
  Reduces the number of channels & buffers
Latency: reduce hop count at the expense of increased serialization latency (T_s) to get an overall reduction in latency
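The latency trade-off above can be made concrete with the standard zero-load latency decomposition T = H * t_r + T_w + L/b, where the last term is the serialization latency T_s. The sketch below is illustrative only: every parameter value (hop counts, per-router delay t_r, wire delay T_w, channel widths, the 512-bit packet) is a hypothetical assumption, and the narrower flattened-butterfly channel reflects the assumption that a fixed wire budget is split across more channels.

```python
from dataclasses import dataclass


@dataclass
class Network:
    """Hypothetical parameters for a zero-load latency estimate."""
    name: str
    hops: float          # H: average number of routers traversed
    router_delay: float  # t_r: cycles spent in each router (assumed)
    wire_delay: float    # T_w: total wire delay along the path, in cycles (assumed)
    channel_bits: int    # b: channel width in bits per cycle (assumed)


def zero_load_latency(net: Network, packet_bits: int) -> float:
    """T = H * t_r + T_w + L / b (header latency plus serialization latency T_s)."""
    header = net.hops * net.router_delay + net.wire_delay
    serialization = packet_bits / net.channel_bits  # T_s = L / b
    return header + serialization


# Hypothetical numbers: fewer hops, but channels half as wide.
mesh = Network("mesh (hypothetical)", hops=6, router_delay=2, wire_delay=6, channel_bits=128)
fbfly = Network("flattened butterfly (hypothetical)", hops=2, router_delay=2, wire_delay=6, channel_bits=64)
for net in (mesh, fbfly):
    print(net.name, zero_load_latency(net, packet_bits=512))  # 22.0, then 18.0
```

Here T_s doubles (4 to 8 cycles) but the router term drops from 12 to 4 cycles, so the total still falls; with a much longer packet the serialization term would eventually dominate.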

On-chip Flattened Butterfly

Topology (Fig. 3a)
  Radix = 10 (concentration factor 4; 3 ports in dimension 1; 3 ports in dimension 2)
  At most 2 hops between any two routers
  Longer wires -> deeper buffers
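A minimal sketch of the router graph behind the radix-10 configuration: 3 peers per dimension implies 4 routers per dimension, so with a concentration factor of 4 this assumes a 4 x 4 router grid serving 64 terminals. Each router is linked to every other router in its row and its column, and a breadth-first search confirms the 2-hop maximum. The function names are hypothetical.

```python
from collections import deque
from itertools import product


def flattened_butterfly_2d(k: int) -> dict:
    """Router graph of a 2-D flattened butterfly with k x k routers.

    Router (x, y) links directly to every other router in its row
    (dimension 1) and every other router in its column (dimension 2).
    """
    routers = list(product(range(k), range(k)))
    adj = {r: set() for r in routers}
    for (x, y) in routers:
        for other in range(k):
            if other != x:
                adj[(x, y)].add((other, y))  # dimension-1 peer
            if other != y:
                adj[(x, y)].add((x, other))  # dimension-2 peer
    return adj


def diameter(adj: dict) -> int:
    """Longest shortest path in router-to-router hops, via BFS from every router."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


k, c = 4, 4                      # assumed: 4 routers per dimension, concentration 4
adj = flattened_butterfly_2d(k)
terminals = c * len(adj)         # 16 routers * 4 terminals each
radix = c + len(adj[(0, 0)])     # 4 terminal ports + 3 + 3 network ports
print(terminals, radix, diameter(adj))  # 64 10 2
```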

Routing: non-minimal global adaptive routing (UGAL)
  Path diversity improves load balance & performance
  Each packet is routed either minimally or non-minimally
  Non-minimal: minimal direction-ordered routing in each phase (prevents deadlock)
  Only 2 VCs needed
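The routing choice can be sketched in the style of UGAL as it is usually described: compare queue occupancy times hop count for the minimal path against a non-minimal path through a random intermediate router, and take the minimal path unless it looks worse. Assumptions not in the slides: the `queue_len` congestion estimate is a hypothetical callback, dimension 1 is routed before dimension 2, and the 2 VCs are assigned one per phase of the non-minimal route.

```python
import random
from typing import Callable, List, Tuple

Router = Tuple[int, int]


def minimal_route(src: Router, dst: Router) -> List[Router]:
    """Dimension-ordered minimal route: fix dimension 1 (x), then dimension 2 (y).

    Each dimension is fully connected, so each costs at most one hop.
    """
    path = [src]
    x, y = src
    if x != dst[0]:
        x = dst[0]
        path.append((x, y))
    if y != dst[1]:
        y = dst[1]
        path.append((x, y))
    return path


def ugal_choose(src: Router, dst: Router, k: int,
                queue_len: Callable[[Router, Router], float],
                rng) -> Tuple[List[Router], List[int]]:
    """Return (path, vcs), where vcs[i] is the virtual channel used on hop i."""
    min_path = minimal_route(src, dst)
    h_min = len(min_path) - 1
    if h_min == 0:
        return min_path, []

    # Non-minimal candidate: minimal to a random intermediate, then minimal to dst.
    mid = (rng.randrange(k), rng.randrange(k))
    phase1 = minimal_route(src, mid)
    phase2 = minimal_route(mid, dst)
    nm_path = phase1 + phase2[1:]
    h_nm = len(nm_path) - 1

    # Estimated delay = first-hop queue occupancy * hop count.
    q_min = queue_len(src, min_path[1])
    q_nm = queue_len(src, nm_path[1])
    if q_min * h_min <= q_nm * h_nm:
        return min_path, [0] * h_min
    # Assumed VC assignment: VC 0 for phase 1, VC 1 for phase 2.
    vcs = [0] * (len(phase1) - 1) + [1] * (len(phase2) - 1)
    return nm_path, vcs


busy = lambda router, nxt: 10 if nxt == (1, 0) else 1  # hypothetical congestion
print(ugal_choose((0, 0), (1, 1), 4, busy, random.Random(7)))
```

Because the minimal path is never longer than the non-minimal one, the minimal route wins whenever the first-hop queues are equally loaded; the non-minimal path only pays off when the minimal first hop is congested.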

Bypass Channels & Microarchitecture

Goal: reduce the distance traveled by packets to reduce latency and energy
Two types of muxes
  Input muxes: select between bypass inputs and direct inputs
  Output muxes: select between direct outputs and bypass inputs
  If the primary input is idle, a non-primary input is chosen
Yield arbiter to guarantee global fairness
Control packets to prevent starvation
Combination of minimal and non-minimal routing
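The output-mux rule (primary input first, a non-primary input only when the primary is idle) can be sketched as a small arbiter. This is a simplified, hypothetical stand-in: the class name is invented, and plain round-robin among non-primary inputs replaces the yield arbiter described in the slides.

```python
from typing import Optional, Sequence


class OutputMux:
    """Hypothetical output mux: input 0 is primary, inputs 1..n-1 are non-primary."""

    def __init__(self, num_inputs: int):
        self.num_inputs = num_inputs
        self.next_rr = 1  # next non-primary input to consider first

    def select(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the granted input index, or None if nothing is requesting."""
        if requests[0]:
            return 0  # primary input always wins when it has a flit
        n_bypass = self.num_inputs - 1
        for i in range(n_bypass):
            cand = 1 + (self.next_rr - 1 + i) % n_bypass
            if requests[cand]:
                self.next_rr = 1 + cand % n_bypass  # rotate past the winner
                return cand
        return None


mux = OutputMux(3)
print(mux.select([False, True, True]), mux.select([False, True, True]))  # 1 2
```

With strict primary priority, a continuously busy primary input would starve the bypass inputs, which is the problem the slides assign to the yield arbiter and control packets.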

Bypass Channels (continued)

Switch architecture
  Minimal routing: simplified crossbar switch
  Non-minimal routing: more complex switch
  Non-minimal routing with bypass channels: less complex switch
Flow control & routing
  Buffers for non-primary inputs
  Separate buffers at the destinations of control packets
  UGAL modified to support bypass channels
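One way to see why minimal routing permits a simplified crossbar: under minimal dimension-ordered routing each dimension costs at most one hop, so a packet arriving on a dimension-1 port never leaves on another dimension-1 port, and a packet arriving on a dimension-2 port can only eject. The sketch below counts crossbar connections under that assumption (dimension 1 routed first, every input/output pair counted); it is an illustration, not the paper's switch design.

```python
def crosspoints(c: int, k: int) -> dict:
    """Crossbar connections in one router of a 2-D flattened butterfly.

    Ports: c terminal ports, (k - 1) dimension-1 ports, (k - 1) dimension-2 ports.
    Assumes minimal dimension-ordered routing (dimension 1, then dimension 2).
    """
    d = k - 1
    radix = c + 2 * d
    full = radix * radix            # fully connected crossbar
    from_terminals = c * radix      # injected packets may leave on any output
    from_dim1 = d * (d + c)         # dimension 1 done: go to dimension 2 or eject
    from_dim2 = d * c               # both dimensions done: eject only
    return {"full": full, "minimal": from_terminals + from_dim1 + from_dim2}


print(crosspoints(4, 4))  # {'full': 100, 'minimal': 73}
```

For the radix-10 router (c = 4, k = 4) this gives 73 connections instead of 100 for a full crossbar.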

Evaluation

Throughput: up to 50% increase compared to a concentrated mesh
Power: about 38% reduction compared to a mesh
Latency: about 28% reduction compared to a mesh

Scalability

Channel count grows by a smaller factor than in a hypercube
Three ways to scale
  Concentration factor
  Dimension of the flattened butterfly
  Hybrid approach
Future technology helps long wires
Increasing the number of VCs slightly reduces latency
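The first two scaling options can be compared with a quick calculation, assuming N = c * k^n terminals (concentration c, k routers per dimension, n dimensions) and fully connected dimensions, so each router has radix c + n(k - 1). The hybrid approach is not modeled, and the configurations below are hypothetical examples.

```python
def fb_config(terminals: int, concentration: int, dims: int) -> tuple:
    """Return (routers per dimension k, router radix) for a flattened butterfly.

    Assumes terminals = concentration * k ** dims and that every dimension
    is fully connected, so radix = concentration + dims * (k - 1).
    """
    routers = terminals // concentration
    k = round(routers ** (1 / dims))
    if concentration * k ** dims != terminals:
        raise ValueError("terminals must equal concentration * k ** dims")
    return k, concentration + dims * (k - 1)


print(fb_config(64, 4, 2))    # (4, 10): the radix-10 on-chip design
print(fb_config(256, 4, 2))   # (8, 18): scale by widening each dimension
print(fb_config(256, 16, 2))  # (4, 22): scale by raising concentration
print(fb_config(256, 4, 3))   # (4, 13): scale by adding a dimension
```

Adding a dimension keeps the radix lowest here but raises the maximum hop count from 2 to 3, while raising concentration keeps 2 hops at the cost of a higher-radix router.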

Conclusion & Concerns

Flattened butterfly: an interesting idea
  Maximum distance between nodes = 2 hops
  Non-minimal routing to balance load
  Bypass channels to reduce latency
  Lower latency and power, and higher throughput, compared to a mesh

Concerns:
  High channel count? (higher than mesh & torus)
  Low channel utilization? (due to the high channel count)
  Control complexity? (arbitration, control packets)
  Bypass channels: a good idea? (Why not just use non-minimal or minimal routing?)
