Low-Latency Virtual-Channel Routers For On-Chip Networks: Robert Mullins, Andrew West, Simon Moore
Outline
Motivation
Why Network-on-chip (NoC)
Why NoC
Bus-based interconnects were sufficient until now, but no longer
A shared bus is slow (it arbitrates between several requesters)
More components increase loading, so speed drops further
Ad-hoc routing of wires results in back-end complications, lower performance and higher power consumption
Challenges
Topologies, routing protocol
Network and router design with small footprint and low latency
The need to put repeaters into long wires allows us to add the switching needed to implement a network at little additional cost
Makes efficient use of critical global wiring resources by sharing them across different senders and receivers
Simplifies overall design
Design a single router and copy-paste it in both dimensions
A typical NoC
Layered design of reconfigurable micronetworks
Exploits methods and tools used for general networks
Micronetworks based on the ISO/OSI model
NoC architecture consists of Physical, Data link, and Network layers
Transport: implemented in cores, enables end-to-end reliable transport
Network: multi-hop route setup, packet addressing, etc.
Data link: contention issues, reliability issues, grouping of physical layer bits, e.g. flits
A NoC topology
Cores communicate with each other using the NoC
The NoC consists of routers (R) and network interfaces (NI)
An NI is linked to a router by non-pipelined wires
One or more cores are connected to an NI
Mesh
Routing protocols
We will only consider mesh topology
Objective is to find a path from a source to a destination
Greedy algorithms (deterministic)
Choose shortest path (e.g. X-Y)
Adaptive routing
If there is congestion, choose an alternative path
Deflection routing
Is adaptive better than greedy? => NOT REALLY (when only local information is used)
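The X-Y example of a greedy, deterministic route can be sketched as follows. This is only an illustration: the `(x, y)` coordinate convention, the port names and the function names `xy_next_hop` and `xy_route` are assumptions made here, not taken from the slides.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh (assumed convention)

def xy_next_hop(cur: Coord, dst: Coord) -> str:
    """Return the output port a deterministic X-Y router would choose."""
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:                      # first remove the X offset
        return "EAST" if dx > cx else "WEST"
    if cy != dy:                      # then remove the Y offset
        return "NORTH" if dy > cy else "SOUTH"
    return "LOCAL"                    # already at the destination router

def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """List every router visited on the X-Y path from src to dst."""
    step = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}
    path = [src]
    cur = src
    while cur != dst:
        sx, sy = step[xy_next_hop(cur, dst)]
        cur = (cur[0] + sx, cur[1] + sy)
        path.append(cur)
    return path
```

The path length always equals the Manhattan distance between source and destination, so the route is a shortest path; it never reacts to congestion, which is what distinguishes it from the adaptive schemes.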
Switching techniques
Store-and-forward policy (packet switching): each switch waits for the full packet to arrive before sending it to the next switch
Cut-through routing or wormhole routing: the switch examines the header, decides where to send the message, and then starts forwarding it immediately
In wormhole routing, when the head of a message is blocked, the message stays strung out over the network, potentially blocking other messages (needs to buffer only the piece of the packet that is sent between switches)
Cut-through routing lets the tail continue when the head is blocked, storing the whole message in an intermediate switch (needs a buffer large enough to hold the largest packet)
With virtual channels, deadlock can be avoided
Move message and reply on different channels => there will never be a loop on a single channel
Can exploit a far greater number of pins and wires
May use fat data and flow control wires
Objective: design routers with minimal latency
This will also result in smaller buffers
Routing Logic
Three possibilities
Return a single VC
Return a set of VCs on a single port
Return any VCs (on any port)
VC Allocation
VC Allocation Logic
At every outgoing VC, the following logic is needed
Switch Allocation
Individual flits at input VCs arbitrate for access to the crossbar ports
Arbitration can be performed in two stages
First stage
A VC among the V possible VCs at every input port is selected
V:1 arbiter at every input port
Second stage
The winning VC at every input port is matched to an output port
P:1 arbiter at every output port
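The two stages above can be sketched roughly as below. This is a behavioural illustration only: the request encoding `req[p][v]`, the function names and the use of a fixed-priority (lowest index wins) arbiter in both stages are assumptions made here, not the arbiters a real router would necessarily use.

```python
from typing import Dict, List, Optional, Tuple

def fixed_priority_arbiter(requests: List[bool]) -> Optional[int]:
    """Grant the lowest-numbered active request (assumed policy, for brevity)."""
    for i, r in enumerate(requests):
        if r:
            return i
    return None

def switch_allocate(req: List[List[Optional[int]]],
                    num_outputs: int) -> Dict[int, Tuple[int, int]]:
    """req[p][v] is the output port wanted by VC v at input port p, or None.

    Returns {output_port: (input_port, vc)} for the granted connections.
    """
    # First stage: a V:1 arbiter at every input port selects one VC.
    stage1 = [fixed_priority_arbiter([o is not None for o in vcs]) for vcs in req]

    # Second stage: a P:1 arbiter at every output port selects one input port
    # among the first-stage winners that want this output.
    grants: Dict[int, Tuple[int, int]] = {}
    for out in range(num_outputs):
        bids = [w is not None and req[p][w] == out for p, w in enumerate(stage1)]
        p = fixed_priority_arbiter(bids)
        if p is not None:
            grants[out] = (p, stage1[p])
    return grants
```

For example, with `req = [[2, None], [None, 2], [0, 1]]` and three outputs, input ports 0 and 1 both present a winner for output 2, so only one of them is granted, while input port 2 obtains output 0.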
Switch Allocation
Issues
VC allocation and switch allocation are serialized
Either a flit takes 2 clocks to get through
Or else the clock speed will be low
Solution: speculative switch allocation
An even better idea is to perform speculative and non-speculative switch allocation in parallel
Non-speculative allocation has higher priority
Note that non-speculative allocation is done for input VCs which have already been allocated an output VC
Speculative allocation will work
Mostly one cycle delay under light load
Mostly one cycle delay under heavy load
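A rough sketch of running both allocations in parallel and letting the non-speculative result win. The names `arbitrate` and `allocate_with_speculation`, the dictionary encoding of bids, the lowest-index arbitration policy and the `vc_granted` set are all assumptions made for this illustration; it also assumes each input port bids for at most one output per class.

```python
from typing import Dict, List, Set

def arbitrate(bids: Dict[int, List[int]]) -> Dict[int, int]:
    """Per output port, grant the lowest-numbered bidding input port (assumed policy)."""
    return {out: min(inputs) for out, inputs in bids.items() if inputs}

def allocate_with_speculation(nonspec: Dict[int, List[int]],
                              spec: Dict[int, List[int]],
                              vc_granted: Set[int]) -> Dict[int, int]:
    """Return {output_port: input_port} for this cycle.

    nonspec: bids from input VCs that already hold an output VC.
    spec:    bids from input VCs still waiting for VC allocation.
    vc_granted: input ports whose VC allocation succeeded this cycle.
    """
    ns_grants = arbitrate(nonspec)   # conceptually evaluated in parallel
    sp_grants = arbitrate(spec)      # with the speculative allocator

    final = dict(ns_grants)          # non-speculative grants always stand
    busy_inputs = set(ns_grants.values())
    for out, inp in sp_grants.items():
        conflict = out in ns_grants or inp in busy_inputs
        if not conflict and inp in vc_granted:
            final[out] = inp         # speculation paid off: no extra cycle
    return final
```

A speculative grant is dropped whenever it collides with a non-speculative one or the matching VC allocation fails, so speculation can only remove a cycle and never steals bandwidth from flits that already own a VC.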
Further Enhancement
Is it possible to have zero-cycle VC/switch allocation?
YES, most of the time; that's what this paper is about!
If somehow you know the arbitration results before flits actually arrive and fight for the VC and switch
I mean, every arbitration decision
VC allocation
Switch allocation
Etc.
Tree Arbiters
Implement large arbiters using a tree of small arbiters
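A minimal two-level sketch of the idea, assuming fixed-priority small arbiters and an illustrative `group_size` parameter (neither is specified in the slides): leaf arbiters each choose within a small group, and a root arbiter chooses among the groups that have a winner.

```python
from typing import List, Optional

def small_arbiter(requests: List[bool]) -> Optional[int]:
    """Small fixed-priority arbiter: lowest active index wins (assumed policy)."""
    for i, r in enumerate(requests):
        if r:
            return i
    return None

def tree_arbiter(requests: List[bool], group_size: int) -> Optional[int]:
    """Arbitrate among many requests using one leaf arbiter per group plus a root."""
    groups = [requests[i:i + group_size]
              for i in range(0, len(requests), group_size)]
    local = [small_arbiter(g) for g in groups]            # leaf arbiters
    g = small_arbiter([w is not None for w in local])     # root arbiter
    if g is None:
        return None
    return g * group_size + local[g]                      # global index of winner
```

The leaf arbiters do not depend on each other, so in hardware they can evaluate at the same time as the root, which is where the speed advantage over one flat arbiter comes from.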
Matrix Arbiters
Fair and fast arbiter implementation
A Matrix Arbiter
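A behavioural sketch of a matrix arbiter with the usual least-recently-served update. The class name, the initial priority order and the method names are assumptions for this illustration; the slides do not give the exact circuit.

```python
from typing import List, Optional

class MatrixArbiter:
    def __init__(self, n: int) -> None:
        self.n = n
        # prio[i][j] is True when requester i currently beats requester j.
        # Assumed initial order: lower index has higher priority.
        self.prio = [[i < j for j in range(n)] for i in range(n)]

    def arbitrate(self, requests: List[bool]) -> Optional[int]:
        """Grant the requester that no other active requester beats."""
        for i in range(self.n):
            if requests[i] and not any(
                requests[j] and self.prio[j][i]
                for j in range(self.n) if j != i
            ):
                self._update(i)
                return i
        return None

    def _update(self, winner: int) -> None:
        """Make the winner the lowest-priority requester for future rounds."""
        for j in range(self.n):
            if j != winner:
                self.prio[winner][j] = False
                self.prio[j][winner] = True
```

With three requesters that all keep requesting, successive calls grant 0, 1, 2, 0, which illustrates the fairness property: a requester that has just been served cannot win again until every other active requester has been served.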
Safe Environment
Only one request may arrive in a cycle
Thus it is safe to assert all grant enables
Thus the grant can still be generated in the same cycle
Unsafe Environment
Multiple requests may arrive in the same cycle
Can still assert all grants
But need to abort when multiple requests arrive in the same cycle
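The safe and unsafe cases can be modelled in a few lines. This is a behavioural sketch under assumptions: the function name and the `(winner, aborted)` return convention are made up here, and what the router does after an abort is not spelled out on these slides, so it is left to the caller.

```python
from typing import List, Optional, Tuple

def pre_enabled_grant(requests: List[bool]) -> Tuple[Optional[int], bool]:
    """All grant enables are asserted before any request arrives.

    Returns (winner, aborted):
      - exactly one arrival: it is granted in the same cycle, no abort
      - several arrivals:    the grants conflict, so the cycle is aborted
      - no arrival:          nothing happens
    """
    arrivals = [i for i, r in enumerate(requests) if r]
    if len(arrivals) == 1:
        return arrivals[0], False
    if len(arrivals) > 1:
        return None, True    # unsafe case: caller must arbitrate properly
    return None, False
```

In a safe environment the second branch can never be taken, which is why no abort logic is needed there.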
All first-stage V:1 arbiters operate under a safe environment
However, the P:1 arbiters don't
Why will it work?
Because in a lightly loaded network, multiple requests for the same VC/port will rarely arrive together (few aborts)
In a heavily loaded network, flits will remain buffered and non-speculative arbitration (higher priority) will happen most of the time
Few aborts again
Final design
They have sampled a NoC-based ASIC last week using this idea
Runs at several GHz
Note that fast cycle time is possible by
Running VC allocation and switch allocation in parallel
Must use speculation, else delay will be higher (1 more cycle)
Simulation results
If (doubts) Then Ask; Else Thank you; Goto Discussion; End if;