
Dynamic Networks

CS 213, LECTURE 15
L.N. Bhuyan
8/6/2014 CS258 S99 2
What Is a Dynamic Network?
A dynamic network is a network that can connect any input to any output by enabling or disabling some of the switches in the network.
Examples:
- Shared bus: the bus arbiter connects a processor to a memory
- Crossbar: consists of many switching elements, which can be enabled to connect many inputs to many outputs simultaneously
- Multistage network: consists of several stages of switches that are enabled to establish connections
- The nodes (routers) of static networks such as a mesh are themselves built from dynamic crossbars
Crossbar Switch Design
Complexity: O(N**2) for an N x N crossbar
[Figure: crossbar switch block diagram. Input ports (receiver, input buffer) feed the crossbar, which drives the output ports (output buffer, transmitter); a control block handles routing and scheduling.]
How Do You Build a Crossbar?
[Figure: 4 x 4 crossbar with inputs I0-I3 and outputs O0-O3; the output multiplexors take their select signals from the control logic]
N**2 switches => cost O(N**2)
Time taken by the arbiter = O(N**2)
The multiplexors are controlled by the arbiter/controller/scheduler
Crossbar Contd.
An N x N crossbar allows all N inputs to be connected simultaneously to N distinct outputs.
It allows all one-to-one mappings, called permutations. No. of permutations = N!
When two or more inputs request the same output, it is called a CONFLICT. Only one of them is connected, and the others are either dropped or buffered.
When processors access memories through a crossbar, this situation is called a memory access conflict.
Given p as the probability that a processor issues a request in a cycle, and assuming that each of the N processors' requests is uniformly directed to all N memories, the average number of connections allowed per cycle, called the bandwidth (BW), is
BW = N{1 - (1 - p/N)**N}   Derive this!!!
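As a numerical check on the BW formula (not a derivation), the sketch below evaluates it and compares it with a direct simulation of the same request model: each of N processors requests with probability p per cycle and picks one of N memories uniformly at random, and every requested memory serves exactly one request while the rest conflict and are dropped. The function names are illustrative.

```python
import random


def crossbar_bw(n: int, p: float) -> float:
    """Slide formula: expected connections per cycle, N * (1 - (1 - p/N)**N)."""
    return n * (1.0 - (1.0 - p / n) ** n)


def simulate_crossbar_bw(n: int, p: float, cycles: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of connections per cycle under the same model.

    Each cycle, every processor requests with probability p and picks one of
    the n memories uniformly at random. Each requested memory serves exactly
    one request; the other requests to it are conflicts and are dropped.
    """
    rng = random.Random(seed)
    served = 0
    for _ in range(cycles):
        requested = set()
        for _processor in range(n):
            if rng.random() < p:
                requested.add(rng.randrange(n))
        served += len(requested)  # one connection per distinct requested memory
    return served / cycles


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, round(crossbar_bw(n, 1.0), 3), round(simulate_crossbar_bw(n, 1.0), 3))
```

For p = 1 and large N, BW/N approaches 1 - 1/e ≈ 0.632, so under this uniform model even an ideal crossbar loses about a third of its potential connections to memory conflicts.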
Input-Buffered Switch
Independent routing logic per input
Scheduler logic arbitrates each output: priority, FIFO, random
Head-of-line blocking problem: the head packet in a buffer cannot depart because its output is busy with another packet. The second packet may be destined to an output that is free, but it cannot depart because it is blocked by the first packet. One solution is to create multiple input queues, one per output, called Virtual Output Queuing (VOQ), which is adopted in most routers.
Scheduler design: how to ensure the maximum number of simultaneous connections is a challenging research area.
[Figure: input-buffered switch; inputs R0-R3 with input buffers, crossbar, output ports, and scheduling logic]
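A minimal saturation experiment makes head-of-line blocking concrete. The sketch assumes every input FIFO is always backlogged, destinations are uniform and independent, and each output grants one head-of-line contender at random per cycle (the random policy listed above). Under these assumptions the per-input throughput of FIFO input queuing is known to fall toward 2 - sqrt(2) ≈ 0.586 as N grows. The function name and parameters are illustrative.

```python
import random
from typing import Dict, List


def hol_throughput(n: int, cycles: int = 20_000, seed: int = 7) -> float:
    """Saturation throughput per input of an N x N FIFO input-queued switch.

    Assumptions: every input always has packets waiting, each new head
    packet picks a destination uniformly at random, and each output grants
    one of its contenders uniformly at random per cycle.
    """
    rng = random.Random(seed)
    # Destination of the head-of-line packet at each input.
    head = [rng.randrange(n) for _ in range(n)]
    delivered = 0
    for _ in range(cycles):
        contenders: Dict[int, List[int]] = {}
        for inp, out in enumerate(head):
            contenders.setdefault(out, []).append(inp)
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            delivered += 1
            # Only the winner's next packet reaches the head; losers stay
            # blocked even if packets behind them want idle outputs.
            head[winner] = rng.randrange(n)
    return delivered / (n * cycles)
```

Running it for N = 2, 4, 8, 16 should show the throughput falling from 0.75 at N = 2 toward the asymptotic value; a VOQ switch removes this particular limit because a blocked head no longer hides packets for other outputs.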
Problems with Input-Buffered Switch
FIFO input buffers give rise to the Head-of-Line (HOL) blocking problem
Current routers employ a separate input queue for each output, called a virtual output queue (VOQ)
Then how do we schedule the packets from different VOQs for transmission?
VOQ-based Input Buffered Switch
Scheduling in Input Buffered Switch
n independent arbitration problems?
static priority, random, round-robin
simplifications due to routing algorithm?
general case is maximum bipartite matching; iterative algorithms such as iSLIP (used by Cisco)
[Figure: inputs R0-R3 with input buffers, crossbar, output ports O0, O1, O2, ...]
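The "max bipartite matching" view can be sketched directly: treat VOQ occupancy as a request matrix (request[i][j] is True when input i has a packet queued for output j) and find the largest set of input/output pairs that share no input and no output. The sketch below uses the classic augmenting-path method (Kuhn's algorithm), which is exact but far too slow for line-rate hardware; the function name and matrix layout are illustrative.

```python
from typing import List, Optional


def max_matching(request: List[List[bool]]) -> List[Optional[int]]:
    """Return match[i] = output granted to input i, or None if unmatched.

    request[i][j] is True when input i has a packet queued for output j
    (for example, a non-empty VOQ). Augmenting paths guarantee the maximum
    possible number of simultaneous connections.
    """
    n_in = len(request)
    n_out = len(request[0]) if request else 0
    owner: List[Optional[int]] = [None] * n_out  # owner[j] = input holding output j

    def try_assign(i: int, seen: List[bool]) -> bool:
        for j in range(n_out):
            if request[i][j] and not seen[j]:
                seen[j] = True
                current = owner[j]
                # Take output j if free, or if its current owner can move elsewhere.
                if current is None or try_assign(current, seen):
                    owner[j] = i
                    return True
        return False

    for i in range(n_in):
        try_assign(i, [False] * n_out)

    match: List[Optional[int]] = [None] * n_in
    for j, i in enumerate(owner):
        if i is not None:
            match[i] = j
    return match
```

For example, with inputs 0, 1, 2 requesting outputs {0, 1}, {0}, {1}, a greedy pass that gives output 0 to input 0 would serve only one packet; the augmenting path moves input 0 to output 1 and serves two. iSLIP approximates such a matching with a few request-grant-accept iterations using round-robin pointers at the inputs and outputs.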
Output Buffered Switch
How would you build a shared pool?
[Figure: output-buffered switch; inputs R0-R3, buffering at each output port, central control]
Output scheduling
n independent arbitration problems?
static priority, random, round-robin
simplifications due to routing algorithm?
general case is max bipartite matching
[Figure: inputs R0-R3 with input buffers, crossbar, output ports O0, O1, O2, ...]
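When each output is arbitrated independently, a round-robin arbiter is one of the simple policies listed above. In this minimal sketch, the input just after the last winner has the highest priority, so every input that keeps requesting is served within N grants; with static priority, input 0 would always win. The class and method names are hypothetical.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Round-robin arbiter for one output port with n requesting inputs."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.last = n - 1  # so that input 0 has priority on the first grant

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the granted input index, or None if nobody is requesting."""
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if requests[candidate]:
                self.last = candidate  # the winner gets lowest priority next time
                return candidate
        return None
```

With requests from inputs 0, 2, and 3 held constant, successive grants go to 0, 2, 3, 0, ...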
Multistage Interconnection Network (MIN)
A crossbar switch is not scalable. How about a network consisting of multiple stages of small crossbar switches? Such a network has the following properties.
An N x N network for N = 2**n:
Consists of log2 N stages of 2x2 switches
Has N/2 2x2 switches per stage
Cost O(N log N) instead of O(N**2) for a crossbar
For N = a**n, a MIN can be similarly designed with a x a switches
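The counts on this slide can be checked with a few lines of arithmetic. The sketch assumes N is an exact power of a and counts stages, switches, and crosspoints (a*a per a x a switch), so the comparison with the N**2 crosspoints of a crossbar is direct. The function name is illustrative.

```python
from typing import Dict


def min_cost(n_ports: int, a: int = 2) -> Dict[str, int]:
    """Stage, switch, and crosspoint counts for an N x N MIN of a x a switches."""
    stages = 0
    size = 1
    while size < n_ports:
        size *= a
        stages += 1
    if size != n_ports:
        raise ValueError("N must be a power of a")
    per_stage = n_ports // a
    switches = stages * per_stage
    return {
        "stages": stages,                      # log_a N
        "switches_per_stage": per_stage,       # N / a
        "switches": switches,                  # (N / a) log_a N
        "crosspoints": switches * a * a,       # a N log_a N, i.e. O(N log N)
        "crossbar_crosspoints": n_ports ** 2,  # O(N**2)
    }
```

For example, min_cost(1024) gives 10 stages and 5120 switches (20480 crosspoints), against 1,048,576 crosspoints for a 1024 x 1024 crossbar.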


Multistage interconnection networks
[Figure: 8 x 8 Omega network, inputs 0-7 and outputs 000-111; tag bits 1, 1, 0 mark an example path]
Complexity: an Omega network has complexity O(N log2 N)
Self routing: the source node generates a tag, which is the binary equivalent of the destination. At each switch, the corresponding tag bit is checked. If the bit is 0, the input is connected to the upper output; if it is 1, the input is connected to the lower output. If both inputs of a switch carry the same bit (both 0 or both 1), there is a switch conflict: one of them is connected, and the other is rejected or buffered at the switch (if the switch has buffers => buffered crossbar).
What is Shuffle?
[Figure: (a) perfect shuffle and (b) inverse perfect shuffle on 8 nodes, 000 (=0) through 111 (=7)]
Perfect shuffle interconnection:
$S(a_{n-1} a_{n-2} \cdots a_1 a_0) = (a_{n-2} a_{n-3} \cdots a_0 a_{n-1})$
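On n-bit addresses the perfect shuffle S is a one-bit left rotation, and the inverse perfect shuffle is a one-bit right rotation. A minimal sketch (function names are illustrative):

```python
def shuffle(x: int, n: int) -> int:
    """Perfect shuffle: rotate the n-bit address x left by one bit."""
    msb = (x >> (n - 1)) & 1
    return ((x << 1) & ((1 << n) - 1)) | msb


def inverse_shuffle(x: int, n: int) -> int:
    """Inverse perfect shuffle: rotate the n-bit address x right by one bit."""
    lsb = x & 1
    return (x >> 1) | (lsb << (n - 1))


if __name__ == "__main__":
    for x in range(8):
        print(f"{x:03b} -> {shuffle(x, 3):03b}")
```

Position k of the next stage is fed by node inverse_shuffle(k, 3), which gives the card-shuffle order 0, 4, 1, 5, 2, 6, 3, 7: the two halves of the nodes are interleaved.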

Omega Network
Every stage of switches is preceded by a perfect
shuffle interconnection
$S(a_{n-1} a_{n-2} \cdots a_1 a_0) = (a_{n-2} a_{n-3} \cdots a_0 a_{n-1})$
An input can be connected to a straight or
exchange output in a 2x2 switch.
$E(a_{n-1} a_{n-2} \cdots a_1 a_0) = (a_{n-1} a_{n-2} \cdots a_1 \bar{a}_0)$
To route a message/packet in an Omega network, the destination tag, which is the binary equivalent of the destination, $(d_{n-1} d_{n-2} \cdots d_1 d_0)$, is used. The $i$th bit $d_i$ controls the routing at the $i$th stage counted from the right, with $0 \le i \le n-1$. If $d_i = 0$, the input is connected to the upper output. If $d_i = 1$, it is connected to the lower output.
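A minimal sketch of destination-tag routing, following the definitions above: before each stage the position is perfect-shuffled, and the 2x2 switch then forces the low-order bit to the tag bit (0 = upper output, 1 = lower output). The leftmost stage uses $d_{n-1}$ and the rightmost uses $d_0$. The function name is illustrative, and conflicts with other packets are ignored here.

```python
from typing import List


def omega_route(src: int, dst: int, n: int) -> List[int]:
    """Positions of a packet after each of the n stages of a 2**n-port Omega network."""
    mask = (1 << n) - 1
    pos = src
    path: List[int] = []
    for stage in range(n):
        pos = ((pos << 1) & mask) | (pos >> (n - 1))  # perfect shuffle S
        bit = (dst >> (n - 1 - stage)) & 1            # tag bit d_{n-1-stage}
        pos = (pos & ~1) | bit                        # straight or exchange (E)
        path.append(pos)
    return path
```

For source 101 and destination 110 in an 8 x 8 network, omega_route(0b101, 0b110, 3) returns [0b011, 0b111, 0b110]: after each stage one more destination bit has been shifted in from the right, until the position equals the destination.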
Self Routing
A processor generates a tag that is the binary equivalent of the destination.
The MSB controls the leftmost stage and the LSB controls the rightmost stage of the Omega network. A small controller inside each 2 x 2 switch senses its bit and enables the connection.
If bit $d_i = 0$, the request is to the upper output; if it is 1, the request is to the lower output.
For switches larger than 2 x 2, routing is based on a digit instead of a bit
Network conflict: select the winner round-robin
Less bandwidth than a crossbar, but more cost-effective
What about QoS? Future research
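To make "less bandwidth than a crossbar" concrete, the sketch below routes a whole permutation through a 2**n-port Omega network and reports a conflict whenever two packets need the same switch output (the same position after the same stage). It then counts the permutations that pass in one pass without any conflict. An 8 x 8 Omega network has 12 two-state switches, so at most 2**12 = 4096 of the 8! = 40320 permutations can pass, whereas a crossbar realizes all of them; the exhaustive count reproduces that bound. Helper names are illustrative.

```python
from itertools import permutations
from typing import Sequence


def _stage(pos: int, bit: int, n: int) -> int:
    """Perfect shuffle, then a 2x2 switch that sets the low bit to the tag bit."""
    mask = (1 << n) - 1
    pos = ((pos << 1) & mask) | (pos >> (n - 1))
    return (pos & ~1) | bit


def passes_without_conflict(perm: Sequence[int], n: int) -> bool:
    """True if routing src -> perm[src] for every source never needs one switch output twice."""
    positions = list(range(1 << n))  # current position of each packet, indexed by source
    for stage in range(n):
        shift = n - 1 - stage  # stage 0 uses the MSB of the destination
        positions = [_stage(pos, (perm[src] >> shift) & 1, n) for src, pos in enumerate(positions)]
        if len(set(positions)) < len(positions):
            return False  # two packets want the same switch output
    return True


def count_passable(n: int) -> int:
    """Number of permutations of 2**n ports that pass the Omega network in one pass."""
    return sum(passes_without_conflict(p, n) for p in permutations(range(1 << n)))
```

count_passable(3) should report 4096, so only about 10% of the permutations a crossbar can realize in one pass get through an 8 x 8 Omega network without a conflict. The identity passes, while swapping destinations 1 and 2 ([0, 2, 1, 3, 4, 5, 6, 7]) already conflicts at the second stage.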

Theorem: The Omega network is self routing
Let the source be $(s_{n-1} s_{n-2} \cdots s_2 s_1 s_0)$ and the destination be $(d_{n-1} d_{n-2} \cdots d_2 d_1 d_0)$. Before Stage 1, the source is switched to the position $(s_{n-2} s_{n-3} \cdots s_1 s_0 s_{n-1})$ due to the perfect shuffle connection.
After Stage 1 it is switched to $(s_{n-2} s_{n-3} \cdots s_1 s_0 d_{n-1})$ as per the $(n-1)$th bit of the destination.
Before the 2nd stage of switches, the source is connected to $(s_{n-3} \cdots s_0 d_{n-1} s_{n-2})$, and after the 2nd stage it becomes $(s_{n-3} \cdots s_0 d_{n-1} d_{n-2})$.
If we continue like this for n stages, the source position becomes $(d_{n-1} d_{n-2} \cdots d_i \cdots d_1 d_0)$, which is the destination.
Switch Size axa
Let N = a**n
The MIN will consist of n stages of axa crossbar
switches with N/a switches per stage.
The routing is based on the destination digits $d_i$ in radix a, with $0 \le d_i \le a-1$
Interconnection based on a-shuffle
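A sketch of the radix-a case, assuming the a-shuffle is a one-digit left rotation of the base-a address (the analogue of the perfect shuffle) and that each a x a switch sets the lowest digit to the current destination digit, most significant digit first. It only checks self-routing empirically for particular sizes, such as the 16 x 16 network of 4 x 4 switches (a = 4, n = 2); it is not a proof. Function names are illustrative.

```python
from typing import List


def to_digits(x: int, a: int, n: int) -> List[int]:
    """The n base-a digits of x, most significant first."""
    digits = []
    for _ in range(n):
        digits.append(x % a)
        x //= a
    return digits[::-1]


def radix_route(src: int, dst: int, a: int, n: int) -> int:
    """Final position of a packet routed through an a**n-port MIN of a x a switches."""
    size = a ** n
    d = to_digits(dst, a, n)
    pos = src
    for stage in range(n):
        pos = (pos * a) % size + pos // (a ** (n - 1))  # a-shuffle: rotate digits left
        pos = pos - pos % a + d[stage]                 # switch selects output d[stage]
    return pos
```

In the 16 x 16 case each packet carries two base-4 digits, one per stage; for instance destination 14 has digits [3, 2], so the first switch picks its output 3 and the second its output 2.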
Homework:
Prove self routing based on radix a. Draw a 16x16 MIN based on 4x4 switches and explain its operation.
Derive the BW of an Omega network with N = a**n, using the same input parameters as the crossbar (Slide 5).

Example: SP
8-port switch, 40 MB/s per link, 8-bit phit, 16-bit flit, single 40 MHz clock
Packet switching, cut-through, no virtual channels, source-based routing
Variable-length packets <= 255 bytes, 31-byte FIFO per input, 7 bytes per output, 16-phit links

[Figure: switch board of a 16-node rack, with intra-rack host ports P0-P15 and inter-rack external switch ports E0-E15; multi-rack configuration]
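The SP link figures are consistent with simple arithmetic: an 8-bit phit transferred on every cycle of a 40 MHz clock gives 40 MB/s, and a 16-bit flit is two phits. The sketch also estimates how long the largest 255-byte packet occupies a link, assuming one phit per clock and ignoring header and flow-control overhead (both assumptions, not stated on the slide).

```python
CLOCK_HZ = 40e6        # single 40 MHz clock
PHIT_BITS = 8          # 8-bit phit; one phit per clock is assumed
FLIT_BITS = 16         # 16-bit flit
MAX_PACKET_BYTES = 255

link_bytes_per_s = CLOCK_HZ * PHIT_BITS / 8  # bytes per second per link
phits_per_flit = FLIT_BITS // PHIT_BITS
# Time for the whole packet to cross one link at one phit per clock.
max_packet_time_s = MAX_PACKET_BYTES * 8 / PHIT_BITS / CLOCK_HZ

print(f"link bandwidth: {link_bytes_per_s / 1e6:.0f} MB/s")
print(f"phits per flit: {phits_per_flit}")
print(f"255-byte packet: {max_packet_time_s * 1e6:.3f} us on the wire")
```

Under these assumptions a maximum-size packet occupies a link for about 6.4 µs.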
Example: IBM SP Vulcan switch
Many Gigabit Ethernet switches use a similar design, without cut-through
[Figure: Vulcan switch datapath. Input ports: FIFO, CRC check, route control, flow control, 8-to-64-bit deserializer. Central queue (64 x 128 RAM) with input and output arbiters, and an 8 x 8 crossbar with crossbar arbitration. Output ports: FIFO, CRC generation, flow control, 64-to-8-bit serializer.]
SGI SPIDER Chip
SPIDER OPERATION
The physical transmission layer for each port is based on a pair
of Source Synchronous Drivers and Receivers (SSD and SSR),
which transmit and receive 20 data bits and a data framing
signal at 400 MBaud.
The data link level guarantees reliable transmission using a
CCITT-CRC code with a go-back-n sliding window protocol [1]
retry mechanism, and is referred to as the Link Level Protocol
(LLP).
The message layer defines 4 virtual channels and a credit
based flow control scheme to support arbitrary message
lengths, as well as a header format to specify message
destination, priority, and congestion control options.
The receive buffers of a port maintain a separate linked list of
messages for each of the 5 possible output ports for each
virtual channel to avoid the block at head of queue bottleneck.




SPIDER Crossbar Arbitration
To maximize bandwidth through the crossbar without using
unreasonable buffering, each virtual channel buffer is
organized as a set of linked lists. There is one linked list for
each possible output port for each virtual channel. This solution
avoids the block at head of queue problem. To maximize
crossbar efficiency, each virtual channel from each port can
request arbitration for every possible destination. Each
arbitration cycle, the arbiter chooses up to 6 winners from as
many as 120 arbitration candidates to maximize crossbar
utilization.
Messages accumulate a network age as they are routed, which increases their priority. To avoid starvation and encourage network fairness, the arbiter is rotated each arbitration cycle to favor the highest-priority requestor. Priority is based on the age field of the message header.
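The slides do not give SPIDER's actual arbitration circuit, so the following is only a hypothetical sketch of the idea: each candidate is an (input port, virtual channel, output port, age) request, the oldest requests are considered first, and each input port and each output port may win at most once per cycle. With 6 ports, 4 virtual channels, and 5 possible output ports per port, that is up to 6 x 4 x 5 = 120 candidates and at most 6 winners. The greedy oldest-first pass, which stands in for the rotating hardware arbiter, and all names below are assumptions.

```python
from typing import List, NamedTuple, Set


class Request(NamedTuple):
    in_port: int
    vc: int
    out_port: int
    age: int  # age field from the message header; larger means older


def arbitrate(requests: List[Request]) -> List[Request]:
    """Hypothetical oldest-first arbiter: at most one winner per input and per output port."""
    winners: List[Request] = []
    busy_in: Set[int] = set()
    busy_out: Set[int] = set()
    # Oldest first; ties broken by port and VC numbers so the result is deterministic.
    for req in sorted(requests, key=lambda r: (-r.age, r.in_port, r.vc, r.out_port)):
        if req.in_port not in busy_in and req.out_port not in busy_out:
            winners.append(req)
            busy_in.add(req.in_port)
            busy_out.add(req.out_port)
    return winners
```

A single greedy pass gives a maximal but not necessarily maximum set of winners; favoring age keeps old messages from starving, at some cost in crossbar utilization.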


Arbitration Contd.
After data is received by the SSR and synchronized, it enters the chip core and begins several operations in parallel. Table lookup and crossbar arbitration are normally serialized, as the exit port must be known before arbitration begins. To parallelize these operations, table lookup is pipelined across SPIDER chips: while arbitration progresses, the table lookup is performed for the next SPIDER chip, which depends on the destination ID and the direction field. This does increase table size, as a full table is required for each neighboring SPIDER chip, but it reduces latency by a full clock. Pipelined tables also add flexibility to possible routes, as different exit ports can be given depending on where a message came from as well as where it is going.


Summary
Routing Algorithms restrict the set of routes
within the topology
simple mechanism selects turn at each hop
arithmetic, selection, lookup
Deadlock-free if channel dependence graph is
acyclic
limit turns to eliminate dependences
add separate channel resources to break dependences
combination of topology, algorithm, and switch design
Deterministic vs. adaptive routing
Switch design issues
input/output/pooled buffering, routing logic, selection logic
Flow control
Real networks are a package of design choices
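The deadlock-freedom condition in the summary can be checked mechanically. In this minimal sketch, each channel is a node, there is an edge c1 -> c2 whenever the routing algorithm lets a packet holding c1 request c2, and a depth-first search looks for a cycle. The channel names in the example are made up for illustration.

```python
from typing import Dict, Hashable, Iterable


def has_cycle(deps: Dict[Hashable, Iterable[Hashable]]) -> bool:
    """True if the channel dependence graph contains a cycle (deadlock is possible)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color: Dict[Hashable, int] = {}

    def visit(c: Hashable) -> bool:
        color[c] = GREY
        for nxt in deps.get(c, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return True  # back edge: a dependence cycle
            if state == WHITE and visit(nxt):
                return True
        color[c] = BLACK
        return False

    return any(color.get(c, WHITE) == WHITE and visit(c) for c in list(deps))


if __name__ == "__main__":
    # Unidirectional 4-node ring: each channel waits on the next one.
    ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}
    # Illustrative fix: packets that wrap around move to a second virtual channel (_1).
    split = {"c0_0": ["c1_0"], "c1_0": ["c2_0"], "c2_0": ["c3_0"],
             "c3_0": ["c0_1"], "c0_1": ["c1_1"], "c1_1": ["c2_1"]}
    print(has_cycle(ring), has_cycle(split))  # True False
```

The second graph illustrates "add separate channel resources to break dependences": the physical ring is unchanged, but the extra virtual channels turn the cycle into a chain.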
