
The Knockout Switch

The Batcher-Banyan Switch

The Knockout Switch

The knockout switch is a fully connected architecture which attempts to combine the
implementation simplicity of input queuing (buffer complexity is linear in the number of ports)
with the throughput performance of output queuing (permitted input load and saturation
throughput both approaching 100%). The knockout switch architecture achieves this goal
by intentionally introducing a new source of packet loss, known as buffer blocking, in addition to
packet loss mechanisms present in any switch architecture, namely buffer overflow and noise-
induced random channel errors. The rate of loss from buffer blocking can be readily controlled
and kept low, to reduce significantly the complexity of a switch based in principle on the output
queuing idea.

The knockout switch architecture is explained in the following set of diagrams.

As in the output queuing model, each fixed-length cell arriving at one of the input ports is placed
on a broadcast bus from which each of the output modules taps the cells intended for itself.
Because every output module sees every input bus, multicast and broadcast cells are readily
supported. The output module acts as a
statistical multiplexer, deferring cells that cannot be immediately placed onto the output link
because of contention.

Each input to an output module receives the fixed-length cells broadcast on the corresponding
input bus. The job of each packet filter is simply to pass the cell to the concentrator if the cell is
destined for that output, and to mark the cell as inactive otherwise. Such a filter can be easily
implemented by a β-element, only one input and one output of which is used. The role of the
concentrator is to identify among its inputs those cells that are active and route them to its
leftmost outputs, one cell per output line. Note that the concentrator has only L<N outputs.
Should L+1 or more cells arrive simultaneously, only L of them will be processed via the
concentrator; all others will be lost. This is the extra packet loss source in the knockout switch.
By properly choosing L, the loss rate induced by the concentrator can be controlled and
maintained at a reasonably low level. Furthermore, the value of L required to maintain a given
loss rate is relatively small, independent of the number of inputs when the latter is large, and
grows only logarithmically in the loss rate. For example, L = 8 is sufficient to maintain the
packet loss rate in the concentrator at about one packet per million, for large N and full input
load, and L grows by only one for each order-of-magnitude reduction in the loss rate (i.e. L = 11
is enough for a loss rate of about one packet per billion). This effect is the key to maintaining
linear complexity
of the knockout switch, as the number of buffers is proportional to L×N rather than N².
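The dependence of the concentrator loss on L can be checked numerically. The sketch below is not taken from the text above; it assumes the usual simplified traffic model in which each of the N inputs independently carries a cell for a given output with probability rho/N in every slot, so the number K of cells contending for that output is binomial (Poisson as N grows). The function names and the `terms` parameter are illustrative only.

```python
import math

def knockout_loss(L, N, rho=1.0):
    """Fraction of cells dropped by one N-input, L-output concentrator.

    Assumed model: K ~ Binomial(N, rho/N) cells contend per slot, and the
    concentrator drops max(K - L, 0) of them. Requires rho < N.
    """
    p = rho / N                      # chance that one input targets this output
    pmf = (1.0 - p) ** N             # P(K = 0)
    lost = 0.0
    for k in range(N):
        pmf *= (N - k) / (k + 1) * p / (1.0 - p)   # P(K = k + 1) from P(K = k)
        if k + 1 > L:
            lost += (k + 1 - L) * pmf
    return lost / rho                # lost cells per offered cell

def knockout_loss_large_n(L, rho=1.0, terms=60):
    """Same quantity in the large-N limit, where K ~ Poisson(rho)."""
    pmf = math.exp(-rho) * rho ** L / math.factorial(L)   # P(K = L)
    lost = 0.0
    for k in range(L + 1, L + terms + 1):
        pmf *= rho / k               # P(K = k) from P(K = k - 1)
        lost += (k - L) * pmf
    return lost / rho

if __name__ == "__main__":
    for L in (8, 9, 10, 11):
        print(L, knockout_loss(L, 1024), knockout_loss_large_n(L))
```

Under these assumptions the loss at full load is on the order of 10^-6 for L = 8 and 10^-9 for L = 11, and the finite-N value stays close to the large-N limit, which is why L can be chosen almost independently of N.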
The concentrator inputs receive cells that have already been passed by the packet filters and
are known to be intended for the switch output port served by the concentrator. There are four
(generally, L) stages in the concentrator shown in the diagram. Each stage is designed to operate
like an elimination tournament. Specifically, each β-element is programmed to set itself to the
"bar" state if there is an active cell on its left input, and to "cross" otherwise. Whenever there is
only one active cell at the inputs of a β-element, it is allowed to pass downward. If both cells are
active, the right-hand one is "knocked out" to the next stage and contends there. Each stage
produces one "winner" among the active cells that enter it, and each subsequent stage receives
one fewer active cell than the previous one. Therefore, when there are k ≤ L active cells, they are
guaranteed to come out on the outputs of the first k stages.
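The tournament behaviour can be illustrated with a small behavioural model. This is only a sketch: it abstracts away the actual wiring and delay elements of the concentrator, treats `None` as an inactive cell, and uses the hypothetical name `knockout_concentrate`. Within each pairing the left contender wins and the right one is knocked out to the next stage, as described above.

```python
def knockout_concentrate(cells, L):
    """Return (winners, dropped) for one time slot of an L-stage concentrator."""
    contenders = [c for c in cells if c is not None]   # active cells only
    winners = []
    for _ in range(L):                  # one elimination tournament per stage
        if not contenders:
            break
        knocked_out = []
        current = contenders
        while len(current) > 1:         # play rounds until one cell remains
            survivors = []
            for i in range(0, len(current) - 1, 2):
                survivors.append(current[i])          # left cell passes on
                knocked_out.append(current[i + 1])    # right cell goes to next stage
            if len(current) % 2:
                survivors.append(current[-1])         # unpaired cell advances
            current = survivors
        winners.append(current[0])      # this stage's single winner
        contenders = knocked_out        # losers contend in the next stage
    return winners, contenders          # cells still contending after L stages are lost

if __name__ == "__main__":
    slot = ["a", None, "b", "c", None, "d", "e", None]
    print(knockout_concentrate(slot, L=4))
```

With five active cells and L = 4, four cells emerge (one per stage) and one is dropped; with k ≤ L active cells nothing is dropped and only the first k stages produce output.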

If the packet buffers in the output module diagram were to be loaded directly from the
concentrator outputs, then the leftmost buffers would tend to fill up faster, and might even
overflow despite the presence of empty buffer entries on the right. The shifter prevents that from
happening by spreading each batch of cells arriving at its input consecutively to the right; in other
words, if the last buffer to receive a cell was buffer m, then the next k cells arriving at the
shifter's input will be directed to buffers m+1, ..., m+k (modulo L). Physically, the shifter can be
implemented with an L×L Banyan network.

Because of the round-robin nature of the shifter and the fact that the buffers are filled cyclically,
they can also be emptied cyclically. At each time slot, the output line fetches a cell from a buffer
just right (cyclically) of the buffer last fetched from, beginning with buffer 1. Moreover, if the
output circuitry encounters an empty buffer, the round-robin policy of buffer filling guarantees
that all buffers are empty at that point, and the one just reached is precisely the next one to be
filled again. The output pointer can then just stop there and wait for that buffer to receive a cell,
after which the circular emptying of buffers can restart from that point.
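The cyclic filling and emptying described above can be sketched as a pair of pointers over L FIFO buffers. The class and method names are hypothetical, buffers are indexed from 0 rather than 1, and buffer capacity is left unbounded, so overflow is not modelled.

```python
from collections import deque

class KnockoutOutputBuffers:
    """L buffers filled round-robin by the shifter and emptied round-robin."""

    def __init__(self, L):
        self.buffers = [deque() for _ in range(L)]
        self.write_ptr = 0    # buffer the shifter will fill next
        self.read_ptr = 0     # buffer the output line will read next

    def shift_in(self, cells):
        """Store the cells of one time slot in consecutive buffers (modulo L)."""
        for cell in cells:
            self.buffers[self.write_ptr].append(cell)
            self.write_ptr = (self.write_ptr + 1) % len(self.buffers)

    def transmit(self):
        """Fetch one cell for the output link, or None if nothing is waiting."""
        buf = self.buffers[self.read_ptr]
        if not buf:
            return None       # all buffers are empty; wait at this buffer
        cell = buf.popleft()
        self.read_ptr = (self.read_ptr + 1) % len(self.buffers)
        return cell

if __name__ == "__main__":
    q = KnockoutOutputBuffers(4)
    q.shift_in(["a", "b", "c"])
    print(q.transmit())                       # first cell stored is first out
    q.shift_in(["d", "e", "f"])
    print([q.transmit() for _ in range(6)])
```

Because both pointers advance in the same cyclic order, cells leave in the order they were stored, and when the read pointer finds an empty buffer it is standing on the buffer the shifter will fill next.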

Knockout switch in brief: a full crossbar requires each output port to handle up to n input
packets at once. In practice, n simultaneous inputs for the same output are unlikely, especially in
a large switch. The knockout switch instead implements each output port to accept only l < n
packets at the same time. The hard issue is what value of l to use.

Knockout Switch Topology


Knockout switch.

The Knockout switch topology uses a series of crosspoint switches to select "winners" (cells that
transmit immediately) and "losers" (cells that have to wait in the buffer) from input
contenders. In the Knockout/Concentrator example shown above, there is a shared buffer system
to hold the cells in the queue. This system works very well until it is stressed with too
much input, at which point any overflow from the buffers is simply lost.

The Banyan Switch

The Banyan switch is a multistage self-routing architecture which uses fewer β-elements
than the minimum number required for a rearrangeably nonblocking design. More specifically,
an N×N Banyan switch uses (N/2) log₂ N elements. Consequently, the switch cannot be
nonblocking; input-to-output permutations can be constructed that cannot be concurrently routed
with the switch. Therefore, smoothing buffers must lie inside the switch to achieve a reasonably
low packet loss rate.

The structure of an 8×8 Banyan switch is depicted in the diagram.


We see that the β-elements are arranged in three columns of four elements each, in a
pattern that resembles a grid of butterflies. The inputs to the switch are the inputs to the elements
in the first column, and the outputs of the last column are the outputs from the switch. In each
β-element, one output is connected to the input of the element just horizontally on its right,
and the other goes to an element whose line number, represented in binary, differs in precisely
the j-th bit, where j is the column number of the element (counting from 0). For example, the
outputs of element (2,1) (bold in the diagram) are connected to the inputs of elements (2,2)
(horizontal connection) and (0,2) (diagonal connection), as the numbers 2 and 0 differ in bit #1
of their binary representation. This simple rule also tells how to construct a path from any input
to any output: in each column j, the appropriate β-element should be set to the "bar" state if
the j-th bits of the input and the output numbers are equal, and to the "cross" state if those bits
differ. The path shown in bold in the diagram illustrates how to connect input 7 to output 0. Since
all the bits in the binary representations of the input and the output differ, all elements along the
path are set to "cross". Note that every such path is unique. Several paths can be routed
concurrently only if they require the same state of every β-element they share. Thus,
in our case, once input 7 is connected to output 0, input 6 cannot be connected to outputs 2, 4,
and 6, because any of these connections would require the element in the first column to be set to
"bar".

Several remedies can be employed to resolve this type of routing conflict: (1) Provide
buffers within the β-elements, so that cells that cannot be immediately delivered are stored
and their routing deferred according to some contention resolution policy; (2) Run the internal
links at a rate that is a multiple of the cell arrival rate, sequentially establishing several paths
within the duration of one cell. To give a sense of how effective these techniques can be in
reducing the packet loss rate, it suffices to quote the results of a computer simulation for a large
(1024×1024) Banyan switch run at full input load. With the internal links running at twice the
cell rate (hence capable of establishing two successive paths within one time slot) and a buffer
size of 5 cells in each β-element, as many as 92% of the input cells were delivered,
compared to about 25% for a simple unbuffered switch, and about 75% for a
double-rate unbuffered switch. Still, to achieve reasonable packet loss rates (such as one packet
per million), the input load would have to be reduced considerably.

Banyan Switch Topology

Banyan 3 stage switch.

Banyan 3 stage switch.

Banyan switches are based on crossbar switches that have been built into a binary tree topology.
There are many different configurations for Banyan switches. Two possible configurations for
3-stage Banyan switches are shown above. Banyan switches are extremely efficient, but have the
unfortunate problem of BLOCKING. This occurs when two inputs at a switching node are in
contention for the same output and one of the inputs is forced to wait. This situation can be
avoided if the inputs are PRESORTED before entering the Banyan tree structure. This topology
is called the Batcher-Banyan switching topology and is the next topology on our list.

Batcher Banyan Switch Topology

Batcher Banyan Switch


The Batcher sorting procedure involves 3 levels of sorting to produce NONBLOCKING input
for a 3 stage Banyan network. The price for removing the wait states in the Banyan network is
more nodes (more cost) and a longer travel time through the greater number of nodes. These
switches are however much faster than simple Banyan switches, and of course, much more
expensive.
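The text does not spell out the sorting network itself. As an illustration only, the sketch below uses Batcher's bitonic sorting network, one standard way to build such a sorter, to order cells by destination address and count the compare-exchange elements used. The function name is hypothetical and the input length is assumed to be a power of two.

```python
def bitonic_sort(destinations):
    """Sort with a bitonic network; return (sorted list, comparators used)."""
    a = list(destinations)
    n = len(a)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    comparators = 0
    k = 2
    while k <= n:                  # one merging level per doubling of k
        j = k // 2
        while j >= 1:              # each level is a cascade of comparator columns
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    comparators += 1
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a, comparators

if __name__ == "__main__":
    print(bitonic_sort([5, 2, 7, 0, 3, 6, 1, 4]))
```

For 8 inputs the outer loop runs three times, matching the three levels of sorting mentioned above, and the network uses 24 compare-exchange elements in six columns, compared with 12 elements for the 3-stage Banyan network it feeds. This is the extra cost and path length referred to in the text.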

To date we have studied:

1. NxN Crossbar
2. NxN Time Division Multiplexed (TDM) shared memory switch

An NxN Crossbar Switch requires N^2 2x2 "Beta Elements".

2x2 Beta Element

Banyan Switch:
Clearly it is possible to connect any Ij to any Ok, for j, k = 0, 1, 2, 3.
Example:

Connect I1 to O3:
This means that a Banyan Switch can connect any single input
to any given output.
On the other hand, a Banyan Switch is NOT "non-blocking".
For example, once we connect I1 to O3, we can no longer
connect I0 to O2; that is, making the connection from I1 to O3
blocks any possible connection from I0 to O2.
Note: the connections shown in red and blue between stages 2 and 3
represent a "perfect shuffle".

For N = 2^k, an NxN Banyan switch has:

k = log2 N columns,
each with N/2 Beta elements.
Total = (N/2) log2 N Beta elements

Example:

N = 16
(N/2) log2 N = (16/2) log2 16 = 8 x 4 = 32 Beta elements
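The element counts can be compared directly. The helper names below are illustrative; the formulas are the ones stated above, (N/2) log2 N for the Banyan switch and N^2 for the crossbar.

```python
def banyan_elements(n):
    """Beta elements in an NxN Banyan switch: (N/2) * log2(N), N a power of two."""
    k = n.bit_length() - 1            # k = log2(N) when N is a power of two
    if n < 2 or (1 << k) != n:
        raise ValueError("N must be a power of two, at least 2")
    return (n // 2) * k

def crossbar_elements(n):
    """Beta elements in an NxN crossbar: N^2."""
    return n * n

if __name__ == "__main__":
    for n in (4, 8, 16, 1024):
        print(n, banyan_elements(n), crossbar_elements(n))
```

For N = 16 this gives 32 elements against 256 for the crossbar, and for N = 8 it gives 12, which is three columns of four elements.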

Problem: Blocking

If, in a given time slot, I1 is "routed" in a downward direction
towards another Beta element (i.e. the Beta element is in the "bar"
state), then I0 cannot be "routed" down to Oj for j >= N/2.
One Solution: Buffers in the Beta elements
Limits to this design:
- expensive
- buffer overflow still possible
Another Solution: Fast Links
For example: double the speed or the number of links from
column 1 to column 2. Now 2 cells can arrive at one input to
column 2 in the same time slot.
To avoid dropping cells, we then need 4 links from each output
of stage 2 to each input of stage 3.
Generally, we could multiply all link speeds by 2^m; if m >= k-1,
then no loss or blocking occurs (k = total number of stages).
