Notes Knockout and Banyan
The knockout switch is a fully connected architecture which attempts to combine the
implementation simplicity of input queuing (buffer complexity is linear in the number of ports)
with the throughput performance of output queuing (permitted input load and saturation
throughput both approaching 100%). The knockout switch architecture achieves this goal
by intentionally introducing a new source of packet loss, known as buffer blocking, in addition to
packet loss mechanisms present in any switch architecture, namely buffer overflow and noise-
induced random channel errors. The rate of loss from buffer blocking can be readily controlled
and kept low, to reduce significantly the complexity of a switch based in principle on the output
queuing idea.
As in the output queuing model, each fixed-length cell arriving at one of the input ports is placed
on a broadcast bus from which each of the output modules taps the cells intended for itself. It is
obvious that multicast and broadcast cells are readily supported. The output module acts as a
statistical multiplexer, deferring cells that cannot be immediately placed onto the output link
because of contention.
Each input to an output module receives the fixed-length cells broadcast on the corresponding
input bus. The job of each packet filter is simply to pass the cell to the concentrator if the cell is
destined for that output, and to mark the cell as inactive otherwise. Such a filter can be easily
implemented by a β-element, only one input and one output of which is used. The role of the
concentrator is to identify among its inputs those cells that are active and route them to its
leftmost outputs, one cell per output line. Note that the concentrator has only L < N outputs.
Should L+1 or more cells arrive simultaneously, only L of them will be processed via the
concentrator; all others will be lost. This is the extra packet loss source in the knockout switch.
By properly choosing L, the loss rate induced by the concentrator can be controlled and
maintained at a reasonably low level. Furthermore, the value of L required to maintain a given
loss rate is relatively small, independent of the number of inputs when the latter is large, and
grows only logarithmically in the loss rate. For example, L = 8 is sufficient to maintain the
packet loss rate in the concentrator at roughly one packet per million, for large N and full input
load, and L grows by only one for each order-of-magnitude reduction in the loss rate (e.g., L = 11
is enough for a loss rate of one packet per billion). This effect is the key to maintaining linear
complexity of the knockout switch, as the number of buffers is proportional to L×N rather than N².
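The dependence of the concentrator loss rate on L can be checked numerically. The following is a minimal sketch, assuming uniform traffic in which each of the N inputs independently sends a cell to a given output with probability ρ/N per time slot. The function name `knockout_loss` and this traffic model are illustrative assumptions, not part of the notes. Under that assumption the number of cells K arriving for one output is binomial, and the fraction of cells lost in the concentrator is E[max(K - L, 0)] / ρ.

```python
def knockout_loss(n, l, rho=1.0):
    """Fraction of cells dropped by an n-input, l-output concentrator.

    Assumes (hypothetically) that each input independently carries a cell
    for this output with probability rho / n in every time slot.
    """
    p = rho / n
    if not 0.0 < p < 1.0:
        raise ValueError("need 0 < rho / n < 1")
    pmf = (1.0 - p) ** n          # P[K = 0]
    expected_excess = 0.0         # accumulates E[max(K - l, 0)]
    for k in range(n):
        # Binomial recurrence: turn P[K = k] into P[K = k + 1].
        pmf *= (n - k) / (k + 1) * p / (1.0 - p)
        if k + 1 > l:
            expected_excess += (k + 1 - l) * pmf
    # Normalize by the mean number of cells offered per slot.
    return expected_excess / rho


if __name__ == "__main__":
    for l in range(6, 13):
        print(l, knockout_loss(1024, l))
```

With N = 1024 and ρ = 1, this model gives a loss on the order of one in a million for L = 8 and on the order of one in a billion for L = 11, in line with the figures quoted above.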
The concentrator inputs receive cells, which have already been passed by the packet filters and
are known to be intended for the switch output port served by the concentrator. There are four
(generally, L) stages in the concentrator shown in the diagram. Each stage is designed to operate
like an elimination tournament. Specifically, each β-element is programmed to set itself to the
"bar" state if there is an active cell on its left input, and to "cross" otherwise. Whenever there is
only one active cell at the inputs of a β-element, it is allowed to pass downward. If both cells are
active, the right-hand one is "knocked out" to the next stage and contends there. Each stage
produces one "winner" among the active cells that enter it, and each subsequent stage receives
one fewer active cell than the previous one. Therefore, when there are k active cells, they are
guaranteed to come out on the outputs of the first k stages.
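The tournament behavior described above can be sketched in a few lines. This is a simplified behavioral model, not the exact wiring of the diagram: cells are paired off round by round, the left cell wins whenever both are active, and every loser is handed to the next stage. The names `run_stage` and `concentrate` are hypothetical, and `None` stands for an inactive cell.

```python
def run_stage(cells):
    """Run one elimination tournament; return (winner, knocked_out_cells)."""
    if not cells:
        return None, []
    losers = []
    current = list(cells)
    while len(current) > 1:
        survivors = []
        for i in range(0, len(current), 2):
            left = current[i]
            right = current[i + 1] if i + 1 < len(current) else None
            if left is not None:
                survivors.append(left)      # left input wins the contest
                if right is not None:
                    losers.append(right)    # right input is knocked out
            else:
                survivors.append(right)     # a lone cell (or nothing) passes
        current = survivors
    return current[0], losers


def concentrate(inputs, num_stages):
    """Return (outputs, dropped): one output per stage, plus the lost cells."""
    outputs = []
    entrants = list(inputs)
    for _ in range(num_stages):
        winner, entrants = run_stage(entrants)
        outputs.append(winner)
    return outputs, entrants


if __name__ == "__main__":
    print(concentrate(["a", None, "b", "c", None, "d", "e", None], 4))
```

In this model, five active cells offered to a four-stage concentrator fill all four outputs and leave one cell dropped, while two active cells occupy only the first two outputs.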
If the packet buffers in the output module diagram were to be loaded directly from the
concentrator outputs, then the leftmost buffers would tend to fill up faster, and might even
overflow despite the presence of empty buffer entries on the right. The shifter prevents that from
happening by spreading each batch of cells arriving at its input cyclically to the right; in other
words, if the last buffer to receive a cell happened to be m, then the next k cells arriving at the
shifter's input will be directed to buffers m+1, ..., m+k (modulo L). Physically, the shifter can be
implemented with an L×L Banyan network.
Because of the round-robin nature of the shifter and the fact that the buffers are filled cyclically,
they can also be emptied cyclically. At each time slot, the output line fetches a cell from a buffer
just right (cyclically) of the buffer last fetched from, beginning with buffer 1. Moreover, if the
output circuitry encounters an empty buffer, the round-robin policy of buffer filling guarantees
that all buffers are empty at that point, and the one just reached is precisely the next one to be
filled again. The output pointer can then just stop there and wait for that buffer to receive a cell,
after which the circular emptying of buffers can restart from that point.
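The cyclic filling and emptying of the buffers can be illustrated with a small model. This is a sketch under stated assumptions: the class name `OutputModule` and its methods are hypothetical, buffers are indexed from 0 rather than 1, and each buffer is treated as unbounded so that overflow is not modeled.

```python
from collections import deque


class OutputModule:
    """L buffers filled round-robin by the shifter and drained round-robin."""

    def __init__(self, num_buffers):
        self.buffers = [deque() for _ in range(num_buffers)]
        self.write_ptr = 0   # next buffer the shifter will fill
        self.read_ptr = 0    # next buffer the output line will fetch from

    def accept(self, cells):
        """Shifter: place a batch of cells into consecutive buffers (mod L)."""
        for cell in cells:
            self.buffers[self.write_ptr].append(cell)
            self.write_ptr = (self.write_ptr + 1) % len(self.buffers)

    def transmit(self):
        """Output line: fetch one cell, or wait (return None) if none is ready."""
        buf = self.buffers[self.read_ptr]
        if not buf:
            return None      # pointer stays here until this buffer is filled
        cell = buf.popleft()
        self.read_ptr = (self.read_ptr + 1) % len(self.buffers)
        return cell


if __name__ == "__main__":
    module = OutputModule(4)
    module.accept([1, 2, 3])
    print(module.transmit())
    module.accept([4, 5])
    print([module.transmit() for _ in range(5)])
```

Because both pointers advance in the same cyclic order, cells leave in the order they arrived, and when the read pointer finds an empty buffer it coincides with the write pointer.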
The Knockout Switch: a full crossbar requires each output port to handle up to n input packets.
Simultaneous inputs for the same output are unlikely, especially in a large switch, so the knockout
switch instead implements each port to accept only l < n packets at the same time. The hard issue is
what value of l to use.
The Knockout switch topology uses a series of cross point switches to select "winners" (cells that
transmit immediately) and "losers" (cells that have to go wait in the buffer) from input
contenders. In the Knockout/Concentrator example shown above, there is a shared buffer system
to hold the cells in the queue. This system works very well until the system is stressed with too
much input, because overflow from the buffers is simply lost.
The Banyan switch is a multistage self-routing architecture which uses fewer β-elements
than the minimum number required for a rearrangeably nonblocking design. More specifically,
an N×N Banyan switch uses (N/2) log₂ N elements. Consequently, the switch cannot be
nonblocking; input-to-output permutations can be constructed that cannot be concurrently routed
with the switch. Therefore, smoothing buffers must lie inside the switch to achieve a reasonably
low packet loss rate.
Several remedies can be employed to attempt resolving this type of routing conflict: (1) Provide
buffers within the β-elements, so that cells that cannot be immediately delivered are stored
and their routing deferred according to some contention resolution policy; (2) Run the internal
links at a rate that is a multiple of the cell arrival rate, sequentially establishing several paths
within the duration of one cell. To give a sense of how effective these techniques are at
reducing the packet loss rate, it suffices to quote the results of a computer simulation of a large
(1024×1024) Banyan switch run at full input load. With the internal links running at twice the
cell rate (hence capable of establishing two successive paths within one time slot) and a buffer
size of 5 cells in each β-element, as many as 92% of the input cells were delivered,
compared to about 25% for a simple unbuffered switch, and about 75% for a
double-rate unbuffered switch. Still, to achieve reasonable packet loss rates (such as one packet
per million), the input load would have to be reduced considerably.
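The figure of about 25% for the unbuffered switch is consistent with a common stage-by-stage estimate. The sketch below assumes uniformly random destinations, independent links, and that one of two contending cells is simply dropped. Under those assumptions, if each input of a 2×2 element carries a cell with probability p, each of its outputs carries one with probability 1 - (1 - p/2)². The function name `unbuffered_banyan_throughput` is hypothetical, and this is an analytic approximation, not the simulation quoted in the notes.

```python
def unbuffered_banyan_throughput(num_stages, load=1.0):
    """Estimated fraction of link capacity still carrying cells after all stages.

    Assumes uniform random destinations and independent links (an
    approximation); a cell that loses a contention is dropped.
    """
    p = load
    for _ in range(num_stages):
        # An output is idle only if neither input sends a cell its way.
        p = 1.0 - (1.0 - p / 2.0) ** 2
    return p


if __name__ == "__main__":
    # A 1024x1024 switch has log2(1024) = 10 stages.
    print(unbuffered_banyan_throughput(10))
```

For 10 stages at full load the estimate comes out at roughly one quarter of the offered cells.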
Banyan switches are based on crossbar switches that have been built into a binary tree topology.
There are many different configurations for Banyan switches. Two possible configurations for
three-stage Banyan switches are shown above. Banyan switches are extremely efficient, but have the
unfortunate problem of BLOCKING. This occurs when two inputs at a switching node are in
contention for the same output and one of the inputs is forced to wait. This situation can be
avoided if the inputs are PRESORTED before entering the Banyan Tree Structure. This topology
is called the Batcher Banyan switching topology and is the next topology on our list.
Banyan Switch:
Clearly it is possible to connect any Ij to any Ok, for j, k = 0, 1, 2, 3.
Example:
Connect I1 to O3:
This means that a Banyan Switch can connect any single input
to any given output.
On the other hand, a Banyan Switch is NOT "non-blocking".
For example, once we connect I1 to O3, we can no longer
connect I0 to O2, i.e. making the connection from I1 to O3,
blocks any possible connection from I0 to O2.
Note: connections shown in red and blue between stages 2 and 3
represent a "perfect shuffle"
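The blocking example can be reproduced with a small self-routing model. This sketch assumes a baseline-style wiring in which each first-stage element serves two adjacent inputs and every stage steers a cell by the next bit of the destination address, most significant bit first. The wiring in the figure may differ, and the names `links_used` and `blocks` are hypothetical. Under this assumption, the link a cell occupies after stage s is identified by the top s bits of its destination together with its source shifted right by s bits.

```python
def links_used(src, dst, n_bits):
    """Set of inter-stage links occupied by a connection src -> dst.

    Assumes a baseline-style banyan with 2**n_bits ports: after stage s the
    cell sits on the link labelled (s, top s bits of dst, src >> s).
    """
    return {(s, dst >> (n_bits - s), src >> s) for s in range(1, n_bits + 1)}


def blocks(conn_a, conn_b, n_bits):
    """True if the two (src, dst) connections need a common link."""
    return bool(links_used(*conn_a, n_bits) & links_used(*conn_b, n_bits))


if __name__ == "__main__":
    # 4x4 switch: two address bits.
    print(blocks((1, 3), (0, 2), 2))  # I1->O3 against I0->O2
    print(blocks((1, 3), (0, 0), 2))  # I1->O3 against I0->O0
```

In this model I1 to O3 and I0 to O2 both need the lower output of the first-stage element shared by I0 and I1, so they conflict, whereas I1 to O3 and I0 to O0 leave that element on different outputs and can coexist.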
For N = 16: (N/2) log₂ N = (16/2) × log₂ 16 = 8 × 4 = 32 β-elements
Problem: Blocking