ECE655/Krishna Part.6 .1
Fault-Tolerant Networks
Multiple paths connecting the source to the
destination of a message Spare nodes that can be switched in to replace failed units Fault-tolerant topologies Extra-Stage Multi-Stage Networks Interstitial Mesh Redundant Crossbar Hypercube Point-to-Point Networks
Multi-Stage Network
Non-fault-tolerant multi-stage network (butterfly network) - typically built out of 2x2 switchboxes, each with two inputs and two outputs

Butterfly Network
A k-stage network (k=3 in the example shown):
- 2^k inputs and 2^k outputs
- k stages of 2^(k-1) switchboxes each
- Connections follow a recursive pattern from input to output
- Input stage: the top output line of each switchbox is connected to the input lines of one 2^(k-1) x 2^(k-1) butterfly, and the bottom output line of each switchbox is connected to the input lines of another 2^(k-1) x 2^(k-1) butterfly
- Output stage: the top output line of each sub-butterfly's switchboxes is connected to one 2x2 switchbox, and the bottom output line to another 2x2 switchbox
A switchbox in stage i has its two lines numbered 2^i apart. Output line j of every stage goes into input line j of the following stage (j = 0,...,2^k - 1). The line numbers in any box other than at the input stage are both of the same parity (both even or both odd).
The butterfly is not fault-tolerant: there is only one path from any given input to a specific output. If a switchbox in stage i fails, 2^(i+1) inputs will no longer be connected to 2^(k-i) outputs. The system can still operate, but in a degraded mode.
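The single-path property and the effect of a failed switchbox can be checked with a small simulation. The sketch below uses an omega-style shuffle-exchange routing model - an assumption for illustration, since it shares the butterfly's unique-path property even though its exact wiring differs from the figure - with a hypothetical network size and failed switch:

```python
K = 3
N = 2 ** K  # 2^k = 8 inputs and outputs

def path(src: int, dst: int):
    """Return the unique (stage, switchbox) sequence from src to dst.

    Routing is deterministic (one destination bit is consumed per stage),
    so there is exactly one path per (src, dst) pair.
    """
    line = src
    hops = []
    for stage in range(K):
        # shuffle, then replace the low bit with the next destination bit
        line = ((line << 1) | ((dst >> (K - 1 - stage)) & 1)) % N
        hops.append((stage, line // 2))  # each 2x2 switchbox serves two lines
    return hops

print(path(5, 3))  # [(0, 1), (1, 2), (2, 1)]

# A failed switchbox disconnects every pair whose unique path uses it:
failed = (1, 2)  # hypothetical failure: switchbox 2 in stage 1
lost = [(s, d) for s in range(N) for d in range(N) if failed in path(s, d)]
print(len(lost))  # 16 pairs: 4 upstream inputs x 4 downstream outputs
```

All other (src, dst) pairs still communicate, which is the "degraded mode" of operation.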
Extra-Stage Network
An extra stage, duplicating stage 0, is added at the input. Bypass multiplexors around the switchboxes at the input and output stages allow a failed switch to be bypassed by routing around it.
Example: the stage-0 switchbox carrying lines 2,3 fails. Since it is duplicated by the extra stage, the failed box can be bypassed: the multiplexors are set so that input line 0 is switched to output line 1 and input line 4 to output line 5, bypassing the failed switchbox.
Definition of Measures
Bandwidth BW(t) - expected number of processors, at time t, which are communicating with some memory
Connectivity Q(t) - expected number, at time t, of operational processor-to-memory paths; an operational path includes a processor, a memory, and the links connecting them - all fault-free
A processor (memory) is accessible (at time t) if it is fault-free and is connected to at least one fault-free memory (processor)
Additional connectivity measure - the tuple (A_p(t), A_m(t)) - the expected numbers, at time t, of accessible processors and accessible memories
Another measure - (N_m(t), N_p(t)) - the expected number, at time t, of fault-free memories (processors) to which an accessible processor (memory) is connected
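These measures can be estimated by Monte Carlo simulation. The sketch below uses a toy system in which every processor has a dedicated link to every memory - a simplifying assumption, not the multi-stage network above - and made-up fault-free probabilities:

```python
import random

random.seed(1)
n_proc, n_mem = 4, 4
p_p, p_m, p_l = 0.9, 0.9, 0.95   # fault-free probabilities at time t (illustrative)
trials = 20000

Q = A_p = A_m = 0.0
for _ in range(trials):
    proc_ok = [random.random() < p_p for _ in range(n_proc)]
    mem_ok = [random.random() < p_m for _ in range(n_mem)]
    # operational path = fault-free processor, memory, and connecting link
    path_ok = [[proc_ok[i] and mem_ok[j] and random.random() < p_l
                for j in range(n_mem)] for i in range(n_proc)]
    Q += sum(map(sum, path_ok))                            # operational paths
    A_p += sum(any(row) for row in path_ok)                # accessible processors
    A_m += sum(any(path_ok[i][j] for i in range(n_proc))   # accessible memories
               for j in range(n_mem))

print(Q / trials, A_p / trials, A_m / trials)
```

For this fully connected toy system, Q/trials should approach n_proc * n_mem * p_p * p_m * p_l.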
ECE655/Krishna Part.6 .8 Copyright 2004 Koren & Krishna
Shortcomings of Measures
BW(t) - depends not only on the condition of the network but also on the memory requirements of the processors
Q(t) - the number of paths does not indicate how many distinct processors and memories are still accessible
(A_p(t), A_m(t)) - does not imply that a fully connected, fault-free A_p(t) x A_m(t) interconnection network exists; does not indicate how many fault-free memories are connected, on average, to an accessible processor
Combining Q(t) and (A_p(t), A_m(t)) yields a more complete characterization of the system:
N_m(t) = Q(t) / A_p(t) ;  N_p(t) = Q(t) / A_m(t)
(N_p(t), N_m(t)) is an upper bound on the expected maximal fully-connected operational system at time t
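Plugging hypothetical numbers into these ratios shows how the combined measure is read:

```python
# Illustrative values only - not derived from any particular network
Q = 12.0     # expected number of operational processor-to-memory paths
A_p = 4.0    # expected number of accessible processors
A_m = 6.0    # expected number of accessible memories

N_m = Q / A_p  # memories connected, on average, to an accessible processor
N_p = Q / A_m  # processors connected, on average, to an accessible memory
print(N_m, N_p)  # 3.0 2.0
```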
Dependability Analysis
Assumption - the mean time between component failures (and possible repairs) is much larger than the average length of a communication period between a processor and a memory
The status (faulty or fault-free) of the system components is therefore constant for a long enough period of time - allowing analysis of the system's behavior in a statistical steady state
The system is observed at some arbitrary time t - fixed throughout the analysis
All measures are functions of t - p_p(t), p_m(t), p_l(t), the probabilities that a processor, a memory, and a link, respectively, are fault-free at time t; t is omitted from the notation for simplicity - p_p, p_m, p_l
p_q - the probability that a processor has a request for a memory connection
Bandwidth Analysis
Bandwidth BW - the expected number of processors actively communicating with some memory
Simplifying assumption - the destinations of the memory requests issued by the processors are statistically independent and uniformly distributed among the N memories
Network bandwidth - the product of the number of memories N and m, the probability that a given memory (say, memory 0) is non-faulty and has a request at its input
m is calculated iteratively, following a path leading to this memory: the probability of a request on an output link of a switch is calculated from the probability that a request has been accepted at the input links of this switch
Bandwidth Calculation
A link is in state X=1 (X=0) if it has (does not have) a request for the specific memory; a faulty link is in state X=0
The k+1 stages (k = log2 N) are numbered so that stage 0 is the last stage - its output links are connected to the memories - and stage k is the first - its input links are the outputs of the processors
X_i, Y_i - the outputs of a switch in stage i
X_{i+1}, Y_{i+1} - the inputs of a switch in stage i; they are the outputs of two different switches in stage i+1
Bandwidth - Cont.
The inputs into a switch are statistically independent:
P(X_i=u, Y_i=v) = P(X_i=u) P(Y_i=v)
P(Y_i=0) = P(X_i=0) = 1 - P(X_i=1)
After some algebraic manipulation:
P(X_i=1) = p_l P(X_{i+1}=1) - 0.25 p_l^2 P^2(X_{i+1}=1)
For the processors' output links: P(X_k=1) = p_q p_p
P(X_0=1) is calculated recursively; the memory and its input link can themselves be faulty, so
m = P(X_0=1) p_l p_m
BW = N m
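The iterative calculation can be carried out numerically. This sketch assumes N = 8 memories and made-up values for p_l, p_m, p_p, and p_q:

```python
from math import log2

N = 8                        # number of memories (and processors), N = 2^k
k = int(log2(N))
p_l, p_m, p_p, p_q = 0.99, 0.98, 0.98, 0.5   # illustrative probabilities

# Stage k (processor output links): P(X_k = 1) = p_q * p_p
P = p_q * p_p
# Iterate down to stage 0:
# P(X_i = 1) = p_l * P(X_{i+1} = 1) - 0.25 * p_l^2 * P(X_{i+1} = 1)^2
for _ in range(k):
    P = p_l * P - 0.25 * (p_l * P) ** 2

# The memory and its input link can themselves be faulty
m = P * p_l * p_m
BW = N * m    # expected number of processors actively communicating
print(round(BW, 4))
```

With p_l = 1 the recursion reduces to P_i = 1 - (1 - P_{i+1}/2)^2, i.e. the probability that at least one of the two independent inputs carries a request directed to this output.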