Real Time Systems, Philip A. Laplante, 2nd Edition

Chap. 12 Multiprocessing Systems
12.1 CLASSIFICATION OF ARCHITECTURES

12.2 DISTRIBUTED SYSTEMS
12.2.1 Embedded
Embedded distributed systems are those in which the individual processors are assigned fixed, specific tasks. This type of system is widely used in the areas of avionics, astronautics, and robotics.
EXAMPLE 12.1

In an avionics system for a military aircraft, separate processors are usually assigned for navigation, weapons control, and communications. While these systems certainly share information (see Figure 12.1), we can prevent failure of the overall system in the event of a single processor failure. To achieve this safeguard, we designate one of the three processors, or a fourth, to coordinate the activities of the others. If this computer is damaged, or shuts itself off due to a built-in test (BIT) failure, another can assume its role.
12.2.2 Organic
Another type of distributed processing system consists of a central scheduler processor and a collection of general processors with nonspecific functions (see Figure 12.2). These systems may be connected in a number of topologies (including ring, hypercube, array, and common bus) and may be used to solve general problems. In organic distributed systems, the challenge is to program the scheduler processor in such a way as to maximize the utilization of the serving processors.
[Figure 12.1: Navigation, weapons, and communications computers sharing information.]
12.2.3 System Specification

The specification of software for distributed systems is challenging because, as we have seen, the specification of software for even a single-processor system is difficult.
One technique that we have discussed, statecharts, lends itself nicely to the specification of distributed systems because orthogonal processes can be assigned to individual processors. If each processor is multitasking, these orthogonal states can be further subdivided into orthogonal states representing the individual tasks for each processor.
EXAMPLE 12.2

Consider the specification of the avionics system for the military aircraft. We have discussed the function of the navigation computer throughout this text. The statechart for this function is given in Figure 5.18. The functions for the weapons control and communication systems are depicted in Figure 12.3 and Figure 12.4, respectively. In the interests of space, only this pictorial description of each subsystem will be given.
[Figure 12.3: Statechart for the weapons system, with orthogonal Rockets and Bombs states. Figure 12.4: Statechart for the communications system, with log on/log off, send/receive/log message, scramble/unscramble, button-pressed, and message-interrupt transitions, and output to a speaker.]
EXAMPLE 12.3
12.2.4 Reliability in Distributed Systems

The characterization of reliability in a distributed system (real-time or otherwise) has been stated in a well-known paper [89], "The Byzantine Generals' Problem." The processors in a distributed system can be considered "generals," and the interconnections between them "messengers." The generals and messengers can each be either loyal (operating properly) or traitors (faulty). The task is for the generals, who can only communicate via the messengers, to formulate a strategy for capturing a city (see Figure 12.5). The problem is to find an algorithm that allows the loyal generals to reach an agreement. It turns out that the problem is unsolvable for a totally asynchronous system, but solvable if the generals can vote in rounds [153]. This provision, however, imposes additional timing constraints on the system. Furthermore, the problem can be solved only if the number of traitors is less than one-third the total number of processors. We will be using the Byzantine generals' problem as an analogy for cooperative multiprocessing throughout this chapter.
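As a concrete illustration, the recursive oral-messages algorithm OM(m) of Lamport, Shostak, and Pease can be sketched in a few lines of Python. The traitor model below (a traitor simply inverts every value it relays) and all the names are our own simplifications for illustration, not something given in this text:

```python
from statistics import mode

def om(m, commander, lieutenants, value, traitors):
    """OM(m): the commander broadcasts a value; lieutenants relay what they
    received and take majority votes. Returns each lieutenant's decision."""
    def send(sender, v):
        return 1 - v if sender in traitors else v  # traitors invert the bit

    received = {lt: send(commander, value) for lt in lieutenants}
    if m == 0:
        return received
    relayed = {}  # relayed[j][i]: value lieutenant i attributes to lieutenant j
    for j in lieutenants:
        others = [x for x in lieutenants if x != j]
        relayed[j] = om(m - 1, j, others, received[j], traitors)
    return {i: mode([received[i]] + [relayed[j][i]
                                     for j in lieutenants if j != i])
            for i in lieutenants}

# Four generals, one traitor (general 3): n = 4 and traitors < n/3 is met.
decided = om(1, 0, [1, 2, 3], 1, traitors={3})
```

With four generals and one traitor the one-third bound is satisfied, so the loyal lieutenants 1 and 2 reach agreement on the commander's value despite the faulty relay.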
12.2.5 Calculation of Reliability in Distributed Systems

Consider a group of n processors connected in any flat topology. It would be desirable, but costly, to have every processor connected to every other processor in such a way that data could be shared directly between processors. This, however, is not usually possible. In any case, we can use a matrix representation to denote the connections between the processors. The matrix, R, is constructed as follows: if processor i is connected to processor j, we place a "1" in the ith row, jth column of R. If they are not connected, a "0" is placed there. We consider every processor to be connected to itself, so the diagonal entries are always 1.
[Figure 12.5: The Byzantine generals' problem. Generals 1 through n of the army communicate via messengers; the lines represent messengers.]
EXAMPLE 12.4

A topology in which each of n processors is connected to every other would have an n by n reliability matrix with all 1s; that is,

    R = [ 1 1 ... 1 ]
        [ 1 1 ... 1 ]
        [    ...    ]
        [ 1 1 ... 1 ]
EXAMPLE 12.5

A topology in which none of the n processors is connected to any other (except itself) would have an n by n reliability matrix with all 1s on the diagonal but 0s elsewhere; that is,

    R = [ 1 0 ... 0 ]
        [ 0 1 ... 0 ]
        [    ...    ]
        [ 0 0 ... 1 ]
EXAMPLE 12.6

As a more practical example, consider the four processors connected as in Figure 12.6. The reliability matrix for this topology would be

    R = [ 1 1 1 0 ]
        [ 1 1 0 1 ]
        [ 1 0 1 1 ]
        [ 0 1 1 1 ]

Since processors 2 and 3 are disconnected, as are processors 1 and 4, 0s are placed in row 2 column 3, row 3 column 2, row 1 column 4, and row 4 column 1 in the reliability matrix.
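The construction rule is mechanical, so it can be captured in a short helper. This is a sketch under our own conventions; the function name and the 1-indexed link-list format are assumptions, not from the text:

```python
def reliability_matrix(n, links):
    """Build the n-by-n reliability matrix R: R[i][j] holds the reliability of
    the link between processors i+1 and j+1 (1.0 on the diagonal, 0.0 if absent)."""
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j, r in links:                 # links are 1-indexed and symmetric
        R[i - 1][j - 1] = R[j - 1][i - 1] = r
    return R

# The topology of Figure 12.6: processors 2-3 and 1-4 are not connected.
R = reliability_matrix(4, [(1, 2, 1.0), (1, 3, 1.0), (2, 4, 1.0), (3, 4, 1.0)])
```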
EXAMPLE 12.7

Suppose the distributed system described in Figure 12.6 actually had interconnections with the reliabilities marked as in Figure 12.7. The new reliability matrix would be

    R = [ 1   .4  .7  0  ]
        [ .4  1   0   1  ]
        [ .7  0   1   .9 ]
        [ 0   1   .9  1  ]
12.2.6 Increasing Reliability in Distributed Systems

In Figure 12.7, the fact that processors 1 and 4 do not have direct communications links does not mean that the two processors cannot communicate. Processor 1 can send a message to processor 4 via processor 2 or 3. It turns out that the overall reliability of the system may be increased by using this technique.

Without formalization, the overall reliability of the system can be calculated by performing a series of special matrix multiplications. If R and S are reliability matrices for a system of n processors each, then we define the composition of these matrices, denoted R ∘ S, to be

    (R ∘ S)(i, j) = ⋁ [R(i, k) · S(k, j)],  k = 1, ..., n    (12.2)

where (R ∘ S)(i, j) is the entry in the ith row and jth column of the resultant matrix and ⋁ represents taking the maximum over k of the n products. If R = S, then we denote R ∘ R = R², called the second-order reliability matrix.
EXAMPLE 12.8

Consider the system in Figure 12.7. Computing R² for this yields

    R² = [ 1    .4   .7   .63 ]
         [ .4   1    .9   1   ]
         [ .7   .9   1    .9  ]
         [ .63  1    .9   1   ]
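The composition can be checked numerically. The following sketch (the function name is ours; the rule is the max-of-products definition above) recomputes R² for the matrix of Figure 12.7:

```python
def compose(R, S):
    """R o S: entry (i, j) is the maximum over k of R[i][k] * S[k][j],
    i.e., the best two-hop reliability through any intermediate processor k."""
    n = len(R)
    return [[max(R[i][k] * S[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Reliability matrix of Figure 12.7
R = [[1.0, 0.4, 0.7, 0.0],
     [0.4, 1.0, 0.0, 1.0],
     [0.7, 0.0, 1.0, 0.9],
     [0.0, 1.0, 0.9, 1.0]]
R2 = compose(R, R)   # second-order reliability matrix
```

Entry (1, 4) of R² comes out to 0.63, the better of the two relay paths 1-2-4 (0.4 × 1) and 1-3-4 (0.7 × 0.9), matching Example 12.8.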
EXAMPLE 12.9

The utility of the higher-order reliability can be seen in Figure 12.8, where processors 1 and 4 are two connections apart. Here, the reliability matrix is

    R = [ 1   .5  0   0  ]
        [ .5  1   .4  0  ]
        [ 0   .4  1   .3 ]
        [ 0   0   .3  1  ]

The second-order reliability matrix is

    R² = [ 1    .5   .2   0   ]
         [ .5   1    .4   .12 ]
         [ .2   .4   1    .3  ]
         [ 0    .12  .3   1   ]
T E X A M P L E1 2 . 1 0
For thepreviousexample, topologyis givenin Figure12.9.
equivalent
the third-order T
sL
Finally, it can be shown that the maximum reliability matrix for n processors is given by

    R_max = ⋁ Rⁱ,  i = 1, ..., n    (12.3)
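Equation (12.3) can likewise be sketched by folding successive compositions together; the elementwise maximum plays the role of ⋁, and the names below are our own:

```python
def compose(R, S):
    """R o S: entry (i, j) is the maximum over k of R[i][k] * S[k][j]."""
    n = len(R)
    return [[max(R[i][k] * S[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def r_max(R):
    """Maximum reliability matrix: the elementwise maximum of R, R^2, ..., R^n,
    i.e., the best reliability achievable over any relay path."""
    n = len(R)
    best, power = [row[:] for row in R], [row[:] for row in R]
    for _ in range(n - 1):
        power = compose(power, R)               # next-order reliability matrix
        best = [[max(b, p) for b, p in zip(brow, prow)]
                for brow, prow in zip(best, power)]
    return best

# Figure 12.7 again: the 1-2 entry improves from 0.4 (direct) to 0.63,
# which is the three-hop route 1-3-4-2 (0.7 * 0.9 * 1.0).
R = [[1.0, 0.4, 0.7, 0.0],
     [0.4, 1.0, 0.0, 1.0],
     [0.7, 0.0, 1.0, 0.9],
     [0.0, 1.0, 0.9, 1.0]]
rm = r_max(R)
```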
12.3 NON-VON NEUMANN ARCHITECTURES
12.3.1 Dataflow Architectures

Dataflow architectures use a large number of special processors in a topology in which each of the processors is connected to every other.

In a dataflow architecture, each of the processors has its own local memory and a counter. Special tokens are passed between the processors asynchronously. These tokens, called activity packets, contain an opcode, operand count, operands, and a list of destination addresses for the result of the computation. An example of a generic activity packet is given in Figure 12.10. Each processor's local memory is used to hold a list of activity packets for that processor, the operands needed for the current activity packet, and a counter used to keep track of the number of operands received. When the number of operands stored in local memory is equivalent to that required for the operation in the current activity packet, the operation is performed and the results are sent to the specified destinations. Once an activity packet has been executed, the processor begins working on the next activity packet in its execution list.
[Figure 12.10: A generic activity packet: opcode, operand count, arguments 1 through n, and destinations 1 through n.]
EXAMPLE 12.11

We can use the dataflow architecture to perform the discrete convolution of two signals as described in the exercises for Chapter 5. That is, the discrete convolution of two real-valued functions f(t) and g(t), t = 0, 1, 2, 3, 4.

[Figures 12.11 and 12.12: MULT activity packets pairing values of f and g, distributed across processors 1 through 6, with g(0), g(1), g(2), ... supplied as operands.]
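The packet-driven evaluation can be simulated in a few lines. This is a much-simplified sketch of the idea: the packet fields mirror Figure 12.10, but the scheduler, the names, and the single shared accumulator are our own assumptions, not the arrangement of Figures 12.11 and 12.12:

```python
from collections import defaultdict

def convolve(f, g):
    """Reference discrete convolution: (f * g)(t) = sum over k of f(k) g(t - k)."""
    n = len(f)
    return [sum(f[k] * g[t - k] for k in range(n) if 0 <= t - k < n)
            for t in range(2 * n - 1)]

def dataflow_convolve(f, g):
    """Build one MULT activity packet per product term; each packet carries its
    opcode, its two operands, and the destination accumulator for its result."""
    n = len(f)
    packets = [{"op": "MULT", "operands": (f[k], g[t - k]), "dest": t}
               for t in range(2 * n - 1)
               for k in range(n) if 0 <= t - k < n]
    acc = defaultdict(float)
    for p in packets:        # fire: the operand count is met, compute and route
        a, b = p["operands"]
        acc[p["dest"]] += a * b
    return [acc[t] for t in range(2 * n - 1)]
```

Because each packet is independent, the firing order does not matter; that order-insensitivity is exactly what lets a real dataflow machine execute the packets asynchronously on many processors.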
12.3.2 Systolic Processors

Systolic processors consist of a large number of uniform processors connected in an array topology. Each processor usually performs only one specialized operation and has only enough local memory to perform its designated operation and to store the inputs and outputs. The individual processors, called processing elements, take inputs from the top and left, perform a specified operation, and output the results to the right and bottom. One such processing element, which computes z = c · y + x, is depicted in Figure 12.13. The processors are connected to the four nearest neighboring processors in the nearest neighbor topology depicted in Figure 12.14. Processing, or firing, at each of the cells occurs simultaneously in synchronization with a central clock. The fact that each cell fires on this heartbeat lends the name systolic. Inputs to the system are from memory stores or input devices at the boundary cells at the left and top. Outputs to memory or output devices are obtained from boundary cells at the right and bottom.
EXAMPLE 12.12

Once again consider the discrete convolution of two real-valued functions f(t) and g(t), t = 0, 1, 2, 3, 4. A systolic array such as the one in Figure 12.15 can be constructed to perform the convolution. A general algorithm can be found in [52].

Systolic processors are fast and can be implemented in VLSI. They are somewhat troublesome, however, in dealing with propagation delays in the connection buses and in the availability of inputs when the clock ticks.
[Figure 12.15: A systolic array for computing the convolution, with the input stream (padded with zeros) entering at the boundary cells.]
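The clocked behavior can be simulated directly. The sketch below assumes a weights-stationary variant in which the input stream shifts one PE to the right per tick and the multiply-accumulate chain settles within a tick; this is our own illustration of the technique, not the specific array of Figure 12.15:

```python
def systolic_convolve(x, w):
    """Each PE holds one fixed weight. On every clock tick the input stream
    shifts right one PE (zeros flush the pipeline), then every PE fires in the
    z = c*y + x style: it multiplies its latched input by its resident weight,
    and the products combine along the row into one output per tick."""
    n_out = len(x) + len(w) - 1
    regs = [0] * len(w)                  # x value currently latched in each PE
    stream = x + [0] * (len(w) - 1)      # zero padding flushes the array
    out = []
    for t in range(n_out):
        regs = [stream[t]] + regs[:-1]   # clock tick: all PEs shift in unison
        out.append(sum(w_j * r for w_j, r in zip(w, regs)))
    return out

systolic_convolve([1, 2, 3], [4, 5])     # -> [4, 13, 22, 15]
```

One output emerges per clock tick once the pipeline is primed, which is the throughput property that makes systolic arrays attractive for VLSI signal processing.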
12.3.3 Wavefront Processors

Wavefront processors consist of an array of identical processors, each with its own local memory and connected in a nearest neighbor topology. Each processor usually performs only one specialized operation. Hybrids containing two or more different types of cells are possible. The cells fire asynchronously when all required inputs from the left and top are present. Outputs then appear to the right and below. Unlike the systolic processor, the outputs are the unaltered inputs. That is, the top input is transmitted, unaltered, to the bottom output bus, and the left input is transmitted, unaltered, to the right output bus. Also different from the systolic processor, outputs from the wavefront processor are read directly from the local memory of selected cells and not obtained from boundary cells. Inputs are still placed on the top and left input buses of boundary cells. The fact that inputs propagate through the array unaltered, like a wave, gives this architecture its name. Figure 12.17 depicts a typical wavefront processing element.
EXAMPLE 12.13

Once again consider the discrete convolution of two real-valued functions f(t) and g(t), t = 0, 1, 2, 3, 4. A wavefront array such as the one in Figure 12.18 can be constructed to perform the convolution. After five firings, the convolution products will be found in the innermost PEs.
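The asynchronous firing rule can be simulated with an input queue on each link: a cell fires whenever both of its queues are non-empty, with no global clock. The sketch below uses matrix multiplication (as in the exercises) rather than the convolution of the example; the queue-based scheduler and all names are our own assumptions:

```python
from collections import deque

def wavefront_matmul(A, B):
    """Each PE (i, j) accumulates c += x * y in its local memory, forwarding x
    right and y down unaltered. PEs fire whenever both input queues hold data."""
    n = len(A)
    h = [[deque() for _ in range(n + 1)] for _ in range(n)]  # horizontal links
    v = [[deque() for _ in range(n)] for _ in range(n + 1)]  # vertical links
    # Boundary cells: row i of A enters from the left, column j of B from the top.
    for i in range(n):
        for k in range(n):
            h[i][0].append(A[i][k])
            v[0][i].append(B[k][i])
    C = [[0.0] * n for _ in range(n)]    # each PE's local accumulator
    fired = True
    while fired:                         # fire until no PE has both inputs ready
        fired = False
        for i in range(n):
            for j in range(n):
                if h[i][j] and v[i][j]:
                    x, y = h[i][j].popleft(), v[i][j].popleft()
                    C[i][j] += x * y
                    h[i][j + 1].append(x)    # inputs propagate unaltered,
                    v[i + 1][j].append(y)    # like a wave
                    fired = True
    return C
```

As in the text, the results are read from the PEs' local memories (the C accumulators), not from boundary cells, and correctness does not depend on the order in which ready cells fire.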
12.3.4 Transputers

Transputers are fully self-sufficient, multiple instruction set, von Neumann processors. The instruction set includes directives to send data or receive data via ports that are connected to other transputers. The transputers, though capable of acting as uniprocessors, are best utilized when connected in a nearest neighbor configuration. In a sense, the transputer provides a wavefront or systolic processing capability but without the restriction of a single instruction. Indeed, by providing each transputer in a network with an appropriate stream of data and synchronization signals, wavefront or systolic computers that can change configurations can be implemented.

Transputers have been widely used in embedded real-time applications, and commercial implementations are readily available. Moreover, tool support, such as the multitasking language occam-2, has made it easier to build transputer-based applications.
12.4 EXERCISES
1. For the following reliability matrix, draw the associated distributed system graph and compute R².

    R = [ 1 1 1 ]
        [ 1 1 0 ]
        [ 1 0 1 ]
2. For the following reliability matrix, draw the associated distributed system graph and compute R².

    R = [ 1   .2  0  ]
        [ .2  1   .7 ]
        [ 0   .7  1  ]
3. For the following reliability matrix, compute R², R³, and R_max. (Hint: R_max = R³.)

    R = [ 1   0   .6  0  ]
        [ 0   1   0   .8 ]
        [ .6  0   1   1  ]
        [ 0   .8  1   1  ]
4. Show that the ∘ operation is not commutative. For example, if R and S are 3 × 3 reliability matrices, then in general,

    R ∘ S ≠ S ∘ R

In fact, you should be able to show that for any n × n reliability matrices,

    R ∘ S = (S ∘ R)ᵀ

where ᵀ represents the matrix transpose.
5. Design a dataflow architecture for performing the matrix multiplication of two 5 by 5 arrays. Assume that binary ADD and MULT are part of the instruction set.
6. Design a dataflow architecture for performing the matrix addition of two 5 by 5 arrays. Assume that binary ADD is part of the instruction set.
7. Use dataflow diagrams to describe the systems in Exercises 5 and 6.
8. Design a systolic array for performing the matrix multiplication of two 5 by 5 arrays. Use the processing element described in Figure 12.13.
9. Design a systolic array for performing the matrix addition of two 5 by 5 arrays. Use the processing element described in Figure 12.13.
10. Use Petri nets and the processing element described in Figure 12.13 to describe the systolic array to perform the functions described in
(a) Exercise 8
(b) Exercise 9
11. Design a wavefront array for performing the matrix multiplication of two 5 by 5 arrays. Use the processing element described in Figure 12.17.
12. Design a wavefront array for performing the matrix addition of two 5 by 5 arrays. Use the processing element described in Figure 12.17.
13. Use dataflow diagrams to describe the systems in
(a) Exercise 11
(b) Exercise 12
14. Use Petri nets to specify the wavefront array system shown in Figure 12.18.