
Multiprocessing Systems

KEY POINTS OF THE CHAPTER

1. Building real-time multiprocessing systems is hard because building uniprocessing real-time systems is already difficult enough.
2. Reliability in multiprocessing systems can be increased through redundancy and multiplicity. However, security, processing, and reliability costs are associated with the communication links between processors.
3. Describing the functional behavior and design of multiprocessing systems is difficult and requires nontraditional tools.
4. It is crucial to understand the underlying hardware architecture of the multiprocessing system being used.

In this chapter we look at issues related to real-time systems when more than one processor is used. We characterize real-time multiprocessing systems into two types: those that use several autonomous processors, and those that use a large number of interdependent, highly integrated microprocessors.
Although many of the problems encountered in multiprocessing real-time systems are the same as those in the single-processing world, these problems become more troublesome. For example, system specification is more difficult. Intertask communication and synchronization becomes interprocessor communication and synchronization. Integration and testing is more challenging, and reliability more difficult to manage. Combine these complications with the fact that the individual processors themselves can be multitasking, and you can see the level of complexity.


In a single chapter we can only give a brief introduction to those issues that need to be addressed in the design of real-time multiprocessing systems.

12.1 CLASSIFICATION OF ARCHITECTURES

Computer architectures can be classified in terms of single or multiple instruction streams and single or multiple data streams, as shown in Table 12.1. By providing a taxonomy, it is easier to match a computer to an application and to remember the basic capabilities of a processor. In standard von Neumann architectures, the serial fetch and execute process, coupled with a single combined data/instruction store, forces serial instruction and data streams. This is also the case in RISC (reduced instruction set computer) architectures. Although many RISC architectures include pipelining, and hence become multiple instruction stream, pipelining is not a requisite characteristic of RISC.

TABLE 12.1 Classification for Computer Architectures

                                 Single Data Stream                        Multiple Data Stream
  Single Instruction Stream      von Neumann architecture/uniprocessors;  Systolic processors;
                                 RISC                                     wavefront processors
  Multiple Instruction Stream    Pipelined architectures;                 Dataflow processors;
                                 very long instruction word processors    transputers

In both systolic and wavefront processors, each processing element executes the same (and only) instruction but on different data. Hence these architectures are SIMD.
In pipelined architectures, effectively more than one instruction can be processed simultaneously (one for each level of the pipeline). However, since only one instruction can use data at any one time, the architecture is MISD. Similarly, very long instruction word computers tend to be implemented with microinstructions that have very long bit-lengths (and hence more capability). Hence, rather than breaking down macroinstructions into numerous microinstructions, several (nonconflicting) macroinstructions can be combined into a few long microinstructions. For example, if object code was generated that called for a load of one register followed by an increment of another register, these two instructions could be executed simultaneously by the processor (or at least appear so at the macroinstruction level) with a single long microinstruction. Since only nonconflicting instructions can be combined, any two instructions accessing the data bus conflict. Thus, only one instruction can access the data bus, and so the very long instruction word computer is MISD.

Finally, in dataflow processors and transputers (see the following discussion), each processing element is capable of executing numerous different instructions on different data; hence it is MIMD. Distributed architectures are also classified in this way.

12.2 DISTRIBUTED SYSTEMS

We characterize distributed real-time systems as a collection of interconnected self-contained processors. We differentiate this type of system from the type discussed in the next section in that each of the processors in the distributed system can perform significant processing without the cooperation of the other processors.
Many of the techniques developed in the context of multitasking systems can be applied to multiprocessing systems. For example, by treating each of the processors in a distributed system as a task, the synchronization and communication techniques previously discussed can be used. But this is not always enough, because often each of the processors in a multiprocessing system is itself multitasking. In any case, this type of distributed-processing system represents the best solution to the real-time problem when such resources are available.

12.2.1 Embedded
Embedded distributed systems are those in which the individual processors are assigned fixed, specific tasks. This type of system is widely used in the areas of avionics, astronautics, and robotics.

EXAMPLE 12.1
In an avionics system for a military aircraft, separate processors are usually assigned for navigation, weapons control, and communications. While these systems certainly share information (see Figure 12.1), we can prevent failure of the overall system in the event of a single processor failure. To achieve this safeguard, we designate one of the three processors, or a fourth, to coordinate the activities of the others. If this computer is damaged, or shuts itself off due to a built-in test (BIT) failure, another can assume its role.

12.2.2 Organic
Another type of distributed processing system consists of a central scheduler processor and a collection of general processors with nonspecific functions (see Figure 12.2). These systems may be connected in a number of topologies (including ring, hypercube, array, and common bus) and may be used to solve general problems. In organic distributed systems, the challenge is to program the scheduler processor in such a way as to maximize the utilization of the serving processors.

Figure 12.1 A distributed computer system for a military aircraft.

Figure 12.2 Organic distributed computer in common bus configuration.

12.2.3 System Specification
The specification of software for distributed systems is challenging because, as we have seen, the specification of software for even a single-processing system is difficult.
One technique that we have discussed, statecharts, lends itself nicely to the specification of distributed systems because orthogonal processes can be assigned to individual processors. If each processor is multitasking, these orthogonal states can be further subdivided into orthogonal states representing the individual tasks for each processor.

EXAMPLE 12.2
Consider the specification of the avionics system for the military aircraft. We have discussed the function of the navigation computer throughout this text. The statechart for this function is given in Figure 5.18. The functions for the weapons control and communication systems are depicted in Figure 12.3 and Figure 12.4, respectively. In the interests of space, only this pictorial description of each subsystem will be given.

Figure 12.3 Weapons control system for a military aircraft.

Figure 12.4 Communications system for military aircraft.



A second technique that can be used is the dataflow diagram. Here the process symbols can represent processors, whereas the directed arcs represent communications paths between the processors. The sinks and sources can be either devices that produce and consume data or processes that produce or consume raw data.

EXAMPLE 12.3

12.2.4 Reliability in Distributed Systems
The characterization of reliability in a distributed system (real-time or otherwise) has been stated in a well-known paper [89], "The Byzantine Generals' Problem." The processors in a distributed system can be considered "generals," and the interconnections between them "messengers." The generals and messengers can be either loyal (operating properly) or traitors (faulty). The task is for the generals, who can only communicate via the messengers, to formulate a strategy for capturing a city (see Figure 12.5). The problem is to find an algorithm that allows the loyal generals to reach an agreement. It turns out that the problem is unsolvable for a totally asynchronous system, but solvable if the generals can vote in rounds [153]. This provision, however, imposes additional timing constraints on the system. Furthermore, the problem can be solved only if the number of traitors is less than one-third the total number of processors. We will be using the Byzantine generals' problem as an analogy for cooperative multiprocessing throughout this chapter.

12.2.5 Calculation of Reliability in Distributed Systems
Consider a group of n processors connected in any flat topology. It would be desirable, but costly, to have every processor connected to every other processor in such a way that data could be shared between processors. This, however, is not usually possible. In any case, we can use a matrix representation to denote the connections between the processors. The matrix, R, is constructed as follows: if processor i is connected to processor j, we place a "1" in the ith row, jth column of R. If they are not connected, a "0" is placed there. In the Byzantine generals analogy, we consider every processor to represent a general and every interconnection to represent a messenger.

Figure 12.5 The Byzantine generals' problem.

EXAMPLE 12.4
A topology in which each of n processors is connected to every other would have an n by n reliability matrix with all 1s; that is,

    R = [ 1  1  ...  1 ]
        [ 1  1  ...  1 ]
        [ ...          ]
        [ 1  1  ...  1 ]

EXAMPLE 12.5
A topology in which none of the n processors is connected to any other (except itself) would have an n by n reliability matrix with all 1s on the diagonal but 0s elsewhere; that is,

    R = [ 1  0  ...  0 ]
        [ 0  1  ...  0 ]
        [ ...          ]
        [ 0  0  ...  1 ]

EXAMPLE 12.6
As a more practical example, consider the four processors connected as in Figure 12.6. The reliability matrix for this topology would be

    R = [ 1  1  1  0 ]
        [ 1  1  0  1 ]
        [ 1  0  1  1 ]
        [ 0  1  1  1 ]

Since processors 2 and 3 are disconnected, as are processors 1 and 4, 0s are placed in row 2 column 3, row 3 column 2, row 1 column 4, and row 4 column 1 in the reliability matrix.

Figure 12.6 Four-processor distributed system.

The ideal world has all processors and interconnections uniformly reliable, but this is not always the case. We can assign a number between 0 and 1 for each entry to represent its reliability. For example, an entry of 1 represents a perfect messenger or general. If an entry is less than 1, then it represents a traitorous general or messenger. (A very traitorous general or messenger gets a 0; a "small-time" traitor may get a "0.9" entry.) Disconnections still receive a 0.

EXAMPLE 12.7
Suppose the distributed system described in Figure 12.6 actually had interconnections with the reliabilities marked as in Figure 12.7. The new reliability matrix would be

    R = [ 1   .4  .7  0  ]
        [ .4  1   0   1  ]
        [ .7  0   1   .9 ]
        [ 0   1   .9  1  ]

Figure 12.7 Four-processor distributed system with reliabilities.

Notice that if we assume that the communications links have reciprocal reliability (the reliability is the same regardless of which direction the message is traveling in), then the matrix is symmetric with respect to the diagonal. This, along with the assumption that the diagonal elements are always 1 (not necessarily true), can greatly simplify calculations.

12.2.6 Increasing Reliability in Distributed Systems
In Figure 12.7, the fact that processors 1 and 4 do not have direct communications links does not mean that the two processors cannot communicate. Processor 1 can send a message to processor 4 via processor 2 or 3. It turns out that the overall reliability of the system may be increased by using this technique.
Without formalization, the overall reliability of the system can be calculated by performing a series of special matrix multiplications. If R and S are reliability matrices for a system of n processors each, then we define the composition of these matrices, denoted R ∘ S, to be

                      n
    (R ∘ S)(i,j)  =   ∨   R(i,k)S(k,j)                    (12.1)
                     k=1

where (R ∘ S)(i,j) is the entry in the ith row and jth column of the resultant matrix and ∨ represents taking the maximum of the reliabilities. If R = S, then we denote R ∘ R = R², called the second-order reliability matrix.
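The composition of Equation (12.1) is an ordinary matrix product with the sum replaced by a maximum. A minimal Python sketch (our own illustration, continuing the reliability_matrix example above) follows:

    # Compose two n x n reliability matrices per Equation (12.1):
    # (R o S)(i, j) = max over k of R(i, k) * S(k, j).
    def compose(R, S):
        n = len(R)
        return [[max(R[i][k] * S[k][j] for k in range(n))
                 for j in range(n)]
                for i in range(n)]

    R2 = compose(R, R)    # the second-order reliability matrix

Applied to the matrix of Example 12.7, compose reproduces the entries of Example 12.8 below; for instance, the entry for processors 1 and 4 becomes max(.4 × 1, .7 × .9) = .63, the better of the two indirect routes.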

EXAMPLE 12.8
Consider the system in Figure 12.7. Computing R² for this yields

    R² = [ 1    .4   .7   .63 ]
         [ .4   1    .9   1   ]
         [ .7   .9   1    .9  ]
         [ .63  1    .9   1   ]

12.2.6.1 Higher-Order Reliability Matrices. Higher-order reliabilities can be found using the same technique as for the second order. Recursively, we can define the nth-order reliability matrix as

    Rⁿ = Rⁿ⁻¹ ∘ R                                          (12.2)

EXAMPLE 12.9
The utility of the higher-order reliability can be seen in Figure 12.8, where processors 1 and 4 are two connections apart. Here, the reliability matrix is

    R = [ 1   .5  0   0  ]
        [ .5  1   .4  0  ]
        [ 0   .4  1   .3 ]
        [ 0   0   .3  1  ]

Figure 12.8 Four-processor distributed system with reliabilities.

The second-order reliability matrix is

    R² = [ 1    .5   .2   0   ]
         [ .5   1    .4   .12 ]
         [ .2   .4   1    .3  ]
         [ 0    .12  .3   1   ]

Calculating the third-order reliability matrix gives

    R³ = [ 1    .5   .2   .06 ]
         [ .5   1    .4   .12 ]
         [ .2   .4   1    .3  ]
         [ .06  .12  .3   1   ]

The higher-order reliability matrix allows us to draw an equivalent topology for the distributed system. One obvious conclusion that can be drawn from looking at the higher-order reliability matrices, and one that is intuitively pleasing, is that we can increase the reliability of message passing in distributed systems by providing redundant second-, third-, and higher-order paths between processors.

EXAMPLE 12.10
For the previous example, the equivalent third-order topology is given in Figure 12.9.

Figure 12.9 Equivalent third-order topology for Example 12.9.

Finally, it can be shown that the maximum reliability matrix for n processors is given by

                 n
    R_max   =    ∨   Rⁱ                                    (12.3)
                i=1

For example, in the previous example, R_max = R¹ ∨ R² ∨ R³.
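Continuing the Python sketch, the maximum reliability matrix of Equation (12.3) can be computed by folding the elementwise maximum over successive compositions (again an illustration, not the text's algorithm):

    # R_max per Equation (12.3): the elementwise maximum of R, R^2, ..., R^n,
    # where powers are taken with compose() above.
    def max_reliability(R):
        n = len(R)
        power = R
        best = [row[:] for row in R]
        for _ in range(n - 1):
            power = compose(power, R)
            best = [[max(best[i][j], power[i][j]) for j in range(n)]
                    for i in range(n)]
        return best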


To what n do we need to compute to obtain x percent of the theoretical
maximum reliability? Is this dependenton the topology? Is this dependenton the
reliabilities?In addition,the reliability matrix might not be fixed; that is, it might
be some function of time r. Finally, the fact that transmissionsover higher-order
paths increase signal transit time introduces a penalty that rnust be balanced
againstthe benefit of increasedreliability. There are a number of open problems
in this area that are beyond the scope of this text.

12.3 NON-VON NEUMANN ARCHITECTURES

The processing of discrete signals in real time is of paramount importance to virtually every type of system. Yet the computations needed to detect, extract, mix, or otherwise process signals are computationally intensive. For example, the convolution sum discussed in Chapter 5 is widely used in signal processing.
Because of these computationally intensive operations, real-time designers must look to hardware to improve response times. In response, hardware designers have provided several non-von Neumann, multiprocessing architectures which, though not general purpose, can be used to solve a wide class of problems in real time. (Recall that von Neumann architectures are stored program, single fetch-execute cycle machines.) These multiprocessors typically feature large quantities of simple processors in VLSI.
Increasingly, real-time systems are distributed processing systems consisting of one or more general processors and one or more of these other-style processors. The general, von Neumann-style processors provide control and input/output, whereas the specialized processor is used as an engine for fast execution of complex and specialized computations. In the next sections, we discuss several of these non-von Neumann architectures and illustrate their applications.

12.3.1 Dataflow Architectures
Dataflow architectures use a large number of special processors in a topology in which each of the processors is connected to every other.
In a dataflow architecture, each of the processors has its own local memory and a counter. Special tokens are passed between the processors asynchronously. These tokens, called activity packets, contain an opcode, an operand count, operands, and a list of destination addresses for the result of the computation. An example of a generic activity packet is given in Figure 12.10. Each processor's local memory is used to hold a list of activity packets for that processor, the operands needed for the current activity packet, and a counter used to keep track of the number of operands received. When the number of operands stored in local memory is equivalent to that required for the operation in the current activity packet, the operation is performed and the results are sent to the specified destinations. Once an activity packet has been executed, the processor begins working on the next activity packet in its execution list.

    Opcode | n (number of arguments)
    Argument 1
    Argument 2
    ...
    Argument n
    Destination 1
    Destination 2
    ...
    Destination m

Figure 12.10 Generic activity template for dataflow machine.
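A purely illustrative Python rendering of this firing rule is sketched below; the field names follow Figure 12.10 but are otherwise our own assumptions, not the format of any real dataflow machine:

    from dataclasses import dataclass, field

    @dataclass
    class ActivityPacket:
        opcode: str          # operation to perform, e.g., "MULT"
        arity: int           # n, the number of operands needed to fire
        destinations: list   # addresses to which the result is sent
        operands: list = field(default_factory=list)

        def receive(self, value):
            """Store an arriving operand; report whether the packet can fire."""
            self.operands.append(value)
            return len(self.operands) == self.arity

A processor works through its packet list, executing a packet as soon as receive reports that all operands have arrived, and forwarding the result to each destination.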

EXAMPLE 12.11
We can use the dataflow architecture to perform the discrete convolution of two signals as described in the exercises for Chapter 5. That is, the discrete convolution of two real-valued functions f(t) and g(t), t = 0,1,2,3,4:

                    4
    (f * g)(t)  =   Σ   f(i)g(t - i)
                   i=0

The processor topology and activity packet list is described in Figure 12.11.

Figure 12.11 Discrete convolution in a dataflow architecture.

Dataflow architectures are an excellent parallel solution for signal processing. The only drawback for dataflow architectures is that currently they cannot be implemented in VLSI. Performance studies for dataflow real-time systems can be found in [148].

12.3.1.1 System Specification for Dataflow Processors. Dataflow architectures are ideal because they are direct implementations of dataflow graphs. In fact, programmers draw dataflow diagrams as part of the programming process. The graphs are then translated into a list of activity packets for each processor.

Figure 12.12 Specification of discrete convolution using dataflow diagrams.

An example is given in Figure 12.12. As we have seen in the example, dataflow diagrams are well adapted to parallel signal processing [52], [53].

12.3.2 Systolic Processors
Systolic processors consist of a large number of uniform processors connected in an array topology. Each processor usually performs only one specialized operation and has only enough local memory to perform its designated operation and to store the inputs and outputs. The individual processors, called processing elements, take inputs from the top and left, perform a specified operation, and output the results to the right and bottom. One such processing element is depicted in Figure 12.13. The processors are connected to the four nearest neighboring processors in the nearest neighbor topology depicted in Figure 12.14. Processing, or firing, at each of the cells occurs simultaneously in synchronization with a central clock. The fact that each cell fires on this heartbeat lends the name systolic. Inputs to the system are from memory stores or input devices at the boundary cells at the left and top. Outputs to memory or output devices are obtained from boundary cells at the right and bottom.

Figure 12.13 Systolic processor element (computing z = c·y + x).

Figure 12.14 Systolic array in nearest neighbor topology.
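As a minimal sketch of the cell in Figure 12.13 (our own Python illustration; real systolic hardware is clocked logic, not software), each cell holds a fixed coefficient c and, on every tick of the central clock, combines its left input x and top input y:

    # One firing of the systolic cell of Figure 12.13: z = c*y + x.
    def systolic_cell(c, x, y):
        return c * y + x

    # One heartbeat of a row of cells: all cells fire simultaneously,
    # so every output is computed from the inputs of the previous tick.
    def heartbeat(coeffs, xs, ys):
        return [systolic_cell(c, x, y) for c, x, y in zip(coeffs, xs, ys)]

The essential property is that heartbeat computes all outputs from the old inputs before anything moves, mirroring the synchronous firing described above.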

EXAMPLE 12.12
Once again consider the discrete convolution of two real-valued functions f(t) and g(t), t = 0,1,2,3,4. A systolic array such as the one in Figure 12.15 can be constructed to perform the convolution. A general algorithm can be found in [52].

Systolic processors are fast and can be implemented in VLSI. They are somewhat troublesome, however, in dealing with propagation delays in the connection buses and in the availability of inputs when the clock ticks.

Figure 12.15 Systolic array for convolution.



Figure 12.16 Petri net specification of the convolution operation.

12.3.2.1 Specification of Systolic Systems. The similarity of the jargon associated with systolic processors leads us to believe that Petri nets can be used to specify such systems. This is indeed true, and an example of specifying the convolution operation is given in Figure 12.16.

12.3.3 Wavefront Processors
Wavefront processors consist of an array of identical processors, each with its own local memory and connected in a nearest neighbor topology. Each processor usually performs only one specialized operation. Hybrids containing two or more different cell types are possible. The cells fire asynchronously when all required inputs from the left and top are present. Outputs then appear to the right and below. Unlike the systolic processor, the outputs are the unaltered inputs. That is, the top input is transmitted, unaltered, to the bottom output bus, and the left input is transmitted, unaltered, to the right output bus. Also different from the systolic processor, outputs from the wavefront processor are read directly from the local memory of selected cells and not obtained from boundary cells. Inputs are still placed on the top and left input buses of boundary cells. The fact that inputs propagate through the array unaltered, like a wave, gives this architecture its name. Figure 12.17 depicts a typical wavefront processing element.

Figure 12.17 Wavefront processor element.

Wavefront processors are very good for computationally intensive real-time systems and are used widely in modern real-time signal processing [51], [52]. In addition, a wavefront architecture can cope with timing uncertainties such as local blocking, random delay in communications, and fluctuations in computing times [86].
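The asynchronous firing rule can be sketched as follows in Python (our own illustration; the multiply-accumulate operation is an assumption chosen to match the convolution example, since the actual operation is application specific):

    # A wavefront cell per Figure 12.17: it fires only when both inputs
    # are present, passes the inputs through unaltered, and keeps its
    # accumulated result in local memory.
    class WavefrontCell:
        def __init__(self):
            self.left = None       # pending input from the left
            self.top = None        # pending input from the top
            self.memory = 0.0      # result read directly from local memory

        def offer(self, left=None, top=None):
            if left is not None:
                self.left = left
            if top is not None:
                self.top = top
            if self.left is None or self.top is None:
                return None                      # not ready: no firing yet
            self.memory += self.left * self.top  # assumed multiply-accumulate
            out = (self.left, self.top)          # inputs propagate unaltered
            self.left = self.top = None
            return out                           # (to the right, to the bottom)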

EXAMPLE 12.13
Once again consider the discrete convolution of two real-valued functions f(t) and g(t), t = 0,1,2,3,4. A wavefront array such as the one in Figure 12.18 can be constructed to perform the convolution. After five firings, the convolution products will be found in the innermost PEs.

Figure 12.18 Discrete convolution using a wavefront array.

Wavefront processors combine the best of systolic architectures with dataflow architectures. That is, they support an asynchronous dataflow computing structure; timing in the interconnection buses and at input and output devices is not a problem. Furthermore, the structure can be implemented in VLSI.

12.3.3.1 System Specification for Wavefront Processors. As is true of the dataflow architecture, dataflow diagrams can be used to specify these systems. For example, the convolution system depicted in the previous example can be specified using Figure 12.12.
Finally, Petri nets and finite state automata (or a variation, cellular automata) may have potential use for specifying wavefront systems.

12.3.4 Transputers
Transputers are fully self-sufficient, multiple instruction set, von Neumann processors. The instruction set includes directives to send data or receive data via ports that are connected to other transputers. The transputers, though capable of acting as uniprocessors, are best utilized when connected in a nearest neighbor configuration. In a sense, the transputer provides a wavefront or systolic processing capability but without the restriction of a single instruction. Indeed, by providing each transputer in a network with an appropriate stream of data and synchronization signals, wavefront or systolic computers, which can change configurations, can be implemented.
Transputers have been widely used in embedded real-time applications, and commercial implementations are readily available. Moreover, tool support, such as the multitasking language occam-2, has made it easier to build transputer-based applications.
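Transputer links behave like blocking point-to-point channels (occam's ! send and ? receive). The following rough Python analogy using threads and a queue (not occam and not the transputer instruction set) conveys the flavor of link-based communication:

    import threading
    import queue

    link = queue.Queue(maxsize=1)   # a point-to-point link; a send waits
                                    # once the link is full, loosely like
                                    # occam's synchronous ! and ?

    def producer():
        for value in range(5):
            link.put(value)         # "send" over the link

    def consumer():
        for _ in range(5):
            print("received", link.get())   # "receive" from the link

    t1 = threading.Thread(target=producer)
    t2 = threading.Thread(target=consumer)
    t1.start(); t2.start()
    t1.join(); t2.join()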


12.4 EXERCISES
1. For the following reliability matrix, draw the associated distributed system graph and compute R².

    R = [ 1  1  1 ]
        [ 1  1  0 ]
        [ 1  0  1 ]

2. For the following reliability matrix, draw the associated distributed system graph and compute R².

    R = [ 1    0.2  0.7 ]
        [ 0.2  1    0   ]
        [ 0.7  0    1   ]

3. For the following reliability matrix, compute R², R³, and R_max. (Hint: compare R_max with R³.)

    R = [ 1    0    0.6  0   ]
        [ 0    1    0    0.8 ]
        [ 0.6  0    1    1   ]
        [ 0    0.8  1    1   ]

4. Show that the ∘ operation is not commutative. For example, if R and S are 3 × 3 reliability matrices, then in general,

    R ∘ S ≠ S ∘ R

In fact, you should be able to show that for any n × n reliability matrices,

    R ∘ S = (S ∘ R)^T

where (·)^T represents the matrix transpose.
5. Design a dataflow architecture for performing the matrix multiplication of two 5 by 5 arrays. Assume that binary ADD and MULT are part of the instruction set.
6. Design a dataflow architecture for performing the matrix addition of two 5 by 5 arrays. Assume that binary ADD is part of the instruction set.
7. Use dataflow diagrams to describe the systems in Exercises 5 and 6.
8. Design a systolic array for performing the matrix multiplication of two 5 by 5 arrays. Use the processing element described in Figure 12.13.
9. Design a systolic array for performing the matrix addition of two 5 by 5 arrays. Use the processing element described in Figure 12.13.
10. Use Petri nets and the processing element described in Figure 12.13 to describe the systolic arrays that perform the functions described in
(a) Exercise 8
(b) Exercise 9
11. Design a wavefront array for performing the matrix multiplication of two 5 by 5 arrays. Use the processing element described in Figure 12.17.
12. Design a wavefront array for performing the matrix addition of two 5 by 5 arrays. Use the processing element described in Figure 12.17.
13. Use dataflow diagrams to describe the systems in
(a) Exercise 11
(b) Exercise 12
14. Use Petri nets to specify the wavefront array system shown in Figure 12.18.
