Verilog Modeling and Simulation of A Communication Coprocessor For Multicomputers

A. Shyamprakash
Cadence Design Systems (India) Pvt. Ltd.,
SDF # A-1/B-8, Noida Export Processing Zone,
PO NEPZ, NOIDA, UP 201305, INDIA
Email: ashyam@cadence.com

C. P. Ravikumar
Department of Electrical Engineering,
Indian Institute of Technology,
Hauz Khas, New Delhi 110016, INDIA
Email: rkumar@ee.iitd.ernet.in

0-8186-7082-7/95 $04.00 © 1995 IEEE

Authorized licensed use limited to: National University of Singapore. Downloaded on October 31,2022 at 11:59:36 UTC from IEEE Xplore. Restrictions apply.

Abstract

1.0 Introduction

Massively parallel processing systems which use over 4000 processors are being conceived for achieving teraflops performance needed in addressing grand challenge problems of computing. Such machines are built as distributed memory multiprocessors, and data sharing among processors must take place through message passing. Efficient inter-processor communication is a necessity in massively parallel computers. These processors are interconnected using some standard topology and messages are routed by forwarding through intermediate nodes.

1.1 Wormhole routing using virtual channels

Wormhole routing is illustrated in Figure 1. In wormhole routing [6,9], a packet is divided into smaller units called flits (flow control digits). The packet is broken down into one header flit, one or more data flits and a tail flit. Only the header flit contains the destination address; therefore it is the header flit which governs the route, and the remaining flits follow the header flit in a pipelined fashion. The decision about the route can be made as soon as the header flit is available, hence the time taken by the packet to reach the destination is considerably reduced. An output channel is allocated to a packet and not to a flit. If the packet size is large enough, the time taken for transmission will depend mainly upon the number of flits transmitted and hence becomes independent of the distance between source and destination. It can also be seen that if a channel gets blocked, it is required to buffer only a flit; hence the amount of buffer space required at a node is very small.

Figure 1. Wormhole routing (header flit, data flits and tail flit shown against time)

During transmission, the header flit gets blocked if the output channel is already assigned to any other packet. Since only the header flit has the destination address, the remaining flits must wait in their channels until the header flit can make progress. The physical channels used by any of these blocked flits cannot be used to route another packet. A solution for this problem, suggested by Dally [2], is to use virtual channels. In this solution, many virtual channels are multiplexed on to a single physical channel, much like a road is divided into several lanes. Thus, even if a virtual channel gets blocked, flits through other virtual channels can make use of this physical channel. Hence if one sequence of flits gets blocked, it will not affect other flits in transmission and we make higher utilization of the
physical resources. Virtual channels are implemented by allocating separate buffers for each of them. Virtual channels also allow one to introduce deadlock free routing [3].

1.2 Fault-tolerant routing

In massively parallel computers, the occurrence of node and/or link faults is of high probability. Thus a routing algorithm which is oblivious to network conditions such as faults is of little use. Depending on the network topology, there will exist more than one routing path between two nodes of a multicomputer. A fault-tolerant routing algorithm is capable of successfully forwarding the message to the destination node in the presence of one or more faults, as long as a routing path exists between source and destination.

1.3 Motivation for hardware routers

A considerable amount of computation must be performed as part of a routing algorithm. These computations involve the decomposition of the message into flits, the reassembly of the flits received into a message, the application of the routing function to determine the output channel along which the flits must be forwarded, copying the flit into a flit-buffer, and book-keeping activities to maintain the status of the current node and that of the next node. A software simulation of the wormhole router implemented at IIT Delhi [5] uses about 2000 lines of C code. Since routing is one of the primitive functions in a multiprocessor environment, it is desirable that it is as fast as possible. This motivates us to develop hardware routers.

1.4 Earlier hardware routers

While much work has been reported in the literature on the simulation and performance analysis of different routing techniques [2,3,9,5], very few attempts have been made to implement these routing algorithms in hardware. Four notable contributions in this direction are

* The Message Driven Processor [4]
* The Mad-Postman Network Chip [8]
* Wormhole Router Using Virtual Channels [10]
* Hardware Router for Star Graph [7]

The hardware router discussed in this paper is independent of the underlying topology. This flexibility is achieved through the use of routing tables. We also show that the routing tables can be modified to achieve fault-tolerant routing. We have verified the working of this hardware router through simulation in a 3-dimensional hypercube network.

1.5 Hypercube interconnection

The hypercube interconnection network [6] has been employed in a number of commercially successful parallel computers such as the Intel iPSC/2, Intel iPSC/860, NCUBE-2, the Connection Machine and so on. The advantages of hypercube interconnection are its modularity, small degree, small communication diameter and fault-tolerance. An $n$-dimensional hypercube consists of $2^n$ nodes, labelled using $n$-bit strings. Two nodes $i$ and $j$ in an $n$-D hypercube are connected if and only if their bit addresses differ in exactly one bit position. Thus the degree of each node in an $n$-D hypercube is $n$, and the total number of links is $n \cdot 2^{n-1}$. The node symmetry of the network gives it node fault-tolerance. More recently, hypercubes and meshes have been generalized into $k$-ary $n$-cubes by Dally and Seitz [3].

1.6 Aim of this work

The objective of the work is to develop a Verilog model of a communication coprocessor for use in a multicomputer. Verilog HDL is chosen to identify the different functional blocks needed in the coprocessor and verify its behavior by simulation. We also simulated a real time environment of a 3-D hypercube topology which uses this coprocessor for communication.

In our design of the communication coprocessor chip, emphasis is given to implementing wormhole routing using virtual channels, with static fault-tolerance for link failures. Although we have verified the design of the coprocessor for a hypercube interconnection, the router is not specific to any particular topology; the router is provided with a programmable lookup table to implement the routing function. The router maintains a fault vector to store the status of links. For each physical link, separate receiver and transmitter modules are provided to enable simultaneous transmission and reception. This chip can be interfaced with any commercial processor as host, using an 8 bit data bus and a control bus. When the host processor wants to send a packet, it is required to write the packet into the packet buffer of the coprocessor through an interface protocol. The coprocessor handles the subdivision of the packet into flits and their transmission; it also assembles the flits that are received from another node. The assembled packet can be accessed by the host processor through a standard interface protocol. The general purpose nature of the design enables any network topology to be realized using the coprocessor chip; choice of the topology is limited only by the number of physical links.

In the following section, we describe the architecture of the coprocessor chip including the fault-tolerant routing strategy used. The behavior modeling of the coprocessor
using Verilog HDL is described in Section 3. Finally, in Section 4 the conclusions and future work are presented.

2.0 Architecture of the communication coprocessor

The architecture of the proposed coprocessor is described here. A careful examination of the routing process reveals that there are four important subtasks involved:

* Receiving flits from a neighboring node.
* Transmitting flits to a neighboring node.
* Deciding the channel through which a flit must be forwarded (routing).
* Host-processor interface, which involves assembling flits into messages and disassembling a message into flits.

We have organized the coprocessor into four major blocks, one corresponding to each of the subtasks mentioned above. This decomposition of the routing process into subtasks is also useful in modeling coprocessor behavior using Verilog. The block diagram of the coprocessor is shown in Figure 2. We now describe the function and the design of each of the individual blocks of the coprocessor.

Figure 2. Block diagram of communication coprocessor

2.1 Chip design considerations

The host-processor interface bus width is chosen to be 8 bits so that the chip can be interfaced with any commercial processor. This interface is bidirectional, and minimal changes are required to redesign the chip to support a larger bus width. The size of the packet does not have much impact on the performance of the network, except that it will take more time to transmit a larger packet completely; those packets which are blocked will have to wait for a longer time. In our design, the packet size is limited only by the buffer size that can be supported by the target implementation. For example, if FPGAs are used to implement the coprocessor chip, it will be difficult to support a large number of buffers on chip. In the present design, the packet size is chosen to be 40 bytes, which is sufficient for a parallel processing system to transfer data at a high rate.

The flit size should be as small as possible so that the time taken for its transmission is reduced; at the same time, other overheads such as the virtual channel number and the flit type should not dominate the size of the data field. In the design, the flit size is taken as 20 bits, inclusive of 2 bits for the virtual channel number and 2 bits for the flit type specification. Thus, each flit carries 2 bytes of information.

Currently, the number of physical channels provided is 3, since this will enable us to implement a variety of interconnection networks, such as 3-D hypercube, cube connected cycles, 4-star, 3-D mesh, and so on.

Simulation results [2,3] have shown that as the number of virtual channels is increased, the network throughput saturates. It was observed in [2] that a number of virtual channels in the range from 3 to 5 gives a better throughput to cost ratio. In the coprocessor, 3 virtual channels are provided per physical channel.

If the number of lines per physical link is increased, the transmission rate increases; but at the same time, the implementation cost goes up. On the other hand, if the number of lines is reduced, it will increase the transmission time. In our design, two lines are provided per physical channel since it makes the design of internal modules easy. Only one line is used for handshake. Separate wires are used for each direction, for both data transmission and handshake.

Eight bits are used to represent the node address. Once again, this many bits suffice for a number of interconnection networks such as 3-D hypercube, 3-D mesh, and 3-Star.

The routing table can store the output channel number for all the 256 possible addresses. Each routing table entry contains the output physical channel number and virtual channel number. There are N such routing tables to give the N alternative routes required for fault-tolerant routing. In a 3-D hypercube only 8 addresses are used from the available 256 addresses.

2.2 Protocol for communication

All the blocks of the communication coprocessor chip operate at a base clock frequency. The baud rate for data transmission over the links is set to one fourth of the base clock frequency. This strategy assures proper communication even when the nodes operate asynchronously.
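
The 20-bit flit chosen in Section 2.1 (2 bits of virtual channel number, 2 bits of flit type, 16 bits of information) can be illustrated with a small behavioral sketch. The paper gives only the field widths; the ordering of the fields inside the word, the numeric encoding of the flit types, and the names `pack_flit` and `unpack_flit` are assumptions made here for illustration, not the authors' design.

```python
# Field widths come from Section 2.1; their order inside the word is an assumption.
VC_BITS, TYPE_BITS, DATA_BITS = 2, 2, 16
FLIT_BITS = VC_BITS + TYPE_BITS + DATA_BITS  # 20 bits in total

# Hypothetical encoding of the 2-bit flit type.
HEADER, DATA, TAIL = 0, 1, 2


def pack_flit(vc, flit_type, data):
    """Pack the three fields into one 20-bit integer laid out as [vc | type | data]."""
    if not 0 <= vc < (1 << VC_BITS):
        raise ValueError("virtual channel number does not fit in 2 bits")
    if flit_type not in (HEADER, DATA, TAIL):
        raise ValueError("unknown flit type")
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data does not fit in 16 bits")
    return (vc << (TYPE_BITS + DATA_BITS)) | (flit_type << DATA_BITS) | data


def unpack_flit(word):
    """Split a 20-bit flit word back into (vc, flit_type, data)."""
    data = word & ((1 << DATA_BITS) - 1)
    flit_type = (word >> DATA_BITS) & ((1 << TYPE_BITS) - 1)
    vc = (word >> (TYPE_BITS + DATA_BITS)) & ((1 << VC_BITS) - 1)
    return vc, flit_type, data


# Example: a tail flit on virtual channel 2 carrying the two bytes 0xBE, 0xEF.
word = pack_flit(vc=2, flit_type=TAIL, data=0xBEEF)
```

Under these widths, 16 of every 20 transmitted flit bits are payload, which is the trade-off between flit size and overhead that Section 2.1 describes.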
This works provided the clock frequencies are the same for all nodes. When a flit is to be transmitted, first the start bits are sent, followed by the virtual channel number, the flit type (header, data or tail), and the information bits (see Figure 3 a and b). The handshake signalling consists of a start bit followed by the virtual channel number in bit-serial fashion.

Figure 3. Protocol for communication (panels include flit transmission and (c) handshake)

A set of control signals Reset, ChipSelect, Read, Write, BfrEmpty, and PktReady are provided to interact with a host processor. These signals serve the obvious functions that can be seen from the names.

2.3 Static fault-tolerant routing algorithm

We now describe how a routing algorithm based on a lookup table approach can be enhanced to achieve fault-tolerance in a simple and elegant way. We shall assume the static fault-tolerant model, where each node has a priori information about the fault status of all its communication links. The format of the routing table and the fault vector are given below.

Figure: format of the routing table at node i (for each destination, forward paths #1 to #N) and of the fault vector

The routing table at node $i$ stores, for each possible destination $j$, $d$ alternate paths to reach $j$ from $i$. Thus the size of the routing table is

$$\text{Size} = (N-1) \cdot d \cdot [\log(d) + \log(v)] \ \text{bits}$$

where $N$ is the total number of nodes, $d$ is the degree of the node and $v$ is the number of virtual channels per physical channel. For a 12-dimensional hypercube the size of the routing table is approximately 0.25 Mbits. Hence we believe that the routing table approach is practical with present VLSI technology. The status of each of the physical links of a node is stored in that node, in what we call the fault vector.

The routing algorithm returns the first non-faulty physical channel and virtual channel pair corresponding to the destination D. If all the physical links (of course, except the link through which the flit was received) are found to be faulty, the packet is dropped.

It can be observed that when our fault-tolerant router rejects a packet at an intermediate node $i$, there may exist an alternative path which does not go through the node $i$. To find such an alternate path we would need to return the packet to the forwarding node. This would in turn result in the possibility of livelock, i.e. the packet repeatedly travels between a set of nodes, without ever reaching the destination. Avoiding livelock involves storing the history of message routing along with the message itself.
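
The lookup step described above (return the first non-faulty physical and virtual channel pair for the destination, skip the link the flit arrived on, and drop the packet when nothing is left) can be sketched as follows. This is a minimal behavioral sketch, not the authors' algorithm listing: the function name `select_route`, the representation of each routing table as a dictionary, and the example table contents for node 0 of a 3-D hypercube are all assumptions made for illustration.

```python
def select_route(dest, incoming_link, routing_tables, fault_vector):
    """Return the first usable (physical, virtual) channel pair for dest.

    routing_tables: list of alternative tables in priority order; each maps a
                    destination address to a (physical_channel, virtual_channel) pair.
    fault_vector:   fault_vector[p] is True when physical link p is known faulty.
    incoming_link:  physical link the flit arrived on, or None for a locally
                    injected packet; it is never used as an output.
    Returns None when no alternative is usable, meaning the packet is dropped.
    """
    for table in routing_tables:
        physical, virtual = table[dest]
        if physical == incoming_link:
            continue  # do not send the packet back where it came from
        if fault_vector[physical]:
            continue  # static fault information says this link is down
        return physical, virtual
    return None


# Hypothetical tables for node 0 (000) of a 3-D hypercube, destination 7 (111):
# every one of the three links lies on some path to node 7.
tables = [{7: (0, 0)}, {7: (1, 0)}, {7: (2, 0)}]

no_faults = select_route(7, None, tables, [False, False, False])
link0_down = select_route(7, None, tables, [True, False, False])
dropped = select_route(7, 2, tables, [True, True, False])
```

In the last call the only healthy link is the one the flit arrived on, so the sketch returns `None`, which corresponds to dropping the packet rather than sending it back.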
In the current implementation, we have avoided livelocks by dropping the packet when the message reaches a blind alley.

2.4 Receiver block

Each physical link is provided with a separate receiver block. A receiver block has $N_v$ flit-buffers, where $N_v$ is the number of virtual channels supported per physical channel. Each of these flit-buffers is capable of storing one flit. The receiver block continuously monitors the input physical channel for incoming start bits. When it senses the arrival of a new flit, the flit is stored in the corresponding flit-buffer. It concurrently monitors the status of its flit-buffers: if any one of these flit-buffers fb has been emptied (by the transmitter block, as will be seen later), the receiver must send a handshake signal to indicate it is ready to receive the next flit of the virtual channel corresponding to fb. Each flit-buffer generates a buffer-full signal and flit-type signals which are used by the router and the transmitters. The receiver block is also capable of dropping the flits if the router finds no path to forward the packet.

2.5 Transmitter block

There is one transmitter block for each physical link. It multiplexes $N_v$ virtual channels on to a single physical channel in a round robin fashion. The mapping from the virtual channels to the flit-buffers is done by the router described next. The transmitter uses this mapping to associate a particular flit-buffer with each of its virtual channels. If a flit is present in the flit-buffer corresponding to a virtual channel vc, then the flit is picked up for transmission. After the flit has been transmitted, another flit can be transmitted through the virtual channel vc only after receiving a handshake signal, signifying that the neighboring node is ready to receive the next flit through vc.

The transmitter block maintains status information for each of its virtual channels. The information associated with a virtual channel vc includes

* whether vc is currently assigned to any packet,
* the flit-buffer address to which vc is presently assigned,
* whether vc is in the blocked state.

For a virtual channel vc the transmitter begins transmission if the following conditions are met.

* vc is assigned to a packet,
* vc is not in the blocked state, and
* there is a flit available in the corresponding buffer fb for transmission through vc.

The transmission includes sending the start bits, the virtual channel number vc and the data in the corresponding flit-buffer fb. If the flit to be transmitted is a tail flit, the status information of vc is changed after the transmission of the flit so that it is not assigned to any packet. For any flit, after its transmission the status of the virtual channel vc is set to the blocked state, and will be reset only when the transmitter receives a handshake corresponding to vc.

2.6 Router

There is only one router in a coprocessor. This module assigns the output channel through which a sequence of flits belonging to a packet should be routed. This decision is made when a header flit arrives in any of the flit-buffers. The mapping of flit-buffers to virtual channels is implemented by the routing function, which is specific to the interconnection network being implemented. In our design, the mapping function is implemented through a RAM-based lookup table, allowing a significant degree of flexibility. The routing table is initialized by the host processor whenever the chip is reset. The router uses three separate lookup tables for three alternative routes for a packet.

The router module uses a set of flags which are set when a header flit arrives in a flit-buffer. These flags are scanned in a round-robin fashion. If a flag f is found to be set, the router gets the destination address from the corresponding flit-buffer fb. Using this address it gets all possible routes from the routing tables. Then for each of these output channels, oc, it checks the fault vector to see if oc is faulty. If it finds that oc is not faulty, it checks if the output channel is already assigned to any other packet. If it is not assigned, the router assigns fb to oc; otherwise it waits till oc is freed. If the router finds that all the output channels except the one through which the packet came are faulty, it informs the receiver block that the packet cannot be forwarded, and all the flits in that packet will be dropped by the receiver.

2.7 Processor interface block

There is one processor interface block for each node. The functions carried out by this block are the following.

* Upon reset, accept data from the host-processor and program the routing table.
* Accept a packet from the host-processor and disassemble the packet into header, data, and tail flits. The flits are stored in a flit-buffer one by one.
* Assemble the flits destined to the current node into packets and forward them to the host-processor.

There are two packet-buffers available, one to store the packet to be transmitted as flits, and another to buffer all the flits of a packet that are arriving. The block divides the packet to be transmitted into flits and stores them in a flit-buffer.
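
The disassembly and assembly performed by the processor interface can be sketched behaviorally as below. The paper states that a packet becomes one header flit, one or more data flits and a tail flit, and that each flit carries 2 bytes; it does not say what the tail flit carries. This sketch assumes that the header flit carries the 8-bit destination address and that the tail flit carries the last two payload bytes. The names `disassemble` and `assemble` are hypothetical.

```python
HEADER, DATA, TAIL = "header", "data", "tail"


def disassemble(dest, payload):
    """Split a packet into a list of (flit_type, 16-bit value) flits.

    Assumes an even-length payload of at least 4 bytes, so that there is
    at least one data flit in addition to the tail flit.
    """
    if len(payload) < 4 or len(payload) % 2:
        raise ValueError("payload must be an even number of bytes, at least 4")
    # Group the payload into 2-byte words, one per flit.
    words = [int.from_bytes(payload[i:i + 2], "big")
             for i in range(0, len(payload), 2)]
    flits = [(HEADER, dest & 0xFF)]            # only the header holds the address
    flits += [(DATA, w) for w in words[:-1]]
    flits.append((TAIL, words[-1]))            # assumption: tail carries payload
    return flits


def assemble(flits):
    """Rebuild (dest, payload) from a complete flit sequence."""
    if flits[0][0] != HEADER or flits[-1][0] != TAIL:
        raise ValueError("flit sequence must start with a header and end with a tail")
    dest = flits[0][1]
    payload = b"".join(value.to_bytes(2, "big") for _, value in flits[1:])
    return dest, payload


# A 40-byte packet, the packet size chosen in Section 2.1, sent to node 5.
packet = bytes(range(40))
flits = disassemble(5, packet)
```

Under these assumptions a 40-byte packet yields 21 flits: one header, 19 data flits and one tail.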
This flit-buffer is treated like the flit-buffers in the receivers. This block interacts with the host processor and generates the required control signals. This interface also diverts the initial data written on to the routing tables and fault-vector.

3.0 Behavior modeling using Verilog HDL

Modeling of the communication coprocessor described in the previous sections was carried out using the Verilog Hardware Description Language [11]. Verilog HDL provides a great amount of flexibility to model the behavior of a system. A module is the basic unit in Verilog. It represents some logical entity that is usually implemented by a piece of hardware. Using the various data types and procedural statements available, a hardware circuit can be modeled accurately. To control scheduling of execution, different timing structures are provided. The language has the capability to apply inputs and display messages, values and waveforms of various signals whenever required.

In the Verilog implementation of the coprocessor, the various functional blocks described earlier are transformed into modules. These are in turn instantiated in a higher level module to form the coprocessor, with ports forming the interconnection pins. In this model, packet buffers, fault vector, status variables etc. which have to store data are declared as arrays of register vectors. The physical links are defined as separate input and output vectors. The bidirectional interface to the host-processor is implemented using an inout vector. The modeling ensures that all blocks operate synchronously. To achieve this, a global clock signal is used, and its rising edges are used to synchronize the operations. The module declaration of the top level module coprocessor is given below.

    module Coprocessor (PhyI1, PhyO1, HandI1, HandO1,
                        PhyI2, PhyO2, HandI2, HandO2,
                        PhyI3, PhyO3, HandI3, HandO3,
                        ProcIO,
                        ChipSel, Read, Write, Reset,
                        BfrEmpty, PktReady, Clock);

The flit-buffer is implemented as a module and this is instantiated ten times to get as many flit-buffers. This flit-buffer module uses a register variable as the storage element for data. It also generates the buffer-full and flit-type signals. The routing table is also implemented as a module with register type storage elements. The input to this module is the destination address and the outputs are three different routes. After reset this module initializes its storage elements.

In addition to the modules that are described below, a number of supporting modules were required to complete the behavior. They are handshake signal transmitters and receivers and a number of basic gates.

The modeling of each of the functional blocks described above is described below. Verilog-like pseudo code is used to describe them instead of actual code for reasons of brevity. The task-like functions that are referred to below may take more than one clock cycle to finish execution.

3.1 Receiver module

Three receiver blocks are required by a coprocessor to handle three physical links. The receiver block is modeled as a finite state machine, which takes inputs from the physical link and flit-buffers, and gives control signals to the flit-buffers to store data and to set the header flit flag in the router if there is a header flit. The pseudo code for this module is given below.

    module Receiver (Reset, PhyI, clock, ..);
    // declarations

    always @(Reset)
      if (Reset)
        // assign default values
      else
        // deassign values

    always @(posedge clock)
    begin
      case (state)
        0 : if (PhyI == Start_bits) state = 1;   // wait for start bits
        1 : begin
              get_virtual_channel_number;
              state = 2;
            end
        2 : begin
              Store_flit_buffer;
              state = 3;
            end
        3 : begin
              Update_signals;
              state = 0;
            end
      endcase
    end
    endmodule

3.2 Transmitter module

The number of transmitters required is also three. The transmitter module is formed as an FSM. It scans through the status information of each virtual channel in one state, and moves to other states if a flit can be transmitted. It takes inputs from status signals and flit-buffers. It outputs data to the physical link and control signals to the flit-buffers, the router and for handshake. The pseudocode for the transmitter module is given below.
    module Txmitter (Reset, BfrData, status, phyO, clock, ..);
    // declarations
    // Reset

    always @(posedge clock)
    begin
      case (state)
        0 : foreach (virtual channel)
            begin
              ..
              if (can_be_txmitted)
                state = 1;
            end
        1 : begin
              send_start_bit;
              send_virtual_channel_number;
              state = 2;
            end
        2 : begin
              transmit_flit;
              state = 3;
            end
        3 : begin
              Update signals;
              Update status;
              state = 0;
            end
      endcase
    end
    endmodule

3.3 Router

In addition to holding the FSM for the routing function, the router acts as the central repository for status information associated with all virtual channels. It takes inputs from all the modules, including the routing table and fault-vector, and generates outputs in the form of status signals. The pseudocode for this module is given below.

    module Router (Reset, BfrFull, status_control, clock,
                   Destination_addr, status, Table_addr,
                   OutPut_Paths, fault_vector, ..);
    // declarations
    // Reset

    always @(status_control)
      // update_status;

    always @(BfrFull)
      // Set Flags

    always @(posedge clock)
    begin
      case (state)
        0 : foreach (Flag)
              if (Flag) state = 1;
        1 : begin
              get_destination_address;
              state = 2;
            end
        2 : begin
              found = 0;
              for (each_alternative_path)
                if (channel_not_faulty) found = 1;
              if (found) state = 3;
              else begin
                set_drop_packet;   // no usable output channel: drop the packet
                state = 0;
              end
            end
        3 : if (channel_not_assigned) state = 4;
            else state = 0;
        4 : begin
              assign status;
              state = 0;
            end
      endcase
    end
    endmodule

3.4 Processor interface

The processor interface is divided into two modules, proc_rx and proc_tx. Module proc_rx receives the packet from the host-processor, divides it into flits and stores each flit in a flit-buffer. Module proc_tx assembles the flits that are arriving and delivers them as a packet to the host-processor. Pseudocode for these two modules is given below.

    module Proc_rx (ChipSel, Write, BfrEmpty, ..);
    // declarations
    // Reset

    always @(posedge clock)
    begin
      case (state)
        0 : if (ChipSel && Write) state = 1;
        1 : begin
              receive_packet;
              state = 2;
            end
        2 : begin
              store_flit_type;
              state = 3;
            end
        3 : begin
              store_flit;
              state = 4;
            end
        4 : begin
              wait_for_flit_transmission;
              state = 5;
            end
        5 : if (packet_empty) state = 0;
            else state = 2;
      endcase
    end
    endmodule

    module Proc_tx (ChipSel, Read, PktReady, ...);
    // declarations
    // Reset

    always @(posedge clock)
    begin
      case (state)
        0 : if (Packet_can_be_assembled) state = 1;
        1 : begin
              assemble_flit_into_packet;
              state = 2;
            end
        2 : if (packet_complete) begin
              PktReady = 1;
              state = 3;
            end
            else state = 1;
        3 : begin
              transmit_packet_to_hostprocessor;
              state = 4;
            end
        4 : begin
              Update_status;
              state = 0;
            end
      endcase
    end
    endmodule

3.5 Testing of the communication coprocessor

In order to test the router by simulation, a top level module NETWORK is formed. This instantiates many nodes, each of which is a combination of a coprocessor and a pseudo processor which controls the coprocessor. These nodes are interconnected to form the required topology. The pseudo processor initializes the coprocessor after reset, and writes the required fault-vector and packets for transmission. The overall hierarchical structure used for simulation is shown in Figure 4.

Figure 4. Verilog structure for simulation (top-level module NETWORK)

To debug the design of the coprocessor, we simulated a 3-D hypercube network in which the nodes of the hypercube utilize the fault-tolerant routing algorithm for message routing. The module network instantiates 8 copies of module node. We used 8 disk files, one for each node of the hypercube, to store the routing table and fault-vectors for the respective nodes. These files are read by the node modules to initialize their respective routing tables and fault vectors. The disk file corresponding to node $i$ also contains the message to be transmitted by node $i$.

Our intention has been to use the Verilog model of the coprocessor to verify the functional correctness of the wormhole routing algorithm. The model has also helped us in logically organizing the various functions of the coprocessor into separate building blocks. The simulation of the 3-D hypercube network itself has given us a high degree of confidence in our coprocessor design. The network simulation has enabled us to understand and debug the fault-tolerant routing algorithm.

Each node records all important events onto a log file with timing information. These events can be of the following types.

* Splitting a packet into flits
* Selecting a routing path
* Transmission of a flit, along with the line of transmission
* Reception of a flit and handshake
* Complete reception of a packet
* Discarding a packet if no path is found

By analyzing the log file, we are able to trace the history of each packet. In other words, we obtain snapshots of the network through the analysis of the log file and are able to locate implementation bugs. We found the $display command to be more useful in this context than the waveform display method.

To verify the routing algorithm in the presence of faults, we purposefully created link faults and observed the routing patterns.

4.0 Conclusions and future work

In this paper, we have discussed the Verilog simulation of the communication functions of a hypercube multicomputer. The interconnection network has been simulated at the three lowermost network layers, namely the physical layer, the data link layer, and the network layer. At each node of the multicomputer is a communication coprocessor which handles the appropriate functions at all the three layers mentioned above. We have designed such a communication coprocessor and verified its behavior. The interesting feature of the communication coprocessor is that it employs a fault-tolerant routing algorithm. The wormhole routing technique gives minimum message transfer time, and the use of virtual channels ensures maximum utilization of the physical resources such as communication

[4] W. J. Dally et al. The Message-Driven Processor. IEEE
links. The communication coproceswr has b e e n desifnetl MICRO.Api-il 1092. 171123-39.
a'r ;I general purpose chip, and is capable of implenienting
a t i y s t ;it i c fau 1 t- to1era t i t r o u t i ti g al go 1. it h in by pro i d i t i g
\)
P E:is\v;ii.. Adapiivc Dc;idlock-l'i-cc Roiitiiig Algoritlinia loi
progrmniable lookup table and ;I f a u l t vector. By iusiiig SI ;I I G i-;i p h 5 . M - ' k c h 'I'h cs is. Dc pal-tmen t of' M at hc m;it ic s .
Indian Institiitc ol'Tcchnolopy N c w Delhi. Intli;i. 1001,
in d t ra ti s ni it ters for e x h p h y s i ca I c li ;i t i -
tiel, this chip will be able to give high throughput. E \ e n
though for testing the design. 3-D hypet-cube topology 1i;is
been used. the overall design is n o t restricted t o any p x t i c -
u l a r topology. It is pi-ovided with ;I standard interface \o A Kticlilous. VLSl Implcinenation 01' a Fault-tolcrxil Also-
that it can used along with m y of the cominercial 1>roces';- I-irhiii ior Star Gi-aphs. B-Tech Thcsis. Dcpartnicnt 01. Elcc-
sors. Ii-ic;il E t i g i n c e i ~ i n g ,I n i l i a n Institute 01' T e c h n o l o g y Nc\u
Dclhi. liiclia. 1993.
In our future work, we intend we intend to synthesizz t h i s
design and map them onto FPGA. By using layout gene].- P R Millcr. C R Jcsshope a n d J T Yantchcv, The Mncl-Post-
ating tools we plan to obtain the mask level layout of the n i m Sctiioi-h Chip. Proc Transputing l 9 O l . Vol 2 IOS U c l .
circuit. Also we want to make the fuult-tolei-ant algorithm pp 55 17-5536
more robust by ensuring packet delivery as long as ii path
exists between the source and destination. We are a l s o L M Ni ancl P K McKinley. A Sui-vey 01'Worinholc Rotitins
studying possibi 1 i ty of hand I ing other network cotid i tioiis Tcchnicltics in Direct Networks, IEEE C O M P U T E R . Fcbi-ti-
such ;is congestion. ill!) I 0 0 3 . ] p p 62-76.

A Shy;iiiipr:ihash iiiicl C P Ravikumar. VLSl Iniplcnicnla-


References iiori o l ;I Worniholc Router Usins Virtual Chaiincla. PI-0-
S B Akci-s, B ki-ixhiianiui-thy ;incl D tlarcl. The Stai. ;!rapli: ccccling\ of IEEE T E N C O N Confcrcncc. 1994. p p 1035-
An atti-active Alternative to the ii-Cube. Pi-occcclinp 01. l l i c l03C).
liilcrnational Confcrcncc on P;irnllcl Pi-oceasinp. I OX7. j p p
191-400. E Stci.iilicini. Riijvii- Singh and Yatin Ti-ivccli. Hal-clwarc
Motlcling w i t h Vcrilog HDL. A u t o m a t a Puhliahiiig C o n -
W J Dally, Vii-tual Channel F l o w Conti-01. IEEE Ti-:iiiuc- p;iny. Cupci-tino. C A . 1990.
tioiis 011 Pai-allel a n d Disti-ihutetlSystcins. 553.Mai-ch 1902.
pp 194-205. n in. Computer N e t w o r k s . Sccoiicl Ed i t i oil.
A S T ~ i i c bnu
Engle\\~ooclClii'Vs. N I : Pi-entice Hall Inc. 1992.
W J D a l l y and C L Scit/.. Dcaclloch-Fi-cc Message Rouiiiig
i n Multiprocessor Intcrconncction N c t w o r k a . IEEE T-;111\- Veri W e l l b1;inual. WcllSpring Solulions. CA. I902
aclions on Cotnputci-a. M a y 1987. pp ,547-553.
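The testing approach of Section 3.5 (time-stamped event logging with $display and deliberately created link faults) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the coprocessor model: the module name fault_log_sketch, the function next_dim, the single fault vector shared by all nodes, and the rule of taking the lowest usable dimension. The coprocessor itself selects routes through a programmable lookup table, which this sketch does not reproduce.

```verilog
// Illustrative sketch only: all names and the routing rule are hypothetical.
module fault_log_sketch;
    reg [2:0] cur;        // node currently holding the header flit
    reg [2:0] dest;       // destination node in the 3-D hypercube
    reg [2:0] fault_vec;  // bit d = 1 marks the dimension-d link as faulty
    reg [1:0] dim;        // dimension chosen for the next hop
    reg       dropped;    // set once the packet has been discarded

    // Return the lowest dimension in which c and t differ and whose link
    // is not faulty; return 3 when no such dimension exists.
    function [1:0] next_dim;
        input [2:0] c;
        input [2:0] t;
        input [2:0] f;
        reg   [2:0] usable;
        begin
            usable = (c ^ t) & ~f;
            if      (usable[0]) next_dim = 2'd0;
            else if (usable[1]) next_dim = 2'd1;
            else if (usable[2]) next_dim = 2'd2;
            else                next_dim = 2'd3;
        end
    endfunction

    initial begin
        cur       = 3'b000;
        dest      = 3'b110;
        fault_vec = 3'b010;   // deliberately created fault on dimension 1
        dropped   = 1'b0;
        while (cur != dest && !dropped) begin
            dim = next_dim(cur, dest, fault_vec);
            if (dim == 2'd3) begin
                $display("%0t node %0d: packet discarded, no path found",
                         $time, cur);
                dropped = 1'b1;
            end
            else begin
                $display("%0t node %0d: flit sent along dimension %0d",
                         $time, cur, dim);
                #10 cur = cur ^ (3'b001 << dim);  // cross one hypercube link
            end
        end
        if (!dropped)
            $display("%0t node %0d: packet received", $time, cur);
        $finish;
    end
endmodule
```

With the values shown, the header moves from node 0 to node 4 along dimension 2 and is then discarded, because the only remaining differing dimension is the one marked faulty. Reading such time-stamped lines in order is what allows the history of a packet to be traced.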
