Small A: 0 Reduction of Hardware Cost
CHAMP
SRI International
Menlo Park, California
CH1465-4/79/0000-0349$00.75 © 1979 IEEE
[Figure residue: "___-Size Computational Reliability Problem"; CHAMP ARCHITECTURE; RELIABLE HARDWARE]

increases. The reliability of interconnections on a semiconductor chip greatly exceeds the reliability of inter-chip connections (especially if sockets or connectors are used). Without trying to delve deeply into reliability theory in this paper, it is sufficient to observe that overall system reliability can be enhanced by reducing the number of interconnections between chips. This implies,
almost without regard for the specific hardware interconnection pattern. The CHAMP hardware architecture is shown by the second layer in Figure 1. It will be discussed in greater detail later on.

The third layer in the figure refers to the CHAMP operating system, which distributes user code, detects faults, redistributes code for fault avoidance, and activates roll-back procedures. This fault handling operating system provides the most fundamental level of fault tolerance. Although it does not guarantee an uninterrupted stream of correct results, it continually manipulates the available resources in search of possible lurking faults. It activates spare copies of code and institutes roll-back to (presumably) good computational results. The fault handling operating system is described in greater detail later on.

Figure 1 shows that these three levels of building blocks form the foundation upon which more elaborate software measures that go beyond CHAMP may be imposed to obtain high confidence in the correctness of a continuous output data stream. An example of such additional measures is SIFT (Software Implemented Fault Tolerance), an SRI-developed technique utilizing multiple (redundant) execution and multiple (redundant) voting algorithms. In this case a hardware failure is

The PCs are interconnected by dedicated point-to-point links. Each PC has a number of I/O lines (usually 4 to 8) for this purpose. In order to achieve fault tolerance, the network must be able to assume a variety of configurations, and the operating system must be able to handle this variety, even though the baseline connectivity is a regular lattice. Figure 3 is an example lattice connected in a regular two-dimensional array for convenience of illustration.
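The regular two-dimensional lattice mentioned above can be modeled in a few lines. This is a minimal sketch under stated assumptions: PCs are indexed by hypothetical (row, col) coordinates in a rectangular array, and each PC uses four of its I/O lines (the low end of the 4 to 8 cited) for north, south, east and west links. The function names `lattice_neighbors` and `links` are illustrative, not part of CHAMP.

```python
def lattice_neighbors(pc, rows, cols):
    """Return the orthogonal neighbors of PC `pc` = (row, col) in a rows x cols array.

    Assumed model: each PC has dedicated links to its N, S, E and W neighbors;
    PCs on the edge of the array simply have fewer neighbors.
    """
    r, c = pc
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(nr, nc) for nr, nc in candidates if 0 <= nr < rows and 0 <= nc < cols]


def links(rows, cols):
    """Enumerate each dedicated point-to-point link once, as a sorted pair of PCs."""
    seen = set()
    for r in range(rows):
        for c in range(cols):
            for nb in lattice_neighbors((r, c), rows, cols):
                seen.add(tuple(sorted([(r, c), nb])))
    return seen


if __name__ == "__main__":
    print(lattice_neighbors((0, 0), 3, 3))  # corner PC: 2 neighbors
    print(lattice_neighbors((1, 1), 3, 3))  # interior PC: 4 neighbors
    print(len(links(3, 3)))                 # 12 links in a 3 x 3 array
```

A real CHAMP lattice need not be this regular; the text stresses that the operating system must cope with other configurations, which this sketch does not attempt.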
[Figure residue: PROCESSING CENTER]
supervisor directs the performance of diagnostic routines. Because exhaustive diagnostics are not practical, effective diagnostics are constructed of small routines that exist in the application task code (for example, a matrix inversion). The purpose is to search for the fault before it induces errors into the application processing or, perhaps more seriously, into the recovery process. Because of this, much importance is placed on the performance of diagnostics.

Another feature handled by the supervisor processor is the fault recovery process. This feature is covered later.

Task Processor--The user's application programs are executed as task code modules in the task processors. The task processor utilizes assets that expedite task execution. For example, cache memories and discrete Fourier transform (DFT) circuitry can be included in the task processor design when the application task requires fast Fourier transform (FFT) processing. When appropriate to the application, the task processor may also be designed to execute a high-level language directly. Peripherals may be attached to the network as though they were additional PCs, or even as task processor assets.

The task code modules processed by the task processors constitute a hierarchical network which is mapped onto the lattice of homogeneous PCs. These task modules interact with one another by communicating messages either directly (if the interacting task modules are in adjoining PCs) or via intermediate PCs, in a scheme similar to the routing of long-distance telephone calls.

The supervisor processor is shown in Figure 3 performing diagnostics on the DFT chip, part of the task processor assets, which in turn deposits the results in a cache memory. Later in the cycle the supervisor processor examines the cache for the correct results.

NETWORK OF TASK CODE MODULES

The CHAMP system design uses a programming methodology based on what we call M-modules (model-driven modules), developed at SRI International. An M-module is an autonomous program module containing a model, a set of values, and a set of procedures. The model encodes knowledge about the real-world domain of the module. Its set of values includes data about that domain and variable assignments that are made by the module itself. The procedures in a module are generalized ones; the model controls their actual behavior.

The concept of M-modules was developed for an experimental system called ACS.1 (Automated Command Support)[6] to explore methods of providing automated support for management. It was designed to allow users to tune and modify the system to specific management needs. This ability to control the behavior of a system's M-modules is useful in applications of CHAMP. Of even greater importance, however, is the fact that the M-module approach makes a program adaptable to changes in CHAMP itself. If the changes are the result of faults or damage, responsiveness and adaptability are necessary to ensure reliability and survivability. If the changes are the result of adding resources, the effect is to make the system easily expandable.
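The message-passing scheme described above, in which task modules in non-adjacent PCs communicate through intermediate PCs, can be illustrated with a breadth-first route search. This is a minimal sketch under stated assumptions: the lattice is a 4-neighbor grid, every hop costs the same, and PCs known to be faulty are never used as relays. The paper does not specify CHAMP's actual routing algorithm, and `route` is a hypothetical name.

```python
from collections import deque


def route(src, dst, rows, cols, faulty=frozenset()):
    """Breadth-first search for a shortest hop-by-hop path between two PCs.

    Assumes a rows x cols grid in which each PC links to its N, S, E, W
    neighbors. PCs listed in `faulty` are treated as shut out.
    Returns the list of PCs from src to dst, or None if dst is unreachable.
    """
    if src in faulty or dst in faulty:
        return None
    prev = {src: None}              # predecessor of each visited PC
    queue = deque([src])
    while queue:
        pc = queue.popleft()
        if pc == dst:
            path = []
            while pc is not None:   # walk predecessors back to src
                path.append(pc)
                pc = prev[pc]
            return path[::-1]
        r, c = pc
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nb[0] < rows and 0 <= nb[1] < cols
                    and nb not in faulty and nb not in prev):
                prev[nb] = pc
                queue.append(nb)
    return None


if __name__ == "__main__":
    # Adjacent PCs talk directly; distant ones go through intermediate PCs.
    print(route((0, 0), (0, 1), 3, 3))
    # With the center PC shut out, traffic is routed around it.
    print(route((1, 0), (1, 2), 3, 3, faulty={(1, 1)}))
```

Because the search only needs each PC's local links, the same idea extends to the irregular configurations the operating system must handle after damage.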
[Figure residue: DOMAIN; INFORMATION ABOUT DOMAIN; CONSTRAINTS MAINTAINED AUTOMATICALLY]
As shown in Figure 4, an M-module is an autonomous program module that is given responsibility for a particular segment of the system's activity. It accomplishes this by means of its three components. The model contains the knowledge used by the M-module (i.e., the relations, goals, and constraints that the module's values are required to obey). The set of values gives the details of the part of the world for which the module is responsible. The procedures make the module into an autonomous unit able to respond to external requests or commands and to maintain the self-consistency of its set of values according to the relations contained or implied in its model.

To illustrate the use of M-modules, consider a system intended for traffic control in an airspace. One M-module can be created for each aircraft in the airspace and made responsible for maintaining information about that aircraft's position, direction, speed, and whatever other information is needed. The models used by these M-modules encode what is known about the characteristics of the aircraft being tracked, i.e., the relationships that limit its location, speed, and attitude, given its history.

This application environment requires that M-modules be created and destroyed as needed. For example, as an aircraft leaves the airspace, its module is no longer needed and can be destroyed. As an aircraft enters the airspace, a new module must be created.

In an aircraft tracking M-module, the model encodes the dynamic characteristics of the aircraft. The set of values specifies the state of the aircraft as last observed plus any estimated updates. Some of these values are observational; others are estimated update values generated by the module using, perhaps, Kalman's algorithm. The set of procedures includes those that update the values as well as those that accept data, respond to requests for information, and execute other tasks that may be needed. The procedures, however, are written in general terms and specialized to immediate needs by the model. The situation is analogous to a person who, upon receiving a request for action, consults a policy manual (the model) to determine how to respond to the request.

As has been indicated, M-modules can be easily modified to meet new or changing conditions. This is because the models are explicit and remain available for study or change. In the example discussed, if new information should become available about the capabilities of one of the aircraft being tracked, this information can be added to the model without requiring changes elsewhere in the module. The procedures used by the module are controlled by its model, and behavioral changes can be effected by modifying the model.

to another; rather, it is recreated at the new location. To provide fault-avoidance capability, for example, we maintain at least one backup version of an M-module's model at another PC. Checkpoint data for that M-module are also maintained at the backup location. Should the PC running the M-module be found to be faulty, a single command can cause the backup model to be activated and the new M-module to be initialized with the checkpoint data.

So far, we have discussed isolated M-modules. The entire system of M-modules, however, can be regarded as an M-module itself. For example, the airspace control system mentioned above can be considered a single huge M-module whose model is the union of all the models of the separate tracking and pair-watching modules, with duplicate constraints eliminated. Similarly, its set of values is the product of the state vectors of the aircraft. Its procedures are the union of the available procedures, again after eliminating duplications.

In the other direction, a given M-module can often be decomposed into a set of subordinate M-modules, plus the additional components needed to control their interactions. For example, an aircraft tracking module may also need to maintain information about fuel status. In that case, it would be convenient to consider the position-tracking and fuel-estimating functions as exercised by different subordinate M-modules. These subordinate modules interact, of course, but their interactions can be handled conveniently by messages.

The advantages of decomposing the system hierarchically are twofold. First, by imitating the way in which a person thinks about a problem, it makes it easier for the non-programmer to understand and control system behavior. Second, it allows the imposition of a rigorous communication discipline. In particular, we are able to limit inter-module communications to M-modules that are siblings, i.e., components of a single M-module. In addition, without going into great detail, we can say that it is possible to specify tight conditions under which we allow two siblings to communicate. These conditions allow the system to run completely asynchronously while providing assurance against deadlock, a condition in which, for example, Process A must complete some action before Process B can act, and vice versa.

We have not been able to specify criteria that will also assure freedom from starvation, a weaker condition than deadlock in which, although there may be no logical reason why a process cannot be completed, there is a nonzero probability that it never is. However, it does seem possible and practical to avoid starvation under many conditions.
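The three-part structure of an M-module described above (a model, a set of values, and generalized procedures), together with the idea that behavior is changed by editing the model rather than the procedure code, can be sketched as follows. This is a hypothetical illustration only: the class name `MModule`, the representation of the model as named predicates, and the aircraft speed limits are assumptions made for the example, not details of ACS.1 or CHAMP.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical: a model constraint is a named predicate over the module's values.
Constraint = Callable[[Dict[str, float]], bool]


@dataclass
class MModule:
    """Sketch of an M-module: a model, a set of values, and generalized procedures."""
    name: str
    model: Dict[str, Constraint]            # relations the values must obey
    values: Dict[str, float] = field(default_factory=dict)

    def violated(self, candidate: Dict[str, float]) -> List[str]:
        """Names of model constraints that `candidate` would break."""
        return [n for n, ok in self.model.items() if not ok(candidate)]

    def update(self, **new_values: float) -> bool:
        """Generalized 'accept data' procedure; the model decides what is acceptable.

        The update is applied only if the resulting values stay self-consistent.
        """
        candidate = {**self.values, **new_values}
        if self.violated(candidate):
            return False
        self.values = candidate
        return True

    def query(self, key: str) -> float:
        """Generalized 'respond to a request for information' procedure."""
        return self.values[key]


if __name__ == "__main__":
    # One module per tracked aircraft; the speed limits are invented for the example.
    track = MModule(
        name="aircraft-17",
        model={"max_speed": lambda v: v.get("speed", 0.0) <= 250.0},
    )
    print(track.update(speed=220.0, heading=90.0))  # True: consistent with model
    print(track.update(speed=900.0))                # False: rejected by model
    # New knowledge about the aircraft changes the model, not the procedures.
    track.model["max_speed"] = lambda v: v.get("speed", 0.0) <= 1000.0
    print(track.update(speed=900.0))                # True
```

The last two calls mirror the point made in the text: the same `update` procedure behaves differently once the model is revised.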
The first layer of fault tolerance is obtained by using the most reliable hardware for the application. Other hardware measures that contribute to a fault-tolerant system are also needed. For instance, multi-ported controllers must be provided to allow peripherals to be attached to more than one PC. Critical peripherals must, of course, be duplicated.

The second fault-tolerant layer is the basic CHAMP structure, which is a richly connected network designed to have generous reserve capacity. Rich connectivity is needed in a survivable architecture to reduce the possibility of fragmentation or (maybe worse) near fragmentation of the network as a result of damage. Reserve capacity is needed in order to allow generous time for diagnostics and to provide resources to replace damaged units. Because of the low cost and small size of the hardware, a CHAMP lattice may be constructed with more PCs than necessary for the basic user tasks. For example, it is practical to use two to five times the number of PCs actually needed. The excess capacity provides backup for fault avoidance and survivability.

The next two layers of fault tolerance are supplied by software. The operating system maps task modules into the lattice of PCs and provides for backup execution to reduce vulnerability to failures. The user is expected to assign a priority level to each task module. These priority levels are used by the system to manage graceful degradation and to determine the extent to which backup is needed for an individual task. When a massive failure is detected, priority must be given to preserving computations on the highest-priority task modules.

The fault recovery process is managed by the supervisor processor. The supervisor detects a fault by diagnostic procedures, by examination of message error correction codes, or by comparing the results of several independent but identical computations. When a fault is detected, the offending PC is isolated by stopping all communication to that PC relating to task code execution. The supervisor then institutes recovery procedures.

eliminated from use, while the associated I/O processor is retained as a communication path), (2) if necessary, replace the faulty PC with a new PC, or (3) if salvage is not possible and repair is not feasible (for example, if the CHAMP system is functioning as a flight controller for a space vehicle), the PC is simply not usable and the excess processing potential is consequently reduced.

Basic Fault Detection and Recovery Algorithm

The algorithm discussed here deals with the detection and management of faults that are assumed to occur singly. Multiple or massive failures are discussed in the next section. We allow the possibility of such faults accumulating to a substantial degree, though of course not to the extent of forming a cut-set for the network. This algorithm assumes that all faults are permanent, that they occur in the PCs rather than in a communication line, and that they are detectable by a test program resident in the system.

There are three main modules in the fault detection and management system. (1) A Fault Manager (FM), resident in the supervisor of each PC, contains and directs the running of fault detection programs both in its own PC and in its direct neighbors. (2) The Relocation Controller (RC), resident in a different PC than the one for which it is responsible, contains the locations of the backup modules and checkpoint data for the modules operating in that PC. For example, the RC for PC N, RC(N), is resident in PC M, with M not equal to N. (3) The Activation Controller (AC) holds the information necessary to recreate one of a designated set of modules and the checkpoint data necessary for restarting those modules. It responds to an established need for one of those modules by recreating and restarting it, and by updating it to current time if necessary.

Additional fault detection and management constraints are that backup modules be located one step away from the primary module, and that the RC for a particular PC be located two steps away from it. For example, imagine a section of a CHAMP lattice, as shown in Figure 5, where modules a, b, and c are tasks of PC 1, backup copies of a, b, and c are located in PCs 2, 4, and 5 respectively, and the RC for PC 1 is in PC 3.

FIGURE 5 FAULT RECOVERY EXAMPLE
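The placement constraints stated above (backups one step from the primary PC, the RC two steps away) can be checked mechanically once the lattice geometry is fixed. The sketch below assumes a 4-neighbor grid and counts "steps" as hops, i.e., Manhattan distance on an undamaged grid. The coordinates are hypothetical, since the exact positions of PCs 1 to 5 in the fault recovery example figure are not recoverable here, and `hops` and `check_placement` are illustrative names.

```python
def hops(a, b):
    """Hop count between two PCs in an undamaged 4-neighbor grid (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def check_placement(primary, backups, rc):
    """Return a list of violations of the placement constraints.

    Constraints as stated in the text: every backup module sits one step
    from the primary PC, and the primary's RC sits two steps away from it.
    """
    problems = []
    for module, pc in backups.items():
        if hops(primary, pc) != 1:
            problems.append(f"backup of {module} at {pc} is not one step away")
    if hops(primary, rc) != 2:
        problems.append(f"RC at {rc} is not two steps away")
    return problems


if __name__ == "__main__":
    # Hypothetical coordinates: the primary PC at (1, 1) runs modules a, b, c.
    primary = (1, 1)
    backups = {"a": (0, 1), "b": (1, 0), "c": (1, 2)}
    print(check_placement(primary, backups, rc=(0, 0)))  # []
    print(check_placement(primary, backups, rc=(1, 2)))  # RC too close
```

Keeping the RC two steps away means a single fault cannot take out both a PC and the controller that knows where its backups are.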
The FM in each PC periodically runs test programs. Assume that the FM in PC 4 discovers a fault in PC 1. The FM sends this information to RC(1). RC(1) seeks confirmation by sending a message to the neighbors of PC 1 directing that PC 1 be tested. The neighbors test PC 1 and report the results back to RC(1).

If all neighbors of PC 1 find it faulty, RC(1) concludes that PC 1 is faulty. Otherwise, it concludes that PC 4 is faulty and sends this information to RC(4). Note that an RC cannot accept such information without confirmation, as the PC containing the reporting FM may itself be faulty. So the cycle starts again.

Suppose it is confirmed that PC 1 is faulty. RC(1) contains a list indicating where the backup modules for PC 1 are located, which in this example are PCs 2, 4, and 5. RC(1) sends messages to the ACs of these PCs calling for the activation of the appropriate backup programs and the shutdown of communications with PC 1. The latter action prevents any messages from being sent to or received from PC 1. It also prevents any effort to move any module to PC 1. However, following recovery, the neighbors periodically query the shut-out center in case it has been repaired.

A possible massive failure is indicated when:
• expected data do not arrive
• a supervisor finds two or more faulty neighbors
• a message cannot reach the intended destination.

In the second situation, a supervisor processor finds two or more faulty neighbors in the course of regular diagnostics. This is grounds for concern in that more serious losses are possible, since two failures at virtually the same time (or within a very short time interval) are improbable.

The third situation is unlikely to be the first indication of a massive failure, but it is important since it is a possible consequence of sectional damage. The main objective is to detect and deal with a message that is circulating endlessly looking for a missing (lost) PC. The process that detects the circulating message could also activate the recovery.

Once detected, a massive failure invokes a recovery process similar to the initial start-up of the network. Two phases are involved: first, reconstituting the network and producing an inventory of the user code modules available and, second, connecting and restarting the user code modules in order of the previously declared priorities. Our work in this complex area is embryonic and will be described in a later paper.

FIGURE 6 PROCESSING CENTER BOARD
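The confirmation step performed by RC(1) in the single-fault algorithm reduces to a simple decision rule: accept the accusation only if every neighbor of the suspect agrees, and otherwise suspect the accuser. The sketch below assumes each neighbor's test result arrives as a boolean and that backup locations are kept in a dictionary. The function names `confirm` and `recovery_messages` and the message tuples are hypothetical, not CHAMP's actual interfaces. In the usage example, PCs 2, 4 and 5 are treated as PC 1's neighbors because they hold its one-step-away backups.

```python
def confirm(suspect, reporter, neighbor_reports):
    """Decide which PC is faulty, following the single-fault rule in the text.

    neighbor_reports maps each neighbor of `suspect` to True if its test
    program found `suspect` faulty. If all neighbors agree, the suspect is
    faulty; otherwise the reporting PC (whose FM may be faulty) is blamed,
    and the same cycle would then be repeated by the reporter's own RC.
    """
    if neighbor_reports and all(neighbor_reports.values()):
        return suspect
    return reporter


def recovery_messages(faulty_pc, backup_locations):
    """Messages the faulty PC's RC sends to the ACs holding its backup modules.

    backup_locations maps module name -> PC that holds its backup copy.
    Each AC is asked to activate its backup and to shut out faulty_pc.
    """
    msgs = []
    for module, pc in sorted(backup_locations.items()):
        msgs.append((pc, "activate", module))
        msgs.append((pc, "shutout", faulty_pc))
    return msgs


if __name__ == "__main__":
    # The example from the text: the FM in PC 4 reports a fault in PC 1.
    reports = {2: True, 4: True, 5: True}
    culprit = confirm(suspect=1, reporter=4, neighbor_reports=reports)
    print(culprit)  # 1
    for msg in recovery_messages(culprit, {"a": 2, "b": 4, "c": 5}):
        print(msg)
```

If any neighbor disagrees, `confirm` returns 4, matching the text's rule that the accusing PC then becomes the subject of a new confirmation cycle run by RC(4).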
have 4k bytes of RAM and 8k bytes of ROM. The I/O switching function for communications among processors on the board, as well as off the board to neighboring PCs, is simulated by eight 8-bit peripheral interface circuits. There are eight I/O ports under control of the I/O processor: six 8-bit parallel I/O ports for connection to neighbor PCs, one 8-bit parallel port for connection to the supervisor processor, and one 8-bit parallel port for connection to the task processor. In this design, the I/O processor and the task processor can also communicate via 32 bytes of two-ported shared RAM, allowing fast data transfer between these processors.

Present efforts relating to hardware experimentation are in the initial stage of development. Operating procedures are being developed to provide a basic-level communication structure. These procedures are being designed so that they can be expanded to include more sophisticated techniques as the project progresses.

ACKNOWLEDGEMENT

The authors wish to acknowledge Lynn Jacobson and Fred Sommer for their efforts in the development and testing of operating software for the CHAMP system hardware.

REFERENCES

R. J. Swan, "The Switching Structure and Addressing Architecture of an Extensible Multiprocessor: Cm*," Doctoral Thesis, CMU-CS-78-138, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA.

A. M. Despain and D. A. Patterson, "X-Tree: A Tree-Structured Multiprocessor Computer Architecture," 5th Symposium on Computer Architecture, Palo Alto, CA, April 3-5, 1978, Conf. Proc. pp. 144-151.