
A Cooperative Highly-Available Multi-Processor Architecture

CHAMP

Christine A. Monson, Philip R. Monson, and Marshall C. Pease

SRI International
Menlo Park, California

ABSTRACT

This paper describes a research effort to design and construct a computer hardware architecture capable of expanding to accommodate additional processing needs and of sustaining massive hardware failures while still retaining a usable processing ability. The SRI proprietary architecture described here has been developed to use a large number of processors in an arbitrarily connected lattice. This architecture is called CHAMP, for Cooperative Highly Available Multi-Processor. The reasoning behind, and the justification for, such a design is explored. CHAMP will be programmed using a design methodology based on the M-module (model-driven module), which is an autonomous program module containing a model, a set of values, and a set of procedures. The use of this methodology is explored along with its application to fault tolerance. Basic fault tolerance algorithms are described in relation to the CHAMP architecture. Hardware has been designed and constructed for an initial test of the concepts.

INTRODUCTION

Because of the expanding complexities of modern scientific and military needs, there is a need for a computer architecture that allows both for hardware expandability and for survivability. Expandability is the ability of the computer system to fill the growing needs of the user. Survivability is a military term describing a situation where the goal is to have the subject (in this case, the computer system) survive damage in some usable form. The CHAMP architecture has been developed as an answer to these needs. CHAMP was designed to be flexible enough to make use of technological advances by using a large number of processors in an arbitrarily connected lattice to function as a single computer.

CHAMP, an SRI proprietary architecture, is a network or lattice of functionally identical stand-alone microcomputers. This network is managed by a distributed operating system which is structured hierarchically and makes the lattice behave as a single computer, with the unique feature of hardware expandability or contractability that can occur in real time while the computer is in operation. This network may be arranged as a geometrically regular lattice (e.g., a 3-dimensional rectangular lattice with a computer at each coordinate node) or as an arbitrarily connected network. There are no central hardware or system software resources, and user code is dispersed throughout the entire network in such a way as to avoid the creation of virtual central resources, thus enhancing survivability.

Among the reasons most often cited for using large numbers of self-contained small computers connected into a network to function as a single computer are:

• reduction of hardware cost
• problem size (throughput)
• expandability
• reduction of components and pin count
• greater computational reliability (fault tolerance)
• hardware survivability.

There are other related designs that address these issues somewhat differently. In particular, the Cm*¹ and X-Tree² designs are of this category. The present design is intended to explore an approach that puts special emphasis on the issues of reliability, survivability and expandability. The implications of these goals for CHAMP are considered separately in the following subsections.

Hardware Cost

If large numbers of small computers are to be joined in a cooperative network to take advantage of the potential hardware cost savings, a generalized networking and programming technique is needed. Such a technique must permit the creation of a system of arbitrary size from small building blocks without imposing any size-dependent restrictions on the programmer who is to write the code to use the system. That is to say, the code for a given task should look precisely the same for a system of 10 computers as it does for a system of 1,000 computers.

Using the M-module programming approach described later, the CHAMP architecture permits the construction of a single computer resource from large numbers of independent microcomputers without creating an impact on the software.

CH1465-4/79/0000-0349$00.75 © 1979 IEEE
Problem Size

There are some problems that are far too large to be solved within a finite time on any of today's main-frame computers, even the largest and fastest. Wide area air traffic control is an example of such a problem. Most such problems, however, are composed of sub-units (e.g., the state of one aircraft) that are easily tractable on even a small computer; the difficulty is that real-life situations require that too many such sub-units be considered for such a solution to be feasible. CHAMP is ideally suited to the solution of such problems because it is made up of a communicating network of physically small computers, each of which can compute sub-problems of this kind.

Expandability

There is a growing need for a generalized approach to computer growth that makes no reprogramming demand on existing software. Consider the case of a large installation dedicated to real-time information processing, such as a radar tracking system. Often, the initial system was designed and configured to meet a limited budget as well as to satisfy the assigned mission. In succeeding years the mission, the capability, and to some extent the money available tend to grow. It becomes necessary to expand the computer. Such expansion often requires rewriting the existing code to tailor it to the expanded computer. The cost of changing the software alone can be so burdensome that expansion is delayed. With CHAMP, additional computing capacity can be added to the network without requiring changes to existing user programs.

Components and Pin Counts

System reliability is, in large measure, a function of the number of separate components and the number of connections necessary to service those components. LSI components are increasing in capability and (somewhat) in basic reliability as the efficiency of using semiconductor real estate increases. The reliability of interconnections on a semiconductor chip greatly exceeds the reliability of inter-chip connections (especially if sockets or connectors are used). Without trying to delve deeply into reliability theory in this paper, it is sufficient to observe that overall system reliability can be enhanced by reducing the number of interconnections between chips. This implies, for instance, that bit-serial communications from chip to chip are better than parallel communications from a pin count point of view. More importantly, the general need to reduce pin count requires reducing the frequency of inter-chip communications. CHAMP will allow the achievement of this goal when it is possible to make chips that contain complete computers with all necessary RAM and ROM. The required technology will become available as a result of the commercial VLSI efforts and the government-sponsored Very High Speed Integrated Circuits (VHSIC) program.³ CHAMP, therefore, will be able to make effective use of future technological developments.

Computational Reliability

There are many computing applications that need a highly reliable response (i.e., correct, with high probability), and since a computational failure in such applications generally has grave consequences, this is an extremely important subject. Often, however, the notion of computational reliability is confused with hardware reliability. To make matters more difficult, "fault tolerance" is an imprecise term that can cover a wide range of characteristics. Extra space will be devoted here to explaining the CHAMP project view of computational reliability in order to help avoid confusion on this important point.

The criteria are, first, that a hardware failure does not halt the computational flow, and second, that the results arrive "on time" and in the same form as they would have had there been no hardware failure.

The components of computational reliability can be visualized in layers as a hierarchy of building blocks (shown in Figure 1). As shown by the bottom layer of Figure 1, the hardware must be constructed as reliably as the technology permits. Neither CHAMP nor higher-level measures are intended to take the place of carefully engineered components. For example, pin count reduction has already been mentioned. Other measures such as redundant transistors, element sizing, and hermetic sealing are also required. Where radiation hardness is needed, all possible measures must be taken to ensure it.

[FIGURE 1 FAULT TOLERANT LAYERS. Layers, top to bottom: "reliability" with voting (for example, SIFT); fault handling operating system; CHAMP architecture; reliable hardware.]

Aside from engineering the components so that they tend not to fail, the reliability problem can be reduced to one of providing and managing excess or reserve resources. These reserves must be held in an accessible configuration so that when a substitution is made the new arrangement of hardware can behave substantially as the old one did. The CHAMP architecture meets this criterion because it is an expandable arbitrary network of functionally identical computers onto which the user code is mapped
almost without regard for the specific hardware interconnection pattern. The CHAMP hardware architecture is shown by the second layer in Figure 1. It will be discussed in greater detail later on.

The third layer in the figure refers to the CHAMP operating system that distributes user code, detects faults, redistributes code for fault avoidance and activates roll-back procedures. This fault handling operating system provides the most fundamental level of fault tolerance. Although it does not guarantee an uninterrupted stream of correct results, it continually manipulates the available resources in search of possible lurking faults. It activates spare copies of code and institutes roll-back to (presumably) good computational results. The fault handling operating system is described in greater detail later on.

Figure 1 shows that these three levels of building blocks form the foundation upon which more elaborate software measures that go beyond CHAMP may be imposed to obtain high confidence in the correctness of a continuous output data stream. An example of such additional measures is SIFT (Software Implemented Fault Tolerance), which is an SRI-developed technique utilizing multiple (redundant) execution and multiple (redundant) voting algorithms.⁴ In this case a hardware failure is discovered when a component of the network is found to produce incorrect (inconsistent) results. The recovery algorithm to replace the faulty unit depends on a high level of assurance that the replacement unit is free of faults. CHAMP's continuous search for potential faults helps to provide this assurance.

Hardware Survivability

Survivability is a different aspect of the reliability question. In the face of possibly massive damage to the hardware, the need is to retain some useful level of operability.

In addition, the operating system must take into account the relative priorities of assigned tasks and apply the remaining resources to the most critical tasks. The larger the fraction damaged, the more difficult it may be to assure continuously verified output results. Many critical applications require, as a last resort, only that some usable computation be possible.

CHAMP is primarily configured to meet the survivability requirements by providing both a richly interconnected network of individual computing units and the ability of the operating system to deal with a dynamically changing, arbitrarily configured network.

CHAMP ARCHITECTURE

The CHAMP architecture may be thought of as consisting of two parts. The first is the basic hardware architecture, a homogeneous lattice of processors, called processing centers (PCs). The second is a hierarchical network of task code modules which is mapped onto the lattice of PCs.

The PCs are interconnected by dedicated point-to-point connections. Each PC has a number of I/O lines (usually 4 to 8) for this purpose. In order to achieve fault tolerance the network must be able to assume a variety of configurations, and the operating system must be able to handle this variety, even though the base-line connectivity is a regular lattice. Figure 2 is an example lattice connected in a regular two-dimensional array for convenience of illustration.

[FIGURE 2 CHAMP LATTICE ARCHITECTURE (TWO-DIMENSIONAL EXAMPLE). Each node of the lattice is a processing center.]

Processing Center

Each PC of the CHAMP lattice is architecturally identical and contains at least three processors that perform the functions of communications, system supervision, and user task module execution. The communications processor and the supervisor processor perform all the overhead functions, normally described as system functions, thus freeing the task processor to concentrate on the user application.

Communications Processor: The communications processor, or I/O processor, is the message traffic controller. It examines incoming messages, checks for syntactic errors and makes the required corrections if possible. Message header information is decoded to determine whether processing or retransmission is required, and what type of processing is needed or where to send the (processed or unprocessed) message. When transmission is required the communications processor determines the current best route(s) for message transmission based on information it maintains on its neighboring environment.

Supervisor Processor: The supervisor processor ensures that the network functions as a unit. There is no central hardware resource or central 'executive' in the CHAMP lattice; this is an important feature in obtaining fault tolerant operation, especially in the case of massive failures. The executive function exists equally in all PCs.

Each PC maintains a file of information about its own immediate environment as well as limited global information. In order to determine the health status of neighbors, as well as its own, the
supervisor directs the performance of diagnostic routines. Because exhaustive diagnostics are not practical, effective diagnostics are constructed of small routines that exist in the application task code (for example, a matrix inversion). The purpose is to search for the fault before it induces errors into the application processing or, perhaps more seriously, into the recovery process. Because of this, much importance is placed on the performance of diagnostics.

Another feature handled by the supervisor processor is the fault recovery process. This feature is covered later.

Task Processor: The user's application programs are executed as task code modules in the task processors. The task processor utilizes assets that expedite the task execution. For example, cache memories and discrete Fourier transform (DFT) circuitry can be included in the task processor design when the application task requires fast Fourier transform (FFT) processing. When appropriate to the application, the task processor may also be designed to execute a high-level language directly. Peripherals may be attached to the network as though they were additional PCs or even as task processor assets.

The task code modules processed by the task processors constitute a hierarchical network which is mapped onto the lattice of homogeneous PCs. These task modules interact with one another by communicating messages either directly (if interacting task modules are in adjoining PCs) or via intermediate PCs in a scheme similar to the routing of long-distance phone communications.

Example: Figure 3 is a block diagram of a PC with emphasis on the I/O function for illustration. This PC is shown with eight I/O lines. Communications are handled by a switch controlled by the communications processor (labelled IOP). For example, if a message is received at I/O 3 requesting routing to another destination, the communications processor determines that the best route is through the neighbor connected to I/O 5. The switches connecting I/O 3 and I/O 5 are therefore closed (the dots in the figure) to permit the message to flow through. Note the heavy path in the figure.

[FIGURE 3 A PROCESSING CENTER. I/O lines labelled I/O 1 through I/O 8.]

The supervisor processor is shown in Figure 3 performing diagnostics on the DFT chip, part of the task processor assets, which in turn deposits the results in a cache memory. Later in the cycle the supervisor processor will examine the cache for the correct results.

NETWORK OF TASK CODE MODULES

The CHAMP system design uses a programming methodology based on what we call M-modules (model-driven modules), developed at SRI International. An M-module is an autonomous program module containing a model, a set of values, and a set of procedures. The model encodes knowledge about the real-world domain of the module. Its set of values includes data about that domain and variable assignments that are made by the module itself. The procedures in a module are generalized ones; the model controls their actual behavior.

The concept of M-modules was developed for an experimental system called ACS.1 (Automated Command Support)⁶ to explore methods of providing automated support for management. It was designed to allow users to tune and modify the system to specific management needs. This ability to control the behavior of a system's M-modules is useful in applications of CHAMP. Of even greater importance, however, is the fact that the M-module approach makes a program adaptable to changes in CHAMP itself. If the changes are the result of faults or damage, responsiveness and adaptability are necessary to ensure reliability and survivability. If the changes are the result of adding resources, the effect is to make the system easily expandable.

Three characteristics of the M-module methodology are important to CHAMP. First, the approach leads to a hierarchical, top-down, highly modular design. Second, it imposes a discipline on inter-module communications, determining which modules can communicate with each other and under what conditions. Third, it minimizes the amount of information that must be transferred in order to relocate a program module.

[FIGURE 4 MODEL-DRIVEN PROGRAM MODULE. Labels: I/O to other M-modules, user, database, etc.; domain; information about domain; constraints maintained automatically; M-module activated by presence of input data or by demand for data.]
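The division of an M-module into a model, a set of values, and a set of generalized procedures can be illustrated with a short Python sketch. This is not the ACS.1 or CHAMP implementation; the class name `MModule`, the representation of the model as named constraint functions, and the speed-limit example are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical sketch: every name here is an illustrative assumption,
# not part of the CHAMP or ACS.1 systems.

# A constraint inspects a candidate set of values and says whether it is allowed.
Constraint = Callable[[Dict[str, float]], bool]


@dataclass
class MModule:
    # The model: named constraints that the module's values must obey.
    model: Dict[str, Constraint]
    # The set of values: data about the module's domain.
    values: Dict[str, float] = field(default_factory=dict)

    def accept(self, name: str, value: float) -> bool:
        """Generalized procedure: accept input data only if the model allows it."""
        candidate = {**self.values, name: value}
        if all(check(candidate) for check in self.model.values()):
            self.values = candidate
            return True
        return False  # rejected: the update would violate the model

    def request(self, name: str) -> float:
        """Generalized procedure: respond to a demand for data."""
        return self.values[name]


# An aircraft-tracking module whose model limits the reported speed.
tracker = MModule(model={"speed_limit": lambda v: v.get("speed", 0.0) <= 600.0})
accepted = tracker.accept("speed", 450.0)   # within the model's limit
rejected = tracker.accept("speed", 900.0)   # violates the model, values unchanged

# Behavior is changed by editing the model only; the procedures are untouched.
tracker.model["speed_limit"] = lambda v: v.get("speed", 0.0) <= 1000.0
accepted_after_change = tracker.accept("speed", 900.0)
```

In this sketch the procedures `accept` and `request` contain no domain knowledge of their own, which is the property that lets the same procedure code be shared by many modules while each module's model decides how it behaves.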
A s shown i n Figure 4, an M-module is a n auto- t o a n o t h e r , r a t h e r i t is r e c r e a t e d a t t h e new loca-
nomous program module t h a t is given r e s p o n s i b i l i t y tion. To provide fault-avoidance c a p a b i l i t y , f o r
f o r a p a r t i c u l a r segment of t h e system's activity. example, we maintain a t l e a s t one backup v e r s i o n of
It accomplishes t h i s by means of i t s t h r e e com- a n M-module's model a t another PC. Checkpoint d a t a
ponents. The model c o n t a i n s t h e knowledge used by f o r t h a t M-module a r e a l s o maintained a t t h e backup
t h e M-module (i.e., t h e r e l a t i o n s , g o a l s and con- location. Should t h e PC running t h e M-module be
s t r a i n t s t h a t t h e module's v a l u e s are r e q u i r e d t o found t o be f a u l t y , a s i n g l e command can cause t h e
obey). The s e t of v a l u e s give t h e d e t a i l s of t h e backup model t o be a c t i v a t e d and t h e new M-module
p a r t of t h e world f o r which t h e module is responsi- t o be i n i t i a l i z e d w i t h t h e checkpoint data.
ble. The procedures make t h e module i n t o an auto-
nomous u n i t a b l e t o respond t o e x t e r n a l r e q u e s t s o r So f a r , we have d i s c u s s e d i s o l a t e d M-modules.
commands and t o maintain t h e s e l f - c o n s i s t e n c y of The e n t i r e system of M-modules, however, can be
i t s set of v a l u e s according t o t h e r e l a t i o n s con- regarded as an M-module i t s e l f . For example, t h e
t a i n e d or implied i n i t s model. a i r s p a c e c o n t r o l system mentioned above can be con-
s i d e r e d as a s i n g l e huge, M-module whose model is
To i l l u s t r a t e t h e u s e of M-modules, c o n s i d e r a t h e union of a l l t h e models of t h e s e p a r a t e track-
system intended f o r t r a f f i c c o n t r o l i n a n a i r s p a c e . i n g and pair-watching modules w i t h d u p l i c a t e con-
One M-module can be c r e a t e d f o r each a i r c r a f t i n s t r a i n t s being e l i m i n a t e d . S i m i l a r l y , i t s set of
t h e a i r s p a c e and made r e s p o n s i b l e f o r maintaining v a l u e s is t h e product of t h e s t a t e v e c t o r s of t h e
information about t h a t a i r c r a f t ' s p o s i t i o n , d i r e c - aircraft. Its procedures are t h e union of t h e
t i o n , speed, and whatever o t h e r information i s a v a i l a b l e procedures, a g a i n after eliminating
needed. The models used by t h e s e M-modules encode duplications.
what i s known about t h e c h a r a c t e r i s t i c s of t h e air-
c r a f t being tracked--i.e., the relationships that I n t h e o t h e r d i r e c t i o n , a given M-module can
l i m i t its l o c a t i o n , speed and a t t i t u d e , given i t s o f t e n be decomposed i n t o a set of s u b o r d i n a t e M-
history. modules, p l u s t h e a d d i t i o n a l components needed t o
control t h e i r interactions. For example, a n air-
This a p p l i c a t i o n environment r e q u i r e s t h a t M- c r a f t t r a c k i n g module may a l s o need t o maintain
modules be c r e a t e d and destroyed as needed. For information about f u e l s t a t u s . I n t h a t case, i t
example, as a new a i r c r a f t l e a v e s t h e a i r s p a c e , i t s would be convenient t o consider t h e p o s i t i o n -
module i s no longer needed and can be destroyed. t r a c k i n g and t h e f u e l e s t i m a t i n g , f u n c t i o n s as
A s a n a i r c r a f t e n t e r s t h e space, a new module must e x e r c i s e d by d i f f e r e n t s u b o r d i n a t e M-modules.
be c r e a t e d . These s u b o r d i n a t e modules i n t e r a c t , of course, b u t
t h e i r i n t e r a c t i o n s can be handled conveniently by
I n a n a i r c r a f t t r a c k i n g M-module, t h e model messages.
encodes t h e dynamic c h a r a c t e r i s t i c s of t h e air-
c r a f t . The set of v a l u e s s p e c i f i e s t h e s t a t e of The advantages of decomposing t h e system
t h e a i r c r a f t as l a s t observed p l u s any e s t i m a t e d h i e r a r c h i c a l l y are twofold. F i r s t , by i m i t a t i n g
updates. Some of t h e s e v a l u e s are o b s e r v a t i o n a l ; t h e way i n which a person t h i n k s about a problem,
o t h e r s are e s t i m a t e d update v a l u e s generated by t h e i t makes i t easier f o r t h e non-programmer t o under-
modules using, perhaps, Kalman's algorithm. The s t a n d and c o n t r o l system behavior. Second, it
set of procedures i n c l u d e s t h o s e t h a t update t h e allows t h e imposition of a r i g o r o u s communication
values as w e l l as t h o s e t h a t a c c e p t d a t a , respond d i s c i p l i n e . I n p a r t i c u l a r w e are a b l e t o l i m i t
t o r e q u e s t s f o r information, and e x e c u t e o t h e r inter-module communications t o M-modules t h a t are
t a s k s t h a t may be needed. The procedures, however, s i b l i n g s , components of a s i n g l e M-module. In
are w r i t t e n i n g e n e r a l terms and s p e c i a l i z e d t o a d d i t i o n w e can s a y without going i n t o g r e a t d e t a i l
immediate needs by t h e model. The s i t u a t i o n is t h a t i t is possible t o specify t i g h t conditions
analogous t o t h e person who, upon r e c e i v i n g a under which we a l l o w two s i b l i n g s t o communicate.
request for action, consults a policy manual (the model), to determine how to respond to the request.

As has been indicated, M-modules can be easily modified to meet new or changing conditions. This is because the models are explicit and remain available for study or change. In the example discussed, if new information should become available about the capabilities of one of the aircraft being tracked, this information can be added to the model without requiring changes elsewhere in the module. The procedures used by the module are controlled by its model, and behavioral changes can be effected by modifying the model.

In general, the procedures can be expected to be common to many M-modules and distributed throughout the network. In consequence, to move an M-module, say from PC I to PC J, we need to move only its model and its set of values. It should be observed that an M-module does not move from one PC

These conditions allow the system to run completely asynchronously while providing assurance against deadlock, a condition in which, for example, Process A must complete some action before Process B can act, and vice versa.

We have not been able to specify criteria that will also assure freedom from starvation, a weaker condition than deadlock in which, although there may be no logical reason why a process cannot be completed, there is a finite probability that it never does. However, it does seem possible and practical to avoid starvation under many conditions.

FAULT TOLERANCE

As explained earlier, the fault tolerant characteristics of the CHAMP architecture (Figure 1) provide a structured approach to obtaining the degree of fault tolerance needed.

The first layer of fault tolerance is obtained by using the most reliable hardware for the application. There are other hardware measures needed that contribute to a fault tolerant system. For instance, multi-ported controllers must be provided to allow peripherals to be attached to more than one PC. Critical peripherals must, of course, be duplicated.

The second fault tolerant layer is the basic CHAMP structure, which is a richly connected network designed to have generous reserve capacity. A rich connectivity is needed in a survivable architecture to reduce the possibility of fragmentation or (maybe worse) near fragmentation of the network as a result of damage. Reserve capacity is needed in order to allow generous time for diagnostics and to provide resources to replace damaged units. Because of the low cost and small size of hardware, a CHAMP lattice may be constructed with more PCs than necessary for the basic user tasks. For example, it is practical to use two to five times the number of PCs actually needed. The excess capacity will provide backup for fault avoidance and survivability.

The next two layers of fault tolerance are supplied by software. The operating system maps task modules into the lattice of PCs and provides for backup execution to reduce vulnerability to failures. The user is expected to assign priority levels to each task module. These priority levels are used by the system to manage graceful degradation and to determine the extent to which backup is needed for an individual task. When a massive failure is detected, priority must be given to preserving computations on the highest priority task modules.

The fault recovery process is managed by the supervisor processor. The supervisor detects a fault by diagnostic procedures, by examination of message error correction code, or by comparing the results of several independent but identical computations. When a fault is detected, the offending PC is isolated by stopping all communication to that PC relating to task code execution. The supervisor then institutes recovery procedures.

Following the recovery, there are three choices as to what can be done for the faulty PC: (1) have neighbors test to determine the location of the fault and, if possible, reactivate the PC, working "around" the faulty location (for example, a "stuck" memory cell does not preclude using the rest of memory, or a damaged task processor can be eliminated from use while the associated I/O processor is retained as a communication path); (2) if necessary, replace the faulty PC with a new PC; or (3) if salvage is not possible and repair is not feasible (for example, if the CHAMP system is functioning as a flight controller for a space vehicle), the PC is simply not usable and the excess processing potential is consequently reduced.

Basic Fault Detection and Recovery Algorithm

The algorithm discussed here deals with the detection and management of faults that are assumed to occur singly. Multiple or massive failures are discussed in the next section. We allow the possibility of such faults accumulating to a substantial degree, though of course not to the extent of forming a cut-set for the network. This algorithm assumes that all faults are permanent, that they occur in the PCs rather than in a communication line, and that they are detectable by a test program resident in the system.

There are three main modules in the fault detection and management system. (1) A Fault Manager (FM), resident in the supervisor of each PC, contains and directs the running of fault detection programs both in its own PC and in its direct neighbors. (2) The Relocation Controller (RC), resident in a different PC than the one for which it is responsible, contains the locations of the backup modules and checkpoint data for the modules operating in that PC for which it is responsible. For example, the RC for PC N, RC(N), is resident in PC M, M not equal to N. (3) The Activation Controller (AC) holds the information necessary to recreate one of a designated set of modules and the checkpoint data necessary for restarting those modules. It responds to an established need for one of those modules by recreating and restarting it, and by updating it to current time if necessary.

Additional fault detection and management constraints are that backup modules be located one step away from the primary module, and the RC for a particular PC be located two steps away from it. For example, imagine a section of a CHAMP lattice, as shown in Figure 5, where modules a, b, and c are tasks of PC 1, backup copies for a, b, and c are located as shown in PCs 2, 4, and 5 respectively, and the RC for PC 1 is in PC 3.

FIGURE 5  FAULT RECOVERY EXAMPLE
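
The two placement constraints stated above (backups one step from the primary PC, the RC two steps from it) can be checked mechanically against the example of modules a, b, and c in PC 1. What follows is a minimal Python sketch under stated assumptions, not the authors' implementation. The link list is a hypothetical lattice fragment, since the actual topology of the figure is not given in the text; it was chosen only so that PCs 2, 4, and 5 are direct neighbors of PC 1 and PC 3 is two steps away. The names LINKS, build_neighbors, steps_between, and placement_violations are illustrative and do not come from the paper.

```python
from collections import deque

# ASSUMPTION: hypothetical lattice fragment. The real topology of the
# figure is not recoverable from the text; these links merely satisfy
# the example (PCs 2, 4, 5 adjacent to PC 1; PC 3 two steps from PC 1).
LINKS = [(1, 2), (1, 4), (1, 5), (2, 3), (3, 4), (4, 5)]


def build_neighbors(links):
    """Return {pc: set of directly connected PCs} for undirected links."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def steps_between(neighbors, start, goal):
    """Breadth-first search: links on a shortest path, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        pc, dist = queue.popleft()
        if pc == goal:
            return dist
        for nxt in neighbors.get(pc, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def placement_violations(neighbors, primary_pc, backups, rc_pc):
    """List every violation of the two placement constraints.

    backups maps a module name to the PC holding its backup copy;
    rc_pc is the PC in which the primary PC's RC resides.
    """
    problems = []
    for module, pc in sorted(backups.items()):
        if steps_between(neighbors, primary_pc, pc) != 1:
            problems.append(
                f"backup of {module} in PC {pc} is not one step from PC {primary_pc}"
            )
    if steps_between(neighbors, primary_pc, rc_pc) != 2:
        problems.append(
            f"RC({primary_pc}) in PC {rc_pc} is not two steps from PC {primary_pc}"
        )
    return problems


NEIGHBORS = build_neighbors(LINKS)
BACKUPS_OF_PC1 = {"a": 2, "b": 4, "c": 5}  # backup locations from the example
RC_OF_PC1 = 3                              # RC(1) resides in PC 3 in the example

print(placement_violations(NEIGHBORS, 1, BACKUPS_OF_PC1, RC_OF_PC1))  # prints []
```

With the example placement the list of violations is empty. Moving the RC for PC 1 into a direct neighbor such as PC 2 would be reported, since the RC is required to sit two steps away from the PC it is responsible for.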

The FM in each PC periodically runs test programs. Assume that the FM in PC 4 discovers a fault in PC 1. The FM sends this information to RC(1). RC(1) seeks confirmation by sending a message to the neighbors of PC 1 directing that PC 1 be tested. The neighbors test PC 1 and report the results back to RC(1).

If all neighbors of PC 1 find it faulty, RC(1) concludes that PC 1 is faulty. Otherwise, it concludes that PC 4 is faulty and sends this information to RC(4). Note that the RC cannot accept information without confirmation, as the PC containing the RC may be faulty. So the cycle starts again.

Suppose it is confirmed that PC 1 is faulty. RC(1) contains a list indicating where the backup modules are located for PC 1, which in this example are in PCs 2, 4, and 5. RC(1) sends messages to the ACs of these PCs calling for the activation of the appropriate backup programs and the shutdown of communications with PC 1. The latter action prevents any messages from being sent to PC 1 or being received from PC 1. It also prevents any effort to move any module to PC 1. However, following recovery the neighbors periodically query the shut-out center in case it has been repaired.

PCs 2, 4, and 5, receiving the message from RC(1), activate any modules of PC 1 for which they contain the backup capability and checkpoint data. Part of the activation process is to create a new backup in a neighbor chosen at random one step away, and to add this information to its RC.

When all backup modules have been activated, RC(1) goes dormant and a message is sent to RC(3) indicating the status of the RC(1) module.

The faulty PC may have contained an RC for some other PC. For this reason all RCs are backed up, similar to task modules, so that they too are activated by the recovery process.

Recovery from a Massive Failure

As indicated, the algorithm may not handle massive failures. Some simultaneous failures can be handled if they occur in PCs with little intercommunication traffic. A problem remains, however, when a section of the network is lost.

A possible massive failure is indicated when:

   o  expected data do not arrive
   o  a supervisor finds two or more faulty neighbors
   o  a message cannot reach the intended destination.

The first situation applies to those processes which are in almost continuous operation on a continuous input data stream. The absence of a result for the next expected time interval can be sensed to activate recovery processes.

In the second situation, a supervisor processor finds two or more faulty neighbors in the course of regular diagnostics. This is grounds for concern in that more serious losses are possible, since two independent failures at virtually the same time (or within a very short time interval) would be an improbable occurrence.

The third situation is an unlikely initial detector of a massive failure problem, but it is important since it is a possible consequence of sectional damage. The main objective is to detect and deal with a message that is circulating endlessly looking for a missing (lost) PC. The process that detects the circulating message could also activate the recovery.

Once detected, a massive failure invokes a recovery process similar to the initial starting up of the network. Two phases are involved: first, reconstituting the network and producing an inventory of the user code modules available and, second, connecting and restarting the user code modules in order of the previously declared priorities. Our work in this complex area is embryonic and will be described in a later paper.

PROJECT STATUS

To provide a vehicle to test, demonstrate, and verify operating system algorithms and fault recovery techniques, several simplified PCs have been constructed in hardware using 8-bit microprocessors. The design of this hardware emphasizes the system functions; hence, it provides a basic capacity for experimentation with system algorithms but is not optimized for user applications.

FIGURE 6  PROCESSING CENTER BOARD

Figure 6 is a photograph of the PC board. The board consists of three 8-bit microprocessors, one microprocessor for each of the three processors. The I/O processor has 2k bytes of RAM and 4k bytes of ROM. The supervisor and task processors each
have 4k bytes of RAM and 8k bytes of ROM. The I/O switching function for communications among processors on the board as well as off the board to neighboring PCs is simulated by eight 8-bit peripheral interface circuits. There are eight I/O ports under control of the I/O processor: six 8-bit parallel I/O ports for connection to neighbor PCs, one 8-bit parallel port for connection to the supervisor processor, and one 8-bit parallel port for connection to the task processor. In this design, the I/O processor and the task processor can also communicate via 32 bytes of two-ported shared RAM, allowing fast data transfer between these processors.

Present efforts relating to hardware experimentation are in the initial stage of development. Operating procedures are being developed to provide a basic level communication structure. These procedures are being designed so that they can be expanded to include more sophisticated techniques as the project progresses.

CONCLUSION

The CHAMP type of architecture employing a network of microcomputers has great potential for achieving high throughput with reliability, survivability and ease of expansion. The programming methodology based on the use of M-modules fits the CHAMP configuration and allows its use in a number of important application environments. The interface between the hardware configuration and the application program uses a distributed executive system that is itself able to tolerate faults.

Analysis and hardware demonstration have been carried to the point where they confirm the basic feasibility of the concept. Much further work is needed, however, to realize the full potentials of the concept. Work towards this objective is continuing.

ACKNOWLEDGEMENT

The authors wish to acknowledge the work of Lynn Jacobson and Fred Sommer for their efforts in the development and testing of operating software for the CHAMP system hardware.

REFERENCES

R. J. Swan, "The Switching Structure and Addressing Architecture of an Extensible Multiprocessor: Cm*," Doctoral Thesis, CMU-CS-78-138, Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA.

A. M. Despain and D. A. Patterson, "X-Tree: A Tree-Structured Multiprocessor Computer Architecture," 5th Symposium on Comp. Arch., Palo Alto, CA, April 3-5, 1978, Conf. Proc. pp 144-151.

L. R. Weisberg and L. W. Sumney, "The New DOD Program on Very High Speed Integrated Circuits (VHSI)," 1978 Government Microcircuit Applications Conference, Vol. VII, pp. 18-20.

J. Wensley, et al, "SIFT: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control," Proceedings of the IEEE, Vol. 66, No. 10, October 1978.

M. C. Pease, "M-Modules: A Design Methodology," Technical Report 17 (Contract No N00014-77-C-0308 with the Office of Naval Research, Department of the Navy, Arlington, VA, 22217), SRI International, Menlo Park, CA 94025, March 1979.

M. C. Pease, "ACS.1: An Experimental Automated Command Support System," IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-8, No. 10, pp 725-735, October 1978.
