CHAPTER 4 Advances in High Performance Processing of Seismic Data - 1989 - Handbook of Geophysical Exploration Seismic Exploration
CHAPTER 4
by
ERNST L. LEISS
Department of Computer Science
Research Computation Laboratory
University of Houston
and
OLIN G. JOHNSON
Department of Computer Science
University of Houston and the
Houston Area Research Center
1. INTRODUCTION
Advances in geophysical processing are dependent on advances in computer
hardware and software. Hence, it is important for geophysicists to be aware of
research efforts and new products in computer design, I/O devices, algorithms, and
programs.
Here we survey these areas. Section two addresses advances in hardware.
Many research projects in new computer architectures are reviewed. Some of these
have already been used successfully in geophysical modeling or processing. I/O
advances are also covered. Section three addresses software advances in languages
and compilers. Section four considers the problems of implementing geophysical
applications in these newer systems. The realities and pitfalls of the implementation
process are briefly discussed. The subject of in-core programming versus out-of-core
programming is considered in some detail. Finally, implementing vector and
parallel programming is discussed.
2. HARDWARE ADVANCES
The traditional von Neumann computer consists of a memory, a processor, and a
bus between them. Data and instructions are stored in the memory, and the
processor controls and performs the computations, that is, it generates addresses
for data and instructions, fetches them and computes on data. The bus is the most
frequently used component of the system. To avoid a potential bottleneck, von
Neumann machines often include a small fast local storage (local memory and/or
cache) which is accessed more frequently by the processor.
The von Neumann computer is a control flow computer where the flow of control
causes the execution of instructions. Central to the von Neumann machine is
the concept of the stored program, the principle that instructions and data are to be
stored together intermixed in a single, uniform storage medium rather than
separately. The ambiguity of the interpretation of an element in storage is resolved
only temporarily when it is fetched and either executed as an instruction or
operated on as data. A datum, created as a result of some operations in the ALU
(arithmetic logic unit), might possibly be placed in storage like any other datum, but then
fetched and executed as an instruction either deliberately by program design or by
error. Another concept central to the von Neumann machine is the program counter,
a register that is used to indicate the location of the next instruction to be
executed and which is automatically incremented by each instruction fetch.
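The stored-program principle and the program counter can be illustrated with a minimal sketch. The three-field instruction format and the opcodes ADD, COPY, and HALT are assumptions made for this illustration only; they do not describe any particular machine. Cell 6 is placed in storage as a datum, copied over cell 2, and later fetched and executed as an instruction.

```python
# A toy stored-program machine (hypothetical instruction format:
# (opcode, a, b)). Instructions and data share one memory list, and
# nothing marks a cell as "code" or "data" until it is fetched.

def run(memory):
    """Execute from cell 0 until HALT; return the final memory image."""
    pc = 0  # program counter: location of the next instruction
    while True:
        op, a, b = memory[pc]  # fetch: the cell is interpreted as an instruction
        pc += 1                # the counter is incremented by each fetch
        if op == "HALT":
            return memory
        if op == "ADD":        # memory[a] <- memory[a] + memory[b]
            memory[a] = memory[a] + memory[b]
        elif op == "COPY":     # memory[a] <- memory[b]
            memory[a] = memory[b]
        else:
            raise ValueError(f"unknown opcode {op!r}")


program = [
    ("COPY", 2, 6),   # cell 0: overwrite cell 2 with the datum in cell 6
    ("ADD", 4, 5),    # cell 1: cell 4 becomes 10 + 32
    ("HALT", 0, 0),   # cell 2: replaced before it is ever fetched
    ("HALT", 0, 0),   # cell 3
    10,               # cell 4: data
    32,               # cell 5: data
    ("ADD", 4, 5),    # cell 6: held as data, later executed from cell 2
]
result = run(list(program))
print(result[4])  # the ADD ran twice: once from cell 1, once from cell 2
```

Without the COPY in cell 0 the machine would halt at cell 2 with cell 4 holding 42; with it, the copied datum is executed as a second ADD.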
The study of architectures that utilize various types of concurrency is motivated
by the need to increase the performance of computers. The new machines which will
supersede the von Neumann model will have greater performance and may use very
large scale integration (VLSI) to implement the concurrent architectures.
The advanced computers studied here have been classified as multiprocessors,
dataflow computers, array processors, pipelined computers, supercomputers,
systolic arrays, very large instruction word (VLIW) machines, and uniprocessors
based on the reduced instruction set computer (RISC) architecture. This
classification is based on the mode of execution of the processors, the performance
and size of memory, the control mechanism, and any specialized architecture like
VLIW and RISC.
Pipelining speeds up single-threaded code. Instruction execution is broken into
its components (levels) such as instruction fetch, opcode decoding, operand address
calculation, operand fetch, and execution, each of which can be executed
independently with simultaneous computations on different sets of data. A floating-point add can
be pipelined as follows: sign control, exponent compare, mantissa shift, mantissa
add, exponent adjust, and normalization. The EXPRESSION PROCESSOR at the
University of Washington, PIPE at the University of Wisconsin-Madison, and TIP
from Japan fall in this category.
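The benefit of pipelining the six-stage floating add can be estimated with a short sketch. It assumes an idealized pipeline in which every stage takes one cycle and no stalls occur; the function names are illustrative only.

```python
# Idealized pipeline timing for the six-stage floating add named in the
# text. Assumption: every stage takes exactly one cycle and the pipeline
# never stalls, so operation i occupies stage s during cycle i + s.

ADD_STAGES = [
    "sign control", "exponent compare", "mantissa shift",
    "mantissa add", "exponent adjust", "normalization",
]


def unpipelined_cycles(n_ops, n_stages):
    """Each operation passes through all stages before the next one starts."""
    return n_ops * n_stages


def pipelined_cycles(n_ops, n_stages):
    """The first result needs n_stages cycles; each later one needs 1 more."""
    return 0 if n_ops == 0 else n_stages + n_ops - 1


def stage_occupancy(cycle, n_ops, stages):
    """Map each busy stage name to the operation index it holds in `cycle`."""
    busy = {}
    for s, name in enumerate(stages):
        op = cycle - s
        if 0 <= op < n_ops:
            busy[name] = op
    return busy


n = 1000
k = len(ADD_STAGES)
print(unpipelined_cycles(n, k), pipelined_cycles(n, k))
print(stage_occupancy(2, n, ADD_STAGES))
```

Under these assumptions the speedup for a long vector approaches the number of stages, which is why pipelining suits long runs of identical operations.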
Array processors obtain concurrency by performing identical operations on
different portions of data, that is, they are SIMD (single instruction stream, multiple
data stream). They act as fast coprocessors which offload many of the repetitive
calculations needed in scientific applications. They are connected to and controlled by a
host. The host provides the mechanisms for communications and control between
the array processor and the outside world. It also performs the tasks of data
management, compilation, and resource allocation/control functions commonly
associated with a general-purpose operating system. Although array processors are
high performance machines, they are burdened with several problems. First,
structured data that are vectors of irregular strides are difficult to handle because of
memory conflicts. Secondly, programs do not consist only of vector instructions.
The ADAPTIVE ARRAY PROCESSOR from Japan, PARALLEL IMAGE
NEIGHBORHOOD PROCESSOR at the University of Missouri, MULTIPLE
PARALLEL PROCESSOR at Goodyear Aerospace Corporation, RICE ARRAY
PROCESSOR at Rice University, and VERY FAST PARALLEL PROCESSOR at
Columbia University are some of the current array processor projects.
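The memory conflicts caused by vector strides can be made concrete with a small sketch. It assumes a memory interleaved across a fixed number of banks, with word address a held in bank a mod n_banks; this layout and the function names are assumptions for illustration, not a description of any machine listed above.

```python
# Bank conflicts under strided vector access. Assumption: memory is
# interleaved so that word address a lives in bank a % n_banks.
from math import gcd


def bank_of(address, n_banks):
    return address % n_banks


def banks_touched(stride, n_banks, n_elements):
    """Number of distinct banks used by a vector with the given stride."""
    return len({bank_of(i * stride, n_banks) for i in range(n_elements)})


def accesses_per_bank(stride, n_banks, n_elements):
    """How many of the vector's elements fall into each bank."""
    counts = [0] * n_banks
    for i in range(n_elements):
        counts[bank_of(i * stride, n_banks)] += 1
    return counts


for stride in (1, 2, 3, 8):
    print(stride, banks_touched(stride, 8, 64), accesses_per_bank(stride, 8, 64))
```

With 8 banks, a stride of 1 or 3 spreads the accesses over all banks, a stride of 2 uses only half of them, and a stride of 8 sends every access to the same bank. In general a long vector touches n_banks / gcd(stride, n_banks) banks.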
A binary array processor is a parallel matrix processor in which each processing
element is constrained to bit serial operations. A parallel matrix processor is a
SIMD machine that has a set of processing elements (PE's) organized as a
two-dimensional matrix such that data may only be transferred between adjacent PE's.
Data interconnections between PE's are one bit wide. Binary array processors
process picture data, conventionally represented by a large two-dimensional array
of picture elements called pixels. BASE at Purdue University and CLIP from
England are binary array processors.
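What bit serial operation means for a matrix of PE's can be sketched as follows. The sketch assumes one PE per pixel, unsigned pixel values, and a one-bit carry register in each PE; the sequential loops only simulate what the PE's would do simultaneously under a single broadcast instruction per bit position.

```python
# Bit-serial SIMD addition of two images. Assumptions: one PE per pixel,
# unsigned values of `width` bits, one carry bit stored in each PE.

def simd_bit_serial_add(image_a, image_b, width):
    rows, cols = len(image_a), len(image_a[0])
    carry = [[0] * cols for _ in range(rows)]
    result = [[0] * cols for _ in range(rows)]
    for bit in range(width):          # one broadcast step per bit position
        for r in range(rows):         # these two loops stand in for the
            for c in range(cols):     # PE's acting in lockstep
                a = (image_a[r][c] >> bit) & 1
                b = (image_b[r][c] >> bit) & 1
                total = a + b + carry[r][c]
                result[r][c] |= (total & 1) << bit
                carry[r][c] = total >> 1
    for r in range(rows):             # keep the final carry as an extra bit
        for c in range(cols):
            result[r][c] |= carry[r][c] << width
    return result


summed = simd_bit_serial_add([[1, 2], [3, 255]], [[4, 5], [6, 1]], width=8)
print(summed)
```

The number of steps depends on the word width, not on the image size, which is the trade made by giving each PE only one-bit data paths.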
The WAVEFRONT ARRAY PROCESSOR at the University of Southern
California is a specialized array processor based on the wavefront concept. The
wavefront notion drastically reduces the complexity in the description of parallel
algorithms. The mechanism provided for this description is a special-purpose,
wavefront-oriented language. Rather than requiring a program for each processor
in the array, this language allows the programmer to address an entire front of
processors. The wavefront architecture can provide asynchronous waiting capability
and consequently can cope with timing uncertainties such as local clocking, random
delay in communications, and fluctuations of computing times. In short, the
wavefront notion lends itself to an (asynchronous) dataflow computing structure that
conforms well with the constraints of VLSI. The integration of the wavefront
concept, the wavefront language, and the wavefront architecture leads to a
programmable computer network called the wavefront array processor (WAP). The WAP is
in a sense an optimal trade-off between the globally synchronized and dedicated
systolic array and the general-purpose dataflow multiprocessor. It is mainly aimed
at incorporating the vast VLSI computational capability into modern signal
processing applications.
In a dataflow computer the availability of input operands triggers the execution
of the instruction which consumes the inputs. It is associated with single-assignment
languages in which data flows from one statement to another, execution of
statements is data-driven and identifiers obey the single-assignment rule. A node is
said to be firable (enabled) if a token arrives on each of the incoming arcs
representing the necessary operands for the node, and if no tokens are present on
the outgoing arcs where the resulting tokens are to be emitted. To hold the
database of a large scale computation, the dataflow computer has array memories.
The processing elements consist of two kinds of units: cell blocks and functional
units. Cell blocks hold the instructions and perform the basic function of
recognizing which instructions are ready for execution. The functional units perform the
execution of enabled instructions.
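The firing rule stated above can be sketched in a few lines. The dictionary-based graph representation and the names firable, fire, and run_dataflow are assumptions chosen for the illustration; each arc holds at most one token, and None marks an empty arc.

```python
# Data-driven execution of (a + b) * (c - d) under the firing rule:
# a node fires when every input arc holds a token and every output arc
# is empty. The graph encoding below is hypothetical.
import operator


def firable(node, arcs):
    return (all(arcs[a] is not None for a in node["inputs"])
            and all(arcs[a] is None for a in node["outputs"]))


def fire(node, arcs):
    operands = [arcs[a] for a in node["inputs"]]
    for a in node["inputs"]:
        arcs[a] = None                 # input tokens are consumed
    value = node["op"](*operands)
    for a in node["outputs"]:
        arcs[a] = value                # result tokens are emitted


def run_dataflow(nodes, arcs):
    fired = True
    while fired:                       # stop when no node is enabled
        fired = False
        for node in nodes:
            if firable(node, arcs):
                fire(node, arcs)
                fired = True
    return arcs


# The nodes are listed "out of order" on purpose: execution order is
# determined by token availability, not by position in the list.
nodes = [
    {"op": operator.mul, "inputs": ["s", "t"], "outputs": ["result"]},
    {"op": operator.add, "inputs": ["a", "b"], "outputs": ["s"]},
    {"op": operator.sub, "inputs": ["c", "d"], "outputs": ["t"]},
]
arcs = {"a": 2, "b": 3, "c": 7, "d": 4, "s": None, "t": None, "result": None}
final = run_dataflow(nodes, arcs)
print(final["result"])
```

The multiply node appears first in the list but fires last, because its operands exist only after the add and subtract nodes have emitted their tokens.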
Dataflow machines can be static or dynamic (tagged), based on the method by
which they pass tokens from node to node. A static dataflow machine allows only
one token on an arc at a time. A program, as stored in the computer's memory,
consists of instructions linked together. Each instruction has an operation code,
spaces for holding operand values as they arrive, and destination fields that indicate
what is to be done with the results of instruction execution. The routing network
2.1.4 Multiprocessors
Most of the present architecture research projects are multiprocessors, either
shared-memory or message-passing. Multiprocessors use several processors
(homogeneous or heterogeneous) concurrently to solve one or more problems. The
early development of multiprocessor hardware and the operating systems necessary
to make it effective in applications was largely oriented toward increased system
throughput over single processor systems. Multiprocessors have the most flexible computer
architecture in exploiting arbitrarily structured parallelism. Multiprocessor systems
have multiple instruction streams over either a set of interactive processors with shared
resources such as memories and databases, or a set of autonomous processors with no
shared resources but with an inter-processor communication network.
Multiprocessors offer another dimension of parallelism, namely multitasking (capability
of a system to support two or more active tasks simultaneously) in addition to
vectorization (the process of replacing sequential code by vector instructions). There
are mainly two types of multiprocessors, shared-memory and message-passing.
In the shared-memory model, the data is in preallocated locations in the
shared memory where it can be accessed by each processor and operated upon
without interruptions from other processors. These machines are structured with a
switching network, either a crossbar connection of buses or a multistage network
between processors and memory. Processor-memory communication can also be
via a multiported memory. An interleaved memory is very suitable for
shared-memory multiprocessors to avoid some of the memory contentions.
Communication between processes running concurrently in different processors occurs
through shared variables and common access to one large address space. An
advantage of shared-memory multiprocessors is the memory space saving since one copy
of the operating system suffices. There is a limit on the number of processors in a
shared-memory multiprocessor due to the memory contentions that increase with
an increasing number of processors. Some of the shared-memory multiprocessor
projects are BUTTERFLY at Bolt, Beranek, and Newman, CEDAR at the
University of Illinois at Urbana-Champaign, CM* and C.MMP at Carnegie-Mellon
University, CONCERT at MIT (Massachusetts Institute of Technology),
HOMOGENEOUS MULTIPROCESSOR from Canada, GIGACOMPUTER at
Argonne National Laboratory, MIDAS at the University of California at Berkeley,
PUMPS at Purdue University and Rice University, REMPS at the University of
Southern California, TAMIPS from Japan, TRAC at the University of Texas at
Austin, and ULTRA at New York University. CEDAR has processor clusters
where a processor can access its own local memory or the local memory of other
processors in the cluster. CEDAR combines the control mechanism of dataflow
architecture and the storage mechanism of von Neumann machines. DIRECT, a
multiprocessor developed at the University of Wisconsin, has an associative
memory. An associative memory is a content addressable storage, that is, cells in
memory are addressed not by location, but by content. TRAC has a special
property called varistructurability which means that an n-byte operand can be
processed by one or more byte-wide processors. The opcode that directs these
operations must be independent of the physical structure of the machine.
The message-passing multiprocessors do not have any globally shared memory.
Each processor has a local memory and an interprocessor connection network. The
advantage of the message-passing model is that data is passed only once through
the connection network while two passes (write and read) are needed for the
shared-memory model unless the data is in the local storage. Yet another advantage
is that for data-driven computation, data is passed through the network at
generation time and not when it is needed. Thus longer delays through the network
can be tolerated in the case when data is not used immediately after its generation.
These machines can have a very large number of processors, thus potentially having
a very high performance. Message-passing multiprocessors are difficult to program
since a programmer must know the code executed by each processor in order to
pass the data between processors correctly. Some of the message-passing
multiprocessor projects are CHIP at the University of Washington and Purdue
University, CONNECTION MACHINE at MIT and Thinking Machines, Inc., COSMIC
CUBE at California Institute of Technology, DADO at Columbia University,
DON from Japan, MANIP at Purdue University, MU6V from England, and
ZMOB at the University of Maryland. PASM is a message-passing multiprocessor
at Purdue University with a partitionable SIMD/MIMD architecture. A
partitionable SIMD/MIMD system is a parallel processing system which can be
structured as one or more independent SIMD and/or MIMD machines of various sizes.
FAIM-1 at Fairchild Laboratory for Artificial Intelligence has a number of
processors where each processor is a fanatically reduced instruction set computer
(FRISC). FRISC supports low-level symbol processing in ways similar to
uniprocessor Lisp-Machines: tagged-memory architecture, stack caches, and a
tailored instruction set.
The WAFER SCALE INTEGRATED MULTIPROCESSOR at the University
of Illinois at Urbana-Champaign has the multiprocessor placed on a wafer. A
wafer scale integrated multiprocessor is a macro-circuit consisting of a rectangular
array of interconnected modules arranged on a large piece of silicon. Each of these
modules could be as complex as the very large scale integrated (VLSI)
multiprocessor. These modules are not separately manufactured, tested and then
assembled as VLSI chips are. They are fabricated as a single unit, the VLSI wafer.
RP3 at IBM T. J. Watson Research Center, CHOPP at Columbia University,
HM2P at Rensselaer Polytechnic Institute, and MULTI PROCESSOR/COMPUTER
at Princeton University have an organizational duality of shared-memory
multiprocessors and message-passing multiprocessors. They incorporate the advantages
of both models and hence serve more applications. ULTRA and RP3 have a special
switch feature called combining. In this process, memory requests aimed at the
same memory location are combined into one request at the switch they are passing
through.
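The effect of combining can be illustrated with a sketch of a single switch. The sketch assumes that all requests are reads arriving at one switch in the same cycle; the request format and function names are hypothetical, and an actual network would combine requests stage by stage.

```python
# Combining of read requests at one switch. Assumptions: requests are
# (processor_id, address) pairs, all are reads, and all arrive together.

def combine_requests(requests):
    """Group requests by address, preserving the order of first arrival."""
    combined = {}
    for processor, address in requests:
        combined.setdefault(address, []).append(processor)
    return combined


def serve(requests, memory):
    """Return the reply for each processor and the number of memory accesses."""
    combined = combine_requests(requests)
    replies = {}
    for address, processors in combined.items():
        value = memory[address]        # one access per distinct address
        for processor in processors:   # the reply is fanned back out
            replies[processor] = value
    return replies, len(combined)


memory = {100: 7, 104: 9}
requests = [(0, 100), (1, 100), (2, 104), (3, 100)]
replies, accesses = serve(requests, memory)
print(replies, accesses)
```

Four requests reach the switch but only two reach memory, which shows how combining relieves contention on a heavily shared location.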
FFPM at the University of North Carolina, MULTIPROCESSOR REDUCTION
MACHINE from England, SERFRE from France, and REDIFLOW at the
University of Utah are all reduction multiprocessors. In a reduction computer, the
requirement for a result triggers the execution of the instruction that will generate
the value. It is associated with applicative (reduction or functional) languages. The
2.1.5 Supercomputers
ELI-512, designed at Yale University, is a Very Large Instruction Word (VLIW)
machine. VLIW machines are highly parallel architectures that offer an alternative
to multiprocessors and array processors. They resemble ordinary multiprocessors
but have a tightly coupled, single-flow control mechanism. Programs for VLIWs
must specify fine-grained hardware control. It is impossible to hand code VLIW
machines. VLIW machines have one central control unit issuing a single wide
instruction per cycle. Each wide instruction consists of many independent
operations. Each operation requires a small, statically predictable number of cycles
to execute. Operations are pipelined. The underlying sequential architecture is
invariably a reduced instruction set computer. The instructions in the underlying
RISC level are called operations, while the term instruction is reserved for the very
long instruction words, which are collections of operations. The instructions are in
a single flow of control. Thus a single long instruction word is fetched, and all the
processors do their individual operations. The operations differ for the various
processors. After an instruction is executed, the next instruction is chosen and
fetched. The instruction word completely controls all communications among the
processors. Data transfers and their timings are completely choreographed in the
code. Compaction is the process of generating very long instructions from some
sequential source. A compacting compiler is a compiler that takes some sequential
high-level source and generates compacted code. A compiler (Bulldog) exists (at
Yale) that can produce highly parallel code from a broad range of ordinary
sequential programs. This compiler uses a technique called trace scheduling. Trace
scheduling is a complex procedure. To handle conditional jumps in a program, a
trace scheduling compiler uses information about the dynamic behavior of the
program to do greedy scheduling of operations. The compiler can make good
guesses when jumps are weighted heavily towards one leg, because in this case it is
productive to be greedy. Otherwise VLIWs are probably the wrong architecture to
use.
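Compaction can be sketched for the simplest case, a straight-line sequence of operations with no jumps. This is not trace scheduling and not the Bulldog algorithm; it is a plain greedy packing under stated assumptions: every operation takes one cycle, every destination name is written once, and a wide instruction has a fixed number of slots. The operation format (destination, sources) is hypothetical.

```python
# Greedy compaction of straight-line code into wide instructions.
# Assumptions: one-cycle operations, each destination written once,
# `width` operation slots per wide instruction, no conditional jumps.

def compact(operations, width):
    ready_at = {}        # name -> index of first wide instruction that may read it
    instructions = []    # each wide instruction is a list of destinations
    for dest, sources in operations:
        # Earliest wide instruction in which all source operands exist.
        slot = max((ready_at.get(s, 0) for s in sources), default=0)
        # Skip wide instructions whose slots are already full.
        while slot < len(instructions) and len(instructions[slot]) >= width:
            slot += 1
        while len(instructions) <= slot:
            instructions.append([])
        instructions[slot].append(dest)
        ready_at[dest] = slot + 1
    return instructions


sequential = [
    ("t1", ["a", "b"]),    # t1 = a + b
    ("t2", ["c", "d"]),    # t2 = c + d
    ("t3", ["t1", "t2"]),  # t3 = t1 * t2
    ("t4", ["e", "f"]),    # t4 = e - f
    ("t5", ["t3", "t4"]),  # t5 = t3 + t4
]
print(compact(sequential, width=2))
print(compact(sequential, width=3))
```

Five sequential operations fit into three wide instructions at either width; the chain t1, t3, t5 fixes the minimum length, so a wider word helps only when independent operations are available to fill it.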
RISC at the University of California at Berkeley and MIPS at Stanford University
are uniprocessors based on a Reduced Instruction Set Computer (RISC)
architecture. RISC architecture features a simple, regular instruction set which allows a
combination of instructions to be executed faster than the equivalent complex
instructions. A traditional complex instruction set computer relies on hundreds of
specialized instructions, dozens of addressing modes, and several high-level
languages implemented in hardware. In such a computer the compiler must
consider the many possibilities inherent in a complex instruction and perform a
number of memory transfers to execute it. This requires identifying the ideal addressing
mode and the shortest instruction format to add the operands in memory. Yet only
a small number of instruction types take up most of a computer's execution time.
Load, call and branch instructions are found in compiled code more often than any
other instruction type. Complex operations can actually be executed faster by
breaking each one down into a series of simple instructions that move data between
registers and memory. This is the principle behind the RISC approach. Some salient
features of a RISC-based machine are register-to-register operations, which allow
compilers to optimize through reuse of operands, and instruction formats and
addressing modes that permit instructions to be decoded in a single machine cycle.
Memory reference instructions consisting of load and store operations are also
typical. A RISC machine has a high performance memory hierarchy including
general-purpose registers and cache. One of the advantages of the RISC approach is
the potential to reuse any result without recomputing it.
Seismic processing is indisputably one of the most data intensive applications to
be found. Western Geophysical often claimed that its tape library was second only
to that of the U.S. government in size. Data collection, processing and storage are
thus a matter of considerable importance. Clearly, a computer with the fastest of
processors is unequal to the task of commercial seismic processing if its I/O
components are inadequate. The seismic industry has not been just a consumer of I/O
devices. It has, instead, been a primary motivating force in the development of new
devices. It has long been standard operating procedure for I/O manufacturers to
arrange early experiments and tests of their equipment in a seismic environment.
I/O advances have occurred in many types of hardware: channels, cartridge
tapes, optical disks, hyperdisks, solid state devices, rasterizers, plotters and CRT
graphic displays. It is possible only to summarize the latest status of these types of
devices without large chapters of technical detail.
CHANNELS
TAPES
Tape advances have not shown the same magnitude of improvement as one
finds in computations. The following table summarizes the relative performance
rates at the beginning of each of the last three decades in tape technology and in
computational performance.
TABLE 1.

Year    Tape speed    Tape density    Relative computational
        (in/sec)      (bpi)           performance
1960    75            800             1
1970    125           1600            20
1980    200           6250            200
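The comparison between tape and computational gains can be checked with a short calculation from the figures in Table 1. The sketch assumes that tape throughput is proportional to the product of tape speed and recording density, and it reads the unlabeled last column as relative computational performance; both readings are assumptions based on the surrounding text.

```python
# Gains from 1960 to 1980 using the Table 1 figures. Assumption: tape
# throughput is proportional to speed (in/sec) times density (bpi).

TABLE_1 = {
    # year: (tape speed in/sec, tape density bpi, relative compute performance)
    1960: (75, 800, 1),
    1970: (125, 1600, 20),
    1980: (200, 6250, 200),
}


def tape_throughput(year):
    speed, density, _ = TABLE_1[year]
    return speed * density


def gain(values_by_year, start, end):
    return values_by_year[end] / values_by_year[start]


tape = {year: tape_throughput(year) for year in TABLE_1}
compute = {year: row[2] for year, row in TABLE_1.items()}

tape_gain = gain(tape, 1960, 1980)
compute_gain = gain(compute, 1960, 1980)
print(round(tape_gain, 1), compute_gain, round(compute_gain / tape_gain, 1))
```

Under this assumption the tape figures improve by a factor of about 20 over the two decades while the computational figure improves by a factor of 200, so the gap between processing and tape I/O grew roughly tenfold.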
Thus, tapes are 20 times as fast whereas computers are 200 times as fast. The
present decade has widened this difference with computers operating at one gigaflop
(approximately the equivalent of 3000 mips) with no substantial improvement in
OPTICAL DISKS
Optical storage technology is gradually becoming more important. Its
characteristics make it an interesting alternative to conventional magnetic storage
technology, especially magnetic tape.
Optical storage was first used commercially for video and audio compact disks.
Whereas in magnetic media, information is recorded and read by changing
magnetic properties, optical storage technology uses tiny solid-state lasers to create
(write) and sense (read) microscopic pits in the disk's surface. Typically, the disk is
coated with a reflective material; writing then consists of burning a pit into that
surface material using the laser at a higher power setting, while reading is done by
measuring the reflectivity of a particular position. Thus, high reflectivity (no pit)
might represent a 0 and low reflectivity (pit) a 1. This set-up is the basis for all of
the currently (1987) commercially available laser disks; it follows from this that
information can be recorded only once, but read many times, giving rise to the
acronym WORM ("write once, read many"). This indicates the major disadvantage
of current optical storage technology: it is generally not possible to change
information stored on such a laser disk.
(Strictly speaking, this is not quite true; if one uses certain non-standard codes
to record information, a certain number of changes of information recorded on a
WORM laser disk is possible. For a discussion of this issue and how to guarantee
that such changes can be prevented, see [LEISS84]. However, since this would
require changes in the recording software and firmware, this possibility is ignored
here.)
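The write-once behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `WormSector` and its methods are hypothetical, the encoding 0 = no pit and 1 = pit follows the example in the text, and the model simply assumes that a pit, once burned, cannot be removed. It also shows why a later write that only adds pits is physically possible, which is presumably what the non-standard codes mentioned above exploit.

```python
class WormSector:
    """Toy model of one sector on a write-once laser disk.

    Assumption for illustration: 0 = no pit (high reflectivity),
    1 = pit (low reflectivity). A pit can be burned but never removed.
    """

    def __init__(self, n_bits):
        self.bits = [0] * n_bits

    def burn(self, pattern):
        """Write a bit pattern; reject any attempt to turn a 1 back into a 0."""
        if len(pattern) != len(self.bits):
            raise ValueError("pattern length must match sector size")
        for old, new in zip(self.bits, pattern):
            if old == 1 and new == 0:
                raise ValueError("cannot erase a pit on write-once media")
        self.bits = list(pattern)

    def read(self):
        """Return a copy of the recorded bits."""
        return list(self.bits)


sector = WormSector(4)
sector.burn([0, 1, 0, 0])      # first recording
sector.burn([0, 1, 1, 0])      # allowed in this model: only adds a pit
try:
    sector.burn([0, 0, 1, 0])  # would require erasing a pit
    erased = True
except ValueError:
    erased = False
```

In this model the third write is rejected, so the sector still holds the pattern produced by the second write.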
The ability to rewrite information seems crucial, mainly because one is
accustomed to it. However, upon examining the requirements of seismic data
storage (as well as those of many other types of information), it should be obvious
that the WORM medium laser disk is quite acceptable, especially since it has
several interesting features that are quite attractive for storage of seismic data:
1. Permanence and Robustness: Compared with magnetic media, information
HYPERDISKS
The standard high performance disks for the CDC and Cray systems have been
manufactured by CDC. The DD-29 series transfers data at 4 Mbytes/sec and has a
capacity of 0.6 Gbytes. The newer DD-49 series has a speed of 10 Mbytes/sec and a
capacity of 1.2 Gbytes.
Since 1982, Ibis Systems of Westlake, California has produced a parallel-transfer
disk drive made with a proprietary 14-inch thin film medium. Its first
product, the Model 1400, has a 12 Mbyte/sec data transfer rate and a 1.4 Gbyte
storage capacity. In order to make these disks useful to industry in general, Ibis has
developed two industry standard interfaces, Ibis-I and Ibis-II. Both of these interfaces
satisfy the requirements of the Intelligent Standard Interface (ISI). Ibis has
shipped over 1000 of these units to Cray, its single largest customer.
In order to use these disks even more effectively than simply relying on their
inherent speed, the concept of disk striping has arisen. In this technique, sequential
elements of a file are divided into small groups so that one group occupies one
track of a disk. Sequential groups are stored across the disk units so that several
groups can be read in parallel. Using a multidimensional variation of this technique
along with other programming techniques, Lhemann [LHEM85] was able to
convert an I/O-bound three-dimensional migration algorithm into a compute-bound
program.
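The basic (one-dimensional) form of disk striping can be sketched as follows. This is an illustration only, assuming a simple round-robin placement of groups over the disk units and one group per track; the function names `stripe` and `parallel_read_steps` are hypothetical, and the multidimensional variation used in [LHEM85] is not reproduced here.

```python
def stripe(num_groups, num_disks):
    """Round-robin placement: group g goes to disk g % num_disks,
    on track g // num_disks of that disk (one group per track)."""
    return [(g % num_disks, g // num_disks) for g in range(num_groups)]


def parallel_read_steps(num_groups, num_disks):
    """Number of read steps if each disk delivers one track per step
    and all disks work simultaneously."""
    per_disk = [0] * num_disks
    for disk, _track in stripe(num_groups, num_disks):
        per_disk[disk] += 1
    return max(per_disk)


layout = stripe(num_groups=8, num_disks=4)
steps_striped = parallel_read_steps(8, 4)   # four disks read in parallel
steps_single = parallel_read_steps(8, 1)    # everything on one disk
```

With eight groups striped over four disks, the whole file is read in two steps instead of eight, which is the source of the gain in effective transfer rate in this simplified model.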
3. ADVANCES IN SOFTWARE
Fortran remains the most commonly used programming language for scientific
computing. While other languages are being used (Pascal, C, Ada), they should not
present major challenges to Fortran's domination (strangle-hold?) on this field for
the near future. Of importance, however, is the fact that Cray seems intent on phasing
in UNIX as its main operating system; this should give C a certain advantage. The
emphasis placed by the US Department of Defense (DoD) on Ada does not seem
to be shared by the manufacturers of high-performance computing equipment nor
their software suppliers, mainly because DoD has not (yet) materialized as a major
buyer. On the other hand, the proposed Fortran Standard, hopefully called
Fortran 8X (the X to be replaced by either 8 or 9; this is where the hope comes in:
if final adoption does not take place in this decade, it will be Fortran 9X!), will
incorporate certain language features that will aid in utilizing vector, and to a lesser
extent, parallel, computers. Fortran is highly suitable for vector processing because
its main program structure is the DO-loop, and this is precisely the construct that
vectorizes best automatically. The proposed SEG seismic subroutines (Seismic
Subroutine Standard) are basically a library of subroutines which facilitates seismic
processing; they are formulated language-independently but are clearly aimed at
Fortran. Fortran, however, although excellent for vectorization, is a poor vehicle for
parallel computations. For this reason, various languages have been designed with
the aim of facilitating the use of parallelism that is available in the hardware; they
enable the programmer to control parallelism explicitly. None of them, however, has
reached a level of acceptance that promises significant prospects for becoming a
standard (or even only dominating).
3.2 Compilers
4. IMPLEMENTATION: REALITIES AND PITFALLS
Problems in seismic data processing are characterized by huge data sets, occurring
both as input and as output. For example, a 3D migration program may have
as input a data set consisting of 240 traces on 240 lines, with each trace containing
3000 samples (SALNOR7; see Nelson, 1982). Consequently, the input file contains
172.8 million numbers; if each number (word) has 32 bits, the input file is of size 5.5
Gigabits, with the output file being of the same order of magnitude. Therefore,
processing realistic seismic data sets is very likely to at least severely strain, if not
exceed, the capacity of most current computer systems.
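The size of the SALNOR7 input file follows directly from the figures quoted above. The short calculation below is a sketch that reproduces the arithmetic; the variable names are illustrative, and the conversion to bytes is added only for convenience.

```python
traces_per_line = 240
lines = 240
samples_per_trace = 3000
bits_per_word = 32

# Total number of samples (words) in the input data set.
samples = traces_per_line * lines * samples_per_trace
total_bits = samples * bits_per_word
total_bytes = total_bits // 8

print(f"{samples / 1e6:.1f} million samples")
print(f"{total_bits / 1e9:.2f} Gigabits")
print(f"{total_bytes / 1e6:.1f} Megabytes")
```

The result is 172.8 million samples and about 5.53 Gigabits (roughly 691 Megabytes), in agreement with the rounded value of 5.5 Gigabits given in the text.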
Three issues are of major importance in this context:
- The amount of primary or main memory available for processing
- The availability of vector processing
- The possibility of utilizing parallelism, especially macroparallelism.
In the following sections, we discuss each of these issues and outline their
implications for the present and the future of seismic data processing.
A program whose data in their entirety can be read into main memory from
secondary storage devices (disks, tapes) is called in-core. In contrast, an out-of-core
program requires that the operations performed by the program be grouped
together into program parts in such a way that the data set can be partitioned into
subsets with the following properties:
- Each subset fits into the available main memory
- The operations in one program part require only the data in the
corresponding data subset.
Therefore, at different times during the execution of the program, different data
subsets will reside in main memory.
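The two properties of an out-of-core program can be made concrete with a small sketch. It is an illustration under simplifying assumptions: a Python list stands in for the file on secondary storage, `capacity` is a hypothetical limit on the number of samples that fit into main memory, and the operation (scaling each sample) is chosen because each program part needs only the data of its own subset.

```python
def out_of_core_scale(storage, capacity, factor):
    """Scale every sample by `factor`, holding at most `capacity`
    samples in 'main memory' at a time.

    `storage` stands in for a disk or tape file; here it is simply a list
    that is read and written one subset at a time.
    """
    peak_resident = 0
    for start in range(0, len(storage), capacity):
        subset = storage[start:start + capacity]        # read one subset
        peak_resident = max(peak_resident, len(subset))
        subset = [factor * x for x in subset]           # program part for this subset
        storage[start:start + capacity] = subset        # write it back
    return peak_resident


data = list(range(10))
peak = out_of_core_scale(data, capacity=4, factor=2)
```

Here ten samples are processed in three passes, and at no time do more than four samples reside in memory. Real seismic operations rarely decompose this cleanly, which is why partitioning the operations is at least as important as partitioning the data.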
With the exception of the Cray 2, currently available computer systems are
unable to accommodate in main memory data sets of size in excess of 5 Gigabits;
therefore in-core programs are not feasible. This leaves two alternatives, namely
out-of-core programming and virtual memory management.
A virtual memory environment provides automatic paging; this means that the
data set is uniformly subdivided into relatively small portions (in the VAX,
512 words), called pages. These pages initially reside on disk. Whenever a data item
is needed during execution, the operating system determines automatically in which
page the item resides and reads that page from disk into main memory. While this
is done, the program waits. The retrieval of a page from disk may require two
orders of magnitude (or more) more time than the operation that is eventually
performed on the requested item. Since the number of pages that fit into main
memory is limited, the request for another page may necessitate the removal of a
page currently in main memory. Also, the same page may have to be retrieved
again, even if a different data item is requested, because many different items reside
in the same page. If the page has been removed in the meantime, it will have to be
read from disk again in this case. As an illustration consider the following two
functionally identical Fortran loops:
      DO 10 I = 1,512                      DO 10 J = 1,512
      DO 20 J = 1,512                      DO 20 I = 1,512
      A(I,J) = B(I,J) + C(I,J)             A(I,J) = B(I,J) + C(I,J)
   20 CONTINUE                          20 CONTINUE
   10 CONTINUE                          10 CONTINUE
      Loops (L1)                           Loops (L2)
If we assume that 512 array elements fit into one page, then (L1) performs over
a quarter of a million page retrievals, whereas in (L2) only 512 page retrievals are
necessary because arrays in Fortran are stored by columns (column-major order).
Running the two programs on a VAX-11/780 yields the following timings: (L1)
requires 293 sec, (L2) requires 9 sec.
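The page retrieval counts quoted for the two loop orders can be reproduced with a small simulation. This is a sketch under simplifying assumptions that are not stated in the text: only one array is tracked, only a single page of it stays resident at a time, and indices are 0-based; the names `page_of` and `count_retrievals` are illustrative.

```python
N = 512            # array is N x N
PAGE_SIZE = 512    # array elements per page, as assumed in the text


def page_of(i, j):
    """Page holding element (i, j) of a column-major (Fortran-order) array,
    using 0-based indices."""
    return (j * N + i) // PAGE_SIZE


def count_retrievals(order):
    """Count page retrievals for one array when only a single page of it
    stays resident (a simplifying assumption)."""
    resident = None
    retrievals = 0
    for outer in range(N):
        for inner in range(N):
            # L1: outer index is I, inner is J.  L2: outer is J, inner is I.
            i, j = (outer, inner) if order == "L1" else (inner, outer)
            page = page_of(i, j)
            if page != resident:
                resident = page
                retrievals += 1
    return retrievals


l1 = count_retrievals("L1")   # inner loop over J: jumps from column to column
l2 = count_retrievals("L2")   # inner loop over I: walks down one column
```

Under these assumptions (L1) needs 262,144 retrievals per array and (L2) needs 512, matching "over a quarter of a million" and "only 512" in the text.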
Virtual memory is not at all the same as out-of-core programming: in an
out-of-core version, the emphasis is at least as much on partitioning the operations of
the program as it is on partitioning the data; in fact the two have to be very well
coordinated. In a virtual memory environment, no attention is paid at all to the
partitioning of the operations, and as the example above shows, vastly different
data transfer requirements and consequently vastly different timings may result.
In a virtual memory environment the programmer is less able to control
precisely the flow of input and output; this may result in inefficient use of the
computer resources. For this reason, virtual memory has not been preferred for
high-performance data processing. Indeed, supercomputers such as the Cray
systems at present do not support virtual memory management; instead the
programmer is required to partition data and operations explicitly. This results
in a tradeoff between savings in computer resources (at the cost of additional
programmer effort) and savings in people resources (at the cost of computer time).
At present, out-of-core programming is still necessary in realistic seismic
processing. To give a concrete example of the amount of computer time that can be
saved by intelligently restructuring data and instructions coordinately, consider an
implementation of the 3D Phase Shift migration of the SALNOR7 model on the
Cray X-MP [LHEM85]. A perfectly competent initial implementation has an
Vector processing is currently the mainstay of all serious seismic data processing.
This is due to the following observation:
Any Fortran program that:
- uses large amounts of memory,
- has large input and output data sets, and
- performs at least 10^12 operations
can be vectorized with a rather modest amount of effort, to such an extent that
a speed-up of at least one order of magnitude is achieved.
Speed-up is defined as the CPU time of the scalar version divided by the CPU
time of the vectorized version (everything else unchanged). A modest amount of effort
means 5% or less of the time required to develop the (scalar version of the)
program. Indeed, with today's vectorizers it is possible to submit a scalar version of
a (Fortran 77) program and obtain a program that is substantially vectorized; for
certain vectorizers (Convex Fortran Vectorizing Compiler), it is claimed that the
resulting code approaches 90% of the efficiency of hand-coded vector code. Moreover,
those parts that cannot be vectorized by the software tool can be flagged so that
the programmer may attempt to restructure the code according to well understood
rules. There are "catalogues" of these rules which can be applied without great
difficulty.
To give a concrete example, a 2D PSPI algorithm based on that described in
[MAJO86] was run, where the velocity varies only in the x-direction, from
4000 ft/sec to 5800 ft/sec at the midpoint and then back to 4000 ft/sec (linearly).
The synthetic time section consists of a row of 1's at the 10th row; the size is
512 x 512. This program was run in two versions on a VAX-11/780, one version
using the VAX alone, with the FFTs in scalar mode, the other version using one
FPS 100 as vector processor. The vector processor was only used for the FFTs
involved in the vectorized PSPI version; the remainder of that program was
unchanged, i.e., not vectorized. The I/O waiting times are identical for the two
versions, but the CPU timings are not: the scalar version took approximately
42,670 sec (11:51:09.15), whereas the vectorized version took about 2670 sec
(0:44:27.38). Consequently, the speed-up obtained by using a library routine that
uses the FPS 100 for the FFTs only is 16! This clearly constitutes a significant
performance increase at a rather modest increase in cost.
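The speed-up of 16 can be checked from the two CPU timings quoted above, using the definition of speed-up as scalar CPU time divided by vectorized CPU time. The helper `to_seconds` below is a hypothetical convenience function for converting the H:MM:SS.ss timings to seconds.

```python
def to_seconds(hms):
    """Convert a timing of the form 'H:MM:SS.ss' to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


scalar_cpu = to_seconds("11:51:09.15")   # VAX alone, FFTs in scalar mode
vector_cpu = to_seconds("0:44:27.38")    # FFTs on the FPS 100
speed_up = scalar_cpu / vector_cpu

print(f"scalar {scalar_cpu:.2f} sec, vectorized {vector_cpu:.2f} sec, "
      f"speed-up {speed_up:.2f}")
```

The exact timings give 42,669.15 sec and 2,667.38 sec, so the ratio is just under 16, consistent with the rounded figures in the text.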
4.3 Parallelism
At the hardware level, parallelism denotes the presence of several processors,
each with its own instruction stream and under its own control. Each processor
may use a shared memory (common memory) and/or have its own private memory.
Since there are several independent agents, provisions must exist for the
communication between processors. This may be achieved through common memory or
by message passing. In the former case, the system is called tightly-coupled (an
example is the Cray X-MP/4 where up to four processors use the same large main
memory); in the latter case the system is called loosely-coupled (an example is
provided by the Intel Hypercube). The underlying idea is to provide N processors
and thereby to achieve a speed-up of N; this is clearly also the theoretical upper
bound on any speed-up.
In contrast to vector processing where one vector instruction acts on many
data items, in parallel systems each processor executes independently. Therefore, in
contrast to vector processing, where most of the vectorization is done
automatically, in order to exploit parallelism efficiently one must specify explicitly
which portion of the program is to be executed on which processor using which
portion of the data. Software tools comparable to vectorizers, which accept
scalar code and perform the rewriting necessary to utilize the vector
capabilities of the target machine, do not yet exist for automatically parallelizing
code. In addition, some questions have been raised as to whether the currently
available loosely coupled systems are suitable for processing seismic data because of
their limitations on interprocessor communication and I/O [KAOL87].
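What "specifying explicitly which portion of the data goes to which processor" involves can be sketched in a few lines. This is an illustration only: Python threads stand in for processors, the work function `process_block` and the helper `partition` are hypothetical, and the sketch shows the bookkeeping the programmer must do rather than any real performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def process_block(block):
    """Stand-in for the work done on one processor: sum of squares."""
    return sum(x * x for x in block)


def partition(data, num_workers):
    """Split data into num_workers contiguous blocks of near-equal size."""
    size, extra = divmod(len(data), num_workers)
    blocks, start = [], 0
    for w in range(num_workers):
        end = start + size + (1 if w < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


samples = list(range(10))
blocks = partition(samples, num_workers=4)

# The programmer, not a software tool, decides which block goes to which worker.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial = list(pool.map(process_block, blocks))

# Combining the partial results is also the programmer's responsibility.
total = sum(partial)
```

Even in this trivial case, the partitioning, the assignment of blocks to workers, and the final combination step all have to be written by hand.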
Implementations on the Cray X-MP/4 of migration algorithms such as PSPI
[AMES87] and finite difference methods [TERK87] indicate that a speed-up of
3.5 is quite attainable; this is close to the theoretical upper bound of 4. However,
four processors are still manageable for the programmer so that the code for these
applications can be carefully hand-coded. For more processors, we would expect
the actual speed-up to be significantly less than 80% of the theoretical upper
bound. Also unclear is how one might achieve similar results automatically, i.e.,
with a software tool akin to a vectorizer.
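The relation between the attained speed-up and the theoretical upper bound can be expressed as a simple ratio. The sketch below assumes the usual definition of parallel efficiency as speed-up divided by the number of processors; the function name `parallel_efficiency` is illustrative.

```python
def parallel_efficiency(speed_up, num_processors):
    """Fraction of the theoretical upper bound N that is achieved."""
    return speed_up / num_processors


# Speed-up of 3.5 on the four processors of the Cray X-MP/4.
eff = parallel_efficiency(3.5, 4)
print(f"efficiency: {eff:.1%}")
```

A speed-up of 3.5 on four processors corresponds to 87.5% of the upper bound, which gives a sense of scale for the expectation of significantly less than 80% on larger processor counts.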
At the present time, loosely-coupled systems do not appear competitive for
production processing of seismic data. No software that would automatically
parallelize uniprocessor code is commercially available. The lack of parallelizers is
particularly damaging because debugging parallel code is significantly harder than
debugging uniprocessor code. The existing processing software, almost exclusively
written in Fortran (unless a lower-level language is used), is written for
uniprocessors and will not be allowed to become obsolete with the arrival of new
processing hardware. Fortran is a poor vehicle for parallel programming (in
contrast to vectorizing, for which it is very well suited since the only data structure it
supports is the array).
Proposals have been advanced for systems that are specifically designed for
seismic processing but do not serve any other purpose. For example, it is
technologically feasible to design and manufacture a chip for migration. It is safe to
expect that a chip can be designed that will beat any software implementation of
migration. There are, however, two major problems with this approach. One is
obviously cost: since the market for such a system is quite restricted, the
development cost per sold unit might be prohibitive. Also, such a system would severely
stifle work on new processing methods, since a chip containing a migration
algorithm will render unattractive work on improved migration methods. The field
is not mature (stagnant?) enough that any one company could make a decision to
use one processing method, and one only, for the next decade or so.
5. CONCLUSION
High-performance processing of seismic data must clearly start with an efficient
algorithm. There is a host of efficient methods that can be tailored to a given
situation. Most applications use vector processing, and with very good reason: at
present, this is the single most important factor in the performance of a competently
written application program. However, in realistic implementations, questions such
as the I/O behavior and the inherent parallelism of a program become of concern
since they can very seriously affect the performance of the program if they are not
properly considered. At present, I/O analysis and detection of parallelism must be
carried out manually. We expect that in the next few years, software tools will
become available that assist in these tasks. However, the actual restructuring of the
code will require knowledge of the application and therefore it is highly unlikely
that restructuring can be fully automated, in the near or in the long-term future.
Therefore, programming the new machines will place a significant burden on the
programmers. The reason why vectorization is such a success is that it can be done
syntactically, i.e., without any understanding of the underlying application. This is
not the case for the restructuring of a program in order to improve its I/O behavior
or to exploit inherent parallelism.
In particular, there are two major problems associated with parallelism at the
hardware level, one related to hardware, the other related to software. The
hardware problem is one exclusively associated with loosely-coupled systems, while
the software problem is common to both loosely- and tightly-coupled systems. The
hardware problem is that of interprocessor communication; at present the
bandwidth is simply too small for realistic seismic processing. While the remedy is
obvious, it is also costly and may seriously affect the price/performance ratio of the
resulting systems. Nevertheless, improvements here are expected as soon as the
manufacturers realize that interprocessor communication bandwidth is a major
bottleneck. This should be in the near future; indeed there are indications that the
Connection Machine has addressed this problem. The software problem is one that
cannot be solved that fast. The objective is software tools that parallelize
uniprocessor code automatically; this implies that they must be based on purely
syntactic considerations. While this appears feasible, the first reasonably efficient
parallelizer is probably several years away. Until then, parallelization will have to
be done by hand, which is time consuming, not least of all because debugging
parallel code is at least one order of magnitude harder than debugging uniprocessor
code. Also, the larger the number of processors, the more difficult it will be to
design efficient parallel code; this is again more in favor of the tightly-coupled
systems which typically have fewer processors (four for the Cray X-MP/4; eight for
the ETA-10 for the time being) than of the loosely-coupled systems which may have
up to 65000 processors.
REFERENCES