A Review of Parallel Processing for Statistical Computation
Statistics and Computing (1996) 6, 37-49
Abstract. Parallel computers differ from conventional serial computers in that they can, in a variety of ways, perform more than one operation at a time. Parallel processing, the application of parallel computers, has been successfully utilized in many fields of science and technology. The purpose of this paper is to review efforts to use parallel processing for statistical computing. We present some technical background, followed by a review of the literature that relates parallel computing to statistics. The review material focuses explicitly on statistical methods and applications, rather than on conventional mathematical techniques. Thus, most of the review material is drawn from statistics publications. We conclude by discussing the nature of the review material and the likely prospects for parallel processing in statistics.
1. Introduction

We present a review of the literature pertaining to the explicit application of parallel processing to statistics. Literature of this type is scarce, despite many authors (including Sylwestrowicz, 1982 and Havránek and Stratkoš, 1989) commenting on the utility of parallel processing for statistics. Eddy (1986) and Eddy and Schervish (1991) have made efforts to introduce parallel computing to statisticians. The review was conducted in order to locate the available literature in this field and to isolate potential research areas. The elements of statistical computation that we consider are limited explicitly to numerical statistical algorithms and statistical applications. Aspects such as linear algebra, optimization and quadrature are considered only insofar as they have been investigated in the name of statistics. Also outside the scope of the review are such areas as parallel ...

The 1990s have been widely predicted to be the decade of the parallel computer. Certainly, parallel computers provide computational power to solve otherwise intractable problems (Kaufmann and Smarr, 1993). In Section 2 we give a brief introduction to parallel processing, including outline descriptions of hardware, software and parallel computing terminology. Many elements of numerical computing, including linear algebra, sorting and random number generation, have been implemented on parallel computers, though not necessarily explicitly for the purpose of statistical computing. Section 3 provides a review of the available literature relating parallel processing to statistics, together with references to the parallelization of major numerical methods. Some details of the literature search methods employed in this review are also briefly described. Finally, Section 4 contains some closing comments about the nature of the review material and the likely prospects for parallel processing in statistics.

2. Parallel processing

Parallelism is the process of performing tasks concurrently, that is, more than one task per unit time. A parallel computer is a computer that has the ability to exploit parallelism incorporated in its architecture. This architecture usually consists of a collection of processing units coupled by an interconnection network; in a shared-memory arrangement the processors are independent but can access all available memory.
Many different designs of parallel computer exist, with radically varying architecture. Attempts to classify the different designs have not been completely successful (Hockney and Jesshope, 1988) and no universally accepted classification scheme exists. In this paper we use Flynn's taxonomy (Flynn, 1972), which classifies computers according to how the machine relates its instructions to the data being processed. We choose Flynn's classification because it is adequate for describing parallel algorithms as well as hardware, and it is widely used (Freeman and Philips, 1992). Within Flynn's taxonomy, a stream is a sequence of items (instructions or data) executed or operated on by a processor. There are four broad categories, dealing with single and multiple streams of items:

- SISD: Single Instruction Single Data stream. The conventional serial computer: a single stream of instructions operates on a single stream of data, each instruction initiating an operation.
- SIMD: Single Instruction Multiple Data stream. A computer that has a single stream of instructions that initiate operations on many streams of data.
- MISD: Multiple Instruction Single Data stream. This category illustrates the problems inherent in interpreting Flynn's taxonomy. Hwang (1993) puts systolic arrays in this class, while other authors assert that no computers fall in this category.
- MIMD: Multiple Instruction Multiple Data stream. A computer with several processing units capable of operating on several data streams. This includes all forms of multiprocessor and is the most general form of parallelism.

This classification is reasonably well-defined, although many modern parallel systems are hybrids of SIMD and MIMD and hence belong not to either class but to both. We will describe the SIMD and MIMD models in more detail.

In the SIMD model, a single control unit drives the processing units. SIMD computers typically execute a single stream of instructions with a number of simple processing units, each performing the same instructions on its own data. At a given time, the same instruction is being executed on a collection of processors with each processor manipulating different data. All computers in this class therefore have synchronous operation, that is, access to shared memory is tightly coordinated. This is the most useful model for massively parallel (incorporating many processors) scientific computing, with many engineering and scientific tasks falling naturally in this class, including image processing, particle simulation and finite element methods (Lewis and El-Rewini, 1992).

Array processors consist of a set of elementary processing units connected by a grid, which is usually square. These machines are well suited to computations involving matrix manipulations. Examples include the AMT Distributed Array Processor and Thinking Machines' Connection Machine. An attached-array is simply a conventional computer with array hardware connected.

Pipelined vector processors achieve parallelism by two methods. First, arithmetic operations (addition, multiplication, etc.) are broken down into individual elements and computed as a pipeline. Processing data as a pipeline can be pictured as performing lower level operations as an assembly line. Second, vector processing units, as the name suggests, allow arithmetic operations between pairs of vector elements fed from vector registers to functional units that use pipelining techniques. The Cray-1 and CDC Cyber 205 are examples of pipelined vector processors.
The SIMD model is represented pictorially by Fig. 1. Machines of this kind typically consist of a tightly coordinated square array of processors operating on data accessible from memory around the perimeter of the array.

Fig. 1. (Diagram not reproduced.) C, control unit; P, processor; S, shared or local store (memory).

In the MIMD model, by contrast, each processor drives its own control unit and can have its own local memory, or memory may be shared. A multiprocessor system consists of a number of powerful processing units; an example is the Alliant FX/80. MIMD systems can also be constructed by linking independent computers such as workstations, the resulting distributed system being applied to a single task. Such systems are becoming more popular, and have been used for some of the statistical applications reviewed below. For a detailed treatment of parallel and distributed computation see Bertsekas and Tsitsiklis (1989).
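The distinction between the two models can be made concrete in modern terms. The following Python sketch is purely illustrative and is not drawn from the paper: NumPy's vectorized arithmetic stands in for a SIMD machine (one instruction applied to many data elements), while operating-system processes stand in for MIMD processors (independent instruction streams, each on its own data).

    import numpy as np                # vectorized ops: SIMD-style parallelism
    from multiprocessing import Pool  # independent processes: MIMD-style parallelism

    def worker(chunk):
        # Each MIMD "processor" runs its own instruction stream on its own data.
        return np.sqrt(chunk).sum()

    if __name__ == "__main__":
        data = np.random.rand(1_000_000)

        # SIMD flavour: a single instruction (sqrt) over many data elements at once.
        simd_total = np.sqrt(data).sum()

        # MIMD flavour: four asynchronous processes, each on its own block of data.
        with Pool(4) as pool:
            mimd_total = sum(pool.map(worker, np.array_split(data, 4)))

        print(simd_total, mimd_total)  # the totals agree up to floating-point rounding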
There is a broad range of tools for software development on parallel computing systems. Unfortunately, there is little standardization across machines. Examples of such tools include the EPF (Encore Parallel Fortran) compiler (Encore, 1988) and the Transputer Development System (Inmos, 1990). For a detailed description of parallel environments see Lewis and El-Rewini (1992) or Dongarra and Tourancheau (1992). Environments designed to allow portable and network parallel programming include PVM (Parallel Virtual Machine) and LINDA (Carriero and Gelernter, 1989). PVM (Geist et al., 1993, 1994) is a widely used and freely available tool for constructing parallel distributed systems. Parallel debugging aids are essential, since methods of debugging a sequential program are inadequate for parallel code. In addition to the usual debugging facilities, a parallel debugger must provide tools for determining processor synchronization points and ...

2.4. Assessing performance

Grain size is a qualitative measure of the inherent parallelism of an algorithm. Grain size refers to the number of instructions performed in parallel before some kind of processor synchronization must occur. The parallelism in a given algorithm may be fine-grain, medium-grain or coarse-grain, with fine-grain representing many synchronization points and coarse-grain representing very few. Broadly speaking, simple scalar-vector operations are fine-grain, vector-matrix operations are medium-grain and matrix-matrix operations are coarse-grain. Freeman and Philips (1992) note that SIMD machines are generally suited to fine- and medium-grain parallelism and MIMD machines are more suitable for coarse-grain parallelism.
Performance is usually summarized by the speed-up $S_p = t_1/t_p$, where $t_1$ is the execution time on a single processor and $t_p$ is the execution time of the parallel algorithm on $p$ processors.
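The surrounding discussion survives only in fragments; for completeness, the standard companion quantities, consistent with the $S_p$ notation above and with the appeal to Amdahl's law in Section 3.2.1, are the efficiency and Amdahl's bound, where $f$ denotes the fraction of the computation that is inherently serial:

\[
E_p = \frac{S_p}{p}, \qquad S_p \le \frac{1}{f + (1-f)/p} \le \frac{1}{f}.
\]

Thus a speed-up of 14 on 16 processors, as reported by Xu et al. (1989) below, corresponds to an efficiency of about 0.88.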
To achieve efficient performance on a parallel computer, an algorithm must be designed to suit the particular architecture of the computer. It is in the area of algorithm design that the distinction between SIMD and MIMD is most acute. In general, designing efficient algorithms for ...

Lafaye de Micheaux (1984) concludes that one way to make high-speed ...

Thisted (1988) briefly mentioned parallel computers as being an ideal method of implementing Jacobi methods for extracting eigenvalues. Additionally, in his foreword, Thisted commented that his second volume would contain statistical algorithms for parallel computers.
A further source of material is the proceedings from the annual Symposium on the Interface between computing science and statistics. Organized annually by the Interface Foundation of North America and others, these proceedings regularly carry a variety of papers on parallel processing and statistics. Material published here relates to both general parallel processing and the application of parallel processing to statistics.

This review is principally based on an initial search of two on-line databases, MATHSCI and BIDS (Bath Information and Data Service, an on-line version of the ISI citation index). Both of these databases provide access to journals and proceedings. We supplemented computer searches with conventional manual search techniques.

It is rather difficult to describe the field of parallel statistical computation systematically. This difficulty stems from the fact that many statistical algorithms are constructed from 'building-block' numerical algorithms. There is an immense amount of literature relating to numerical methods on parallel machines, but we wish to focus on areas that are either explicitly statistical or numerical methods investigated in the name of statistics. The review therefore proceeds in the following manner. In each section we consider statistical computing loosely classified in the general manner of the chapter headings of Thisted (1988). Subsections are organized into the parallelization of general numerical methods, and parallelization of methods specifically in the name of statistics. In the former group, we present only outline details because of the amount of such material. The latter group, explicitly statistical in nature, is given more attention. For all of the review material we focus attention on the reported benefits and drawbacks of a parallel processing approach.

A problem with methods developed for particular parallel computers is a lack of portability. Thus, published parallel algorithms are unlikely to be readily ported to machines with different architectures. A current approach is to publish algorithmic 'skeletons'. The following subsections review parallel work relating to: ...

Parallel computers provide an ideal platform for applying simple techniques to large data sets. Any statistical technique that allows data to be broken down and processed independently is a natural candidate.

Schervish (1988) considered a variety of parallel applications, including discrete-finite inference, a computer-intensive approach for the analysis of discrete data, where the dominant aspect of the computation is simple summation of large sets of data. These applications allow elements of the computation to be divided and processed independently. The parallel computer investigated, ES86, was a distributed system constructed from a network of VAX computers. Parallel discrete-finite inference programs achieved near-linear speed-up on the ES86 system. Improved accuracy was also reported for the parallel algorithm.
Skvoretz et al. (1992) employed an NCUBE/10 hypercube multiprocessor (MIMD) to assess the application of parallel processing to typical large-scale social science research, using census data for subsample selection and cross-tabulation. They found that a crucial consideration for good parallel performance is keeping the processors busy while disk access occurs, and achieved this by distributing data to the processors in small packets, rather than en masse (the term packet describes the amount of data being transmitted at a given interaction). Good performance was reported. In comparison with the performance of the software package SPSS-X on an IBM 3081, the parallel method was always faster for large (> 80 000 cases) data sets.
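The packet idea generalizes readily. As a rough illustration, and not the authors' NCUBE code (the file name, field layout and packet size below are invented for the example), the sketch streams small packets of records to a pool of workers so that cross-tabulation overlaps with disk reading:

    from multiprocessing import Pool

    def tabulate_packet(rows):
        """Cross-tabulate one small packet of (row_category, col_category) records."""
        counts = {}
        for r, c in rows:
            counts[(r, c)] = counts.get((r, c), 0) + 1
        return counts

    def packets(path, packet_size=1000):
        """Yield small packets so workers can start while the file is still being read."""
        packet = []
        with open(path) as f:
            for line in f:
                fields = line.rstrip("\n").split(",")
                packet.append((fields[0], fields[1]))
                if len(packet) == packet_size:
                    yield packet
                    packet = []
        if packet:
            yield packet

    if __name__ == "__main__":
        table = {}
        with Pool(4) as pool:
            # imap_unordered consumes packets lazily, keeping the processors busy
            # while disk access occurs -- the consideration Skvoretz et al. stress.
            for part in pool.imap_unordered(tabulate_packet, packets("census.csv")):
                for key, n in part.items():
                    table[key] = table.get(key, 0) + n
        print(len(table), "distinct cells")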
Much material has been published on parallel numerical linear algebra; for example, Ortega et al. (1990) give a bibliography relating to this subject containing 2000 entries! For excellent descriptions of parallel linear algebra see Freeman and Philips (1992), Golub and Ortega (1993), Dongarra et al. (1991) or Modi (1988). Many numerical algorithms have been implemented on a variety of parallel machines. These include both iterative and direct methods, and approaches for sparse matrices. Stewart (1988) gave an overview of how statistical linear algebra computations may be mapped onto different parallel architectures, stressing the importance of the architecture and the central role of linear algebra in statistical computing.

3.2.1. Linear regression

Havránek and Stratkoš (1989) extended the work of Stewart into a practical situation, focusing on multiple linear regression, using a host computer/attached-array processor (SIMD).
It is noted that iterative algorithms often give better parallel performance than elimination methods (Stratkoš, 1987). Havránek and Stratkoš consider various parallel approaches to Cholesky factorization. Interestingly, models including the biggest numbers of parameters, applied to the largest test data set, yielded intractable computations due to restricted memory. Quoted speed-up results were rather promising. The authors showed that their algorithm's performance does not depend critically on the size of the data set.

Xu et al. (1989) considered multiple linear regression models on an Intel iPSC/2 using an SPMD-type decomposition, for both improved performance and resistance to processor failure. They also considered the importance of packet size during parallel communication. Xu et al. report good speed-up, this being of the order of 14 for 16 processors. Amdahl's law manifested itself here, as the speed-up fell short of the linear ideal.
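In an SPMD (Single Program Multiple Data) decomposition, every processor runs the same program on its own share of the rows. The details of Xu et al.'s implementation are not reproduced in this review; the following sketch merely illustrates the general pattern for least squares, exploiting the fact that X'X and X'y are sums of per-block contributions (NumPy and a process pool are assumed; all names are ours):

    import numpy as np
    from multiprocessing import Pool

    def cross_products(block):
        """SPMD worker: every processor runs this same program on its own rows."""
        X, y = block
        return X.T @ X, X.T @ y

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = np.column_stack([np.ones(10_000), rng.normal(size=(10_000, 3))])
        y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=10_000)

        # Split the rows among the processors; each block is self-contained.
        blocks = list(zip(np.array_split(X, 4), np.array_split(y, 4)))
        with Pool(4) as pool:
            parts = pool.map(cross_products, blocks)

        # Reduce: X'X and X'y are just sums of the per-processor contributions.
        XtX = sum(p[0] for p in parts)
        Xty = sum(p[1] for p in parts)
        beta = np.linalg.solve(XtX, Xty)   # least squares estimates
        print(beta)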
Kleijnen (1990) employed a Cyber 205 supercomputer (SIMD) for a Monte Carlo experiment to compare the performance of Rao's test for validity of a regression model with a cross-validation approach in multivariate regression. The parallel code performed well for ordinary least squares (OLS) estimates, and for general least squares (GLS) estimates based on large samples. However, GLS estimates obtained from small samples actually resulted in speed-down, that is, slower absolute time using parallelism than a single processor.

Kleijnen and Annink (1992) gave a detailed description of regression analysis. They found that OLS estimates yielded good parallel performance due to effective vectorization. However, the matrix inversion in their GLS algorithm did not benefit from vectorization, and speed-up was diminished relative to the OLS method.
As part of an investigation into parallel processing for social science research (see Section 3.1), Skvoretz et al. (1992) experimented with large regression models. The parallel component of the computation involved computing the covariance matrix in an SPMD manner. Experiments involved computing regression models using different numbers of processors and reading varying amounts of data from disk. They found that the latter consideration was the critical factor.

Kapenga and McKean (1987) considered vectorized algorithms for the robust R-estimates of Jaeckel (1972). In particular, their purpose was to assess the vectorization of the k-step algorithm of McKean and Hettmansperger (1978). Experiments performed on an FPS-264 array processor (SIMD) are reported as achieving considerable speed-up at the highest levels of parallel optimization. The vectorized algorithm had three major contributors: a QR decomposition, a projection and a sorting stage. The vectorized sorting component gave no speed-up. Kapenga and McKean suggest that using vectorized methods makes robust analysis of this type feasible for most problems of moderate size.

Kaufman et al. (1988) considered the application of parallelism to resampling methods. Illustrative examples based on parallel algorithms for cluster analysis (see Section 3.3.1) and robust regression demonstrated the applicability of parallelism to resampling methods. The computer investigated was the 1CAP 1 IBM research machine (MIMD), consisting of a host processor and 10 connected array processors. Attention focused on the least median of squares in robust regression, with consideration of different parallelization strategies for a sequential program. The chosen method involved an SPMD-type approach. Kaufman et al. noted that inter-processor communication is often a critical factor and some form of load balancing is often necessary.

Xu and Shiue (1993) developed three SPMD-type parallel algorithms for least median of squares (LMS) regression (Rousseeuw, 1984), using an Intel iPSC/2 MIMD computer. Parallelization strategies were deduced from consideration of the nested loop structure of sequential implementations. Xu and Shiue made extensive use of speed-up models to assess different formulations. Load-balancing strategies were used to improve algorithms where appropriate. An exact algorithm and an approximate algorithm yielded near-linear speed-up, while a parallelized version of a fast sequential algorithm resulted in catastrophic speed-down due to severe communication overheads.

Hawkins et al. (1994) investigated various configurations for the distributed computation of exact LMS regression. Distributed computation (Section 2.2) in this case means partitioning a sequential program onto a group of processors, which may be independent computers, and gathering results on completion. Here, the processors need not be dedicated to a single task. Within Flynn's taxonomy this is a distributed memory MIMD computer. A list of tasks is presented to partition the problem into a distributed solution. The code gave good speed-up (efficiency is quoted), particularly for large sample sizes, on a variety of systems, including a 22-processor heterogeneous system that required extra coding because of the added complexity of different vendors' systems.
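LMS regression chooses the coefficients minimizing the median of the squared residuals, and common approximate algorithms evaluate many random elemental subsets, each fitting exactly p points. Those evaluations are mutually independent, which is why the method parallelizes so naturally. A minimal sketch of this strategy follows; it is ours, not the published algorithms of Xu and Shiue or Hawkins et al., and it assumes NumPy:

    import numpy as np
    from multiprocessing import Pool

    def best_of_subsets(args):
        """Evaluate a batch of random elemental subsets and keep the best fit."""
        X, y, n_trials, seed = args
        rng = np.random.default_rng(seed)
        n, p = X.shape
        best = (np.inf, None)
        for _ in range(n_trials):
            idx = rng.choice(n, size=p, replace=False)
            try:
                beta = np.linalg.solve(X[idx], y[idx])   # exact fit to p points
            except np.linalg.LinAlgError:
                continue
            crit = np.median((y - X @ beta) ** 2)        # median squared residual
            if crit < best[0]:
                best = (crit, beta)
        return best

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
        y = X @ np.array([0.0, 1.0, 1.0]) + 0.1 * rng.normal(size=500)
        y[:50] += 10                                     # gross outliers

        # Each worker searches its own batch of subsets; the master keeps the best.
        with Pool(4) as pool:
            results = pool.map(best_of_subsets, [(X, y, 500, s) for s in range(4)])
        crit, beta = min(results, key=lambda r: r[0])
        print(crit, beta)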
Mitchell and Beauchamp (1986) developed a Bayesian method of subset selection in linear regression. A branch and bound algorithm was implemented on a Cray X-MP supercomputer (SIMD). Computing large regression models with many predictors motivated use of the supercomputer. With 25 predictor variables, reported as the largest test case, the algorithm required one second of CPU time on the Cray. Sequential timings are not reported. Like the work of Mitchell and Morris (1988) (Section 3.5), this application of parallel processing appears to be purely functional, rather than an investigation of parallelism.
Wollan (1988) addressed all-subsets regression on an Intel iPSC hypercube (MIMD), with the purpose of assessing the usefulness of parallel processing to statistics. The most notable feature of the parallel algorithm was the computation of every regression model. The algorithm performed well, achieving near-linear speed-up. However, in comparison with the Furnival-Wilson branch and bound algorithm ... results, although collinearity checking was less rigorous.
Eddy et al. (1992) described an extensive linear algebra calculation performed as part of the US Census Post Enumeration Survey, implemented on a variety of parallel platforms in order to assess the costs and benefits of fast computing environments. The analysis involved using a Cray, a CM-2 (SIMD) and distributed systems (MIMD) constructed from DEC UNIX workstations running under LINDA (see Section 2.3). Eddy et al. described difficulties in porting the original SAS/IML (interactive matrix language) code to parallel versions of Fortran and C on the different computing platforms. Porting to the Cray was a relatively easy task. The Connection Machine port proved more difficult, with programs requiring substantial modification. Various parallel strategies were investigated for the distributed algorithm, constructed under LINDA, none of which resulted in effective speed-up. The Cray yielded the fastest results for the computation.

Many non-linear optimization techniques have been implemented in parallel. Lootsma and Ragsdell (1988), Lootsma (1989) and Schnabel (1988) give detailed surveys of parallel non-linear optimization algorithms. Zenios (1989) provides an annotated bibliography of parallel numerical optimization.
3.3.1. Clustering

Raphalen (1982) considered the use of a SIMD machine for cluster analysis. Parallelism was confined only to the first stage, computing a distance matrix based on Euclidean distance. Two algorithms were considered for this task, their application being based on the size of the data set and number of variables. The implementation machine was not described. It was noted that whilst near-linear speed-up is theoretically possible, communication overheads induced by the interconnection network can reduce the achieved efficiency significantly.

Kaufman et al. (1988) investigated partitioning methods in cluster analysis on the 1CAP 1 research machine (see Section 3.2.2). A modification of a program called CLARA for the k-medoid method for clustering (Kaufman and Rousseeuw, 1986) is described. A sequential Fortran version of the program is ported to the 1CAP. Two parallelization strategies based on a master-slave approach are described, both of which have minimal communication overhead. Neither strategy heavily utilized the host processor. Good performance is reported.
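A master-slave decomposition of this kind is easy to sketch. In CLARA-like fashion the master repeatedly proposes candidate medoid sets and the slaves score each candidate against the full data set; only a candidate and its cost cross the processor boundary, which keeps communication minimal. The code below is a schematic of ours under those assumptions, not the authors' Fortran program:

    import numpy as np
    from multiprocessing import Pool

    def evaluate_medoids(args):
        """Slave task: score one candidate set of medoids on the full data set."""
        data, medoid_idx = args
        medoids = data[medoid_idx]
        # Distance from every point to its nearest medoid (Euclidean).
        d = np.linalg.norm(data[:, None, :] - medoids[None, :, :], axis=2)
        return d.min(axis=1).sum(), medoid_idx

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        data = rng.normal(size=(2_000, 4))
        k, n_candidates = 3, 40

        # Master: generate candidate medoid sets and farm them out to the slaves.
        candidates = [(data, rng.choice(len(data), size=k, replace=False))
                      for _ in range(n_candidates)]
        with Pool(4) as pool:
            cost, best = min(pool.map(evaluate_medoids, candidates),
                             key=lambda r: r[0])
        print(cost, best)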
de Doncker et al. (1989) reported the use of a Sun-3 network for the parallelization of an algorithm resistant to outliers. An SPMD data-partitioning algorithm was constructed. de Doncker et al. observed that for observation matrices of moderate size, most of the algorithm's processing time is concentrated on computation, with little time expended on inter-processor communication.

Healey and Davies (1983) reported the application of an ICL DAP (Distributed Array Processor) to large-scale Poisson regression models (McCullagh and Nelder, 1983). The parallelization of the serial iterative scaling algorithm was described. The results demonstrated that the parallel algorithm's performance improved relative to the amount of data used by the model.

Hénaff and Norman (1987) described the design and implementation of an algorithm for solving large non-linear econometric models on vector processors. A reduced Newton algorithm is vectorized with particular attention given to sparse matrix operations. The method was tested, and results were reported for both a CYBER-205 and a Cray X-MP. Different properties were observed in detailed analysis of the performance on the two computers. Promising results were reported.

The subject of time series applications is addressed to some extent in the literature describing the application of parallel processing to statistics. In an early paper, Fahrmeir (1977) considered using parallel computers for estimating stochastic parameters of time series models. Regression, distributed-lag and ARMA models were considered. A Bayesian adaptive-filtering approach was adopted because it requires little knowledge of the dispersion properties of the error variables. The performance of the parallel algorithm is not quantified although it appears to perform well.

Schervish (1988) briefly described the parallelization of a time series model, described in detail in Schervish and Tsay (1988). Bayesian models were constructed for autoregressive processes that allow for changes in the model. Their method for dealing with outliers required the estimation ...
Schervish (1988) briefly described the application of the ES86 distributed VAX system (see Section 3.1) to the analysis of large hierarchical models for household crime data. The algorithm involved the repeated evaluation of four-dimensional integrals, giving a speed-up of 6 on a system consisting of 11 processors. Schervish notes that improving this speed-up would require extensive reprogramming.

Schervish also reported the use of the ES86 parallel system for a hierarchical model for analysing data from a large prison inmate survey. The model required that nearly 10 000 integrals be evaluated. The parallel algorithm involved splitting the computation into small pieces and distributing the pieces to each VAX processor. A network of 10 VAX stations yielded a speed-up of 8.
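Splitting a large integration task into independent pieces is the archetypal coarse-grain decomposition. As a toy version of the idea (a midpoint rule in Python standing in for the VAX network; the integrand and piece counts are invented for the example):

    import numpy as np
    from multiprocessing import Pool

    def integrate_piece(args):
        """Evaluate one piece: a simple midpoint rule on a subinterval."""
        a, b, m = args
        x = a + (np.arange(m) + 0.5) * (b - a) / m
        return np.sum(np.exp(-x * x)) * (b - a) / m

    if __name__ == "__main__":
        # Split [0, 5] into many small pieces and distribute them; the full
        # integral is just the sum of the independently computed pieces.
        edges = np.linspace(0.0, 5.0, 101)
        pieces = [(edges[i], edges[i + 1], 1000) for i in range(100)]
        with Pool(4) as pool:
            total = sum(pool.map(integrate_piece, pieces))
        print(total)   # ~ sqrt(pi)/2 = 0.8862...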
Numerical integration and related topics do not appear to have received the same amount of attention as the areas described above. The subject is introduced in both Golub and Ortega (1993) and Freeman and Philips (1992). Gladwell (1987) considers vectorized forms of certain one-dimensional quadrature codes. ... a parallel multivariate numerical integration algorithm suitable for certain MIMD computers. Some performance results are given for MIMD adaptive quadrature codes in de Doncker and Vakalis (1993). Golub and Ortega observe that for many integrals the power of parallel processing is unnecessary. However, in certain statistical areas where high-dimensional integrals occur, such as Bayesian methods, the increased processing power of parallel computers may be valuable.

Sylwestrowicz (1982) gave examples of Monte Carlo methods in statistics, implemented on an ICL DAP (SIMD). An efficient method of pseudorandom number generation for a SIMD machine was given. A performance model for evaluation of a simple one-dimensional integral using the parallel Monte Carlo approach suggests excellent speed-up, although the machine size may be inappropriate for small problems.

O'Sullivan and Pawitan (1993) briefly described a parallel approach to multidimensional density estimation by tomography. The approach involved filtered backprojection and is readily parallelized. One-dimensional functions were integrated in different directions in parallel. On a Sequent Symmetry multiprocessor machine, linear speed-up was achieved. It is noted that this performance remains intact for higher-dimensional functions.

Some statistical computations consist of a large component of integer processing. This includes areas such as sorting and computing random numbers. A detailed discussion of sorting algorithms for a variety of parallel computers is given in Akl (1985). Computing random numbers in parallel is a large research area and we refer the reader to Anderson (1990) for a review of common methods.

Mitchell and Morris (1988) developed a Bayesian approach to the design and analysis of computational experiments. In this case, a computational experiment uses a computer to model a physical system, with a design to dictate the input to the program. A Bayesian approach was used for predicting the outcome of the experiment, which led to specialized design procedures. Experiments were conducted on a Cray X-MP (SIMD), with each ...

Schork and Hardwick (1990) described supercomputer-intensive multivariable randomization tests on an IBM 3090-600E (MIMD) computer. Interest focused on testing the equality of k covariance matrices. Parallelism was exploited by making each processor perform an equal fraction of the desired number of permutations (again an SPMD type of approach). Near-linear speed-up was reported.

Many writers, including Stewart (1988), Stine and Woteki (1989) and Kaufman et al. (1988), suggest that the bootstrap (Efron and Tibshirani, 1993) is an excellent candidate for parallel implementation. Xu and Shiue (1991) described two examples of parallelizing bootstrap confidence intervals on an Intel iPSC/2 hypercube (MIMD). The parallel algorithm divides the bootstrap sampling equally among the nodes. Each node computes its own share of the replicates; merging operations are then applied across the nodes to generate the required percentiles for the confidence intervals. Speed-up depended on which type of confidence interval was constructed. For both algorithms the dominant computations were sorting and searching. Xu and Shiue discussed a speed-up model that gives an upper bound for expected speed-up.
This section briefly introduces material not suitable for inclusion above. Some of these areas cover the more obscure links between computer science and statistics. Freisleben (1993) describes parallel neural network algorithms for extracting principal components directly from data. Bäck and Hoffmeister (1994) discuss the use of parallel processing in their introduction to genetic algorithms and evolution strategies. Parallel aspects of Markov chain Monte Carlo methods are discussed in Malfait et al. (1993). The successful use of SIMD hardware for image processing is described by Grenander and Miller (1994).

4. Closing comments

Several factors appear to have limited the adoption of parallel processing by statisticians:

- Novelty of parallel computers. Parallel computers are not widely available, nor have they achieved widespread commercial success. Indeed, many hardware and software issues in parallel processing remain active research topics.
- Modern sequential computers provide sufficient power to drive standard packages for most statistical problems. Statistical computing can generally be accommodated adequately by conventional computers. Even if a job takes a long time, it is easy for the statistician to leave a computer running overnight or over a weekend.
- The current absence of standard packages. Most statistical computing is based on statistical packages. With the exception of SAS, no statistics packages have been implemented in optimized form on parallel computers.
- Parallel methods may not be appropriate for interactive computing, thereby making them ineffective for routine analysis. In particular, some MIMD systems are ...
- Algorithms must be constructed in an efficient manner on the particular parallel computer being used. The task of developing algorithms into the appropriate form can be complicated and time consuming.

For a statistical application to justify a parallel solution, it must require facilities not available on conventional computers. Typically the requirement is speed, although it could also be large amounts of memory or disk space. Currently the use of parallel computers for statistical applications has a potentially long software development time. Hence, the statistician should think carefully about the return from parallel computers before using them. In particular,
the cost of development on a parallel computer must be balanced against the performance benefits likely to ensue. The use of networked systems offers the most immediate prospect of parallel processing hardware to statisticians. Specialist parallel computers will undoubtedly become more widely available ...

Acknowledgements

We are grateful to the anonymous referees for their useful comments, which led to a much improved paper. Earlier ...

References
Akl, S. (1985) Parallel Sorting Algorithms. Academic Press, New York.
Al-Jumeily, D. M., Clegg, D. B., Pountney, D. C. and Harris, P. ... Memory Computers. No. CMS 5, School of Computing and Mathematical Sciences, Liverpool John Moores University.
Anderson, S. L. (1990) Random number generators on vector computers and other advanced architectures. SIAM Review, 32(2), 221-51.
Bäck, T. and Hoffmeister, F. (1994) ... strategies. Statistics and Computing, 4, 51-63.
Bailey, D. H. (1991) Twelve ways to fool the masses when giving performance results on parallel computers. Supercomputer, ...
Bertsekas, D. P. and Tsitsiklis, J. N. (1989) Parallel and Distributed Computation. Prentice-Hall, Englewood Cliffs, NJ.
Brophy, J. F., Gentle, J. E., Li, J. and Smith, P. W. (1989) Software for advanced architecture computers. In K. Berk and L. Malone (eds), Computer Science and Statistics, Proceedings of the 21st Symposium on the Interface, pp. 116-20. American Statistical Association.
Carriero, N. and Gelernter, D. (1989) LINDA in context. Communications of the ACM, 32(4), 444-58.
Chambers, J. M. (1977) Computational Methods for Data Analysis. Wiley, New York.
Cray-1 Computer Systems (1981) Fortran (CFT) Reference Manual. Publication No. SR-0009, Rev. H.
de Doncker, E. et al. (1989) ... projection pursuit. In K. Berk and L. Malone (eds), Computer Science and Statistics, Proceedings of the 21st Symposium on the Interface, pp. 308-13. American Statistical Association.
de Doncker, E. and Vakalis, I. (1993) ... Amsterdam.
Dongarra, J. and Tourancheau, B. (1992) ... Supercomputer, 7(2), 72-80.
Durst, M. J. (1987) Library software in the supercomputing ...
Eddy, W. F. (1986) ... on a network of Vaxes. In T. M. Boardman and I. M. Stefanski (eds), Computer Science and Statistics, Proceedings of the 18th Symposium on the Interface, pp. 30-6. American Statistical Association.
Eddy, W. F., Meyer, M. M., Mockus, A., Schervish, M. J., Tan, K. and Viele, K. (1992) Smoothing census adjustment factors: an application of high performance computing. In H. J. Newton (ed.), Computing Science and Statistics, Proceedings of the 24th Symposium on the Interface, pp. 503-10. American Statistical Association.
Eddy, W. F. and Schervish, M. J. (1991) Parallel computing - a tutorial for statisticians. In E. M. Keramidas (ed.), Computing Science and Statistics, Proceedings of the 23rd Symposium on the Interface ...
Efron, B. and Tibshirani, R. J. (1993) An Introduction to the Bootstrap. Chapman and Hall, London.
Encore (1988) Encore Parallel Fortran. Ref. No. 724-06785, Encore Computer Corporation, Fort Lauderdale, FL.
Fahrmeir, L. (1977) Parallel estimation algorithms for stochastic ... Computers - Parallel Mathematics, pp. 99-102. North-Holland, Amsterdam.
Flynn, M. J. (1972) Some computer organisations and their effectiveness. IEEE Transactions on Computers, C-21, 948-60.
Freeman, T. L. and Philips, C. (1992) Parallel Numerical Algorithms. Prentice-Hall, Englewood Cliffs, NJ.
Freisleben, B. (1993) Parallel learning algorithms for principal component extraction. In Proceedings of the 3rd International Conference on Artificial Neural Networks, 372, 267-71.
Furnival, G. M. and Wilson, R. W., Jr. (1974) Regression by leaps and bounds. Technometrics, 16, 499-511.
Geist, A., Beguelin, A., Dongarra, J., Weichang, J., Manchek, R. and Sunderam, V. (1993) PVM 3.0 User's Guide and Reference Manual. Tech. Rept. ORNL/TM-12187, Oak Ridge National Laboratory.
Geist, A., Beguelin, A., Dongarra, J., Jiang, W., Manchek, R. and Sunderam, V. (1994) PVM: Parallel Virtual Machine - A Users' Guide and Tutorial for Networked Parallel Computing. MIT Press, Cambridge, MA. (Also available online at http://...)
Gladwell, I. (1987) Vectorisation of one dimensional quadrature codes. In G. Fairweather and P. M. Keast (eds), Numerical Integration: Recent Developments, Software and Applications, NATO ASI Series C203, pp. 230-8.
Golub, G. and Ortega, J. M. (1993) Scientific Computing: An Introduction with Parallel Computing. Academic Press, Boston.
Gonzalez, C., Chen, J. and Sarma, J. (1988) A tool to generate FORTRAN parallel code for the Intel iPSC/2 Hypercube. In E. J. Wegman, D. T. Gantz and J. J. Miller (eds), Computer Science and Statistics, Proceedings of the 20th Symposium on the Interface, pp. 214-9. American Statistical Association.
Grenander, U. and Miller, M. I. (1994) Representations of knowledge in complex systems. Journal of the Royal Statistical Society, Series B, 56(4), 549-603.
Havránek, T. and Stratkoš, Z. (1989) ... with parallel processing of linear models. Bulletin of the International Statistical Institute, 53, 105-17.
Hawkins, D. M. et al. (1994) Distributing a computationally intensive estimator: the case of exact LMS regression. Computational Statistics, 9, ...
Healey, A. R. and Davies, S. T. (1983) Statistical model fitting ... In ... Joubert and U. Schendel (eds), Parallel Computing '83 ...
Hénaff, P. J. and Norman, A. L. (1987) Solving nonlinear econometric models using vector processors. In T. M. Boardman and I. M. Stefanski (eds), Computer Science and Statistics, Proceedings of the 18th Symposium on the Interface, pp. 348-51. American Statistical Association.
Hockney, R. W. and Jesshope, C. R. (1988) Parallel Computers 2. Adam Hilger, Bristol.
Huber, P. J. (1985) Projection pursuit. Annals of Statistics, 13, 435-75.
Hwang, K. (1993) Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill, New York.
Ihnen, L. (1989) Vectorisation of the SAS(R) System. In K. Berk and L. Malone (eds), Computer Science and Statistics, Proceedings of the 21st Symposium on the Interface, pp. 121-7. American Statistical Association.
Inmos (1990) Transputer Development System (2nd edn). Prentice-Hall.
Jaeckel, L. A. (1972) Estimating regression coefficients by minimising the dispersion of the residuals. Annals of Mathematical Statistics, 43, 1449-58.
Kapenga, J. A. and McKean, J. W. (1987) The vectorisation of algorithms for R-estimates in linear regression. In R. M. Heiberger (ed.), Computer Science and Statistics, Proceedings of the 19th Symposium on the Interface, pp. 502-5. American Statistical Association.
Kaufman, L. and Rousseeuw, P. J. (1986) Clustering large data sets. In E. Gelsema and L. Kanal (eds), Pattern Recognition in Practice ...
Kaufman, L., Hopke, P. K. and Rousseeuw, P. J. (1988) Using a parallel computer system for statistical resampling methods. Computational Statistics Quarterly, 2, 129-41.
Kaufmann, W. J. and Smarr, L. L. (1993) Supercomputing and the Transformation of Science. Scientific American Library, New York.
Kleijnen, J. P. C. (1990) Supercomputers for Monte Carlo simulation: cross-validation versus Rao's test in multivariate analysis. In K. H. Jöckel, G. Rothe and W. Sendler (eds), Bootstrapping and Related Techniques, pp. 233-45. Springer-Verlag, Berlin.
Kleijnen, J. P. and Annink, B. (1992) Vector computers, Monte Carlo simulation and regression analysis: an introduction. ...
Lafaye de Micheaux, D. (1984) Parallelization of algorithms in the practice of statistical data analysis. In T. Havránek, Z. Sidák and M. Novák (eds), COMPSTAT '84 - Proceedings in Computational Statistics, pp. 293-300. Vienna.
Lewis, T. G. and El-Rewini, H. (1992) Introduction to Parallel Processing. Prentice-Hall, Englewood Cliffs, NJ.
Lootsma, F. A. (1989) Parallel Non-Linear Optimisation. Report No. 89-45, Faculty of Tech. Math. and Informatics, Delft University of Technology.
Lootsma, F. A. and Ragsdell, K. M. (1988) State-of-the-art in parallel nonlinear optimisation. Parallel Computing, 6, 133-55.
Malfait, M., Roose, D. and Vandermeulen, D. (1993) A convergence measure and some parallel aspects of Markov chain Monte Carlo algorithms. In ... Stochastic Methods in Image and Signal Processing, Proc. ...
McCullagh, P. and Nelder, J. A. (1983) Generalised Linear Models. Chapman and Hall, London.
McKean, J. W. and Hettmansperger, T. P. (1978) A robust analysis of the general linear model based on one step R-estimates. Biometrika, 65, 571-9.
Mitchell, T. J. and Beauchamp, J. J. (1986) Algorithms for Bayesian variable selection in regression. In T. M. Boardman (ed.), Computer Science and Statistics, Proceedings of the 18th Symposium on the Interface, pp. 181-2. American Statistical Association.
Mitchell, T. J. and Morris, M. D. (1988) A Bayesian approach to ... In E. J. Wegman, D. T. Gantz and J. J. Miller (eds), Computer Science and Statistics, Proceedings of the 20th Symposium on the Interface, pp. 49-51. American Statistical Association.
Modi, J. J. (1988) Parallel Algorithms for Matrix Computations. Clarendon Press, Oxford.
Ortega, J. M., Voigt, R. G. and Romine, C. H. (1990) A bibliography on parallel and vector numerical algorithms. In K. ... for Matrix Computations, pp. 125-97. SIAM, Philadelphia.
O'Sullivan, F. and Pawitan, Y. (1993) ... estimation by tomography. Journal of the Royal Statistical Society, Series B, ...
Ostrouchov, G. (1987) Parallel computing on a hypercube: an overview of the architecture and some applications. In R. M. Heiberger (ed.), Computer Science and Statistics, Proceedings of the 19th Symposium on the Interface, pp. 27-32. American Statistical Association.
Perrott, R. H. (1987) Parallel Programming. Addison-Wesley.
Quinn, M. J. (1987) Designing Efficient Algorithms for Parallel Computers. McGraw-Hill, New York.
Raphalen, M. (1982) Applying parallel processing to data analysis: computing a distance's matrix on an SIMD machine. In H. Caussinus, P. Ettinger and R. Tomassone (eds), COMPSTAT '82 - Proceedings in Computational Statistics ...
Rousseeuw, P. J. (1984) Least median of squares regression. Journal of the American Statistical Association, 79, 871-80.
Schervish, M. J. (1988) Applications of parallel computation to statistical inference. Journal of the American Statistical Association, 83, 976-83.
Schervish, M. J. and Tsay, R. S. (1988) Bayesian modelling and forecasting in large scale time series. In J. C. Spall (ed.), Bayesian Analysis of Time Series and Dynamic Models, pp. 23-52. Marcel Dekker, New York.
Schnabel, R. B. (1988) Sequential and parallel methods for unconstrained optimization. Tech. Rept. CU-CS-414-88, Dept. of Comput. Sci., University of Colorado at Boulder, CO.
Schork, N. J. and Hardwick, J. (1990) Supercomputer-intensive multivariable randomization tests. In C. Page and R. LePage (eds), Computing Science and Statistics, Proceedings of the 22nd Symposium on the Interface, pp. 509-13. Springer-Verlag, New York.
Skvoretz, J., Smith, S. A. and Baldwin, C. (1992) Parallel processing applications for data analysis in the social sciences. Concurrency: Practice and Experience, 4(3), 207-21.
Stewart, G. W. (1986) Communication in parallel algorithms: an example. In T. M. Boardman and I. M. Stefanski (eds), Computer Science and Statistics, Proceedings of the 18th Symposium on the Interface, pp. 11-14. American Statistical Association.
Stewart, G. W. (1988) Parallel linear algebra in statistical computations. In D. Edwards and N. E. Raun (eds), COMPSTAT '88, Proceedings in Computational Statistics, pp. 3-14. Physica-Verlag.
Stine, R. A. and Woteki, T. H. (1989) A graphical programming ... In ASA Proceedings of the Statistical Computing Section, pp. ...
Stratkoš, Z. (1987) Effectivity and optimizing algorithms and programs on the host-computer/array processor systems. Parallel Computing, 4, 197-207.
Sylwestrowicz, J. D. (1982) Parallel processing in statistics. In H. Caussinus, P. Ettinger and R. Tomassone (eds), COMPSTAT '82 - Proceedings in Computational Statistics ...
Thisted, R. A. (1988) Elements of Statistical Computing. Chapman and Hall, New York.
Wilson, G. V. (1993) A glossary of parallel computing terminology. IEEE Parallel and Distributed Technology, ...
Wollan, P. (1988) All-subsets regression on a hypercube computer. In E. J. Wegman, D. T. Gantz and J. J. Miller (eds), Computer Science and Statistics, Proceedings of the 20th Symposium on the Interface, pp. 224-7. American Statistical Association.
Xu, C. W. and Shiue, W. K. (1991) Parallel bootstrap and inference for means. Computational Statistics Quarterly, 3, 233-9.
Xu, C. W. and Shiue, W. K. (1993) Parallel algorithms for least median of squares regression. Computational Statistics and Data Analysis, 16, 349-62.
Xu, M., Miller, J. J. and Wegman, E. J. (1989) Parallelizing multiple linear regression for speed and redundancy: an ... In ... Computer Science and Statistics, Proceedings of the 21st Symposium on the Interface, pp. 138-44. American Statistical Association.
Zenios, S. A. (1989) Parallel numerical optimization: current status and an annotated bibliography. ORSA Journal on Computing, 1, 20-43.