An Introduction to Parallel
Programming

SECOND EDITION

Peter S. Pacheco
University of San Francisco

Matthew Malensek
University of San Francisco
Table of Contents

Cover image

Title page

Copyright

Dedication

Preface

Chapter 1: Why parallel computing

1.1. Why we need ever-increasing performance

1.2. Why we're building parallel systems

1.3. Why we need to write parallel programs

1.4. How do we write parallel programs?

1.5. What we'll be doing

1.6. Concurrent, parallel, distributed

1.7. The rest of the book


1.8. A word of warning

1.9. Typographical conventions

1.10. Summary

1.11. Exercises

Bibliography

Chapter 2: Parallel hardware and parallel software

2.1. Some background

2.2. Modifications to the von Neumann model

2.3. Parallel hardware

2.4. Parallel software

2.5. Input and output

2.6. Performance

2.7. Parallel program design

2.8. Writing and running parallel programs

2.9. Assumptions

2.10. Summary

2.11. Exercises

Bibliography

Chapter 3: Distributed memory programming with MPI


3.1. Getting started

3.2. The trapezoidal rule in MPI

3.3. Dealing with I/O

3.4. Collective communication

3.5. MPI-derived datatypes

3.6. Performance evaluation of MPI programs

3.7. A parallel sorting algorithm

3.8. Summary

3.9. Exercises

3.10. Programming assignments

Bibliography

Chapter 4: Shared-memory programming with Pthreads

4.1. Processes, threads, and Pthreads

4.2. Hello, world

4.3. Matrix-vector multiplication

4.4. Critical sections

4.5. Busy-waiting

4.6. Mutexes

4.7. Producer–consumer synchronization and semaphores


4.8. Barriers and condition variables

4.9. Read-write locks

4.10. Caches, cache-coherence, and false sharing

4.11. Thread-safety

4.12. Summary

4.13. Exercises

4.14. Programming assignments

Bibliography

Chapter 5: Shared-memory programming with OpenMP

5.1. Getting started

5.2. The trapezoidal rule

5.3. Scope of variables

5.4. The reduction clause

5.5. The parallel for directive

5.6. More about loops in OpenMP: sorting

5.7. Scheduling loops

5.8. Producers and consumers

5.9. Caches, cache coherence, and false sharing

5.10. Tasking
5.11. Thread-safety

5.12. Summary

5.13. Exercises

5.14. Programming assignments

Bibliography

Chapter 6: GPU programming with CUDA

6.1. GPUs and GPGPU

6.2. GPU architectures

6.3. Heterogeneous computing

6.4. CUDA hello

6.5. A closer look

6.6. Threads, blocks, and grids

6.7. Nvidia compute capabilities and device architectures

6.8. Vector addition

6.9. Returning results from CUDA kernels

6.10. CUDA trapezoidal rule I

6.11. CUDA trapezoidal rule II: improving performance

6.12. Implementation of trapezoidal rule with warpSize thread


blocks

6.13. CUDA trapezoidal rule III: blocks with more than one warp
6.14. Bitonic sort

6.15. Summary

6.16. Exercises

6.17. Programming assignments

Bibliography

Chapter 7: Parallel program development

7.1. Two n-body solvers

7.2. Sample sort

7.3. A word of caution

7.4. Which API?

7.5. Summary

7.6. Exercises

7.7. Programming assignments

Bibliography

Chapter 8: Where to go from here

Bibliography

Bibliography

Bibliography
Index
Copyright
Morgan Kaufmann is an imprint of Elsevier
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

Copyright © 2022 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any


form or by any means, electronic or mechanical, including
photocopying, recording, or any information storage and retrieval
system, without permission in writing from the publisher. Details on
how to seek permission, further information about the Publisher's
permissions policies and our arrangements with organizations such
as the Copyright Clearance Center and the Copyright Licensing
Agency, can be found at our website:
www.elsevier.com/permissions.

This book and the individual contributions contained in it are


protected under copyright by the Publisher (other than as may be
noted herein).
Cover art: “seven notations,” nickel/silver etched plates, acrylic on
wood structure, copyright © Holly Cohn

Notices
Knowledge and best practice in this field are constantly changing.
As new research and experience broaden our understanding,
changes in research methods, professional practices, or medical
treatment may become necessary.
Practitioners and researchers must always rely on their own
experience and knowledge in evaluating and using any
information, methods, compounds, or experiments described
herein. In using such information or methods they should be
mindful of their own safety and the safety of others, including
parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the
authors, contributors, or editors, assume any liability for any injury
and/or damage to persons or property as a matter of products
liability, negligence or otherwise, or from any use or operation of
any methods, products, instructions, or ideas contained in the
material herein.

Library of Congress Cataloging-in-Publication Data


A catalog record for this book is available from the Library of
Congress

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from the British Library

ISBN: 978-0-12-804605-0

For information on all Morgan Kaufmann publications visit our


website at https://www.elsevier.com/books-and-journals

Publisher: Katey Birtcher


Acquisitions Editor: Stephen Merken
Content Development Manager: Meghan Andress
Publishing Services Manager: Shereen Jameel
Production Project Manager: Rukmani Krishnan
Designer: Victoria Pearson

Typeset by VTeX
Printed in the United States of America

Last digit is the print number: 9 8 7 6 5 4 3 2 1


Dedication

To the memory of Robert S. Miller


Preface
Parallel hardware has been ubiquitous for some time now: it's
difficult to find a laptop, desktop, or server that doesn't use a
multicore processor. Cluster computing is nearly as common today
as high-powered workstations were in the 1990s, and cloud
computing is making distributed-memory systems as accessible as
desktops. In spite of this, most computer science majors graduate
with little or no experience in parallel programming. Many colleges
and universities offer upper-division elective courses in parallel
computing, but since most computer science majors have to take a
large number of required courses, many graduate without ever
writing a multithreaded or multiprocess program.
It seems clear that this state of affairs needs to change. Whereas
many programs can obtain satisfactory performance on a single core,
computer scientists should be made aware of the potentially vast
performance improvements that can be obtained with parallelism,
and they should be able to exploit this potential when the need
arises.
An Introduction to Parallel Programming was written to partially
address this problem. It provides an introduction to writing parallel
programs using MPI, Pthreads, OpenMP, and CUDA, four of the
most widely used APIs for parallel programming. The intended
audience is students and professionals who need to write parallel
programs. The prerequisites are minimal: a college-level course in
mathematics and the ability to write serial programs in C.
The prerequisites are minimal, because we believe that students
should be able to start programming parallel systems as early as
possible. At the University of San Francisco, computer science
students can fulfill a requirement for the major by taking a course on
which this text is based immediately after taking the “Introduction
to Computer Science I” course that most majors take in the first
semester of their freshman year. It has been our experience that there
really is no reason for students to defer writing parallel programs
until their junior or senior year. To the contrary, the course is
popular, and students have found that using concurrency in other
courses is much easier after having taken this course.
If second-semester freshmen can learn to write parallel programs
by taking a class, then motivated computing professionals should be
able to learn to write parallel programs through self-study. We hope
this book will prove to be a useful resource for them.
The Second Edition
It has been nearly ten years since the first edition of Introduction to
Parallel Programming was published. During that time much has
changed in the world of parallel programming, but, perhaps
surprisingly, much also remains the same. Our intent in writing this
second edition has been to preserve the material from the first
edition that continues to be generally useful, but also to add new
material where we felt it was needed.
The most obvious addition is the inclusion of a new chapter on
CUDA programming. When the first edition was published, CUDA
was still very new. It was already clear that the use of GPUs in high-
performance computing would become very widespread, but at that
time we felt that GPGPU wasn't readily accessible to programmers
with relatively little experience. In the last ten years, that has clearly
changed. Of course, CUDA is not a standard, and features are
added, modified, and deleted with great rapidity. As a consequence,
authors who use CUDA must present a subject that changes much
faster than a standard, such as MPI, Pthreads, or OpenMP. In spite of
this, we hope that our presentation of CUDA will continue to be
useful for some time.
Another big change is that Matthew Malensek has come onboard
as a coauthor. Matthew is a relatively new colleague at the
University of San Francisco, but he has extensive experience with
both the teaching and application of parallel computing. His
contributions have greatly improved the second edition.
About This Book
As we noted earlier, the main purpose of the book is to teach
parallel programming in MPI, Pthreads, OpenMP, and CUDA to an
audience with a limited background in computer science and no
previous experience with parallelism. We also wanted to make the
book as flexible as possible so that readers who have no interest in
learning one or two of the APIs can still read the remaining material
with little effort. Thus the chapters on the four APIs are largely
independent of each other: they can be read in any order, and one or
two of these chapters can be omitted. This independence has some
cost: it was necessary to repeat some of the material in these
chapters. Of course, repeated material can be simply scanned or
skipped.
On the other hand, readers with no prior experience with parallel
computing should read Chapter 1 first. This chapter attempts to
provide a relatively nontechnical explanation of why parallel
systems have come to dominate the computer landscape. It also
provides a short introduction to parallel systems and parallel
programming.
Chapter 2 provides technical background on computer hardware
and software. Chapters 3 to 6 provide independent introductions to
MPI, Pthreads, OpenMP, and CUDA, respectively. Chapter 7
illustrates the development of two different parallel programs using
each of the four APIs. Finally, Chapter 8 provides a few pointers to
additional information on parallel computing.
We use the C programming language for developing our
programs, because all four APIs have C-language interfaces, and,
since C is such a small language, it is a relatively easy language to
learn—especially for C++ and Java programmers, since they will
already be familiar with C's control structures.
Classroom Use
This text grew out of a lower-division undergraduate course at the
University of San Francisco. The course fulfills a requirement for the
computer science major, and it also fulfills a prerequisite for the
undergraduate operating systems, architecture, and networking
courses. The course begins with a four-week introduction to C
programming. Since most of the students have already written Java
programs, the bulk of this introduction is devoted to the use of pointers
in C.1 The remainder of the course provides introductions first to
programming in MPI, then Pthreads and/or OpenMP, and it finishes
with material covering CUDA.
We cover most of the material in Chapters 1, 3, 4, 5, and 6, and
parts of the material in Chapters 2 and 7. The background in Chapter
2 is introduced as the need arises. For example, before discussing
cache coherence issues in OpenMP (Chapter 5), we cover the
material on caches in Chapter 2.
The coursework consists of weekly homework assignments, five
programming assignments, a couple of midterms and a final exam.
The homework assignments usually involve writing a very short
program or making a small modification to an existing program.
Their purpose is to ensure that the students stay current with the
coursework, and to give the students hands-on experience with
ideas introduced in class. It seems likely that their existence has been
one of the principal reasons for the course's success. Most of the
exercises in the text are suitable for these brief assignments.
The programming assignments are larger than the programs
written for homework, but we typically give the students a good
deal of guidance: we'll frequently include pseudocode in the
assignment and discuss some of the more difficult aspects in class.
This extra guidance is often crucial: it's easy to give programming
assignments that will take far too long for the students to complete.
The results of the midterms and finals and the enthusiastic reports
of the professor who teaches operating systems suggest that the
course is actually very successful in teaching students how to write
parallel programs.
For more advanced courses in parallel computing, the text and its
online supporting materials can serve as a supplement so that much
of the material on the syntax and semantics of the four APIs can be
assigned as outside reading.
The text can also be used as a supplement for project-based
courses and courses outside of computer science that make use of
parallel computation.
Support Materials
An online companion site for the book is located at
www.elsevier.com/books-and-journals/book-
companion/9780128046050. This site will include errata and
complete source for the longer programs we discuss in the text.
Additional material for instructors, including downloadable figures
and solutions to the exercises in the book, can be downloaded from
https://educate.elsevier.com/9780128046050.
We would greatly appreciate readers' letting us know of any errors
they find. Please send email to mmalensek@usfca.edu if you do find
a mistake.
Acknowledgments
In the course of working on this book we've received considerable
help from many individuals. Among them we'd like to thank the
reviewers of the second edition, Steven Frankel (Technion) and Il-
Hyung Cho (Saginaw Valley State University), who read and
commented on draft versions of the new CUDA chapter. We'd also
like to thank the reviewers who read and commented on the initial
proposal for the book: Fikret Ercal (Missouri University of Science
and Technology), Dan Harvey (Southern Oregon University), Joel
Hollingsworth (Elon University), Jens Mache (Lewis and Clark
College), Don McLaughlin (West Virginia University), Manish
Parashar (Rutgers University), Charlie Peck (Earlham College),
Stephen C. Renk (North Central College), Rolfe Josef Sassenfeld (The
University of Texas at El Paso), Joseph Sloan (Wofford College),
Michela Taufer (University of Delaware), Pearl Wang (George Mason
University), Bob Weems (University of Texas at Arlington), and
Cheng-Zhong Xu (Wayne State University). We are also deeply
grateful to the following individuals for their reviews of various
chapters of the book: Duncan Buell (University of South Carolina),
Matthias Gobbert (University of Maryland, Baltimore County),
Krishna Kavi (University of North Texas), Hong Lin (University of
Houston–Downtown), Kathy Liszka (University of Akron), Leigh
Little (The State University of New York), Xinlian Liu (Hood
College), Henry Tufo (University of Colorado at Boulder), Andrew
Sloss (Consultant Engineer, ARM), and Gengbin Zheng (University
of Illinois). Their comments and suggestions have made the book
immeasurably better. Of course, we are solely responsible for
remaining errors and omissions.
Slides and the solutions manual for the first edition were prepared
by Kathy Liszka and Jinyoung Choi, respectively. Thanks to both of
them.
The staff at Elsevier has been very helpful throughout this project.
Nate McFadden helped with the development of the text. Todd
Green and Steve Merken were the acquisitions editors. Meghan
Andress was the content development manager. Rukmani Krishnan
was the production editor. Victoria Pearson was the designer. They
did a great job, and we are very grateful to all of them.
Our colleagues in the computer science and mathematics
departments at USF have been extremely helpful during our work
on the book. Peter would like to single out Prof. Gregory Benson for
particular thanks: his understanding of parallel computing—
especially Pthreads and semaphores—has been an invaluable
resource. We're both very grateful to our system administrators,
Alexey Fedosov and Elias Husary. They've patiently and efficiently
dealt with all of the “emergencies” that cropped up while we were
working on programs for the book. They've also done an amazing
job of providing us with the hardware we used to do all program
development and testing.
Peter would never have been able to finish the book without the
encouragement and moral support of his friends Holly Cohn, John
Dean, and Maria Grant. He will always be very grateful for their
help and their friendship. He is especially grateful to Holly for
allowing us to use her work, seven notations, for the cover.
Matthew would like to thank his colleagues in the USF
Department of Computer Science, as well as Maya Malensek and
Doyel Sadhu, for their love and support. Most of all, he would like to
thank Peter Pacheco for being a mentor and infallible source of
advice and wisdom during the formative years of his career in
academia.
Our biggest debt is to our students. As always, they showed us
what was too easy and what was far too difficult. They taught us
how to teach parallel computing. Our deepest thanks to all of them.
1 Interestingly, a number of students have said that they found the
use of C pointers more difficult than MPI programming.
Chapter 1: Why parallel
computing
From 1986 to 2003, the performance of microprocessors increased, on
average, more than 50% per year [28]. This unprecedented increase
meant that users and software developers could often simply wait
for the next generation of microprocessors to obtain increased
performance from their applications. Since 2003, however, single-
processor performance improvement has slowed to the point that in
the period from 2015 to 2017, it increased at less than 4% per year
[28]. This difference is dramatic: at 50% per year, performance will
increase by almost a factor of 60 in 10 years, while at 4%, it will
increase by about a factor of 1.5.
Furthermore, this difference in performance increase has been
associated with a dramatic change in processor design. By 2005,
most of the major manufacturers of microprocessors had decided
that the road to rapidly increasing performance lay in the direction
of parallelism. Rather than trying to continue to develop ever-faster
monolithic processors, manufacturers started putting multiple
complete processors on a single integrated circuit.
This change has a very important consequence for software
developers: simply adding more processors will not magically
improve the performance of the vast majority of serial programs,
that is, programs that were written to run on a single processor. Such
programs are unaware of the existence of multiple processors, and
the performance of such a program on a system with multiple
processors will be effectively the same as its performance on a single
processor of the multiprocessor system.
All of this raises a number of questions:
• Why do we care? Aren't single-processor systems fast
enough?
• Why can't microprocessor manufacturers continue to
develop much faster single-processor systems? Why build
parallel systems? Why build systems with multiple
processors?
• Why can't we write programs that will automatically convert
serial programs into parallel programs, that is, programs
that take advantage of the presence of multiple processors?

Let's take a brief look at each of these questions. Keep in mind,


though, that some of the answers aren't carved in stone. For
example, the performance of many applications may already be
more than adequate.

1.1 Why we need ever-increasing performance


The vast increases in computational power that we've been enjoying
for decades now have been at the heart of many of the most dramatic
advances in fields as diverse as science, the Internet, and
entertainment. For example, decoding the human genome, ever
more accurate medical imaging, astonishingly fast and accurate Web
searches, and ever more realistic and responsive computer games
would all have been impossible without these increases. Indeed,
more recent increases in computational power would have been
difficult, if not impossible, without earlier increases. But we can
never rest on our laurels. As our computational power increases, the
number of problems that we can seriously consider solving also
increases. Here are a few examples:

• Climate modeling. To better understand climate change, we


need far more accurate computer models, models that
include interactions between the atmosphere, the oceans,
solid land, and the ice caps at the poles. We also need to be
able to make detailed studies of how various interventions
might affect the global climate.
• Protein folding. It's believed that misfolded proteins may be
involved in diseases such as Huntington's, Parkinson's, and
Alzheimer's, but our ability to study configurations of
complex molecules such as proteins is severely limited by
our current computational power.
• Drug discovery. There are many ways in which increased
computational power can be used in research into new
medical treatments. For example, there are many drugs that
are effective in treating a relatively small fraction of those
suffering from some disease. It's possible that we can devise
alternative treatments by careful analysis of the genomes of
the individuals for whom the known treatment is ineffective.
This, however, will involve extensive computational analysis
of genomes.
• Energy research. Increased computational power will make it
possible to program much more detailed models of
technologies, such as wind turbines, solar cells, and batteries.
These programs may provide the information needed to
construct far more efficient clean energy sources.
• Data analysis. We generate tremendous amounts of data. By
some estimates, the quantity of data stored worldwide
doubles every two years [31], but the vast majority of it is
largely useless unless it's analyzed. As an example, knowing
the sequence of nucleotides in human DNA is, by itself, of
little use. Understanding how this sequence affects
development and how it can cause disease requires extensive
analysis. In addition to genomics, huge quantities of data are
generated by particle colliders, such as the Large Hadron
Collider at CERN, medical imaging, astronomical research,
and Web search engines—to name a few.
These and a host of other problems won't be solved without
tremendous increases in computational power.

1.2 Why we're building parallel systems


Much of the tremendous increase in single-processor performance
was driven by the ever-increasing density of transistors—the
electronic switches—on integrated circuits. As the size of transistors
decreases, their speed can be increased, and the overall speed of the
integrated circuit can be increased. However, as the speed of
transistors increases, their power consumption also increases. Most
of this power is dissipated as heat, and when an integrated circuit
gets too hot, it becomes unreliable. In the first decade of the twenty-
first century, air-cooled integrated circuits reached the limits of their
ability to dissipate heat [28].
Therefore it is becoming impossible to continue to increase the
speed of integrated circuits. Indeed, in the last few years, the
increase in transistor density has slowed dramatically [36].
But given the potential of computing to improve our existence,
there is a moral imperative to continue to increase computational
power.
How then, can we continue to build ever more powerful
computers? The answer is parallelism. Rather than building ever-
faster, more complex, monolithic processors, the industry has
decided to put multiple, relatively simple, complete processors on a
single chip. Such integrated circuits are called multicore processors,
and core has become synonymous with central processing unit, or
CPU. In this setting a conventional processor with one CPU is often
called a single-core system.

1.3 Why we need to write parallel programs


Most programs that have been written for conventional, single-core
systems cannot exploit the presence of multiple cores. We can run
multiple instances of a program on a multicore system, but this is
often of little help. For example, being able to run multiple instances
of our favorite game isn't really what we want—we want the
program to run faster with more realistic graphics. To do this, we
need to either rewrite our serial programs so that they're parallel, so
that they can make use of multiple cores, or write translation
programs, that is, programs that will automatically convert serial
programs into parallel programs. The bad news is that researchers
have had very limited success writing programs that convert serial
programs in languages such as C, C++, and Java into parallel
programs.
This isn't terribly surprising. While we can write programs that
recognize common constructs in serial programs, and automatically
translate these constructs into efficient parallel constructs, the
sequence of parallel constructs may be terribly inefficient. For
example, we can view the multiplication of two matrices as a
sequence of dot products, but parallelizing a matrix multiplication as
a sequence of parallel dot products is likely to be fairly slow on
many systems.
An efficient parallel implementation of a serial program may not
be obtained by finding efficient parallelizations of each of its steps.
Rather, the best parallelization may be obtained by devising an
entirely new algorithm.
As an example, suppose that we need to compute n values and
add them together. We know that this can be done with the
following serial code:
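A minimal sketch of such a loop, assuming a helper function Compute_next_value that produces each value (the function's argument and return type here are illustrative), is:

    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double x = Compute_next_value(i);   /* compute or fetch the i-th value */
        sum += x;                           /* add it into the running total   */
    }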

Now suppose we also have p cores and p is much smaller than n. Then each core can
form a partial sum of approximately n/p values:
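For instance, if each core is responsible for the block of indices from my_first_i up to (but not including) my_last_i (illustrative names for the block boundaries), the per-core loop might be:

    double my_sum = 0.0;
    for (int my_i = my_first_i; my_i < my_last_i; my_i++) {
        double my_x = Compute_next_value(my_i);  /* this core's next value   */
        my_sum += my_x;                          /* core-private running sum */
    }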
Here the prefix my_ indicates that each core is using its own, private
variables, and each core can execute this block of code
independently of the other cores.
After each core completes execution of this code, its variable my_sum
will store the sum of the values computed by its calls to
Compute_next_value. For example, if there are eight cores, n = 24, and the 24 calls to
Compute_next_value return the values

1, 4, 3, 9, 2, 8, 5, 1, 1, 6, 2, 7, 2, 5, 0, 4, 1, 8, 6, 5, 1, 2, 3, 9,

then the values stored in my_sum might be

    Core      0    1    2    3    4    5    6    7
    my_sum    8   19    7   15    7   13   12   14

Here we're assuming the cores are identified by nonnegative
integers in the range 0, 1, …, p − 1, where p is the number of cores.
When the cores are done computing their values of my_sum, they can
form a global sum by sending their results to a designated “master”
core, which can add their results:
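A sketch of this step follows; Receive_from and Send_to are placeholder operations standing in for whatever communication mechanism the parallel system actually provides:

    if (my_rank == 0) {                          /* the master core               */
        double sum = my_sum;                     /* start with its own partial    */
        for (int core = 1; core < p; core++) {
            double remote_sum = Receive_from(core);  /* placeholder receive       */
            sum += remote_sum;
        }
    } else {
        Send_to(0, my_sum);                      /* every other core sends its sum */
    }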

In our example, if the master core is core 0, it would add the values
8 + 19 + 7 + 15 + 7 + 13 + 12 + 14 = 95.
But you can probably see a better way to do this—especially if the
number of cores is large. Instead of making the master core do all the
work of computing the final sum, we can pair the cores so that while
core 0 adds in the result of core 1, core 2 can add in the result of core
3, core 4 can add in the result of core 5, and so on. Then we can
repeat the process with only the even-ranked cores: 0 adds in the
result of 2, 4 adds in the result of 6, and so on. Now cores divisible
by 4 repeat the process, and so on. See Fig. 1.1. The circles contain
the current value of each core's sum, and the lines with arrows
indicate that one core is sending its sum to another core. The plus
signs indicate that a core is receiving a sum from another core and
adding the received sum into its own sum.

FIGURE 1.1 Multiple cores forming a global sum.

For both “global” sums, the master core (core 0) does more work
than any other core, and the length of time it takes the program to
complete the final sum should be the length of time it takes for the
master to complete. However, with eight cores, the master will carry
out seven receives and adds using the first method, while with the
second method, it will only carry out three. So the second method
results in an improvement of more than a factor of two. The
difference becomes much more dramatic with large numbers of
cores. With 1000 cores, the first method will require 999 receives and
adds, while the second will only require 10—an improvement of
almost a factor of 100!
The first global sum is a fairly obvious generalization of the serial
global sum: divide the work of adding among the cores, and after
each core has computed its part of the sum, the master core simply
repeats the basic serial addition—if there are p cores, then it needs to
add p values. The second global sum, on the other hand, bears little
relation to the original serial addition.
The point here is that it's unlikely that a translation program
would “discover” the second global sum. Rather, there would more
likely be a predefined efficient global sum that the translation
program would have access to. It could “recognize” the original
serial loop and replace it with a precoded, efficient, parallel global
sum.
We might expect that software could be written so that a large
number of common serial constructs could be recognized and
efficiently parallelized, that is, modified so that they can use
multiple cores. However, as we apply this principle to ever more
complex serial programs, it becomes more and more difficult to
recognize the construct, and it becomes less and less likely that we'll
have a precoded, efficient parallelization.
Thus we cannot simply continue to write serial programs; we
must write parallel programs, programs that exploit the power of
multiple processors.

1.4 How do we write parallel programs?


There are a number of possible answers to this question, but most of
them depend on the basic idea of partitioning the work to be done
among the cores. There are two widely used approaches: task-
parallelism and data-parallelism. In task-parallelism, we partition
the various tasks carried out in solving the problem among the cores.
In data-parallelism, we partition the data used in solving the
problem among the cores, and each core carries out more or less
similar operations on its part of the data.
As an example, suppose that Prof P has to teach a section of
“Survey of English Literature.” Also suppose that Prof P has one
hundred students in her section, so she's been assigned four teaching
assistants (TAs): Mr. A, Ms. B, Mr. C, and Ms. D. At last the semester
is over, and Prof P makes up a final exam that consists of five
questions. To grade the exam, she and her TAs might consider the
following two options: each of them can grade all one hundred
responses to one of the questions; say, P grades question 1, A grades
question 2, and so on. Alternatively, they can divide the one
hundred exams into five piles of twenty exams each, and each of
them can grade all the papers in one of the piles; P grades the papers
in the first pile, A grades the papers in the second pile, and so on.
In both approaches the “cores” are the professor and her TAs. The
first approach might be considered an example of task-parallelism.
There are five tasks to be carried out: grading the first question,
grading the second question, and so on. Presumably, the graders
will be looking for different information in question 1, which is
about Shakespeare, from the information in question 2, which is
about Milton, and so on. So the professor and her TAs will be
“executing different instructions.”
On the other hand, the second approach might be considered an
example of data-parallelism. The “data” are the students' papers,
which are divided among the cores, and each core applies more or
less the same grading instructions to each paper.
The first part of the global sum example in Section 1.3 would
probably be considered an example of data-parallelism. The data are
the values computed by Compute_next_value, and each core carries out
roughly the same operations on its assigned elements: it computes
the required values by calling Compute_next_value and adds them together.
The second part of the first global sum example might be considered
an example of task-parallelism. There are two tasks: receiving and
adding the cores' partial sums, which is carried out by the master
core; and giving the partial sum to the master core, which is carried
out by the other cores.
When the cores can work independently, writing a parallel
program is much the same as writing a serial program. Things get a
great deal more complex when the cores need to coordinate their
work. In the second global sum example, although the tree structure
in the diagram is very easy to understand, writing the actual code is
relatively complex. See Exercises 1.3 and 1.4. Unfortunately, it's
much more common for the cores to need coordination.
In both global sum examples, the coordination involves
communication: one or more cores send their current partial sums to
another core. The global sum examples should also involve
coordination through load balancing. In the first part of the global
sum, it's clear that we want the amount of time taken by each core to
be roughly the same as the time taken by the other cores. If the cores
are identical, and each call to Compute_next_value requires the same amount
of work, then we want each core to be assigned roughly the same
number of values as the other cores. If, for example, one core has to
compute most of the values, then the other cores will finish much
sooner than the heavily loaded core, and their computational power
will be wasted.
A third type of coordination is synchronization. As an example,
suppose that instead of computing the values to be added, the values
are read from standard input. Say, x is an array that is read in by the master core:
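For instance, the master core might read the values with a loop like the following sketch (my_rank, the use of scanf, and the element type double are illustrative assumptions):

    if (my_rank == 0) {                 /* only the master core reads x          */
        for (int i = 0; i < n; i++)
            scanf("%lf", &x[i]);        /* requires <stdio.h>; x is an array of  */
    }                                   /* n doubles                             */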

In most systems the cores are not automatically synchronized.


Rather, each core works at its own pace. In this case, the problem is
that we don't want the other cores to race ahead and start computing
their partial sums before the master is done initializing x and making
it available to the other cores. That is, the cores need to wait before
starting execution of the partial-sum code shown earlier.

We need to add in a point of synchronization between the
initialization of x and the computation of the partial sums:
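In the sketch below, Barrier() is a placeholder for whatever synchronization primitive the API provides (for example, MPI_Barrier or pthread_barrier_wait); no core returns from the call until every core has reached it:

    if (my_rank == 0) {                 /* master initializes x                  */
        for (int i = 0; i < n; i++)
            scanf("%lf", &x[i]);
    }
    Barrier();                          /* all cores wait here                   */
    double my_sum = 0.0;                /* x is now fully initialized, so the    */
    for (int my_i = my_first_i; my_i < my_last_i; my_i++)
        my_sum += x[my_i];              /* partial sums are safe to compute      */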
The idea here is that each core will wait in the barrier function
until all the cores have entered the function—in particular, until the
master core has entered this function.
Currently, the most powerful parallel programs are written using
explicit parallel constructs, that is, they are written using extensions
to languages such as C, C++, and Java. These programs include
explicit instructions for parallelism: core 0 executes task 0, core 1
executes task 1, …, all cores synchronize, …, and so on, so such
programs are often extremely complex. Furthermore, the complexity
of modern cores often makes it necessary to use considerable care in
writing the code that will be executed by a single core.
There are other options for writing parallel programs—for
example, higher level languages—but they tend to sacrifice
performance to make program development somewhat easier.

1.5 What we'll be doing


We'll be focusing on learning to write programs that are explicitly
parallel. Our purpose is to learn the basics of programming parallel
computers using the C language and four different APIs or
application program interfaces: the Message-Passing Interface or
MPI, POSIX threads or Pthreads, OpenMP, and CUDA. MPI and
Pthreads are libraries of type definitions, functions, and macros that
can be used in C programs. OpenMP consists of a library and some
modifications to the C compiler. CUDA consists of a library and
modifications to the C++ compiler.
You may well wonder why we're learning about four different
APIs instead of just one. The answer has to do with both the
extensions and parallel systems. Currently, there are two main ways
of classifying parallel systems: one is to consider the memory that
the different cores have access to, and the other is to consider
whether the cores can operate independently of each other.
In the memory classification, we'll be focusing on shared-memory
systems and distributed-memory systems. In a shared-memory
system, the cores can share access to the computer's memory; in
principle, each core can read and write each memory location. In a
shared-memory system, we can coordinate the cores by having them
examine and update shared-memory locations. In a distributed-
memory system, on the other hand, each core has its own, private
memory, and the cores can communicate explicitly by doing
something like sending messages across a network. Fig. 1.2 shows
schematics of the two types of systems.

FIGURE 1.2 (a) A shared memory system and (b) a


distributed memory system.

The second classification divides parallel systems according to the


number of independent instruction streams and the number of
independent data streams. In one type of system, the cores can be
thought of as conventional processors, so they have their own
control units, and they are capable of operating independently of
each other. Each core can manage its own instruction stream and its
own data stream, so this type of system is called a Multiple-
Instruction Multiple-Data or MIMD system.
An alternative is to have a parallel system with cores that are not
capable of managing their own instruction streams: they can be
thought of as cores with no control unit. Rather, the cores share a
single control unit. However, each core can access either its own
private memory or memory that's shared among the cores. In this
type of system, all the cores carry out the same instruction on their
own data, so this type of system is called a Single-Instruction
Multiple-Data or SIMD system.
In a MIMD system, it's perfectly feasible for one core to execute an
addition while another core executes a multiply. In a SIMD system,
two cores either execute the same instruction (on their own data) or,
if they need to execute different instructions, one executes its
instruction while the other is idle, and then the second executes its
instruction while the first is idle. In a SIMD system, we couldn't have
one core executing an addition while another core executes a
multiplication. The system would have to do something like this:
    Time    First core    Second core
    1       Addition      Idle
    2       Idle          Multiply

Since you're used to programming a processor with its own


control unit, MIMD systems may seem more natural to you.
However, as we'll see, there are many problems that are very easy to
solve using a SIMD system. As a very simple example, suppose we
have three arrays, each with n elements, and we want to add
corresponding entries of the first two arrays to get the values in the
third array. The serial pseudocode might look like this:
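For example, with arrays x, y, and z, each of length n:

    for (int i = 0; i < n; i++)
        z[i] = x[i] + y[i];    /* elementwise sum of the two input arrays */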

Now suppose we have n SIMD cores, and each core is assigned one
element from each of the three arrays: core i is assigned elements
x[i], y[i], and z[i]. Then our program can simply tell each core to add its
x- and y-values to get the z value:
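Conceptually, core i executes just one statement on its own elements:

    /* Core i executes this single statement, for i = 0, 1, ..., n-1 */
    z[i] = x[i] + y[i];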

This type of system is fundamental to modern Graphics Processing


Units or GPUs, and since GPUs are extremely powerful parallel
processors, it's important that we learn how to program them.
Our different APIs are used for programming different types of
systems:

• MPI is an API for programming distributed memory MIMD


systems.
• Pthreads is an API for programming shared memory MIMD
systems.
• OpenMP is an API for programming both shared memory
MIMD and shared memory SIMD systems, although we'll be
focusing on programming MIMD systems.
• CUDA is an API for programming Nvidia GPUs, which
have aspects of all four of our classifications: shared memory
and distributed memory, SIMD, and MIMD. We will,
however, be focusing on the shared memory SIMD and
MIMD aspects of the API.

1.6 Concurrent, parallel, distributed


If you look at some other books on parallel computing or you search
the Web for information on parallel computing, you're likely to also
run across the terms concurrent computing and distributed
computing. Although there isn't complete agreement on the
distinction between the terms parallel, distributed, and concurrent,
many authors make the following distinctions:
• In concurrent computing, a program is one in which
multiple tasks can be in progress at any instant [5].
• In parallel computing, a program is one in which multiple
tasks cooperate closely to solve a problem.
• In distributed computing, a program may need to cooperate
with other programs to solve a problem.

So parallel and distributed programs are concurrent, but a


program such as a multitasking operating system is also concurrent,
even when it is run on a machine with only one core, since multiple
tasks can be in progress at any instant. There isn't a clear-cut
distinction between parallel and distributed programs, but a parallel
program usually runs multiple tasks simultaneously on cores that
are physically close to each other and that either share the same
memory or are connected by a very high-speed network. On the
other hand, distributed programs tend to be more “loosely coupled.”
The tasks may be executed by multiple computers that are separated
by relatively large distances, and the tasks themselves are often
executed by programs that were created independently. As
examples, our two concurrent addition programs would be
considered parallel by most authors, while a Web search program
would be considered distributed.
But beware, there isn't general agreement on these terms. For
example, many authors consider shared-memory programs to be
“parallel” and distributed-memory programs to be “distributed.” As
our title suggests, we'll be interested in parallel programs—
programs in which closely coupled tasks cooperate to solve a
problem.

1.7 The rest of the book


How can we use this book to help us write parallel programs?
First, when you're interested in high performance, whether you're
writing serial or parallel programs, you need to know a little bit
about the systems you're working with—both hardware and
software. So in Chapter 2, we'll give an overview of parallel
hardware and software. In order to understand this discussion, it
will be necessary to review some information on serial hardware and
software. Much of the material in Chapter 2 won't be needed when
we're getting started, so you might want to skim some of this
material and refer back to it occasionally when you're reading later
chapters.
The heart of the book is contained in Chapters 3–7. Chapters 3, 4,
5, and 6 provide a very elementary introduction to programming
parallel systems using C and MPI, Pthreads, OpenMP, and CUDA,
respectively. The only prerequisite for reading these chapters is a
knowledge of C programming. We've tried to make these chapters
independent of each other, and you should be able to read them in
any order. However, to make them independent, we did find it
necessary to repeat some material. So if you've read one of the three
chapters, and you go on to read another, be prepared to skim over
some of the material in the new chapter.
Chapter 7 puts together all we've learned in the preceding
chapters. It develops two fairly large programs using each of the
four APIs. However, it should be possible to read much of this even
if you've only read one of Chapters 3, 4, 5, or 6. The last chapter,
Chapter 8, provides a few suggestions for further study on parallel
programming.

1.8 A word of warning


Before proceeding, a word of warning. It may be tempting to write
parallel programs “by the seat of your pants,” without taking the
trouble to carefully design and incrementally develop your program.
This will almost certainly be a mistake. Every parallel program
contains at least one serial program. Since we almost always need to
coordinate the actions of multiple cores, writing parallel programs is
almost always more complex than writing a serial program that
solves the same problem. In fact, it is often far more complex. All the
rules about careful design and development are usually far more
important for the writing of parallel programs than they are for
serial programs.
1.9 Typographical conventions
We'll make use of the following typefaces in the text:

• Program text, displayed or within running text, will use the


following typefaces:

• Definitions are given in the body of the text, and the term
being defined is printed in boldface type: A parallel
program can make use of multiple cores.
• When we need to refer to the environment in which a
program is being developed, we'll assume that we're using a
UNIX shell, such as bash, and we'll use a $ to indicate the shell
prompt:
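For example, compiling and running a program might look like this (the file and program names are purely illustrative):

    $ gcc -g -Wall -o hello hello.c
    $ ./hello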

• We'll specify the syntax of function calls with fixed
argument lists by including a sample argument list. For
example, the integer absolute value function, abs, in stdlib.h,
might have its syntax specified with
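a prototype of the following form (this is the standard C declaration of abs; the parameter name is illustrative):

    int abs ( int x ) ;    /* returns the absolute value of x */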
For more complicated syntax, we'll enclose required content in
angle brackets <> and optional content in square brackets [].
For example, the C if statement might have its syntax
specified as follows:
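A sketch of this specification, with required items in angle brackets and the optional else clause in square brackets:

    if ( <expression> ) <statement1>
    [ else <statement2> ]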

This says that the if statement must include an expression
enclosed in parentheses, and the right parenthesis must be
followed by a statement. This statement can be followed by
an optional else clause. If the else clause is present, it must
include a second statement.

1.10 Summary
For many years we've reaped the benefits of having ever-faster
processors. However, because of physical limitations, the rate of
performance improvement in conventional processors has decreased
dramatically. To increase the power of processors, chipmakers have
turned to multicore integrated circuits, that is, integrated circuits
with multiple conventional processors on a single chip.
Ordinary serial programs, which are programs written for a
conventional single-core processor, usually cannot exploit the
presence of multiple cores, and it's unlikely that translation
programs will be able to shoulder all the work of converting serial
programs into parallel programs—programs that can make use of
multiple cores. As software developers, we need to learn to write
parallel programs.
When we write parallel programs, we usually need to coordinate
the work of the cores. This can involve communication among the
cores, load balancing, and synchronization of the cores.
In this book we'll be learning to program parallel systems, so that
we can maximize their performance. We'll be using the C language
with four different application program interfaces or APIs: MPI,
Pthreads, OpenMP, and CUDA. These APIs are used to program
parallel systems that are classified according to how the cores access
memory and whether the individual cores can operate
independently of each other.
In the first classification, we distinguish between shared-memory
and distributed-memory systems. In a shared-memory system, the
cores share access to one large pool of memory, and they can
coordinate their actions by accessing shared memory locations. In a
distributed-memory system, each core has its own private memory,
and the cores can coordinate their actions by sending messages
across a network.
In the second classification, we distinguish between systems with
cores that can operate independently of each other and systems in
which the cores all execute the same instruction. In both types of
system, the cores can operate on their own data stream. So the first
type of system is called a multiple-instruction multiple-data or
MIMD system, and the second type of system is called a single-
instruction multiple-data or SIMD system.
MPI is used for programming distributed-memory MIMD
systems. Pthreads is used for programming shared-memory MIMD
systems. OpenMP can be used to program both shared-memory
MIMD and shared-memory SIMD systems, although we'll be
looking at using it to program MIMD systems. CUDA is used for
programming Nvidia graphics processing units or GPUs. GPUs
have aspects of all four types of system, but we'll be mainly
interested in the shared-memory SIMD and shared-memory MIMD
aspects.
Concurrent programs can have multiple tasks in progress at any
instant. Parallel and distributed programs usually have tasks that
execute simultaneously. There isn't a hard and fast distinction
between parallel and distributed, although in parallel programs, the
tasks are usually more tightly coupled.
Parallel programs are usually very complex. So it's even more
important to use good program development techniques with
parallel programs.

1.11 Exercises

1.1 Devise formulas for the functions that calculate my_first_i and
my_last_i in the global sum example. Remember that each core
should be assigned roughly the same number of computations
in the loop. Hint: First consider the case when n
is evenly divisible by p.
1.2 We've implicitly assumed that each call to Compute_next_value
requires roughly the same amount of work as the other calls.
How would you change your answer to the preceding
question if call i = k requires k + 1 times as much work as the
call with i = 0? How would you change your answer if the
first call (i = 0) requires 2 milliseconds, the second call (i = 1)
requires 4, the third (i = 2) requires 6, and so on?
1.3 Try to write pseudocode for the tree-structured global sum
illustrated in Fig. 1.1. Assume the number of cores is a power
of two (1, 2, 4, 8, …). Hint: Use a variable divisor to determine
whether a core should send its sum or receive and add. The
divisor should start with the value 2 and be doubled after each
iteration. Also use a variable core_difference to determine which
core should be partnered with the current core. It should
start with the value 1 and also be doubled after each
iteration. For example, in the first iteration 0 % divisor = 0 and
1 % divisor = 1, so 0 receives and adds, while 1 sends. Also in the
first iteration 0 + core_difference = 1 and 1 − core_difference = 0, so 0 and
1 are paired in the first iteration.
1.4 As an alternative to the approach outlined in the preceding
problem, we can use C's bitwise operators to implement the
tree-structured global sum. To see how this works, it helps to
write down the binary (base 2) representation of each of the
core ranks and note the pairings during each stage:
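For example, with eight cores the stages pair the following ranks (binary representations in parentheses):

    Stage 1:  0 (000) & 1 (001),  2 (010) & 3 (011),  4 (100) & 5 (101),  6 (110) & 7 (111)
    Stage 2:  0 (000) & 2 (010),  4 (100) & 6 (110)
    Stage 3:  0 (000) & 4 (100)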

From the table, we see that during the first stage each core is
paired with the core whose rank differs in the rightmost or
first bit. During the second stage, cores that continue are
paired with the core whose rank differs in the second bit; and
during the third stage, cores are paired with the core whose
rank differs in the third bit. Thus if we have a binary mask
that is 001₂ for the first stage, 010₂ for the second, and
100₂ for the third, we can get the rank of the core we're
paired with by “inverting” the bit in our rank that is nonzero
in the mask. This can be done using the bitwise exclusive or (^)
operator.
Implement this algorithm in pseudocode using the bitwise
exclusive or and the left-shift operator.
1.5 What happens if your pseudocode in Exercise 1.3 or Exercise
1.4 is run when the number of cores is not a power of two
(e.g., 3, 5, 6, 7)? Can you modify the pseudocode so that it
will work correctly regardless of the number of cores?
1.6 Derive formulas for the number of receives and additions
that core 0 carries out using
a. the original pseudocode for a global sum, and
b. the tree-structured global sum.
Make a table showing the numbers of receives and additions
carried out by core 0 when the two sums are used with
2, 4, 8, …, 1024 cores.
1.7 The first part of the global sum example—when each core
adds its assigned computed values—is usually considered to
be an example of data-parallelism, while the second part of
the first global sum—when the cores send their partial sums
to the master core, which adds them—could be considered to
be an example of task-parallelism. What about the second
part of the second global sum—when the cores use a tree
structure to add their partial sums? Is this an example of
data- or task-parallelism? Why?
1.8 Suppose the faculty members are throwing a party for the
students in the department.
a. Identify tasks that can be assigned to the faculty
members that will allow them to use task-parallelism
when they prepare for the party. Work out a schedule
that shows when the various tasks can be performed.
b. We might hope that one of the tasks in the preceding
part is cleaning the house where the party will be held.
How can we use data-parallelism to partition the work
of cleaning the house among the faculty?
c. Use a combination of task- and data-parallelism to
prepare for the party. (If there's too much work for the
faculty, you can use TAs to pick up the slack.)
1.9 Write an essay describing a research problem in your major
that would benefit from the use of parallel computing.
Provide a rough outline of how parallelism would be used.
Would you use task- or data-parallelism?

Bibliography
[5] Clay Breshears, The Art of Concurrency: A Thread Monkey's
Guide to Writing Parallel Applications. Sebastopol, CA:
O'Reilly; 2009.
[28] John Hennessy, David Patterson, Computer Architecture: A
Quantitative Approach. 6th ed. Burlington, MA: Morgan
Kaufmann; 2019.
[31] IBM, IBM InfoSphere Streams v1.2.0 supports highly
complex heterogeneous data analysis, IBM United States
Software Announcement 210-037, Feb. 23, 2010
http://www.ibm.com/common/ssi/rep_ca/7/897/ENUS210-
037/ENUS210-037.PDF.
[36] John Loeffler, No more transistors: the end of Moore's Law,
Interesting Engineering, Nov 29, 2018. See
https://interestingengineering.com/no-more-transistors-the-
end-of-moores-law.
Chapter 2: Parallel hardware and
parallel software
It's perfectly feasible for specialists in disciplines other than
computer science and computer engineering to write parallel
programs. However, to write efficient parallel programs, we often
need some knowledge of the underlying hardware and system
software. It's also very useful to have some knowledge of different
types of parallel software, so in this chapter we'll take a brief look at
a few topics in hardware and software. We'll also take a brief look at
evaluating program performance and a method for developing
parallel programs. We'll close with a discussion of what kind of
environment we might expect to be working in, and a few rules and
assumptions we'll make in the rest of the book.
This is a long, broad chapter, so it may be a good idea to skim
through some of the sections on a first reading so that you have a
good idea of what's in the chapter. Then, when a concept or term in a
later chapter isn't quite clear, it may be helpful to refer back to this
chapter. In particular, you may want to skim over most of the
material in “Modifications to the von Neumann Model,” except “The
Basics of Caching.” Also, in the “Parallel Hardware” section, you can
safely skim the material on “Interconnection Networks.” You can
also skim the material on “SIMD Systems” unless you're planning to
read the chapter on CUDA programming.

2.1 Some background


Parallel hardware and software have grown out of conventional
serial hardware and software: hardware and software that run
(more or less) a single job at a time. So to better understand the
current state of parallel systems, let's begin with a brief look at a few
aspects of serial systems.

2.1.1 The von Neumann architecture


The “classical” von Neumann architecture consists of main
memory, a central processing unit (CPU) or processor or core, and
an interconnection between the memory and the CPU. Main
memory consists of a collection of locations, each of which is capable
of storing both instructions and data. Every location has an address
and contents: the address is used to access the location, and the
contents are the instructions or data stored in the location.
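Although the text introduces these ideas at the hardware level, a
minimal C sketch (the variable names are ours, not the book's) can
make the distinction between an address and the contents stored at
that address concrete:

#include <stdio.h>

int main(void) {
   int  x = 42;       /* the contents stored at some memory location */
   int* p = &x;       /* p holds the address of that location        */

   printf("address:  %p\n", (void*) p);   /* where the location is */
   printf("contents: %d\n", *p);          /* what is stored there  */
   return 0;
}

Here the pointer plays the role of an address, and dereferencing it
retrieves the location's contents.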
The central processing unit is logically divided into a control unit
and a datapath. The control unit is responsible for deciding which
instructions in a program should be executed, and the datapath is
responsible for executing the actual instructions. Data in the CPU
and information about the state of an executing program are stored
in special, very fast storage, called registers. The control unit has a
special register called the program counter. It stores the address of
the next instruction to be executed.
Instructions and data are transferred between the CPU and
memory via the interconnect. This has traditionally been a bus,
which consists of a collection of parallel wires and some hardware
controlling access to the wires. More recent systems use more
complex interconnects. (See Section 2.3.4.) A von Neumann machine
executes a single instruction at a time, and each instruction operates
on only a few pieces of data. See Fig. 2.1.