PDF A Primer On Memory Consistency and Cache Coherence Second Edition Synthesis Lectures On Computer Architecture Vijay Nagarajan Ebook Full Chapter
Series ISSN: 1935-3235
Synthesis Lectures on
Computer Architecture
About SYNTHESIS
This volume is a printed version of a work that appears in the Synthesis
Digital Library of Engineering and Computer Science.
store.morganclaypool.com
Natalie Enright Jerger & Margaret Martonosi, Series Editors
A Primer on
Memory Consistency
and Cache Coherence
Second Edition
Synthesis Lectures on
Computer Architecture
Editors
Natalie Enright Jerger, University of Toronto
Margaret Martonosi, Princeton University
Founding Editor Emeritus
Mark D. Hill, University of Wisconsin, Madison
Synthesis Lectures on Computer Architecture publishes 50- to 100-page publications on topics
pertaining to the science and art of designing, analyzing, selecting and interconnecting hardware
components to create computers that meet functional, performance and cost goals. The scope will
largely follow the purview of premier computer architecture conferences, such as ISCA, HPCA,
MICRO, and ASPLOS.
Analyzing Analytics
Rajesh Bordawekar, Bob Blainey, and Ruchir Puri
2015
Customizable Computing
Yu-Ting Chen, Jason Cong, Michael Gill, Glenn Reinman, and Bingjun Xiao
2015
Die-stacking Architecture
Yuan Xie and Jishen Zhao
2015
Shared-Memory Synchronization
Michael L. Scott
2013
Performance Analysis and Tuning for General Purpose Graphics Processing Units
(GPGPU)
Hyesoon Kim, Richard Vuduc, Sara Baghsorkhi, Jee Choi, and Wen-mei Hwu
2012
On-Chip Networks
Natalie Enright Jerger and Li-Shiuan Peh
2009
The Memory System: You Can’t Avoid It, You Can’t Ignore It, You Can’t Fake It
Bruce Jacob
2009
Transactional Memory
James R. Larus and Ravi Rajwar
2006
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations
in printed reviews, without the prior permission of the publisher.
DOI 10.2200/S00962ED2V01Y201910CAC049
Lecture #49
Series Editors: Natalie Enright Jerger, University of Toronto
Margaret Martonosi, Princeton University
Founding Editor Emeritus: Mark D. Hill, University of Wisconsin, Madison
Series ISSN
Print 1935-3235 Electronic 1935-3243
A Primer on
Memory Consistency
and Cache Coherence
Second Edition
Vijay Nagarajan
University of Edinburgh
Daniel J. Sorin
Duke University
Mark D. Hill
University of Wisconsin, Madison
David A. Wood
University of Wisconsin, Madison
Morgan & Claypool Publishers
ABSTRACT
Many modern computer systems, including homogeneous and heterogeneous architectures,
support shared memory in hardware. In a shared memory system, each of the processor cores
may read and write to a single shared address space. For a shared memory machine, the memory
consistency model defines the architecturally visible behavior of its memory system. Consistency
definitions provide rules about loads and stores (or memory reads and writes) and how they act
upon memory. As part of supporting a memory consistency model, many machines also provide
cache coherence protocols that ensure that multiple cached copies of data are kept up-to-date.
The goal of this primer is to provide readers with a basic understanding of consistency and
coherence. This understanding includes both the issues that must be solved and a variety
of solutions. We present both high-level concepts and specific, concrete examples from
real-world systems.
This second edition reflects a decade of advancements since the first edition and includes,
among other more modest changes, two new chapters: one on consistency and coherence for
non-CPU accelerators (with a focus on GPUs) and one that points to formal work and tools on
consistency and coherence.
KEYWORDS
computer architecture, memory consistency, cache coherence, shared memory,
memory systems, multicore processor, heterogeneous architecture, GPU,
accelerators, semantics, verification
Contents
Preface to the Second Edition
2 Coherence Basics
2.1 Baseline System Model
2.2 The Problem: How Incoherence Could Possibly Occur
2.3 The Cache Coherence Interface
2.4 (Consistency-Agnostic) Coherence Invariants
2.4.1 Maintaining the Coherence Invariants
2.4.2 The Granularity of Coherence
2.4.3 When is Coherence Relevant?
2.5 References
6 Coherence Protocols
6.1 The Big Picture
6.2 Specifying Coherence Protocols
6.3 Example of a Simple Coherence Protocol
6.4 Overview of Coherence Protocol Design Space
6.4.1 States
6.4.2 Transactions
6.4.3 Major Protocol Design Options
6.5 References
CHAPTER 1
Introduction to Consistency
and Coherence
Many modern computer systems and most multicore chips (chip multiprocessors) support
shared memory in hardware. In a shared memory system, each of the processor cores may read
and write to a single shared address space. These designs seek various goodness properties, such
as high performance, low power, and low cost. Of course, it is not valuable to provide these
goodness properties without first providing correctness. Correct shared memory seems intuitive
at a hand-wave level, but, as this lecture will help show, there are subtle issues in even defining
what it means for a shared memory system to be correct, as well as many subtle corner cases in
designing a correct shared memory implementation. Moreover, these subtleties must be mastered
in hardware implementations where bug fixes are expensive. Even academics should master
these subtleties to make it more likely that their proposed designs will work.
Designing and evaluating a correct shared memory system requires an architect to understand
memory consistency and cache coherence, the two topics of this primer. Memory consistency
(consistency, memory consistency model, or memory model) is a precise, architecturally visible
definition of shared memory correctness. Consistency definitions provide rules about loads and
stores (or memory reads and writes) and how they act upon memory. Ideally, consistency
definitions would be simple and easy to understand. However, defining what it means for shared
memory to behave correctly is more subtle than defining the correct behavior of, for example,
a single-threaded processor core. The correctness criterion for a single processor core partitions
behavior between one correct result and many incorrect alternatives. This is because the
processor’s architecture mandates that the execution of a thread transforms a given input state into
a single well-defined output state, even on an out-of-order core. Shared memory consistency
models, however, concern the loads and stores of multiple threads and usually allow many
correct executions while disallowing many (more) incorrect ones. The possibility of multiple
correct executions is due to the ISA allowing multiple threads to execute concurrently, often with
many possible legal interleavings of instructions from different threads. The multitude of correct
executions complicates the erstwhile simple challenge of determining whether an execution is
correct. Nevertheless, consistency must be mastered to implement shared memory and, in some
cases, to write correct programs that use it.
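To make the idea of "many correct executions" concrete, the following is a minimal sketch rather than anything taken from the primer itself. It assumes the simplest possible model, in which an execution is any interleaving of the threads' operations that preserves each thread's program order and all operations act on one shared memory. The two-thread program, the variable names x and y, and the registers r1 and r2 are hypothetical choices for illustration.

```python
from itertools import combinations

# Hypothetical two-thread program (an assumption, not from the text).
# Each operation is ("store", variable, value) or ("load", variable, register).
THREAD_0 = [("store", "x", 1), ("load", "y", "r1")]
THREAD_1 = [("store", "y", 1), ("load", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        slot_set = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slot_set else next(ib) for i in range(n)]


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for kind, var, arg in schedule:
        if kind == "store":
            memory[var] = arg
        else:
            registers[arg] = memory[var]
    return registers["r1"], registers["r2"]


def legal_outcomes():
    """Collect the (r1, r2) results over all interleavings."""
    return {run(s) for s in interleavings(THREAD_0, THREAD_1)}


if __name__ == "__main__":
    print(sorted(legal_outcomes()))
```

Under this assumed interleaving model the six possible schedules produce three distinct results for (r1, r2), all of them correct, while (0, 0) never appears. A different consistency model could draw the line between correct and incorrect outcomes elsewhere, which is why the model must be defined precisely.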
The microarchitecture—the hardware design of the processor cores and the shared mem-
ory system—must enforce the desired consistency model. As part of this consistency model
support, the hardware provides cache coherence (or coherence). In a shared-memory system
with caches, the cached values can potentially become out-of-date (or incoherent) when one of
the processors updates its cached value. Coherence seeks to make the caches of a shared-memory
system as functionally invisible as the caches in a single-core system; it does so by propagating a
processor’s write to other processors’ caches. It is worth stressing that, unlike consistency, which
is an architectural specification that defines shared memory correctness, coherence is a means of
supporting a consistency model.
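The stale-value problem described above can be sketched with a toy model. This is an illustration under stated assumptions, not a protocol from the primer: the class ToySystem, its write-through policy, and the choice to propagate a write by invalidating other cached copies are all hypothetical simplifications.

```python
class ToySystem:
    """Toy model: one shared memory plus one private cache per core."""

    def __init__(self, num_cores, coherent):
        self.memory = {}
        self.caches = [{} for _ in range(num_cores)]
        self.coherent = coherent

    def load(self, core, addr):
        cache = self.caches[core]
        if addr not in cache:
            # Miss: fetch the current value from memory (default 0).
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def store(self, core, addr, value):
        self.caches[core][addr] = value
        self.memory[addr] = value  # write-through, assumed for simplicity
        if self.coherent:
            # Propagate the write by invalidating every other cached copy,
            # so the next load on another core misses and refetches.
            for other, cache in enumerate(self.caches):
                if other != core:
                    cache.pop(addr, None)


def stale_read_demo(coherent):
    """Return what core 1 observes after core 0 writes address A."""
    system = ToySystem(num_cores=2, coherent=coherent)
    system.load(1, "A")       # core 1 caches the initial value 0
    system.store(0, "A", 42)  # core 0 updates A
    return system.load(1, "A")
```

With coherent=False, core 1 keeps reading its out-of-date copy of A; with coherent=True, the invalidation forces a refetch and core 1 observes the new value, which is the sense in which the caches become functionally invisible.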
Even though consistency is the first major topic of this primer, we begin in Chapter 2
with a brief introduction to coherence because coherence protocols play an important role in
providing consistency. The goal of Chapter 2 is to explain enough about coherence to understand
how consistency models interact with coherent caches, but not to explore specific coherence
protocols or implementations, which are topics we defer until the second portion of this primer
in Chapters 6–9.
Fig. 117.—Two fragments from the Balawat gates. British Museum. Drawn by
Saint-Elme Gautier.
We should have been willing, had it been possible, to make
further extracts from this curious series of reliefs; to have shown,
here naked prisoners defiling under the eyes of the conqueror, there
Assyrian archers shooting at the heaped-up heads of their slain
enemies. But we have perforce been content with giving, by a few
carefully chosen examples, a fair idea of the work that intervened
between the sculptors of Assurnazirpal and those of the Sargonids.
It is probable that the scheme of this vast composition was due to
a single mind; from one end to the other there is an obvious similarity
of thought and style. But several different hands must have been
employed upon its execution, which is far from being of equal merit
throughout. It is on examining the original that we are struck by these
inequalities. Thus, in some of the long rows of captives the handling
is timid and without meaning, while in others it has all the firmness
and decision of the best among the alabaster or limestone reliefs;
the muscular forms, the action of the calf and knee, are well
understood and frankly reproduced. The passages we have chosen
for illustration are among the best in this respect. Taking them all in
all, these bronze reliefs are among the works that do most honour to
Assyrian art.
The only monument that has come down to us from the reign of
Vulush III., the successor of Samas-vul, is a statue, or rather a pair
of statues, of Nebo; the better of the two is reproduced in Fig. 15 of
our first volume. These sacred images are of very slight merit from
an art point of view; we should hardly have referred to them but for
their votive inscriptions. From these we learn that they were
consecrated in the Temple of Nebo by the prefect of Calah in order
to bespeak the protection of that god for the king. But the latter is not
named alone; the faithful subject says that he offers these idols “for
his master Vulush and his mistress Sammouramit.”
In this latter name it is difficult not to recognize the Semiramis of
the Greeks, and we are led to ask ourselves whether the queen of
Vulush may not have afforded a prototype for that legendary
princess. This association of a female name with that of the king is
almost without parallel either in Chaldæa or Assyria. In royal
documents, as well as in those of a more private character, there is
no more mention of the royal wives than if they did not exist. Only
one explanation can be given of the apparent anomaly, and that is
that Sammouramit, for reasons that may be easily guessed, enjoyed
a quite exceptional position. It was in those days that, from one reign
to another, the princes of Calah attempted to complete the
subjugation of Chaldæa. It may have happened that in order to put
an end to a state of never-ending rebellion, Vulush married the
heiress of some powerful and popular family of the lower country,
and, that he might be looked upon as the legitimate ruler of Babylon,
joined her name with his in the royal style and title. This hypothesis
finds some confirmation in what Herodotus tells us about Semiramis.
She was, he says, queen of Babylon five generations before Nitocris,
which would be about a century and a half. He adds that she caused
the quays of the Euphrates to be built.[242] This takes us back to
rather beyond the middle of the eighth century B.C., that is very near
to the date which Assyrian chronology would fix for the reign of
Vulush (810–781). As the last representative of the old national
dynasty, this Semiramis, associated as she was in the exercise, or at
least in the show, of sovereign power both in Assyria and Chaldæa,
would not be forgotten by her countrymen, and the population of
Babylon would be especially likely to magnify the part she had
played. There is nothing fabulous in the tradition as Herodotus gives
it, although it may, perhaps, go beyond the truth here and there.
Ctesias, however, goes much farther. He brings together and
amplifies tales which had already received many additions in the half
century that separated him from Herodotus, and he thus creates the
type of that Semiramis, the wife of Ninus and the conqueror of all
Asia, who so long held an undeserved place in ancient history.[243]
The last Calah prince who has left us anything is Tiglath-Pileser
II. (745–727). We have already described how his palace was
destroyed by Esarhaddon, who employed its materials for his own
purposes.[244] At the British Museum there are a few fragments
which have been recognized by their inscriptions as belonging to his
work (Vol. I. Fig. 26)[245]; they are quite similar to those of his
immediate predecessors.
With the new dynasty founded by Sargon at the end of the eighth
century taste changed fast enough. In those bas-reliefs in the
Khorsabad palace which represent that king’s campaigns, many
details are treated in a spirit very different from that of former days.
Trees, for instance, are no longer abstract signs standing for no one
kind of vegetation more than another; the sculptor begins to notice
their distinguishing features and to give their proper physiognomy to
the different countries overrun by the Assyrians. But these landscape
backgrounds are not to be found in all the bas-reliefs of Khorsabad.[246]
The art of Sargon was an art of transition. While on the one hand
it endeavoured to open up new ground, on the other it travelled on
the old ways and followed many of the ancient errors; it had a
marked predilection for figures larger than nature, and bas-reliefs
treating of royal pageants and processions remind us by the
simplicity of their conception of those of Assurnazirpal. We have
already given many fragments (Vol. I. Figs. 22–24, and 29), and now
we give another, a vizier and a eunuch standing before the king in
the characteristic attitude of respect (Fig. 118). The inscription which
cut the figures of Assurnazirpal so awkwardly in two has
disappeared; the proportions have gained in slenderness, and the
muscular development, though still strongly marked, has lost some
of its exaggeration. All this shows progress, and yet on the whole the
Louvre relief is less happy in its effect than the best of the Nimroud
sculptures in the British Museum. The execution is neither so firm
nor so frank; the relief is much higher and the modelling a little heavy
and bulbous in consequence. This result may also be caused to
some extent by the nature of the material, which is a softer alabaster
than was employed, so far as we know, in any other part of Assyria.
At Nimroud a fine limestone was chiefly used.
We shall be contented with mentioning the stele of Sargon, found
near Larnaca, in Cyprus, in 1845. It is most important as an historical
monument; it proves that, as a sequel to his Syrian conquests, the
terror of Sargon’s name was so widespread that even the inhabitants
of the islands thought it prudent to declare themselves his vassals,
and to set up his image as a sign of homage rendered and
allegiance sworn. But the stone is now too much broken to be of any
great interest as a work of art.[247]
The artistic masterpiece of this epoch is the bronze lion figured in
our Plate XI. It had been suggested that its use was to hold down the
cords of a tent or the lower edge of tapestries, a purpose for which
the weight of the bronze and the ring fixed in its back make it well
suited. This idea had to be abandoned, however, when a whole
series of similar figures marked with the name of Sennacherib was
found. Their execution was hardly equal to that of the lion we have
figured, but their general characteristics were the same, and they
had rings on their backs.[248]
These lions are sixteen in number; they form a series in which
the size of the animal becomes steadily smaller with each example;
the largest is a foot long, the smallest hardly more than an inch. The
decrease seems to follow a certain rule, but rust has affected them
too greatly for it to be easy to base any metrological calculation upon
their weight. But all doubt as to their use is removed by the
inscriptions in cuneiform and in ancient Aramaic characters with
which several of them are engraved. The Aramaic inscriptions all
begin with the word mine; then comes a figure indicating the number
of mines, or of subdivisions of the mine that the weight represents;
finally, there is the name of some personage, who may perhaps have
been a magistrate charged with the regulation and verification of
weights.
Fig. 118.—Bas-relief from Khorsabad. Height 9 feet 5
inches. Louvre.
With the accession of Sennacherib, a sensible change comes
over the aspect of the reliefs. What until now has been the exception
becomes the rule. On almost every slab we find a complex and
carefully treated landscape background. The artist is not satisfied
with indicating the differences between conifers, cypresses, and
pines (Vol. I. Figs. 41–43), palms (ib. Figs. 30 and 34; and above,