Download as pdf or txt
Download as pdf or txt
You are on page 1of 54

A Primer on Memory Consistency and

Cache Coherence Second Edition


Synthesis Lectures on Computer
Architecture Vijay Nagarajan
Visit to download the full and correct content document:
https://textbookfull.com/product/a-primer-on-memory-consistency-and-cache-coheren
ce-second-edition-synthesis-lectures-on-computer-architecture-vijay-nagarajan/
More products digital (pdf, epub, mobi) instant
download maybe you interests ...

Industrial Electronic Circuits Laboratory Manual


(Synthesis Lectures on Electrical Engineering) 1st
Edition Asadi

https://textbookfull.com/product/industrial-electronic-circuits-
laboratory-manual-synthesis-lectures-on-electrical-
engineering-1st-edition-asadi/

Lectures on von Neumann Algebras Second Edition ■erban


Valentin Str■til■

https://textbookfull.com/product/lectures-on-von-neumann-
algebras-second-edition-serban-valentin-stratila/

A Primer on Microeconomics, Volume I: Fundamentals of


Exchange Second Edition Thomas M. Beveridge

https://textbookfull.com/product/a-primer-on-microeconomics-
volume-i-fundamentals-of-exchange-second-edition-thomas-m-
beveridge/

Lectures on geometry Woodhouse

https://textbookfull.com/product/lectures-on-geometry-woodhouse/
Essentials of computer architecture Second Edition
Comer

https://textbookfull.com/product/essentials-of-computer-
architecture-second-edition-comer/

Lectures on Convex Optimization Yurii Nesterov

https://textbookfull.com/product/lectures-on-convex-optimization-
yurii-nesterov/

Lectures on Astrophysics 1st Edition Steven Weinberg

https://textbookfull.com/product/lectures-on-astrophysics-1st-
edition-steven-weinberg/

Lectures on Imagination 1st Edition Paul Ricœur

https://textbookfull.com/product/lectures-on-imagination-1st-
edition-paul-ricoeur/

Lectures on Logarithmic Algebraic Geometry Arthur Ogus

https://textbookfull.com/product/lectures-on-logarithmic-
algebraic-geometry-arthur-ogus/
Series ISSN: 1935-3235

NAGARAJAN • ET AL
Synthesis Lectures on
Computer Architecture

Series Editor: Natalie Enright Jerger, University of Toronto


Margaret Martonosi, Princeton University

A PRIMER ON MEMORY CONSISTENCY AND CACHE COHERENCE, 2E


A Primer on Memory Consistency and Cache Coherence
Second Edition
Vijay Nagarajan, University of Edinburgh
Daniel J. Sorin, Duke University
Mark D. Hill, University of Wisconsin, Madison
David A. Wood, University of Wisconsin, Madison

Many modern computer systems, including homogeneous and heterogeneous architectures, support
shared memory in hardware. In a shared memory system, each of the processor cores may read and
write to a single shared address space. For a shared memory machine, the memory consistency model
defines the architecturally visible behavior of its memory system. Consistency definitions provide
rules about loads and stores (or memory reads and writes) and how they act upon memory. As part of
supporting a memory consistency model, many machines also provide cache coherence protocols that
ensure that multiple cached copies of data are kept up-to-date. The goal of this primer is to provide
readers with a basic understanding of consistency and coherence. This understanding includes both the
issues that must be solved as well as a variety of solutions. We present both high-level concepts as well
as specific, concrete examples from real-world systems.
This second edition reflects a decade of advancements since the first edition and includes,
among other more modest changes, two new chapters: one on consistency and coherence for non-
CPU accelerators (with a focus on GPUs) and one that points to formal work and tools on consistency
and coherence.

About SYNTHESIS
This volume is a printed version of a work that appears in the Synthesis
Digital Library of Engineering and Computer Science. Synthesis

MORGAN & CLAYPOOL


books provide concise, original presentations of important research and
development topics, published quickly, in digital and print formats.

Synthesis Lectures on
Computer Architecture
store.morganclaypool.com
Natalie Enright Jerger & Margaret Martonosi, Series Editors
A Primer on
Memory Consistency
and Cache Coherence
Second Edition
Synthesis Lectures on
Computer Architecture
Editors
Natalie Enright Jerger, University of Toronto
Margaret Martonosi, Princeton University
Founding Editor Emeritus
Mark D. Hill, University of Wisconsin, Madison
Synthesis Lectures on Computer Architecture publishes 50- to 100-page publications on topics
pertaining to the science and art of designing, analyzing, selecting and interconnecting hardware
components to create computers that meet functional, performance and cost goals. The scope will
largely follow the purview of premier computer architecture conferences, such as ISCA, HPCA,
MICRO, and ASPLOS.

A Primer on Memory Consistency and Cache Coherence, Second Edition


Vijay Nagarajan, Daniel J. Sorin, Mark D. Hill, and David A. Wood
2020

Innovations in the Memory System


Rajeev Balasubramonian
2019

Cache Replacement Policies


Akanksha Jain and Calvin Lin
2019

The Datacenter as a Computer: Designing Warehouse-Scale Machines, Third Edition


Luiz André Barroso, Urs Hölzle, and Parthasarathy Ranganathan
2018

Principles of Secure Processor Architecture Design


Jakub Szefer
2018

General-Purpose Graphics Processor Architectures


Tor M. Aamodt, Wilson Wai Lun Fung, and Timothy G. Rogers
2018
iv

Compiling Algorithms for Heterogenous Systems


Steven Bell, Jing Pu, James Hegarty, and Mark Horowitz
2018

Architectural and Operating System Support for Virtual Memory


Abhishek Bhattacharjee and Daniel Lustig
2017

Deep Learning for Computer Architects


Brandon Reagen, Robert Adolf, Paul Whatmough, Gu-Yeon Wei, and David Brooks
2017

On-Chip Networks, Second Edition


Natalie Enright Jerger, Tushar Krishna, and Li-Shiuan Peh
2017

Space-Time Computing with Temporal Neural Networks


James E. Smith
2017

Hardware and Software Support for Virtualization


Edouard Bugnion, Jason Nieh, and Dan Tsafrir
2017

Datacenter Design and Management: A Computer Architect’s Perspective


Benjamin C. Lee
2016

A Primer on Compression in the Memory Hierarchy


Somayeh Sardashti, Angelos Arelakis, Per Stenström, and David A. Wood
2015

Research Infrastructures for Hardware Accelerators


Yakun Sophia Shao and David Brooks
2015

Analyzing Analytics
Rajesh Bordawekar, Bob Blainey, and Ruchir Puri
2015

Customizable Computing
Yu-Ting Chen, Jason Cong, Michael Gill, Glenn Reinman, and Bingjun Xiao
2015
v
Die-stacking Architecture
Yuan Xie and Jishen Zhao
2015

Single-Instruction Multiple-Data Execution


Christopher J. Hughes
2015

Power-Efficient Computer Architectures: Recent Advances


Magnus Själander, Margaret Martonosi, and Stefanos Kaxiras
2014

FPGA-Accelerated Simulation of Computer Systems


Hari Angepat, Derek Chiou, Eric S. Chung, and James C. Hoe
2014

A Primer on Hardware Prefetching


Babak Falsafi and Thomas F. Wenisch
2014

On-Chip Photonic Interconnects: A Computer Architect’s Perspective


Christopher J. Nitta, Matthew K. Farrens, and Venkatesh Akella
2013

Optimization and Mathematical Modeling in Computer Architecture


Tony Nowatzki, Michael Ferris, Karthikeyan Sankaralingam, Cristian Estan, Nilay Vaish, and
David Wood
2013

Security Basics for Computer Architects


Ruby B. Lee
2013

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale


Machines, Second Edition
Luiz André Barroso, Jimmy Clidaras, and Urs Hölzle
2013

Shared-Memory Synchronization
Michael L. Scott
2013

Resilient Architecture Design for Voltage Variation


Vijay Janapa Reddi and Meeta Sharma Gupta
2013
vi
Multithreading Architecture
Mario Nemirovsky and Dean M. Tullsen
2013

Performance Analysis and Tuning for General Purpose Graphics Processing Units
(GPGPU)
Hyesoon Kim, Richard Vuduc, Sara Baghsorkhi, Jee Choi, and Wen-mei Hwu
2012

Automatic Parallelization: An Overview of Fundamental Compiler Techniques


Samuel P. Midkiff
2012

Phase Change Memory: From Devices to Systems


Moinuddin K. Qureshi, Sudhanva Gurumurthi, and Bipin Rajendran
2011

Multi-Core Cache Hierarchies


Rajeev Balasubramonian, Norman P. Jouppi, and Naveen Muralimanohar
2011

A Primer on Memory Consistency and Cache Coherence


Daniel J. Sorin, Mark D. Hill, and David A. Wood
2011

Dynamic Binary Modification: Tools, Techniques, and Applications


Kim Hazelwood
2011

Quantum Computing for Computer Architects, Second Edition


Tzvetan S. Metodi, Arvin I. Faruque, and Frederic T. Chong
2011

High Performance Datacenter Networks: Architectures, Algorithms, and Opportunities


Dennis Abts and John Kim
2011

Processor Microarchitecture: An Implementation Perspective


Antonio González, Fernando Latorre, and Grigorios Magklis
2010

Transactional Memory, Second Edition


Tim Harris, James Larus, and Ravi Rajwar
2010
vii
Computer Architecture Performance Evaluation Methods
Lieven Eeckhout
2010

Introduction to Reconfigurable Supercomputing


Marco Lanzagorta, Stephen Bique, and Robert Rosenberg
2009

On-Chip Networks
Natalie Enright Jerger and Li-Shiuan Peh
2009

The Memory System: You Can’t Avoid It, You Can’t Ignore It, You Can’t Fake It
Bruce Jacob
2009

Fault Tolerant Computer Architecture


Daniel J. Sorin
2009

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale


Machines
Luiz André Barroso and Urs Hölzle
2009

Computer Architecture Techniques for Power-Efficiency


Stefanos Kaxiras and Margaret Martonosi
2008

Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency


Kunle Olukotun, Lance Hammond, and James Laudon
2007

Transactional Memory
James R. Larus and Ravi Rajwar
2006

Quantum Computing for Computer Architects


Tzvetan S. Metodi and Frederic T. Chong
2006
Copyright © 2020 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations
in printed reviews, without the prior permission of the publisher.

A Primer on Memory Consistency and Cache Coherence, Second Edition


Vijay Nagarajan, Daniel J. Sorin, Mark D. Hill, and David A. Wood
www.morganclaypool.com

ISBN: 9781681737096 paperback


ISBN: 9781681737102 ebook
ISBN: 9781681737119 hardcover

DOI 10.2200/S00962ED2V01Y201910CAC049

A Publication in the Morgan & Claypool Publishers series


SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE

Lecture #49
Series Editors: Natalie Enright Jerger, University of Toronto
Margaret Martonosi, Princeton University
Founding Editor Emeritus: Mark D. Hill, University of Wisconsin, Madison
Series ISSN
Print 1935-3235 Electronic 1935-3243
A Primer on
Memory Consistency
and Cache Coherence
Second Edition

Vijay Nagarajan
University of Edinburgh

Daniel J. Sorin
Duke University

Mark D. Hill
University of Wisconsin, Madison

David A. Wood
University of Wisconsin, Madison

SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE #49

M
&C Morgan & cLaypool publishers
ABSTRACT
Many modern computer systems, including homogeneous and heterogeneous architectures,
support shared memory in hardware. In a shared memory system, each of the processor cores
may read and write to a single shared address space. For a shared memory machine, the memory
consistency model defines the architecturally visible behavior of its memory system. Consistency
definitions provide rules about loads and stores (or memory reads and writes) and how they act
upon memory. As part of supporting a memory consistency model, many machines also provide
cache coherence protocols that ensure that multiple cached copies of data are kept up-to-date.
The goal of this primer is to provide readers with a basic understanding of consistency and co-
herence. This understanding includes both the issues that must be solved as well as a variety
of solutions. We present both high-level concepts as well as specific, concrete examples from
real-world systems.
This second edition reflects a decade of advancements since the first edition and includes,
among other more modest changes, two new chapters: one on consistency and coherence for
non-CPU accelerators (with a focus on GPUs) and one that points to formal work and tools on
consistency and coherence.

KEYWORDS
computer architecture, memory consistency, cache coherence, shared memory,
memory systems, multicore processor, heterogeneous architecture, GPU, accelera-
tors, semantics, verification
xi

Contents
Preface to the Second Edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii

Preface to the First Edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix

1 Introduction to Consistency and Coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1


1.1 Consistency (a.k.a., Memory Consistency, Memory Consistency Model,
or Memory Model) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Coherence (a.k.a., Cache Coherence) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Consistency and Coherence for Heterogeneous Systems . . . . . . . . . . . . . . . . . . 6
1.4 Specifying and Validating Memory Consistency Models and Cache
Coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 A Consistency and Coherence Quiz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.6 What This Primer Does Not Do . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.7 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 Coherence Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Baseline System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 The Problem: How Incoherence Could Possibly Occur . . . . . . . . . . . . . . . . . . 10
2.3 The Cache Coherence Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 (Consistency-Agnostic) Coherence Invariants . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4.1 Maintaining the Coherence Invariants . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4.2 The Granularity of Coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4.3 When is Coherence Relevant? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3 Memory Consistency Motivation and Sequential Consistency . . . . . . . . . . . . . 17


3.1 Problems with Shared Memory Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2 What is a Memory Consistency Model? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3 Consistency vs. Coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.4 Basic Idea of Sequential Consistency (SC) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.5 A Little SC Formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
xii
3.6 Naive SC Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.7 A Basic SC Implementation with Cache Coherence . . . . . . . . . . . . . . . . . . . . 27
3.8 Optimized SC Implementations with Cache Coherence . . . . . . . . . . . . . . . . . 29
3.9 Atomic Operations with SC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.10 Putting it All Together: MIPS R10000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.11 Further Reading Regarding SC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.12 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4 Total Store Order and the x86 Memory Model . . . . . . . . . . . . . . . . . . . . . . . . . . 39


4.1 Motivation for TSO/x86 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Basic Idea of TSO/x86 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3 A Little TSO/x86 Formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.4 Implementing TSO/x86 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4.1 Implementing Atomic Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4.2 Implementing Fences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.5 Further Reading Regarding TSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.6 Comparing SC and TSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.7 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

5 Relaxed Memory Consistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55


5.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.1.1 Opportunities to Reorder Memory Operations . . . . . . . . . . . . . . . . . . 56
5.1.2 Opportunities to Exploit Reordering . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.2 An Example Relaxed Consistency Model (XC) . . . . . . . . . . . . . . . . . . . . . . . . 58
5.2.1 The Basic Idea of the XC Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.2.2 Examples Using Fences Under XC . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.2.3 Formalizing XC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.2.4 Examples Showing XC Operating Correctly . . . . . . . . . . . . . . . . . . . . 62
5.3 Implementing XC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.3.1 Atomic Instructions with XC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.3.2 Fences With XC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.3.3 A Caveat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.4 Sequential Consistency for Data-Race-Free Programs . . . . . . . . . . . . . . . . . . . 68
5.5 Some Relaxed Model Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.5.1 Release Consistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.5.2 Causality and Write Atomicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
xiii
5.6 Relaxed Memory Model Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.6.1 RISC-V Weak Memory Order (RVWMO) . . . . . . . . . . . . . . . . . . . . 75
5.6.2 IBM Power . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.7 Further Reading and Commercial Relaxed Memory Models . . . . . . . . . . . . . . 81
5.7.1 Academic Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.7.2 Commercial Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.8 Comparing Memory Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.8.1 How Do Relaxed Memory Models Relate to Each Other and
TSO and SC? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.8.2 How Good Are Relaxed Models? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.9 High-Level Language Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.10 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

6 Coherence Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.1 The Big Picture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.2 Specifying Coherence Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.3 Example of a Simple Coherence Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.4 Overview of Coherence Protocol Design Space . . . . . . . . . . . . . . . . . . . . . . . . 96
6.4.1 States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.4.2 Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.4.3 Major Protocol Design Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

7 Snooping Coherence Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107


7.1 Introduction to Snooping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
7.2 Baseline Snooping Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.2.1 High-Level Protocol Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
7.2.2 Simple Snooping System Model: Atomic Requests, Atomic
Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
7.2.3 Baseline Snooping System Model: Non-Atomic Requests,
Atomic Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.2.4 Running Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
7.2.5 Protocol Simplifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.3 Adding the Exclusive State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
7.3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
7.3.2 Getting to the Exclusive State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
7.3.3 High-Level Specification of Protocol . . . . . . . . . . . . . . . . . . . . . . . . . 124
xiv
7.3.4 Detailed Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
7.3.5 Running Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
7.4 Adding the Owned State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
7.4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7.4.2 High-Level Protocol Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.4.3 Detailed Protocol Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
7.4.4 Running Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
7.5 Non-Atomic Bus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
7.5.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
7.5.2 In-Order vs. Out-of-Order Responses . . . . . . . . . . . . . . . . . . . . . . . 133
7.5.3 Non-Atomic System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
7.5.4 An MSI Protocol with a Split-Transaction Bus . . . . . . . . . . . . . . . . . 135
7.5.5 An Optimized, Non-Stalling MSI Protocol with a
Split-Transaction Bus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
7.6 Optimizations to the Bus Interconnection Network . . . . . . . . . . . . . . . . . . . . 142
7.6.1 Separate Non-Bus Network for Data Responses . . . . . . . . . . . . . . . . 142
7.6.2 Logical Bus for Coherence Requests . . . . . . . . . . . . . . . . . . . . . . . . . 143
7.7 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
7.7.1 Sun Starfire E10000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
7.7.2 IBM Power5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
7.8 Discussion and the Future of Snooping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
7.9 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

8 Directory Coherence Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151


8.1 Introduction to Directory Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
8.2 Baseline Directory System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
8.2.1 Directory System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
8.2.2 High-Level Protocol Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
8.2.3 Avoiding Deadlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
8.2.4 Detailed Protocol Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
8.2.5 Protocol Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
8.2.6 Protocol Simplifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
8.3 Adding the Exclusive State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.3.1 High-Level Protocol Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.3.2 Detailed Protocol Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
8.4 Adding the Owned State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
8.4.1 High-Level Protocol Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
xv
8.4.2 Detailed Protocol Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
8.5 Representing Directory State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
8.5.1 Coarse Directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
8.5.2 Limited Pointer Directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
8.6 Directory Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
8.6.1 Directory Cache Backed by DRAM . . . . . . . . . . . . . . . . . . . . . . . . . 173
8.6.2 Inclusive Directory Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
8.6.3 Null Directory Cache (With no Backing Store) . . . . . . . . . . . . . . . . 176
8.7 Performance and Scalability Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . 177
8.7.1 Distributed Directories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
8.7.2 Non-Stalling Directory Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
8.7.3 Interconnection Networks Without Point-to-Point Ordering . . . . . 180
8.7.4 Silent vs. Non-Silent Evictions of Blocks in State S . . . . . . . . . . . . . 182
8.8 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
8.8.1 SGI Origin 2000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
8.8.2 Coherent HyperTransport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
8.8.3 Hypertransport Assist . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
8.8.4 Intel QPI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
8.9 Discussion and the Future of Directory Protocols . . . . . . . . . . . . . . . . . . . . . 189
8.10 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189

9 Advanced Topics in Coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191


9.1 System Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
9.1.1 Instruction Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
9.1.2 Translation Lookaside Buffers (TLBs) . . . . . . . . . . . . . . . . . . . . . . . . 192
9.1.3 Virtual Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
9.1.4 Write-Through Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
9.1.5 Coherent Direct Memory Access (DMA) . . . . . . . . . . . . . . . . . . . . . 194
9.1.6 Multi-Level Caches and Hierarchical Coherence Protocols . . . . . . . 195
9.2 Performance Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
9.2.1 Migratory Sharing Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
9.2.2 False Sharing Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
9.3 Maintaining Liveness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
9.3.1 Deadlock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
9.3.2 Livelock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
9.3.3 Starvation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
9.4 Token Coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
xvi
9.5 The Future of Coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
9.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

10 Consistency and Coherence for Heterogeneous Systems . . . . . . . . . . . . . . . . . 211


10.1 GPU Consistency and Coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
10.1.1 Early GPUs: Architecture and Programming Model . . . . . . . . . . . . 212
10.1.2 Big Picture: GPGPU Consistency and Coherence . . . . . . . . . . . . . . 216
10.1.3 Temporal Coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
10.1.4 Release Consistency-directed Coherence . . . . . . . . . . . . . . . . . . . . . . 226
10.2 More Heterogeneity Than Just GPUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
10.2.1 Heterogeneous Consistency Models . . . . . . . . . . . . . . . . . . . . . . . . . 237
10.2.2 Heterogeneous Coherence Protocols . . . . . . . . . . . . . . . . . . . . . . . . . 240
10.3 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
10.4 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247

11 Specifying and Validating Memory Consistency Models and Cache


Coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
11.1 Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
11.1.1 Operational Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
11.1.2 Axiomatic Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
11.2 Exploring the Behavior of Memory Consistency Models . . . . . . . . . . . . . . . 260
11.2.1 Litmus Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
11.2.2 Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
11.3 Validating Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
11.3.1 Formal Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
11.3.2 Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
11.4 History and Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
11.5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267

Authors’ Biographies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273


xvii

Preface to the Second Edition


This second edition of the Primer differs from the nearly decade-old first edition (2011) due
to the addition of two chapters and modest other changes. New Chapter 10 addresses emerg-
ing work on non-CPU accelerators, principally general-purpose GPUs, that often implement
consistency and coherence together. New Chapter 11 points to formal work and tools on con-
sistency and coherence that have greatly advanced since the Primer’s first edition. Other changes
are more modest and include the following: Chapter 2 expands the definition of coherence to
admit the GPU-like solutions of Chapter 10; Chapters 3 and 4 discuss recent advances in SC
and TSO enforcement, respectively; and Chapter 5 adds a RISC-V case study.
The first edition of this work was supported in part by the U.S. National Science Foun-
dation (CNS-0551401, CNS0720565, CCF-0916725, CCF-0444516, and CCF-0811290),
Sandia/DOE (#MSN123960/ DOE890426), Semiconductor Research Corporation (contract
2009-HJ-1881), and the University of Wisconsin (Kellett Award to Hill). The views expressed
herein are not necessarily those of the NSF, Sandia, DOE, or SRC. The second edition is
supported by EPSRC under grant EP/M027317/1 to the University of Edinburgh, the U.S.
National Science Foundation (CNS-1815656, CCF-1617824, CCF-1734706), and a John
P. Morgridge Professorship to Hill. The views expressed herein are those of the authors only.
The authors thank Vasileios Gavrielatos, Olivier Giroux, Daniel Lustig, Kelly Shaw,
Matthew Sinclair, and the students of the Spring 2019 edition of UW-Madison Graduate Par-
allel Computer Architecture (CS/ECE 757) taught by Hill for their insightful and detailed
comments that improved this second edition, even as the authors take responsibility for its final
content.
The authors also thank Margaret Martonosi, Natalie Enright Jerger, Michael Morgan,
and their editorial team for enabling the production of this edition of the Primer.
Vijay thanks Nivi, Sattva, Swara, his parents, and in-laws for their love. Vijay also thanks
IIT Madras for supporting his sabbatical, during which time most of the new chapters were
written. In particular, he fondly reminisces the cricket nets sessions at the IIT Chemplast ground
in the peak of Chennai “winter.”
Dan thanks Deborah, Jason, and Julie for their love and for putting up with him taking
the time to work on another edition of this Primer.
Mark wishes to thank Sue, Nicole, and Gregory for their love and support.
xviii PREFACE TO THE SECOND EDITION
David thanks his coauthors for putting up with his deadline-challenged work style, his
parents Roger and Ann Wood for inspiring him to be a second-generation Computer Sciences
professor, and Jane, Alex, and Zach for helping him remember what life is all about.

Vijay Nagarajan, Daniel J. Sorin, Mark D. Hill, and David A. Wood


January 2020
xix

Preface to the First Edition


This primer is intended for readers who have encountered memory consistency and cache co-
herence informally, but now want to understand what they entail in more detail. This audience
includes computing industry professionals as well as junior graduate students.
We expect our readers to be familiar with the basics of computer architecture. Remember-
ing the details of Tomasulo’s algorithm or similar details is unnecessary, but we do expect readers
to understand issues like architectural state, dynamic instruction scheduling (out-of-order exe-
cution), and how caches are used to reduce average latencies to access storage structures.
The primary goal of this primer is to provide readers with a basic understanding of consis-
tency and coherence. This understanding includes both the issues that must be solved as well as
a variety of solutions. We present both high-level concepts as well as specific, concrete examples
from real-world systems. A secondary goal of this primer is to make readers aware of just how
complicated consistency and coherence are. If readers simply discover what it is that they do not
know—without actually learning it—that discovery is still a substantial benefit. Furthermore,
because these topics are so vast and so complicated, it is beyond the scope of this primer to cover
them exhaustively. It is not a goal of this primer to cover all topics in depth, but rather to cover
the basics and apprise the readers of what topics they may wish to pursue in more depth.
We owe many thanks for the help and support we have received during the development
of this primer. We thank Blake Hechtman for implementing and testing (and debugging!) all
of the coherence protocols in this primer. As the reader will soon discover, coherence protocols
are complicated, and we would not have trusted any protocol that we had not tested, so Blake’s
work was tremendously valuable. Blake implemented and tested all of these protocols using the
Wisconsin GEMS simulation infrastructure [http://www.cs.wisc.edu/gems/].
For reviewing early drafts of this primer and for helpful discussions regarding various top-
ics within the primer, we gratefully thank Trey Cain and Milo Martin. For providing additional
feedback on the primer, we thank Newsha Ardalani, Arkaprava Basu, Brad Beckmann, Bob
Cypher, Singh, Venkatanathan Varadarajan, Derek Williams, and Meng Zhang. While our re-
viewers provided great feedback, they may or may not agree with all of the final contents of this
primer.
This work is supported in part by the National Science Foundation (CNS-
0551401, CNS-0720565, CCF-0916725, CCF-0444516, and CCF-0811290), Sandia/DOE
(#MSN123960/DOE890426), Semiconductor Research Corporation (contract 2009-HJ-
1881), and the University of Wisconsin (Kellett Award to Hill). The views expressed herein
are not necessarily those of the NSF, Sandia, DOE, or SRC.
xx PREFACE TO THE FIRST EDITION
Dan thanks Deborah, Jason, and Julie for their love and for putting up with him taking
the time to work on another Synthesis Lecture. Dan thanks his Uncle Sol for helping inspire
him to be an engineer in the first place. Lastly, Dan dedicates this book to the memory of Rusty
Sneiderman, a treasured friend of thirty years who will be dearly missed by everyone who was
lucky enough to have known him.
Mark wishes to thank Sue, Nicole, and Gregory for their love and support.
David thanks his coauthors for putting up with his deadline-challenged work style, his
parents Roger and Ann Wood for inspiring him to be a second-generation Computer Sciences
professor, and Jane, Alex, and Zach for helping him remember what life is all about.

Daniel J. Sorin, Mark D. Hill, and David A. Wood


November 2011
1

CHAPTER 1

Introduction to Consistency
and Coherence
Many modern computer systems and most multicore chips (chip multiprocessors) support
shared memory in hardware. In a shared memory system, each of the processor cores may read
and write to a single shared address space. These designs seek various goodness properties, such
as high performance, low power, and low cost. Of course, it is not valuable to provide these
goodness properties without first providing correctness. Correct shared memory seems intuitive
at a hand-wave level, but, as this lecture will help show, there are subtle issues in even defining
what it means for a shared memory system to be correct, as well as many subtle corner cases in
designing a correct shared memory implementation. Moreover, these subtleties must be mas-
tered in hardware implementations where bug fixes are expensive. Even academics should master
these subtleties to make it more likely that their proposed designs will work.
Designing and evaluating a correct shared memory system requires an architect to under-
stand memory consistency and cache coherence, the two topics of this primer. Memory consistency
(consistency, memory consistency model, or memory model) is a precise, architecturally visible
definition of shared memory correctness. Consistency definitions provide rules about loads and
stores (or memory reads and writes) and how they act upon memory. Ideally, consistency def-
initions would be simple and easy to understand. However, defining what it means for shared
memory to behave correctly is more subtle than defining the correct behavior of, for example,
a single-threaded processor core. The correctness criterion for a single processor core partitions
behavior between one correct result and many incorrect alternatives. This is because the pro-
cessor’s architecture mandates that the execution of a thread transforms a given input state into
a single well-defined output state, even on an out-of-order core. Shared memory consistency
models, however, concern the loads and stores of multiple threads and usually allow many cor-
rect executions while disallowing many (more) incorrect ones. The possibility of multiple cor-
rect executions is due to the ISA allowing multiple threads to execute concurrently, often with
many possible legal interleavings of instructions from different threads. The multitude of correct
executions complicates the erstwhile simple challenge of determining whether an execution is
correct. Nevertheless, consistency must be mastered to implement shared memory and, in some
cases, to write correct programs that use it.
The microarchitecture—the hardware design of the processor cores and the shared mem-
ory system—must enforce the desired consistency model. As part of this consistency model
2 1. INTRODUCTION TO CONSISTENCY AND COHERENCE
support, the hardware provides cache coherence (or coherence). In a shared-memory system
with caches, the cached values can potentially become out-of-date (or incoherent) when one of
the processors updates its cached value. Coherence seeks to make the caches of a shared-memory
system as functionally invisible as the caches in a single-core system; it does so by propagating a
processor’s write to other processors’ caches. It is worth stressing that unlike consistency which
is an architectural specification that defines shared memory correctness, coherence is a means to
supporting a consistency model.
Even though consistency is the first major topic of this primer, we begin in Chapter 2
with a brief introduction to coherence because coherence protocols play an important role in
providing consistency. The goal of Chapter 2 is to explain enough about coherence to understand
how consistency models interact with coherent caches, but not to explore specific coherence
protocols or implementations, which are topics we defer until the second portion of this primer
in Chapters 6–9.

1.1 CONSISTENCY (A.K.A., MEMORY CONSISTENCY,


MEMORY CONSISTENCY MODEL, OR MEMORY
MODEL)
Consistency models define correct shared memory behavior in terms of loads and stores (mem-
ory reads and writes), without reference to caches or coherence. To gain some real-world intu-
ition on why we need consistency models, consider a university that posts its course schedule on-
line. Assume that the Computer Architecture course is originally scheduled to be in Room 152.
The day before classes begin, the university registrar decides to move the class to Room 252.
The registrar sends an e-mail message asking the website administrator to update the online
schedule, and a few minutes later, the registrar sends a text message to all registered students
to check the newly updated schedule. It is not hard to imagine a scenario—if, say, the website
administrator is too busy to post the update immediately—in which a diligent student receives
the text message, immediately checks the online schedule, and still observes the (old) class lo-
cation Room 152. Even though the online schedule is eventually updated to Room 252 and the
registrar performed the “writes” in the correct order, this diligent student observed them in a
different order and thus went to the wrong room. A consistency model defines whether this be-
havior is correct (and thus whether a user must take other action to achieve the desired outcome)
or incorrect (in which case the system must preclude these reorderings).
Although this contrived example used multiple media, similar behavior can happen in
shared memory hardware with out-of-order processor cores, write buffers, prefetching, and mul-
tiple cache banks. Thus, we need to define shared memory correctness—that is, which shared
memory behaviors are allowed—so that programmers know what to expect and implementors
know the limits to what they can provide.
Shared memory correctness is specified by a memory consistency model or, more simply,
a memory model. The memory model specifies the allowed behavior of multithreaded programs
1.1. CONSISTENCY 3
executing with shared memory. For a multithreaded program executing with specific input data,
the memory model specifies what values dynamic loads may return and, optionally, what possible
final states of the memory are. Unlike single-threaded execution, multiple correct behaviors are
usually allowed, making understanding memory consistency models subtle.
Chapter 3 introduces the concept of memory consistency models and presents sequential
consistency (SC), the strongest and most intuitive consistency model. The chapter begins by
motivating the need to specify shared memory behavior and precisely defines what a memory
consistency model is. It next delves into the intuitive SC model, which states that a multi-
threaded execution should look like an interleaving of the sequential executions of each con-
stituent thread, as if the threads were time-multiplexed on a single-core processor. Beyond this
intuition, the chapter formalizes SC and explores implementing SC with coherence in both
simple and aggressive ways, culminating with a MIPS R10000 case study.
In Chapter 4, we move beyond SC and focus on the memory consistency model imple-
mented by x86 and historical SPARC systems. This consistency model, called total store order
(TSO), is motivated by the desire to use first-in–first-out write buffers to hold the results of
committed stores before writing the results to the caches. This optimization violates SC, yet
promises enough performance benefit to inspire architectures to define TSO, which permits
this optimization. In this chapter, we show how to formalize TSO from our SC formalization,
how TSO affects implementations, and how SC and TSO compare.
Finally, Chapter 5 introduces “relaxed” or “weak” memory consistency models. It moti-
vates these models by showing that most memory orderings in strong models are unnecessary.
If a thread updates ten data items and then a synchronization flag, programmers usually do
not care if the data items are updated in order with respect to each other but only that all data
items are updated before the flag is updated. Relaxed models seek to capture this increased or-
dering flexibility to get higher performance or a simpler implementation. After providing this
motivation, the chapter develops an example relaxed consistency model, called XC, wherein
programmers get order only when they ask for it with a FENCE instruction (e.g., a FENCE
after the last data update but before the flag write). The chapter then extends the formalism of
the previous two chapters to handle XC and discusses how to implement XC (with considerable
reordering between the cores and the coherence protocol). The chapter then discusses a way in
which many programmers can avoid thinking about relaxed models directly: if they add enough
FENCEs to ensure their program is data-race free (DRF), then most relaxed models will appear
SC. With “SC for DRF,” programmers can get both the (relatively) simple correctness model of
SC with the (relatively) higher performance of XC. For those who want to reason more deeply,
the chapter concludes by distinguishing acquires from releases, discussing write atomicity and
causality, pointing to commercial examples (including an IBM Power case study), and touching
upon high-level language models ( Java and CCC).
Returning to the real-world consistency example of the class schedule, we can observe that
the combination of an email system, a human web administrator, and a text-messaging system
4 1. INTRODUCTION TO CONSISTENCY AND COHERENCE
represents an extremely weak consistency model. To prevent the problem of a diligent student
going to the wrong room, the university registrar needed to perform a FENCE operation after
her email to ensure that the online schedule was updated before sending the text message.

1.2 COHERENCE (A.K.A., CACHE COHERENCE)


Unless care is taken, a coherence problem can arise if multiple actors (e.g., multiple cores) have
access to multiple copies of a datum (e.g., in multiple caches) and at least one access is a write.
Consider an example that is similar to the memory consistency example. A student checks the
online schedule of courses, observes that the Computer Architecture course is being held in
Room 152 (reads the datum), and copies this information into her calendar app in her mobile
phone (caches the datum). Subsequently, the university registrar decides to move the class to
Room 252, updates the online schedule (writes to the datum) and informs the students via a
text message. The student’s copy of the datum is now stale, and we have an incoherent situation.
If she goes to Room 152, she will fail to find her class. Examples of incoherence from the world of
computing, but not including computer architecture, include stale web caches and programmers
using un-updated code repositories.
Access to stale data (incoherence) is prevented using a coherence protocol, which is a
set of rules implemented by the distributed set of actors within a system. Coherence protocols
come in many variants but follow a few themes, as developed in Chapters 6–9. Essentially, all
of the variants make one processor’s write visible to the other processors by propagating the
write to all caches, i.e., keeping the calendar in sync with the online schedule. But protocols
differ in when and how the syncing happens. There are two major classes of coherence protocols.
In the first approach, the coherence protocol ensures that writes are propagated to the caches
synchronously. When the online schedule is updated, the coherence protocol ensures that the
student’s calendar is updated as well. In the second approach, the coherence protocol propagates
writes to the caches asynchronously, while still honoring the consistency model. The coherence
protocol does not guarantee that when the online schedule is updated, the new value will have
propagated to the student’s calendar as well; however, the protocol does ensure that the new
value is propagated before the text message reaches her mobile phone. This primer focuses on
the first class of coherence protocols (Chapters 6–9) while Chapter 10 discusses the emerging
second class.
Chapter 6 presents the big picture of cache coherence protocols and sets the stage for the
subsequent chapters on specific coherence protocols. This chapter covers issues shared by most
coherence protocols, including the distributed operations of cache controllers and memory con-
trollers and the common MOESI coherence states: modified (M), owned (O), exclusive (E),
shared (S), and invalid (I). Importantly, this chapter also presents our table-driven methodology
for presenting protocols with both stable (e.g., MOESI) and transient coherence states. Tran-
sient states are required in real implementations because modern systems rarely permit atomic
transitions from one stable state to another (e.g., a read miss in state Invalid will spend some
1.2. COHERENCE (A.K.A., CACHE COHERENCE) 5
time waiting for a data response before it can enter state Shared). Much of the real complex-
ity in coherence protocols hides in the transient states, similar to how much of processor core
complexity hides in micro-architectural states.
Chapter 7 covers snooping cache coherence protocols, which initially dominated the com-
mercial market. At the hand-wave level, snooping protocols are simple. When a cache miss oc-
curs, a core’s cache controller arbitrates for a shared bus and broadcasts its request. The shared
bus ensures that all controllers observe all requests in the same order and thus all controllers can
coordinate their individual, distributed actions to ensure that they maintain a globally consis-
tent state. Snooping gets complicated, however, because systems may use multiple buses and
modern buses do not atomically handle requests. Modern buses have queues for arbitration and
can send responses that are unicast, delayed by pipelining, or out-of-order. All of these fea-
tures lead to more transient coherence states. Chapter 7 concludes with case studies of the Sun
UltraEnterprise E10000 and the IBM Power5.
Chapter 8 delves into directory cache coherence protocols that offer the promise of scal-
ing to more processor cores and other actors than snooping protocols that rely on broadcast.
There is a joke that all problems in computer science can be solved with a level of indirection.
Directory protocols support this joke: A cache miss requests a memory location from the next
level cache (or memory) controller, which maintains a directory that tracks which caches hold
which locations. Based on the directory entry for the requested memory location, the controller
sends a response message to the requestor or forwards the request message to one or more ac-
tors currently caching the memory location. Each message typically has one destination (i.e.,
no broadcast or multicast), but transient coherence states abound as transitions from one stable
coherence state to another stable one can generate a number of messages proportional to the
number of actors in the system. This chapter starts with a basic MSI directory protocol and then
refines it to handle the MOESI states E and O, distributed directories, less stalling of requests,
approximate directory entry representations, and more. The chapter also explores the design of
the directory itself, including directory caching techniques. The chapter concludes with case
studies of the old SGI Origin 2000 and the newer AMD HyperTransport, HyperTransport
Assist, and Intel QuickPath Interconnect (QPI).
Chapter 9 deals with some, but not all, of the advanced topics in coherence. For ease of
explanation, the prior chapters on coherence intentionally restrict themselves to the simplest
system models needed to explain the fundamental issues. Chapter 9 delves into more compli-
cated system models and optimizations, with a focus on issues that are common to both snoop-
ing and directory protocols. Initial topics include dealing with instruction caches, multilevel
caches, write-through caches, translation lookaside buffers (TLBs), coherent direct memory ac-
cess (DMA), virtual caches, and hierarchical coherence protocols. Finally, the chapter delves
into performance optimizations (e.g., targeting migratory sharing and false sharing) and a new
protocol family called Token Coherence that subsumes directory and snooping coherence.
6 1. INTRODUCTION TO CONSISTENCY AND COHERENCE
1.3 CONSISTENCY AND COHERENCE FOR
HETEROGENEOUS SYSTEMS
Modern computer systems are predominantly heterogeneous. A mobile phone processor today
not only contains a multicore CPU, it also has a GPU and other accelerators (e.g., neural net-
work hardware). In the quest for programmability, such heterogeneous systems are starting to
support shared memory. Chapter 10 deals with consistency and coherence for such heteroge-
neous processors.
The chapter starts by focusing on GPUs, arguably the most popular accelerators today.
The chapter observes that GPUs originally chose not to support hardware cache coherence,
since GPUs are designed for embarrassingly parallel graphics workloads that do not synchronize
or share data all that much. However, the absence of hardware cache coherence leads to pro-
grammability and/or performance challenges when GPUs are used for general-purpose work-
loads with fine-grained synchronization and data sharing. The chapter discusses in detail some of
the promising coherence alternatives that overcome these limitations—in particular, explaining
why the candidate protocols enforce the consistency model directly rather than implementing
coherence in a consistency-agnostic manner. The chapter concludes with a brief discussion on
consistency and coherence across CPUs and the accelerators.

1.4 SPECIFYING AND VALIDATING MEMORY


CONSISTENCY MODELS AND CACHE COHERENCE
Consistency models and coherence protocols are complex and subtle. Yet, this complexity must
be managed to ensure that multicores are programmable and that their designs can be validated.
To achieve these goals, it is critical that consistency models are specified formally. A formal
specification would enable programmers to clearly and exhaustively (with tool support) under-
stand what behaviors are permitted by the memory model and what behaviors are not. Second,
a precise formal specification is mandatory for validating implementations.
Chapter 11 starts by discussing two methods for specifying systems—axiomatic and
operational—focusing on how these methods can be applied for consistency models and co-
herence protocols. Then the chapter goes over techniques for validating implementations—
including processor pipeline and coherence protocol implementations—against their specifi-
cation. The chapter discusses both formal methods and informal testing.

1.5 A CONSISTENCY AND COHERENCE QUIZ


It can be easy to convince oneself that one’s knowledge of consistency and coherence is sufficient
and that reading this primer is not necessary. To test whether this is the case, we offer this pop
quiz.
1.6. WHAT THIS PRIMER DOES NOT DO 7
Question 1: In a system that maintains sequential consistency, a core must issue coherence
requests in program order. True or false? (Answer is in Section 3.8)
Question 2: The memory consistency model specifies the legal orderings of coherence transac-
tions. True or false? (Section 3.8)
Question 3: To perform an atomic read–modify–write instruction (e.g., test-and-set), a core
must always communicate with the other cores. True or false? (Section 3.9)
Question 4: In a TSO system with multithreaded cores, threads may bypass values out of the
write buffer, regardless of which thread wrote the value. True or false? (Section 4.4)
Question 5: A programmer who writes properly synchronized code relative to the high-level
language’s consistency model (e.g., Java) does not need to consider the architecture’s mem-
ory consistency model. True or false? (Section 5.9)
Question 6: In an MSI snooping protocol, a cache block may only be in one of three coherence
states. True or false? (Section 7.2)
Question 7: A snooping cache coherence protocol requires the cores to communicate on a bus.
True or false? (Section 7.6)
Question 8: GPUs do not support hardware cache coherence. Therefore, they are unable to
enforce a memory consistency model. True or False? (Section 10.1).
Even though the answers are provided later in this primer, we encourage readers to try to
answer the questions before looking ahead at the answers.

1.6 WHAT THIS PRIMER DOES NOT DO


This lecture is intended to be a primer on coherence and consistency. We expect this mate-
rial could be covered in a graduate class in about ten 75-minute classes (e.g., one lecture per
Chapter 2 to Chapter 11).
For this purpose, there are many things the primer does not cover. Some of these include
the following.
• Synchronization. Coherence makes caches invisible. Consistency can make shared mem-
ory look like a single memory module. Nevertheless, programmers will probably need
locks, barriers, and other synchronization techniques to make their programs useful. Read-
ers are referred to the Synthesis Lecture on Shared-Memory synchronization [2].
• Commercial Relaxed Consistency Models. This primer does not cover the subtleties of
the ARM, PowerPC, and RISC-V memory models, but does describe which mechanisms
they provide to enforce order.
Another random document with
no related content on Scribd:
nothing produced by the sculptors, painters, and ornamentists who
were employed on the decoration of those great buildings which
even the Jewish prophets could not help admiring, while they abused
the princes who built them and the gods to whom they were
consecrated. The earliest Greek travellers, such as Herodotus and
Ctesias, only saw the ruins of these magnificent structures and of
their rich adornment of enamels, frescoes, and sculptured figures;
and yet how great was their wonder! We can hardly reflect without
emotion upon what we have lost in great works in stone and metal
carried out in the style of which certain fragments from Tello and a
few terra-cotta statuettes give us some faint idea.[230]

Fig. 111.—The Caillou Michaux, obverse. Drawn by Saint-Elme Gautier.


Fig. 112.—The Caillou Michaux, reverse. Drawn by Saint-Elme Gautier.
That such works did once exist we are told by the Greek
historians. Herodotus, after having described the temple of Bel and
the sanctuary on its summit in which no image of the deity was set
up, goes on, “In this temple at Babylon there is another sanctuary
lower down, where a great seated statue of Zeus may be seen.[231]
Near this statue there is a large table of gold, the throne and its
steps are of the same material. The whole, according to the
Chaldæans, is worth eight hundred talents of gold ... at one time the
sacred inclosure also contained a statue of massive gold twelve
cubits high. I did not see it. I content myself with repeating what the
Chaldæans told me about it. Darius, the son of Hystaspes, formed a
project to carry it off, but he did not dare to execute it. Xerxes, the
son of Darius, caused the priest to be put to death by whom the
enterprise was opposed, and took possession of the statue.” We
here have the evidence of an eye-witness. The seated statue of Bel,
without being of the colossal size ascribed by the Chaldæans to the
image destroyed by Darius, must yet, if we may judge from the
expression of Herodotus, have been larger than nature. We may
gather some notion as to its pose and general appearance from
certain figures carved upon the cylinders (Fig. 40), just as, in
Greece, the more famous and venerable of her religious statues
were reproduced upon coins and gems. As to this Babylonian statue,
the one doubt we have relates to the value put upon it by the
Chaldæans. Had the statue and its surroundings really been of
massive gold, would the Persians have spared it when the other was
overthrown and broken up? It is possible that in spite of the
historian’s assertion the work he describes was only gilded bronze.
And as for the image twelve cubits high, we may express the
same doubts. Ctesias seems to have received better information as
to how these figures were made than Herodotus, and, through
Diodorus, he tells us that they consisted of metal plates beaten into
shape with the hammer.[232] Whether Ctesias or his informants did
or did not exaggerate their true dimensions (Diodorus speaks of a
Bel forty feet high), or whether these figures were of gold or gilded
brass, is of comparatively slight importance; we are interested chiefly
in the information he gives as to the method of fabrication. Ever
since the discovery of the Balawat gates proved to what a height the
student art of Assyria carried the manipulation of metal by the
repoussé process, we have had no difficulty in believing that the
sculptors of Babylon in the time of Nebuchadnezzar could build up
images of colossal size and fine decorative effect by means of
plaques united with rivets. If we may believe the rest of Diodorus’s
description, the Chaldæan artists combined the glory of gold and
silver with the purity of ivory and the bright and varied colours of
precious stones. And all this we see good reason to admit when we
have examined at the British Museum those ivories in which lapis
lazuli and other substances of the same kind even now fill up the
hollows of the design, while the field still glitters here and there with
some last fragments of the gold with which it was once incrusted.
The skilful workmen who discovered the secret of this kind of
mosaic, may very well have learnt to combine these beautiful
materials so well that the statues upon which they were used would
even have rivalled the chryselephantine masterpieces of Phidias; in
richness and harmony of tones, at least, if not in nobility and purity of
form.
§ 6. Assyrian Sculpture.

Assyrian sculpture is far from leading us into the remote centuries


from which some of the Chaldæan works must date. It had no period
of infancy or childish effort. The Semites of the north were the pupils
of their southern brothers, from whom they obtained an art already
mature. The oldest known Assyrian monument dates from the reign
of Tiglath-Pileser I., or about the end of the twelfth century b.c.; it is a
bas-relief chiselled upon a rock near the sources of the Tigris, about
fifty miles north of Diarbekir and near the village of Korkhar. It
represents the king standing upright, his right hand extended and his
left holding a sceptre; at present, however, we only know it by the
very poor sketch given by Professor Rawlinson.[233] It is almost the
only monument extant from the time when the capital of the
monarchy was on the site now known as Kaleh-Shergat. One other
may be named, the female torso in the British Museum, to which we
have already referred;[234] on it the name of Assurbilkala, who
succeeded Tiglath-Pileser I., may be read.
The monumental history of Assyria really begins two centuries
later, with the great buildings erected by Assurnazirpal at Calah
(Nimroud), his favourite residence. Assyrian art then reached a level
that, speaking generally, it never surpassed. In the following
centuries it innovated, it became more complex and certainly more
refined, but it produced nothing essentially nobler than certain
Nimroud bas-reliefs, in which the king is seated among his great
officers or before his gods, and always in the attitude of prayer and
sacrifice. We have already given several examples of these reliefs
(Vol. I. Fig. 4, and above, Figs. 15 and 64); we may here add one
more (Fig. 113). Leaning on his bow with his left hand, the king,
richly dressed, lifts in his right the patera whose contents he is about
to pour as a libation to the deity. Facing him stands a gigantic
eunuch, who waves over his master’s head one of those fly-flappers
that, with the parasol, have always been among the insignia of
Oriental royalty (see Plate X.).
These figures are rather short in their proportions, and the
muscular development of their arms, which alone are bare, is
violently exaggerated, but yet as a whole the work has a certain
grandeur and nobility. The lines are well balanced. Both the king and
his attendant seem fully impressed with the gravity of the rite over
which they are busy. There is dignity in their attitudes, but no
stiffness; their gestures are easy and expressive without being too
much accented. In our engraving we have only been able to include
the two isolated figures, in the original there are several more all
occupied over the same rite. Even the British Museum has only a
few fragments from these vast compositions. For those who saw
them in their original completeness, well lighted and distributed in
their right order along the walls of spacious saloons, they must have
seemed majestic enough.
In his palace decorations the Assyrian artist set himself to free his
figures from all unnecessary surroundings and to simplify his theme
as much as he could. But we must make a distinction between those
reliefs that may be called historical, such as the pictures of battles
and sieges, and those in which the king is shown in the
accomplishment of some duty belonging to his position, and part of
his daily or periodical routine. It is to the latter class that the most
carefully-executed works belong. In these no particular locality is
specified; like that of the Panathenaic procession, it is left
undetermined, and the mind of the spectator is silently invited to fill it
in for himself. Those who frequented the palace were accustomed to
see the king upon his throne, or traversing the wide quadrangle, or
pouring libations on the altar that stood in front of the temple; so that
they had no difficulty in imagining all that the sculptor had left unsaid.
In the hunting pictures the same method was followed with but little
modification. A flat surface suggesting the unbroken expanse of the
desert, was the only indication of a locus in quo.
Fig. 113.—Assurnazirpal offering a libation. Height 7 feet 8
inches. British Museum.
Drawn by Saint-Elme Gautier.
It would have been difficult, or rather impossible, to adhere to
such a rule in those reliefs in which the actual incidents of military
expeditions were retraced. In them the sculptor thought it necessary
to insert such details as would permit the various episodes
commemorated to be identified. One of the simplest means of
insuring the desired result was to render not only buildings, such as
castles and fortified towns, but also the natural features of the scene,
with the greatest possible truth. This the Assyrian artist did, as a rule,
with excellent judgment. Thus, if an action or campaign had been
fought in a mountainous country, he made use of a kind of lattice-
work or reticulation, which every spectator thoroughly understood
(see Vol. I. Figs. 39 and 43); if among forests, he introduced
numerous trees among his figures. He made little attempt to
distinguish between one kind of tree and another, but in most cases
employed forms as conventional as that by which he indicated hills
(Fig. 114).

Fig. 114.—Tree on a river bank;


from Layard.
One of the chief merits and most striking features of Assyrian
sculpture is, then, its power of selection, its rejection of all that is
superfluous, its comprehension, in fact, of the true spirit and special
conditions of the art. The field has none of those encumbering
accessories which, under the pretext of furnishing and defining, only
serve, so to speak, to take away air and elbow-room from the
figures. When certain complementary features are required to make
the subject clear, the sculptor introduces them, but he never gives
more than is strictly necessary. He never gives way to the temptation
to exaggerate such details, or treats them as if they had an interest
and importance of their own. Such sobriety found its reward. His
work no doubt remained faulty in many respects and inferior to that
of his Egyptian forerunner, still more to that of his Greek successor;
but yet it had an air of frankness, of pride and dignity, to which the
more complex and superficially more skilful compositions of the
following epoch too seldom attained.
The good qualities of this early Assyrian school are no less
conspicuous in the colossal figures with which the doorways of
palaces and temples were decorated. The head of the winged bull
has nowhere a more lofty expression or one more full of dignity than
at Nimroud (see below, Fig. 133). The chisels of these northern
artists never created anything more bold, energetic, and lifelike than
the figures from the small temple built by Assurnazirpal (Vol. I. Fig.
188); we need only mention the colossal lion in the British Museum
(Plate VIII.) and the grimacing demon whom a beneficent god seems
to be expelling from the sanctuary in spite of his threats and grinning
teeth.[235]
And yet this art which is so masterly in some respects is very
primitive and naïve in others. We cannot help being amazed, for
instance, at the wide band of wedges that the scribe has been
allowed to cut across all the lines and contours left by the sculptor.
This proceeding is to be explained, of course, by the essentially
historical and anecdotic character of Assyrian art, but nevertheless it
betrays the contempt for æsthetic effect which is one of the
characteristics of archaic art in Assyria. This feature is by no means
without importance, and Sir Henry Layard seems to us to have been
ill advised in deliberately suppressing it. In his otherwise faithful
reproductions of the best preserved among the bas-reliefs of
Assurnazirpal he has everywhere left out the continuous band of
inscription which runs across them at about two-thirds of their height.
By such a proceeding he has sensibly modified their decorative
value.[236]
We must be on our guard against attributing such primitive
simplicity to inexperience in the use of the chisel. In the finest works
of later years that instrument was never wielded with more assured
skill than in the delicate carvings in which the embroidery on the
royal robes are reproduced. We have already put several of these
motives before our readers; in Fig. 115 we give a last exquisite
morsel. It shows a winged lion with the head of a woman, and a king
or priest who holds one of her paws in his left hand, while with his
right he seems to threaten her with a mace.

Fig. 115.—Detail from the royal robe of Assurnazirpal; from Layard.


Such dexterity as this is not to be seen in works in the round (see
Fig. 60). But with the reign of Assurnazirpal commences another
series of royal monuments in which the artist, not being compelled to
quit work in relief, felt himself more at home. We refer to those
round-headed steles on which the standing figure of the king is
relieved against a flat ground bordered by a raised edge. An
inscription is engraved sometimes upon the bed of the relief,
sometimes on the reverse of the stele. An effigy of Assurnazirpal
belonging to this class is now in the British Museum. It was
discovered still standing in the entrance to one of the temples built
by that sovereign on the platform of Calah. Before the stele there
was an altar similar to that shown on page 256 of our first volume.
This altar is also in the British Museum.[237] From the existence of
these steles it has been concluded, with no little probability, that the
Assyrian kings, or at least some of them, received divine honours
after their deaths. We have chosen that of Samas-vul II. for
reproduction, on account of its good condition (Fig. 116). It differs but
little from the stele of Assurnazirpal. High up in the field and in front
of the head may be noticed symbols like those on the land marks
(see Figs. 111, 112, and 143). The king’s right hand is raised in the
attitude of adoration. In his left he holds a sceptre, with a ball of ivory
or metal at one end and a tassel at the other. These steles must
have been set up in great numbers. We find them represented in the
reliefs (Vol. I. Figs. 42 and 112, and Plate XII.) and upon cylinders
(ib. Fig. 69). They were raised as a sign of annexation in conquered
countries, and an invocation engraved upon the stone put them
under the protection of the Assyrian gods, who were charged with
the punishment of any who might lay hands upon them.[238]
In the British Museum there are fragments of a sculptured obelisk
on which the wars and hunts of Assurnazirpal are figured. It is taller
than that of his son Shalmaneser II., being nearly ten feet high, but
as the material is a soft limestone, it is in far worse preservation; we
only mention it to show that Assyrian art was in possession of all its
resources in the time of this king.
Under none of the princes who reigned at Calah did sculpture
show any sensible change of style; but yet, perhaps, in certain
passages of the Balawat gates we may recognize the first signs of a
tendency that was to become strongly marked under the Sargonids.
The field of the relief there contains a far greater number of
picturesque and explanatory details than the great bas-reliefs of
Nimroud. The campaigns and victories of Shalmaneser II. was the
theme put before the sculptor. In order to do it justice he had to carry
the spectator into countries of various aspects, and to give their true
character to military struggles whose conditions were incessantly
changing. He did not think success was to be attained by confining
himself to figuring the cities and fortresses besieged and taken by
the Assyrian army; he introduced features for the purpose of
determining the seat of war. Such accessories were better placed
among figures on a small scale than among those surpassing or
even approaching life size; and without knowing exactly why, the
artist seems to have been warned of this by a secret and delicate
instinct. These strips of bronze are ten inches high; each is divided
into two horizontal divisions by a narrow band of rosettes, which is
also repeated at the top and bottom of each strip. The figures are on
an average about three and a half inches high (see Fig. 117 and
Plate XII.).
Fig. 116.—Stele of Samas-vul II. Height
7 feet 2 inches. British Museum.
Drawn by Saint-Elme Gautier.
PLATE XII

FROM THE BALAWAT GATES


British Museum
Our Plate XII. is an exact copy from a part of the band marked B
in the provisional numeration adopted by Dr. Birch.[239] According to
the inscription upon it this part of the work commemorates a sacrifice
offered by Shalmaneser on the borders of the Lake Van, in Armenia.
The figure of the king is not included in our plate, but it contains all
the sacred vessels of which he made use for the ceremony.
Beginning on the left we find a sort of great candelabrum, a three-
legged altar, and two standards upon tripods. Must these be
accepted as military ensigns of the same class as that shown in Fig.
46 or as religious emblems of the sun and moon? The question is
hardly one for us to discuss. Next comes a stele raised upon a rock,
or perhaps carved upon its surface. Other reliefs of the same series
show us that Shalmaneser erected these steles in every country he
conquered. Further to the right we see soldiers throwing into the lake
the limbs of the animals sacrificed. This must be an offering to the
deity of its waters, perhaps to Anou, who was believed to reside in
rivers and lakes as well as in the sea. The denizens of the lake seize
upon the morsels thus put in their way; among them we may
recognize a large fish, a tortoise, and a quadruped, that may
perhaps be an otter.[240]
In the lower division we see the Assyrian army on the march. On
the right Mr. Pinches recognizes a fortified camp in which horses
were left for flight in case of defeat. There is, indeed, one of these
fortified walls shown in projection, of which we have already spoken,
but the horse is placed upon a clearly indicated arch. What is this
arch doing in the middle of the camp? We ask ourselves whether this
circular structure may not be intended to represent a fortified tête-de-
pont. It is abundantly proved that the Assyrians and Chaldæans
made great use of the vault. Why should they not have employed it
for bridges elsewhere than at Babylon? and wherever there were
bridges on the great roads and near their own frontiers what could
be more natural than to defend them by works flanked, like this, with
towers? The horse would then be about to cross the bridge, and his
introduction would be explained simply by the sculptor’s desire to
give all possible clearness to a representation which could never be
complete. He seems to advance with some precaution as if the floor
of the bridge, which is indicated merely by a straight line, was made
of tree trunks or roughly squared planks badly joined. We offer this
hypothesis for what it may be worth. Next come two archers, and
then chariots. The ground must be difficult, for not only does the
driver support his horses with a tightened rein, but a man on foot
walks in front and holds them by the head.
We find a scene entirely similar but still better treated in the upper
division of another plaque (see Fig. 117).[241] Here we may see that
the chariots are progressing not without difficulty and even danger, in
the very bed of a torrent. The movement of the men who lead the
horses is well understood and skilfully rendered; we feel how
carefully they have to conduct their advance among the blocks of
stone that encumber the bed of the stream and the tumbling water
that conceals the nature of the ground. In the lower division we are
presented with one of those scenes that are so common in Assyrian
reliefs. The king in his royal robes appears on the left; a line of
prisoners guarded by archers approach him and beg for mercy, while
the foremost among them “kiss the dust beneath his feet,” to use an
oriental expression in its most literal sense.

Fig. 117.—Two fragments from the Balawat gates. British Museum. Drawn by
Saint-Elme Gautier.
We should have been willing, had it been possible, to make
further extracts from this curious series of reliefs; to have shown,
here naked prisoners defiling under the eyes of the conqueror, there
Assyrian archers shooting at the heaped-up heads of their slain
enemies. But we have perforce been content with giving, by a few
carefully chosen examples, a fair idea of the work that intervened
between the sculptors of Assurnazirpal and those of the Sargonids.
It is probable that the scheme of this vast composition was due to
a single mind; from one end to the other there is an obvious similarity
of thought and style. But several different hands must have been
employed upon its execution, which is far from being of equal merit
throughout. It is on examining the original that we are struck by these
inequalities. Thus, in some of the long rows of captives the handling
is timid and without meaning, while in others it has all the firmness
and decision of the best among the alabaster or limestone reliefs;
the muscular forms, the action of the calf and knee, are well
understood and frankly reproduced. The passages we have chosen
for illustration are among the best in this respect. Taking them all in
all these bronze reliefs are among the works that do most honour to
Assyrian art.
The only monument that has come down to us from the reign of
Vulush III., the successor of Samas-vul, is a statue, or rather a pair
of statues, of Nebo; the better of the two is reproduced in Fig. 15 of
our first volume. These sacred images are of very slight merit from
an art point of view; we should hardly have referred to them but for
their votive inscriptions. From these we learn that they were
consecrated in the Temple of Nebo by the prefect of Calah in order
to bespeak the protection of that god for the king. But the latter is not
named alone; the faithful subject says that he offers these idols “for
his master Vulush and his mistress Sammouramit.”
In this latter name it is difficult not to recognize the Semiramis of
the Greeks, and we are led to ask ourselves whether the queen of
Vulush may not have afforded a prototype for that legendary
princess. This association of a female name with that of the king is
almost without parallel either in Chaldæa or Assyria. In royal
documents, as well as in those of a more private character, there is
no more mention of the royal wives than if they did not exist. Only
one explanation can be given of the apparent anomaly, and that is
that Sammouramit, for reasons that may be easily guessed, enjoyed
a quite exceptional position. It was in those days that, from one reign
to another, the princes of Calah attempted to complete the
subjugation of Chaldæa. It may have happened that in order to put
an end to a state of never-ending rebellion, Vulush married the
heiress of some powerful and popular family of the lower country,
and, that he might be looked upon as the legitimate ruler of Babylon,
joined her name with his in the royal style and title. This hypothesis
finds some confirmation in what Herodotus tells us about Semiramis.
She was, he says, queen of Babylon five generations before Nitocris,
which would be about a century and a half. He adds that she caused
the quays of the Euphrates to be built.[242] This takes us back to
rather beyond the middle of the eighth century b.c., that is very near
to the date which Assyrian chronology would fix for the reign of
Vulush (810–781). As the last representative of the old national
dynasty, this Semiramis, associated as she was in the exercise, or at
least in the show, of sovereign power both in Assyria and Chaldæa,
would not be forgotten by her countrymen, and the population of
Babylon would be especially likely to magnify the part she had
played. There is nothing fabulous in the tradition as Herodotus gives
it, although it may, perhaps, go beyond the truth here and there.
Ctesias, however, goes much farther. He brings together and
amplifies tales which had already received many additions in the half
century that separated him from Herodotus, and he thus creates the
type of that Semiramis, the wife of Ninus and the conqueror of all
Asia, who so long held an undeserved place in ancient history.[243]
The last Calah prince who has left us anything is Tiglath-Pileser
II. (745–727). We have already described how his palace was
destroyed by Esarhaddon, who employed its materials for his own
purposes.[244] At the British Museum there are a few fragments
which have been recognized by their inscriptions as belonging to his
work (Vol. I. Fig. 26)[245]; they are quite similar to those of his
immediate predecessors.
With the new dynasty founded by Sargon at the end of the eighth
century taste changed fast enough. In those bas-reliefs in the
Khorsabad palace which represent that king’s campaigns, many
details are treated in a spirit very different from that of former days.
Trees, for instance, are no longer abstract signs standing for no one
kind of vegetation more than another; the sculptor begins to notice
their distinguishing features and to give their proper physiognomy to
the different countries overrun by the Assyrians. But these landscape
backgrounds are not to be found in all the bas-reliefs of Khorsabad.
[246]
The art of Sargon was an art of transition. While on the one hand
it endeavoured to open up new ground, on the other it travelled on
the old ways and followed many of the ancient errors; it had a
marked predilection for figures larger than nature, and bas-reliefs
treating of royal pageants and processions remind us by the
simplicity of their conception of those of Assurnazirpal. We have
already given many fragments (Vol I. Figs. 22–24, and 29), and now
we give another, a vizier and a eunuch standing before the king in
the characteristic attitude of respect (Fig. 118). The inscription which
cut the figures of Assurnazirpal so awkwardly in two has
disappeared; the proportions have gained in slenderness, and the
muscular development, though still strongly marked, has lost some
of its exaggeration. All this shows progress, and yet on the whole the
Louvre relief is less happy in its effect than the best of the Nimroud
sculptures in the British Museum. The execution is neither so firm
nor so frank; the relief is much higher and the modelling a little heavy
and bulbous in consequence. This result may also be caused to
some extent by the nature of the material, which is a softer alabaster
than was employed, so far as we know, in any other part of Assyria.
At Nimroud a fine limestone was chiefly used.
We shall be contented with mentioning the stele of Sargon, found
near Larnaca, in Cyprus, in 1845. It is most important as an historical
monument; it proves that, as a sequel to his Syrian conquests, the
terror of Sargon’s name was so widespread that even the inhabitants
of the islands thought it prudent to declare themselves his vassals,
and to set up his image as a sign of homage rendered and
allegiance sworn. But the stone is now too much broken to be of any
great interest as a work of art.[247]
The artistic masterpiece of this epoch is the bronze lion figured in
our Plate XI. It had been suggested that its use was to hold down the
cords of a tent or the lower edge of tapestries, a purpose for which
the weight of the bronze and the ring fixed in its back make it well
suited. This idea had to be abandoned, however, when a whole
series of similar figures marked with the name of Sennacherib was
found. Their execution was hardly equal to that of the lion we have
figured, but their general characteristics were the same, and they
had rings on their backs.[248]
These lions are sixteen in number; they form a series in which
the size of the animal becomes steadily smaller with each example;
the largest is a foot long, the smallest hardly more than an inch. The
decrease seems to follow a certain rule, but rust has affected them
too greatly for it to be easy to base any metrological calculation upon
their weight. But all doubt as to their use is removed by the
inscriptions in cuneiform and in ancient Aramaic characters with
which several of them are engraved. The Aramaic inscriptions all
begin with the word mine; then comes a figure indicating the number
of mines, or of subdivisions of the mine that the weight represents;
finally, there is the name of some personage, who may perhaps have
been a magistrate charged with the regulation and verification of
weights.
Fig. 118.—Bas-relief from Khorsabad. Height 9 feet 5
inches. Louvre.
With the accession of Sennacherib, a sensible change comes
over the aspect of the reliefs. What until now has been the exception
becomes the rule. On almost every slab we find a complex and
carefully treated landscape background. The artist is not satisfied
with indicating the differences between conifers, cypresses, and
pines (Vol. I. Figs, 41–43), palms (ib. Figs. 30 and 34; and above,

You might also like