
Memory Models for HPC

David Chisnall
February 14, 2011
The Memory Hierarchy: Motivation
Fast memory is expensive
Slow memory is cheap
Example Memory Hierarchy
Location        Size    Access Time (cycles)
Registers       < 1KB   1
L1 Data Cache   32KB    4
L2 Cache        256KB   12
L3 Cache        2MB+    26-32
Main Memory     1GB+    150+
Cache and Working Sets
Most programs have a small working set
Each part of the program works on a small amount of data
Storing just the working set in fast memory gives almost the same speed as storing everything in fast memory
(But is much cheaper)
What goes in Cache?
Memory that was accessed recently
Memory near memory that was accessed recently
Memory that will be accessed soon (hopefully!)
Cache Design
Organised in cache lines, typically 64 bytes (8-16 pointer-sized values)
Simple hash table indexed by some bits in the address
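
As a rough illustration of that indexing, here is how an address might map to a cache set; the 64-byte line and 64-set geometry are assumed example values, not any particular CPU's layout:

#include <stdint.h>

/* Illustrative only: how a cache might pick a set for an address.
   Line size (64 bytes) and set count (64) are assumed example values. */
#define LINE_SIZE 64
#define NUM_SETS  64

unsigned cache_set_for(uintptr_t addr)
{
    /* The low bits select the byte within the line; the next
       log2(NUM_SETS) bits select the set.  Addresses that differ
       only above these bits collide in the same set. */
    return (addr / LINE_SIZE) % NUM_SETS;
}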
Why do we care?
Cache is from the French meaning "hidden": programmers don't normally see it
We do, because we care about performance!
Line size means that it's usually very fast to access something within 32 bytes of the last thing we accessed
Hash conflicts mean that we can evict stuff from cache too soon
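
A hedged sketch of that conflict problem: accesses a power-of-two stride apart can all land in the same set and evict each other, even when the rest of the cache is idle. The 4KB stride is an assumption matching the set geometry sketched above:

/* Illustrative: these accesses are 4KB apart, so with a set index
   taken from address bits 6-11 they all map to the same set and
   fight over its few ways, even though the rest of the cache is
   empty.  `buf` must be at least 16 * 4096 bytes. */
void touch_conflicting(volatile char *buf)
{
    for (int i = 0; i < 16; i++)
        buf[i * 4096] += 1;   /* same set index every time */
}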
Explicit Prefetching
Working out what data will be used next is hard!
The CPU tries to guess based on linear access (sometimes)
The compiler can usually make better guesses
The programmer can make even better ones
Prefetching Example

for (int i=0 ; i<pixelCount ; i++)
{
    __builtin_prefetch(&pixels[i+1]);
    pixels[i].r *= 1.1;
    pixels[i].g *= 1.1;
    pixels[i].b *= 1.1;
}

__builtin_prefetch() is a GCC extension
Expands to a prefetch instruction
This prefetches the pixel needed for the next loop iteration
Is This Sensible?

__builtin_prefetch(&pixels[i+1]);

A pixel is 1/4 of a cache line (16 bytes)
The next pixel is in the current cache line 3/4 of the time
The prefetch will take 12+ cycles to complete
We aren't giving it that long
Prefetching pixel i+4 is better
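
A minimal revision of the earlier loop along those lines, assuming the same pixels array: fetching one cache line (4 pixels) ahead gives the prefetch time to complete before the data is needed:

for (int i=0 ; i<pixelCount ; i++)
{
    /* Fetch a full cache line ahead so the data has 12+ cycles
       to arrive before the loop reaches it. */
    __builtin_prefetch(&pixels[i+4]);
    pixels[i].r *= 1.1;
    pixels[i].g *= 1.1;
    pixels[i].b *= 1.1;
}

Prefetching past the end of the array wastes a hint but is otherwise harmless.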
Cache Optimizing: Hopscotch Hash Algorithm
1. Try to insert the value at its hash bucket
2. If it's full, linearly scan up to 32 elements along for a spare slot; insert if possible
3. If not, see if any of the 32 values can be moved to a secondary location, and insert in the freed slot
4. If not, resize the table and retry (see the sketch after this list)
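
A minimal sketch of this insert path in C, assuming an open-addressed table and an illustrative hash function; real implementations also keep a per-bucket bitmap of neighbourhood membership, and wrap-around at the end of the array is ignored here to keep things short:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define H 32   /* neighbourhood size: a value lives within H-1 slots of home */

struct entry { uint64_t key; uint64_t value; bool used; };

/* Illustrative hash; any decent integer hash works. */
static size_t hash_key(uint64_t key, size_t size)
{
    return (key * 0x9E3779B97F4A7C15ULL) % size;
}

/* Returns false when the table must be resized and the insert
   retried (step 4). */
bool hopscotch_insert(struct entry *t, size_t size,
                      uint64_t key, uint64_t value)
{
    size_t home = hash_key(key, size);

    /* Steps 1-2: linear scan from the home bucket for a free slot. */
    size_t free_slot = home;
    while (free_slot < size && t[free_slot].used)
        free_slot++;
    if (free_slot == size)
        return false;

    /* Step 3: while the free slot is outside the neighbourhood, move
       an earlier entry whose own home bucket is within H of the free
       slot into it, freeing a slot closer to our home bucket.  Every
       slot between home and free_slot is occupied at this point. */
    while (free_slot - home >= H) {
        bool moved = false;
        for (size_t c = free_slot - (H - 1); c < free_slot; c++) {
            if (free_slot - hash_key(t[c].key, size) < H) {
                t[free_slot] = t[c];      /* displace the candidate */
                t[c].used = false;
                free_slot = c;
                moved = true;
                break;
            }
        }
        if (!moved)
            return false;                 /* step 4: resize and retry */
    }

    t[free_slot].key = key;
    t[free_slot].value = value;
    t[free_slot].used = true;
    return true;
}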
Hopscotch Hash Performance
Values always within 32 slots of the first lookup position
Usually in same cache line
Always within 2 cache lines
Very fast!
Also well-suited to concurrency
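
The matching lookup (reusing the names from the sketch above) shows where that bound comes from: the probe never leaves the 32-slot neighbourhood of the home bucket:

bool hopscotch_lookup(struct entry *t, size_t size,
                      uint64_t key, uint64_t *value)
{
    size_t home = hash_key(key, size);
    /* Bounded probe: at most H consecutive slots are examined, so
       only a handful of adjacent cache lines are ever touched. */
    for (size_t i = home; i < size && i < home + H; i++) {
        if (t[i].used && t[i].key == key) {
            *value = t[i].value;
            return true;
        }
    }
    return false;
}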
The Multicore Problem
Each core has its own cache
Two threads try to access the same data
What happens?
Cache Coherency
Two cores reading the same data: both have a copy in cache
One tries to modify, and must invalidate the other cache first
Getting data from another core's cache typically takes 40-60 cycles
For best performance, threads should operate on different data
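
One standard way to follow that advice is to pad per-thread data to cache-line boundaries, so a write by one thread never invalidates a line another thread is using ("false sharing"). A minimal sketch; the 64-byte line size and the thread count are assumptions:

#include <stdalign.h>
#include <stdint.h>

#define NUM_THREADS 8                 /* example value */

/* Each counter occupies its own 64-byte cache line, so updates by
   one thread never invalidate another thread's copy. */
struct per_thread_counter {
    alignas(64) uint64_t count;
};

struct per_thread_counter counters[NUM_THREADS];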
Going Further: NUMA
Other parts of the memory hierarchy see the same problem
Each CPU has local memory, and accesses other processors' memory indirectly
Common example: Modern x86 chips
Extreme example: SGI Altix
Altix Pointers
36-bit node offset (64GB per node)
11-bit node ID
Main memory is used to cache remote nodes' memory
Cache coherency problem across nodes, not just cores
Similar problem on all single-system-image clusters
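
An illustrative decomposition of such an address in C; the field positions follow the sizes above, but treat them as assumptions rather than the real Altix layout:

#include <stdint.h>

static unsigned node_id(uint64_t addr)
{
    return (unsigned)(addr >> 36) & 0x7FF;   /* 11-bit node ID */
}

static uint64_t node_offset(uint64_t addr)
{
    return addr & ((1ULL << 36) - 1);        /* 36-bit offset: 64GB per node */
}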
Look, No Cache: Cell
Cell SPU cores have 256KB of local memory
Same idea as cache, but explicitly managed
Programmer copies between it and main memory
Tries to do work entirely in local memory
Very fast, not so easy to use
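
A sketch of that explicit-management style, using hypothetical dma_get()/dma_put()/dma_wait() helpers standing in for the real SPU DMA interface (the actual Cell API differs); the 4KB chunk size is an assumption, and the tail when the buffer is not a multiple of it is ignored:

#include <stdint.h>

#define CHUNK 4096                       /* assumed tile size */

/* Hypothetical helpers standing in for the SPU's DMA engine:
   dma_get() copies from main memory into the local store, dma_put()
   copies back, dma_wait() blocks until the transfer completes. */
extern void dma_get(void *local, uint64_t main_addr, unsigned bytes);
extern void dma_put(uint64_t main_addr, const void *local, unsigned bytes);
extern void dma_wait(void);

void process_chunk(char *local);         /* work done entirely locally */

void process_buffer(uint64_t main_addr, unsigned bytes)
{
    static char local[CHUNK];            /* lives in the 256KB local store */
    for (unsigned off = 0; off < bytes; off += CHUNK) {
        dma_get(local, main_addr + off, CHUNK);
        dma_wait();
        process_chunk(local);            /* no main-memory traffic here */
        dma_put(main_addr + off, local, CHUNK);
        dma_wait();
    }
}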
No Cache: Stream Processors
Caches work well with certain categories of problem
Very badly with others
Some algorithms run on entire data set...
...but in a predictable sequence
Stream processors are optimised to deliver the entire contents of memory to the program quickly, in order
Typically also have some local memory
Examples: Most DSPs, GPUs
Questions / Feedback?
Is the speed okay? Too fast? Too slow?
Is the material too abstract? Too specific?