and Electronics Ⅱ
Jubee Tada
Graduate School of Science and Engineering,
Yamagata University
Tel:0238-26-3576
E-mail:jubee@yz.yamagata-u.ac.jp
Effects of Memory Hierarchy
A Problem with improving performance by using parallelism
Solution:
• Cache Memory
Relationship between processor and main memory
• The processor reads instructions and necessary data
from memory and writes the results to memory.
• Without data transfer to and from memory, the
processor cannot operate.
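[Figure: processor with a register file (registers 0 to 15, register 1 holding Variable A) and an ALU, connected to a memory that holds instructions (Instruction 1 at address 0, Instruction 2 at address 4) and data (Variable A at address 100, Variable B at address 104).]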
Changes in processor performance and memory access times
[Figure: processor and memory performance by year, 1980 to 2012, on a logarithmic scale from 1 to 100000.]
To fill the performance gap
Memory hierarchy
• Combination of small-capacity, high-speed …
[Figure: memory hierarchy, with the labels Processor and High-speed.]
Behavior of cache read
• Block
– Unit of data exchange in a cache system
• Data exchange in a cache system
– The processor sends the address of the block containing the required data to the cache.
– If the block exists in the cache, it is sent to the processor.
→ Cache hit
– If it does not exist, the block is read from main memory and stored in the cache.
→ Cache miss
[Figure: processor, cache memory, and main memory.]
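The read behavior above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `BLOCK_SIZE`, the list-based `main_memory`, and the dictionary used as a cache with unlimited capacity are hypothetical choices, not part of the slides.

```python
# Sketch of a cache read: hit -> return data, miss -> fetch the block first.
# Assumptions: 4 words per block, unlimited cache capacity (no replacement).
BLOCK_SIZE = 4

main_memory = list(range(64))  # word address -> value (value equals address)
cache = {}                     # block number -> list of words in that block


def read(address):
    """Return (value, "hit" or "miss") for a word address."""
    block = address // BLOCK_SIZE
    if block in cache:
        outcome = "hit"
    else:
        # Cache miss: read the whole block from main memory and store it.
        start = block * BLOCK_SIZE
        cache[block] = main_memory[start:start + BLOCK_SIZE]
        outcome = "miss"
    return cache[block][address % BLOCK_SIZE], outcome


trace = [read(a) for a in (8, 9, 10, 20, 8)]
print(trace)
```

Addresses 8, 9, and 10 fall in the same block, so only the first of them misses.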
Memory stall
[Figure: processor, cache, and main memory. A cache access takes 1 cycle; a main memory access takes several to hundreds of cycles.]
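One common way to estimate the cost of memory stalls is to weight the miss penalty by the miss rate. The sketch below assumes a 1-cycle hit and, purely as example values, a 5% miss rate and a 100-cycle miss penalty; the function names are hypothetical.

```python
def average_access_cycles(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average cycles per memory access: hit time plus the expected miss cost."""
    return hit_cycles + miss_rate * miss_penalty_cycles


def total_stall_cycles(num_accesses, miss_rate, miss_penalty_cycles):
    """Cycles the processor spends waiting on main memory."""
    return num_accesses * miss_rate * miss_penalty_cycles


# Assumed example values: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
avg = average_access_cycles(1, 0.05, 100)
stall = total_stall_cycles(1000, 0.05, 100)
print(avg, stall)
```

Even with a small miss rate, the miss penalty dominates the average access time in this example.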
Behavior at a cache hit/miss
[Figure: the processor requests the data at address 0x2000. On a hit, the cache already holds the block for 0x2000 and returns the data. On a miss, the cache does not hold the block, so it is read from main memory (addresses 0x0000 to 0xFFFF) and the data is passed to the processor.]
Handling instructions and data
• Unified cache
– It stores instructions and data in one cache.
– It can reduce hardware costs.
• Split cache
– It stores instructions and data in separate caches.
– It can avoid structural hazards.
– Harvard architecture
[Figure: a unified instruction/data cache and a split configuration with an instruction cache.]
How to choose a storage location
Main memory:
Address    Data
000...000  0
000...001  9
000...010  3
000...011  5
...
011...010  1
011...011  2
...
101...100  4
101...101  7
...
111...100  6
111...101  9
111...110  2
111...111  5

Cache:
Index  Data
000    0
001    9
010    1
011    2
100    4
101    7
110    2
111    5

• Find the index corresponding to the address and store the data at that location.
Equations for the index
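A minimal sketch of the usual index relation for a direct-mapped cache: index = (block address) mod (number of cache entries). When the number of entries is a power of two, this equals the low-order bits of the address. The 8-entry size and the function names are assumptions chosen for illustration.

```python
NUM_ENTRIES = 8  # assumed: 8 entries, so the index is 3 bits wide


def cache_index(block_address, num_entries=NUM_ENTRIES):
    """Index = block address modulo the number of cache entries."""
    return block_address % num_entries


def cache_index_bits(block_address, num_entries=NUM_ENTRIES):
    """Same result using a bit mask; valid only for power-of-two sizes."""
    return block_address & (num_entries - 1)


# 0b01101 and 0b10101 share the low three bits 101, so they share an index.
idx_a = cache_index(0b01101)
idx_b = cache_index(0b10101)
print(format(idx_a, "03b"), format(idx_b, "03b"))
```

Two different addresses that share their low-order bits compete for the same cache entry.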
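[Figure: direct-mapped cache with 1024 entries (index 0 to 1023). A 10-bit index selects an entry, the stored 20-bit tag is compared (=) with the tag of the address to produce Hit, and 32 bits of data are read out.]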
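The structure above can be sketched as follows, assuming a 32-bit byte address split into a 20-bit tag, a 10-bit index, and a 2-bit byte offset (32 − 20 − 10 = 2). The class `DirectMappedCache`, the dictionary used as main memory, and the sample addresses are hypothetical.

```python
TAG_BITS, INDEX_BITS, OFFSET_BITS = 20, 10, 2  # assumed 32-bit address layout


def split_address(addr):
    """Split a 32-bit byte address into (tag, index, byte offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DirectMappedCache:
    def __init__(self):
        n = 1 << INDEX_BITS  # 1024 entries
        self.valid = [False] * n
        self.tags = [0] * n
        self.data = [0] * n

    def read(self, addr, memory):
        """Return (word, hit). `memory` maps word-aligned addresses to words."""
        tag, index, _ = split_address(addr)
        if self.valid[index] and self.tags[index] == tag:
            return self.data[index], True  # valid entry and matching tag
        word = memory.get(addr & ~0x3, 0)  # miss: fetch the aligned word
        self.valid[index], self.tags[index], self.data[index] = True, tag, word
        return word, False


memory = {0x1000: 111, 0x2000: 222}  # both addresses map to index 0
cache = DirectMappedCache()
results = [cache.read(a, memory)[1] for a in (0x1000, 0x1000, 0x2000, 0x1000)]
print(results)
```

The last access misses again because 0x2000 replaced the entry that 0x1000 had been using.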
Behavior at loading
Problems with the write-through method
Storing methods of a cache
• Write-through
– Writes to both the cache and main memory
• Write buffer
– A device that temporarily stores writes
• Write-back
– Writes a block back to main memory only when it is selected for replacement
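The difference between the two write policies can be illustrated by counting main-memory writes. This sketch assumes a tiny direct-mapped cache with one-word blocks, allocation on a write miss, and a final flush of dirty blocks for write-back; the function name and the address trace are hypothetical.

```python
def count_memory_writes(write_addresses, policy, num_entries=4):
    """Count writes that reach main memory for a sequence of store addresses."""
    tags = [None] * num_entries
    dirty = [False] * num_entries
    memory_writes = 0
    for addr in write_addresses:
        index, tag = addr % num_entries, addr // num_entries
        if tags[index] != tag:
            # Replacement: write-back must first save a modified block.
            if policy == "write-back" and dirty[index]:
                memory_writes += 1
            tags[index] = tag
            dirty[index] = False
        if policy == "write-through":
            memory_writes += 1   # every store also goes to main memory
        else:
            dirty[index] = True  # only the cache copy is updated
    if policy == "write-back":
        memory_writes += sum(dirty)  # assumed final flush of dirty blocks
    return memory_writes


trace = [0, 0, 0, 1, 1, 4, 0]
through = count_memory_writes(trace, "write-through")
back = count_memory_writes(trace, "write-back")
print(through, back)
```

Repeated stores to the same block are absorbed by the cache under write-back, which is why it writes to main memory less often here.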
Write buffer
→ L1 cache
• The next-level cache after the L1 cache
→ L2 cache
• The cache closest to main memory
→ LLC (Last Level Cache)
[Figure: levels in the memory hierarchy (Level 2, Level 3, ..., Level n (LLC)), with axes labeled access time (long) and memory capacity.]
Methods for improving cache performance
Cold-start miss
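Structure of a cache (large block size)
[Figure: a 32-bit address (bits 31 to 0) split into Tag, Index, Block offset, and Byte offset. The cache has 1024 entries (index 0 to 1023); the stored 18-bit tag is compared (=) with the tag of the address to produce Hit, and 32 bits of data are selected from the block.]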
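A sketch of why larger blocks help sequential access. The address split assumes an 18-bit tag, a 10-bit index, and the remaining 4 bits divided into a 2-bit block offset and a 2-bit byte offset (four words per block); that division and the function names are assumptions for illustration.

```python
def split_address(addr):
    """Split a 32-bit byte address into (tag, index, block offset, byte offset)."""
    byte_offset = addr & 0x3
    block_offset = (addr >> 2) & 0x3  # selects one of 4 words in the block
    index = (addr >> 4) & 0x3FF       # 10 bits -> 1024 entries
    tag = addr >> 14                  # remaining 18 bits
    return tag, index, block_offset, byte_offset


def miss_rate(word_addresses, words_per_block, num_blocks=1024):
    """Miss rate of a direct-mapped cache for a sequence of word addresses."""
    tags = [None] * num_blocks
    misses = 0
    for addr in word_addresses:
        block = addr // words_per_block
        index, tag = block % num_blocks, block // num_blocks
        if tags[index] != tag:
            misses += 1
            tags[index] = tag
    return misses / len(word_addresses)


sequential = list(range(4096))   # walk through 4096 consecutive words
rate_1 = miss_rate(sequential, 1)
rate_4 = miss_rate(sequential, 4)
print(rate_1, rate_4)
```

With four-word blocks, one miss brings in the next three words as well, so only one access in four misses on this sequential pattern.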
Effects of block size on a cache miss rate
David A. Patterson / John L. Hennessy, “Computer Organization and Design, Fifth Edition: The Hardware/Software Interface”
Capacity miss
[Figure: cache entries 3 to 6 hold A[3] to A[6], while A[45], A[46], … are further elements of the same array.]
[Figure: cache entries 3 to 7 and memory blocks 3, 202, 203, and 402.]
• Solution:
– Adopt a set-associative cache
• Increases the hardware cost
• Increases the access time of the cache
Reducing the cache miss rate using set associativity
Fully associative
[Figure: fully associative cache with eight Tag/Data entries.]
Structure of a cache (2-way set associative)
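[Figure: a 32-bit address (bits 31 to 0) split into a 21-bit Tag, a 9-bit Index, and a Byte offset. The cache has 512 sets (index 0 to 511), each with two entries of Valid, Tag, and Data. Both tags are compared (=) with the tag of the address, and a 2-to-1 multiplexer selects the 32-bit Data of the matching entry. Outputs: Hit and Data.]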
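The effect of two ways per set can be sketched with a small simulation. It assumes one-word (4-byte) blocks and LRU replacement, and compares a direct-mapped cache with 1024 entries against a 2-way cache with 512 sets, so both hold the same number of blocks. The function name and the address pattern are hypothetical.

```python
def count_misses(addresses, num_sets, ways):
    """Count misses for byte addresses with LRU replacement inside each set."""
    sets = [[] for _ in range(num_sets)]  # tags per set, most recent last
    misses = 0
    for addr in addresses:
        block = addr >> 2                 # drop the 2-bit byte offset
        index, tag = block % num_sets, block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.remove(tag)             # hit: re-appended below as most recent
        else:
            misses += 1
            if len(lines) == ways:
                lines.pop(0)              # evict the least recently used tag
        lines.append(tag)
    return misses


# Two addresses that map to the same index in both configurations.
pattern = [0x0000, 0x1000] * 4
direct = count_misses(pattern, num_sets=1024, ways=1)
two_way = count_misses(pattern, num_sets=512, ways=2)
print(direct, two_way)
```

In the direct-mapped case the two blocks keep evicting each other; with two ways they can reside in the same set at once.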
Effects of associativity on cache miss rate
David A. Patterson / John L. Hennessy, “Computer Organization and Design, Fifth Edition: The Hardware/Software Interface”
Trends of Microprocessors
History of the Intel Core i processor
Extensions of SIMD instructions
• AVX-512
– Register width extended from 256 bits (AVX2) to 512 bits
– With single-precision (32-bit) data, 16 operations can be performed simultaneously.
– Number of registers increased from 16 to 32
– Addition of various new instructions
– Predication support
• A mask register selects, element by element, whether an operation is executed.
• As with ARM's conditional execution, code that contains conditional decisions can be implemented without branch instructions.
– Not supported on Intel's 12th-generation Core (Alder Lake)
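The idea of predication can be illustrated by emulating one masked vector addition in plain Python. This is not real SIMD code and uses no actual intrinsics; `masked_add`, the lane values, and the choice to keep the old value in disabled lanes are assumptions made for illustration.

```python
LANES = 512 // 32  # 16 single-precision lanes in one 512-bit register


def masked_add(a, b, mask, passthrough):
    """Lane i becomes a[i] + b[i] if bit i of mask is 1, else passthrough[i]."""
    return [x + y if (mask >> i) & 1 else p
            for i, (x, y, p) in enumerate(zip(a, b, passthrough))]


a = list(range(LANES))
b = [100] * LANES

# Scalar code with a branch:  if a[i] % 2 == 0: a[i] += b[i]
# Predicated form: build a mask from the condition, then do one masked add.
mask = sum(1 << i for i, x in enumerate(a) if x % 2 == 0)
result = masked_add(a, b, mask, a)
print(hex(mask), result)
```

The condition is turned into data (the mask), so every lane goes through the same instruction and no per-element branch is needed.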
big.LITTLE
Increase in cache capacity
David A. Patterson / John L. Hennessy, “Computer Organization and Design, Fifth Edition: The Hardware/Software Interface”
Yield and die cost
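\[
\text{Dies per wafer} \approx \frac{\text{Area of a wafer}}{\text{Area of a die}}
\]
\[
\text{Yield} = \frac{1}{\left(1 + \text{Defects per unit area} \times \dfrac{\text{Area of a die}}{2}\right)^{2}}
\]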
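The two relations on this slide can be evaluated numerically. The sketch below assumes a circular wafer 30 cm in diameter and 0.5 defects per cm²; these numbers and the function names are example assumptions, not values from the slides.

```python
import math


def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Approximate die count: wafer area divided by die area (edge loss ignored)."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    return wafer_area / die_area_cm2


def die_yield(defects_per_cm2, die_area_cm2):
    """Yield = 1 / (1 + defects per unit area * die area / 2)^2."""
    return 1.0 / (1.0 + defects_per_cm2 * die_area_cm2 / 2.0) ** 2


# Assumed example: 30 cm wafer, 0.5 defects per cm^2, die areas of 1 and 2 cm^2.
yield_small = die_yield(0.5, 1.0)
yield_large = die_yield(0.5, 2.0)
good_small = dies_per_wafer(30, 1.0) * yield_small
good_large = dies_per_wafer(30, 2.0) * yield_large
print(round(yield_small, 3), round(yield_large, 3))
print(round(good_small), round(good_large))
```

Doubling the die area halves the number of dies per wafer and also lowers the yield, so the number of good dies falls by more than half.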
Silicon TSV
NVIDIA H100
Conclusions