Ee6304 MJ Lec 7
Myoungsoo Jung, Assistant Professor, Department of Electrical Engineering, University of Texas at Dallas
Views of Memory
Real machines have limited amounts of memory
Do you think, "oh, this computer only has 128MB so I'll write my code this way"?
What happens if you run on a different machine?
When programming, you don't care about how much real memory there is
Even if you use a lot, memory can always be paged to disk
[Figure: process address space; labels: 0-2GB, Kernel, Stack]
which unfortunately is often < 4GB and is almost never 4GB per process and is never 16 exabytes per process
Pages
Memory is divided into pages, which are nothing more than fixed sized and aligned regions of memory
Page Table
Map from virtual addresses to physical locations
Page Table implements this V→P mapping
Entry includes permissions (e.g., read-only)
[Figure: virtual addresses 0K-12K mapped through the page table to physical addresses 0K-28K; label 0xFC519]
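The virtual-to-physical mapping with per-entry permissions can be sketched in a few lines. This is only an illustration under assumed values: the table contents, the `translate` name, and the single `writable` flag are hypothetical and are not taken from the slides.

```python
PAGE_SIZE = 4 * 1024  # 4K pages, matching the 0K/4K/8K/... grid in the figure

# Hypothetical page table: virtual page number -> (physical page number, writable?)
page_table = {
    0: (2, True),
    1: (5, False),  # a read-only page
    2: (7, True),
}

def translate(vaddr, write=False):
    """Map a virtual address to a physical address, checking permissions."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        raise LookupError(f"page fault: no mapping for virtual page {vpn}")
    ppn, writable = page_table[vpn]
    if write and not writable:
        raise PermissionError(f"write to read-only virtual page {vpn}")
    # The offset within the page is unchanged by translation.
    return ppn * PAGE_SIZE + offset

print(hex(translate(0x1234)))  # virtual page 1 maps to physical page 5
```

Because pages are fixed-size and aligned, only the page number is translated; the low 12 bits pass through untouched.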
Page Tables
What is in a Page Table Entry (or PTE)?
  Pointer to actual page
  Permission bits: valid, read-only, read-write, write-only
Example: Intel x86 architecture PTE:
  Address same format previous slide (10, 10, 12-bit offset)
  Intermediate page tables called "Directories"
PTE layout (bit positions):
  31-12  Page Frame Number (Physical Page Number)
  11-9   Free (OS)
  8      0
  7      L
  6      D
  5      A
  4      PCD
  3      PWT
  2      U
  1      W
  0      P
P: Present (same as "valid" bit in other architectures)
W: Writeable
U: User accessible
PWT: Page write transparent: external cache write-through
PCD: Page cache disabled (page cannot be cached)
A: Accessed: page has been accessed recently
D: Dirty (PTE only): page has been modified recently
L: L=1 ⇒ 4MB page (directory only). Bottom 22 bits of virtual address serve as offset
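As a rough illustration of the bit layout listed above, the following sketch unpacks a 32-bit PTE value into its named fields. The `decode_pte` helper and the example value are hypothetical; only the bit positions come from the slide.

```python
# Flag bit positions as listed in the slide's x86 PTE layout.
FLAG_BITS = {"P": 0, "W": 1, "U": 2, "PWT": 3, "PCD": 4, "A": 5, "D": 6, "L": 7}

def decode_pte(pte):
    """Split a 32-bit PTE into its flag bits and page frame number."""
    fields = {name: bool((pte >> bit) & 1) for name, bit in FLAG_BITS.items()}
    fields["PFN"] = (pte >> 12) & 0xFFFFF  # bits 31-12
    return fields

# Hypothetical entry: frame 0xFC, present, writeable, user, accessed, dirty.
entry = decode_pte(0x000FC067)
print(entry)
```

Bits 11-9 are left to the OS, so a decoder like this simply ignores them.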
[Figure: address translation through the page table; labels: virtual address fields (10 bits, 12-bit offset), physical address (page #, offset), PageTablePtr, 4KB pages, 4-byte entries]
Single-Level Page Table
  Large: 4KB pages for a 32-bit address ⇒ 1M entries
  Each process needs own page table!
Multi-Level Page Table
  Can allow sparseness of page table
  Portions of table can be swapped to disk
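The size argument and the two-level alternative can be worked through numerically. The sketch below assumes the (10, 10, 12) split and 4-byte entries mentioned in the slides; the `split` and `walk` helpers, the nested-dict representation, and the example mapping are hypothetical.

```python
ENTRY_BYTES = 4
OFFSET_BITS = 12  # 4KB pages

# Single level: one entry per virtual page of a 32-bit address space.
single_level_entries = 2 ** (32 - OFFSET_BITS)            # 1M entries
single_level_bytes = single_level_entries * ENTRY_BYTES   # 4 MB per process

def split(vaddr):
    """Split a 32-bit virtual address into (directory index, table index, offset)."""
    return (vaddr >> 22) & 0x3FF, (vaddr >> 12) & 0x3FF, vaddr & 0xFFF

def walk(directory, vaddr):
    """Two-level lookup; missing entries model the sparseness of the table."""
    p1, p2, offset = split(vaddr)
    table = directory.get(p1)
    if table is None or p2 not in table:
        raise LookupError("page fault")
    return (table[p2] << OFFSET_BITS) | offset

# Hypothetical sparse mapping: only one second-level table exists.
directory = {1: {2: 0xABC}}
print(single_level_entries, hex(walk(directory, 0x402345)))
```

With only one second-level table allocated, the two-level structure here stores a single mapping without reserving space for the other 1M - 1 pages.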
TLB Review
TLBs are:
  Small: typically not more than 128-256 entries
  Fully Associative
[Figure: CPU issues a Virtual Address to the TLB. Cached? Yes: Physical Address goes to Physical Memory. No: Translate (MMU). Data Read or Write (untranslated).]
Question is one of page locality: does it exist?
  Instruction accesses spend a lot of time on the same page (since accesses sequential)
  Stack accesses have definite locality of reference
  Data accesses have less page locality, but still some
Can we have a TLB hierarchy?
  Sure: multiple levels at different sizes/speeds
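To make the page-locality point concrete, here is a minimal sketch of a small fully associative TLB in front of a page table. The slides do not say which replacement policy the TLB uses; LRU is assumed here, and the class name, the 128-entry default, and the identity-style page table are all hypothetical.

```python
from collections import OrderedDict

PAGE_SIZE = 4096

class TLB:
    """Fully associative TLB model; LRU replacement is an assumption."""

    def __init__(self, entries=128):
        self.entries = entries
        self.map = OrderedDict()  # vpn -> ppn, ordered oldest to newest use
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.map:
            self.hits += 1
            self.map.move_to_end(vpn)          # mark as most recently used
        else:
            self.misses += 1                   # would trigger a page table walk
            if len(self.map) >= self.entries:
                self.map.popitem(last=False)   # evict least recently used
            self.map[vpn] = page_table[vpn]
        return self.map[vpn]

# Sequential 4-byte instruction fetches across four pages.
page_table = {vpn: vpn + 100 for vpn in range(4)}
tlb = TLB()
for addr in range(0, 4 * PAGE_SIZE, 4):
    tlb.lookup(addr // PAGE_SIZE, page_table)
print(tlb.hits, tlb.misses)
```

Sequential fetches miss only once per page, which is the locality the slide attributes to instruction accesses.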
Implementing LRU
Have LRU counter for each line in a set
When line accessed
  Get old value X of its counter
  Set its counter to max value
  For every other line in the set
    If counter larger than X, decrement it
When replacement needed
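The counter update rule above can be sketched as follows. Two things here are assumptions rather than statements from the slides: the counters start as the permutation 0..N-1, and the replacement step picks the line whose counter is 0 (which follows from the update rule, since that line has been untouched the longest). The `LRUSet` name is hypothetical.

```python
class LRUSet:
    """Per-set LRU counters; the counters always form a permutation of 0..ways-1."""

    def __init__(self, ways):
        # Assumed initial state: line 0 is treated as least recently used.
        self.counter = list(range(ways))

    def access(self, line):
        x = self.counter[line]                      # old value X
        for other in range(len(self.counter)):
            if other != line and self.counter[other] > x:
                self.counter[other] -= 1            # close the gap left by X
        self.counter[line] = len(self.counter) - 1  # max value: most recent

    def victim(self):
        # Inferred replacement rule: counter 0 marks the least recently used line.
        return self.counter.index(0)

s = LRUSet(4)
for line in (0, 1, 0):
    s.access(line)
print(s.counter, s.victim())
```

Each access touches every counter in the set, which is one reason exact LRU is usually limited to small associativities.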
Check for pages not used recently
Mark pages as not used recently
[Figure: page table with dirty and used bits]
Replace an old page, not the oldest page
Details:
  Hardware "use" bit per physical page:
    Hardware sets use bit on each reference
    If use bit isn't set, means not referenced in a long time
  On page fault:
    Advance clock hand (not real time)
    Check use bit: 1 → used recently; clear and leave alone
                   0 → selected candidate for replacement
[Figure: clock hand sweeping use bits 0 0 1 1 0 ...]
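A minimal sketch of the clock scheme described above, assuming one use bit per physical frame. The `Clock` class and its method names are hypothetical; in a real system the hardware sets the use bit, which `reference` stands in for here.

```python
class Clock:
    """Clock (second-chance) replacement over a fixed set of physical frames."""

    def __init__(self, nframes):
        self.use = [0] * nframes
        self.hand = 0

    def reference(self, frame):
        self.use[frame] = 1  # stands in for the hardware setting the use bit

    def pick_victim(self):
        # Sweep until a frame with use bit 0 is found; the loop terminates
        # because every frame passed over has its bit cleared.
        while True:
            frame = self.hand
            self.hand = (self.hand + 1) % len(self.use)
            if self.use[frame]:
                self.use[frame] = 0  # used recently: clear and leave alone
            else:
                return frame         # not used recently: replacement candidate

c = Clock(5)
for f in (0, 1, 3):
    c.reference(f)
print(c.pick_victim(), c.pick_victim(), c.pick_victim())
```

The victim is an old page but not necessarily the oldest one, which is the trade the slide describes: one bit per page instead of full LRU ordering.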
[Figure: processor pipeline; labels: I-Cache, TLB, Memory (D-Cache), Write Reg (WB)]
0xx User segment (caching based on PT/TLB entry)
100 Kernel physical space, cached
101 Kernel physical space, uncached
11x Kernel virtual space
Allows context switching among 64 user processes without TLB flush
[Figure labels: PA, page no., offset, 10]
Machines with TLBs go one step further: they overlap TLB lookup with cache access.
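[Figure: TLB lookup in parallel with a 4K cache (1K entries of 4 bytes); the frame number (FN) from the TLB is compared against the cache tag to produce Hit/Miss and Data]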
What if cache size is increased to 8KB?
  Overlap not complete
  Need to do something else
Another option: Virtual Caches
  Tags in cache are virtual addresses
  Translation only happens on cache misses
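Why the overlap works for a 4K cache but not an 8K one comes down to counting bits: the cache index must come entirely from the page offset, which is the same in the virtual and physical address. The sketch below assumes 4KB pages and power-of-two sizes; the function name and the `ways` parameter are hypothetical. Raising associativity is a well-known remedy, but it is not one the slides list.

```python
def index_bits_needing_translation(cache_bytes, page_bytes=4096, ways=1):
    """Count cache index bits that lie above the page offset.

    Assumes power-of-two sizes. Zero means the whole index comes from
    untranslated bits, so TLB lookup and cache indexing can fully overlap.
    """
    span_bits = (cache_bytes // ways).bit_length() - 1  # index + block-offset bits
    offset_bits = page_bytes.bit_length() - 1           # 12 for 4KB pages
    return max(0, span_bits - offset_bits)

print(index_bits_needing_translation(4 * 1024))          # 4K direct-mapped
print(index_bits_needing_translation(8 * 1024))          # 8K direct-mapped
print(index_bits_needing_translation(8 * 1024, ways=2))  # 8K, 2-way
```

For the 8K direct-mapped case one index bit depends on the translation, which is the "overlap not complete" situation.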
TLB misses are significant in processor performance: most systems can't access all of 2nd level cache without TLB misses!
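The claim can be checked with simple arithmetic on TLB reach, the amount of memory covered by all TLB entries at once. The entry counts and 4KB page size come from earlier slides; the 2 MiB second-level cache size is a hypothetical figure chosen only for illustration.

```python
PAGE_BYTES = 4 * 1024

def tlb_reach(entries, page_bytes=PAGE_BYTES):
    """Bytes of memory mapped by a full TLB."""
    return entries * page_bytes

L2_BYTES = 2 * 1024 * 1024  # hypothetical L2 size, not from the slides

for entries in (128, 256):
    reach = tlb_reach(entries)
    print(entries, reach // 1024, "KiB", "covers L2" if reach >= L2_BYTES else "smaller than L2")
```

Under these assumptions even a 256-entry TLB maps 1 MiB, so data can be resident in the second-level cache while its translation is absent from the TLB.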
Caches, TLBs, Virtual Memory all understood by examining how they deal with 4 questions:
  1) Where can block be placed?
  2) How is block found?
  3) What block is replaced on miss?
  4) How are writes handled?
Today VM allows many processes to share single memory without having to swap all processes to disk;
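[Figure: device interrupt example. An external interrupt arrives in the instruction stream (add, subi, slli, lw, ...); hardware transfers control to the handler (raise priority, ..., lw r1,20(r0), ...), which ends with: Restore registers, Clear current Int, Disable All Ints, Restore priority, RTE.]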
Alternative: Polling
Disable Network Intr
        subi  r4,r1,#4
        slli  r4,r4,#2
        lw    r2,0(r4)
        lw    r3,4(r4)
        add   r2,r2,r3
        sw    8(r4),r2
        lw    r1,12(r0)
        beq   r1,no_mess
        lw    r1,20(r0)      "Handler"
        lw    r2,0(r1)
        addi  r3,r0,#5
        sw    0(r1),r3
        Clear Network Intr
no_mess:
Interrupts good for infrequent/irregular events
Interrupts good for ensuring regular/predictable service of events.
Trap/Interrupt classifications
Traps: relevant to the current process
  Faults, arithmetic traps, and synchronous traps
  Invoke software on behalf of the currently executing process
Interrupts: caused by asynchronous, outside events
  I/O devices requiring service (DISK, network)
  Clock interrupts (real time scheduling)
Machine Checks: caused by serious hardware failure
  Not always restartable
  Indicate that bad things have happened.
    Non-recoverable ECC error
    Machine room fire
    Power outage
Interrupt Controller
[Figure: Timer, Network, and Software Interrupt lines pass through the Interrupt Mask and Priority Encoder, which present IntID and Interrupt to the CPU; other labels: Int Disable, Control, NMI]
Interrupts invoked with interrupt lines from devices
Interrupt controller chooses interrupt request to honor
  Mask enables/disables interrupts
  Priority encoder picks highest enabled interrupt
  Software Interrupt Set/Cleared by Software
  Interrupt identity specified with ID line
CPU can disable all interrupts with internal flag
Non-maskable interrupt line (NMI) can't be disabled
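The selection logic described above (mask, priority encoder, CPU-wide disable flag, NMI) can be modeled in a few lines. This is a behavioral sketch, not a description of any particular controller: the class, the line numbering, and the rule that a lower line number means higher priority are all assumptions.

```python
class InterruptController:
    """Behavioral model: mask + priority encoder + non-maskable line."""

    def __init__(self, nlines):
        self.pending = [False] * nlines
        self.enabled = [True] * nlines   # interrupt mask, one bit per line
        self.nmi = False                 # non-maskable interrupt line

    def select(self, cpu_int_disable=False):
        """Return 'NMI', the IntID of the request to honor, or None."""
        if self.nmi:
            return "NMI"                 # cannot be masked or disabled
        if cpu_int_disable:
            return None                  # CPU's internal flag blocks the rest
        for int_id, (p, e) in enumerate(zip(self.pending, self.enabled)):
            if p and e:
                return int_id            # assumed: lowest ID = highest priority
        return None

# Hypothetical line assignment: 0 = timer, 1 = network, 2 = software interrupt.
ic = InterruptController(3)
ic.pending[1] = True
ic.pending[2] = True
print(ic.select())
ic.enabled[1] = False
print(ic.select())
print(ic.select(cpu_int_disable=True))
```

Setting and clearing `pending[2]` from program code plays the role of the software interrupt in the slide.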
Raise priority
Reenable All Ints
Save registers
        lw    r1,20(r0)
        lw    r2,0(r1)
        addi  r3,r0,#5
        sw    0(r1),r3
Restore registers
Clear current Int
Disable All Ints
Restore priority
RTE
Precise Interrupts/Exceptions
An interrupt or exception is considered precise if there is a single instruction (or interrupt point) for which:
  All instructions before that have committed their state
  No following instructions (including the interrupting instruction) have modified any state.
This means that you can restart execution at the interrupt point and "get the right answer"
Implicit in our previous example of a device interrupt:
  Interrupt point is at first lw instruction
[Figure: External Interrupt arriving in the instruction stream; control passes to the Int handler]
        add   r1,r2,r3
        subi  r4,r1,#4
        slli  r4,r4,#2
        (External Interrupt)
        lw
        lw
        add
        sw
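The restart property of a precise interrupt can be demonstrated with a toy interpreter: stop cleanly at the interrupt point, then resume from that same point and compare against an uninterrupted run. Everything here is illustrative; the `run` function, the tuple encoding of instructions, and the initial register values are hypothetical, and only the three register instructions from the example are modeled.

```python
def run(program, regs, start=0, interrupt_at=None):
    """Execute instructions in order; stop before index interrupt_at.

    Returns the index of the next instruction to execute (the interrupt point).
    """
    pc = start
    while pc < len(program):
        if pc == interrupt_at:
            return pc  # everything before pc committed; nothing at or after pc ran
        dest, fn = program[pc]
        regs[dest] = fn(regs)
        pc += 1
    return pc

program = [
    ("r1", lambda r: r["r2"] + r["r3"]),  # add  r1,r2,r3
    ("r4", lambda r: r["r1"] - 4),        # subi r4,r1,#4
    ("r4", lambda r: r["r4"] << 2),       # slli r4,r4,#2
]
initial = {"r1": 0, "r2": 10, "r3": 6, "r4": 0}

straight = dict(initial)
run(program, straight)

resumed = dict(initial)
point = run(program, resumed, interrupt_at=2)  # interrupt before slli
run(program, resumed, start=point)             # handler done; restart at the point
print(straight == resumed)
```

Because the state at the interrupt point reflects exactly the instructions before it, saving one index is enough to restart correctly.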
Key observation: architected state only change in memory and register write stages.
Interrupt point described as: <PC+4,there> (branch was taken) or <PC+4,PC+8> (branch was not taken)
On SPARC, interrupt hardware produces "pc" and "npc" (next pc)
On MIPS, only "pc": must fix point in software
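The two-PC description can be written as a small function. This sketch assumes the interrupted instruction sits in the delay slot of a branch located at address PC, with 4-byte instructions; that reading, the function name, and the example addresses are assumptions.

```python
def interrupt_point(branch_pc, taken, target):
    """Return (pc, npc) for an interrupt in the delay slot of a branch.

    pc is the delay-slot instruction to restart; npc is the instruction
    that must follow it, which depends on the branch outcome.
    """
    pc = branch_pc + 4
    npc = target if taken else branch_pc + 8
    return pc, npc

# Hypothetical addresses: branch at 0x100, target label 'there' at 0x200.
print([hex(a) for a in interrupt_point(0x100, True, 0x200)])
print([hex(a) for a in interrupt_point(0x100, False, 0x200)])
```

A single saved pc cannot distinguish the two cases, which is why software has to reconstruct the point when the hardware records only one address.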
$\frac{\sin(x)}{x}$
Restartability doesn't require preciseness. However, preciseness makes it a lot easier to restart.
Simplify the task of the operating system a lot
  Less state needs to be saved away if unloading process.
  Quick to restart (making for fast interrupts)
Summary: Interrupts
Interrupts and Exceptions either interrupt the current instruction or happen between instructions
  All instructions before that point have completed
  No instructions after or including that point have completed