MEMORY MANAGEMENT
MAIN MEMORY
Memory is central to the operation of a modern computer system. Memory consists of a large
array of bytes, each with its own address.
The CPU fetches instructions from memory according to the value of the program counter.
These instructions may cause additional loading from and storing to specific memory addresses.
Basic Hardware
All memory accesses are equivalent; i.e., the memory hardware does not know what a particular
part of memory is being used for.
The CPU can only access its registers and main memory. It cannot, for example, make direct
access to the hard drive, so any data stored there must first be transferred into the main
memory chips before the CPU can work with it.
Memory accesses to registers are very fast, generally one clock tick, and a CPU may be able to
execute more than one machine instruction per clock tick.
Memory accesses to main memory are comparatively slow, and may take a number of clock
ticks to complete. The basic idea of the cache is to transfer chunks of memory at a time from the
main memory to the cache, and then to access individual memory locations one at a time from
the cache.
User processes must be restricted so that they only access memory locations that "belong" to
that particular process. Every memory access made by a user process is checked against two
registers, the base register and the limit register; if a memory access is attempted outside the
valid range, a fatal error is generated.
Operating Systems 2
The base register holds the smallest legal physical memory address; the limit register specifies
the size of the range.
Hardware Address Protection
Figure 3.2 Hardware address protection with base and limit registers.
Protection of memory space is accomplished by having the CPU hardware compare every
address generated in user mode with the registers. Any attempt by a program executing in user mode
to access operating-system memory or other users’ memory results in a trap to the operating
system, which treats the attempt as a fatal error.
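The base-and-limit check described above can be sketched as a small predicate (a minimal sketch; the register values in the example are hypothetical):

```python
def legal_access(addr, base, limit):
    # The hardware traps unless base <= addr < base + limit.
    return base <= addr < base + limit

# Hypothetical register values: base = 1000, limit = 500
assert legal_access(1000, 1000, 500)      # first legal address
assert legal_access(1499, 1000, 500)      # last legal address
assert not legal_access(1500, 1000, 500)  # one past the range: trap
```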
Address binding
In general, a program must be brought into memory to be run. The collection of processes on the
disk waiting to be brought into memory for execution is called the input queue. User programs go
through several steps before being run. Address binding is defined as the mapping of instructions and
data from one address to another address in memory.
Three different stages of binding are:
Compile time: Absolute code can be generated if the memory location is known in advance.
Load time: Relocatable code must be generated if the memory location is not known at compile time.
Execution time: If the process can be moved during its execution from one memory segment
to another, then binding must be delayed until run time. This needs hardware support for address
maps (e.g., base and limit registers).
Dynamic Loading
Advantage
A routine is loaded only when it is needed. This method is particularly useful when large
amounts of code are needed to handle infrequently occurring cases, such as error routines.
Dynamic loading does not require special support from the OS. It is the responsibility of the
users to design their programs.
Dynamic Linking
Dynamically linked libraries are system libraries that are linked to user programs when the
programs are run.
The concept of dynamic linking is similar to that of dynamic loading. Here, the linking is done
at execution time.
With dynamic linking, a stub is included in the image for each library-routine reference.
The stub is a small piece of code that indicates how to locate the appropriate memory-resident
library routine or how to load the library if the routine is not already present.
When the stub is executed, it checks to see whether the needed routine is already in
memory.
If not, the program loads the routine into memory.
This feature can be extended to library updates (such as bug fixes). A library may be replaced
by a new version, and all programs that reference the library will automatically use the new
version.
SWAPPING
Memory Protection
Memory protection is required to protect OS from the user processes and user processes from
one another.
A relocation register contains the value of the smallest physical address.
The limit register contains the range of logical addresses (for example, relocation = 100040 and
limit = 74600).
Each logical address must be less than the limit register. If a logical address is not less than the
limit register, an addressing error occurs and is trapped.
The MMU maps the logical address dynamically at run time by adding it to the relocation
register. The sum is the physical address, which is sent to memory.
When the CPU scheduler selects a process for execution, the dispatcher loads the relocation
and limit registers with the correct values.
Every address generated by a CPU is checked against these registers; both the operating
system and the other users’ programs and data are protected from being modified by this
running process.
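Using the relocation and limit values from the example above (relocation = 100040, limit = 74600), the MMU's dynamic mapping can be sketched as:

```python
def mmu_translate(logical, relocation=100040, limit=74600):
    # Every logical address must be less than the limit register;
    # otherwise the access traps to the operating system.
    if logical >= limit:
        raise MemoryError("addressing error: trap to the OS")
    # Adding the relocation register produces the physical address.
    return logical + relocation

assert mmu_translate(0) == 100040       # smallest physical address
assert mmu_translate(74599) == 174639   # largest legal address
```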
Memory Allocation
One of the simplest methods for allocating memory is to divide memory into several fixed-sized
partitions. Each partition may contain exactly one process.
Thus, the degree of multiprogramming is bound by the number of partitions.
Multiple-partition method
When a partition is free, a process is selected from the input queue and is loaded into the free
partition.
When the process terminates, the partition becomes available for another process.
This method is no longer in use. The method described next is a generalization of the fixed-
partition scheme (called MVT); it is used primarily in batch environments.
Variable-partition scheme
The operating system keeps a table indicating which parts of memory are available and which
are occupied.
Initially the whole of available memory is treated as one large block of memory called a hole.
The processes that enter a system are maintained in an input queue.
When a process arrives and needs memory, the system searches the set for a hole that is large
enough for this process.
If the hole is too large, it is split into two parts. One part is allocated to the arriving process;
the other is returned to the set of holes.
When a process terminates, it releases its block of memory, which is then placed back in the set
of holes. If the new hole is adjacent to other holes, the adjacent holes are merged to form one
larger hole.
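The hole search and split described above can be sketched with a first-fit strategy (a minimal sketch; the function name and the list-of-tuples representation are illustrative choices, not from the text):

```python
def first_fit(holes, size):
    """holes: sorted list of (start, length) tuples.
    Allocate `size` bytes from the first hole large enough;
    split that hole and return (start, remaining_holes), or None."""
    for i, (start, length) in enumerate(holes):
        if length >= size:
            remaining = holes[:i] + holes[i + 1:]
            if length > size:
                # One part is allocated; the rest stays in the hole set.
                remaining.insert(i, (start + size, length - size))
            return start, remaining
    return None  # no hole is large enough

# Two holes; a 30-byte request splits the first one.
assert first_fit([(0, 100), (200, 50)], 30) == (0, [(30, 70), (200, 50)])
```

Merging adjacent holes on process termination would be the complementary operation.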
Fragmentation
The main drawback of contiguous memory allocation is fragmentation i.e., memory is available
but cannot be used.
There are two types of fragmentation:
Internal fragmentation
The memory block assigned to a process is bigger than requested. The unused memory is
internal to a partition, that is, inside a process's allocation, and so cannot be used by others.
External fragmentation
It exists when there is enough total memory space to satisfy a request, but the
available spaces are not contiguous; storage is fragmented into a large number of small holes.
Both the first-fit and best-fit strategies for memory allocation suffer from external
fragmentation.
One solution to the problem of external fragmentation is compaction. The goal is to
shuffle the memory contents so as to place all free memory together in one large block.
The simplest compaction algorithm is to move all processes toward one end of memory; all holes
move in the other direction, producing one large hole of available memory. This scheme can be
expensive.
SEGMENTATION
An important aspect of memory management that became unavoidable with paging is the
separation of the user's view of memory and the actual physical memory.
The user's view of memory is not the same as the actual physical memory. The user's view is
mapped onto physical memory.
Basic Method
Users prefer to view memory as a collection of variable-sized segments, with no necessary
ordering among segments.
Consider how you think of a program when you are writing it. You think of it as a main
program with a set of methods, procedures, or functions.
Segmentation Hardware
Example
Considering the example, there are five segments numbered from 0 through 4. The segments
are stored in physical memory as shown. The segment table has a separate entry for each segment,
giving the beginning address of the segment in physical memory (or base) and the length of that
segment (or limit).
For example, segment 2 is 400 bytes long and begins at location 4300.
Thus, a reference to byte 53 of segment 2 is mapped onto location 4300 + 53 = 4353.
A reference to segment 3, byte 852, is mapped to 3200 (the base of segment 3) + 852 = 4052.
A reference to byte 1222 of segment 0 would result in a trap to the operating system, as this
segment is only 1,000 bytes long.
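The example's mapping can be reproduced with a small segment table. Segment 2's base and limit and segment 3's base are given above; the base of segment 0 (1400) and the limit of segment 3 (1100) are not stated in the text and are assumed values:

```python
# segment -> (base, limit); 1400 and 1100 are assumed values
SEGMENT_TABLE = {0: (1400, 1000), 2: (4300, 400), 3: (3200, 1100)}

def seg_translate(segment, offset, table=SEGMENT_TABLE):
    base, limit = table[segment]
    if offset >= limit:
        # Offsets beyond the segment's length trap to the OS.
        raise MemoryError("trap: offset beyond segment limit")
    return base + offset

assert seg_translate(2, 53) == 4353    # 4300 + 53
assert seg_translate(3, 852) == 4052   # 3200 + 852
```

A reference such as byte 1222 of segment 0 raises the trap, since segment 0 is only 1,000 bytes long.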
PAGING
Paging is a memory-management scheme that permits the physical address space of a process
to be non-contiguous.
Paging avoids external fragmentation and the need for compaction, whereas segmentation does
not.
Basic Method
The page size (like the frame size) is defined by the hardware. The size of a page is typically a
power of 2, varying between 512 bytes and 16 MB per page, depending on the computer
architecture.
If the size of the logical address space is 2^m and the page size is 2^n, then the high-order
m - n bits of a logical address designate the page number p and the remaining low-order n bits
designate the page offset d. Thus, a logical address consists of the page number followed by
the page offset.
Example
Consider a memory where, in the logical address, n = 2 and m = 4. Using a page size of 4 bytes
and a physical memory of 32 bytes (8 frames), the following shows how the user's view of memory
can be mapped into physical memory.
Solution:
Logical address 0 is page 0, offset 0. Indexing into the page table, it is found that page 0 is in
frame 5. Thus, logical address 0 maps to physical address 20 [= (5 × 4) +0].
Logical address 3 (page 0, offset 3) maps to physical address 23 [= (5 × 4) + 3].
Logical address 4 is page 1, offset 0; According to the page table, page 1 is mapped to frame 6.
Thus, logical address 4 maps to physical address 24 [= (6 × 4) + 0].
Logical address 13 (page 3, offset 1) maps to physical address 9 [= (2 × 4) + 1], since page 3 is in frame 2.
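The worked example above can be checked with a short translation routine. Frames 5 and 6 for pages 0 and 1 are given in the text, and frame 2 for page 3 follows from address 13 mapping to 9; the entry for page 2 (frame 1) is an assumption from the standard figure:

```python
PAGE_TABLE = [5, 6, 1, 2]  # page -> frame; entry for page 2 is assumed
PAGE_SIZE = 4              # n = 2, so 4-byte pages

def translate(logical):
    page, offset = divmod(logical, PAGE_SIZE)
    return PAGE_TABLE[page] * PAGE_SIZE + offset

assert translate(0) == 20    # page 0, offset 0 -> frame 5
assert translate(3) == 23    # page 0, offset 3
assert translate(4) == 24    # page 1, offset 0 -> frame 6
assert translate(13) == 9    # page 3, offset 1 -> frame 2
```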
Figure: Free frames (a) before allocation and (b) after allocation
An important aspect of paging is the clear separation between the user's view of memory and
the actual physical memory.
The user program views memory as one single space, containing only this one program. In fact,
the user program is scattered throughout physical memory, which also holds other programs.
The logical addresses are translated into physical addresses by the address-translation
hardware. This mapping is hidden from the user and is controlled by the OS.
The user process has no way of addressing memory outside of its page table, and the table
includes only those pages that the process owns.
Since the OS is managing physical memory, it must be aware of the allocation details of
physical memory-which frames are allocated, which frames are available, how many total
frames there are, and so on.
This information is generally kept in a data structure called a frame table. The frame table has
one entry for each physical page frame, indicating whether the latter is free or allocated and, if
it is allocated, to which page of which process or processes
Hardware Support
Implementation of Page Table
The page table is kept in main memory. The page-table base register (PTBR) points to the page
table, and the page-table length register (PTLR) indicates the size of the page table.
In this scheme, every data or instruction access requires two memory accesses: one for the
page-table entry and one for the data or instruction itself.
The two-memory-access problem can be solved by the use of a special fast-lookup hardware
cache called associative memory or the translation look-aside buffer (TLB).
Some TLBs store address-space identifiers (ASIDs) in each TLB entry – uniquely identifies
each process to provide address-space protection for that process.
TLB
The TLB is associative, high-speed memory. Each entry in the TLB consists of two parts:
a key (or tag) and a value.
When the associative memory is presented with an item, the item is compared with all keys
simultaneously.
If the item is found, the corresponding value field is returned.
The TLB contains only a few of the page-table entries.
When a logical address is generated by the CPU, its page number is presented to the TLB.
If the page number is not in the TLB (known as a TLB miss), a memory reference to the page
table must be made.
Depending on the CPU, this may be done automatically in hardware or via an interrupt to the
operating system.
If the page number is found, its frame number is immediately available and is used to access
memory.
Hit Ratio - The percentage of times that the page number of interest is found in the TLB is
called the hit ratio.
Suppose that it takes 100 nanoseconds to access memory and that the hit ratio is 80 percent.
If we fail to find the page number in the TLB, we must first access memory for the page-table
entry (100 nanoseconds) and then access the desired byte in memory (100 nanoseconds), for a
total of 200 nanoseconds.
effective access time = 0.80 × 100 + 0.20 × 200 = 120 nanoseconds
For a 99-percent hit ratio, which is much more realistic, we have
effective access time = 0.99 × 100 + 0.01 × 200 = 101 nanoseconds
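The effective-access-time arithmetic above generalizes to a one-line formula (assuming, as in the text, a 100 ns memory access and a negligible TLB lookup time):

```python
def effective_access_time(hit_ratio, mem_ns=100):
    # Hit: one memory access. Miss: page-table access plus memory access.
    return hit_ratio * mem_ns + (1 - hit_ratio) * (2 * mem_ns)

print(effective_access_time(0.80))  # ~120 ns
print(effective_access_time(0.99))  # ~101 ns
```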
Protection
Memory protection is implemented by associating protection bits with each frame. Normally,
these bits are kept in the page table. One bit can define a page to be read-write or read-only.
Every reference to memory goes through the page table to find the correct frame number. At
the same time that the physical address is being computed, the protection bits can be checked
to verify that no writes are being made to a read-only page.
An attempt to write to a read-only page causes a hardware trap to the operating system (or
memory-protection violation).
One additional bit is generally attached to each entry in the page table: a valid-invalid bit.
When this bit is set to "valid", the associated page is in the process's logical
address space and is thus a legal (or valid) page.
When the bit is set to "invalid", the page is not in the process's logical address
space.
Illegal addresses are trapped by use of the valid-invalid bit. The OS sets this bit for each page
to allow or disallow access to the page.
Shared Pages
An advantage of paging is the possibility of sharing common code. This consideration is
particularly important in a time-sharing environment
Reentrant code is non-self-modifying code; it never changes during execution. Thus, two
or more processes can execute the same code at the same time.
Each process has its own copy of registers and data storage to hold the data for the
process's execution. The data for two different processes will, of course, be different.
For example, only one copy of a shared text editor need be kept in physical memory. Each
user's page table maps onto the same physical copy of the editor, but data pages are mapped
onto different frames.
VIRTUAL MEMORY
Virtual memory is a technique that allows the execution of processes that are not completely in
memory.
One major advantage of this scheme is that programs can be larger than physical
memory.
Further, virtual memory abstracts main memory into an extremely large, uniform array
of storage, separating logical memory as viewed by the user from physical memory.
Virtual memory also allows processes to share files easily and to implement shared memory.
Virtual memory is not easy to implement, however, and may substantially decrease
performance if it is used carelessly.
Background
The instructions being executed must be in physical memory.
An examination of real programs shows us that, in many cases, the entire program (in memory)
is not needed.
Programs often have code to handle unusual error conditions (seldom used).
Arrays, lists, and tables are often allocated more memory than they actually need.
Certain options and features of a program may be used rarely.
The ability to execute a program that is only partially in memory would offer many benefits:
A program would no longer be constrained by the amount of physical memory that is
available (simplifying the programming task).
Because each user program could take less physical memory, more programs could be
run at the same time, with a corresponding increase in CPU utilization and throughput
but with no increase in response time or turnaround time.
Less I/O would be needed to load or swap each user program into memory, so each user
program would run faster.
Virtual memory involves the separation of logical memory as perceived by users from physical
memory. This separation allows an extremely large virtual memory to be provided for
programmers when only a smaller physical memory is available.
Figure: Diagram showing virtual memory that is larger than physical memory
The virtual address space of a process refers to the logical (or virtual) view of how a process is
stored in memory. Typically, this view is that a process begins at a certain logical address - say,
address 0 - and exists in contiguous memory.
The large blank space (or hole) between the heap and the stack is part of the virtual address
space but will require actual physical pages only if the heap or stack grows.
The heap grows upward in memory as it is used for dynamic memory allocation, and the
stack grows downward in memory through successive function calls.
Virtual address spaces that include holes are known as sparse address spaces. Using a sparse
address space is beneficial because the holes can be filled as the stack or heap segments grow or
if we wish to dynamically link libraries (or possibly other shared objects) during program
execution.
In addition to separating logical memory from physical memory, virtual memory also allows
files and memory to be shared by two or more processes through page sharing.
DEMAND PAGING
Consider how an executable program might be loaded from disk into memory.
One option is to load the entire program in physical memory at program execution time.
However, a problem with this approach is that we may not initially need the entire
program in memory.
Demand paging loads pages only as they are needed.
A demand-paging system is similar to a paging system with swapping where processes reside
in secondary memory (usually a disk).
To execute a process, it must be swapped into memory. Instead of swapping in the entire
process, only the pages that are needed are swapped in. A lazy swapper is used for this purpose.
A lazy swapper never swaps a page into memory unless that page will be
needed.
A swapper manipulates entire processes, whereas a pager is concerned with the
individual pages of a process. The term pager is used instead of swapper, in connection
with demand paging.
Basic Concepts
When a process is to be swapped in, the pager guesses which pages will be used before the
process is swapped out again.
It avoids reading into memory pages that will not be used anyway, decreasing the swap time
and the amount of physical memory needed.
Some form of hardware support is needed to distinguish between the pages that are in memory
and the pages that are on the disk.
The valid -invalid bit scheme can be used for this purpose.
This time, however, when this bit is set to "valid", the associated page is both legal and
in memory.
If the bit is set to "invalid", the page either is not valid (that is, not in the logical
address space of the process) or is valid but is currently on the disk.
The page-table entry for a page that is brought into memory is set as usual, but the
page-table entry for a page that is not currently in memory is either simply marked
invalid or contains the address of the page on disk.
While the process executes and accesses pages that are memory resident, execution proceeds
normally.
Page fault
A page fault occurs when the process tries to access a page that was not brought into memory.
Access to a page marked invalid causes a page-fault trap.
The paging hardware, in translating the address through the page table, will notice that the
invalid bit is set, causing a trap to the OS.
The procedure for handling this page fault is straightforward.
Figure: Page table when some pages are not in main memory.
1. The OS checks an internal table to determine whether the reference was a valid or an invalid
memory access.
2. If the reference was invalid, the process is terminated. If it was valid but the page has not
yet been brought in, it must be paged in.
3. Find a free frame.
4. Swap the desired page into the newly allocated frame.
PAGE REPLACEMENT
In order to make use of virtual memory, several processes are loaded into memory at the same
time. Since it is necessary to only load the pages that are actually needed by each process at any given
time, there is room to load many more processes than to load in the entire process.
However memory is also needed for other purposes (such as I/O buffering), and if some process
suddenly needs more pages and there aren't any free frames available.
Solutions:
1. Adjust the memory used by I/O buffering, etc., to free up some frames for user processes.
2. Swap some process out of memory completely, freeing up its page frames.
3. Find some page in memory that isn't being used right now, and swap that page only out to disk,
freeing up a frame that can be allocated to the process requesting it. This is known as page
replacement, and is the most common solution.
5. Read in the desired page and store it in the frame. Adjust all related page and frame tables to
indicate the change.
6. Restart the process that was waiting for this page.
Page replacement adds an extra disk write to page-fault handling, effectively doubling the time
required to service a page fault.
This can be alleviated by assigning a modify bit, or dirty bit to each page, indicating whether
or not it has been changed since it was last loaded in from disk.
If the dirty bit has not been set, then the page is unchanged, and does not need to be written
out to disk.
Otherwise the page write is required.
Two requirements must be met to implement a demand-paging system:
Frame-allocation algorithm- how many frames are allocated to each process.
Page-replacement algorithm- how to select a page for replacement when there are no free
frames available.
Goal - to generate the fewest number of overall page faults. Because disk access is so slow relative to
memory access, even slight improvements to these algorithms can yield large improvements in overall
system performance.
Algorithms are evaluated using a given string of memory accesses known as a reference
string, which can be generated in one of three common ways:
1. Randomly generated, either evenly distributed or with some distribution curve based on observed
system behavior. This is the fastest and easiest approach.
2. Specifically designed sequences.
3. Recorded memory references from a live system. This may be the best approach, but the amount of
data collected can be enormous, on the order of a million addresses per second.
The volume of collected data can be reduced by making two important observations:
1. Only the page number that was accessed is relevant. The offset within that page does not affect
paging operations.
2. Successive accesses within the same page can be treated as a single page request, because all
requests after the first are guaranteed to be page hits.
For example, if pages were of size 100 bytes, then the sequence of address requests ( 0100, 0432,
0101, 0612, 0634, 0688, 0132, 0038, 0420 ) would reduce to page requests ( 1, 4, 1, 6, 1, 0, 4 ). As the
number of available frames increases, the number of page faults should decrease.
Consider the reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1 with three
initially empty frames. The first three references (7, 0, 1) cause page faults and are brought into the empty frames.
The next reference (2) replaces page 7, because page 7 was brought in first.
Since 0 is the next reference and 0 is already in memory, we have no fault for this reference.
The first reference to 3 results in replacement of page 0, since it is now first in line.
Because of this replacement, the next reference, to 0, will fault.
Page 1 is then replaced by page 0. This process continues, and there are 15 faults altogether.
Although FIFO is simple and easy, it is not always optimal, or even efficient.
An interesting effect that can occur with FIFO is Belady's anomaly, in which increasing the
number of frames available can actually increase the number of page faults that occur!
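A FIFO simulator is short enough to verify both the 15-fault count (using the reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1, which matches the replacements described above) and Belady's anomaly (the anomaly string below is the classic demonstration, not taken from this text):

```python
def fifo_faults(refs, nframes):
    frames, fifo, faults = set(), [], 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.remove(fifo.pop(0))  # evict the oldest page
            frames.add(page)
            fifo.append(page)
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
assert fifo_faults(refs, 3) == 15

# Belady's anomaly: adding a frame *increases* the fault count.
belady = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
assert fifo_faults(belady, 3) == 9
assert fifo_faults(belady, 4) == 10
```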
The discovery of Belady's anomaly led to the search for an optimal page-replacement
algorithm - one that yields the lowest possible page-fault rate and never suffers from
Belady's anomaly.
Such an algorithm does exist, and is called OPT or MIN. This algorithm is simply "Replace the
page that will not be used for the longest time in the future."
For example, applying OPT to the same reference string used for the FIFO example yields a
minimum of 9 page faults. Since 6 of the page faults are unavoidable (the first reference to
each new page), FIFO incurs 9 extra faults versus 3 for OPT - three times as many extra page
faults as the optimal algorithm.
The first three references cause faults that fill the three empty frames.
The reference to page 2 replaces page 7, because 7 will not be used until reference 18,
whereas page 0 will be used at 5 and page 1 at 14.
The reference to page 3 replaces page 1, as page 1 will be the last of the three pages in memory
to be referenced again.
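OPT can be simulated by looking ahead in the reference string (a sketch only; a real system cannot see future references, which is why OPT serves purely as a benchmark):

```python
def opt_faults(refs, nframes):
    frames, faults = set(), 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            # Evict the page whose next use lies furthest in the future.
            def next_use(p):
                try:
                    return refs.index(p, i + 1)
                except ValueError:
                    return float("inf")   # never used again
            frames.remove(max(frames, key=next_use))
        frames.add(page)
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
assert opt_faults(refs, 3) == 9
```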
The prediction behind LRU, the Least Recently Used algorithm, is that the page that
has not been used for the longest time is the one least likely to be used again in the near future.
Some view LRU as analogous to OPT, except looking backwards in time instead of forwards.
(OPT has the interesting property that for any reference string S and its reverse R, OPT will
generate the same number of page faults for S and for R. It turns out that LRU has this same
property.)
LRU yields 12 page faults, (as compared to 15 for FIFO and 9 for OPT.)
1. Counters. A logical clock is incremented on every memory access, and its current value is
stored in the page-table entry for the page just accessed. Finding the LRU page then involves
simply searching the table for the page with the smallest counter value.
2. Stack. Whenever a page is accessed, pull that page from the middle of the stack and place it on the
top. The LRU page will always be at the bottom of the stack.
Both implementations of LRU require hardware support, either for incrementing the counter or
for managing the stack, as these operations must be performed for every memory access.
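The stack approach maps naturally onto an ordered dictionary, which keeps entries in recency order (a minimal software sketch of what the hardware would do per access):

```python
from collections import OrderedDict

def lru_faults(refs, nframes):
    frames, faults = OrderedDict(), 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)        # pull to the top of the stack
        else:
            faults += 1
            if len(frames) == nframes:
                frames.popitem(last=False)  # LRU page sits at the bottom
            frames[page] = True
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
assert lru_faults(refs, 3) == 12  # vs. 15 for FIFO and 9 for OPT
```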
Neither LRU nor OPT exhibits Belady's anomaly. Both belong to a class of page-replacement
algorithms called stack algorithms, which can never exhibit Belady's anomaly.
A stack algorithm is one in which the pages kept in memory for a frame set of size N will
always be a subset of the pages kept for a frame size of N + 1.
Many systems provide a reference bit for every entry in a page table, which is set anytime
that page is accessed.
Initially all bits are set to zero, and they can also all be cleared at any time.
One bit of precision is enough to distinguish pages that have been accessed since the last clear
from those that have not, but does not provide any finer grain of detail.
Additional-Reference-Bits Algorithm
Finer grain is possible by storing the most recent 8 reference bits for each page in an 8-bit byte
in the page table entry, which is interpreted as an unsigned int.
At periodic intervals (clock interrupts), the OS takes over, and right-shifts each of the reference
bytes by one bit.
The high-order (leftmost) bit is then filled in with the current value of the reference bit, and the
reference bits are cleared.
At any given time, the page with the smallest value for the reference byte is the LRU page.
Obviously the specific number of bits used and the frequency with which the reference byte is
updated are adjustable, and are tuned to give the fastest performance on a given hardware
platform.
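One shift step of the additional-reference-bits algorithm can be sketched as follows (list positions stand for pages; the function name is illustrative):

```python
def age(history, ref_bits):
    # Right-shift each 8-bit history byte and insert the current
    # reference bit as the new high-order bit.
    return [(h >> 1) | (r << 7) for h, r in zip(history, ref_bits)]

# Page 0 was just referenced, page 1 was not:
assert age([0b00000000, 0b11111111], [1, 0]) == [0b10000000, 0b01111111]
# The page with the smallest history value is the LRU candidate.
```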
Second-Chance Algorithm
The second chance algorithm is essentially a FIFO, except the reference bit is used to give
pages a second chance at staying in the page table.
When a page must be replaced, the page table is scanned in a FIFO ( circular queue ) manner.
If a page is found with its reference bit not set, then that page is selected as the next victim.
If, however, the next page in the FIFO does have its reference bit set, then it is given a second
chance:
The reference bit is cleared, and the FIFO search continues.
If some other page is found that did not have its reference bit set, then that page will be
selected as the victim, and this page ( the one being given the second chance ) will be
allowed to stay in the page table.
If there are no other pages that do not have their reference bit set, then this page will be
selected as the victim when the FIFO search circles back around to this page on the second
pass.
If all reference bits in the table are set, then second chance degrades to FIFO, but also requires
a complete search of the table for every page-replacement.
As long as there are some pages whose reference bits are not set, then any page referenced
frequently enough gets to stay in the page table indefinitely.
This algorithm is also known as the clock algorithm, from the hands of the clock moving
around the circular queue.
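The clock variant can be sketched as a circular sweep over frame slots (a minimal sketch; setting the reference bit when a page is first loaded is an assumption, since implementations differ on this point):

```python
def clock_faults(refs, nframes):
    frames = [None] * nframes   # the circular queue of loaded pages
    refbit = [0] * nframes
    hand, faults = 0, 0
    for page in refs:
        if page in frames:
            refbit[frames.index(page)] = 1   # second chance recorded
            continue
        faults += 1
        # Sweep: clear reference bits until an unreferenced victim is found.
        while refbit[hand]:
            refbit[hand] = 0
            hand = (hand + 1) % nframes
        frames[hand] = page
        refbit[hand] = 1                     # assumed: set on load
        hand = (hand + 1) % nframes
    return faults
```

With all reference bits set, the sweep clears every bit and circles back, which is exactly the degradation to FIFO described above.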
The enhanced second-chance algorithm considers the reference bit and the modify bit (dirty
bit) as an ordered pair, and classifies pages into one of four classes:
( 0, 0 ) - Neither recently used nor modified.
( 0, 1 ) - Not recently used, but modified.
( 1, 0 ) - Recently used, but clean.
( 1, 1 ) - Recently used and modified.
This algorithm searches the page table in a circular fashion ( in as many as four passes ),
looking for the first page it can find in the lowest numbered category. i.e. it first makes a pass looking
for a ( 0, 0 ), and then if it can't find one, it makes another pass looking for a ( 0, 1 ), etc.
Page-Buffering Algorithms
There are a number of page-buffering algorithms to improve overall performance.
Maintain a certain minimum number of free frames at all times. When a page-fault
occurs, go ahead and allocate one of the free frames from the free list first, to get the
requesting process up and running again as quickly as possible, and then select a victim
page to write to disk and free up a frame as a second step.
Keep a list of modified pages, and when the I/O system is otherwise idle, have it write
these pages out to disk, and then clear the modify bits, thereby increasing the chance of
finding a "clean" page for the next potential victim.
Keep a pool of free frames, but remember what page was in it before it was made free.
Since the data in the page is not actually cleared out when the page is freed, it can be made
an active page again without having to load in any new data from disk.
ALLOCATION OF FRAMES
Minimum Number of Frames
Each process needs a minimum number of frames, determined by the instruction-set
architecture. Reasons:
If an instruction (and its operands) spans a page boundary, then multiple pages could be
needed just for the instruction fetch.
If an instruction references memory, and those memory locations can span page boundaries,
then multiple pages could be needed for operand access as well.
The worst case involves indirect addressing, particularly where multiple levels of indirect
addressing are allowed. Left unchecked, a pointer to a pointer to a pointer to a pointer to a . . . could
theoretically touch every page in the virtual address space in a single machine instruction, requiring
every virtual page be loaded into physical memory simultaneously.
So place a limit (say 16) on the number of levels with a counter initialized to the limit and
decremented with every level of indirection in an instruction. If the counter reaches zero, then an
"excessive indirection" trap occurs. This requires a minimum frame allocation of 17 per process.
Allocation Algorithms
i) Equal Allocation - If there are m frames available and n processes to share them, each process
gets m / n frames, and the leftovers are kept in a free-frame buffer pool.
ii) Proportional Allocation - Allocate the frames proportionally to the size of the process, relative to
the total size of all processes. So if the size of process i is S_i, and S is the sum of all S_i, then the
allocation for process P_i is a_i = m * S_i / S.
Variations on proportional allocation could consider the priority of each process rather than just its size.
All allocations fluctuate over time as the number of available free frames, m, fluctuates, and all
are also subject to the constraints of minimum allocation.
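The proportional formula a_i = m * S_i / S can be sketched directly. This is an illustrative helper (the function name and leftover-distribution rule are choices made for the example), using the classic case of 62 frames shared by processes of 10 and 127 pages.

```python
# Sketch of proportional frame allocation: a_i = m * S_i / S, floored,
# with leftover frames handed to the largest processes so the total equals m.
def proportional_allocation(sizes, m, minimum=1):
    S = sum(sizes)
    alloc = [max(minimum, m * s // S) for s in sizes]
    leftover = m - sum(alloc)
    # Distribute any remaining frames one at a time, largest process first.
    for i in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        if leftover <= 0:
            break
        alloc[i] += 1
        leftover -= 1
    return alloc

# m = 62 frames, processes of 10 and 127 pages (S = 137).
print(proportional_allocation([10, 127], 62))   # -> [4, 58]
```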
iii) Global versus Local Allocation - With local replacement, the number of pages allocated to a
process is fixed, and page replacement occurs only amongst the pages allocated to this process.
It allows processes to better control their own page fault rates, and leads to more consistent
performance of a given process over different system load levels.
With global replacement, any page may be a potential victim, whether it currently belongs to
the process seeking a free frame or not.
It is more efficient, and is the more commonly used approach.
iv) Non-Uniform Memory Access
So far it has been assumed that all memory is equivalent, or at least has equivalent access
times. On non-uniform memory access (NUMA) systems this does not hold: a CPU can access
memory physically located on its own board much faster than memory on other boards.
The basic solution is akin to processor affinity: scheduling a process on the same CPU
minimizes cache misses, and allocating that process's memory on the same board minimizes
access times.
The presence of threads complicates the picture, especially when the threads get loaded onto
different processors.
Solaris uses an lgroup as a solution, in a hierarchical fashion based on relative latency.
For example, all processors and RAM on a single board would probably be in the same lgroup.
Memory assignments are made within the same lgroup if possible or to the next nearest lgroup
otherwise. (Where "nearest" is defined as having the lowest access time).
THRASHING
A process that is spending more time in paging than executing is said to be thrashing.
Cause of Thrashing
When memory fills up and processes start spending lots of time waiting for their pages to
page in, CPU utilization drops, causing the scheduler to add in even more processes, and
the system grinds to a halt.
Local page replacement policies can prevent one thrashing process from taking pages
away from other processes.
Figure: Thrashing
To prevent thrashing, a process must be provided with as many frames as it needs.
The locality model states that processes typically access memory references in a given
locality, making lots of references to the same general area of memory before moving
periodically to a new locality.
If enough frames are allocated to hold the current locality, then page faults occur primarily on
switches from one locality to another.
Working-Set Model
The working-set model is based on the concept of locality, and defines a working-set
window of length delta. Whatever pages are included in the most recent delta page
references are said to be in the process's working-set window, and comprise its current
working set.
In practice, the working set is approximated with a fixed-interval timer and reference bits.
For example, suppose the timer is set to go off after every 5,000 references (by any
process), and each page-table entry can store two historical reference bits in addition to
the current reference bit.
Every time the timer goes off, the current reference bit is copied to one of the two
historical bits, and then cleared.
If any of the three bits is set, then that page was referenced within the last 15,000
references, and is considered to be in that process's working set.
Finer resolution can be achieved with more historical bits and a more frequent timer,
at the expense of greater overhead.
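The exact working set (as opposed to the timer approximation) is easy to compute over a reference string. A minimal sketch, with the reference string and window length chosen for illustration:

```python
# Sketch of the working-set model: WS(t, delta) is the set of distinct
# pages touched in the last `delta` references of the reference string.
def working_set(refs, t, delta):
    window = refs[max(0, t - delta + 1): t + 1]   # most recent delta references
    return set(window)

refs = [1, 2, 1, 5, 7, 7, 7, 5, 1, 2]
print(working_set(refs, t=9, delta=4))   # pages touched in the last 4 references
```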
Page-Fault Frequency
A more direct approach recognizes that what we really want to control is the page-fault
rate, and allocates frames based on this directly measurable value.
If the page-fault rate exceeds a certain upper bound then that process needs more
frames, and if it is below a given lower bound, then it can afford to give up some of its
frames to other processes.
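The page-fault-frequency policy can be sketched as a simple control loop. The bounds and step size below are invented for illustration, not taken from any particular system:

```python
# Sketch of page-fault-frequency control: adjust a process's frame count
# to keep its measured fault rate between a lower and an upper bound.
def adjust_frames(frames, fault_rate, lower=0.02, upper=0.10, step=1):
    if fault_rate > upper:
        return frames + step           # faulting too often: grant more frames
    if fault_rate < lower:
        return max(1, frames - step)   # faulting rarely: give up some frames
    return frames                      # within bounds: leave the allocation alone

print(adjust_frames(10, 0.15))  # -> 11 (rate above upper bound)
print(adjust_frames(10, 0.01))  # -> 9  (rate below lower bound)
```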
ALLOCATING KERNEL MEMORY
When a user process requests memory, pages are allocated from the kernel's list of free page
frames. This list is typically populated by a page-replacement algorithm and most likely contains
free frames scattered throughout physical memory.
If a user process requests a single byte of memory, internal fragmentation will result, as the
process will be granted an entire page frame.
Kernel memory is often allocated from a free-memory pool different from the list used to satisfy
ordinary user-mode processes. There are several classic algorithms in place for allocating
kernel memory structures.
Buddy System
The buddy system allocates memory from a fixed-size segment consisting of physically
contiguous pages. Memory is allocated from this segment using a power-of-2 allocator, which
satisfies requests in units sized as a power of 2 (4 KB, 8 KB, 16 KB, and so forth).
A request that is not an exact power of 2 is rounded up to the next highest power of 2.
For example, a request for 11 KB is satisfied with a 16-KB segment.
Simple example:
Assume the size of a memory segment is initially 256 KB and the kernel requests 21 KB of
memory.
The segment is initially divided into two buddies—which we will call AL and AR—each 128 KB
in size. One of these buddies is further divided into two 64-KB buddies—BL and BR. However,
the next-highest power of 2 from 21 KB is 32 KB so either BL or BR is again divided into two
32-KB buddies, CL and CR. One of these buddies is used to satisfy the 21-KB request. CL is the
segment allocated to the 21-KB request.
This scheme is illustrated in the following Figure
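The splitting steps from the example can be sketched as follows. This is an illustrative model of the split path only (coalescing of freed buddies is omitted), with names chosen for the example:

```python
# Sketch of buddy-system splitting: round the request up to a power of 2,
# then repeatedly halve the segment until a block of that size is produced.
def buddy_split(segment_kb, request_kb):
    size = 1
    while size < request_kb:          # next power of 2 >= request
        size *= 2
    splits = []
    block = segment_kb
    while block > size:
        block //= 2                   # split into two buddies, descend into one
        splits.append(block)
    return size, splits

# 21 KB out of a 256-KB segment: rounded to 32 KB, via 128- and 64-KB buddies.
print(buddy_split(256, 21))   # -> (32, [128, 64, 32])
```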
Slab Allocation
A second strategy for allocating kernel memory is known as slab allocation.
A slab is made up of one or more physically contiguous pages.
A cache consists of one or more slabs.
There is a single cache for each unique kernel data structure —for example, a separate cache
for the data structure representing process descriptors, a separate cache for file objects, a
separate cache for semaphores, and so forth.
Each cache is populated with objects that are instantiations of the kernel data structure the
cache represents.
For example, the cache representing semaphores stores instances of semaphore objects, the
cache representing process descriptors stores instances of process descriptor objects, and so
forth.
The relationship among slabs, caches, and objects is shown in Figure. The figure shows two
kernel objects 3 KB in size and three objects 7 KB in size, each stored in a separate cache.
The slab-allocation algorithm uses caches to store kernel objects.
When a cache is created, a number of objects—which are initially marked as free—are
allocated to the cache.
The number of objects in the cache depends on the size of the associated slab.
For example, a 12-KB slab (made up of three contiguous 4-KB pages) could store six 2-KB
objects. Initially, all objects in the cache are marked as free.
When a new object for a kernel data structure is needed, the allocator can assign any free object
from the cache to satisfy the request. The object assigned from the cache is marked as used.
When the kernel has finished with an object and releases it, it is marked as free and returned to its
cache, thus making it immediately available for subsequent requests from the kernel.
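The alloc/release cycle described above can be modeled in a few lines. This is a toy sketch, not the kernel's slab allocator: the class and cache names are invented, and each cache holds a single slab for simplicity.

```python
# Sketch of a slab cache: one cache per object type, populated up front
# with objects that cycle between the "free" and "used" lists.
class SlabCache:
    def __init__(self, name, slab_kb, object_kb):
        self.name = name
        # Objects per slab depend on slab and object sizes,
        # e.g. a 12-KB slab holds six 2-KB objects.
        self.free = [object() for _ in range(slab_kb // object_kb)]
        self.used = []

    def alloc(self):
        obj = self.free.pop()          # hand out any free object
        self.used.append(obj)
        return obj

    def release(self, obj):
        self.used.remove(obj)          # returned objects become free again,
        self.free.append(obj)          # immediately reusable without new allocation

semaphores = SlabCache("semaphore", slab_kb=12, object_kb=2)
s = semaphores.alloc()
semaphores.release(s)
print(len(semaphores.free))   # -> 6: all objects free again
```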
Recent distributions of Linux now include two other kernel memory allocators— the SLOB and
SLUB allocators.
The SLOB allocator is designed for systems with a limited amount of memory, such as
embedded systems. SLOB (which stands for Simple List of Blocks) works by maintaining three
lists of objects:
small (for objects less than 256 bytes),
medium (for objects less than 1,024 bytes), and
large (for objects greater than 1,024 bytes).
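The three SLOB size classes map directly onto a small classifier, using the thresholds from the list above (the function name is invented for the example):

```python
# Sketch of SLOB's three size classes, per the thresholds above.
def slob_list(size_bytes):
    if size_bytes < 256:
        return "small"
    if size_bytes < 1024:
        return "medium"
    return "large"

print(slob_list(100), slob_list(512), slob_list(4096))   # -> small medium large
```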
The SLUB allocator replaced SLAB as the default allocator for the Linux kernel. SLUB
addresses performance issues with slab allocation by reducing much of the overhead required
by the SLAB allocator.
One change is to move the metadata that is stored with each slab under SLAB allocation to the
page structure the Linux kernel uses for each page.
Additionally, SLUB removes the per-CPU queues that the SLAB allocator maintains for objects
in each cache.
For systems with a large number of processors, the amount of memory allocated to these
queues was not insignificant. Thus, SLUB provides better performance as the number of
processors on a system increases.
5. With neat diagram brief about Dynamic loading and relocation register.
The CPU produces a logical address.
The logical address is added to the relocation register value to obtain the physical address.
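The translation can be sketched as a single function, including the limit check from the base/limit protection scheme (the numbers are illustrative):

```python
# Sketch of dynamic relocation: the MMU adds the relocation (base) register
# to every logical address, after checking it against the limit register.
def translate(logical, relocation, limit):
    if logical >= limit:
        raise MemoryError("addressing-error trap")   # outside the valid range
    return logical + relocation

print(translate(346, relocation=14000, limit=1024))   # -> 14346
```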
15. In a system, the page size is 2,048 bytes and a process needs 72,766 bytes. Find
out whether internal fragmentation occurs or not.
Page size = 2,048 bytes
Process size = 72,766 bytes
72,766 / 2,048 = 35 full pages + 1,086 bytes, so 36 frames are needed
Internal fragmentation = 2,048 - 1,086 = 962 bytes
16. What is meant by TLB?
The two-memory-access problem can be solved by the use of a special fast-lookup hardware cache
called associative memory, or a translation look-aside buffer (TLB).
Some TLBs store an address-space identifier (ASID) in each TLB entry, which uniquely identifies
each process and provides address-space protection for that process.
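A TLB with ASIDs can be modeled as a lookup keyed on both the ASID and the page number. This is a toy sketch (real TLBs are fixed-size associative hardware; the class name is invented):

```python
# Sketch of a TLB keyed by (ASID, page number): entries from different
# processes coexist, and a mismatch on either field is a TLB miss.
class TLB:
    def __init__(self):
        self.entries = {}                 # (asid, page) -> frame

    def insert(self, asid, page, frame):
        self.entries[(asid, page)] = frame

    def lookup(self, asid, page):
        return self.entries.get((asid, page))   # None means a TLB miss

tlb = TLB()
tlb.insert(asid=1, page=5, frame=42)
print(tlb.lookup(1, 5))   # -> 42   (hit)
print(tlb.lookup(2, 5))   # -> None (miss: same page, different address space)
```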
PART-B