
CS431: Introduction to Operating Systems

Memory Management

Vijay Kumar


Memory
1. Registers (all computations take place here)
2. Cache (frequently accessed data may reside here)
3. Primary memory
4. Mass memory (all archiving is done here)

Physical Characteristics of Primary Memory:
1. A linear list of words (the size is defined by the hardware).
2. Each word is associated with a unique (absolute) physical address. With 4-byte words, for example, physical memory addresses run 00000000, 00000004, 00000008, 0000000C, 00000010, ...
3. Random access is possible.
4. Reading time is fixed (1 cycle: 80 ns).
5. Writing time is fixed (1.5 cycles: 120 ns).
6. Address type: physical (absolute address).
7. Memory size can be expanded dynamically.

Data movement between levels: disk and cache, primary memory and disk, registers and cache, registers and primary memory.

Levels of real memory (fastest to slowest): registers, cache, primary memory, mass memory.
Machine with no resident monitor
1. The entire memory belongs to one program.
2. The operator loads the program and provides commands to execute it via some keys.
3. At the end of the execution the operator uses another set of keys to get the output.
4. Outputs are distributed manually.
5. Such a system has limitations and cannot satisfy today's processing requirements.

Machine with resident monitor: The operator's functions are stored in the system as a resident monitor. To do this the entire memory is divided into a monitor part and a user part. The monitor is loaded permanently into its portion of memory, and a program is loaded into the primary memory for execution.
Requirement: The monitor must be protected from the user program.
Solution: Precisely define the upper boundary (end address) of the monitor portion of memory. A fixed memory location (the fence word) holds this address. Every memory access is checked against the contents of the fence word, and access is granted only if the desired address is not below it.
Limitations
a. The monitor cannot grow or shrink.
b. The fence word must be chosen carefully.
c. The user program must be loaded at a fixed location.
d. If program size > memory size then the program cannot be executed.
e. The program must be recompiled whenever any change is made.

Improvement: Use a fence register instead of a fence word. The contents of the fence register can be changed at any time.
System initialization: Load the required value into the fence register.

[Figure: memory layout with a resident monitor. The monitor occupies low memory, the fence word marks its end, and the user process area fills the rest of the 32K memory.]
Memory management algorithm: the CPU generates an address, say paddr:

if (paddr >= fence) and (paddr < memory limit) then
    access the location
else
begin
    generate a trap for memory violation;
    terminate the program;
    start another program
end;
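The same check, written as a minimal C sketch; the fence and memory-limit values here are illustrative assumptions, since in a real system they come from the fence register and the installed memory size:

    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative register values (assumptions, not from the text). */
    static unsigned fence = 0x2000;         /* end address of the monitor */
    static unsigned memory_limit = 0x8000;  /* total primary memory size  */

    /* Check a CPU-generated address before the access is allowed. */
    void check_access(unsigned paddr)
    {
        if (paddr >= fence && paddr < memory_limit)
            return;                         /* access the location */
        fprintf(stderr, "trap: memory violation at %#x\n", paddr);
        exit(EXIT_FAILURE);                 /* terminate the program */
    }

    int main(void)
    {
        check_access(0x3000);               /* legal: above the fence    */
        check_access(0x1000);               /* traps: inside the monitor */
        return 0;
    }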

Limitations
1. A program must still be loaded at its compiled addresses.
2. If the monitor grows or shrinks then the program must be recompiled before execution.

Relocation: the facility to relocate a program anywhere in memory.

Requirements of relocation
1. The addresses assigned to the program by the compiler should be independent of physical addresses; every program should be compiled starting from address 0. This means the compiler should generate logical addresses.
2. Program addresses should be relative to a fixed location.
3. The compiler should not load the program in memory.

Implementation
1. Define a hardware register (the relocation/base register). This achieves requirement 2 above.
2. Convert each logical address generated by the CPU, using the relocation register, into the physical address of the required location. The CPU then accesses the contents of this location and continues with the program.

[Figure: dynamic relocation using a relocation register. The CPU generates logical address 45; the base register holds 1400; the resulting physical address is 1400 + 45 = 1445, inside the program's area of memory.]

Observation
Logical address: a program instruction address.
Physical address: the real address of the program in memory.
Mapping from logical to physical: performed by the base/relocation register.
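A minimal C sketch of this mapping, using the values from the figure (base register = 1400, logical address = 45):

    #include <stdio.h>

    int main(void)
    {
        unsigned base = 1400;     /* relocation register, loaded at dispatch */
        unsigned logical = 45;    /* address generated by the CPU            */
        unsigned physical = base + logical;
        printf("logical %u -> physical %u\n", logical, physical);  /* 1445 */
        return 0;
    }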
Swapping: Consider the following situation. Job 1 is very large (50K); Job 2 is small (5K). Suppose Job 1 is in memory and is under execution, so Job 2 has to wait for Job 1 to finish. What if Job 2 is a very high-priority job? Result: Job 2 must still wait.
Improvement
Swap Job 1 out of memory briefly; schedule Job 2 and let it finish, then reschedule Job 1.
Implementation: Under the resident monitor this scheme can be implemented by saving the current state of Job 1 on the disk so that Job 2 can be loaded for execution; Job 1 is reloaded once Job 2 finishes. This is an improvement, but we still cannot run more than one job at a time.
Analysis: The main activity that affects the response time of a job is swapping. Suppose: latency time = 8 milliseconds; program size = 20K words; transfer rate = 250,000 words/second = 250 words/ms. Transfer time for 20K words = 20,000 / 250 = 80 ms. Total transfer time = 8 (latency) + 80 = 88 ms = 0.088 s.
Number of swaps: The minimum number of movements between memory and disk is 2 (one swap out, one swap in). In its lifetime a job may be swapped more than twice, and so may accumulate a large total swapping time. Some efficiency can be gained by making the CPU time given to a process longer than its swap time; that way the CPU will not remain idle for long periods. Under this scenario the CPU time should be larger than 0.176 seconds (2 x 0.088).
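The arithmetic above, reproduced as a small C program (all figures are from the text):

    #include <stdio.h>

    int main(void)
    {
        double latency_ms = 8.0;                /* disk latency            */
        double size_words = 20000.0;            /* program size: 20K words */
        double rate = 250.0;                    /* 250,000 words/s         */
        double swap_ms = latency_ms + size_words / rate;   /* 8 + 80 = 88  */
        printf("one swap: %.0f ms; out + in: %.3f s\n",
               swap_ms, 2.0 * swap_ms / 1000.0);           /* 0.176 s      */
        return 0;
    }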
How much to swap: Another improvement is to compute the exact amount to swap, i.e., the actual size of the program in memory. It is not necessary to swap the entire user area, since the active program may not be as large as the 20-30K user memory. The swap time can also be reduced by improving the disk hardware.
Overlapped Swapping: What about dividing the total running activity of a job between I/O and the CPU? The swapping activity would be done by the I/O processor while execution is done by the CPU. But what will the CPU do while the I/O processor is swapping? There is no job in memory during this time, so the CPU must be given a job while I/O is swapping.

[Figure: overlapped swapping. Memory is divided into the monitor space, buffer 1 (swap out), buffer 2 (swap in), and, above the fence register, the running process.]

Implementation: This idea can be implemented by dividing the memory into three parts as shown in the diagram above, allowing swapping and execution to proceed simultaneously. While one job is being swapped from memory to disk, another job can be moved to the user area and begin execution.
Restriction
A job must be moved from a buffer to the user region for execution.
A job to be swapped must be moved from the user region to one of the buffers.

Bottlenecks
Program movement in the memory takes time.
While a program is being swapped, the CPU is idle.

Improvements
Allow swapping in and program execution to go on simultaneously.
It should not be necessary to move a job from a buffer to the user area for execution; a job should be able to execute from anywhere in memory.

Solution: Make the fence register mobile. That is, move the fence register to the beginning address of the job (the job entry point) and schedule the job for execution. No buffer-to-user-area movement is needed.

[Figure: movement of the fence register. Memory holds the monitor and several job areas; the fence register moves to the base of whichever job is scheduled, while other areas are swapped in and out.]
Weaknesses: The CPU still idles, and address violations may happen.
Problems with swapping in general: If we want to swap a process, we must be sure that it is idle and has no outstanding I/O request. If a process is waiting for I/O, and the I/O is asynchronously accessing user memory, then after a swap the transfer may land in the memory of some other process and generate addressing errors.
Solution: Never swap a process waiting for I/O, or perform I/O only into operating system buffers.

Multiple Partitions: generalizing the swapping scheme
1. Increase the number of partitions.
2. Convert the buffers into user areas where programs can be loaded for execution.
3. Avoid program movement inside the memory.
4. Provide extra hardware support for keeping track of these partitions.

Partitioning the memory: Each partition must be protected from being overwritten by programs in the neighboring partitions.
Two registers: The base register stores the beginning address of the process and the limit register stores its end address. Only two registers are needed, since at any time only one program can be executing.
Example: Limit Register (LR) = 100, so program addresses must be <= 100; Base Register (BR) = 50, so program addresses must be >= 50.

[Figure: base/limit checking hardware. Each logical address generated by the CPU is compared against the base and limit registers before the memory access; if either check fails, an addressing-error trap is raised.]
When a program is loaded into a partition for execution, every access to memory is checked. Suppose the address generated by the CPU is LOGICALADDRESS:

if (LOGICALADDRESS >= BR) and (LOGICALADDRESS <= LR) then access the location
else generate a trap: memory violation.

Analysis of Multiple Partitions
Partition size: variable. Number of partitions: fixed. Arrangement: partitions are contiguous.
Name of the scheme: multiple contiguous fixed-partition allocation. The commercial (IBM) name: Multiprogramming with a Fixed number of Tasks (MFT).
A typical partition set (total memory size 256K, resident monitor 64K):

16 partitions of 5K each = 80K (very small jobs)
4 partitions of 15K each = 60K (average jobs)
1 partition of 52K (very large jobs)

These partition sizes depend on the environment in which the system is going to function.

Memory allocation under MFT

First fit allocation: allocate the first available partition which is large enough to hold the job (say, job size = 8K). Algorithm:

found := false;
while (not end of memory) and (not found) do
begin
    scan to the next free partition;
    if partition size >= job size then
    begin
        load the job;
        found := true
    end
end;
if not found then put the job in the waiting queue

First fit selects the first partition >= the job size. This implies that it may select a partition which is much larger than the program, so the next, larger program may be denied a place in memory. (A C sketch of first fit appears after the three allocation schemes below.)
Best fit allocation: Search for the most suitable partition for the job. The best case is partition size = job size; this happens seldom, so the next choice is the smallest available partition that is still large enough to accommodate the job.
NOTE: If the memory partitions are sorted in ascending order of size, then first fit = best fit.
Worst fit allocation: Allocate the largest available partition to the job.
NOTE to students: Study the advantages and disadvantages of these three schemes.
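A minimal C sketch of first-fit allocation over a fixed partition set; the partition sizes are illustrative assumptions, loosely following the typical set above:

    #include <stdio.h>

    struct partition { int size_k; int free; };

    /* Illustrative fixed partition set (sizes in K). */
    static struct partition parts[] = {
        { 5, 1 }, { 5, 1 }, { 15, 1 }, { 15, 1 }, { 52, 1 }
    };
    #define NPARTS (int)(sizeof parts / sizeof parts[0])

    int first_fit(int job_k)           /* returns partition index, or -1 */
    {
        for (int i = 0; i < NPARTS; i++)
            if (parts[i].free && parts[i].size_k >= job_k) {
                parts[i].free = 0;     /* load the job here */
                return i;
            }
        return -1;                     /* job goes to the waiting queue */
    }

    int main(void)
    {
        printf("8K job  -> partition %d\n", first_fit(8));   /* 2 (15K)   */
        printf("60K job -> partition %d\n", first_fit(60));  /* -1: waits */
        return 0;
    }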

Job scheduling schemes under MFT

One queue per partition size: Maintain a queue for each memory partition size. Example: one queue for 2K, one for 6K, and one for 12K. When a job arrives, it is put on the appropriate queue if no partition is currently available. If a 4K job arrives and there is no partition of size 4K, it is put on the queue of the next larger partition (6K).

[Figure: MFT with separate queues for each partition. Small jobs (2K, 1K, 2K) queue for the 2K partition, medium jobs (3K, 4K, 4K) for the 6K partition, and large jobs (11K, 7K) for the 12K partition.]


One Queue for all the partitions: One queue serves all the partitions.

3K 4K 7K 11K 3K 1K 4K

Monitor
2K
6K

12K

MFT with one queue


Problems with MFT
1. During execution a process may demand more memory. If the partition in which the job is loaded has some spare memory then the demand can be met; otherwise the job is terminated with the message "Insufficient Memory".
2. When a process is swapped out and a larger partition becomes available, the job could be swapped into the larger area. This is not easy to do; it needs some hardware support.

Memory fragmentation: the portion of a partition that cannot be allocated to another process.
Internal fragmentation: fragmentation inside a partition.
External fragmentation: fragmentation outside a partition.

Criteria for a best partition set
1. Minimum fragmentation (external or internal).
2. Most processes can execute.


Multiprogramming with a Variable number of Tasks (MVT)

A process can grow or shrink during execution, and the memory management system should be able to handle such ad hoc requirements. Memory allocation becomes dynamic and pre-partitioning of memory is unnecessary. Since there are no fixed partitions, internal fragmentation is nil; the magnitude of external fragmentation, however, depends to some extent on the job scheduling policy.
Memory allocation under MVT
1. Allocate the required amount of memory to the first job.
2. Record the memory status in a memory status table. During execution, if a job requires more memory then the required amount is provided; but if there is no free memory then the job is suspended and may be swapped out of memory.
3. If the memory request of the next job can be fulfilled then it is loaded into memory. If sufficient memory is not available then the job is put on hold and restarted when more memory becomes available.
4. The memory status table is updated every time memory is allocated or released.
Job scheduling: FCFS.

[Figure: memory allocation under MVT with FCFS scheduling. In a 256K memory with the monitor in the first 40K, jobs J1, J2, and J3 are loaded in turn; when J2 ends, J4 is started in the hole it leaves; when J1 ends, J5 is started in part of its space, leaving small holes behind.]

Observation
Storage allocation: best fit or first fit. External fragmentation: can be frequent. Internal fragmentation: none, since there are no pre-partitions.
Reason: jobs release their memory dynamically, and a released piece may not be sufficient for the next job on the job queue. This implies that there may be many small unallocated memory pieces (holes) scattered all over the memory.

[Figure: compaction of a 2100K memory. M occupies 0-300K, J1 300-500K, J2 500-600K, and J3 and J4 sit higher up with holes between and above them. (a) original layout; (b) all jobs moved toward one end: 600K of words moved; (c) J4 moved up next to J3: 400K moved; (d) J3 moved down below J4: 200K moved.]

Solution: Compaction. Merge these pieces and relocate the existing jobs in memory.
Algorithm
1. Check the memory status table to find the locations (beginning and end addresses) of the free memory pieces.
2. Join all adjacent pieces together, keeping track of the largest set of adjacent pieces glued together.
3. Join the other pieces to this large piece.
4. Relocate all the affected programs.
Simple method: move all jobs towards one end of the memory and all free space to the other end (figure b). Total number of words moved: 600K.
Second method: move job 4 next to job 3 (figure c). Total number of words moved: 400K.
Third method: move job 3 down below job 4 (figure d). Total number of words moved: 200K.
Machines that used memory compaction: the SCOPE O/S of the CDC 6600, the PDP-10, and the Univac 1108. They used base and limit registers to implement memory compaction.
Compaction problems
1. The relocation of programs is expensive; in a heavily loaded system program relocation may take up too much CPU time.
2. Frequent relocation may create addressing errors.
Limitations of these memory management mechanisms
1. All programs must be loaded into contiguous memory locations.
2. Every program must have its own copy of any system routine or procedure it uses. This means that several copies of the same code may reside in memory, i.e., there is no program sharing. This limitation may be reduced by swapping, but doing so is expensive because swapping takes time and resources.


Improvements
Paging: The mechanism is based on the idea that at any time only one instruction of a program is required by the CPU. Furthermore, physical contiguity of program instructions is not necessary, but logical contiguity is. For example, a JUMP or GOTO instruction already breaks the physical contiguity of instruction execution without harming the correctness of the program. Since a program is concerned only with logical contiguity, the memory manager is free to break physical contiguity. Logical contiguity is usually provided by pointers, but linking the instructions of a program via pointers is impractical. The paging mechanism instead establishes logical contiguity by restructuring the physical address space and the logical address space.
Physical address space: the entire physical memory.
Logical address space: all program-referenced addresses.

Paging Scheme and its Mechanism

[Figure: pages 0-3 of the logical address space are mapped through the page table into non-contiguous frames of the physical address space.]


Under this scheme the logical address space is divided into equal-size pages and the physical memory is divided into equal-size blocks (frames), with page size = frame size. The program pages are loaded into memory frames, one page per frame. The memory location of each page is stored in a table called the page table; each program has a page table associated with it. The page table thus serves as a mapping device which maps a logical address onto its physical address and so helps maintain logical contiguity. Program pages may be loaded into any available frames; they need not be physically contiguous. Page and frame sizes are fixed by the hardware. The compiler divides the object code into pages and the loader loads them into memory frames.
Page size: typically a power of two. IBM 370 page size: 2048 or 4096 bytes. Nova 3/D page size: 4096 bytes. DEC-10 page size: 2048 bytes. Sigma 7 page size: 2048 bytes.
In general, if the page size is P and the logical address is U, then

    page number p = U div P and page offset d = U mod P

Instruction access requires the starting address of the page holding the instruction and the displacement (offset) from the start of this page. The words of a page of size 2^n can be addressed with n bits; similarly, if there are 2^m frames in the memory, then m bits are needed to number all the frames.
The page table of a program holds, for each page number, the physical starting address of that page. The page number is used to index the page table, which yields the physical base address of that page in memory; the page offset is then added to this base address to reach the desired instruction.
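A minimal C sketch of this translation, using the page table of the example that follows (page size 4 words; the frame numbers are read from the figure, with page 0 in frame 5 as the text confirms):

    #include <stdio.h>

    #define PAGE_SIZE 4                       /* words per page */

    /* Page table of the example below: page i lives in frame pt[i]. */
    static unsigned pt[] = { 5, 6, 1, 2 };

    unsigned translate(unsigned u)            /* logical address U */
    {
        unsigned p = u / PAGE_SIZE;           /* page number: U div P */
        unsigned d = u % PAGE_SIZE;           /* page offset: U mod P */
        return pt[p] * PAGE_SIZE + d;         /* frame base + offset  */
    }

    int main(void)
    {
        printf("logical 0 -> physical %u\n", translate(0));   /* 20 */
        printf("logical 5 -> physical %u\n", translate(5));   /* 25 */
        return 0;
    }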
Example: Consider Page size = 4 words. Physical memory size = 32 words = 8 frames.

[Figure: paging example. The logical memory holds the 16 words a-p in pages 0-3. The page table maps page 0 to frame 5, page 1 to frame 6, page 2 to frame 1, and page 3 to frame 2. In physical memory, frame 1 (starting at address 4) holds i-l, frame 2 (address 8) holds m-p, frame 5 (address 20) holds a-d, and frame 6 (address 24) holds e-h.]


CPU generates instruction address (logical address) = 0 .
Mapping this address to physical address

Logical address 0 is in page 0.


Page 0 is loaded in the memory at frame 5.
Displacement in page 0 or in frame 5 is 0.
Frame 5 starts from physical address (5 * 4) = 20.
Add displacement to it 20 + 0 = 20.
Physical address of the desired instruction = 20.

Paging hardware: p = page number. d = offset in the page. f = frame number where page is loaded.

[Figure: paging hardware. The CPU's logical address is split into (p, d); p indexes the page table to obtain f, and (f, d) forms the physical address presented to main memory.]
Implementation of the page table: One option is one register per page table entry, but this is not economical for large page tables. Since the page tables of ready processes always reside in main memory, a Page Table Base Register (PTBR) can instead point to the base address of the running process's page table. Each program then maintains its own page table, and switching page tables is done simply by changing the contents of the PTBR.
Associative registers: a set of very fast, content-addressable registers (look-aside memory) that can be accessed in parallel.

Improvement: A program requires only a part of its entire page table at any moment to obtain the address of the desired frame, so put only a few entries of the active page tables in the associative registers.
Instruction access
1. Search the set of associative registers.
2. If the page number is found, proceed to get the memory address; otherwise get the frame number from the full page table of that process.

Memory access time under paging
Hit ratio: the percentage of time a page is found in the associative registers. The hit ratio increases with the number of associative registers; with 8 or 16 of them a hit ratio of 80 to 90% can be obtained. An 80% hit ratio means that 80% of the time the desired page number is found in the associative registers.
Some useful data
Time taken to search the associative registers: 50 ns. Time taken to access memory: 750 ns.
When the page number is in the associative registers: total access time = 50 + 750 = 800 ns.
When the page number is not in the associative registers: 50 (search) + 750 (access the page table) + 750 (access the desired word) = 1550 ns.
Effective access time = 0.80 * 800 + 0.20 * 1550 = 950 ns.
Access slows down from 750 ns to 950 ns, i.e., by 26.7%.
For a 90% hit ratio: 0.90 * 800 + 0.10 * 1550 = 875 ns.

Access slows down from 750 ns to 875 ns, i.e., by 16.7%.
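The effective-access-time computation, reproduced as a small C program (timings from the text):

    #include <stdio.h>

    int main(void)
    {
        double search = 50.0, mem = 750.0;         /* ns, from the text    */
        double hit_time  = search + mem;           /* 800 ns               */
        double miss_time = search + 2.0 * mem;     /* 1550 ns: page table
                                                      access + word access */
        double hits[] = { 0.80, 0.90 };
        for (int i = 0; i < 2; i++) {
            double eat = hits[i] * hit_time + (1.0 - hits[i]) * miss_time;
            printf("hit ratio %.0f%%: EAT = %.0f ns (slowdown %.1f%%)\n",
                   100.0 * hits[i], eat, 100.0 * (eat - mem) / mem);
        }
        return 0;
    }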

Page Sharing
Program and data sharing is possible under paging. Sharing avoids providing a private copy of a page to each program. It is applicable when re-entrant (pure procedure) code is used by several programs. Re-entrant code is code that is not modified during execution, i.e., there is no store operation into the code (no self-modification). Editors, compilers, loaders, etc., are examples of such code.
No-sharing example
Number of users: 40. Text editor size: 30K. Data space: 5K per user.
Each user must have a private copy of the 30K editor, since users cannot share each other's copies. Total memory required to support these users: 40 * 35K = 1400K. With sharing, one copy of the editor would suffice: 30K + 40 * 5K = 230K.
Sharing example

[Figure: page sharing. The page tables of jobs 1, 2, and 3 all map the editor pages Ed1, Ed2, and Ed3 to the same frames (3, 4, and 6), while each job maps its own data page (Data 1, Data 2, Data 3) to a private frame.]
Protection: to trap illegal accesses by a program. Consider the following situation:
Address space: 14 bits (addresses 0 to 16,383). Page size: 2K (2,048 words).
Program size: 0 to 10,468 (pages 0 to 5).
Total number of pages that can be addressed: 16,384 / 2,048 = 8.
This means pages 0 through 5 are legal and pages 6 and 7 are illegal for this program, so if the program tries to access page 6 or 7 the system must trap it. To recognize illegal page requests, valid/invalid bits are provided in the page table. The page table for this system then looks like the figure below.

[Figure: page table with a frame number and a valid/invalid bit per entry. The entries for pages 0-5 (covering the program space 0 to 10,468, whose last page extends to 12,287) are marked valid; the entries for pages 6 and 7 are marked invalid.]
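A minimal C sketch of the valid-bit check for this 14-bit, 2K-page system; the frame numbers in the table are illustrative assumptions:

    #include <stdio.h>
    #include <stdlib.h>

    #define NUM_PAGES 8     /* 2^14 address space / 2^11 page size */

    struct pte { unsigned frame; int valid; };

    /* Pages 0-5 valid, 6-7 invalid; frame numbers are illustrative. */
    static struct pte page_table[NUM_PAGES] = {
        {2,1}, {3,1}, {4,1}, {7,1}, {8,1}, {9,1}, {0,0}, {0,0}
    };

    unsigned translate(unsigned u)
    {
        unsigned p = u >> 11;               /* page number (2K = 2^11) */
        if (p >= NUM_PAGES || !page_table[p].valid) {
            fprintf(stderr, "trap: illegal page %u\n", p);
            exit(EXIT_FAILURE);
        }
        return (page_table[p].frame << 11) | (u & 0x7FF);
    }

    int main(void)
    {
        printf("address 10000 -> %u\n", translate(10000)); /* page 4: ok  */
        translate(14000);                                  /* page 6: trap */
        return 0;
    }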

Segmentation
User's view of memory: the memory where a program is loaded is modularized the same way the program is, i.e., segmented. The memory management technique that implements this view is called segmentation.

[Figure: a user's program viewed as segments placed in main memory: main program, subroutine, stack, SQRT, and symbol table.]

Implementation
A segment is referred to by a segment number. During compilation the compiler constructs the segments. A Pascal compiler might create separate segments for the global variables, the local variables of each procedure, procedure calls, and the stack used to store parameters and link addresses; the loader assigns segment numbers to these segments.
To map an address generated by the CPU (a logical address), a segment table is provided. A logical address consists of two parts: a segment number (s) and an offset within the segment (d). The segment number is used as an index into the segment table. Each entry of the segment table has a segment base and a segment limit. The hardware organization of this scheme is as follows:

[Figure: segmentation hardware. The CPU's logical address (s, d) indexes the segment table; if d < limit, the physical address is base + d, otherwise an addressing-error trap is raised.]
Example
We illustrate the segmentation mechanism with the following example:

[Figure: a logical address space with five segments (main program, subroutine X, function SQRT, stack, symbol table) mapped into physical memory through the segment table: segment 0: limit 1000, base 1400; segment 1: limit 400, base 6300; segment 2: limit 400, base 4300; segment 3: limit 1100, base 3200; segment 4: limit 1000, base 4700.]
There are five segments, numbered 0 through 4, stored in physical memory as shown. The segment table has a separate entry for each segment, giving the beginning address of the segment in memory (the base) and the length of the segment (the limit). For example, segment 2 is 400 words long, beginning at location 4300; thus a reference to word 53 of segment 2 is mapped onto location 4300 + 53 = 4353. A reference to word 852 of segment 3 is mapped to 3200 + 852 = 4052. A reference to word 1222 of segment 0 would result in a trap to the O/S, since segment 0 is only 1000 words long.
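The translation, as a minimal C sketch; the segment-table values are taken from the example above:

    #include <stdio.h>
    #include <stdlib.h>

    struct seg { unsigned limit, base; };

    /* Segment table from the example above. */
    static struct seg seg_table[] = {
        { 1000, 1400 },   /* segment 0 */
        {  400, 6300 },   /* segment 1 */
        {  400, 4300 },   /* segment 2 */
        { 1100, 3200 },   /* segment 3 */
        { 1000, 4700 },   /* segment 4 */
    };

    unsigned translate(unsigned s, unsigned d)
    {
        if (d >= seg_table[s].limit) {     /* offset beyond segment end */
            fprintf(stderr, "trap: addressing error (seg %u, off %u)\n", s, d);
            exit(EXIT_FAILURE);
        }
        return seg_table[s].base + d;
    }

    int main(void)
    {
        printf("(2, 53)  -> %u\n", translate(2, 53));    /* 4353 */
        printf("(3, 852) -> %u\n", translate(3, 852));   /* 4052 */
        translate(0, 1222);       /* traps: segment 0 is 1000 words long */
        return 0;
    }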

Virtual Memory
So far we have managed to eliminate all the restrictions on program execution except one: the complete program must be loaded into memory before execution can begin. A careful analysis of program behavior indicates that a program does not require its entire code and data to begin execution.
Example:

read (a, b, c);
if a > b + c then goto look
else
begin
    ...
end;
...
look:
begin
    writeln ('Error in reading data. Check data');
    ...
end;

A program during execution needs only one instruction at a time. This behavior implies that an instruction need be loaded into memory only when it is required by the CPU.
Example: If a program declares a 100 x 100 array but uses only 30 x 30 of it, the whole 100 x 100 array would still have to be in memory, occupying 100 * 100 * 2 = 20,000 words = 20K (approx.), of which roughly 18K is wasted. So for efficient memory utilization, use a demand strategy.
Demand strategy: load only the entry point of the program; the other parts are loaded when the CPU asks for them.
Further investigation: Note that the semantics and syntax of a program are not related at all to the way memory is allocated to the program. This means a program written in any language is treated in exactly the same way, i.e., the binary of a program is completely independent of the high-level language. Of course, the syntax of some languages helps the memory management technique optimize memory utilization; languages like ALGOL W or ALGOL 68 have dynamic array structures where space for an array is allocated at run time. But this should not be the criterion around which the OS is built.
The address space may be larger than the physical space. Successful execution of programs under this environment requires that:
a. The correct program entry point is available to the CPU.
b. The CPU is able to generate its requirement (the address it needs next).
c. The requirement is interpreted correctly and errors are detected.
d. The requirement is made available to the CPU as soon as possible.

Implementation of the demand strategy: The requested parts are loaded (overlaid) onto the part of memory occupied by a part of the program that is no longer needed. This overlay activity is managed by an overlay driver. The following diagram illustrates the structure. Consider a two-pass assembler:
Pass 1: construct a symbol table; code size 8K. Pass 2: generate machine code; code size 10K.
Symbol table size: 14K. Common routines: 5K. Total memory requirement: 37K. Memory available: 32K.
This can be managed as follows (with the overlay driver in memory):
1. Load pass 1, the symbol table, and the common routines from disk. OK, since pass 1 does not need pass 2.
2. Load pass 2, the symbol table, and the common routines from disk. OK, since pass 2 does not need pass 1.
This arrangement requires some special registers to handle the overlay.

[Figure: overlay structure for the two-pass assembler. The symbol table (14K), common routines (5K), and overlay driver (2K) stay resident, while pass 1 (8K) and pass 2 (10K) overlay each other in the remaining space.]

Problems: Since overlaying involves loading the correct parts of the program, the program must be structured in a special way. The programmer must have complete knowledge of the program's execution and its data structures. This may become very difficult for large programs, and if an overlay error occurs the program may have to be restructured. A method similar to overlaying is dynamic loading. The idea is similar: program modules are loaded on demand, where a module may be a procedure, a function, a subroutine, a library routine, etc.
Demand Paging: load program pages only when they are demanded by the CPU. System architecture:

[Figure: demand paging. The page table carries a valid/invalid bit per entry; pages marked valid (e.g., A in frame 3, C in frame 4) are in physical memory, while the remaining pages reside on the backing store.]
Terms
Page fault (non-equivalence): an event. It occurs when the desired page is not in main memory. A page fault for a missing page is indicated by an "i" (invalid) bit in the page table.
Page replacement: an action. It decides which page of a completely full main memory should be moved to disk to make room for the demanded page, and transfers the demanded page from disk.
Mechanism: The mechanism of demand paging is similar to ordinary paging. A logical address is generated and mapped onto a physical address by the paging hardware. The page may or may not be in physical memory; in the latter case the page is located on the disk and moved into memory. Demand paging differs from simple paging only on this point. The following diagram shows the entire sequence of processing a page fault.

[Figure: servicing a page fault. A reference ("load M") finds an invalid page-table entry and traps to the OS; the page is located on the disk, brought into a free frame of physical memory, the page table is reset, and the instruction is restarted.]

Steps
1. The CPU generates the logical address for LOAD M. A reference to the page table indicates that the page is not in memory and gives its disk address.
2. A trap is generated; the system goes into privileged mode.
3. The page is located on the disk.
4. Room is made available in memory and the page is moved in.
5. The page table is updated to indicate that the page is now in memory.
6. The instruction is restarted (system back in user mode). A minimal code sketch of this sequence follows.
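In this sketch the disk read is simulated, a free frame is assumed to be available (no replacement yet; see the replacement algorithms below), and the table sizes and page numbers are illustrative:

    #include <stdio.h>
    #include <string.h>

    #define NUM_PAGES 8
    #define INVALID  -1

    static int page_table[NUM_PAGES];      /* frame number, or INVALID */
    static int next_free_frame = 0;

    int access(int page)
    {
        if (page_table[page] == INVALID) {        /* page fault: trap    */
            printf("page fault on page %d\n", page);
            int frame = next_free_frame++;        /* find a free frame   */
            /* ...read the page from its disk address into the frame... */
            page_table[page] = frame;             /* update page table   */
            /* the faulting instruction would now be restarted           */
        }
        return page_table[page];                  /* frame holding page  */
    }

    int main(void)
    {
        memset(page_table, INVALID, sizeof page_table);  /* all on disk */
        access(3);     /* faults and loads into frame 0 */
        access(3);     /* hit: already resident         */
        return 0;
    }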

Implementation problems
1. If the page fault occurs on the instruction fetch, restart by fetching the instruction.
2. If a page fault occurs while fetching an operand, re-fetch the instruction, decode it again, and then fetch the operand.
Consider the instruction ADD A B:
a. Fetch and decode ADD. b. Fetch A. c. Fetch B. d. Add A and B. e. Store the sum in C.
Suppose ADD and operand A are on page 1 while operand B is on page 5; page 1 is in memory and page 5 is on the disk. A page fault occurs when the CPU tries to access B: execution of the instruction is incomplete and is suspended, room is made in memory, and the page is moved in.
The page table is updated and ADD A B starts again from the beginning.
3. A problem also occurs when one instruction may modify several different locations. Example: the MOVE CHARACTER instruction. The IBM 370 MVC instruction can move up to 256 characters from one location to another, and a page fault may occur just before the last move completes. The partial result is saved in memory and the operation resumes from that point once the desired page has been brought into memory.
4. Auto-increment and auto-decrement addressing on the PDP (and the VAX). In auto-increment addressing the address is incremented by a word (2 bytes) after the operand has been fetched; in auto-decrement addressing the address is first decremented and the resulting address is then used to fetch the operand.
Example: MOV (R2)+, -(R3). If a page fault occurs when accessing the operand pointed to by register R3, execution is suspended and the desired page is brought into memory; the page table is updated and execution resumes from the beginning of the instruction. The original contents of R2 and R3 are saved using a special register (SR1), so after the page fault R2 and R3 receive their original contents and execution resumes.

Page Replacement Algorithms: decide which page is to be removed from memory to make room for a new incoming page.

FIFO: the simplest replacement algorithm. It removes the oldest page from memory to make room for the incoming page. Example, with the page reference string 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1:

[Figure: FIFO page replacement, showing the contents of three frames after each reference in the string above.]


Problem: FIFO suffers from what is called Belady's anomaly where at some time the page fault rate may
increase as the number memory frames increases.
Example

Page reference string:

A, B, C, D, A, B, E, A, B, C, D, E. Memory can hold only one page.

Reference:   A B C D A B E A B C D E
Page fault:  Y Y Y Y Y Y Y Y Y Y Y Y
Page out:    - A B C D A B E A B C D

Total number of page faults = 12.

Intuitively, more memory frames should mean fewer page faults. Let us check this.
Page reference string: A, B, C, D, A, B, E, A, B, C, D, E. Memory can hold three pages.

Reference:   A B C D A B E A B C D E
Page fault:  Y Y Y Y Y Y Y N N Y Y N
Page out:    - - - A B C D - - A B -

Number of page faults = 9.


Page reference string: A, B, C, D, A, B, E, A, B, C, D, E. Memory can hold four pages.

Reference:   A B C D A B E A B C D E
Page fault:  Y Y Y Y N N Y Y Y Y Y Y
Page out:    - - - - - - A B C D E A

Number of page faults = 10.
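A small C program that simulates FIFO replacement and counts faults; running it reproduces Belady's anomaly for the string above (9 faults with 3 frames, 10 with 4):

    #include <stdio.h>
    #include <string.h>

    /* Count page faults for a reference string under FIFO replacement. */
    int fifo_faults(const char *refs, int nframes)
    {
        char frame[16];
        int head = 0, faults = 0;
        memset(frame, 0, sizeof frame);          /* all frames empty */
        for (; *refs; refs++) {
            if (memchr(frame, *refs, nframes))   /* hit: page resident */
                continue;
            frame[head] = *refs;                 /* evict the oldest page */
            head = (head + 1) % nframes;
            faults++;
        }
        return faults;
    }

    int main(void)
    {
        const char *refs = "ABCDABEABCDE";
        for (int n = 1; n <= 4; n++)
            printf("%d frame(s): %d faults\n", n, fifo_faults(refs, n));
        return 0;
    }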


Optimal Replacement: on a page fault, replace the page that will not be required for the longest period of time. An optimal page replacement algorithm generates the lowest number of page faults and never suffers from Belady's anomaly. Example:

[Figure: optimal page replacement, showing the frame contents for the reference string 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1 with three frames.]
Implementation of this algorithm is not possible since it requires future knowledge of the reference string. However, the algorithm can serve as a standard for measuring the efficiency of other algorithms.
Least Recently Used (LRU): the page that has not been used for the longest period of time is removed from memory. This is the optimal page replacement algorithm looking backward in time rather than forward. Example:

[Figure: LRU page replacement, showing the frame contents for the same reference string with three frames.]

Implementation of LRU
Method 1: Use a clock. It is incremented on every reference, and each page table entry records the clock value of its last reference. When a page is to be replaced, the page tables are searched for the page whose reference time is the lowest.
Method 2: Use a doubly linked list. The most recently referenced page is moved to the top of the list; the page at the bottom of the list is removed when a page fault occurs.

Implementation of LRU using a matrix
In this scheme the hardware maintains a matrix of n x n bits, where n is the number of memory page frames. Initially all bits of the matrix are set to zero, indicating that no reference has been made. Whenever a page is referenced, the bits of the corresponding row are set to 1 and the bits of the corresponding column are set to 0. At any instant, the page whose row value is lowest is the least recently used and is the candidate for replacement. The working of the algorithm for a 4-frame memory is shown in the following figure; "page 2(1)" means the incoming page is 2 and it replaces the page stored in memory frame 1.
Reference string: 0 1 2 3 4 5 0 3 2 3 3 1 4 5 0 1

[Figure: successive states of the 4 x 4 reference matrix as the string is processed; on each fault the frame whose row value is lowest is replaced.]
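A minimal C sketch of the matrix method; referencing frame k sets row k to 1s and clears column k, and on a fault the victim is the frame with the smallest row value (the frame count of 4 matches the figure):

    #include <stdio.h>

    #define N 4                        /* number of page frames */

    static unsigned char m[N][N];      /* reference matrix, initially 0 */

    void reference(int k)              /* frame k has just been referenced */
    {
        for (int j = 0; j < N; j++) m[k][j] = 1;   /* set row k      */
        for (int i = 0; i < N; i++) m[i][k] = 0;   /* clear column k */
    }

    int lru_frame(void)                /* frame to replace on a fault */
    {
        int best = 0, best_val = 1 << N;
        for (int i = 0; i < N; i++) {
            int val = 0;               /* row bits read as a binary number */
            for (int j = 0; j < N; j++) val = (val << 1) | m[i][j];
            if (val < best_val) { best_val = val; best = i; }
        }
        return best;
    }

    int main(void)
    {
        for (int k = 0; k < N; k++) reference(k);  /* touch frames 0..3 */
        printf("LRU frame: %d\n", lru_frame());    /* prints 0          */
        return 0;
    }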

Least Frequently Used (LFU): replace the page that is least frequently used, or least intensively referenced. Under this policy a page that was brought in most recently may be selected for removal from memory.
Most Frequently Used (MFU): pages which have been used most frequently are removed.
Not Used Recently (NUR): pages not used recently are not likely to be used in the near future, so they may be replaced.
Minimum number of frames: What is the minimum number of frames that must be available for an instruction to execute?
Example: Suppose a system provides only one level of indirect addressing: the first operand contains the address of the location which holds the operand value. To execute the instruction SUB A (B), a minimum of three pages is required: the page holding the instruction, the page holding (B), and the page holding the value that (B) points to.

[Figure: executing SUB A (B) across three process pages and memory frames; the page holding (B) is replaced by the page holding B.]

Imagine what happens when the memory has only 2 frames: the three required pages keep replacing one another and the instruction can never complete.
PDP-8 requires 3 frames, PDP-11 requires at least 6, and IBM 370 requires 8. Nova 3 (Data General) allows multiple levels of indirection; in the (theoretical) worst case an instruction can reference the entire virtual address space.
So the minimum number of frames per process is defined by the architecture of the instruction set, while the maximum is defined by the size of available memory.
Locality: Processes tend to reference storage in non-uniform, highly localized patterns. This is an empirical (observed) rather than a theoretical property. There are two kinds:
Temporal locality: locality over time. Example: if the weather is sunny at 3 p.m., then there is a good chance (but certainly no guarantee) that the weather was sunny at 2:40 p.m. and will be sunny at 3:30 p.m.
Spatial locality: nearby items tend to be similar. Example: if it is sunny in one town then it is likely (but not guaranteed) to be sunny in the neighboring towns.

Locality in the OS environment
Processes tend to favor certain subsets of their pages (temporal locality), and these pages often tend to be adjacent to one another (spatial locality). This does not mean that a process won't reference a new page.
Example (temporal locality): storage locations referenced recently are likely to be referenced in the near future. Supporting this observation are: a. loops; b. subroutines; c. stacks and variables used for counting and totaling.
Example (spatial locality): storage references tend to be clustered, so that once a location is referenced it is highly likely that nearby locations will be referenced. Supporting this observation are: a. array traversals; b. sequential code execution; c. the tendency of programmers to place related variable definitions near one another.
Process execution efficiency is high when the process's favored subset of pages is in memory. A program may finish with one locality and move to another, for example when migrating from one subroutine/procedure to another. This implies that a program is made up of several localities. The idea of locality gave rise to a storage allocation policy called the Working Set Model.

Working Set Model

Working set of a program: the set of pages the process is actively referencing.
Working-set storage management policy: keep the working set of each active program in primary memory throughout the lifetime of the process.
Implementation: define a working set window W. The set of pages visible through this window is the working set of the process.
Example: At time t1, the window W covers the last 10 references (2 6 1 5 7 7 7 7 5 1), so the working set of this process at t1 is {1, 2, 5, 6, 7}. At a later time t2 it is {3, 4}.
The working set model says that this process will run efficiently as long as its working set is in memory.
Page reference trace (the window covers the last 10 references):

    2 6 1 5 7 7 7 7 5 1 6 2 3 4 1 2 3 4 4 4 3 4 3 4 4 4 1 3 2 3 4
                      t1                               t2

Let P1, P2, and P3 be three processes in memory, with working set sizes WSS1, WSS2, and WSS3. For P1, P2, and P3 to execute efficiently, the total number of page frames that must be in memory is

    D = WSS1 + WSS2 + WSS3 = sum over i of WSSi
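A minimal C sketch that computes the working set from a reference trace and a window size (the trace values follow the example above):

    #include <stdio.h>

    /* Working set at time t: the distinct pages referenced in the
       last `window` references of the trace. Returns the WSS and
       stores the pages in ws[]. */
    int working_set(const int *trace, int t, int window, int *ws)
    {
        int n = 0;
        for (int i = (t >= window ? t - window : 0); i < t; i++) {
            int seen = 0;
            for (int j = 0; j < n; j++)
                if (ws[j] == trace[i]) { seen = 1; break; }
            if (!seen) ws[n++] = trace[i];
        }
        return n;
    }

    int main(void)
    {
        int trace[] = { 2, 6, 1, 5, 7, 7, 7, 7, 5, 1 };  /* up to t1 */
        int ws[10];
        int n = working_set(trace, 10, 10, ws);
        printf("WSS = %d; pages:", n);        /* 5 pages: 2 6 1 5 7 */
        for (int i = 0; i < n; i++) printf(" %d", ws[i]);
        printf("\n");
        return 0;
    }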

Thrashing: progress in process execution is less than the paging activity. Suppose processes P1 and P2 do not have their complete working sets in memory.
Result
1. A very high page-fault rate and consequently very high paging traffic.
2. The CPU is busy, most of the time, moving pages around; there is no progress in process execution.
3. Program response time and throughput go down.
4. When the throughput declines, the scheduler thinks there are not many processes in memory and so schedules new ones. This makes the situation worse: the CPU gets busier and busier with paging traffic, and the throughput declines further. This phenomenon is called thrashing.

Problem with the working set: supporting the dynamic nature of the working set window. Its behavior cannot be predicted exactly; on the basis of program behavior one can only give an approximate window size.
Global versus local allocation: The page replacement algorithms can be applied in two different ways:
1. On a page fault, select a victim page from among the pages of all active processes. This is global page allocation.
2. On a page fault, replace a page belonging to the process that generated the fault. This is local page allocation.
The problem with global allocation is that the execution of one process may affect the execution of other processes, since each process then has a variable number of pages in memory. The advantage is that a process may not have to wait for memory frames as long as it would under local allocation.
Page Size: Explain in the class.
