




1.1 What operating systems do
1.2 Operating System Structure
1.3 Operating System Operations
1.4 Process Management
1.5 Memory Management
1.6 Storage Management
1.7 Protection and Security
1.8 System Structures: Operating System Services
1.9 User Operating System Interface
1.10 System Calls
1.11 Types of System Calls
1.12 System Programs
1.13 Operating System Structure
1.14 Virtual Machines
1.15 Process Concept: Process Scheduling
1.16 Operations on Processes
1.17 Interprocess Communication
1.18 Multithreaded Programming: Multithreading Models
1.19 Process Scheduling: Scheduling Criteria
1.20 Scheduling Algorithms
1.21 Multiple-Processor Scheduling

Dept of ISE, Dr.AIT Page 2


An operating system (OS) is a program that manages the computer hardware. It also provides a basis
for application programs and acts as an intermediary between the computer user and the computer
hardware.

1.1 What operating systems do?

In general there are four components in any computer system:

 The hardware: the central processing unit (CPU), the memory and the I/O devices, which
together form the basic computing resources.
 The application programs: they define the ways in which these resources are used to solve users'
computing problems.
Ex. Word processors, spreadsheets, compilers, browsers etc.
 The OS: it controls and coordinates the use of the hardware among the various application
programs for the various users.
 The users: those with computing problems to solve, either humans or other machines.

An operating system is similar to a government. Like a government, it performs no useful function
by itself. It simply provides an environment within which other programs can do useful work.
The abstract view of the computer system, from the top layer down, is as follows:

Users: User 1, User 2, User 3, ..., User n
System and application programs: compiler, assembler, editor, ..., database system
Operating system
Computer hardware

1.1.1 User View:

The user's perception of the OS depends on three factors:
a. Purpose for which the computer is used.
b. The computing environment.
c. Degree of identity of the computer system with the purpose being served.

The user view of the computer varies by the interface being used:
1. PC: Monitor, keyboard, mouse and system unit. It is designed for one user to monopolize its
resources and maximize performance. Ease of use is the main design criterion of the OS, with
more importance given to performance than to resource utilization. The OS is optimized for the
expectations of a single user.
2. Terminal to mainframe or minicomputer: Many users may be accessing the same system.
Sharing of resources and exchange of information are the criteria. The design goal of the OS is
to maximize resource utilization, i.e. to assure that all available CPU time, memory and I/O are
used efficiently and that no individual user takes more than his/her fair share.
3. Workstations: Users can have their workstations connected to other workstations or servers.
Even though the users have dedicated resources at their disposal, they can also share
resources which are networked, like files, print servers etc. Here the OS is designed to
compromise between individual usability and resource utilization.
4. Handheld computers: They are basically standalone. They may be connected to networks and
other devices either directly by wire or through wireless modems and networking. These
devices are limited in power, speed and interface and hence perform relatively few
remote operations. The OS design goals here are individual usability and performance per unit of
battery life.
5. Embedded: These computers have little or no user view. They may be embedded in home
appliances, automobiles etc. The OS is designed to run efficiently, primarily without user
intervention.

1.1.2. System View: Here the different views are:

 From the system point of view the OS is the program most closely involved with the hardware,
and it acts as a resource allocator. It manages resources such as CPU time, memory space,
file-storage space and I/O devices. When conflicting requests for resources arise, the OS has to
decide how to allocate them to specific programs and users. This happens mostly on mainframes
and minicomputers, where many users share the same system.
 Another view is that the OS has to control the various I/O devices and user programs. The OS
acts as a control program which manages the execution of user programs and prevents errors and
improper use of the computer, especially of the I/O devices.

1.1.3 Defining OS
 The fundamental goal of computer systems is to execute user programs and to make solving
user problems easier. Toward this goal, computer hardware is constructed.
 Since bare hardware alone is not particularly easy to use, application programs are
developed. These programs require certain common operations, such as those controlling the
I/O devices.
 The common functions of controlling and allocating resources are then brought together into
one piece of software: the operating system.
 A more common definition is that the operating system is the one program running at all
times on the computer (usually called the kernel), with all else being systems programs and
application programs.

1.2 Operating System structure:

 One of the most important aspects of operating systems is the ability to multiprogram. A
single user cannot, in general, keep either the CPU or the I/O devices busy at all times.
 Multiprogramming increases CPU utilization by organizing jobs (code and data) so that the
CPU always has one to execute.
 The operating system keeps several jobs in memory simultaneously (Figure 1.1). This set of
jobs can be a subset of the jobs kept in the job pool, which contains all jobs that enter the
system.
 The operating system picks and begins to execute one of the jobs in memory. Eventually, the
job may have to wait for some task, such as an I/O operation, to complete.


Figure 1.1 Memory layout for a multiprogramming system.

 In a non-multiprogrammed system, the CPU would sit idle. In a multiprogrammed system,
the operating system simply switches to, and executes, another job.
 When that job needs to wait, the CPU is switched to another job, and so on. Eventually, the
first job finishes waiting and gets the CPU back.
 As long as at least one job needs to execute, the CPU is never idle. This idea is common in
other life situations.
 Multiprogrammed systems provide an environment in which the various system resources
(for example, CPU, memory, and peripheral devices) are utilized effectively, but they do not
provide for user interaction with the computer system.

Time-Sharing System
 Time sharing (or multitasking) is a logical extension of multiprogramming.
 The CPU executes multiple jobs by switching among them, but the switches occur so
frequently that the users can interact with each program while it is running.
 Time sharing requires an interactive (or hands-on) computer system, which provides direct
communication between the user and the system. The user gives instructions to the OS or to a
program directly through a keyboard or a mouse, and waits for immediate results.
 A time-shared operating system allows multiple users to use the computer simultaneously.
Since each action or command in a time-shared system tends to be short, only a little CPU time is
needed for each user.
 A time-shared operating system uses CPU scheduling and multiprogramming to provide each
user with a small portion of a time-shared computer.
 When a process executes, it typically executes for only a short time before it either finishes
or needs to perform I/O. I/O may be interactive; that is, output goes to a display for the user,
and input comes from a user keyboard, mouse, or other device.
 Since it has to keep several jobs in memory at a time, the system should have memory
management and protection.

Though systems differ from each other because they are organized along different lines, they have
some commonalities. Many of these commonalities are discussed here.

When only a single user is using the system (uniprogramming) the user cannot keep the CPU
busy all the time. This decreases CPU utilization and hence performance. A solution to this is to use
multiprogramming.

What is the basic goal of multiprogramming?

The goal of multiprogramming is to increase the CPU utilization by organizing jobs so that
the CPU always has something to execute.

The switch from uniprogramming, i.e. only one process/job in memory, to
multiprogramming was a remarkable achievement. The job-scheduling notion gave way to
multiprogramming capability. The idea here is to increase CPU utilization even further. The fact
is that no single user program can keep either the CPU or the I/O devices busy all the time. Thus
the concurrency of operation between the CPU and the I/O subsystem is exploited to get more work
done by the CPU and hence increase CPU utilization. A multiprogramming arrangement synchronizes
CPU and I/O activities in a simple manner. We space-multiplex the physical memory and
time-multiplex the physical processor. Such systems are sometimes also called multi-user systems.

What is a multiprogramming system?

A system that keeps the CPU always busy, either executing instructions or handling the I/O
processing of one program/job or another, is called a multiprogramming system. Alternatively,
multiprogramming is the rapid switching of the CPU between different jobs.

The OS has to keep several jobs in memory simultaneously. Usually this is a subset of the
jobs kept in the job pool, because primary memory is limited. The OS picks one of the jobs and
starts executing it. Eventually this job may not need the CPU for some reason, for example while
it waits for an I/O operation to complete. Usually a DMA transfer is initiated, which does not
require CPU intervention. Instead of leaving the CPU idle, the OS switches to the next job in
memory. (This is called context switching. Though context switching is an expensive overhead, the
rationale is that the time lost in a context switch is far lower than the time gained through
better CPU utilization.) The CPU remains with this job until it makes a request for I/O. Thus the
CPU switches from one job to another. Eventually all jobs complete execution by acquiring the CPU
in the course of time. Interrupts are used to indicate the completion of I/O. Thus the OS performs
three major functions: scheduling, memory management and I/O management.
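
The gain in CPU utilization can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the notes: every job alternates fixed-length CPU bursts and I/O waits, the I/O of different jobs proceeds in parallel (as with DMA), and the cost of a context switch is ignored. The function name `cpu_utilization` and the job format are made up for the illustration.

```python
def cpu_utilization(jobs, multiprogrammed):
    """Return the fraction of clock ticks in which the CPU is busy.

    jobs: list of (cpu_burst, io_wait, bursts) tuples. Each job runs `bursts`
    CPU bursts with an I/O wait between consecutive bursts.
    """
    # Per-job state: [ticks left in current burst, bursts left, tick at which I/O is done]
    state = [[cpu, bursts, 0] for cpu, _io, bursts in jobs]
    clock = busy = 0
    current = None                      # index of the job holding the CPU
    while any(s[1] > 0 for s in state):
        if current is None:
            unfinished = [i for i, s in enumerate(state) if s[1] > 0]
            if not multiprogrammed:
                unfinished = unfinished[:1]   # only one job is in memory at a time
            ready = [i for i in unfinished if state[i][2] <= clock]
            current = ready[0] if ready else None
        if current is not None:
            s = state[current]
            s[0] -= 1
            busy += 1
            if s[0] == 0:               # burst finished: the job starts its I/O
                cpu, io, _ = jobs[current]
                s[0], s[1], s[2] = cpu, s[1] - 1, clock + 1 + io
                current = None          # CPU is free for another job
        clock += 1
    return busy / clock
```

For three identical jobs `(2, 8, 3)`, the sketch gives a utilization of 18/66 (about 0.27) when the jobs run one at a time and 18/26 (about 0.69) when they are multiprogrammed.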

Multiprogramming is the first instance where the OS has to make decisions for the users,
which means the OS is fairly complex.
 Switching from one job to the next involves considerable effort on the part of the OS because
it has to keep track of where it left off one job and from where to pick up the next.
 All jobs that enter the system are kept in the job pool on the disk. If the jobs have to be
brought into memory and there is not enough space, the OS has to decide which jobs to
bring in.
 This decision of which jobs to bring into memory is called job scheduling, which the OS
has to do.
 Loading several jobs into memory requires that the different programs be placed and handled
properly in memory. So memory management has to be done by the OS.
 If several jobs are ready to run on the CPU, a decision has to be made as to which job goes
first to the CPU. This decision-making process is called CPU scheduling. The part of the OS
which then gives control of the CPU to the selected job is called the dispatcher.
 The OS has to make sure that when multiple programs are running concurrently they do not
interfere with one another.
 It may also have to take part in I/O management. I/O interrupt handling has to be done
when a device signals the completion of I/O processing for a job.

Advantages:
 High CPU utilization.
 Many programs appear to be allotted the CPU almost simultaneously.

Disadvantages:
 Jobs may be of different sizes, so memory management is needed to accommodate them.

 Since many jobs are ready to run, CPU scheduling has to be done.
 The OS has to do process management, disk management and memory management.
 Multiprogrammed systems do not function well if there are only CPU-bound jobs or only
I/O-bound jobs.
 They do not allow user interaction with the computer.

For better performance we need a mix of CPU-bound and I/O-bound jobs. With a proper mix, an
increase in the degree of multiprogramming yields higher throughput.
 The number of processes kept in memory at the same time and competing for the CPU is called
the degree of multiprogramming.

Multiprogramming System memory layout:

Time-Sharing System:
Time-sharing is a logical extension of multiprogramming. It is also called
multitasking. In a non-interactive computing environment, as in batch processing or
multiprogramming, the user has no contact with the program during execution. Such systems provide
poor service to the user. The newer paradigm of time-sharing offers an interactive computing
environment and quick response to user requests. One of the first time-sharing systems was the
Compatible Time Sharing System (CTSS), developed at MIT and used on the IBM 7094; it supported
a large number of users. It was followed by MULTICS (Multiplexed Information and Computing
Service).
(Note: Windows 95 is multitasking but not multiuser, whereas Linux is both a multitasking and a
multiuser system. As time-sharing systems evolved, processes were sometimes called tasks.)
The goal was to enable many users to interact with the computer system simultaneously, each using
his or her own terminal keyboard and display device.
It is an interactive or hands-on computer system that provides direct communication
between the user and the system.
Time sharing is an operating system feature that allows several users to run several tasks
concurrently on one or more processors, providing each user with his or her own input and output
terminal and giving the impression that he/she is the only one using the system.

 Interactive systems need to recognize a terminal as an input medium. Thus the user gives
instructions to the OS or a program directly using a keyboard or a mouse and waits for
immediate results. This time between request and service is called response time. The
effectiveness of the timesharing system is measured using its response time.
 Allows several users to share the computer simultaneously.
 The presence of other users is transparent to the user.
 The actions or commands tend to be short and hence only a little CPU time is needed for each
user. The CPU is given to each user for a small time slot called the time slice. The time slice δ
is the largest amount of CPU time any program can consume when scheduled to execute.
 The system switches rapidly from one user to another giving an impression that the entire
computer is dedicated to his/her use even though it is shared by many users.
 The time-sharing system uses CPU scheduling and multiprogramming to provide each user
with a small portion of a time-shared computer.

 Each user has at least one separate program/process in memory. Each process executes for a
short time before it either finishes or needs I/O operation. This interactive I/O takes place at
the users speed, during which time the CPU switches to another process of a different
program of another user.
 They have memory management and protection, since many user tasks are in memory.
 They implement the concept of virtual memory (VM). In order to get reasonable response times
the jobs have to be swapped in and out of main memory. To achieve this the system
uses virtual memory, a technique that allows the execution of a job
that may not be completely in physical memory.
Advantages of VM:
The size of the program can be greater than the size of the physical memory.
It helps to separate the logical memory as viewed by the user from the physical
memory.
 Time-sharing systems also provide file systems.
 They also provide disk management, a mechanism for concurrent execution and many CPU
scheduling techniques.
 They also provide job synchronization, communication and deadlock handling.
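
The role of the time slice δ can be sketched with a tiny scheduler. The notes do not name a scheduling policy here, so the round-robin policy below is an assumption, and the function name and the burst values are made up for the illustration.

```python
from collections import deque

def round_robin(bursts, time_slice):
    """Return the list of (process, start, end) CPU allocations.

    bursts: dict mapping a process name to the total CPU time it needs.
    """
    queue = deque(bursts.items())
    clock = 0
    schedule = []
    while queue:
        name, remaining = queue.popleft()
        run = min(remaining, time_slice)    # never run longer than one time slice
        schedule.append((name, clock, clock + run))
        clock += run
        if remaining > run:
            queue.append((name, remaining - run))   # unfinished: back of the queue
    return schedule
```

With `round_robin({"A": 5, "B": 2, "C": 3}, 2)` no process ever holds the CPU for more than 2 time units in a row, so every user gets service again after a short wait.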

1.3 Operating System Operations:

What drives the OS operations?
Interrupts drive the OS operations: modern operating systems are interrupt driven. If there are
no services to be offered by the system, the OS sits idle, waiting for something to happen. When
an event occurs that requires the attention and services of the OS, an interrupt is generated to
signal the occurrence of the event. For each type of interrupt, a separate segment of code
determines what action to take.

Trap: A trap or an exception is a software-generated interrupt caused either by an error or by a
specific request from a user program.

Dual Mode operation:

Since there is both OS code and the user code in the system (1) we need a mechanism to
distinguish between the two. To distinguish these codes the system needs to operate in different
modes. At the least we need two different modes of operations. Hence the name dual-mode of
operation. (2) It is also required to protect the OS from malicious users.

The two different modes of operation are user mode and kernel mode (also called
supervisor mode, system mode or privileged mode).

Systems provide hardware support to distinguish between the modes. A bit called the mode
bit is added to the hardware to indicate the current mode. When the mode bit is 0 it indicates that
kernel mode is on and when the bit is set to 1 it implies the user mode. The mode bit thus indicates
the task being performed is either on behalf of the user or the OS.

When the user code is being executed the system is set to user mode. When the user requests
a service from the OS via a system call it must transition from the user mode to the kernel mode.

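Figure: Transition from user mode to kernel mode. The user process executes in user mode
(mode bit = 1) and calls a system call. The trap switches to kernel mode (mode bit = 0), the
kernel executes the system call, and on return the mode bit is set back to 1 and the user
process resumes after the system call.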

Privileged instructions
Machine instructions that may cause harm are designated privileged instructions: they can be
executed only in kernel mode and so are kept out of reach of errant users. If an attempt is made to
execute a privileged instruction in user mode, the hardware treats it as an illegal operation and
traps to the OS. Ex. the instruction to switch to user mode, I/O control, timer management and
interrupt management.
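
The check made by the hardware can be modelled in a few lines. This is a toy sketch, not a description of any real processor: the class `CPU`, the exception `PrivilegeViolation` and the instruction names are hypothetical. Only the mode-bit values (0 for kernel, 1 for user) are taken from the notes.

```python
KERNEL, USER = 0, 1     # mode-bit values as given in the notes

class PrivilegeViolation(Exception):
    """Stands in for the hardware trap raised on an illegal operation."""

class CPU:
    # Hypothetical instruction names, used only for this sketch
    PRIVILEGED = {"set_timer", "io_control", "switch_to_user"}

    def __init__(self):
        self.mode = KERNEL      # the system starts in kernel mode
        self.log = []           # instructions that were actually executed

    def execute(self, instruction):
        if instruction in self.PRIVILEGED and self.mode == USER:
            self.mode = KERNEL  # the trap hands control back to the OS
            raise PrivilegeViolation(instruction)
        if instruction == "switch_to_user":
            self.mode = USER
        self.log.append(instruction)
```

A user program that calls `execute("io_control")` after the switch to user mode is stopped by the exception and control returns to kernel mode, while an ordinary instruction such as `"add"` runs normally.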

System calls
System calls are the means by which a user program can ask the OS to perform, on the user
program's behalf, tasks that are reserved for the OS. In other words, a system call is the method
used by a process to request action by the OS.
When a system call is invoked it usually takes the form of a trap to a specific location in the
interrupt vector.
When a system call is executed it is treated by the hardware as a software interrupt. Control
passes through the interrupt vector to a service routine, and the mode bit is set to kernel
mode. The kernel examines the interrupting instruction to determine what system call has occurred.
The parameters passed indicate what type of service the user is requesting. Additional information
can also be passed in registers, on the stack or in memory. The kernel verifies that the parameters
are correct and legal, executes the request and returns control to the instruction following the
system call.
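
The sequence above (trap, switch to kernel mode, look up the call, verify the parameters, execute, return) can be sketched as a table lookup. Everything named here is hypothetical: the call numbers, the two service routines and the `Kernel` class are invented for the illustration and do not correspond to any real OS.

```python
KERNEL, USER = 0, 1

def sys_write(text):
    return len(text)            # pretend the text was written; report its length

def sys_getpid():
    return 42                   # made-up process id

# Hypothetical call numbers mapped to (service routine, expected argument types)
SYSCALL_TABLE = {
    1: (sys_write, (str,)),
    2: (sys_getpid, ()),
}

class Kernel:
    def __init__(self):
        self.mode = USER        # a user program is running

    def trap(self, number, *args):
        self.mode = KERNEL                      # the trap sets the mode bit to kernel
        try:
            entry = SYSCALL_TABLE.get(number)
            if entry is None:
                return -1                       # unknown system call
            handler, types = entry
            # Verify that the parameters are correct and legal
            if len(args) != len(types) or not all(
                    isinstance(a, t) for a, t in zip(args, types)):
                return -1
            return handler(*args)
        finally:
            self.mode = USER                    # return to the user program
```

Returning -1 for a bad request is a design choice of this sketch; the notes only say that the kernel verifies the parameters before executing the request.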

 MS-DOS was originally written for the Intel 8088 architecture, which has no mode bit. Hence a
user program could wipe out the OS.
 More recent processors such as the Pentium provide dual-mode operation.
 Microsoft Windows 2000 and XP, Linux and Solaris take advantage of this feature to provide
greater protection for the OS.

Errors violating modes are detected by the hardware and handled by the OS. When a user program
attempts an illegal operation, for example an access to an address outside the user's address
space, the hardware traps to the OS. Control is transferred through the interrupt vector to the
OS. When such an error occurs the program is terminated abnormally, an error message is given and
a memory dump may be taken.

Goals of a timer:
 To ensure control over the CPU.
 To prevent the user program from getting stuck in an infinite loop.
 To prevent a program from not returning the resources that it holds.
 To prevent the user program from running too long.

A timer can be set to interrupt the computer after a specified period of time. The period may be
fixed or variable. A variable timer is implemented by a fixed-rate clock and a counter. The OS sets
the counter, and every time the clock ticks the counter is decremented. When the counter reaches
zero an interrupt occurs. Before turning over control to the user, the OS ensures that the timer is
set to interrupt. When the timer interrupts, control is transferred to the OS so that a suitable
action can be taken. Instructions that modify the timer value are privileged.
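
The counter mechanism can be sketched as follows. This is a simplified model: the `Timer` class, the `kernel_mode` flag and `run_user_program` are hypothetical names, and one loop iteration stands in for one clock tick.

```python
class Timer:
    def __init__(self):
        self.counter = 0

    def set(self, ticks, kernel_mode):
        # Modifying the timer is privileged, so user-mode callers are refused
        if not kernel_mode:
            raise PermissionError("setting the timer is a privileged operation")
        self.counter = ticks

    def tick(self):
        """One clock tick; returns True when the timer interrupt fires."""
        if self.counter > 0:
            self.counter -= 1
            return self.counter == 0
        return False

def run_user_program(timer, steps_needed):
    """Run until the program is done or the timer interrupts.

    Returns (steps_run, interrupted).
    """
    steps = 0
    while steps < steps_needed:
        steps += 1
        if timer.tick():
            return steps, True      # control goes back to the OS
    return steps, False
```

A program that would loop forever (modelled here by `steps_needed = float("inf")`) is cut off as soon as the counter reaches zero, which is exactly the goal listed above.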

1.4 Process Management:
A program in execution is a process. It is a unit of work in a system.
Ex1. a time-shared user program like a compiler is a process.
Ex2. A word processing program run by an individual user on a PC is a process.
The different types of processes are:
1. OS processes called system processes as they execute system code.
2. User processes that execute user code.
They execute concurrently by time multiplexing the CPU.

What is the difference between a program and a process?

A program is a passive entity, like the contents of a file stored on the disk.
A process is an active entity: a program in execution.

A process requires resources to complete its task. Resources include CPU time, memory,
files, I/O devices etc. Along with the resources it may also require initialization data. After execution
the resources are returned to the pool.
Ex. if the function of a process is to display the status of a file on the screen then the process must be
given an input namely the name of the file.

The process acquires the resources it needs right at the time of creation or while it is running.

A single-threaded process has one PC (program counter) specifying the address of the next
instruction to execute. Such processes are sequential, i.e. the CPU executes one instruction after
the other. Even if two or more processes are associated with the same program, they are considered
separate execution sequences.
A multi-threaded process has multiple PCs each pointing to the next instruction to execute
for a given thread.

What activities of process management is the OS responsible for?

 Creating and deleting both user and system processes.
 Suspending and resuming processes.
 Providing mechanisms for process synchronization.
 Providing mechanisms for process communication.
 Providing mechanisms for deadlock handling.

1.5 Memory Management:

What are the features of the main memory?

 Central to the operation of the modern computer system.
 Is a large array of words or bytes ranging in size from hundreds of thousands to billions.
 Each word or byte has its own address.
 It contains a repository of quickly accessible data shared by the CPU and the I/O devices.

 The CPU can access and address only the main memory directly. Any data on the disk will
have to be moved to the main memory first before the CPU can access.
 The CPU reads instructions from the main memory during the instruction fetch cycle and
reads and writes data to the main memory during the data fetch cycle.
 Its addresses are absolute addresses and every instruction before execution is mapped to the
absolute address. Hence a program execution will involve accessing instructions and data by
generating the corresponding absolute addresses.
 When the program terminates its memory space is freed and a new program is loaded.

In order to improve CPU utilization and to speed up the response time for the users, several
users' programs are kept in main memory. Hence, to manage multiple programs and to
prevent conflicts between them, memory management is required.

Design feature:
The scheme of memory management for a specific system must take into account the hardware
design of the system.

What activities of memory management is the OS responsible for?

 Keeping track of which parts of memory are currently being used and by whom.
 Deciding which processes and data to move into and out of memory.
 Allocating and deallocating memory space as needed.
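
These three activities can be seen together in a toy allocator. The notes do not prescribe an allocation policy, so the first-fit policy used below is an assumption, and the class and method names are made up for the sketch.

```python
class MemoryManager:
    def __init__(self, size):
        self.size = size
        self.blocks = []    # list of (start, length, owner), sorted by start address

    def allocate(self, owner, length):
        """First fit: return the start address, or None if no hole is big enough."""
        start = 0
        for i, (b_start, b_len, _) in enumerate(self.blocks):
            if b_start - start >= length:           # the hole before this block fits
                self.blocks.insert(i, (start, length, owner))
                return start
            start = b_start + b_len
        if self.size - start >= length:             # the hole at the end fits
            self.blocks.append((start, length, owner))
            return start
        return None

    def free(self, owner):
        """Deallocate every block held by `owner`."""
        self.blocks = [b for b in self.blocks if b[2] != owner]

    def owner_of(self, address):
        """Tell which owner, if any, is currently using `address`."""
        for start, length, owner in self.blocks:
            if start <= address < start + length:
                return owner
        return None
```

Here `blocks` is the record of which parts of memory are in use and by whom; a request that does not fit returns `None`, at which point a real OS would have to decide which process to move out of memory.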

1.6 Storage Management:
One of the OS goals is to provide convenience to the users. In this regard the OS provides a
uniform logical view of information storage. The physical properties of the storage devices are
abstracted to define a logical storage unit called a file.

File System Management:

The different types of physical media include magnetic disk, optical disk and magnetic tape. Each
has its own physical organization and characteristics such as access speed, capacity, data transfer
rate and access method (sequential or random). Each medium is controlled by a device such as a disk
drive or tape drive, which has its own unique characteristics.

File: A file is a collection of related information defined by its creator. Files commonly represent
programs (source and object) and data.

Types of files
Program files, data files. Data files may be numeric, alphabetic, alphanumeric or binary. Files may
be free form like text files or may be formatted.

File-System Management component

 File management is the most visible component of the OS.
 This component of the OS implements the abstract concept of a file by managing mass
storage media like tapes, disks etc. and the devices that control them.
 It organizes files into directories to make them easier to use.
 It controls access to files when multiple users have access, by defining permissions like read,
write, append etc. for the users.

What activities of file management is the OS responsible for?

 Creating and deleting of files.
 Creating and deleting of directories to organize files.
 Supporting primitives for manipulating files and directories.
 Mapping files onto secondary storage.
 Backing up files on stable or non-volatile storage media.

Mass-Storage Management:

Drawbacks of the main memory:

 It is too small to accommodate all data and programs.
 Data held in this memory is lost when power goes off.
 Cannot be used as a back up.

What is secondary storage used for?

 It is used as a back up for the main memory. Ex. Disks.
 Most programs including compilers, assemblers, word processors, editors etc are stored on
the secondary device until loaded into the memory.
 It is also used as a source and destination of processing.

Design criteria?
The speed of operation of the computer may depend on the speed of the disk system and the
algorithms that manipulate the subsystem.

What activities of disk management is the OS responsible for?

 Free-space management.
 Storage allocation.
 Disk scheduling.

Tertiary storage devices

Slower, lower-cost but high-capacity storage devices are called tertiary storage devices.
They are used for backups of regular disk data, for seldom-used data, for long-term archival
storage etc. Ex. Magnetic tape drives and their tapes, CD and DVD drives and their platters.
The media vary in format, such as WORM (write once, read many) and RW (read-write).

Tertiary storage management is either done by the OS or can be left to application program.

What activities of tertiary storage management is the OS responsible for?

 Mounting and unmounting.
 Allocating and freeing devices.
 Migrating data from secondary to tertiary storage.

Caching:

A cache is a fast memory used for storing information under the assumption that we will
need that information again very soon. A copy of the information is kept in the cache on a
temporary basis. When a particular piece of information is required during processing, the cache
is searched first. If the information is available it is used directly from the cache; otherwise
a search of the slower source, such as the disk, is made.

1. Internally programmable registers like index registers can be used as a high-speed cache for
the main memory. Either the programmer or the compiler implements the register-allocation
and register-replacement algorithms that decide which information is kept in registers and which
in the main memory.
2. Caches can also be implemented in hardware. Ex. most systems have an instruction cache to
hold the instructions expected to be executed next. This prevents the CPU from waiting
several cycles while an instruction is fetched from main memory.

Cache management is an important design problem. The main design decisions are:

1. Cache size.
2. Replacement policy.

A copy of the same information may exist at different levels in the storage hierarchy. The bulk of
secondary storage is on magnetic disks. These may be backed up by magnetic tapes or removable
disks to protect against loss of data. The movement of data in the hierarchy may be explicit or
implicit, depending on the hardware design and the controlling OS software.

In general transfer of data from cache to CPU registers is a hardware function and transfer from disk
to memory is controlled by the OS.

What about inconsistency of data when there are multiple copies of the same thing?
Ex. Say an integer A in file B is to be incremented. Through an I/O operation the block
containing A is copied to the main memory. This is followed by a copy to the cache and then to an
internal register of the CPU. Hence a copy of A appears in several places. The increment to A takes
place in the internal register. At this point the copies of A are inconsistent. Only when the new
value of A is written back from the register through to the disk is the value of A the same
everywhere.

What if multiple users have access to the same integer A?

In a multitasking environment care must be taken that each user gets the
most recently updated value of A.
The situation is more complex in a multiprocessor environment. There may be several caches,
each associated with one CPU, so the value of A may be different in each of the caches. Making
sure that an update to A in one cache is reflected in every other cache that holds A is called
cache coherency. It is usually handled at the hardware level.
The situation becomes even more complex in a distributed environment. Several copies of
the same file may be kept on different computers. Since the various replicas may be accessed and
updated concurrently, there is again a problem of coherence. To prevent this most systems ensure
that when a replica is updated in one place all other replicas are brought up to date as soon as
possible.
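
The example of integer A moving from the disk up to a CPU register can be traced with a small model of the storage hierarchy. This is a sketch only: the `Storage` class and its methods are hypothetical, and real hardware applies its own write policies that are not modelled here.

```python
class Storage:
    """One level of the hierarchy; `lower` is the slower level beneath it."""

    def __init__(self, name, lower=None):
        self.name, self.lower, self.data = name, lower, {}

    def read(self, key):
        if key not in self.data:                    # miss: fetch a copy from below
            if self.lower is None:
                raise KeyError(key)
            self.data[key] = self.lower.read(key)
        return self.data[key]

    def write(self, key, value):
        self.data[key] = value                      # only this level is updated

    def flush(self, key):
        """Write the value back through every lower level."""
        level = self
        while level.lower is not None:
            level.lower.data[key] = level.data[key]
            level = level.lower
```

Build the chain disk, memory, cache, register with A = 7 on the disk, read A at the register, and write 8 there: the register now holds 8 while the cache, memory and disk still hold 7. After `flush("A")` all four levels agree again.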

I/O Subsystems:
The I/O subsystem consists of several components like:
 The memory management component that includes buffering, caching and spooling.
 A general device driver interface.
 Drivers for specific hardware devices.
The OS must hide the peculiarities of the I/O devices from the user. Only the device drivers know
the peculiarities of each of the devices.

1.7 Protection and Security:

What are protection and its mechanism?

Protection is any mechanism for controlling the access of processes or users to the resources defined
by a computer system.
The mechanism must specify the controls to be imposed and means for enforcement of the controls.

What is the need for enforcing protection mechanism?

When a computer system has multiple users and allows the concurrent execution of multiple
processes, a protection mechanism is required to regulate access to data. System resources like
files, memory segments, CPU etc. are made available only to those processes which have gained
proper authorization. Ex. memory-addressing hardware ensures that a process can execute only within
its own address space, and the timer ensures that a process eventually relinquishes control of the
CPU.

What are the advantages of providing protection?

 Can improve reliability by detecting latent errors at the interfaces between component
subsystems.
 Early detection of interface errors can prevent corruption of a healthy subsystem by another
malfunctioning subsystem.
 Protection can prevent misuse by an unauthorized or incompetent user.

What is security?
It is a mechanism to ensure that the resources of a system are used as intended under all
circumstances.
A system may have adequate protection but still not be secure. The security system must defend the
system from external and internal attacks. Attacks can be of various types like viruses, worms,
denial-of-service attacks, identity theft, theft of service etc. Ex. If a user's authentication
information is stolen then the owner's data can be stolen, corrupted or deleted.

The mechanism of protection and security must be able to distinguish among all its users.
This is possible because the system maintains a list of all user ids. These ids are unique per user. The
authentication stage determines the appropriate user id for the user and that user id is associated with
all the user's processes and threads. When it is required to distinguish among a set of users rather
than individual users then group functionality is implemented. A system-wide list of group names
and group ids is stored. A user can belong to one or more groups depending on the OS design and
implementation. If a user needs to escalate privileges to gain extra permissions then the OS provides
different methods for escalation. Ex. in UNIX the setuid attribute causes the program
to run with the user id of the owner of the file rather than the current user's id. This is the effective
user id, which is used until the privileges are turned off.
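
The setuid attribute described above is stored as one bit in a file's mode. The sketch below is only an illustration of how that bit can be inspected from Python; the function names `mode_has_setuid` and `file_has_setuid` are made up for this example, and the second function assumes the path exists.

```python
import os
import stat


def mode_has_setuid(mode: int) -> bool:
    """Return True if the setuid bit is set in a numeric file mode."""
    # stat.S_ISUID is the "set user id on execution" bit (octal 4000).
    return bool(mode & stat.S_ISUID)


def file_has_setuid(path: str) -> bool:
    """Return True if the file at `path` carries the setuid bit."""
    # os.stat asks the OS for the file's attributes, including its mode.
    return mode_has_setuid(os.stat(path).st_mode)
```

For instance, a mode of octal 4755 has the bit set while octal 0755 does not.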

A network is a communication path between two or more systems. Basically when computers
communicate they either create networks or use a network. Based on type of transport media,
protocol used and distance between the nodes networks vary.

 Types of protocols:
TCP/IP: common and supported by Windows and UNIX. ATM and other proprietary protocols are also used.
 Based on distance:
LAN: Local Area Network: exists within a room, floor or a building.
MAN: Metropolitan Area Network: could link buildings within a city.
WAN: Wide Area Network: exists between buildings, cities or countries.
SAN: Small Area Network: Bluetooth devices can communicate over short distances of
several feet like in a home setup.
 Transmission Medium:
Copper wires, fiber strands, and wireless transmissions between microwave dishes, satellites
and radio. Infrared communication is also one.
 Performance and Reliability:
Based on these two factors also the networks vary.

network operating system (NOS)

NOS is an OS that allows file sharing on the network and provides a communication scheme
that allows different processes on different computers to exchange messages. A computer running
NOS works autonomously from all other computers on the network but is aware of the communication
and the other computers on the network. Hence it makes the system less transparent.

What all operating systems have in common is that they provide an environment within which programs are executed.
Internally OS varies based on the makeup, algorithms and strategies used and the intended usage of
the computer system.
OS can be assessed or viewed based on the following three points:
1. Examining the services that it provides.
2. The type of the interface it makes available to the users.
3. Disassembling and looking at the components and the type of interconnection they have.

OS services:
OS provides certain services to programs & the users. The services provided vary from OS to
OS even though a common class can be identified. These services are provided for the
convenience of the programmer and to make the programming task easy.

Types of services:

1. Program Execution: OS must be able to
 Load the program into memory.
 Run the program.
 End the execution of the program, either normally or abnormally (indicating an error).
2. I/O Operations: OS must be able to
 Provide a file or an I/O device when a running program requires it.
 Perform special functions for specific I/O devices (CD, DVD etc).
 Not allow the users to control I/O devices directly and look into the protection and efficient
usage of the I/O devices.
 Facilitate an efficient means to do I/O.
3. File-System manipulation: OS must be able to
 Read and write files and directories.
 Create and delete files by name.
 Search for a given file.
 List file information.
 Manage permissions to allow or deny access to files or directories.
4. Communication: OS must be able to
 Allow exchange of information between processes either residing on the same computer or
different computers, which are connected by a network.
 Facilitate either communication through shared memory or by message passing via packets
of information.
5. Error detection:
Types of errors?
 In the CPU and memory h/w (memory error or power failure).
 In I/O devices (lack of paper in printer, connection failure to n/w).
 In user program (arithmetic overflow).
OS must
 Be aware of possible errors.
 Take the appropriate action for each error to ensure correct execution.
6. User Interface:
All OS have a user interface, which can take different forms like:
 Command-line interface (CLI): uses text commands and a specific method for entering them.
 Batch Interface: commands and directives to control them are entered into files and those files are
executed.
 Graphical User Interface (GUI): most common. Interface is a window system with a pointing
device to direct the I/O, choose from menus and make selections, along with a keyboard to enter
text.
The following services ensure that the system works efficiently and computer resources are
properly shared by the users.
7. Resource allocation:
The resources like CPU, memory and files have to be allocated to multiple jobs which may
be running simultaneously. Allocation can happen through special allocation algorithms or through
general request and release code. Ex. to determine how best to use the CPU the OS has CPU-scheduling
algorithms that take into account the speed of the CPU, the number of jobs that must be executed, the
number of registers that are available etc. Routines are also used to allocate printers, modems, USB
storage devices and other peripheral devices. An internal table may have to be maintained regarding
the allocation of resources, and the table entry is cleared when a resource is no longer in use.
8. Accounting:
To keep track of which users and how many users use which kinds of resources, and for
how long. This record keeping may be for accounting or for statistics purposes. Statistics may be helpful
to reconfigure the computer system if required in order to improve performance and computing
services.
9. Protection & Security:
What is protection?
Protection involves ensuring that all access to system resources is controlled.
The information stored on the computer will require controlled access. During concurrent execution
one process should not interfere with the other process or with the OS itself. Security can be ensured
by having authentication like having login and password. Protection also applies to I/O devices,
modems, network adapters so that no break-ins happen.

User Operating System Interface:

How does the user interface with the OS?
1. Through command-line interface or the command interpreter.
2. Through Graphical User Interface GUI.
1. Command Interpreter System:
Command interpreter is the interface between the user and the OS. Its function is to get and
execute the next user-specified command. In some OS it resides in the kernel; in others
(MS-DOS and UNIX) it is treated as a special program that is running when a job is initiated.
Control statements- Commands are given to the OS through statements called control statement.

Command-line interpreter/control-card interpreter

The program that reads and interprets the control statements automatically is the command-
line interpreter/control-card interpreter. It basically gets the next command statement and executes it.
What is a shell?
When a system has multiple command interpreters to choose from the interpreters are known as
shells.
Variations in shell? => differences in user interface!
 User-friendly command interpreter, mouse based window and menu system like in
Macintosh and Windows.
Commands are implemented in two ways:
1. Command interpreter itself has the code to execute the command. Ex. say a command to
delete a file. This causes the command interpreter to jump to a section of its code that sets
up the parameters and makes the appropriate system call.
Disadvantage: In this method the size of the command interpreter depends on the number of
commands that can be given.
2. Most commands are implemented through system programs. The command interpreter uses
the command to identify a file to be loaded into memory and executed. Ex. rm file.txt would
make the command interpreter search for a file rm, load that file into memory and execute it
with parameter file.txt.
Advantages:
 Command interpreter program is small.
 Command interpreter does not have to be changed when new commands are added.
 New commands can be easily added to the system.
Traditionally UNIX systems have used command-line interpreters. This is because they provide
powerful shell interfaces.
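
The second approach (commands implemented as system programs) can be sketched as a tiny command interpreter. This is a minimal sketch, not how any real shell is written: the function name `run_command` is hypothetical, and returning 127 for an unknown command only mirrors the convention used by POSIX shells.

```python
import shlex
import shutil
import subprocess


def run_command(line: str) -> int:
    """Treat the first word as a program file, run it, return its exit status."""
    args = shlex.split(line)            # e.g. "rm file.txt" -> ["rm", "file.txt"]
    if not args:
        return 0                        # empty line: nothing to do
    program = shutil.which(args[0])     # search PATH for a file named after the command
    if program is None:
        return 127                      # assumed convention for "command not found"
    # Load the file into memory and execute it with the remaining words as parameters.
    completed = subprocess.run([program, *args[1:]])
    return completed.returncode
```

Note that the interpreter contains no code for any particular command, so adding a new command only requires placing a new program file on the search path.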

2. Graphical User Interface GUI:

It is a mouse based window and menu system providing an interface instead of directly
entering commands. It uses a desktop metaphor. In this interface the mouse is moved to position its
pointer on images or icons on the screen that represent programs, files, directories and other system
functions. Clicking the mouse on the icon invokes the corresponding program.

System Calls:
System calls provide the interface between a process and the OS. They are generally available as
assembly language instructions and are listed in manuals used by programmers. Some system calls can
also be made from a HLL, where they resemble predefined functions or subroutine calls. Languages such
as C, C++ and Perl allow system calls to be made directly. They may be generated directly inline or a
call to a special run time routine can make the system call.
Discuss the example of how to read from one file and copy to another file.
In UNIX system calls can be directly invoked from a C or C++ program.
In MS Windows platforms system calls are part of the Win32 API.
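
The file-copy example mentioned above can be sketched with Python's `os` module, whose `os.open`, `os.read`, `os.write` and `os.close` functions are thin wrappers over the corresponding system calls. The function name `copy_file` and the 4096-byte chunk size are choices made for this illustration only.

```python
import os

# O_BINARY exists only on Windows; on other systems it is treated as 0.
_BINARY = getattr(os, "O_BINARY", 0)


def copy_file(src: str, dst: str, chunk_size: int = 4096) -> int:
    """Copy src to dst using open/read/write/close; return bytes copied."""
    in_fd = os.open(src, os.O_RDONLY | _BINARY)
    try:
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | _BINARY, 0o644)
        try:
            copied = 0
            while True:
                chunk = os.read(in_fd, chunk_size)   # read returns b"" at end of file
                if not chunk:
                    break
                view = memoryview(chunk)
                while view:                          # write may accept fewer bytes than given
                    written = os.write(out_fd, view)
                    view = view[written:]
                copied += len(chunk)
            return copied
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)
```

Even this simple task issues a sequence of system calls: two opens, a loop of reads and writes, and two closes.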
Sometimes parameters are passed to the system calls. Three methods are used to pass
parameters to the OS:
1. Pass parameters in registers.
2. If the parameters are more than the registers then they are stored in a block or a table and the
address of the block is passed as a parameter in the register. (like in LINUX)
3. Parameters can be pushed on to the stack by the program and popped off by the OS.
Types/categories of system calls?
 Process control
o end, abort
=> end for normal termination, abort for abnormal termination (causes an error trap, a dump of memory is
taken and an error message is generated). The dump is written to disk and examined by the debugger.
=> in both cases the control is passed to the command interpreter which reads the next
command.
=> depending on the error an error level is indicated (level 0 for normal) so that the next
action can be automatically determined.
o load, execute
=> for loading and executing a program.
=> The command interpreter after loading starts executing the program and on
termination the control has to be returned.
=> based on whether the program is lost, saved or allowed to continue the control is
sent to the relevant location.
o create process, terminate process
=> A process or a job executing one program can load and execute another program.
If both programs have to run concurrently then multiprogramming is happening.
=> For which we use create or submit process.
=> to terminate a process that was created terminate process is used.
o get process attributes, set process attributes
=> To control the execution of the jobs that are created.
Control can be to determine (get process attributes) and reset (set process attributes)
the attributes of a job or a process.
=> the attributes can be job’s priority, maximum allowable execution time, etc.
o wait for time
=> When jobs have been created it may be necessary to wait for them to finish their execution.
=> to wait for a certain amount of time the wait time system call is used.
o wait event, signal event
=> When a job has to wait for a certain event to occur wait event system call is used.
=> When the event has occurred the job should signal the occurrence through the signal
event system call.
o allocate and free memory
Some system calls help in debugging.
 Ex. some system calls help to dump memory.
 A program trace lists each instruction as it is executed.
 In single step mode a trap is executed by the CPU after each instruction. The trap is caught
by the debugger which helps in finding and correcting bugs.
 A time profile provided by the OS indicates the amount of time that the program executes at
a particular location or set of locations.
 A time profile requires tracing facility or regular timer interrupt.
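
The process-control calls listed above (create process, wait, terminate, and the error level returned on termination) can be tried out through Python's `subprocess` module. This is a sketch under assumptions: the helper names `run_child` and `start_and_terminate` are hypothetical, and the child processes are simply new Python interpreters.

```python
import subprocess
import sys


def run_child(code: str) -> int:
    """Create a process that runs `code`, wait for it, return its exit status."""
    child = subprocess.Popen([sys.executable, "-c", code])  # create process, load, execute
    return child.wait()                                      # wait for termination


def start_and_terminate() -> int:
    """Create a long-running process, terminate it, return its exit status."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    child.terminate()        # ask the OS to terminate the created process
    return child.wait()      # non-zero: the child did not end normally
```

A status of 0 corresponds to the "level 0 for normal" convention above; any other value signals an abnormal end.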

 File management.
o create file, delete file
=> the system call will require name of the file and perhaps some attributes.
o open, close
=> once the file is created it has to be opened with the open system call and then closed
with a close system call.
o read, write, reposition
=> once the file has been opened read, write or reposition (e.g. skip to the end of the file) may have to be
done and the corresponding system call is used.
o get file attributes, set file attributes
=> attributes of a file like file name, file type, protection codes, accounting information
can be determined or changed using these two system calls.
 Device management
When a program is running it may need additional resources like more memory, tape drives,
access to files etc.
o request device, release device
=> make a request for additional resources. If the resources are available they are granted
and control can be returned to the user process else the program will have to wait for the
resources to become available.
=> Since the system can have multiple users a request has to be made for the
resource through the request system call and the resource released through the release system call
once the work is done.
o read, write, reposition
=> once the device has been requested and allocated read, write and reposition can be
done on the device by using the corresponding system calls.
o get device attributes, set device attributes
o logically attach or detach devices
 Information maintenance
These system calls help in the transfer of information between the user program and the OS.
o get time or date, set time or date

Help to get and set current time and date.
o get system data, set system data
o get process, file or device attributes
o set process, file or device attributes
 Communications
o create, delete communication connection
o send, receive messages
o transfer status information
o attach or detach remote devices
Two modes of communication are available:
1. Message Passing model: Method useful when a small amount of data needs to be exchanged. It
is easier to implement. No conflicts are encountered.
Information exchange takes place through the inter-process communication facility provided by
the OS.
 Before the communication a connection is established and the name of the other
communicator (process on the same CPU or different CPU connected via a network) must be
known. So a create communication connection system call is used.
 Each computer has a host name (IP name) and each process also has a process name on the
network. The system calls get hostid and get processid are used to obtain these identifiers.
 Once the communication is done a close or delete connection is used.
2. Shared Memory model:
 Processes use memory map system calls to gain access to regions of memory owned by
other processes.
 The OS normally prevents one process from accessing another process's memory so the
processes have to agree to remove this restriction.
 Then exchange of data takes place through read and write in the shared areas.
 It allows maximum speed since it can be done at memory speeds.
 Protection & synchronization is a problem.
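
The two models can be contrasted with a small sketch. It is simplified on purpose: both ends live in one Python process so that the example stays short, whereas real use would put them in separate processes. The function names are hypothetical; `socket.socketpair` and `multiprocessing.shared_memory.SharedMemory` are standard library facilities (the latter needs Python 3.8 or newer).

```python
import socket
from multiprocessing import shared_memory


def message_passing_demo(message: bytes) -> bytes:
    """Send a small message over a connection and return what was received."""
    left, right = socket.socketpair()          # create communication connection
    try:
        left.sendall(message)                  # send message
        received = bytearray()
        while len(received) < len(message):    # receive until the whole message arrived
            chunk = right.recv(len(message) - len(received))
            if not chunk:
                break
            received += chunk
        return bytes(received)
    finally:
        left.close()                           # close connection
        right.close()


def shared_memory_demo(data: bytes) -> bytes:
    """Write through one handle to a shared region and read through another."""
    owner = shared_memory.SharedMemory(create=True, size=len(data))  # data must be non-empty
    try:
        peer = shared_memory.SharedMemory(name=owner.name)  # attach to the region by name
        try:
            owner.buf[:len(data)] = data               # write into the shared area
            return bytes(peer.buf[:len(data)])         # read it back via the other handle
        finally:
            peer.close()
    finally:
        owner.close()
        owner.unlink()                                 # release the shared region
```

In the message passing version every byte is copied through the OS, while in the shared memory version the second handle sees the data without any send or receive step, which is why synchronization becomes the programmer's problem.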

System Programs:
System programs fall between the OS and the application program. They provide convenient
environment for program development and execution. They are like user interfaces to the system
calls. The categories of system programs are:
1. File Management: These programs create, delete, copy, rename, print, dump, list and
manipulate files & directories.
2. Status Information: They help to ask the system for date, time, amount of available memory
or disk space, number of users etc. The information is formatted and printed to the terminal
or o/p device.
3. File Modification: Text editors help to create and modify the contents of the files stored on
the disk or tape.
4. Programming-Language Support: Compilers, assemblers, and interpreters for common
languages come with the OS. Sometimes they come separately also.

5. Program Loading & execution: These programs load a compiled program into memory for
execution. They can be absolute loaders, relocatable loaders, linkage editors
and overlay loaders.
6. Communications: These programs help to create connections among processes, users and
different computer systems. They allow users to send messages to other users' screens, browse the
net, send electronic mail, log in remotely or transfer files from one machine to another.
Common programs supplied with the OS are web browsers, word processors, text formatters,
spreadsheets, d/b systems, compilers, games, statistical packages etc. These programs are called
system utilities or application programs.
An important system program for the OS is the command interpreter. Commands given at this level create,
delete, list, print, copy, execute files etc. There are two approaches. One is to have the code to execute the
command in the interpreter itself; the number of commands that are supported then determines the size
of the command interpreter. Alternately, as in UNIX, the OS implements most commands as system
programs. The command simply identifies the file to be loaded into memory and executed.
Advantage: Programmers can add new commands to the system by just creating new files related to
the command.
Disadvantage: Since the code to execute the command is a separate system program the OS must provide a
mechanism to pass parameters from the command interpreter to the system program. This can be
clumsy: the parameter list can be big, and sometimes the command interpreter and the system program may
not be memory resident at the same time. Since the interpretation of the parameters is left to the
programmer of each system program, parameters may be handled inconsistently across the system.

System Structure:
In order to make the OS function properly and to be easily modifiable the common approach
to development is to partition the system into small components rather than building a
monolithic structure. Each small component must be well defined, with a defined set of inputs,
outputs and functionality.
Simple Structure:
Many commercial systems do not have a well-defined structure:
Ex1. MS-DOS. Such OS started as small, simple and limited systems and grew later. It was originally
designed to provide the most functionality in the least space. Hence it was not divided into modules
carefully.
MS-DOS layer structure (top to bottom):

Application program

Resident system program

MS-DOS device drivers

ROM BIOS device drivers

Ex2. UNIX was limited by h/w functionality initially. It consists of two parts: the kernel and the
system programs. The kernel has a series of interfaces and device drivers which got added over the
years. Traditionally it has been a layered system. Everything below the system call interface and
above the physical hardware is the kernel. The kernel provides CPU scheduling, memory management, the file
system and other functions through system calls. This is a lot of functionality combined at one level
which makes UNIX difficult to enhance. The system calls define the API to UNIX, and the system programs
define the user interface.
Newer UNIX versions:
 Come with advanced hardware. So the OS can be broken into smaller components.
 Have greater control over the computer and the applications.
 Implementers have more freedom to make changes to the inner workings of the system and
the working of the OS.
 A top down approach is used where the functionality and the features are separated into
components. This helps to hide information and hence freedom to implement the low level
routines as required.

Fig: Traditional UNIX system structure (top to bottom):
(the users)
shells & commands; compilers & interpreters; system libraries
system call interface to the kernel
kernel: signals, terminal handling, character I/O system, terminal devices | file system, swapping, block I/O | CPU scheduling, page replacement
kernel interface to the hardware
terminal controllers, terminals | device controllers, disks & tapes | memory controllers, physical memory

Layered Approach:
One method of modularization is through a layered approach. Here the OS is broken into a
number of layers (levels), each built on top of the other. The bottom most layer is the h/w and the
top most is the user interface. Each layer is like an object where the implementation and operations
along with the data are encapsulated. Each layer has data structures and a set of routines that can be
invoked by the higher layers. Each layer selects and uses the functions and services of only layers
below it (*). Each layer is implemented using only those services provided by the lower layer
without actually knowing how they are implemented. Hence each layer hides its data structures,
operations and hardware from the higher layers.
Advantages:
 Modularity.
 Easy debugging and system verification (because of *). The 1st layer is debugged without concern
for the rest of the layers. It just has to work correctly on the hardware right below it. Once the 1st
layer is debugged the 2nd layer is debugged and so on. If an error is found during debugging
of a particular layer the error must be in that layer since the layers below it have been verified
to be correct.
 Because of the above advantage the design and implementation of the system is simplified.
Disadvantages:
 Requires careful definition of the layers.
 It tends to be less efficient than other systems. Ex. When a user program executes I/O it
executes a system call that is trapped to the I/O layer which in turn calls the memory
management layer, which in turn calls the CPU scheduling layer and so on. At each layer
parameters may be modified and data may need to be passed. Hence each layer adds overhead
to the system call.
 Because of the above point the system call takes longer than on a non-layered
system.
Changes: To get over the disadvantages, designs now use fewer layers with more functionality in each,
which keeps the advantage of modularized code. Ex. Windows NT had a highly layered approach but delivered low
performance compared to Windows 95. Hence Windows NT 4.0 addressed this problem by moving
layers from user space to kernel space and closely integrating them.
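
The rule that each layer uses only the layer directly below it, and the overhead this adds to a system call, can be seen in a toy model. The class `Layer` and the helper `build_stack` are invented for this sketch; the layer names follow the I/O example given above.

```python
class Layer:
    """One OS layer that knows only about the layer directly below it."""

    def __init__(self, name, below=None):
        self.name = name
        self.below = below

    def request(self, operation, trace):
        trace.append(self.name)              # every layer crossed adds overhead
        if self.below is None:
            return f"{operation} done by {self.name}"
        return self.below.request(operation, trace)


def build_stack(names):
    """Build layers from a bottom-to-top list of names; return the top layer."""
    layer = None
    for name in names:
        layer = Layer(name, below=layer)
    return layer


top = build_stack(["hardware", "CPU scheduling", "memory management", "I/O"])
trace = []
result = top.request("read", trace)
print(result)   # read done by hardware
print(trace)    # ['I/O', 'memory management', 'CPU scheduling', 'hardware']
```

The length of the trace is the number of layers a single request has to pass through, which is the cost that fewer, larger layers try to reduce.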
OS layer:

As the UNIX system expanded its kernel became large and difficult to manage. In the mid 1980s
researchers developed the Mach OS, which modularizes the kernel using the microkernel approach.
What is microkernel approach?
In this approach the OS is structured such that all non-essential components are removed
from the kernel and implemented as system and user level programs. Microkernels typically provide
minimal process and memory management and a communication facility. Communication is through
message passing.
The function of the microkernel is to provide an interface between the client program and the various
services that are also running in the user space.
Advantages:
 Ease of extending the OS. All new services are added to the user space and hence no
modification is required to the kernel. When the kernel does have to change, the changes are
minimal because the kernel is small.
 Such an OS is easy to port from one h/w to another.
 Security is higher since the services are running as user rather than kernel processes.
 Reliability is high because if a service fails the rest of the system remains untouched.
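
A toy model can make the extensibility and reliability claims concrete. This is a sketch only: the class `Microkernel`, its `register` and `send` methods and the reply dictionaries are all invented for illustration and do not correspond to Mach or any real kernel interface.

```python
class Microkernel:
    """Toy kernel whose only job is to pass messages to user-level services."""

    def __init__(self):
        self._services = {}

    def register(self, name, handler):
        # New services are added without touching the kernel code itself.
        self._services[name] = handler

    def send(self, service, message):
        handler = self._services.get(service)
        if handler is None:
            return {"ok": False, "error": f"no such service: {service}"}
        try:
            return {"ok": True, "reply": handler(message)}
        except Exception as exc:
            # A failing service is reported to the client; the kernel keeps running.
            return {"ok": False, "error": str(exc)}


def broken_disk_service(message):
    raise RuntimeError("disk service crashed")


kernel = Microkernel()
kernel.register("echo", lambda message: message.upper())
kernel.register("disk", broken_disk_service)
```

Here a crash in the hypothetical disk service is contained in its reply, and a later request to the echo service still succeeds.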
Ex. Tru64 UNIX provides a UNIX interface to the user but is implemented with a Mach kernel. The Mach kernel
maps UNIX system calls to messages to the appropriate user level services.
 MacOS X Server OS is based on the Mach kernel.
 QNX is a real time OS also based on a microkernel.
 Windows NT uses a hybrid structure since part of it is layered. It is designed to run various
applications including Win32, OS/2 and POSIX. It provides a server that runs in user space
for each application type. The kernel coordinates the message passing between the client
applications and application servers.
Current methodology for OS design uses OO programming techniques to create a
modular kernel.
 The kernel only has core components.
 Additional services are linked in dynamically at boot time or run time, i.e. as dynamically
loadable modules.
 Ex. Solaris, Linux and Mac OS X.
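
Loading a module only when it is needed can be illustrated by analogy in user space. The sketch below is not kernel code: it uses Python's `importlib.import_module` to load a library module at run time on first request, and the class name `ModuleRegistry` is made up for this example.

```python
import importlib


class ModuleRegistry:
    """Core that starts with nothing loaded and links modules in on demand."""

    def __init__(self):
        self.loaded = {}

    def load(self, module_name):
        # Load the module at run time the first time it is requested.
        if module_name not in self.loaded:
            self.loaded[module_name] = importlib.import_module(module_name)
        return self.loaded[module_name]


registry = ModuleRegistry()
json_module = registry.load("json")    # loaded now, not at start-up
print(json_module.dumps([1, 2]))       # [1, 2]
```

The core stays small because it only contains the loading logic; everything else arrives as a module.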
Solaris:
Fig: Solaris loadable modules. The core Solaris kernel is surrounded by: scheduling classes, file
systems, system calls, executable formats, STREAMS modules, miscellaneous modules, and device and
bus drivers.

 Organized around a core kernel.
 Has several types of loadable kernel modules.
 Advantage:
o Gives core features plus allows certain features to be implemented dynamically. Ex.
device and bus drivers can be added dynamically depending on the type of hardware,
different file systems can be added etc.
o Though it resembles a layered structure with defined and protected interfaces it is more
flexible because any module can call any other module.
o Resembles a microkernel with only core functions in the kernel, which knows how
to load and communicate with other modules.

Mac OS X:
Fig: Mac OS X structure. Application environments and common services on top of the kernel environment.
 Uses a hybrid structure.
 Top layers include application environments and a set of services providing a GUI.
 Below is the kernel environment which has the Mach microkernel and BSD kernel.
 Mach provides memory management, supports RPC, IPC, message passing and thread
scheduling.
 BSD kernel provides the BSD command line interface, support for networking and file
systems, and an implementation of POSIX APIs.
 Besides Mach and BSD, the kernel environment provides an I/O kit for development of
device drivers and dynamically loadable modules.

Virtual Machines:
A computer system is made of layers with h/w at the bottom and the kernel above it. The kernel uses
the h/w instructions and creates a set of system calls for use by the outer layers. The system programs
above the kernel can use system calls or hardware instructions and do not in fact differentiate
between the two, even though they are accessed differently. System programs use them to create
more advanced functions. The system calls and h/w instructions are treated as being at the same level by
the system programs.
A virtual machine takes the layered approach to its logical conclusion. It treats hardware and
the operating system kernel as though they were all hardware. A virtual machine provides an
interface identical to the underlying bare hardware. The operating system creates the illusion of
multiple processes, each executing on its own processor with its own (virtual) memory. The
resources of the physical computer are shared to create the virtual machines. CPU scheduling can
create the appearance that users have their own processor. Spooling and a file system can provide
virtual card readers and virtual line printers. A normal user time-sharing terminal serves as the
virtual machine operator's console.

Advantages and Disadvantages of Virtual Machines

• The virtual-machine concept provides complete protection of system resources since each virtual
machine is isolated from all other virtual machines. This isolation, however, permits no direct
sharing of resources.
• A virtual-machine system is a perfect vehicle for operating-systems research and development.
Development is done on the virtual machine, instead of on a physical machine, and so does not
disrupt normal system operation.
• The virtual machine concept is difficult to implement due to the effort required to provide an exact
duplicate of the underlying machine.
Example : Java Virtual Machine
• Compiled Java programs are platform-neutral bytecodes executed by a Java Virtual Machine (JVM).
• The JVM consists of a class loader, a class verifier and a runtime interpreter.
• Just-In-Time (JIT) compilers increase performance.

fig: java virtual machine

OS Generation:
It is easy to design an OS that is specific to one machine and one site.
system generation (SYSGEN)
More commonly an OS is designed to run on any of a class of machines and at different sites with
different peripheral configurations. This requires that the system be configured for
each specific computer site and this process is called system generation.
The SYSGEN program reads from a given file, asks the operator of the system for
information concerning the specific configuration of the hardware system, or probes the hardware
directly to determine what components are there.

The following information is determined by the SYSGEN:
 What CPU is used?
 What options like extended instruction set, floating point arithmetic etc. are installed?
 If there are multiple CPUs, each CPU must be described.
 How much memory is available?
 What devices are available?
 What OS options are desired?
 What parameter values are used?
After this information is collected the following possibilities take place:
1. The system administrator can use it to modify a copy of the source code of the OS. The OS is
then completely compiled. Data declarations, initializations, and constants along with
conditional compilation produce a new object version of the OS that is tailored to the system
described.
2. The system description can cause the creation of tables and the selection of modules from
a precompiled library. These modules are linked together to form the OS. The library supports
all I/O devices but only those needed are linked to the OS. There is no recompilation, hence system
generation is faster.
3. A system can be built that is completely table driven. All the code is always part of the system
but the selection of required code is done at run time rather than at compile or link time.
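
The third, table-driven possibility can be sketched as follows. Everything in the sketch is hypothetical: the table `DRIVER_TABLE`, the device names and the function `configure` only illustrate that all the code is present and a system description selects from it at run time.

```python
# All driver code is always part of the system; the table maps names to it.
DRIVER_TABLE = {
    "disk": lambda: "disk driver ready",
    "printer": lambda: "printer driver ready",
    "network": lambda: "network driver ready",
}


def configure(system_description):
    """Select and start only the drivers named in the system description."""
    active = {}
    for device in system_description["devices"]:
        if device not in DRIVER_TABLE:
            raise KeyError(f"no driver available for {device}")
        active[device] = DRIVER_TABLE[device]()   # selection happens at run time
    return active


site = {"devices": ["disk", "network"]}
print(configure(site))
# {'disk': 'disk driver ready', 'network': 'network driver ready'}
```

Changing the site configuration means editing the description, not recompiling or relinking anything.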

System Boot:
The procedure of starting a computer by loading the kernel is called booting. After loading,
execution begins, at which point the system is said to be running.
How does the hardware know where the kernel is and how to load it?
 On most computer systems a small program called the bootstrap program or bootstrap loader
locates the kernel, loads it into memory and starts execution.
 In some PCs it is a two-step process. A simple bootstrap loader fetches a more complex boot
program from the disk, which in turn loads the kernel.
 In cellular phones, PDAs and consoles the entire OS (and the bootstrap) is stored in ROM
(firmware). The problem here is that changing the bootstrap requires changing the entire ROM
chip. A solution could be to use EPROM.
Note: Executing from firmware is slower and more expensive than executing from RAM. Hence
some systems store the OS in firmware and copy it into RAM for fast execution.
 For large OS and for OS that change frequently the bootstrap loader is stored in the firmware
and the OS on the disk. The bootstrap has the diagnostics and a small piece of code to read a single block from a
fixed location on the disk called the boot block. This is loaded into memory and executed.
The advantage here is that the OS can be changed by writing new versions to disk. A disk with a
boot partition is called a boot disk or system disk.
Steps in booting
 When the CPU receives a reset event like power on, the IR gets loaded with a predefined
memory location containing the starting address of the bootstrap program which resides in
ROM. Execution starts there.
 The bootstrap runs diagnostics to determine the state of the machine.
 If the diagnostics pass the booting steps continue.
 All CPU registers, device controllers and some memory locations are initialized.
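
The boot sequence can be simulated on a pretend disk. This is a simplified model built on assumptions: the "disk" is a Python list, block 0 plays the role of the boot block at a fixed location, and the names `make_disk`, `bootstrap` and the boot block fields are invented for the sketch.

```python
BLOCK_SIZE = 16
BOOT_BLOCK_INDEX = 0    # the fixed location the bootstrap always reads


def make_disk(kernel_image: bytes) -> list:
    """Lay out a toy disk: the boot block first, then the kernel's blocks."""
    kernel_blocks = [
        kernel_image[i:i + BLOCK_SIZE]
        for i in range(0, len(kernel_image), BLOCK_SIZE)
    ]
    boot_block = {"kernel_start": 1, "kernel_blocks": len(kernel_blocks)}
    return [boot_block, *kernel_blocks]


def bootstrap(disk: list, diagnostics_ok: bool = True) -> bytes:
    """Run diagnostics, read the boot block, load the kernel into 'memory'."""
    if not diagnostics_ok:
        raise RuntimeError("diagnostics failed; boot halted")
    boot_block = disk[BOOT_BLOCK_INDEX]            # single block at a fixed location
    start = boot_block["kernel_start"]
    count = boot_block["kernel_blocks"]
    return b"".join(disk[start:start + count])     # the kernel image now in memory
```

Because the bootstrap only knows the fixed location of the boot block, a new kernel version can be installed by rewriting the disk without touching the bootstrap.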

Process Management
The evolution process of computer systems:
 A single program in execution which had complete control over the system and had access to
all the systems resources to
 Multiple programs in memory which had to be executed concurrently and required more
control and more compartmentalization.
This evolution required the introduction of the notion of a process.
Before we understand process management it is necessary to understand the concept of a process.
A process can be defined as:
 A program in execution.
 A program in execution that competes for the CPU and other resources.
 A unit of work in timesharing systems. An entity that can be assigned to and executed by the
processor.
 An active entity.

A complex OS is expected to do more work on behalf of the users:

 Execution of user programs.
 Take care of system tasks.

System has collection of processes. Two types:

 OS processes executing system code.
 User processes executing user code.
OS & user processes can execute concurrently with CPU multiplexing.

Process Concept:
CPU activities
 A batch system executes jobs.
 A timesharing system executes user programs called tasks.
 On a single-user system like Windows or Macintosh, the user may run several programs at once,
such as a word processor, web browser, e-mail package etc.
 Your command interpreter is a process.
All these are activities. In many respects these activities are similar, and they are all called
processes.

Difference between program and a process:

Program: passive: contents stored in a file on a disk. Programs are not scheduling entities.
Process: active: with a program counter (PC) specifying the next instruction to execute and a set
of associated resources. It actually performs the actions specified in the program. The OS
considers processes as scheduling entities.

Process Details:
1. Consists of program code: called text section.
2. Current activity: indicated by the program counter and the contents of the processor's registers.
3. Temporary data: stored in the stack, which holds method parameters, return addresses and local
variables. A process may also have a heap, which is memory dynamically allocated at run time.
4. Global variables stored in Data Section.



5. Process state: A state of a process is defined in part by the current activity it performs. The
process may be in the following states:
a. New: The process is being created.
b. Running: Instructions are being executed. On a uniprocessor only one process can be
in this state at any given time.
c. Waiting: The process is waiting for some event to occur like I/O completion or
interrupt signal. When a process is waiting it is said to be in a blocked state. A
blocked process cannot be directly scheduled even if the CPU is free.
d. Ready: The process is waiting to be assigned to the processor. Such a process is not
waiting for any external event like I/O or other interrupts.
e. Terminated: The process has finished execution.

Process state diagram:

new -> ready: admitted
ready -> running: scheduler dispatch
running -> ready: interrupt
running -> waiting: I/O or event wait
waiting -> ready: I/O or event completion
running -> terminated: exit
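The transitions of the state diagram can be sketched as a small lookup function. This is only an illustration: the names proc_state and valid_transition are hypothetical and do not come from any particular OS.

```c
#include <stdbool.h>

/* The five states named in the text. */
typedef enum { P_NEW, P_READY, P_RUNNING, P_WAITING, P_TERMINATED } proc_state;

/* Returns true if the state diagram has an arrow from 'from' to 'to'. */
bool valid_transition(proc_state from, proc_state to)
{
    switch (from) {
    case P_NEW:     return to == P_READY;        /* admitted */
    case P_READY:   return to == P_RUNNING;      /* scheduler dispatch */
    case P_RUNNING: return to == P_READY         /* interrupt */
                        || to == P_WAITING       /* I/O or event wait */
                        || to == P_TERMINATED;   /* exit */
    case P_WAITING: return to == P_READY;        /* I/O or event completion */
    default:        return false;                /* terminated: no way out */
    }
}
```

Note that a waiting process cannot move directly to running; it must first become ready and then be dispatched.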


6. Process Control Block (PCB):

Each process in the operating system is represented by a process control block, also called a task
control block. It contains many pieces of information associated with a specific process.
[Figure: process control block, with fields: pointer, process state, process number, program
counter, memory limits, list of open files.]

a. Process state: may be new, ready, running, waiting or halted.
b. Process number: A unique number allocated to the process for all future identification
and reference. Sometimes referred to as process_id.
c. Program Counter: Indicates the address of the next instruction to be executed for this
process.
d. CPU Registers: They vary in number and type based on the computer architecture.
Contain accumulators, index registers, stack pointers, general-purpose registers and
condition code information.
e. CPU Scheduling Information: Includes process priority, pointers to scheduling queues
and other scheduling parameters.
f. Memory management information: Includes the values of the base and limit registers, and the
page table or segment table, depending on the type of memory system.

g. Accounting Information: Indicates the amount of CPU and real time used, time limits,
account numbers, job or process numbers.
h. I/O status information: Includes list of I/O devices allocated to this process, list of
open files, etc.
i. Other items: can have process priority, path name etc.
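As a rough sketch, the PCB fields listed above could be grouped into a C structure as follows. The field names, the limits NUM_REGS and MAX_OPEN_FILES, and the helper pcb_init are all assumptions made for illustration; a real kernel's PCB is far larger.

```c
#include <stddef.h>

#define NUM_REGS 8          /* hypothetical register count */
#define MAX_OPEN_FILES 16   /* hypothetical per-process limit */

typedef enum { PCB_NEW, PCB_READY, PCB_RUNNING, PCB_WAITING, PCB_HALTED } pcb_state;

typedef struct pcb {
    struct pcb   *next;                        /* pointer: links PCBs in a queue */
    pcb_state     state;                       /* a. process state */
    int           pid;                         /* b. process number */
    unsigned long pc;                          /* c. program counter */
    unsigned long regs[NUM_REGS];              /* d. CPU registers */
    int           priority;                    /* e. CPU scheduling information */
    unsigned long base, limit;                 /* f. memory management information */
    unsigned long cpu_time_used;               /* g. accounting information */
    int           open_files[MAX_OPEN_FILES];  /* h. I/O status information */
    int           n_open_files;
} pcb;

/* Fill in a PCB for a process that is just being created. */
void pcb_init(pcb *p, int pid, int priority)
{
    p->next = NULL;
    p->state = PCB_NEW;
    p->pid = pid;
    p->pc = 0;
    for (int i = 0; i < NUM_REGS; i++) p->regs[i] = 0;
    p->priority = priority;
    p->base = p->limit = 0;
    p->cpu_time_used = 0;
    p->n_open_files = 0;
}
```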

7. Threads: The process model described so far assumes that a process executes a single thread of
instructions. This single thread of control allows the process to perform only one task at a time.

Process Scheduling:
Several processes are in the system at the same time. For example:
 In a multiprogramming environment some process must be running at all times to increase CPU
utilization.
 In timesharing the CPU has to switch between different user processes.
On a uniprocessor system only one process can be running at a time. If there is
more than one process in the system, the others must be kept somewhere before execution. They
have to wait for the CPU and be rescheduled.
Hence the scheduler selects an available process for execution on the CPU.
While scheduling various processes there are many objectives the OS has to choose from like:
 Fairness
 Good throughput
 Good CPU utilization
 Low turnaround time
 Low waiting time
 Good response time

Different scheduling queues:

1. Job Queue: The processes that enter the system are put in this queue. It holds all
the processes in the system.
2. Ready Queue: The processes in the main memory waiting to be executed are kept in this
queue. It is generally stored as a linked list. The header of this queue contains pointers to the
first and the final PCBs in the list. Each PCB points to the next.
The items in the list can be ordered by priority.
3. Device Queue: An executing process eventually quits, or waits for an event or I/O. In the
case of I/O request the request could be for a shared resource like a disk. If the disk is busy
with I/O of other process then the requesting process must wait. Such waiting processes are
put in a device queue.
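The ready queue described in item 2 (a linked list whose header points to the first and the final PCB) might look like the sketch below. The types node and ready_queue are hypothetical, and a pid stands in for the whole PCB.

```c
#include <stddef.h>

typedef struct node {
    int pid;             /* stands in for a full PCB */
    struct node *next;   /* each PCB points to the next */
} node;

typedef struct {
    node *head;          /* first PCB in the list */
    node *tail;          /* final PCB in the list */
} ready_queue;

void rq_init(ready_queue *q) { q->head = q->tail = NULL; }

/* Append a PCB at the tail of the queue. */
void rq_enqueue(ready_queue *q, node *n)
{
    n->next = NULL;
    if (q->tail) q->tail->next = n; else q->head = n;
    q->tail = n;
}

/* Remove and return the PCB at the head, or NULL if the queue is empty. */
node *rq_dequeue(ready_queue *q)
{
    node *n = q->head;
    if (n) {
        q->head = n->next;
        if (!q->head) q->tail = NULL;
    }
    return n;
}
```

Keeping a tail pointer makes insertion constant-time; ordering the list by priority, as the text mentions, would instead require a sorted insert.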



The representation of process scheduling can be done through a queuing diagram:

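[Figure: queuing diagram. A process moves from the ready queue to the CPU. From the CPU it either
leaves the system or returns to the ready queue along one of four paths: an I/O request (I/O queue,
then I/O), an expired time slice, forking a child, or waiting for an interrupt until the interrupt
occurs.]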

Here rectangles represent the different types of queues and circles represent the resources that
serve the queues. When a process is executing, one of the following events can occur:
 The process can issue an I/O request and then be placed in the I/O queue.
 The process could create a new process and wait for its termination.
 The process could be removed from the CPU as a result of an interrupt after which it is put
back to the ready queue.
Note: The main process or the parent process can create a child process which in turn can also create
its own child process. All these processes form a tree structure with the parent process as the root.
There are some advantages for creating a child process:
1. Computational speed-up: multiple processes allow multitasking. The OS can interleave the
execution of I/O-bound and CPU-bound processes to increase the degree of multiprogramming.
2. Higher priority for critical functions: A child process created to perform a critical function in an
application may be assigned higher priority than others. Such priority assignments help the OS to
meet real-time requirements in an application.
3. Protection of parent process from errors: The OS cancels the child process if an error arises during
its execution. This action does not affect the parent process. This is normally done when a software
system has to invoke an untrusted program.

A process migrates between the various queues during its lifetime. A scheduler
selects processes from the various queues maintained in the system.
Types of schedulers:
1. Long-term scheduler:
 Ex. In a batch system more processes are submitted than can be executed immediately.
 They are spooled on a mass storage (disk) where they are kept for later execution.
 This scheduler, also called the job scheduler, selects processes from this pool and loads them
into memory for execution.
 This scheduler executes much less frequently than the short-term scheduler.



 Minutes may pass between the creation of one new process and the next.

 This scheduler controls the degree of multiprogramming i.e. the number of processes in the
main memory.
 To keep the degree of multiprogramming stable, the average rate of process creation must
equal the average departure rate of processes. Hence the scheduler may need to be invoked only
when a process leaves the system.
 Since there are long intervals between executions the long-term scheduler can afford to take
more time to select a process.
 The long-term scheduler must make a careful selection. Since there are typically I/O bound
and CPU bound processes for best performance a combination of the two types of processes
is required.
 In some systems the long-term scheduler may be absent or minimal. Ex. Time-sharing UNIX
has no long-term scheduler; every new process is simply put in memory for the short-term scheduler.
The stability of such systems depends on the physical limitations like the number of available
terminals or on the self-adjusting nature of users.

I/O bound process: a process that spends more of its time doing I/O than doing useful
computation.
CPU bound process: a process that spends more of its time doing computation and generates I/O
requests very infrequently.

2. Short-term scheduler:
 Also called the CPU scheduler.
 It selects from among the processes that are ready to execute and allocates the CPU to one of them.
 Its frequency is high, i.e. it must select a new process very often.
 It typically executes at least once every 100 ms.
 Because of the brief time intervals, the short-term scheduler must execute very fast.
 Ex. If it takes 10 ms to decide to execute a process for 100 ms, what percentage of the CPU is
used for scheduling? 10/(100 + 10) is about 9%, so about 9% of the CPU is used just for scheduling
the work.
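The overhead calculation in the example generalizes to any decision time and run time. A minimal sketch, where the function name sched_overhead_percent is only illustrative:

```c
/* Percentage of CPU time spent in the scheduler when each decision costs
   decide_ms and the chosen process then runs for run_ms. */
double sched_overhead_percent(double decide_ms, double run_ms)
{
    return 100.0 * decide_ms / (decide_ms + run_ms);
}
```

With decide_ms = 10 and run_ms = 100 this gives 100 * 10 / 110, i.e. roughly 9.09%, matching the figure in the example.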

3. Medium-Term scheduler:
 Some time-sharing systems introduce an additional intermediate level of scheduling.
 This scheduler removes the processes from active contention of CPU (from memory) to
reduce the degree of multiprogramming.
 The process is reintroduced into memory later and continues from where it left off. This scheme
is called swapping.
 Swapping may be necessary to improve the process mix, or because a change in memory requirements
has overcommitted available memory, requiring memory to be freed up.

Context Switch:
The process of switching the CPU to another process by saving the state of the old process
and loading the saved state of a new process is called context switch.
The context of a process is represented by its PCB including the value of the CPU registers,
the process state (state diagram) and the memory management information.
When a context switch occurs the kernel saves the context (state save) of the old process in
its PCB and then loads (state restore) the saved context of the process scheduled to run.
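The state save and state restore can be illustrated with a toy model in which the CPU context is just a program counter and a few registers. Real context switches are done in architecture-specific kernel code; cpu_context, pcb_t and NUM_REGS below are hypothetical.

```c
#define NUM_REGS 4   /* hypothetical register count */

typedef struct {
    unsigned long pc;
    unsigned long regs[NUM_REGS];
} cpu_context;

typedef struct {
    int pid;
    cpu_context saved;   /* context stored in the PCB while not running */
} pcb_t;

/* Save the running process's context in its PCB, then load the next one's. */
void context_switch(cpu_context *cpu, pcb_t *old, const pcb_t *next)
{
    old->saved = *cpu;    /* state save */
    *cpu = next->saved;   /* state restore */
}
```

The more state that must be copied here, the longer the switch takes, which is why the number of registers matters.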
Drawback of context switching
Context switch is an overhead because the system does no useful work during switching.



The speed of switching varies from machine to machine, depending on the memory speed,
the number of registers that must be copied and the existence of special instructions. Typical
times range from 1 to 1000 microseconds.
Other factors that context switch time depends on:
 It is highly dependent on the underlying hardware. Ex. The Sun UltraSPARC provides multiple
sets of registers. A context switch then just involves changing the pointer from one register set
to another. If there are more active processes than register sets, the register data is copied to
and from memory.
 The more complex the OS, the more work is done during a context switch.
 The more advanced the memory management technique, the more data has to be switched with
each context.

Operations on processes:
The system must provide a mechanism for creation and deletion of processes.
Process Creation:
 A process is created via a create_process system call.
 The creating process is the parent process and the created process is the child process.
 Each of these may in turn create processes, forming a tree structure.
 The parent knows the identity of all its child processes.
 The created sub process may get its resources directly from the OS or may be constrained to
the subset of resources of the parent.
 The parent either partitions its resources among its children or shares its resources.
 The advantage of restricting the resources of a child process to the parent's resources is that
it prevents any process from overloading the system by creating too many sub-processes.
 When a child process is created initialisation data may be passed from the parent process
apart from other physical and logical resources.
 With respect to execution two possibilities exist:
o Parent continues to execute concurrently with the children.
o The parent waits until some or all of the children have terminated.
 With respect to the address space there are two possibilities:
o The child process is a duplicate of parent process. Ex. UNIX
o The child process has a program loaded into it. Ex. DEC VMS
o Windows NT supports both implementations.

Process Termination:
 When a process finishes execution it terminates and asks the OS to delete it using the exit
system call.
 All the resources of a child are returned to the parent process.
 The resources of the main process including physical and virtual memory, open files, I/O
buffers are deallocated by the OS.
 A process can terminate another process via an abort system call.
 Usually only the parent of the process to be terminated can invoke such a call. Otherwise users
could arbitrarily kill each other's processes.
 A parent can terminate the child process for various reasons:
o The task assigned to the child is no longer required.
o The child has exceeded the usage of some of the resources that are allocated.
o The parent wants to exit and the OS does not allow a child to continue without its parent. In
such systems, if a process terminates normally or abnormally all its children must also be
terminated. This is called cascading termination.
 In UNIX we terminate a process using exit system call. The parent waits for the child to exit
via a wait system call. The wait system call returns the process id of the child terminated.
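On a POSIX system the parent/child pattern above can be sketched with fork and the wait family of calls. The helper name run_child is hypothetical, and waitpid is used here instead of plain wait so that the parent collects exactly the child it created.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a child that terminates with 'code'. Returns the exit status the
   parent collects, or -1 if anything goes wrong. */
int run_child(int code)
{
    pid_t pid = fork();                     /* child is a duplicate of the parent */
    if (pid < 0) return -1;                 /* creation failed */
    if (pid == 0) _exit(code);              /* child: terminate immediately */

    int status;
    pid_t done = waitpid(pid, &status, 0);  /* parent: wait for the child to exit */
    if (done != pid || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);             /* status the child passed to _exit */
}
```

Like wait, waitpid returns the process id of the terminated child, which is how a parent with several children can tell which one finished.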

IPC (Inter Process Communication):

Cooperating Process:
The processes executing in a system can be independent processes or cooperating processes.
Independent Process: it cannot affect or be affected by the other processes executing in the
system. Such processes typically do not share any data with any other process.
Cooperating Process: a process that can affect or be affected by other processes
executing in the system. Typically such processes share data with other processes. They require IPC
to exchange data and information.
Need for process cooperation:
 Information Sharing: If several users want the same resource like a shared file an
environment to allow that must exist.
 Computational Speedup: If a particular task is to run faster, we may break it up
into subtasks that all execute in parallel. Such speedup is possible only if there are multiple
processing elements (such as CPUs or I/O channels).
 Modularity: Helps to construct a modular system dividing the system functions into separate
processes or threads.
 Convenience: Every user may have many tasks to work at one time like editing, printing,
compiling etc. That can be achieved through cooperation.

Ex. Producer Consumer Problem:

 A common paradigm for cooperating process.
 A producer process produces information and a consumer process consumes information.
 Ex.
o A print program produces characters that are consumed by the printer driver.
o A compiler produces assembly code consumed by the assembler.
 To allow producer and consumer processes to run concurrently we have a buffer that can be filled
by the producer and emptied by the consumer. A producer can produce one item while the
consumer consumes another item.
 Synchronization is required so that the consumer does not try to consume an item that has not
yet been produced.
 There are two variants of this problem:
 Unbounded-Buffer producer-consumer problem: There is no practical limit to the size of the
buffer. The producer can always produce but the consumer may have to wait.
 Bounded-Buffer producer-consumer problem: Here there is fixed size buffer. Here the
consumer must wait if the buffer is empty and the producer must wait if the buffer is full.
 The buffer is either implemented by OS through IPC (inter process communication) or
through shared memory. They share a common buffer pool.

IPC (Inter Process Communication):

IPC is of two types: Shared Memory and Message Passing. Many Operating Systems
implement both.
 We can have the cooperating processes to communicate via IPC.
 IPC helps in communication & synchronization without sharing the same address space.
 Typically used in Distributed systems where communicating processes are residing on
different systems.
 IPC is best implemented by message passing system.

Shared Memory:



 Processes exchange data by reading and writing to the shared region.

 They allow maximum speed and convenience of communication as it is done at memory
speeds within the computer.
 The system calls used are only to establish shared regions. Once the shared memory is
established all accesses are treated as routine memory accesses and no assistance from the
kernel is required.
 The communicating process has to establish the region of shared memory.
 Typically the shared region exists in the address space of the process creating the shared
region. The other processes which wish to communicate through the shared region must
attach it to their address space.
 In normal circumstances the OS prevents one process from accessing another process’s
memory. Hence in this form of IPC it is required that the two or more processes agree to
remove this restriction.
 The form of data and location is determined by the process and the OS has no control.
 The processes are also responsible for ensuring that they are not writing to the same location
simultaneously.

Solution to producer consumer problem:

 One solution is to use shared memory.
 A bounded buffer is used in this case; it is filled by the producer and emptied by
the consumer. The buffer resides in a region of memory shared by the producer and
the consumer.
 The processes (producer and consumer) can run concurrently; when one item is being
produced another item can be consumed at the same time.
 The shared buffer is implemented as a circular array with two pointers in and out.

#define BUFFER_SIZE 10
typedef struct {
    /* ... */
} item;

item buffer[BUFFER_SIZE];

int in = 0;
int out = 0;

The producer process:

item nextProduced;
/* produce an item in nextProduced*/
while (((in + 1) % BUFFER_SIZE) == out)
; /* do nothing*/
buffer[in] = nextProduced;
in = (in + 1) % BUFFER_SIZE;

The item produced by the producer is put in a local variable called nextProduced.
The consumer process:
item nextConsumed;
while (in == out)
; /* do nothing*/
nextConsumed = buffer[out];
out = (out + 1) % BUFFER_SIZE;
/* consume the item in nextConsumed*/
The consumer process has a local variable nextConsumed in which the item to be consumed
is stored.
 The buffer is empty when in == out and full when ((in + 1) % BUFFER_SIZE) == out. Because one
slot is always left unused, the buffer holds at most BUFFER_SIZE - 1 items.
 This solution does not address the situation in which both the producer and the consumer
attempt to access the shared buffer concurrently; no synchronization is done here.
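To make the empty and full conditions concrete, the same circular buffer can be written with functions that report failure instead of busy-waiting. This is a single-process sketch only: the names try_produce and try_consume and the int payload in item are assumptions, and no synchronization is added.

```c
#include <stdbool.h>

#define BUFFER_SIZE 10

typedef struct { int value; } item;   /* hypothetical payload */

static item buffer[BUFFER_SIZE];
static int in = 0;    /* next free slot */
static int out = 0;   /* first full slot */

/* Returns false where the original producer would spin (buffer full). */
bool try_produce(item next_produced)
{
    if (((in + 1) % BUFFER_SIZE) == out) return false;
    buffer[in] = next_produced;
    in = (in + 1) % BUFFER_SIZE;
    return true;
}

/* Returns false where the original consumer would spin (buffer empty). */
bool try_consume(item *next_consumed)
{
    if (in == out) return false;
    *next_consumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    return true;
}
```

Starting from an empty buffer, try_produce succeeds BUFFER_SIZE - 1 times and then fails, because one slot is sacrificed to distinguish full from empty.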

Message Passing System:



 This is a form of communication between processes without the need for a shared address
space. It also allows processes to synchronize their actions.
 Useful in a distributed environment.
 Typically two primitives are used namely send(message) and receive(message).
 This is the form of communication in microkernels.
 Messages can be of fixed size or variable size.
 The system implementation is easy if the messages are of fixed size, but the programming task
becomes more difficult.
 If the message lengths are variable the system implementation is complex, but programming
becomes simpler.
 Message passing is more suitable for exchanging small amounts of data.
 It is slower than shared memory because it is typically implemented using system calls, which
require the more time-consuming task of kernel intervention.
 For two processes P & Q to communicate a communication link must exist between them.
 The link can be physical or logical.
 The different methods for implementing the logical link and the send and receive primitives are
as follows:
1. Direct or Indirect Communication:
2. Synchronous or Asynchronous Communication:
3. Automatic or Explicit Buffering:
4. Send by Copy or Send by Reference:
5. Fixed size or variable size messages:

Direct Communication:
 Naming is crucial for direct communication.
 The process that wants to communicate must explicitly mention the recipient or the sender.
 Primitives are defined as follows:
o send(P, message) => send message to process P
o receive(Q, message) => receive message from process Q.
 The communication link in this form of communication has the following characteristics:
o A link is established automatically between every pair of processes that want to
communicate. The processes only need to know each other’s address.
o A link is associated with exactly two processes.
o Exactly one link exists between each pair of processes.
 There is symmetry in this addressing because both the sender and receiver must mention each
other’s address.
 In asymmetric form only the sender has to mention the receiver’s address, the recipient is not
required to name the sender. The primitives are as follows:
o send(P, message) => send message to process P
o receive(id, message) => receive message from any process. The variable id is set to
the name of the process with which the communication has taken place.
 Disadvantage of symmetric & asymmetric:
o Limited modularity.
o Changing the name of the process requires examining all other process definitions.
I.e. all references to the old name must be found so that they can be modified to the
new name.

Indirect Communication:
 Here the messages are sent to and received from mailboxes or ports.
 Each mailbox has a unique identification.

 Two processes can communicate only if they have a shared mailbox.

 One process can communicate with another via a number of mailboxes.
 The primitives are as follows:
o send(A, message) => send message to mailbox A
o receive(A, message) => receive message from mailbox A
 The communication link has the following properties:
o A link is established between a pair of processes only if they have a shared mailbox.
o A link may be associated with more than two processes.
o A number of different links may exist between each pair of communicating processes,
with each link corresponding to one mailbox.
 Suppose P1, P2 and P3 share the mailbox A, and P1 sends a message to A while P2 and P3 each
execute a receive from A.
 Who will receive the message, P2 or P3?
 It depends on the underlying scheme that is selected:
o Allow a link to be associated with at most two processes.
o Allow at most one process to execute receive operation.
o Allow the system to select arbitrarily which process will receive the message.
 Mailbox can be owned by the process or by the OS.
Process owner:
 If the mailbox is owned by a process then the mailbox is part of the address space of that
process. In this case we distinguish between the owner (who can only receive messages through
the mailbox) and the user (who can only send messages to the mailbox).
 Since each mailbox has a unique owner there can be no confusion about who should receive
a message sent to the mailbox.
 When a process that owns the mailbox terminates the mailbox also disappears.
 The sender to this mailbox must be notified about the non-existence of the mailbox.
OS owner:
 The mailbox is independent and is not attached to any particular process.
 OS provides a mechanism for the process to do as follows:
o Create a new mailbox.
o Send and receive messages through the mailbox.
o Delete the mailbox.
 The process that creates the mailbox is the owner by default.
 Initially the owner is the only process that can receive messages through the mailbox, but
ownership privileges can be passed to other processes through appropriate system calls.

Synchronization:
 Communication takes place through calls to the send( ) and receive( ) primitives in the message
passing technique.
 Message passing can be blocking/synchronous and nonblocking/asynchronous.
 Types are:
o Blocking send: the sending process is blocked until the message is received by the
receiving process or mailbox.
o Nonblocking send: the sending process sends the message and resumes operation.
o Blocking receive: the receiver blocks until a message is available.
o Nonblocking receive: the receiver retrieves either a valid message or a null.
 When the send and receive are blocking type then we have rendezvous between the sender
and receiver.
 Producer-Consumer Problem:



o The producer invokes a blocking send( ) call and waits until the message is delivered to
either the receiver or the mailbox.
o When the consumer invokes receive ( ) it blocks until the message is available.

Buffering:
The messages exchanged in direct or indirect communication reside in queues. The queues can be
implemented in three ways:
Zero Capacity: this is a message system with no buffering. The queue has a maximum length
of zero and hence there can be no message waiting. The sender must block until the receiver
receives the message.
Bounded Capacity: queue is of finite length n. Hence at most n messages can be in the queue
at any instant of time. If the queue is not full when a new message is sent it will be placed in the
queue and the sender can continue execution without waiting. If the queue is full the sender must
block until space becomes available.
Unbounded Capacity: the queue length is potentially infinite. The sender never blocks.
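The bounded-capacity case can be sketched as a small fixed-size message queue. The code below is a user-level illustration, not an OS facility: mailbox, mbox_send, mbox_receive, QUEUE_CAP and MSG_SIZE are hypothetical names, messages are fixed-size strings, and a false return marks the point where a blocking sender or receiver would have to wait.

```c
#include <stdbool.h>
#include <string.h>

#define QUEUE_CAP 3    /* n: hypothetical queue capacity */
#define MSG_SIZE 32    /* fixed message size in bytes */

typedef struct {
    char slots[QUEUE_CAP][MSG_SIZE];
    int head;    /* index of the oldest message */
    int count;   /* number of messages waiting */
} mailbox;

void mbox_init(mailbox *m) { m->head = 0; m->count = 0; }

/* Nonblocking send: fails when the queue already holds n messages. */
bool mbox_send(mailbox *m, const char *msg)
{
    if (m->count == QUEUE_CAP) return false;
    int tail = (m->head + m->count) % QUEUE_CAP;
    strncpy(m->slots[tail], msg, MSG_SIZE - 1);   /* copy, truncating if too long */
    m->slots[tail][MSG_SIZE - 1] = '\0';
    m->count++;
    return true;
}

/* Nonblocking receive: fails (the "null" case) when no message is waiting. */
bool mbox_receive(mailbox *m, char *out)
{
    if (m->count == 0) return false;
    memcpy(out, m->slots[m->head], MSG_SIZE);
    m->head = (m->head + 1) % QUEUE_CAP;
    m->count--;
    return true;
}
```

Setting the capacity to zero would make every send fail until a receiver is ready, which corresponds to the zero-capacity (rendezvous) case.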
CASE STUDY: Linux System

Multithreading Models:
Many-to-One Model:
 Many user level threads are mapped to one kernel thread.
 Thread management is done at user level so it is efficient but entire process blocked if a
thread does a blocking system call.
 Since only one thread can access the kernel at a time, multiple threads are unable to run in
parallel on multiprocessors. I.e. even though many user threads can be created, true
concurrency is not achieved because the kernel can schedule only one thread at a time.
 Green threads - a thread library available in Solaris 2 uses this model.
 Also user level thread libraries implemented on OS that do not support kernel threads use the
many-to-one model.

One-to-One Model:
 Each user thread is mapped to one kernel thread.
 More concurrency is achieved since another thread can run if one thread makes a blocking
system call.
 Multiple threads can run in parallel in multiprocessor systems.
 The problem with this method is that creating a user thread requires creating a kernel thread.
 Creating kernel threads can affect the performance of the application.
 Implementations of this model restrict the number of threads supported by the system.
 Windows NT, Windows 2000 and OS/2 implement this model.



Many-to-Many Model:
 Many user level threads are multiplexed to smaller or equal number of kernel threads.
 Number of kernel threads can be specific to a particular application or a particular machine.
 This model allows greater concurrency than the many-to-one model, and the developer can create
as many user threads as necessary, since they are multiplexed onto the available kernel threads.
 If one thread performs a blocking system call the kernel can schedule another thread.
 A variation of this is sometimes referred to as two-level model.
Ex. Solaris 2, IRIX, HP-UX and Tru64

Multiprogramming concept: Process executed until it must wait typically for the completion of
I/O. This helps:
 To have some process running all the time.
 Increase CPU utilization.
Since in a uniprocessor system only one process can run at a time, any other process must wait till it is
scheduled. Hence CPU scheduling is a must for multiprogramming. Hence scheduling is
fundamental to the OS.
All resources including the primary resource namely the CPU must be scheduled.
To use time productively in multiprogramming: when a process has to wait the OS takes away the
CPU from that process and gives it to another process.
What is CPU-I/O burst cycle?
CPU burst: the CPU time required/used between two I/O bursts.
I/O burst: the time spent on I/O between two CPU bursts.
The following property is observed:
 Typically process execution consists of a cycle of CPU and I/O wait. Processes alternate
between these states.
 Process execution starts with a CPU burst, followed by an I/O burst. The last CPU burst ends
the execution.
 The durations of CPU bursts have been measured extensively. They vary greatly from
process to process and computer to computer, but they all tend to have a similar frequency curve,
which is generally exponential: many short CPU bursts and few long ones.
 Typically I/O bound jobs have many short CPU bursts and CPU bound jobs have few long
CPU bursts.
 This property is used for building several scheduling algorithms.

CPU scheduler



When the CPU is idle the OS has to select a process from the ready queue to be executed. This
selection is done by the scheduler. The records in the ready queue are generally the PCBs of the
processes.
The ready queue can be implemented in different ways:
 Priority Queue
 Tree
 Simply an unordered linked list.
CPU scheduling decisions take place under four circumstances:
1. When a process switches from run state to wait state. Ex. I/O request.
2. When a process switches from run state to ready state. Ex. In case of interrupt
3. When a process switches from wait state to ready state. Ex. Say I/O complete.
4. When a process terminates.
When scheduling takes place only under conditions 1 and 4, the scheduling scheme is called
nonpreemptive. Otherwise it is called preemptive scheduling.
What is nonpreemptive (cooperative) scheduling?
In this form of scheduling once the CPU is allocated to a process the process keeps the CPU
until it either terminates or switches to the wait state.
 Does not need special h/w like timer.
Ex. Used in Microsoft Windows 3.1 and Apple Macintosh
What is preemptive scheduling?
In this form of scheduling the CPU can be taken away from a process before it terminates or
switches to the wait state, as in conditions 2 and 3 above.
 Cost.
 Coordination is required to access shared data.
Note: Preemption has an effect on the design of the OS kernel. Say the kernel is in the midst of a
system call and the process gets preempted. Say next the kernel is supposed to read or modify the
same data structure it was using earlier. The result is chaos. Some systems deal with this kind of
situation by waiting for the system call to complete before switching. This way the kernel
structure can be kept simple, but this kernel execution model is poor for supporting real-time
applications.
A dispatcher is a component of the CPU scheduling function. The dispatcher is the one that gives
control of the CPU to the process selected by the short-term scheduler. The dispatcher must be fast.
There are 3 functional parts in a dispatcher:
1. Switch context.
2. Switching to user mode.
3. Jumping to the proper location in the user program to restart that program.
Dispatch latency is the time it takes for the dispatcher to stop one process and start another.

What are the criteria for scheduling?

1. CPU utilization: We need to keep the CPU as busy as possible. Utilization can range from 0 to
100%. In reality it is about 40% for a lightly loaded system and about 90% for a heavily loaded
system. This should be maximized as far as possible.
2. Throughput: Since the CPU is executing it means that work is being done. A measure of work is
the number of processes completed per unit time which is the throughput.
3. Turnaround Time: This is measured from the point of view of a process. The interval from the
time of submission to the time of completion is the turnaround time.
Turnaround time = The sum of periods spent waiting to get into the memory, wait in the ready
queue, executing on the CPU and doing I/O.

4. Waiting Time: It is the sum of the periods a process spends waiting in the ready queue. The CPU
scheduling algorithm affects only the amount of time a process spends waiting in the ready queue.
5. Response Time: Applicable for interactive systems. It is the time from the submission of a
request until the first response is produced, that is, the time it takes to start responding. This
time should be minimized.

Scheduling Algorithms:
1. FCFS:
 Simplest.
 Non-Preemptive.
 The process that requested the CPU first gets it.
 Easily implemented with a FIFO queue.
 The new process is inserted at the tail of the queue.
 The process at the head of the queue is given to the CPU when it is free.
 The problem with this algorithm is that the average waiting time is often quite long.
 The average waiting time under this algorithm varies substantially if the burst times vary greatly.
 This algorithm is a problem in time-sharing systems since a process holds on to the CPU
until termination or an I/O request.
Performance of FCFS algorithm in a dynamic situation: Convoy Effect:
Say there is one CPU bound process and many I/O bound processes. While the CPU bound process is
holding the CPU, all the others finish their I/O and move to the ready queue. At this point the I/O
devices are idle. Eventually the CPU bound process finishes its CPU burst and moves to an I/O
device queue. All the I/O bound processes, which have very short CPU bursts, then execute quickly
and move back to the I/O device queues. At this point the CPU sits idle. The CPU bound process
eventually moves back to the ready queue and gets the CPU. Again the I/O bound processes end up
waiting in the ready queue. This effect, in which all the other processes wait for one big process
to get off the CPU, is called the convoy effect.

Process Burst Time
P1 24ms
P2 3ms
P3 3ms

Gantt Chart
P1 P2 P3
0 24 27 30
Wait time for P1 = 0ms
Wait time for P2 = 24ms
Wait time for P3 = 27ms
Average wait time = ( 0 + 24 + 27)/3 = 17ms
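
The calculation above can be sketched in C. This is only an illustration, assuming that all
processes arrive at time 0 and are served in array order; the function name fcfs_avg_wait is
hypothetical and not part of any OS interface.

```c
#include <stddef.h>

/* Average waiting time under FCFS, assuming every process arrives at time 0
   and the processes are served in array order. burst[] holds the CPU burst
   lengths in ms. */
double fcfs_avg_wait(const int burst[], size_t n)
{
    long total_wait = 0;   /* sum of the individual waiting times */
    long clock = 0;        /* time at which the next process starts */

    for (size_t i = 0; i < n; i++) {
        total_wait += clock;   /* process i waits for all earlier processes */
        clock += burst[i];
    }
    return n ? (double)total_wait / (double)n : 0.0;
}
```

For the bursts {24, 3, 3} this returns 17. Serving the same processes in the order {3, 3, 24}
returns 3, which shows how strongly the FCFS average depends on the order of arrival.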

SJF: Shortest Job First (Shortest next CPU burst, shortest remaining time):
 This algorithm associates with each process the length of its next CPU burst.
 When the CPU is available then it is assigned to a process with the smallest next CPU burst.
 If two processes have same burst times FCFS is used.

 It is an optimal algorithm because it gives the minimum average waiting time for a given set of
processes: moving a short process ahead of a long one decreases the waiting time of the short
process more than it increases the waiting time of the long process. Hence the average waiting
time decreases.
 Problem:
o The real difficulty is knowing the length of the next CPU request. SJF can be suitable for the
long-term scheduler in batch processing.
o It cannot be implemented exactly in short-term scheduling even though it is optimal. There is
no way to know the length of the next CPU burst, although we can predict it.
Calculation of the approximate CPU burst of a process:
We can calculate an approximation of the next CPU burst.
The next CPU burst is generally predicted as an exponential average of the measured lengths of the
previous CPU bursts.
Let t_n be the length of the nth CPU burst. Let τ_{n+1} be the predicted value for the next CPU burst.
τ_{n+1} = α t_n + (1 - α) τ_n, where 0 <= α <= 1. This is the formula for the exponential average.
t_n stores the most recent information.
τ_n stores the past history. α controls the relative weight of the recent and past history in the
prediction.
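
The exponential average can be sketched in C as follows. The function name predict_next_burst,
the initial guess tau0 and the sample burst values in the note below are assumptions chosen for
illustration only.

```c
#include <stddef.h>

/* Apply tau_{n+1} = alpha * t_n + (1 - alpha) * tau_n once for every measured
   burst t[0..n-1], starting from the initial guess tau0. The return value is
   the prediction for the burst that follows t[n-1]. */
double predict_next_burst(double alpha, double tau0, const double t[], size_t n)
{
    double tau = tau0;

    for (size_t i = 0; i < n; i++)
        tau = alpha * t[i] + (1.0 - alpha) * tau;
    return tau;
}
```

With alpha = 0.5, an initial guess of 10 and measured bursts 6, 4, 6, 4, the successive predictions
are 8, 6, 6 and 5. With alpha = 0 the recent history is ignored and the prediction never changes;
with alpha = 1 only the most recent burst counts.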

 SJF may be preemptive or non-preemptive. The preemptive version is sometimes also called the
shortest remaining time first algorithm.

Ex1. (Non-Preemptive):
Process Burst Time (ms)
P1 6
P2 8
P3 7
P4 3

Gantt Chart
P4 P1 P3 P2
0 3 9 16 24

Average wait time = (3 + 16 + 9 + 0)/4 = 7ms
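
A minimal C sketch of non-preemptive SJF, assuming all processes arrive at time 0 and the burst
lengths are known in advance (which, as noted above, is not true in a real short-term scheduler).
The names sjf_avg_wait and cmp_burst are hypothetical.

```c
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of burst length. */
static int cmp_burst(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Average waiting time under non-preemptive SJF when every process arrives at
   time 0. Note that burst[] is sorted in place. */
double sjf_avg_wait(int burst[], size_t n)
{
    long total_wait = 0;
    long clock = 0;

    qsort(burst, n, sizeof burst[0], cmp_burst);   /* shortest burst first */
    for (size_t i = 0; i < n; i++) {
        total_wait += clock;   /* time spent waiting behind shorter bursts */
        clock += burst[i];
    }
    return n ? (double)total_wait / (double)n : 0.0;
}
```

For the bursts {6, 8, 7, 3} this returns 7, compared with 10.25 if the same processes were served
FCFS in the order P1 to P4. qsort is not stable, so the FCFS tie-break is not reproduced, but ties
do not change the average.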

Ex2. (Preemptive):
Process Arrival Time (ms) Burst Time (ms)
P1 0 8
P2 1 4
P3 2 9
P4 3 5

Gantt Chart
P1 P2 P4 P1 P3
0 1 5 10 17 26

Average wait time = [(10-1) + (1-1) + (17-2) + (5-3)]/4 = 26/4 = 6.5ms
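
The preemptive example can be checked with a small simulation that advances time in 1ms steps. This
is a sketch under simplifying assumptions: integer times, no context-switch cost, and ties broken
in favour of the lower array index. The names srtf_avg_wait and MAX_PROCS are hypothetical.

```c
#include <stddef.h>

#define MAX_PROCS 16   /* illustrative upper bound on the number of processes */

/* Average waiting time under shortest-remaining-time-first scheduling.
   arrival[] and burst[] are in ms. Returns -1.0 for unsupported input. */
double srtf_avg_wait(const int arrival[], const int burst[], size_t n)
{
    int remaining[MAX_PROCS];
    long total_wait = 0;
    size_t done = 0;

    if (n == 0 || n > MAX_PROCS)
        return -1.0;
    for (size_t i = 0; i < n; i++) {
        remaining[i] = burst[i] > 0 ? burst[i] : 0;
        if (remaining[i] == 0)
            done++;                     /* nothing to run for this process */
    }
    for (int t = 0; done < n; t++) {
        int pick = -1;

        /* Choose the arrived, unfinished process with the least time left. */
        for (size_t i = 0; i < n; i++)
            if (arrival[i] <= t && remaining[i] > 0 &&
                (pick < 0 || remaining[i] < remaining[pick]))
                pick = (int)i;
        if (pick < 0)
            continue;                   /* nothing has arrived yet: CPU idle */

        /* Every other ready process waits during this millisecond. */
        for (size_t i = 0; i < n; i++)
            if ((int)i != pick && arrival[i] <= t && remaining[i] > 0)
                total_wait++;
        if (--remaining[pick] == 0)
            done++;
    }
    return (double)total_wait / (double)n;
}
```

With arrival times {0, 1, 2, 3} and bursts {8, 4, 9, 5} the function returns 6.5, matching the
Gantt chart of Ex2.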

Priority Scheduling:
 SJF is a special case of general priority scheduling algorithm. The larger the CPU burst the
lower is the priority. The priority p is the inverse of the (predicted) next CPU burst.
 The process with the highest priority is allocated the CPU.
 Equal priority processes are scheduled as FCFS.
 Priorities are generally some fixed range of numbers like 0 to 7.
 There is no general agreement on whether 0 is the lowest or the highest priority. It varies
from system to system.
 Priorities can be defined internally or externally. Internal priorities use some measurable
quantity/quantities (time limits, memory requirements, number of open files, ratio of average
I/O burst to average CPU burst) to compute the priority of a process. External priorities are
set by criteria external to the OS, like the importance of a process, the type and amount paid for
computer use, political factors, the department sponsoring the work etc.
 Priority scheduling can be preemptive or non-preemptive.
 In preemptive mode, if the arriving process has higher priority than the running process then
the running process is preempted.
 In non-preemptive mode, if the arriving process has higher priority than the running process
then the arriving process is put at the head of the ready queue.
 Disadvantage:
o Indefinite Blocking/Starvation: A process that is ready to run but lacks CPU is said to
be blocked. This algorithm can leave some low priority processes waiting indefinitely
for the CPU. In a heavily loaded situation a stream of high priority processes can
block low priority processes from ever getting the CPU.
o Solution: Aging: It is a technique of gradually increasing the priority of processes that wait
for a long time.
Ex. (a smaller priority number means a higher priority):
Process  Burst Time  Priority  Arrival  Start  Wait  Finish  TA
1        10          3         0        6      6     16      16
2        1           1         0        0      0     1       1
3        2           4         0        16     16    18      18
4        1           5         0        18     18    19      19
5        5           2         0        1      1     6       6

 Gantt chart:
P2 P5 P1 P3 P4
0 1 6 16 18 19

 average waiting time: (6+0+16+18+1)/5 = 8.2

 average turnaround time: (1+6+16+18+19)/5 = 12

Round Robin Algorithm (RR):

 Designed especially for timesharing systems.
 It is FCFS + Preemption.
 A small time unit called time quantum is defined generally ranging from 10 to 100ms.

 The ready queue is treated as a circular FIFO queue of processes. The CPU scheduler goes
round the queue, allocating the CPU to each process for an interval of up to 1 time quantum.
 The first process in the ready queue is dispatched with the timer on.
 If the process’s CPU burst is less than 1 time quantum then the process will itself give up the
CPU otherwise the timer will go off and an interrupt happens to do context switch. The old
process goes to the tail of the queue.
 The average wait time in RR is fairly long.
Process Burst Time
P1 24
P2 3
P3 3
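Time quantum = 4ms (burst times in ms)
Gantt Chart
P1 P2 P3 P1 P1 P1 P1 P1
0 4 7 10 14 18 22 26 30
Wait time for P1 = 10 - 4 = 6ms, for P2 = 4ms, for P3 = 7ms
Average wait time = (6 + 4 + 7)/3 = 17/3 ≈ 5.67ms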
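
The round robin example can be reproduced with the following C sketch. It assumes that all
processes arrive at time 0, do no I/O, and that context switches cost nothing; under these
assumptions, scanning the array cyclically is equivalent to the FIFO ready queue. The names
rr_avg_wait and MAX_PROCS are hypothetical.

```c
#include <stddef.h>

#define MAX_PROCS 16   /* illustrative upper bound on the number of processes */

/* Average waiting time under round robin with the given time quantum.
   burst[] and quantum are in ms. Returns -1.0 for unsupported input. */
double rr_avg_wait(const int burst[], size_t n, int quantum)
{
    int remaining[MAX_PROCS];
    long clock = 0;
    long total_wait = 0;
    size_t left = 0;   /* number of unfinished processes */

    if (n == 0 || n > MAX_PROCS || quantum <= 0)
        return -1.0;
    for (size_t i = 0; i < n; i++) {
        remaining[i] = burst[i] > 0 ? burst[i] : 0;
        if (remaining[i] > 0)
            left++;
    }
    while (left > 0) {
        for (size_t i = 0; i < n; i++) {
            int slice;

            if (remaining[i] == 0)
                continue;              /* already finished */
            slice = remaining[i] < quantum ? remaining[i] : quantum;
            clock += slice;            /* process i runs for one slice */
            remaining[i] -= slice;
            if (remaining[i] == 0) {
                /* With arrival at 0, waiting time = finish time - burst. */
                total_wait += clock - burst[i];
                left--;
            }
        }
    }
    return (double)total_wait / (double)n;
}
```

For the bursts {24, 3, 3} and a quantum of 4 the total waiting time is 17ms, so the average is
17/3. With a quantum of 24 or more the same call returns the FCFS result of 17ms.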
If there are n processes in the ready queue and the time quantum is q, then each process gets 1/n
of the CPU time in chunks of at most q time units. Each process must wait no longer than (n - 1)q
time units until its next time quantum.
Ex. If there are 5 processes and the time quantum is 20ms, then each process will get up to 20ms
every 100ms.
Performance: Depends on the size of the quantum.
When the time quantum is very large, RR behaves like FCFS. If the time quantum is very small it
resembles processor sharing: it appears to the users as though each of the n processes has its own
processor running at 1/n the speed of the real processor.
The time quantum must be large with respect to the context-switch time.
Turnaround time also depends on the size of the time quantum. In general the average turnaround
time does not necessarily improve as the time quantum increases.

Multi-Level Queue Scheduling:

 Used in situations where processes can be easily classified into different groups. Ex.
foreground (interactive) processes and background (batch) processes.
 These groups have different response-time requirements and scheduling needs.
 Ex. Priority for foreground processes is higher than for background processes.
 A multilevel queue scheduling algorithm partitions the ready queue into several separate
queues.
 The processes are permanently assigned to one queue based on some property like priority,
process type, memory size etc.
 Each queue has its own scheduling algorithm.
 There is scheduling among the queues, which is generally fixed priority preemptive
scheduling.
 Ex. Say five queues:
o System processes
o Interactive processes
o Interactive editing processes
o Batch processes
o Student processes
 One way is to provide priority to each queue. Each queue has absolute priority over lower
priority queues. Which means unless the high priority queue is empty the low priority will
not get selected.
 Another way is to time slice between queues.

Multi-Level Feedback Queue Scheduling:

o Problem with previous algorithm is that the processes are permanently assigned to a queue on
entry to the system.
o They cannot move between queues.
o Previous algorithm: The advantage here is that there is low scheduling overhead but problem
is that it is inflexible.
o This algorithm improves upon the previous one.
o Idea: separate the processes with different CPU burst characteristics. If a process uses too
much CPU time it is moved to a low priority queue.
o Hence the I/O bound and interactive processes are in high priority queue.
o Also aging is employed to prevent starvation.
o Ex.

Queue0 Quantum 8

Queue1 Quantum 16

Queue2 FCFS
o A process entering the ready queue is put in queue0. If it does not complete in 8ms it is put at
the tail of queue1. If queue0 is empty then the head of queue1 gets 16ms. If it does not
complete it is put in queue2. Processes in queue2 are run on an FCFS basis, only when queue0 and
queue1 are empty.
o This algorithm gives highest priority to a process that requires a time quantum of 8ms or less.
Such processes finish their work and go for I/O. Processes that need more than 8ms but less
than 24ms are in the next lower queue. Long processes automatically sink to the lowest level
which is FCFS.
o The following parameters define the multilevel feedback queue scheduler:
o The number of queues.
o The scheduling algorithm for each queue.
o The method used to determine when to upgrade a process to a high priority queue.
o The method used to demote a process to lower priority queue.
o The method used to determine which queue a process will enter when the process
needs service.
It is the most general CPU scheduling algorithm. It can be configured to match a specific system
under design.
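
The demotion rule of the three-queue example above can be sketched as a small C function. It only
tracks in which queue a single CPU burst completes and ignores competition from other processes and
aging; the name mlfq_final_queue is hypothetical.

```c
/* Queue (0, 1 or 2) in which a CPU burst of the given length completes, for
   the example above: queue0 has a quantum of 8ms, queue1 a quantum of 16ms,
   and queue2 is FCFS. */
int mlfq_final_queue(int burst_ms)
{
    static const int quantum[2] = { 8, 16 };
    int level = 0;

    while (level < 2 && burst_ms > quantum[level]) {
        burst_ms -= quantum[level];   /* used the whole quantum: demote */
        level++;
    }
    return level;
}
```

A burst of 8ms or less finishes in queue0, a burst of up to 8 + 16 = 24ms finishes in queue1, and
anything longer sinks to the FCFS queue.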

Multiple-Processor Scheduling:
o In the case of multiple processors, CPU scheduling is more complex.
o There is no best algorithm as such.
Homogeneous system:
o When all processors are identical it is said to be a homogenous system. In terms of
functionality any processor can be used to run any process in the queue. Hence load sharing
can occur.
o If we provide a separate queue for each processor then when the queue is empty the CPU sits
idle. To prevent this we use a common ready queue.
o Two scheduling approaches are used:
o 1. Each processor is self-scheduling. Each processor examines the common ready
queue and selects a process to execute. This is called symmetric multiprocessing.
o Here we have to make sure that two processors do not select the same process and
that processes are not lost from the queue.
o 2. To avoid the above problem one processor is appointed as the scheduler for the other
processors, thus creating a master-slave structure. Some systems have all scheduling
decisions, I/O processing and other system activities handled by the master processor.
o This is asymmetric multiprocessing.
o It is simpler than the symmetric form.
o In heterogeneous systems, where the processors are different, only programs compiled for a
given processor's instruction set can run on that processor.

What is processor affinity?

If a process is running on a specific processor, the data most recently accessed by the process
sits in that processor's cache. This is called populating the cache. Hence successive memory
accesses by the process often result in cache hits.
When the process migrates to another processor, the contents of the cache on the first processor
must be invalidated and the cache on the new processor must be repopulated.
The cost of invalidating and repopulating is very high.
Hence in SMP migration of processes is avoided and instead an attempt is made to keep the process
to run on the same processor. This is called processor affinity.
Processor affinity can take different forms.
Soft affinity: when an OS has a policy of attempting to keep a process running on the same processor
but not guaranteeing it. So migration is possible.
Hard affinity: here we can specify not to migrate to other processors.

Load Balancing:
 In SMP, to get maximum performance it is important to keep the workload balanced among all the
processors.
 Otherwise one or more processors may sit idle while the load on the other processors increases
and the queues on highly loaded processors grow.
 Hence load balancing attempts to keep the workload evenly distributed across all processors.
 Load balancing is required when each processor has its own queue.
 When there is a common queue from which processors pick processes, no load balancing is
required.
 There are two approaches to load balancing:
o Push migration: a specific task periodically checks the load on the processors. If it finds
an imbalance, it evenly distributes the load by moving processes from overloaded processors
to less loaded or idle processors.
o Pull migration: An idle processor pulls a waiting task from a busy processor.
 Linux implements both methods.

Symmetric Multithreading:
 SMP allows multiple threads to run concurrently on multiple processors.
 Instead of physical processors logical processors are provided. This is called Symmetric
multithreading (SMT) or Hyperthreading technology.
SMT: Idea:
 Is to create multiple logical processors on the same physical processors presenting a view of
several logical processors to the OS.
 Each logical processor has its own architecture state, which indicates the general purpose and the
machine state registers.

 Each logical processor is responsible for its own interrupt handling.

 Each logical processor shares the resources of its physical processor like cache and buses.
 The SMT feature is provided in hardware and not software.

Thread scheduling:
 There are user level and kernel level threads.
 User level threads managed by thread library.
 To run on the CPU the user level threads have to be mapped onto kernel threads through may be

Cooperating Processes
 A cooperating process is one that can affect or be affected by other processes executing in the
system.
 Cooperating processes may directly share a logical address space (through threads) or share
data through files or messages.
 The problem with sharing data is that concurrent access may leave the data inconsistent.

Race Conditions
 In operating systems, processes that are working together share some common storage (main
memory, file etc.) that each process can read and write.
 Situations in which two or more processes are reading or writing some shared data and the final
result depends on who runs precisely when are called race conditions.
 Concurrently executing threads that share data need to synchronize their operations and
processing in order to avoid race conditions on shared data.
 Only one 'customer' thread at a time should be allowed to examine and update the shared
variable.
 Race conditions are also possible inside the operating system itself.
 If the ready queue is implemented as a linked list and if the ready queue is being manipulated
during the handling of an interrupt, then interrupts must be disabled to prevent another
interrupt before the first one completes.
 If interrupts are not disabled then the linked list could become corrupt.

Ex. Suppose a producer executes count++ and a consumer executes count-- on a shared variable count.

1. count++ could be implemented as
register1 = count
register1 = register1 + 1
count = register1
2. count-- could be implemented as
register2 = count
register2 = register2 - 1
count = register2
3. Consider this execution interleaving with "count = 5" initially:
S0: producer executes register1 = count {register1 = 5}
S1: producer executes register1 = register1 + 1 {register1 = 6}
S2: consumer executes register2 = count {register2 = 5}
S3: consumer executes register2 = register2 - 1 {register2 = 4}
S4: producer executes count = register1 {count = 6}
S5: consumer executes count = register2 {count = 4}
The final value of count is 4, although one increment and one decrement should have left it at 5.
Had S4 and S5 been executed in the opposite order, the result would have been 6.
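
The interleaving S0 to S5 can be replayed deterministically in C. This sketch runs the six steps
in a single thread, in exactly the order listed, so it shows the lost update without relying on
real concurrency; the function name interleaved_count is hypothetical.

```c
/* Replays the interleaving S0..S5 on a local copy of count and returns the
   final value. register1 belongs to the producer, register2 to the consumer. */
int interleaved_count(int count)
{
    int register1, register2;

    register1 = count;           /* S0: producer loads count       */
    register1 = register1 + 1;   /* S1: producer increments        */
    register2 = count;           /* S2: consumer loads stale count */
    register2 = register2 - 1;   /* S3: consumer decrements        */
    count = register1;           /* S4: producer stores            */
    count = register2;           /* S5: consumer overwrites it     */
    return count;
}
```

Starting from 5 the function returns 4: the producer's increment is lost because the consumer
loaded count before the producer stored its result.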


 Consider a system consisting of n processes {P0, P1, ..., Pn-1 }.Every process has a segment of
code called critical section in which processes may be changing variables, updating a table,
writing a file etc.

 When one process is executing in its critical section no other process is allowed to execute in
its critical section.

Critical-section problem: to design a protocol that the processes can use in order to cooperate.

What is that protocol?

Let each process request permission to enter its critical section. The section of code implementing
this request is called entry section. The critical section is followed by code to exit called the exit
section. The remaining code is the remainder section.

General structure of a typical process:

do {
    entry section
    critical section
    exit section
    remainder section
} while(1);

A solution to the critical-section problem must satisfy the following three requirements:

1. Mutual exclusion. If process Pi is executing in its critical section, then no other processes
can be executing in their critical sections.

2. Progress. If no process is executing in its critical section and some processes wish to enter
their critical sections, then only those processes that are not executing in their remainder
sections can participate in the decision on which will enter its critical section next, and this
selection cannot be postponed indefinitely.

3. Bounded waiting. There exists a bound, or limit, on the number of times that other processes
are allowed to enter their critical sections after a process has made a request to enter its
critical section and before that request is granted.

We assume that each process is executing at a nonzero speed. However, we can make no assumption
concerning the relative speed of the n processes.
At a given point in time, many kernel-mode processes may be active in the operating system. As a
result, the code implementing an operating system (kernel code) is subject to several possible race
conditions.
 Consider as an example a kernel data structure that maintains a list of all open files in the
system. This list must be modified when a new file is opened or closed (adding the file to the
list or removing it from the list).
 If two processes were to open files simultaneously, the separate updates to this list could
result in a race condition.
 Other kernel data structures that are prone to possible race conditions include structures for
maintaining memory allocation, for maintaining process lists, and for interrupt handling. It is
up to kernel developers to ensure that the operating system is free from such race conditions.

Two general approaches are used to handle critical sections in operating systems:

(1) preemptive kernels and

(2) nonpreemptive kernels.

 A preemptive kernel allows a process to be preempted while it is running in kernel mode.

 A nonpreemptive kernel does not allow a process running in kernel mode to be preempted; a
kernel-mode process will run until it exits kernel mode, blocks, or voluntarily yields control
of the CPU.
Obviously, a nonpreemptive kernel is essentially free from race conditions on kernel data structures,
as only one process is active in the kernel at a time. We cannot say the same about preemptive
kernels, so they must be carefully designed to ensure that shared kernel data are free from race
conditions.

Why, then, would anyone favor a preemptive kernel over a nonpreemptive one?
A preemptive kernel is more suitable for real-time programming, as it will allow a real-time process
to preempt a process currently running in the kernel. Furthermore, a preemptive kernel may be more
responsive, since there is less risk that a kernel-mode process will run for an arbitrarily long period
before relinquishing the processor to waiting processes.

Algorithm 1:
Let the two processes be P0 and P1. When presenting process Pi, we use Pj to denote the other
process (j = 1 - i).
 Here the processes share a common variable turn, which is initialised to 0 (or 1).
 If turn == i, then Pi is allowed to execute in its critical section.
Structure of Pi is as follows:
do {
    while (turn != i);
    critical section
    turn = j;
    remainder section
} while(1);

For process P0                  For process P1
do {                            do {
    while (turn != 0);              while (turn != 1);
    critical section                critical section
    turn = 1;                       turn = 0;
    remainder section               remainder section
} while(1);                     } while(1);

 This solution meets the mutual exclusion requirement because it allows only one process to
enter the CS at any given time.

 It does not meet the progress requirement because it requires strict alternation of processes in
execution of its CS. If Pj wants to enter CS then Pi must have executed and made turn to
become j. But if Pi does not want to execute then there is no way to make turn == j and hence
Pj will not enter CS.

Algorithm 2:
 In Algorithm 1 there is no way to retain sufficient information about the state of each process.
 The variable turn is replaced by an array boolean flag[2];
 The elements of the array are initialised to false.
 If flag[i] is true then Pi is ready to enter critical section.
Structure of process Pi:
do {
    flag[i] = true;
    while (flag[j]);
    critical section
    flag[i] = false;
    remainder section
} while(1);

For process P0                  For process P1
do {                            do {
    flag[0] = true;                 flag[1] = true;
    while (flag[1]);                while (flag[0]);
    critical section                critical section
    flag[0] = false;                flag[1] = false;
    remainder section               remainder section
} while(1);                     } while(1);

Algorithm works as follows:

 Process Pi sets flag[i] to true to indicate it is ready to enter its critical section.
 Pi then checks that process Pj is not also ready to enter its CS.
 If Pj is ready, then Pi waits until Pj indicates that it has finished by setting flag[j] = false.
 Else
o Pi enters CS. After exiting CS Pi sets flag[i] to false allowing the other process to
enter its CS.
This solution meets the mutual exclusion requirement because it allows only one process to enter the
CS at any given time.
It does not meet the progress requirement. This can be explained as follows:
 There is a possibility that both the following instructions gets executed simultaneously:
 Pi sets flag[i] to true and Pj sets the flag[j] to true. If this happens Pi and Pj will be
looping forever in their respective while statements.
 This algorithm is crucially dependent on the exact timing of the two processes.
Algorithm 3 (also called Peterson's algorithm):
 Combines the ideas of algorithms 1 and 2.
 In this solution the processes share two variables:

o Boolean flag[2];
o int turn;
Structure of process Pi:
do {
    flag[i] = true;
    turn = j;
    while (flag[j] && turn == j);   /* Pj in CS */
    critical section
    flag[i] = false;
    remainder section
} while(1);

For process P0                          For process P1

do {                                    do {
    flag[0] = true;                         flag[1] = true;
    turn = 1;                               turn = 0;
    while (flag[1] && turn == 1);           while (flag[0] && turn == 0);
    /* Pj in CS */                          /* Pj in CS */
    critical section                        critical section
    flag[0] = false;                        flag[1] = false;
    remainder section                       remainder section
} while(1);                             } while(1);

 Initially flag[0] = flag[1] = false and value of turn can be either 0 or 1.

 To enter the CS, Pi sets flag[i] to true and then sets turn = j, thereby asserting that if Pj
wants to enter its critical section it may do so.
 If both Pi and Pj try to enter the CS at the same time, turn will be set to both i and j at
roughly the same time.
 Only one of the assignments will last since the other will get overwritten.
 The eventual value of turn decides which of the two processes is allowed to enter the CS first.
This solution meets the mutual exclusion requirement because it allows only one process to enter
the CS at any given time.
Pi enters the CS only if either flag[j] == false or turn == i. Also, if both processes were in
their CS at the same time, then flag[0] = flag[1] = true. These two observations imply that P0 and
P1 could not both have successfully passed their while statements, since the value of turn can be
either 0 or 1 but not both. So only one of the processes, say Pj, enters the CS. The condition
(flag[j] && turn == j) stays true for Pi as long as Pj is in its CS, so Pi keeps waiting. Hence
mutual exclusion is preserved.

To prove the other two conditions:

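Pi can be prevented from entering the CS only if it is stuck in the while loop
while (flag[j] && turn == j); If Pj is not ready to enter the CS then flag[j] is false and Pi can
enter. The same holds for Pj. If both are ready, the process named by turn enters. Once Pj leaves
its CS it sets flag[j] to false, which lets Pi enter. If Pj then sets flag[j] to true again, it
must also set turn = i. Since Pi does not change turn while it is waiting, Pi enters the CS
(progress) after at most one entry by Pj (bounded waiting).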
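
On modern hardware the plain loads and stores of the pseudocode may be reordered by the compiler or
the CPU, which breaks the algorithm. A minimal C11 sketch, assuming sequentially consistent
atomics (the default for the calls used here) are available; the function names peterson_lock and
peterson_unlock are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Shared state for the two processes (or threads) 0 and 1. Objects with
   static storage start out as false and 0. */
static atomic_bool flag[2];
static atomic_int turn;

/* Entry section for process i (i must be 0 or 1). */
void peterson_lock(int i)
{
    int j = 1 - i;

    atomic_store(&flag[i], true);   /* announce that Pi wants to enter */
    atomic_store(&turn, j);         /* give priority to the other one  */
    while (atomic_load(&flag[j]) && atomic_load(&turn) == j)
        ;                           /* busy wait while Pj has priority */
}

/* Exit section for process i. */
void peterson_unlock(int i)
{
    atomic_store(&flag[i], false);
}
```

Thread i would bracket its critical section with peterson_lock(i) and peterson_unlock(i). In
practice a library mutex is normally used instead; the sketch is only meant to mirror the
algorithm above.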
 In general any solution to CS will require a lock mechanism. A process must acquire a lock
before entering the CS and release the lock when it exits the CS.
do {
    acquire lock
    critical section
    release lock
    remainder section
} while(true);

 In uni-processor systems the critical section problem could be solved by not allowing interrupts
to occur while a shared variable is being modified.
 This solution is not feasible in multiprocessor systems, since disabling interrupts can be time
consuming as the message has to be passed to all processors. This decreases the efficiency of the
system. It can also affect the system clock if the clock is kept updated by interrupts.
 Hence many machines provide simple hardware instructions that can be used effectively to solve
the critical section problem.
 Ex. TestAndSet instruction, Swap instruction.
 These instructions have to be executed atomically without interrupting.
 Hardware solutions are more efficient than the software ones.

1. TestAndSet instruction:
 Executed atomically.
 A boolean variable lock is used with this instruction that is initialised to false.
 It is used to implement mutual exclusion.
 It is a function that returns a boolean value.
 Used in IBM/370.

Definition of TestAndSet instruction:

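boolean TestAndSet(boolean &lock) {
    boolean rv = lock;   // remember the old value
    lock = true;         // set the lock unconditionally
    return rv;           // false means the caller has acquired the lock
}

An equivalent form on an integer lock, which returns true when the lock was acquired:

boolean testandset(int &i) {
    if (i == 0) {
        i = 1;
        return true;     // lock was free and is now taken
    }
    return false;        // lock was already taken
}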

Working of TestAndSet (TSL):

 It reads the contents of the memory word lock and stores it into a register say rv.
 Then it stores a non-zero value in the location lock.

 The process of reading the lock and storing it is indivisible.
 A process enters CS by setting the lock to true by executing the TestAndSet instruction.
 To use TSL a shared variable lock is used. When lock is zero any process can set it to 1 using
TSL. When CS is over lock is set back to 0.

Mutual Exclusion implementation with TestAndSet:

do {
    while (TestAndSet(lock));   // process fails to lock the entry, hence retry
    critical section
    lock = false;
    remainder section
} while(1);

In the while loop:

TestAndSet returns the old value of lock. If it is non-zero the lock is already set, so the process
goes back to the beginning and tests it again. Sooner or later lock becomes 0, reset by the process
that came out of the CS. The waiting process then finds the old value 0, leaves lock set, and comes
out of the while loop.
If a process Pi is currently performing a test-and-set, no other process may begin another
test-and-set until the first process is done. If at this point Pj happens to issue a test-and-set
instruction for the same memory location, the hardware recognizes the situation and Pj must wait
and retry.
Repeatedly testing the lock in this way is called busy waiting, and such a lock is called a spinlock.

Mutual Exclusion: When one process is in CS it sets its value of lock to true by executing the
TestAndSet instruction. Even if another process wants to enter the CS it will wait in the while loop
till the lock is set false by the process in the CS.

Progress:It is valid because if the second process does not want to enter CS now the first process can
still enter.

Bounded Waiting: When a process exits the CS, the selection of the next process Pj to enter the CS
is arbitrary: it's a race! Hence this solution does not satisfy bounded waiting.
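
In C11 the standard atomic_flag type provides a test-and-set operation, so the spinlock above can
be sketched as follows. The names tas_lock, spin_lock, spin_trylock and spin_unlock are
hypothetical; only the atomic_flag calls come from <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag tas_lock = ATOMIC_FLAG_INIT;   /* initially clear (false) */

/* Entry section: spin until the old value of the flag was false. */
void spin_lock(void)
{
    while (atomic_flag_test_and_set(&tas_lock))
        ;   /* lock was already set: retry */
}

/* Single attempt: returns true if the lock was acquired. */
bool spin_trylock(void)
{
    return !atomic_flag_test_and_set(&tas_lock);
}

/* Exit section: corresponds to lock = false. */
void spin_unlock(void)
{
    atomic_flag_clear(&tas_lock);
}
```

As with the pseudocode, this gives mutual exclusion and progress but not bounded waiting: which
spinning thread wins after spin_unlock is not determined.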

2. Swap instruction:
 Swap instruction exchanges the contents of two memory locations.
 Swap instruction is also executed atomically.
 Used in Intel IA-32 and IA-64.
 Mutual exclusion is implemented by declaring a global boolean variable lock which is
initialised to false.
 Each process has its own local boolean variable key.

Definition of Swap instruction (related to Compare & Swap, CAS):

void Swap(boolean *a, boolean *b) {
    boolean temp = *a;
    *a = *b;
    *b = temp;
}

Mutual Exclusion implemented using the Swap instruction:

do {
    key = true;
    while (key == true)
        Swap(&lock, &key);   // spin until the old value of lock was false
    critical section
    lock = false;
    remainder section
} while(1);

Note on Compare & Swap: The value of the first compare operand is compared with the value of the
second compare operand. If they are unequal, the second compare operand is stored into the first
compare operand's location. If they are equal, the swap operand is stored in the second compare
operand's location.

A shared variable lock is initialised to 0. Each process uses a local variable key that is
initialised to 1. The only process that enters the CS is the one that finds lock to be 0. It
excludes all other processes by setting lock to 1. When a process leaves the CS it resets lock to 0
so that the next process can gain access to the CS.

Mutual exclusion: If lock is 0 no process is in its CS else if lock is 1 exactly one process is in CS.
Progress: also works since even if one process does not want to enter CS the other still can enter.

Disadvantages of these hardware instructions:

 Busy waiting is employed.
 Starvation possible.
 Deadlock is possible.
These two algorithms do not satisfy the bounded waiting requirement.
When Pi exits the CS, the selection of the next Pj to enter the CS is arbitrary: no bounded waiting
(it's a race).
Algorithm of TestAndSet Instruction that satisfies all the requirements of critical section.
The data structures used are boolean waiting[n]; and boolean lock; both are initialised to false.

do {
    waiting[i] = true;
    key = true;
    while (waiting[i] && key)
        key = TestAndSet(lock);
    waiting[i] = false;
    critical section
    j = (i + 1) % n;
    while ((j != i) && !waiting[j])
        j = (j + 1) % n;      // look for the next waiting process in cyclic order
    if (j == i)
        lock = false;         // nobody is waiting: release the lock
    else
        waiting[j] = false;   // hand the CS directly to Pj; lock stays true
    remainder section
} while(1);

Process Pi can enter its CS only if either waiting[i] == false or key == false. The value of key
becomes false only when TestAndSet is executed. The first process to execute the TestAndSet
instruction finds key == false; all others have to wait. Also, waiting[i] becomes false only if
another process leaves its critical section, and only one waiting[i] is set to false at a time, so mutual
exclusion is maintained.
Since a process leaving its CS either sets lock to false or sets waiting[j] to false, a waiting
process is always allowed to enter its critical section. Hence the progress condition is satisfied.
Bounded waiting can be verified as follows: when a process leaves its CS it scans
the array waiting in cyclic order (i+1, i+2, …, n-1, 0, 1, …, i-1). The first process in this
ordering with waiting[j] == true enters the critical section next. Any process waiting to enter its CS will
therefore not have to wait for more than n-1 turns.
Hence all three requirements of the critical-section problem are satisfied.
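
The cyclic scan in the exit section is what gives the n-1 bound. A minimal C sketch of that scan alone is shown here; the function name next_waiter is hypothetical, and the waiting array is passed in as a parameter rather than shared.

```c
#include <stdbool.h>

/* Exit-section scan of the bounded-waiting algorithm.
   Starting at i+1, walk waiting[] cyclically and return the index of the
   first process that is waiting. Returns i itself if nobody is waiting,
   in which case the caller simply sets lock = false. */
int next_waiter(const bool waiting[], int n, int i) {
    int j = (i + 1) % n;
    while (j != i && !waiting[j])
        j = (j + 1) % n;
    return j;
}
```

For example, with n = 5 and processes 1 and 4 waiting, a process 2 leaving its CS hands over to process 4, and process 4 in turn hands over to process 1.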


• The solutions to the CS problem discussed so far (i.e. TestAndSet and Swap) are not easy to generalize to
complex problems. They are also complicated for application programmers to use.
• Hence a synchronization tool called a semaphore is used.
• Semaphores are high-level constructs used to synchronize concurrent processes.
• They were first proposed by Dijkstra.

A semaphore S is an integer variable that, apart from initialisation, is accessed only through
two standard atomic operations: wait( ) and signal( ).
(Note: wait( ) and signal( ) were originally termed P and V respectively, from the Dutch words
proberen, "to test", and verhogen, "to increment".)

Definition of wait:
wait (S) {
while (S<=0) ; // no op
S--; }

Definition of signal:
signal(S) {
S++; }
 Modifications to the integer value of the semaphore must be done indivisibly.
 I.e. no two processes can modify the semaphore simultaneously.
 Hence in wait the two steps: while (S<=0) ; S--; must be done indivisibly.

Mutual exclusion implementation with semaphores:

The semaphore can be used to deal with the n-process CS problem. The n processes share a
semaphore called mutex (mutual exclusion) initialised to 1. Each process is organised as follows:

do {
    wait(mutex);
    Critical section
    signal(mutex);
    Remainder section
} while(1);

The disadvantage of this semaphore definition (as with the locks above) is busy waiting.

When a process is in its CS, any other process trying to enter its CS loops continuously in the
entry code. This looping is a problem in a multiprogramming environment where a single CPU is
shared among several processes. A semaphore on which a process spins continuously while waiting
for the lock is called a spinlock.
Disadvantage: busy waiting wastes CPU cycles.
Advantage: no context switch is required when a process must wait on a lock. A context switch takes
time and is expensive. Hence spinlocks are useful when locks are held only for a short time.
They are often employed in multiprocessor systems, where one thread can spin on one processor
while another thread performs its critical section on another processor.

Modifications done to avoid busy waiting

• Modify the definitions of the wait and signal operations.
• Each semaphore has an associated queue in which the processes blocked on that
semaphore wait.
• When a process executes wait and finds that the semaphore value is not positive, it must wait. Instead of busy
waiting, the process blocks itself, enters the wait queue of that semaphore and changes
its state to waiting. Control goes to the CPU scheduler, which selects another process.
• When another process executes a signal, the blocked process is restarted by a wakeup
operation, which changes its state from waiting to ready and places it in the ready queue.
• The wait list is implemented as a link field in each PCB. A FIFO queue can be used.

New definition is as follows:

typedef struct {
int value;
struct process *L;
} semaphore;
// Each semaphore has an int value and a list of processes.

New definition for wait and signal:

void wait (semaphore S) {
    S.value--;
    if (S.value < 0) {
        // add this process to S.L
        block();      // suspends the process
    }
}

void signal (semaphore S) {
    S.value++;
    if (S.value <= 0) {
        // remove a process P from S.L
        wakeup(P);    // resumes process P
    }
}
The type of semaphore used here is called a counting semaphore because its
integer value can range over an unrestricted domain. It is sometimes also called a general semaphore or
resource-counting semaphore.
Counting semaphores are primarily used for synchronizing access to a shared resource by
several concurrent processes.
Note: in this implementation of semaphore it can have a negative value.
Note: semaphores can be implemented using s/w or h/w. Wait and signal can be implemented at
kernel level, user-level or hybrid.
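
The blocking definitions of wait and signal can be traced with a small single-threaded simulation. This is only a sketch: block() and wakeup() are replaced by return values, process ids stand in for PCBs, and the names sim_semaphore, sim_init, sim_wait, sim_signal and MAX_WAITERS are assumptions made for the illustration. It also assumes that no more than MAX_WAITERS processes are blocked at once.

```c
#include <stddef.h>

#define MAX_WAITERS 16

/* Simulated semaphore: an integer value plus a FIFO queue of blocked
   process ids (a stand-in for the PCB list L in the definition above). */
typedef struct {
    int value;
    int queue[MAX_WAITERS];
    size_t head, count;
} sim_semaphore;

void sim_init(sim_semaphore *s, int initial) {
    s->value = initial;
    s->head = 0;
    s->count = 0;
}

/* wait: decrement first; a negative result means the caller must block.
   Returns 1 if process pid was queued (blocked), 0 if it may proceed. */
int sim_wait(sim_semaphore *s, int pid) {
    s->value--;
    if (s->value < 0) {
        s->queue[(s->head + s->count) % MAX_WAITERS] = pid;
        s->count++;
        return 1;
    }
    return 0;
}

/* signal: increment; if the value is still <= 0 a process is queued, so
   dequeue the oldest waiter. Returns its pid, or -1 if nobody was waiting. */
int sim_signal(sim_semaphore *s) {
    s->value++;
    if (s->value <= 0) {
        int pid = s->queue[s->head];
        s->head = (s->head + 1) % MAX_WAITERS;
        s->count--;
        return pid;
    }
    return -1;
}
```

When the value is negative, its magnitude equals the number of processes queued on the semaphore, which is why this implementation allows negative values.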
How can we guarantee that no two processes execute wait and signal operations on the same semaphore
at the same time?
In a uni-processor environment:
• Simply inhibit interrupts during wait and signal operations. Only the currently running process
executes until interrupts are re-enabled.
In a multi-processor environment:
• Inhibiting interrupts does not work here. Instructions from different processes may be interleaved in an arbitrary way.
• If the h/w does not provide any special instruction, then any correct s/w solution to the critical-section problem can be
used.

Deadlocks & Starvation:

• The semaphore implementation with a wait queue can result in two or more processes waiting
indefinitely for an event that can be caused only by another waiting process.
• When such a state is reached, the processes are said to be deadlocked.
Ex: Two processes P0 and P1 each access semaphores S and Q, both initialised to 1.

P0            P1
wait(S);      wait(Q);
wait(Q);      wait(S);

Suppose P0 executes wait(S) and then P1 executes wait(Q). Next when P0 executes wait(Q) it must
wait until P1 executes signal(Q).
Similarly when P1 executes wait(S) it must wait until P0 executes signal(S). Since these signal
operations cannot happen P0 and P1 are deadlocked.

Deadlock: a set of processes is deadlocked if every process in the set is waiting for an event that can be
caused only by another process (also waiting) in the set.

A problem related to deadlock is indefinite blocking, or starvation. Indefinite blocking may occur if
we add and remove processes from the list associated with the semaphore in LIFO order.

Binary Semaphores:
A binary semaphore is one whose integer value is restricted to the range 0 to 1.
Advantage: it can be simpler to implement than a counting semaphore, depending on the underlying h/w.
Drawbacks of semaphores:
• A process that uses a semaphore has to know which other processes also use the semaphore.
It may also have to know how they are using the semaphore.
• The semaphore operations must be placed carefully inside a process. The omission of a P or V
can result in inconsistencies.
• Programs using semaphores can be extremely hard to verify for correctness.


1. The Bounded Buffer Problem:
• Let there be a pool of n buffers, each of which can hold one item.
• Let a semaphore mutex, initialised to 1, provide mutual exclusion for access to the buffer pool.
• Let the semaphores empty (initialised to n) and full (initialised to 0) count the number of
empty and full buffers respectively.
• In the code there is symmetry between producer and consumer: the producer produces full
buffers for the consumer, and the consumer produces empty buffers for the producer.

The structure of producer process:

do {
    produce an item in nextp
    wait(empty);
    wait(mutex);
    add nextp to buffer
    signal(mutex);
    signal(full);
} while(1);

The structure of consumer process:

do {
    wait(full);
    wait(mutex);
    remove an item from buffer to nextc
    signal(mutex);
    signal(empty);
    consume the item in nextc
} while(1);
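
The bookkeeping done by empty and full can be traced with a single-threaded C sketch. It is an illustration only: instead of blocking in wait(empty) or wait(full), each step returns 0, and mutex is omitted because there is no concurrency. The names bounded_buffer, bb_init, bb_produce and bb_consume and the value of BB_SLOTS are assumptions.

```c
#define BB_SLOTS 4   /* n, the number of buffers (assumed value) */

typedef struct {
    int slots[BB_SLOTS];
    int in, out;      /* next slot to fill, next slot to empty */
    int empty, full;  /* mirror the counts of the empty and full semaphores */
} bounded_buffer;

void bb_init(bounded_buffer *b) {
    b->in = b->out = 0;
    b->empty = BB_SLOTS;
    b->full = 0;
}

/* Producer step. Returns 1 if the item was stored, 0 where the real
   producer would block in wait(empty). */
int bb_produce(bounded_buffer *b, int item) {
    if (b->empty == 0) return 0;
    b->empty--;                        /* wait(empty)  */
    b->slots[b->in] = item;            /* add nextp to buffer */
    b->in = (b->in + 1) % BB_SLOTS;
    b->full++;                         /* signal(full) */
    return 1;
}

/* Consumer step. Returns 1 if an item was removed, 0 where the real
   consumer would block in wait(full). */
int bb_consume(bounded_buffer *b, int *item) {
    if (b->full == 0) return 0;
    b->full--;                         /* wait(full)    */
    *item = b->slots[b->out];          /* remove an item to nextc */
    b->out = (b->out + 1) % BB_SLOTS;
    b->empty++;                        /* signal(empty) */
    return 1;
}
```

Between steps, empty + full always equals the number of buffers, which is the symmetry between producer and consumer noted above.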

2. The Readers-Writers Problem:

• A data object such as a file or a record can be shared among several concurrent processes.
• Some processes want only to read the object and some want to update it.
• Readers: those that are interested only in reading.
• Writers: those that want to update (read and write) the object.
• If two or more readers access the same data simultaneously there is no adverse effect. If a writer and
any other process access the data simultaneously, inconsistencies may result.
• To ensure that inconsistency in data does not arise, writers are given exclusive access to the shared object.
• This synchronization problem is referred to as the readers-writers problem.

The readers-writers problem has several variations:

1. First readers-writers problem: the simplest one. No reader is kept waiting unless a writer has
already obtained permission to use the shared object. No reader should wait for other readers to finish
merely because a writer is waiting.
2. Second readers-writers problem: once a writer is ready, it performs its write as soon as
possible. If a writer is waiting, no new readers may start reading.

A solution to either problem may result in starvation. In the first case writers may starve and in the
second case readers may starve.

Solution to the first-readers-writers problem:

The reader processes share the following data structures:
semaphore mutex, wrt;
//both initialised to 1; wrt is common to writers also and is a mutual
//exclusion semaphore for writers.
//mutex is used to ensure mutual exclusion when readcount is updated.
int readcount;
//initialised to 0
//keeps track of how many processes are currently reading the object.

Structure of writer process:

wait(wrt); //get exclusive access
Writing is performed

signal(wrt); //release exclusive access

Structure of reader process:

wait(mutex); //get exclusive access to readcount
readcount++; //one more reader
if (readcount==1) // this is first reader
wait(wrt); //lock for reading
signal (mutex); //release exclusive access for readcount
reading is performed
wait(mutex); //get exclusive access to readcount
readcount--; // reader count less by one
if (readcount==0) //last reader
signal (wrt); // unlock for write
signal (mutex); //release exclusive access to readcount

If a writer is in its CS and n readers are waiting, then one reader is queued on wrt and the remaining n-1 are queued on mutex.

3. The Dining Philosophers Problem:

• There are five philosophers who spend their lives thinking and eating.
• They share a common circular table surrounded by five chairs.
• In the centre is a bowl of rice. The table is laid with five single chopsticks.
• While thinking, a philosopher does not interact with the others.
• From time to time a philosopher becomes hungry and tries to pick up the two chopsticks that
are closest.
• A philosopher can pick up only one chopstick at a time and cannot take a chopstick that is being
used by another.
• Two chopsticks are used for eating and they are not released until the end of eating.
• After eating, both chopsticks are put down and the philosopher begins thinking again.
• It is a classic example of a synchronization problem.

 Represent the chopstick by a semaphore.
 A philosopher grabs the chopstick by executing the wait operation and releases the chopstick
by executing the signal operation.
 The shared data is represented as follows:

Semaphore chopstick[5]; //initialise all to 1.

Structure of philosopher i:
do {
    wait(chopstick[i]);
    wait(chopstick[(i + 1) % 5]);
    eat
    signal(chopstick[i]);
    signal(chopstick[(i + 1) % 5]);
    think
} while(1);

• This solution guarantees that no two neighbours are eating simultaneously.

• The problem with this algorithm is that there is a possibility of deadlock.
• Suppose all five philosophers become hungry at the same time and all of them pick up their
left chopstick.
• When they next try to grab their right chopstick they will be delayed forever, creating a
deadlock.
• Remedies for the deadlock situation:
o Allow at most 4 philosophers to be sitting at the table.
o Allow the philosophers to pick up the chopstick only if both are available.
o Use asymmetric solution: odd philosopher picks up first left chopstick and then the
right. Whereas the even philosopher picks the right first and then the left.
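
The asymmetric remedy only changes the order in which each philosopher issues the two wait operations. A minimal C sketch of that ordering rule is given here; it assumes that chopstick[i] is the left chopstick of philosopher i and chopstick[(i + 1) % 5] the right one, and the name pick_order is hypothetical.

```c
#define PHILOSOPHERS 5

/* Asymmetric pick-up order: writes the chopstick indices that
   philosopher i should wait on first and second. */
void pick_order(int i, int *first, int *second) {
    int left = i;
    int right = (i + 1) % PHILOSOPHERS;
    if (i % 2 == 1) {      /* odd philosopher: left, then right */
        *first = left;
        *second = right;
    } else {               /* even philosopher: right, then left */
        *first = right;
        *second = left;
    }
}
```

With this rule philosophers 0 and 1 both compete for chopstick 1 first, so they cannot each hold one chopstick while waiting for the other's.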

• Though semaphores are a convenient and effective mechanism for process synchronization,
using them incorrectly can result in timing errors that are difficult to detect.
• These errors happen only if some particular execution sequence takes place, and such sequences
occur very rarely.
• Also, when semaphores are used incorrectly there may be deadlocks in the system.
• To deal with such situations researchers have developed the monitor type.
• A monitor is a high-level synchronization construct.
• It is characterized by a set of user-defined operations.
• The representation of a monitor type is as follows:
monitor monitor-name
{
    shared variables declaration
    procedure body P1 (…) {
        …
    }
    procedure body P2 (…) {
        …
    }
    procedure body Pn (…) {
        …
    }
    {
        initialisation code
    }
}
Schematic View of a monitor:

• The monitor construct ensures that only one process at a time can be active within a monitor.
• Additional synchronization mechanisms are provided by the condition construct.
• A programmer who needs to define a custom synchronization scheme can define one or more
variables of type condition:
• condition x, y; Only wait and signal can be invoked on these variables, i.e. x.wait( ); and
x.signal( );
• x.wait( ); the process invoking this operation is suspended until another process invokes x.signal( ).
• x.signal( ); resumes exactly one suspended process. If no process is suspended, it has no effect.

Deadlock free dining philosophers problem using monitor:

Assume that a philosopher is allowed to pick up chopsticks only if both are available.
States of the philosopher: thinking, hungry and eating:
enum{thinking, hungry, eating} state[5];
Philosopher i can set state[i] = eating only if the two neighbours are not eating, i.e. (state[(i+4)
%5] != eating) && (state[(i+1) %5] != eating).
Condition to be declared: condition self[5]; philosopher i delays on self[i] when hungry but unable to
obtain both chopsticks.
Chopsticks: the distribution of the chopsticks is controlled by the monitor dp, defined as follows:

monitor dp
{
    enum{thinking, hungry, eating} state[5];
    condition self[5];
    void pickup(int i) {
        state[i] = hungry;
        test(i);                 // try to start eating
        if (state[i] != eating)
            self[i].wait( );     // wait until a neighbour puts down
    }
    void putdown(int i) {
        state[i] = thinking;
        test((i+4) % 5);         // left neighbour may now eat
        test((i+1) % 5);         // right neighbour may now eat
    }
    void test(int i) {
        if ((state[(i+4) %5] != eating) && (state[i] == hungry) &&
            (state[(i+1) %5] != eating)) {
            state[i] = eating;
            self[i].signal( );
        }
    }
    void init( ) {
        for(int i=0; i<5; i++)
            state[i] = thinking;
    }
}
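
The state transitions of the dp monitor can be followed in a single-threaded C sketch. It is not a monitor: there is no mutual exclusion and no condition variable, so a philosopher that would wait on self[i] simply stays in the HUNGRY state. The dp_ prefixed names are hypothetical.

```c
enum phil_state { THINKING, HUNGRY, EATING };

static enum phil_state dp_state[5];   /* zero-initialised: all THINKING */

/* Let philosopher i eat if it is hungry and neither neighbour is eating.
   In the monitor this is also where self[i].signal() is issued. */
void dp_test(int i) {
    if (dp_state[(i + 4) % 5] != EATING && dp_state[i] == HUNGRY &&
        dp_state[(i + 1) % 5] != EATING)
        dp_state[i] = EATING;
}

/* Returns 1 if i starts eating at once, 0 if it would wait on self[i]. */
int dp_pickup(int i) {
    dp_state[i] = HUNGRY;
    dp_test(i);
    return dp_state[i] == EATING;
}

void dp_putdown(int i) {
    dp_state[i] = THINKING;
    dp_test((i + 4) % 5);   /* left neighbour may now eat  */
    dp_test((i + 1) % 5);   /* right neighbour may now eat */
}
```

For instance, if philosophers 0 and 2 are eating, philosopher 1 stays hungry until both have called putdown, because the test succeeds only when neither neighbour is eating.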
Implementing a Monitor Using Semaphores
• We now consider a possible implementation of the monitor mechanism using semaphores.
For each monitor, a semaphore mutex (initialized to 1) is provided.
• A process must execute wait(mutex) before entering the monitor and must execute
signal(mutex) after leaving the monitor.
• Since a signaling process must wait until the resumed process either leaves or waits, an
additional semaphore, next, is introduced, initialized to 0, on which the signaling processes
may suspend themselves.
• An integer variable next_count is also provided to count the number of processes suspended
on next. Thus, each external procedure F is replaced by
wait(mutex);
    …
    body of F
    …
if (next_count > 0)
    signal(next);
else
    signal(mutex);
Mutual exclusion within a monitor is ensured.
We can now describe how condition variables are implemented. For each condition x, we introduce
a semaphore x_sem and an integer variable x_count, both initialized to 0. The operation x.wait()
can now be implemented as
x_count++;
if (next_count > 0)
    signal(next);
else
    signal(mutex);
wait(x_sem);
x_count--;
The operation x.signal() can be implemented as
if (x_count > 0) {
    next_count++;
    signal(x_sem);
    wait(next);
    next_count--;
}
This implementation is applicable to the definitions of monitors given by both Hoare and Brinch-
Hansen. In some cases, however, the generality of the implementation is unnecessary, and a
significant improvement in efficiency is possible.

Resuming Processes Within a Monitor

We turn now to the subject of process-resumption order within a monitor. If several processes are
suspended on condition x, and an x.signal() operation is executed by some process, then how do we
determine which of the suspended processes should be resumed next? One simple solution is to use
FCFS ordering, so that the process waiting the longest is resumed first. In many circumstances,
however, such a simple scheduling scheme is not adequate. For this purpose, the conditional-wait
construct can be used; it has the form
x.wait(c);
where c is an integer expression that is evaluated when the wait() operation is executed. The value
of c, which is called a priority number, is then stored with the name of the process that is
suspended. When x.signal() is executed, the process with the smallest associated priority number is
resumed next.

To illustrate this new mechanism, we consider the ResourceAllocator monitor shown below,
which controls the allocation of a single resource among competing processes.

monitor ResourceAllocator
{
    boolean busy;
    condition x;
    void acquire(int time) {
        if (busy)
            x.wait(time);
        busy = TRUE;
    }
    void release() {
        busy = FALSE;
        x.signal();
    }
    initialization code() {
        busy = FALSE;
    }
}

Each process, when requesting an allocation of this resource, specifies the maximum time it plans to use
the resource. The monitor allocates the resource to the process that has the shortest time allocation
request. A process that needs to access the resource in question must observe the following
sequence:
R.acquire(t);
    access the resource;
R.release();
where R is an instance of type ResourceAllocator.
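
The resumption rule of the conditional wait (smallest priority number first) can be sketched in C as a selection over the recorded waiters. The struct waiter and the function pick_next are hypothetical names; breaking ties in favour of the earliest entry is an assumption that the text does not state.

```c
/* A process suspended on condition x, recorded with the priority
   number c that it passed to x.wait(c). */
typedef struct {
    int pid;
    int priority;   /* value of c when the wait was executed */
} waiter;

/* Index of the waiter that x.signal() would resume: the one with the
   smallest priority number. Ties go to the earliest entry (assumption).
   Returns -1 if no process is waiting. */
int pick_next(const waiter w[], int n) {
    int best = -1;
    for (int k = 0; k < n; k++)
        if (best < 0 || w[k].priority < w[best].priority)
            best = k;
    return best;
}
```

In the ResourceAllocator monitor the priority number is the requested time, so the waiting process with the shortest request is resumed first.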
Unfortunately, the monitor concept cannot guarantee that the preceding access sequence will be
observed. In particular, the following problems can occur:
• A process might access a resource without first gaining access permission to the resource.
• A process might never release a resource once it has been granted access to the resource.
• A process might attempt to release a resource that it never requested.
• A process might request the same resource twice (without first releasing the resource).

The same difficulties are encountered with the use of semaphores, and these difficulties are similar
in nature to those that encouraged us to develop the monitor constructs in the first place.
• One possible solution to the current problem is to include the resource-access operations
within the ResourceAllocator monitor.
• However, using this solution will mean that scheduling is done according to the built-in
monitor-scheduling algorithm rather than the one we have coded. To ensure that the
processes observe the appropriate sequences, we must inspect all the programs that make use
of the ResourceAllocator monitor and its managed resource.
• We must check two conditions to establish the correctness of this system.
• First, user processes must always make their calls on the monitor in a correct sequence.
• Second, we must be sure that an uncooperative process does not simply ignore the mutual-
exclusion gateway provided by the monitor and try to access the shared resource directly,
without using the access protocols.
• Only if these two conditions can be ensured can we guarantee that no time-dependent errors
will occur and that the scheduling algorithm will not be defeated. Although this inspection
may be possible for a small, static system, it is not reasonable for a large system or a dynamic
system.


When a process requests resources that are not available at that time, the process enters
a waiting state. A waiting process may never change state again, because the resources it has requested are
held by other waiting processes. This situation is called deadlock.

• Resources in the system are finite and several processes compete for them, so they must be distributed among the processes.
• Processes request resources and wait when the resources are not available.
• When the requested resource is held by another waiting process, the resulting situation is
called deadlock.
• Resources:
o Finite.
o Partitioned into different types.
o Can be physical resources such as printers, tape drives, CPU and memory space, or logical
resources such as files, semaphores and monitors.
o Each type may have several identical instances.
o Ex. If there are two printers then there are two instances of the type printer.
o If the allocation of any instance of the type satisfies a request, then
the instances are identical; otherwise they are not.
o Deadlocks can involve different resource types.

• Rule of allocation sequence:

o A process must request a resource, use it and then release it.
- Request:
  - A process cannot request more than the total number of resources
    available in the system.
  - If the request cannot be granted immediately, the process must wait
    until it can acquire the resource.
  - There are system calls used for requests. Ex. request device, open file,
    allocate memory.
  - Synchronization, if required, can be achieved through wait and signal
    on semaphores.
  - If a process requests a resource currently used by another process,
    it waits in the queue of processes waiting for that resource.
- Use:
  - The process operates on the resource.
- Release:
  - The process releases the resource after use.
  - There are system calls for release. Ex. release device, close file, free
    memory.
  - Release can also be done through wait and signal on semaphores.

 A system table records whether each resource is free or allocated and to which process.

Deadlock: a set of processes is in a deadlock state when every process in the set is waiting for an
event that can be caused only by another process in the set.
Here the events are resource acquisition and release.

Consequence of deadlocks
 Processes never finish execution.
 System resources are tied up.
 New jobs not allowed to start.
 Performance degradation.

For a deadlock to occur, the following four conditions must hold simultaneously in
a system:
1. Mutual Exclusion:
• At least one resource must be held in a non-sharable mode.
• Only one process at a time can use the resource.
• If another process requests the resource while it is in use, the requesting process must wait until
the resource has been released.
2. Hold & Wait:
• A process must be holding at least one resource and waiting to acquire additional
resources that are currently being held by other processes.
3. No Preemption:
• Resources cannot be pre-empted. A resource is given up only voluntarily by the process holding it, after completing
its task.
4. Circular Wait:
• A set {P0, P1, P2, …, Pn} of waiting processes must exist such that P0 is waiting
for a resource that is held by P1, P1 is waiting for a resource held by P2, …, Pn-1 is waiting for
a resource held by Pn, and Pn is waiting for a resource held by P0.
Resource Allocation Graph:
• Resource allocation graphs are directed graphs used to describe deadlocks precisely.
• A graph consists of a set of vertices V and a set of edges E.
• Vertices are of two types:
o P = {P1, P2, P3, …, Pn}, the set of all active processes in the system.
o R = {R1, R2, R3, …, Rm}, the set of all resource types in the system.
• Directed edges are of two types:
o Request Edge: a directed edge from process Pi to resource type Rj, denoted Pi
-> Rj. It means that process Pi has requested an instance of resource type Rj.
o Assignment Edge: a directed edge from resource type Rj to process Pi, denoted Rj ->
Pi. It means that an instance of resource type Rj is allocated to Pi.
• Processes are represented by circles.
• Resource types are represented by squares/rectangles. Each instance of the resource type is represented
by a dot in the square.
• A request edge points only to the square. An assignment edge must start from one of the
dots in the square, to designate the instance of the resource.
When a process Pi makes a request for a resource, a request edge is inserted in the resource allocation
graph. When the request is met the edge is converted to an assignment edge. When the resource is
no longer required it is released and the assignment edge is deleted.

Ex. of resource allocation graph:

[Figure: resource allocation graph with processes P1, P2, P3 and resource types R1, R2, R3, R4]

The graph shows the following items:

P = {P1, P2, P3}
R = {R1, R2, R3, R4}
E = {P1 -> R1, P2 -> R3, R1 -> P2, R2 -> P2, R2 -> P1, R3 -> P3}
Resource instances:
R1: one
R2: two
R3: one
R4: three
Process states:
• Process P1 is holding an instance of resource type R2 and is waiting for an instance of
resource type R1.
• Process P2 is holding an instance of resource types R1 and R2 and is waiting for an instance of
resource type R3.
• Process P3 is holding an instance of R3.

If the resource allocation graph has no cycles then there is no deadlock in the system.
But if there is a cycle then there may be a deadlock.

• If each resource type has only one instance, then a cycle implies that there is a deadlock. All
processes in the cycle are deadlocked. In this case a cycle is a necessary and sufficient
condition for a deadlock.
• If resource types have more than one instance, then a cycle in the graph does not necessarily
imply a deadlock. Here a cycle is a necessary but not a sufficient condition for deadlock.
Ex1. Consider the above resource allocation graph.
Let P3 request an instance of resource R2. A request edge from P3 to R2 is added.
Now two cycles are found: P1-> R1->P2->R3-> P3-> R2-> P1 and P2-> R3-> P3-> R2->P2. Hence
processes P1, P2 and P3 are deadlocked.
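
Finding such cycles can be automated with a depth-first search. The C sketch below is one possible way to do it and is not taken from the text: process and resource nodes are numbered from 0, the graph is an adjacency matrix, and the names rag, rag_has_cycle, dfs_finds_cycle and MAX_NODES are assumptions. It reports only whether a cycle exists; as noted above, a cycle proves deadlock only when every resource type has a single instance.

```c
#include <stdbool.h>

#define MAX_NODES 16

/* Directed graph over process and resource nodes. edge[u][v] is true for
   a request edge (process -> resource) or an assignment edge
   (resource -> process). */
typedef struct {
    int n;
    bool edge[MAX_NODES][MAX_NODES];
} rag;

/* colour: 0 = unvisited, 1 = on the current DFS path, 2 = finished */
static bool dfs_finds_cycle(const rag *g, int u, int colour[]) {
    colour[u] = 1;
    for (int v = 0; v < g->n; v++) {
        if (!g->edge[u][v]) continue;
        if (colour[v] == 1) return true;   /* back edge closes a cycle */
        if (colour[v] == 0 && dfs_finds_cycle(g, v, colour)) return true;
    }
    colour[u] = 2;
    return false;
}

bool rag_has_cycle(const rag *g) {
    int colour[MAX_NODES] = {0};
    for (int u = 0; u < g->n; u++)
        if (colour[u] == 0 && dfs_finds_cycle(g, u, colour))
            return true;
    return false;
}
```

Numbering P1, P2, P3 as nodes 0 to 2 and R1 to R4 as nodes 3 to 6, the edge set E listed earlier contains no cycle, and adding the request edge P3 -> R2 creates one.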
Ex2. [Figure: resource allocation graph with a cycle but no deadlock]

In this resource allocation graph there is a cycle P1-> R1-> P3-> R2-> P1. But this is not deadlock.
When P4 releases R2 then the resource can be allocated to P3 breaking the cycle.


Three general methods are used:
1. Avoid or prevent deadlocks, thus ensuring that the system never enters a deadlock state.
• Deadlock prevention is a set of methods for ensuring that at least one of the necessary
conditions cannot hold.
• These methods prevent deadlock by constraining how requests for resources can be made.
• Deadlock avoidance requires that the OS be given, in advance, additional information
concerning which resources a process will request and use during its lifetime.
• With this information the OS can decide whether or not a process should wait.
• To decide whether a request can be serviced or must wait, the following are considered:
o Resources currently available.
o Resources currently allocated.
o Future requests and releases of each process.

2. Allow the system to enter deadlock, detect it and recover from it.
• An algorithm examines the state of the system to detect whether a deadlock
has occurred.
• Then an algorithm to recover from the deadlock is run.

3. Ignore the problem and pretend that deadlocks never occur in the system.
• Does not ensure that deadlock does not occur.
• No mechanism for detection and recovery.
• No way of recognizing a deadlock even if it occurs.
• It can result in the deterioration of performance because resources are being held up.
• Eventually the system will stop functioning and has to be restarted manually.
Although this does not seem a viable approach, it is used in many OSs. Ex. Unix.

Ensure that at least one of the four necessary conditions does not hold. This prevents
deadlock from happening.

Mutual Exclusion:
Resources can be:
• Sharable:
o Ex. read-only files.
o Do not require mutually exclusive access.
o Simultaneous access is always granted. There is no need to wait.
• Non-Sharable:
o Ex. Printer.
o Cannot be shared simultaneously by several processes.
o Since such resources are intrinsically non-sharable, the mutual exclusion condition must hold for them.
o Hence deadlocks cannot, in general, be prevented by denying mutual exclusion.

Hold & Wait:

To ensure that hold and wait never occurs in the system:
• When a process requests a resource it must not be holding any other resources.
• Two protocols are used here:

1. Each process requests and is allocated all its resources before it begins its
execution. Here the request system calls precede all other system calls.
Disadvantage: resource utilization is low, since resources may be allocated
but not used for a long time.
2. A process can request some resources and use them. Before an additional request is
made, it must release all the resources that it currently holds. Disadvantage:
starvation is possible. A process may have to wait for a long time since some resource
that it needs may always be allocated to others.

No Preemption:
To make sure that this condition does not hold, resources that have already been allocated
must be made pre-emptible. Two protocols can be used:

1. If a process is holding some resources and requests another resource that cannot be granted
immediately, then the currently held resources are pre-empted, i.e. implicitly released.
• Released resources are added back to the list of resources.
• The pre-empted process is restarted later, when both its old and its new resources can be allocated.

2. If a process requests some resources, first check whether they are available.

• If yes, allocate them.
• Else see if those resources are allocated to some other process that is itself waiting for
additional resources. If so, preempt the resources from the other process and allocate them to
the requesting process.
• Else, if the resources are neither available nor held by another waiting process, the requesting process has to
wait. While waiting, some of its resources may be pre-empted to be given to other requesting
processes.
• This protocol can be applied to resources whose state can easily be saved and restored later.

Circular Wait:
• One way of ensuring that circular wait never happens is to impose a total ordering on all resource types and
to require each process to request resources only in increasing order of enumeration.
• Let R = {R1, R2, R3, …, Rm} be the set of resource types.
• We assign a unique integer to each resource type so that we can compare two
resources and determine whether one precedes another.
• Let F: R -> N be a one-to-one function, where N is the set of natural numbers. Ex. F(tape drive) =
1, F(Printer) = 12 etc.

• Each process can request resources only in increasing order of enumeration.
• Initially a process can request any number of instances of a resource type Ri. A single request for all of them
has to be made.
• If the next request is for resource type Rj, it may be granted only if it is available and if F(Rj) >
F(Ri). Alternatively, when a process requests an instance of Rj it must have released any
resource Ri such that F(Ri) >= F(Rj).
• Deadlock is prevented by restraining how requests are made.
• This restraint makes sure that at least one of the four conditions for deadlock cannot occur.
• A side effect of preventing deadlocks in this way is low device utilization and hence reduced
system throughput.
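
The ordering rule reduces to a single comparison per request. A minimal C sketch is shown here; the values for the tape drive and the printer follow the example for F, while F_DISK_DRIVE = 5 and the function name may_request are assumptions made for the illustration.

```c
#include <stdbool.h>

/* Ordering numbers F(R). The tape drive and printer values follow the
   example above; the disk drive value is an assumption for this sketch. */
enum { F_TAPE_DRIVE = 1, F_DISK_DRIVE = 5, F_PRINTER = 12 };

/* A request for a resource type numbered f_request is allowed only if it
   is strictly greater than the highest number the process already holds.
   highest_held is 0 when the process holds nothing. */
bool may_request(int highest_held, int f_request) {
    return f_request > highest_held;
}
```

A process holding the printer may therefore not go on to request the tape drive. Since every process acquires resources in increasing order of F, no cycle of waiting processes can form.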


• Deadlock avoidance needs additional information about how resources will be requested.
• Each request requires that the system consider:
o The resources currently available.
o Resources currently allocated to each process.
o The future requests and releases of each process.
• The algorithms used differ in the amount and type of information required.
• The model used in these algorithms is that each process declares the maximum number of
resources of each type that it may need.
• Knowing this a priori, it is possible to construct an algorithm that ensures that the system will
never enter a deadlock state.

A deadlock avoidance algorithm dynamically examines the resource allocation state to
ensure that a circular wait condition can never exist.

The resource allocation state is defined by the number of available and allocated resources and
the maximum demands of the processes.

A state is safe if the system can allocate resources to each process (up to its maximum) in some
order and still avoid deadlock. A system is in a safe state only if there exists a safe sequence.

Safe sequence: a sequence of processes <P1, P2, P3, …, Pn> is a safe sequence for the current
allocation state if, for each Pi, the resources that Pi can still request can be satisfied by the currently
available resources plus the resources held by all Pj with j<i.
• If the resources that Pi needs are not currently available, then Pi waits until all Pj (j<i) have finished.
When Pi has obtained and used its resources and later releases them, Pi+1 can obtain the resources it needs.

If no safe sequence exists, the system is in an unsafe state.

Relationship between safe state, unsafe state and deadlock:



A safe state is not a deadlocked state. A deadlocked state is an unsafe state, but not all unsafe states are deadlocks.
A system can go from a safe state to an unsafe state.
Example: Let a system have 12 magnetic tape drives and 3 processes P0, P1 and P2. P0 may need up to 10 tape
drives, P1 may need as many as 4, and P2 may need up to 9.
At time t0, P0 is holding 5 tape drives, P1 is holding 2 and P2 is holding 2. Hence
there are 3 free tape drives.

Process   Max Need   Current Allocation
P0        10         5
P1        4          2
P2        9          2

At t0 the system is safe, and a safe sequence is <P1, P0, P2>: P1 can get its remaining 2 drives and then return all 4 (5 free), P0 can then get its remaining 5 and return all 10 (10 free), and finally P2 can get its remaining 7.
At t1, if P2 requests and is allocated one more tape drive (P2 then holds 3 and only 2 drives are free), the system is no longer safe.
At this point only P1 can be allocated all its tape drives. When it returns them the system will have only
4 free. Since P0 holds 5 and may need 5 more, P0 has to wait. P2 may request its remaining 6
and also has to wait. This results in a deadlock. Hence we need avoidance algorithms to make sure that
the system never enters an unsafe state.
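
The tape drive example can be checked mechanically. The following is a minimal Python sketch for a single resource type; the function name find_safe_sequence and its parameters are illustrative assumptions, not part of the notes.

```python
def find_safe_sequence(total, max_need, held):
    """Return a safe sequence of process indices, or None if the state is unsafe.

    Assumed inputs (hypothetical names): total = number of identical resource
    instances, max_need[i] = maximum claim of Pi, held[i] = instances held by Pi.
    """
    free = total - sum(held)
    finished = [False] * len(held)
    sequence = []
    progress = True
    while progress:
        progress = False
        for i in range(len(held)):
            # Pi can finish if its remaining need fits in the free pool.
            if not finished[i] and max_need[i] - held[i] <= free:
                free += held[i]          # Pi runs to completion and releases everything
                finished[i] = True
                sequence.append(i)
                progress = True
    return sequence if all(finished) else None


print(find_safe_sequence(12, [10, 4, 9], [5, 2, 2]))  # state at t0: [1, 0, 2]
print(find_safe_sequence(12, [10, 4, 9], [5, 2, 3]))  # state at t1: None (unsafe)
```

The first call reproduces the safe sequence <P1, P0, P2>; the second shows that granting P2 one more drive leaves no safe sequence.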

Resource Allocation Graph Algorithm:

 Consider a resource allocation graph with only one instance of each resource type.
 Resources must be claimed a priori.
 There are three kinds of edges: assignment edge, request edge and claim edge.
 A claim edge Pi -> Rj implies that Pi may request resource Rj some time in the future. This
edge has the same direction as a request edge but is drawn as a dashed line.
 Before a process Pi starts executing, all of its claim edges must already appear in the
resource allocation graph.
 When process Pi requests resource Rj, the claim edge Pi -> Rj is converted to a request edge.
Similarly, when process Pi releases resource Rj, the assignment edge Rj -> Pi is converted back to a
claim edge Pi -> Rj.
 When a process Pi requests a resource Rj, the request is granted only if converting the request
edge Pi -> Rj to an assignment edge Rj -> Pi does not result in a cycle in the graph.
 Safety is checked by a cycle detection algorithm. Detecting a cycle requires on the order of
n^2 operations, where n is the number of processes in the system.
 If there is no cycle, the allocation leaves the system in a safe state. If there is a cycle, the allocation would put the system
into an unsafe state, so Pi has to wait.
Ex. [Figure: resource allocation graph for deadlock avoidance]

Consider the above figure:

 Suppose P2 requests R2. Although R2 is free, it cannot be allocated to P2, since this action would create
a cycle in the graph.
 The resulting state would be unsafe: if P1 then requests R2 and P2 requests R1, a deadlock occurs.
Ex. [Figure: an unsafe state in a resource allocation graph]

 This algorithm is not applicable to a resource allocation system with multiple instances of each resource type.
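
The grant rule of the resource allocation graph algorithm can be sketched in Python as follows. The figures are not reproduced in these notes, so the example graph (P1 holds R1, P2 requests R1, both processes claim R2) is an assumption made for illustration, as are the function names.

```python
def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as (src, dst) pairs."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the current path, done
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:       # back edge: a cycle exists
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(graph))


def can_grant(edges, process, resource):
    """Grant only if turning Pi -> Rj into Rj -> Pi creates no cycle.

    `edges` holds claim, request and assignment edges alike (assumed encoding).
    """
    trial = set(edges)
    trial.discard((process, resource))    # remove the claim/request edge
    trial.add((resource, process))        # tentative assignment edge
    return not has_cycle(trial)


# Assumed example graph: R1 -> P1 (assignment), P2 -> R1 (request),
# P1 -> R2 and P2 -> R2 (claims).
edges = {("R1", "P1"), ("P2", "R1"), ("P1", "R2"), ("P2", "R2")}
print(can_grant(edges, "P2", "R2"))  # False: P1 -> R2 -> P2 -> R1 -> P1 is a cycle
print(can_grant(edges, "P1", "R2"))  # True
```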

Banker’s Algorithm:
 Applicable when there are multiple instances of each resource type.
 The name was chosen because the algorithm could be used in a banking system to ensure that the bank never
allocates its available cash in such a way that it can no longer satisfy the needs of all its customers.

Requirements of the Algorithm:

 When a new process enters the system, it must declare the maximum number of instances of each
resource type that it may need.
 This number cannot exceed the total number of resources in the system.
 When a process requests resources, the system determines whether the allocation will leave the system in
a safe state.
 If it will, the resources are allocated; otherwise the process must wait until some other process releases enough resources.
 Several data structures are maintained to run this algorithm.
Let n be the number of processes and m be the number of resource types in the system.
Data structures used:

Available: a vector of length m indicating the number of available resources of each type.

If Available[j] = k, there are k instances of resource type Rj available.

Max: an n x m matrix that defines the maximum demand of each process.

If Max[i,j] = k, then process Pi may request at most k instances of resource type Rj.

Allocation: an n x m matrix that defines the number of resources of each type currently allocated to each process.
If Allocation[i,j] = k, then process Pi is currently allocated k instances of resource type Rj.

Need: an n x m matrix that indicates the remaining resource need of each process.
If Need[i,j] = k, then process Pi may need k more instances of resource type Rj to complete its task.
Need[i,j] = Max[i,j] - Allocation[i,j]

 The Banker's algorithm is less efficient than the resource allocation graph algorithm.



Safety Algorithm:
 This algorithm is used to find whether a system is in a safe state.
1. Let Work and Finish be vectors of length m and n respectively.
   Work := Available
   Finish[i] := false for i = 1, 2, 3, ..., n
2. Find an i such that both
   a. Finish[i] = false
   b. Needi <= Work
   If no such i exists go to step 4.
3. Work := Work + Allocationi
   Finish[i] := true
   Go to step 2.
4. If Finish[i] = true for all i then the system is in a safe state.
This algorithm may require on the order of m x n^2 operations.
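
The four steps above translate almost line for line into code. This is a minimal Python sketch; the function name is_safe is an assumption, and the five-process, three-resource data is a commonly used illustrative instance, not taken from these notes.

```python
def is_safe(available, allocation, need):
    """Safety algorithm sketch: returns (safe?, order in which processes can finish)."""
    work = list(available)                      # step 1: Work := Available
    n = len(allocation)
    finish = [False] * n
    order = []
    progress = True
    while progress:                             # repeat step 2 until no candidate is left
        progress = False
        for i in range(n):
            if not finish[i] and all(nd <= w for nd, w in zip(need[i], work)):
                # step 3: Pi can finish, so its allocation returns to Work
                work = [w + a for w, a in zip(work, allocation[i])]
                finish[i] = True
                order.append(i)
                progress = True
    return all(finish), order                   # step 4


# Illustrative data (assumed): Need = Max - Allocation has already been computed.
available = [3, 3, 2]
allocation = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
need = [[7, 4, 3], [1, 2, 2], [6, 0, 0], [0, 1, 1], [4, 3, 1]]
print(is_safe(available, allocation, need))     # (True, [1, 3, 4, 0, 2])
```

A safe sequence is generally not unique; the order returned here depends only on the scan order of the loop.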

Resource-Request Algorithm: this algorithm determines if requests can be safely granted.

Let Requesti be the request vector for process Pi.
If Requesti[j] = k then process Pi wants k instances of resource type Rj.
When a process Pi makes a request for resources, the following actions are taken.
1. If Requesti <= Needi go to step 2.
   Else raise an error condition, since the process has exceeded its maximum claim.
2. If Requesti <= Available go to step 3.
   Else Pi must wait, since the resources are not available.
3. Pretend to allocate the requested resources to Pi by modifying the state as follows:
   Available := Available - Requesti
   Allocationi := Allocationi + Requesti
   Needi := Needi - Requesti
If the resulting resource allocation state is safe, the transaction is completed and Pi is allocated its
resources. If it is unsafe, Pi must wait and the old resource allocation state is restored.
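
A minimal Python sketch of the resource-request algorithm follows. The names request_resources and _is_safe are assumptions for illustration; the function returns True when the request is granted and False when Pi must wait.

```python
def _is_safe(available, allocation, need):
    """Safety check (same steps as the safety algorithm)."""
    work = list(available)
    finish = [False] * len(allocation)
    progress = True
    while progress:
        progress = False
        for i in range(len(allocation)):
            if not finish[i] and all(nd <= w for nd, w in zip(need[i], work)):
                work = [w + a for w, a in zip(work, allocation[i])]
                finish[i] = True
                progress = True
    return all(finish)


def request_resources(pid, request, available, allocation, need):
    """Return True if the request is granted, False if the process must wait."""
    # Step 1: the request may not exceed the declared maximum claim.
    if any(r > nd for r, nd in zip(request, need[pid])):
        raise ValueError("process has exceeded its maximum claim")
    # Step 2: the resources must currently be available.
    if any(r > a for r, a in zip(request, available)):
        return False
    # Step 3: pretend to allocate, working on copies of the state.
    new_available = [a - r for a, r in zip(available, request)]
    new_allocation = [row[:] for row in allocation]
    new_need = [row[:] for row in need]
    new_allocation[pid] = [a + r for a, r in zip(allocation[pid], request)]
    new_need[pid] = [nd - r for nd, r in zip(need[pid], request)]
    if not _is_safe(new_available, new_allocation, new_need):
        return False                      # unsafe: old state is kept untouched
    # Safe: commit the new state in place.
    available[:] = new_available
    allocation[pid] = new_allocation[pid]
    need[pid] = new_need[pid]
    return True
```

Working on copies means that "restoring the old state" after an unsafe result needs no extra code: the original vectors are only overwritten once the safety check has passed.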

Problems( refer class notes)

Deadlock Detection Strategies:

A deadlock detection strategy provides two things:
1. An algorithm that examines the state of the system to determine whether a deadlock has occurred.
2. An algorithm to recover from the deadlock.

Single instance of each resource type:

 Here all resource types have a single instance.
 A variant of resource allocation graph is used. It is called wait-for graph.
 Take the resource allocation graph. Remove the nodes of type resource. Collapse the edges.
The resultant graph is the wait-for graph.



Ex. [Figure: (a) Resource Allocation Graph, (b) corresponding Wait-For Graph]

In a wait-for graph, the edge Pi -> Pj implies that process Pi is waiting for a resource to be released by
Pj. This edge corresponds to two edges in the resource allocation graph, namely Pi -> Rq and Rq -> Pj for some
resource Rq.

When is it a deadlock?
A deadlock exists if and only if the wait-for graph contains a cycle.

To detect deadlocks, the system maintains the wait-for graph and periodically invokes an algorithm that searches for a cycle in the graph.
Here cycle detection is O(n^2), where n is the number of vertices in the graph.
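
Collapsing a resource allocation graph into a wait-for graph and searching it for a cycle can be sketched as follows. This is a minimal Python illustration; the function names and the small three-process example are assumptions, not the graph from the figure.

```python
def wait_for_graph(requests, assignments):
    """Build the wait-for graph.

    requests: set of (process, resource) request edges (assumed encoding).
    assignments: dict mapping each resource to the process that holds it.
    """
    graph = {}
    for proc, res in requests:
        holder = assignments.get(res)
        if holder is not None and holder != proc:
            graph.setdefault(proc, set()).add(holder)   # Pi -> Pj
    return graph


def has_cycle(graph):
    """Depth-first search: True if the wait-for graph contains a cycle."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True                    # reached a node on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


# Hypothetical state: P1 waits for R1 (held by P2), P2 waits for R2 (held by P1),
# P3 waits for R1 as well.
requests = {("P1", "R1"), ("P2", "R2"), ("P3", "R1")}
assignments = {"R1": "P2", "R2": "P1"}
wfg = wait_for_graph(requests, assignments)
print(has_cycle(wfg))   # True: P1 -> P2 -> P1
```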

Several instances of each resource type:

 A wait-for graph cannot be used when there are multiple instances of a resource type.
 A different algorithm is used.
 The data structures used are:
 Available: A vector of length m indicating the number of available resources of each type.
 Allocation: An n x m matrix that defines the number of resources of each type currently
allocated to each process.
 Request: An n x m matrix that indicates the current request of each process. If Request[i,j] = k
then process Pi is requesting k more instances of resource type Rj.
1. Let Work and Finish be vectors of length m and n respectively.
   Work := Available
   For i = 1, 2, ..., n: if Allocationi != 0 then
   Finish[i] := false, else Finish[i] := true.
2. Find an index i such that both
   a. Finish[i] = false
   b. Requesti <= Work
   If no such i exists go to step 4.
3. Work := Work + Allocationi
   Finish[i] := true
   Go to step 2.
4. If Finish[i] = false for some i, 1 <= i <= n, then the system is in a deadlocked state.
   Moreover, if Finish[i] = false then process Pi is deadlocked.
This algorithm requires on the order of m x n^2 operations.

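Example: five processes P0 to P4 and three resource types A, B and C.

Process   Allocation   Request   Available
          A B C        A B C     A B C
P0        0 1 0        0 0 0     0 0 0
P1        2 0 0        2 0 2
P2        3 0 3        0 0 0
P3        2 1 1        1 0 0
P4        0 0 2        0 0 2

The sequence <P0, P2, P3, P1, P4> results in Finish[i] = true for all i. Hence the system is not deadlocked.
Suppose P2 makes one additional request for an instance of type C. The Request matrix then becomes:

Process   Request
          A B C
P0        0 0 0
P1        2 0 2
P2        0 0 1
P3        1 0 0
P4        0 0 2

The system is now deadlocked. The resources held by P0 can be reclaimed, but the available resources are then still insufficient to fulfil the request of any other process, so P1, P2, P3 and P4 are deadlocked.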
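
The detection algorithm can be run on the two Request matrices of this example. A minimal Python sketch follows; the function name deadlocked_processes is an assumption.

```python
def deadlocked_processes(available, allocation, request):
    """Return the indices of deadlocked processes (empty list if none)."""
    work = list(available)
    n = len(allocation)
    # Step 1: a process holding nothing cannot be part of a deadlock.
    finish = [not any(row) for row in allocation]
    progress = True
    while progress:                       # steps 2 and 3
        progress = False
        for i in range(n):
            if not finish[i] and all(r <= w for r, w in zip(request[i], work)):
                work = [w + a for w, a in zip(work, allocation[i])]
                finish[i] = True
                progress = True
    return [i for i in range(n) if not finish[i]]   # step 4


available = [0, 0, 0]
allocation = [[0, 1, 0], [2, 0, 0], [3, 0, 3], [2, 1, 1], [0, 0, 2]]
request = [[0, 0, 0], [2, 0, 2], [0, 0, 0], [1, 0, 0], [0, 0, 2]]
print(deadlocked_processes(available, allocation, request))   # []

request[2] = [0, 0, 1]                    # P2 asks for one more instance of C
print(deadlocked_processes(available, allocation, request))   # [1, 2, 3, 4]
```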

Detection Algorithm Usage:

When to invoke the deadlock detection algorithm depends on:
1. How often is a deadlock likely to occur?
 If deadlocks occur frequently, the detection algorithm must be invoked frequently. Otherwise resources
allocated to deadlocked processes will sit idle.
 A deadlock occurs only when some process makes a request that cannot be granted immediately.
If the detection algorithm is invoked every time a request for a resource cannot be granted,
we can identify both the set of deadlocked processes and the process that caused the
deadlock.
 However, running deadlock detection for every request is very expensive.
 A cheaper alternative is to invoke it at less frequent intervals, say when CPU utilization drops below 40%.
2. How many processes will be affected by the deadlock when it happens?

 When a deadlock is detected by the algorithm, the next step is recovery:
 Recovery:
o Inform the operator and let the operator deal with it manually.
o Let the system recover from the deadlock automatically.
There are two mechanisms of recovery, namely process termination and resource preemption:



I. Process Termination:
1. Abort all deadlocked processes:
 Breaks the deadlock cycle.
 Aborting them comes at great expense:
o Many of the processes may have executed for a long time.
o The results of their partial computations are discarded.
o These results will probably have to be recomputed later.
2. Abort one process at a time until the deadlock cycle is eliminated:
 Abort a process, then run the deadlock detection algorithm to check whether any processes are still deadlocked.
 This incurs considerable overhead.
Aborting a process is not easy because:
o If a process was in the midst of updating a file, terminating it will leave the file in an
incorrect state.
o If there is a set of processes in deadlock, we must determine which process to terminate.
o Which to terminate? Several factors are used:
1. Apply economics and abort the process whose termination incurs the minimum cost.
2. What the priority of the process is.
3. How long the process has computed and how much longer it will compute before
completing its task.
4. How many and what type of resources the process has used.
5. How many more resources the process needs to complete.
6. How many processes will need to be terminated.
7. Whether the process is interactive or batch.

II. Resource Preemption:

To eliminate a deadlock using resource preemption, resources are successively preempted from some processes and given to others until the deadlock cycle is broken. There are three issues that need to be addressed
for preemption:
1. Selecting a victim:
o Which processes are to be preempted?
o Which resources are to be preempted?
o What is the order of preemption that minimizes cost? Cost factors include the number of resources
a deadlocked process is holding, the time consumed by the deadlocked process so far, etc.
2. Rollback:
o What to do with a process if its resources are preempted?
o Since it cannot continue with normal execution, roll it back to some safe state and restart it from there.
o Since it is difficult to determine what a safe state is, the simplest solution is a total rollback (abort the process and restart it).
o The problem with a partial rollback is that the system has to keep more information about the state of all
running processes.
3. Starvation:
 How to make sure that resources are not always preempted from the same process?
 If victim selection is based on cost, the same process may be picked every time.
 This results in starvation.
 Make sure that a process is picked as a victim only a small, finite number of times.



1. Background
2. Swapping
3. Contiguous Memory Allocation
4. Paging
5. Structure of Page Table
6. Segmentation
7. Virtual Memory Management: Background
8. Demand Paging
9. Copy on Write
10. Page Replacement
11. Allocation of frames
12. Allocating Kernel Memory.




1. Background

 Memory is central to the operation of the computer system.
 It is a large array of words or bytes, each with its own address.
 The memory unit sees only a stream of addresses. It does not know how they are generated or
what they are used for (instruction or data).
 The program counter (PC) holds the address in memory from which the next instruction is fetched.
 Memory management is concerned with managing the primary memory.

Instruction execution:
 Fetch the instruction from memory.
 Decode and fetch operands if required.
 Execute & store the results back in the memory.

Basic Hardware:

For the CPU memory: to accommodate space and speed.

 Main memory and the registers built into the processor are the only storage that the CPU can access directly. For
execution, the instructions and data must be in these direct-access storage devices.
 Registers built into the processor are generally accessible within one cycle of the CPU clock.
 This is not so in the case of main memory. A memory access may take many cycles. During this time
the CPU has to stall, because it does not have the data required to complete the
instruction it is executing. A remedy is to add a fast cache between the CPU and main memory.

For the CPU memory: to accommodate protection

 The protection is provided by the hardware. There are several possible implementations.
 We must ensure that every process has a separate memory space. For this:
o We need to determine the range of legal addresses that the process may access.
o We must ensure that the process accesses only those legal addresses.
 Protection is provided by using two registers, base and limit.
 The base register holds the smallest legal physical memory address and the limit register
specifies the size of the range.
 The CPU hardware compares every address generated in user mode with these registers. Any violation results
in a trap to the OS. This scheme prevents a user program from accidentally or deliberately modifying
the code or data of the OS or of other users.
 The base and limit registers can be loaded only by the OS, using privileged instructions.
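
The comparison performed by the hardware can be sketched in a few lines of Python. The register values below and the names check_access and MemoryProtectionTrap are assumptions chosen for illustration only.

```python
class MemoryProtectionTrap(Exception):
    """Stands in for the hardware trap to the OS (hypothetical name)."""


def check_access(address, base, limit):
    """Legal addresses satisfy base <= address < base + limit."""
    if base <= address < base + limit:
        return address
    raise MemoryProtectionTrap(f"illegal address {address}")


# Assumed register contents: base = 300040, limit = 120900.
print(check_access(300040, 300040, 120900))   # lowest legal address
print(check_access(420939, 300040, 120900))   # highest legal address
```

An access to 420940 or to 300039 with the same register values would raise the trap.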



Address Binding:

 Programs are stored on secondary storage (disk) as binary executable files.
 When a program is to be executed, it is brought into main memory and placed
within a process.
 The collection of processes on the disk waiting to enter main memory forms the input queue.
 One of the processes to be executed is fetched from the queue and placed in main memory.
 During execution the process fetches instructions and data from main memory. After the process
terminates, its memory space is reclaimed.
 Before being executed, a program goes through several steps, and in each step addresses are
represented in a different way.
 In the source program the addresses are symbolic.
 The compiler binds the symbolic addresses to relocatable addresses.
 The loader binds these relocatable addresses to absolute addresses.
Binding of instructions and data to memory addresses can be done at any step along the way:
1. Compile time: If it is known at compile time where the process will reside in memory, absolute code can be
generated. If the starting address changes, it is necessary to recompile the code.
2. Load time: If it is not known at compile time where the process will reside in memory, the compiler generates
relocatable code. In this case binding is delayed until load time.
3. Execution time: If the process can be moved during its execution from one memory segment to another,
binding must be delayed until run time. Special hardware is needed for this. Most general-purpose
operating systems use this method.



Logical versus physical address:

 The address generated by the CPU is called the logical address or virtual address.
 The address seen by the memory unit, i.e., the one loaded into the memory address register, is called
the physical address.
 Compile-time and load-time address binding generate identical logical and physical addresses.
 Execution-time address binding generates different logical and physical addresses.
 The set of all logical addresses generated by a program is the logical address space.
 The set of physical addresses corresponding to these logical addresses is the physical address space.
 The mapping of virtual addresses to physical addresses at run time is done by a hardware
device called the memory management unit (MMU).
 In the simplest scheme the base register is called the relocation register.
 The value in the relocation register is added to every address generated by the user process at
the time it is sent to memory.

[Figure: Dynamic relocation using a relocation register]

The above figure shows dynamic relocation, which maps the virtual address
space to the physical address space and is performed by the hardware at run time. Relocation is
invisible to the user. Dynamic relocation makes it possible to move
a partially executed process from one area of memory to another without affecting its execution.

Dynamic Loading:

 For a process to be executed it must be loaded into physical memory. Without further techniques, the size of the
process is limited to the size of the physical memory.
 Dynamic loading is used to obtain better memory utilization.
 In dynamic loading a routine or procedure is not loaded until it is called.
 Whenever a routine is called, the calling routine first checks whether the called routine is
already loaded. If it is not, the loader is invoked to load the desired routine into
memory and to update the program's address table to reflect the change. Control is then
passed to the newly loaded routine.
 This gives better memory utilization, since an unused routine is never loaded.
 It does not need special operating system support.
 This method is useful when large amounts of code are needed to handle infrequently
occurring cases.

Dynamic linking and Shared libraries:

 Some operating systems support only static linking.

 In dynamic linking only the main program is loaded into memory. If the main program
requests a procedure, the procedure is loaded and the link is established at the time of
reference. Linking is thus postponed until execution time.
 With dynamic linking a "stub" is included in the image for each library routine reference. A
"stub" is a small piece of code that indicates how to locate the appropriate memory-resident
library routine or how to load the library if the routine is not already present.
 When the "stub" is executed it checks whether the routine is present in memory. If not, it
loads the routine into memory.
 This feature can be used to update libraries, i.e., a library is replaced by a new version and all
the programs that reference it automatically use the new version.
 More than one version of a library can be loaded in memory at a time, and each program
uses its own version of the library. Only the programs that are compiled with the new version are
affected by the changes incorporated in it. Other programs, linked before the new version was
installed, continue using the older library. This type of system is called "shared libraries".


2. Swapping

 Swapping is a technique of temporarily removing inactive programs from main memory.
 A process can be swapped temporarily out of memory to a backing store and then brought
back into memory to continue execution. This process is called swapping.



Eg:-In a multiprogramming environment with round-robin CPU scheduling, whenever the time
quantum expires, the process whose quantum has just expired is swapped out and a new process is swapped into
memory for execution.
 A variant of this swapping policy is used with priority-based scheduling. When a low-priority process is executing and a
high-priority process arrives, the low-priority process is swapped out and the high-priority process is
allowed to execute. This is also called roll out, roll in.
 Normally a process that is swapped out will be swapped back into the same memory space
that it occupied previously. This depends upon the address binding.
 If the binding is done at compile time or load time, then the process must be moved back to the same memory location.
 If the binding is done at run time, then the process can be moved to a different memory location.
This is because the physical address is computed during run time.
 Swapping requires a backing store, and it should be large enough to accommodate the copies of
all memory images.
 The system maintains a ready queue consisting of all the processes whose memory images
are on the backing store or in memory and that are ready to run.
 Swapping is constrained by other factors:
o To swap a process, it should be completely idle.
o A process may be waiting for an I/O operation. If the I/O is asynchronously accessing
the user memory for I/O buffers, then the process cannot be swapped.


3. Contiguous Memory Allocation

 One of the simplest methods of memory allocation is to divide memory into several fixed-size
partitions. Each partition contains exactly one process. The degree of multiprogramming
depends on the number of partitions.
 In the multiple-partition method, when a partition is free, a process is selected from the input
queue and is loaded into the free partition.
 When the process terminates, the memory partition becomes available for another process.
 Batch OSs used the fixed-size partition scheme.
 In the variable-partition scheme, the OS keeps a table indicating which parts of memory are free and which are occupied.
 When a process enters the system it is placed in the input queue. The OS keeps
track of the memory requirement of each process and the amount of memory available and
determines which process to allocate memory to.
 When a process requests memory, the OS searches for a hole large enough for the process. A hole is a block
of free memory.



 If the hole is too large it is split into two. One part is allocated to the requesting process and
the other is returned to the set of holes.
 The set of holes is searched to determine which hole is best to allocate. There are three
strategies to select a free hole:
o First fit: Allocates the first hole that is big enough. This algorithm scans memory from the
beginning and selects the first available block that is large enough to hold the process.
o Best fit: Chooses the hole that is closest in size to the request, i.e., it allocates the smallest
hole that is big enough to hold the process.
o Worst fit: Allocates the largest hole to the process. It searches for the largest
hole in the entire list.
First fit and best fit are the most popular algorithms for dynamic memory allocation. First fit is
generally faster. Best fit must search the entire list to find the smallest hole that is large enough. Worst
fit reduces the rate of production of small holes.
 All these algorithms suffer from fragmentation.
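
The three strategies differ only in which qualifying hole they pick. The following minimal Python sketch returns the index of the chosen hole; the hole sizes and the request size are assumed values for illustration.

```python
def first_fit(holes, size):
    """Index of the first hole that is big enough, or None."""
    for i, hole in enumerate(holes):
        if hole >= size:
            return i
    return None


def best_fit(holes, size):
    """Index of the smallest hole that is big enough, or None."""
    fits = [(hole, i) for i, hole in enumerate(holes) if hole >= size]
    return min(fits)[1] if fits else None


def worst_fit(holes, size):
    """Index of the largest hole, provided it is big enough, or None."""
    fits = [(hole, i) for i, hole in enumerate(holes) if hole >= size]
    return max(fits)[1] if fits else None


holes = [100, 500, 200, 300, 600]        # free block sizes in KB (assumed)
print(first_fit(holes, 212))             # 1 (the 500 KB hole)
print(best_fit(holes, 212))              # 3 (the 300 KB hole)
print(worst_fit(holes, 212))             # 4 (the 600 KB hole)
```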
Memory Protection:
 Memory protection means protecting the OS from user processes and protecting processes from
one another.
 Memory protection is provided by using a relocation register together with a limit register.
 The relocation register contains the value of the smallest physical address, and the limit register
contains the range of logical addresses (for example, relocation = 100040 and limit = 74600).
 Each logical address must be less than the value in the limit register. The MMU maps the logical address
dynamically by adding the value in the relocation register.
 When the CPU scheduler selects a process for execution, the dispatcher loads the relocation
and limit registers with the correct values as part of the context switch.
 Since every address generated by the CPU is checked against these registers, we can protect
the OS and other users' programs and data from being modified.
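
Using the register values quoted above (relocation = 100040, limit = 74600), the check and the mapping can be sketched in Python. The names map_address and AddressingTrap are assumptions for illustration.

```python
class AddressingTrap(Exception):
    """Stands in for the addressing-error trap to the OS (hypothetical name)."""


def map_address(logical, relocation=100040, limit=74600):
    """Check the logical address against the limit, then relocate it."""
    if not 0 <= logical < limit:
        raise AddressingTrap(f"logical address {logical} out of range")
    return logical + relocation


print(map_address(0))        # 100040
print(map_address(74599))    # 174639
```

Logical address 74600 is the first illegal one and would raise the trap.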

 Memory fragmentation can be of two types: internal fragmentation and external fragmentation.
 In internal fragmentation there is wasted space internal to a partition, because the block
of data loaded is smaller than the partition. Eg:-If there is a block of 50 KB and the process
requests 40 KB, then after the block is allocated to the process 10 KB of memory is wasted.
 External fragmentation exists when there is enough total memory space to satisfy a
request, but it is not contiguous, i.e., storage is fragmented into a large number of small holes.
 External fragmentation may be either a minor or a major problem.



 One solution for overcoming external fragmentation is compaction. The goal is to move all
the free memory together to form one large block. Compaction is not always possible. If
relocation is static and is done at load time, then compaction is not possible. Compaction is
possible only if relocation is dynamic and done at execution time.
 Another possible solution to the external fragmentation problem is to permit the logical address
space of a process to be non-contiguous, thus allowing the process to be allocated physical memory
wherever the latter is available.

4. Paging

 Paging is a memory management scheme that permits the physical address space of a process
to be non-contiguous. Support for paging is handled by hardware.
 It is used to avoid external fragmentation.
 Paging avoids the considerable problem of fitting varying-sized memory chunks onto the
backing store.
 When some code or data residing in main memory needs to be swapped out, space must be
found on the backing store.
Basic Method:
 Physical memory is broken into fixed-sized blocks called frames (f).
 Logical memory is broken into blocks of the same size called pages (p).
 When a process is to be executed, its pages are loaded into available frames from the backing store.
 The backing store is also divided into fixed-sized blocks of the same size as the memory frames.
 The following figure shows the paging hardware:
[Figure: paging hardware]
 The logical address generated by the CPU is divided into two parts: page number (p) and page
offset (d).
 The page number (p) is used as an index into the page table. The page table contains the base address
of each page in physical memory. This base address is combined with the page offset to
define the physical memory address that is sent to the memory unit.
 The page size is defined by the hardware. The size of a page is a power of 2, typically varying between 512
bytes and 16 MB per page.



 If the size of the logical address space is 2^m addressing units and the page size is 2^n, then the high-order
m-n bits of a logical address designate the page number and the n low-order bits represent the page offset.

Eg:-To show how to map logical memory into physical memory, consider a page size of 4 bytes and a
physical memory of 32 bytes (8 frames).
a. Logical address 0 is page 0 and offset 0. Page 0 is in frame 5. So logical address 0 maps to
physical address 20 [(5*4) + 0].
b. Logical address 3 is page 0 and offset 3. So it maps to physical address 23 [(5*4) + 3].
c. Logical address 4 is page 1 and offset 0, and page 1 is mapped to frame 6. So logical address 4 maps to
physical address 24 [(6*4) + 0].
d. Logical address 13 is page 3 and offset 1, and page 3 is mapped to frame 2. So logical address 13
maps to physical address 9 [(2*4) + 1].
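
The four translations above can be reproduced with a small Python sketch. Only the page-to-frame mappings stated in the example are included in the table; the names PAGE_TABLE and translate are assumptions for illustration.

```python
PAGE_SIZE = 4                 # bytes per page, so n = 2 offset bits
OFFSET_BITS = 2
PAGE_TABLE = {0: 5, 1: 6, 3: 2}   # page -> frame, as given in the example


def translate(logical):
    """Split the logical address into (p, d) and build the physical address."""
    page = logical >> OFFSET_BITS            # high-order bits: page number
    offset = logical & (PAGE_SIZE - 1)       # low-order bits: page offset
    frame = PAGE_TABLE[page]
    return frame * PAGE_SIZE + offset


for addr in (0, 3, 4, 13):
    print(addr, "->", translate(addr))       # 20, 23, 24 and 9 respectively
```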
Hardware Support for Paging:
The hardware implementation of the page table can be done in several ways:
1. The simplest method is to implement the page table as a set of dedicated registers. These
registers must be built with very high-speed logic to make paging address translation efficient. Every
memory access must go through the paging map. The use of registers for the page table is satisfactory if
the page table is small.
2. If the page table is large, the use of registers is not feasible. So the page table is kept in
main memory and a page table base register [PTBR] points to the page table. Changing page
tables then requires changing only this one register, which reduces the context switch time. The problem with this
approach is the time required to access a memory location. To access location [i] we first have to
index into the page table using the PTBR value offset by the page number. This gives the frame number, which is combined with the page
offset to produce the actual address. Thus we need two memory accesses to access a byte.
3. The standard solution is to use a special, small, fast-lookup hardware cache called the translation look-aside buffer
[TLB] or associative registers. The TLB is built from associative, high-speed memory. Each
register contains two parts: a key and a value.



When the associative registers are presented with an item, it is compared with all the keys simultaneously. If the item is found,
the corresponding value field is returned. The search is fast.
The TLB is used with the page table as follows:
 The TLB contains only a few of the page table entries.
 When a logical address is generated by the CPU, its page number is presented to the TLB.
If the page number is found, its frame number is immediately available and is used to access
memory.
 If the page number is not in the TLB (a TLB miss), a memory reference to the page table is
made. When the frame number is obtained, we can use it to access memory, and the page number and frame number are added to the TLB.
 If the TLB is already full of entries, the OS must select one for replacement.
 Each time a new page table is selected, the TLB must be flushed [erased] to ensure that the next
executing process does not use the wrong translation information.
 The percentage of times that a page number is found in the TLB is called the hit ratio.
 Memory protection in a paged environment is done by protection bits that are associated with
each frame. These bits are kept in the page table.
 One bit can define a page to be read-write or read-only.
 To find the correct frame number, every reference to memory goes through the page
table. At the same time that the physical address is computed,
the protection bits can be checked to verify that no writes are made to a read-only page.
 Any attempt to write to a read-only page causes a hardware trap to the OS.
 This approach can be extended to provide read-only, read-write or execute-only protection.
 One more bit is generally added to each entry in the page table: a valid-invalid bit.



 A valid bit indicates that the associated page is in the process's logical address space and thus
is a legal (valid) page.
 If the bit is set to invalid, the page is not in the process's logical address space and is
illegal. Illegal addresses are trapped by using the valid-invalid bit.
 The OS sets this bit for each page to allow or disallow access to that page.
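
How the TLB, the protection bit and the valid-invalid bit interact on each reference can be sketched with a small Python simulation. All names here are assumptions for illustration, and FIFO replacement of TLB entries is an assumed policy; real hardware may use a different one.

```python
from collections import OrderedDict


class Trap(Exception):
    """Stands in for a hardware trap to the OS (hypothetical name)."""


class PagedMemory:
    def __init__(self, page_table, page_size, tlb_size):
        # page_table: page -> (frame, valid bit, writable bit)  (assumed layout)
        self.page_table = page_table
        self.page_size = page_size
        self.tlb_size = tlb_size
        self.tlb = OrderedDict()          # page -> (frame, writable)
        self.hits = 0
        self.misses = 0

    def translate(self, logical, write=False):
        page, offset = divmod(logical, self.page_size)
        if page in self.tlb:                          # TLB hit
            self.hits += 1
            frame, writable = self.tlb[page]
        else:                                         # TLB miss: consult page table
            self.misses += 1
            frame, valid, writable = self.page_table.get(page, (None, False, False))
            if not valid:
                raise Trap(f"invalid page {page}")
            if len(self.tlb) >= self.tlb_size:
                self.tlb.popitem(last=False)          # evict oldest entry (FIFO, assumed)
            self.tlb[page] = (frame, writable)
        if write and not writable:
            raise Trap(f"write to read-only page {page}")
        return frame * self.page_size + offset

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

With page size 4 and a table mapping page 0 to frame 5 (read-write) and page 1 to frame 6 (read-only), references to addresses 0, 3 and 4 give one TLB hit and two misses, and a write to address 5 raises the trap.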


5. Structure of Page Table

a. Hierarchical paging: Most modern computer systems support a large logical address space (from
2^32 to 2^64). In such systems the page table itself becomes very large, so it is difficult to allocate
contiguous main memory for it. One simple solution to this problem is to divide the
page table into smaller pieces. There are several ways to accomplish this division.
• One way is to use a two-level paging algorithm, in which the page table itself is also paged.
Eg:-Consider a 32-bit machine with a page size of 4 KB. A logical address is divided into a page
number consisting of 20 bits and a page offset of 12 bits. Since the page table is paged,
the page number is further divided into a 10-bit page number (p1)
and a 10-bit page offset (p2). So the logical address is:

| p1 (10 bits) | p2 (10 bits) | d (12 bits) |

where p1 is an index into the outer page table and p2 is the displacement within the page of the outer page table.
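
The 10/10/12 split of a 32-bit logical address can be written with shifts and masks. A minimal Python sketch follows; the function name split_address and the sample address are assumptions for illustration.

```python
def split_address(addr):
    """Split a 32-bit logical address into (p1, p2, d) for two-level paging."""
    p1 = (addr >> 22) & 0x3FF     # top 10 bits: index into the outer page table
    p2 = (addr >> 12) & 0x3FF     # next 10 bits: index into the inner page table
    d = addr & 0xFFF              # low 12 bits: offset within the 4 KB page
    return p1, p2, d


print(split_address(0x00403004))  # (1, 3, 4)
```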

b. Hashed page table: A hashed page table is a common approach for handling address spaces larger than 32 bits. The virtual
page number is used as the hash value. Each entry in the hash table contains a linked list of
elements that hash to the same location.
 Each element in the linked list contains the following three fields:
 Virtual page number
 Mapped page frame value
 Pointer to the next element in the linked list
Working: The virtual page number is taken from the virtual address.
The virtual page number is hashed into the hash table.
The virtual page number is compared with the first element of the linked list.
If the values match, the corresponding page frame value is used for calculating the
physical address.
If they do not match, the rest of the linked list is searched for a matching virtual page number.
Clustered page tables are similar to hashed page tables, with the difference that each entry in the
hash table refers to several pages.
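
The lookup steps above can be sketched in Python, with each bucket holding a chain of (virtual page number, frame) pairs. The class name, the bucket count and the use of a simple modulo as the hash function are assumptions for illustration.

```python
class HashedPageTable:
    def __init__(self, buckets=8):
        # Each bucket is a chain of (virtual page number, frame) elements;
        # a Python list stands in for the linked list.
        self.buckets = [[] for _ in range(buckets)]

    def _chain(self, vpn):
        return self.buckets[vpn % len(self.buckets)]   # assumed hash function

    def map(self, vpn, frame):
        chain = self._chain(vpn)
        for i, (v, _) in enumerate(chain):
            if v == vpn:
                chain[i] = (vpn, frame)    # update an existing mapping
                return
        chain.append((vpn, frame))

    def lookup(self, vpn):
        """Walk the chain until the virtual page number matches."""
        for v, frame in self._chain(vpn):
            if v == vpn:
                return frame
        return None                        # no mapping for this page


table = HashedPageTable()
table.map(3, 7)
table.map(11, 2)                           # 11 % 8 == 3: same chain as page 3
print(table.lookup(3), table.lookup(11), table.lookup(19))   # 7 2 None
```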



c. Inverted Page Tables: Since address spaces have grown to 64 bits, traditional page tables
become a problem. Even with two-level page tables, the table can be too large to handle. An
inverted page table has only one entry for each real page (frame) of memory. Each entry consists of the virtual address
of the page stored in that real memory location, with information about the process that owns that page.
Each virtual address in the system consists of a triple <process-id, page number, offset>.
The inverted page table entry is a pair <process-id, page number>. When a memory reference is
made, the part of the virtual address consisting of <process-id, page number> is presented to the memory
subsystem. The inverted page table is searched for a match. If a match is found at entry i, then the
physical address <i, offset> is generated. If no match is found, then an illegal address access has
been attempted. Although this scheme decreases the amount of memory needed to store each page table, it
increases the amount of time needed to search the table when a page reference occurs. If the whole
table has to be searched, the lookup takes too long.
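
The search of an inverted page table can be sketched in Python as a linear scan over one entry per frame. The table contents, the 1 KB page size and the function name are assumptions for illustration.

```python
def inverted_lookup(inverted_table, pid, page, offset, page_size):
    """Search for <pid, page>; the index of the match is the frame number i."""
    for frame, entry in enumerate(inverted_table):
        if entry == (pid, page):
            return frame * page_size + offset      # physical address <i, offset>
    raise LookupError("illegal address access")


# One entry per frame: (process-id, page number), or None for a free frame.
table = [(1, 0), (2, 0), (1, 1), None]
print(inverted_lookup(table, pid=1, page=1, offset=20, page_size=1024))   # 2068
print(inverted_lookup(table, pid=2, page=0, offset=5, page_size=1024))    # 1029
```

The linear scan makes the cost of a lookup proportional to the number of frames, which is the search-time drawback described above.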

Advantages of paging:
Eliminates external fragmentation.
Supports a high degree of multiprogramming.
Increases memory and processor utilization.
The compaction overhead required for the relocatable partition scheme is also eliminated.
Disadvantages of paging:
Page address mapping hardware increases the cost of the computer.
Memory must be used to store the various tables, such as page tables and the memory map table.
Some memory will still be unused if the number of available blocks is not
sufficient for the address space of the jobs to be run.
Shared Pages: Another advantage of paging is the possibility of sharing common code. This is
useful in timesharing environment.
Eg: Consider a system with 40 users, each executing a text editor. If the text editor consists of
150k of code and 50k of data space, we need 8000k for 40 users. If the code is reentrant it can be shared.
Consider the following figure.



If the code is reentrant then it never changes during execution. Thus two or more processes can
execute the same code at the same time. Each process has its own copy of the registers, and the data of the two
processes will differ.
• Only one copy of the editor is kept in physical memory. Each user’s page table maps to the same
copy of the editor, but the data pages are mapped to different frames.
• So to support 40 users we need only one copy of the editor (150k) plus 40 copies of the 50k data space,
a total of 2150k instead of 8000k.

What is the user’s view of memory?

[Figure: user’s view of a program in the logical address space, with segments such as subroutine, stack, symbol table and sqrt]

User’s view:
 User thinks of his program as a main program with a set of subroutines/procedures/functions
or modules.
 The program can also contain structures like tables, arrays, stacks etc.
 There can also be a symbol table.
 The user does not care where in memory these modules/segments are located. Each of these
segments is of a different length, and each element of a segment has its own offset from
the beginning of its segment.
Segmentation Definition:
 Segmentation is a memory management scheme that supports the user’s view of memory.
 It is natural to compilers, since they automatically generate segments reflecting the input program.
 Ex. a compiler creates separate segments for:
1. Global variables.
2. Code portion of the procedure.
3. Local variables of the procedure.
4. Procedure call stack.
 The logical address space is a collection of segments, each with a specific name and length.
 Segments are referred to by segment number.
 In the system the logical address is represented as a tuple <segment-number, offset>.
Hardware Support for segmentation:

 Mapping here is brought about by the segment table.

 Each entry in the segment table has a segment base and the segment limit.
 The segment base contains the starting physical address where the segment resides in memory.
 The segment limit specifies the length of the segment.
 The logical address has two parts, namely the segment number s and the offset d into that
segment. If the offset does not fall within the segment limit then a trap is issued.
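The base/limit check can be sketched as follows. The segment table values and the exception name `SegmentationTrap` are assumptions made for the example, not part of any real system.

```python
class SegmentationTrap(Exception):
    """Raised when an offset falls outside the segment limit."""


# Hypothetical segment table: entry s holds (base, limit) for segment s.
segment_table = [(1400, 1000), (6300, 400), (4300, 400)]


def translate(s, d):
    """Map the logical address <s, d> to a physical address."""
    base, limit = segment_table[s]
    if not 0 <= d < limit:          # offset must lie inside the segment
        raise SegmentationTrap(f"offset {d} is beyond the limit {limit} of segment {s}")
    return base + d


address = translate(2, 53)          # byte 53 of segment 2 maps to 4300 + 53
```

A reference such as `translate(1, 400)` is rejected, because segment 1 is only 400 bytes long and its valid offsets are 0 to 399.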
Protection & Sharing in Segmentation:
 Since segments are semantically defined portions of the program all entries in the segment will
be used in the same way.
 The memory mapping hardware checks the protection bits for read/write/execute. Any illegal
access results in a trap.
 An advantage of segmentation is code sharing. Segments are shared when entries in the
segment tables of different processes map to the same physical location.
 Segments are of variable length. This is like the variable-size partition scheme.
 Segmentation causes external fragmentation.
 This fragmentation happens when memory blocks are available but are too small to
accommodate the segments.
 Compaction can be used to reduce it.
 As an extreme step we could define each process to be one segment.


 Virtual memory is a technique that allows the execution of processes/programs that may not be completely in
memory.
 It is the technique of separating the user’s logical memory from physical memory.



 Programs can be larger than the main memory size.
 Abstracts the main memory into an extremely large and uniform array of storage. OR allows an
extremely large virtual memory to be provided to the programmers even though a small physical
memory is available.
 Separates the logical view as seen by the user from the physical view of memory.
 Frees programmers from the concerns of memory storage limitations.
 Allows processes to easily share files and address spaces.
 Also provides efficient mechanism for process creation.

Disadvantages:
 Complexity.
 Cost.
 Difficulty in implementation.
 Decrease in performance if not used carefully.

How did we deal with the limited size of memory previously?

Since the program has to reside in the main memory during execution, the need to keep the
entire logical address space in physical memory was eased by techniques like dynamic loading
and overlays. But these techniques require a lot of intervention by the user, which is not desirable.

Some reasons why the entire program need not be in memory:

 Programs have code to handle unusual error conditions, which is seldom or never executed.
 Arrays, lists and tables are often allocated more memory than they actually require. Ex.
assembler symbol table is normally allocated space for 3000 symbols even though the average
program has less than 200 symbols.
 Certain options and feature are rarely used in program but yet they are included for any
 Even if the entire program is required, not all parts of the program are required at the same time.

What are the advantages of executing a program that is partially in memory?

 The program is not constrained by the amount of physical memory that is available.
 The users will be able to write extremely large programs simplifying the programming task.



 Since each user program takes less physical memory more programs can run at the same time.
This increases CPU utilization and throughput.

How is virtual memory implemented?

VM is implemented by demand paging, segmentation, paged segmentation and demand

 In a paging system with swapping, processes residing on the disk are swapped into memory for
execution.
 In demand paging, a page is not swapped in unless it is requested.
 In demand paging, a lazy swapper is used. A lazy swapper is one that never swaps a page
into memory unless that page will be needed.
 A swapper that works with individual pages is referred to as a pager.
 The pager guesses which pages are required. So instead of bringing in the entire process, only the
necessary pages are brought into memory. This decreases the swap time and the amount of
memory used.
 Hence demand paging is a technique of not bringing in a page until it is required.

How to know if the page is on the disk or main memory?

 The valid/invalid bit scheme is used.
 If the bit is set to valid, the page is legal and is in memory.
 If the bit is set to invalid, the page is either not valid or is valid but currently on the
disk. The paging hardware, in translating the address through the page table, will notice that the
invalid bit is set and cause a trap.



 The important thing is to guess the pages right and page them in so that the processes execute the
resident pages and complete execution normally.

What happens when an access to an invalid page is made?

A page fault trap is issued. If, during the translation of the address through the page table, the
hardware finds that the invalid bit is set, it issues a trap. The trap is the result of the OS’s failure to bring
the desired page into memory.
How to handle page faults?
 Check the internal table kept inside the PCB of the process to determine whether the reference
was a valid or invalid memory access.
 If the reference is invalid then the process is terminated.
 If the reference is valid but the page is not in the main memory then the page is paged in.
 A free frame is found.
 A disk read operation is initiated to read the page into the newly allocated frame.
 After the disk read, the internal table kept with the process and the page table are updated to show that the page is now in memory.
 The state of the trapped instruction was saved when the trap occurred. This enables the process to restart
exactly where it left off.
 This is how processes not fully in memory are still able to execute.
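The page-fault handling steps above can be modelled with a small simulation. This is a sketch under simplifying assumptions: the names `DemandPager` and `IllegalAccess` are invented for the example, the disk read is only indicated by a comment, and there are always enough free frames (no page replacement).

```python
class IllegalAccess(Exception):
    """The reference was to a page outside the process's address space."""


class DemandPager:
    def __init__(self, legal_pages, num_frames):
        self.legal_pages = set(legal_pages)    # internal table: pages the process owns
        self.page_table = {}                   # page -> frame; present means "valid"
        self.free_frames = list(range(num_frames))
        self.page_faults = 0

    def access(self, page):
        if page in self.page_table:            # valid bit set: page is resident
            return self.page_table[page]
        # Invalid bit: trap to the OS and check the internal table.
        if page not in self.legal_pages:
            raise IllegalAccess(page)          # invalid reference: terminate the process
        self.page_faults += 1
        frame = self.free_frames.pop(0)        # find a free frame
        # (the disk read of the page into this frame would happen here)
        self.page_table[page] = frame          # update the tables
        return self.access(page)               # restart the trapped access


pager = DemandPager(legal_pages=range(4), num_frames=4)
for page in [0, 1, 0, 2, 1]:
    pager.access(page)
faults = pager.page_faults                     # only first touches of 0, 1 and 2 fault
```

Only the first reference to each page causes a fault; later references to the same page find it resident.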

How does execution start? What is pure demand paging?

 The extreme case is to start a process with no pages in memory.
 When the first instruction is to be fetched, it refers to a non-memory-resident page, thus causing
a page fault.
 The desired page is brought into memory.
 The process of page faulting and paging in continues till all the desired pages are paged in.
 At this point there are no more page faults. This whole scheme is referred to as pure demand
paging.

Analysis of this form of demand paging:

It is possible that a single instruction execution may require several new pages to be brought
to the memory. This causes several page faults per instruction. This results in poor performance.
The concept of locality of reference is used to enhance the performance.

The hardware required to support demand paging is the same as the hardware for paging and swapping.
Items are:
1. Page Table: the table has the ability to mark an entry valid or invalid through a valid/invalid bit or a special
value of the protection bits.
2. Secondary Memory/Swap Space: holds pages that are not present in the main memory. Should be
a reasonably high-speed disk.

Software support required is for:

 Restart of the instruction after a page fault.
 Note: a page fault can occur at any memory reference, and the last case below is the worst-case scenario.
o If the page fault occurs during the instruction fetch, we restart by fetching
the instruction again.
o If it occurs during the fetch of an operand, we must fetch and decode the instruction
again and then fetch the operand.
o If it occurs when we are trying to store the result in a page that is not in memory, then
we have to fetch, decode, fetch operands and execute the instruction once again, and then store the
result in the page just brought in.

Performance of demand paging:

Demand paging has a significant effect on the performance of the system.
The effective access time for a demand-paged memory is
e.a.t = (1-p) x ma + p x page fault time
where p is the probability of a page fault, ma is the memory access time which ranges from 20ns to
To calculate e.a.t we need to know all the time needed to service page fault.
What is the sequence of events that takes place in a page fault?
1. Trap to the OS.
2. Save the user registers and process state.
3. Determine that the interrupt is a page fault.
4. Check that the page reference was legal and determine the location of the page on the disk.
5. Issue a read from the disk to a free frame.
o Wait in the queue for this device until the read request is serviced.
o Wait for the device seek and/or latency time.
o Begin the transfer of the page to a free frame.
6. While waiting, allocate the CPU to some other user (optional: through CPU scheduling).
7. Interrupt from the disk (I/O completed).
8. Save the registers and process state for the other user (if step 6 is executed).
9. Determine that the interrupt was from the disk.
10. Correct the page table and other tables to show that the desired page is now in memory.
11. Wait for the CPU to be allocated to this process again.
12. Restore the user registers, process state, and new page table and then resume the interrupted
instruction.

Not all the above steps are encountered all the time.
In general the three major components of the page-fault service time are:
1. Service the page fault interrupt.
2. Read in the page.
3. Restart the process.

The typical values of the different components of the page-fault service time:
 The first and the third tasks of the above components typically take 1 to 100 microseconds each.
 Page switch time is typically 24 ms: the typical latency of a hard disk is 8 ms, the seek time
is 15 ms and the transfer time is 1 ms.
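As a worked example of the effective access time formula, the sketch below plugs in the 24 ms page switch time from the list above. The memory access time of 100 ns and the fault probability of 1 in 1000 are assumed values chosen only for illustration.

```python
def effective_access_time(p, ma_ns, fault_ns):
    """e.a.t = (1 - p) x ma + p x page fault time, all times in nanoseconds."""
    return (1 - p) * ma_ns + p * fault_ns


def max_fault_rate(slowdown, ma_ns, fault_ns):
    """Largest p for which e.a.t stays within ma * (1 + slowdown)."""
    return slowdown * ma_ns / (fault_ns - ma_ns)


MA_NS = 100                # assumed memory access time
FAULT_NS = 24_000_000      # 24 ms page-fault service time

eat = effective_access_time(0.001, MA_NS, FAULT_NS)   # roughly 24,100 ns
p_max = max_fault_rate(0.10, MA_NS, FAULT_NS)         # about 4.2e-7
```

With one fault per 1000 accesses the effective access time is about 24 microseconds, some 240 times slower than the assumed memory access time. To keep the slowdown under 10 percent, fewer than one access in about 2.4 million may fault.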

To reduce the page-fault service time

 This service time can be reduced by careful coding.
 Further improvement to demand paging can be achieved through better handling of the overall swap
space.
A disk I/O operation directed to the swap space is faster than one directed to the file system.
It is faster because swap space is allocated in much larger blocks, and file
lookups and indirect allocation methods are not used.

 Demand paging is used when reading a file from disk into memory. Process creation using fork()
can initially bypass demand paging by using a technique called page sharing.
Page sharing provides rapid process creation and minimizes the number of new pages
that must be allocated to the newly created process.
 The copy-on-write technique initially allows the parent and the child to share the same pages.
These pages are marked as copy-on-write pages, i.e., if either process writes to a shared page,
a copy of the shared page is created.
Eg: If the child process P1 tries to modify a page containing portions of the stack, the OS recognizes
it as a copy-on-write page, creates a copy of this page and maps it into the address
space of the child process. The child process then modifies its copied page and not the page
belonging to the parent. The new pages are obtained from the pool of free pages.



Memory Mapping: A sequential read of a file normally uses the standard system calls open(), read() and write().
Virtual memory techniques can be used instead. Memory mapping a file allows a part of the virtual
address space to be logically associated with the file. Memory mapping a file is accomplished by mapping a
disk block to a page in memory.


 Demand paging saves I/O by not loading the pages that are never used.
 Demand paging also improves the degree of multiprogramming by allowing more processes to
run at the same time.
 Page replacement policy deals with the selection of the page in memory to be replaced by a new
page that must be brought in.
 While a user process is executing, a page fault occurs.
 The hardware traps to the operating system, which checks its internal table to see that this is
a page fault and not an illegal memory access.
 The operating system determines where the desired page resides on the disk, but then
finds that there are no free frames on the list of free frames.
 When all the frames in main memory are in use and a new page must be brought in to satisfy the
page fault, the replacement policy is concerned with selecting a page currently in memory to be
replaced.
 The page to be removed should be the page that is least likely to be referenced in future.

Working of Page Replacement Algorithm

 Find the location of the desired page on the disk.
 Find a free frame. If there is a free frame, use it.
 Otherwise, use a replacement algorithm to select a victim frame. Write the victim page to the disk;
change the page and frame tables accordingly.
 Read the desired page into the newly freed frame; change the page and frame tables.
 Restart the user process.



Victim Page
 The page that is swapped out of physical memory is called the victim page.
 If no frames are free, two page transfers (one out and one in) are required. This increases
the effective access time.
 Each page or frame may have a dirty (modify) bit associated with it in the hardware. The modify
bit for a page is set by the hardware whenever any word or byte in the page is written into,
indicating that the page has been modified.
 When we select a page for replacement, we check its modify bit. If the bit is set, then the
page has been modified since it was read in from the disk, and it must be written back to the disk.
 If the bit is not set, the page has not been modified since it was read into memory.
Therefore, we can avoid writing the memory page to the disk, because the copy on the disk is
already up to date. Some pages cannot be modified.
 We must solve two major problems to implement demand paging: we must develop a frame
allocation algorithm and a page replacement algorithm. If we have multiple processes in
memory, we must decide how many frames to allocate to each process, and when page replacement
is needed we must select the frames that are to be replaced.
Page replacement Algorithms
FIFO Algorithm:
 This is the simplest page replacement algorithm. A FIFO replacement algorithm associates
with each page the time when that page was brought into memory.
 When a page is to be replaced, the oldest one is selected.
 We replace the page at the head of the queue. When a page is brought into memory, we
insert it at the tail of the queue.
Example: Consider the following reference string with three frames, initially empty.
 The first three references (7, 0, 1) cause page faults and are brought into the empty frames.
 The next reference (2) replaces page 7, because page 7 was brought in first.
 Since 0 is the next reference and 0 is already in memory, there is no page fault.
 The next reference (3) results in page 0 being replaced, so that the next reference to 0
causes a page fault. This continues till the end of the string. There are 15 faults altogether.
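A short simulation of FIFO replacement is given below. The reference string used here is an assumption: it is the standard textbook string that begins 7, 0, 1, 2, 0, 3 and is consistent with the steps and the fault count described above. The function name `fifo_faults` is chosen for the example.

```python
from collections import deque


def fifo_faults(reference_string, num_frames):
    """Count page faults under FIFO replacement."""
    frames = deque()                 # head = oldest page, tail = newest page
    faults = 0
    for page in reference_string:
        if page in frames:
            continue                 # page already in memory: no fault
        faults += 1
        if len(frames) == num_frames:
            frames.popleft()         # replace the page at the head of the queue
        frames.append(page)          # insert the new page at the tail
    return faults


# Assumed reference string (standard textbook example).
REFS = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]

faults = fifo_faults(REFS, num_frames=3)   # 15 faults for this string
```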

Belady’s Anomaly
 For some page replacement algorithms, the page fault rate may increase as the number of allocated
frames increases. The FIFO replacement algorithm may face this problem.
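The anomaly can be reproduced with a small FIFO simulation. The reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 is the classic example for this effect and is an assumption here, since the text above does not give one.

```python
from collections import deque


def fifo_faults(reference_string, num_frames):
    """Count page faults under FIFO replacement."""
    frames = deque()
    faults = 0
    for page in reference_string:
        if page not in frames:
            faults += 1
            if len(frames) == num_frames:
                frames.popleft()     # evict the oldest page
            frames.append(page)
    return faults


BELADY_REFS = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]   # assumed classic example

with_three_frames = fifo_faults(BELADY_REFS, 3)   # 9 faults
with_four_frames = fifo_faults(BELADY_REFS, 4)    # 10 faults: more frames, more faults
```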



Optimal Algorithm
 The search for an optimal page replacement algorithm was prompted by Belady’s anomaly; the optimal algorithm never suffers from it.
 The optimal page replacement algorithm has the lowest page fault rate of all algorithms.
 An optimal page replacement algorithm exists and has been called OPT.
 The working is simple: “Replace the page that will not be used for the longest period of
time.”
Example: consider the following reference string.
 The first three references cause faults that fill the three empty frames.
 The reference to page 2 replaces page 7, because 7 will not be used until reference 18.
 Page 0 will be used at reference 5 and page 1 at reference 14.
 With only 9 page faults, optimal replacement is much better than FIFO, which had 15
faults.
 This algorithm is difficult to implement because it requires future knowledge of the reference
string.
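A sketch of OPT follows. It can only be run offline, because it looks ahead in the reference string. The reference string is again the assumed standard textbook string, and the function name `opt_faults` is chosen for the example.

```python
def opt_faults(reference_string, num_frames):
    """Count page faults under optimal (OPT) replacement."""
    frames = []
    faults = 0
    for i, page in enumerate(reference_string):
        if page in frames:
            continue
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)      # a free frame is still available
            continue
        future = reference_string[i + 1:]

        def next_use(resident):
            # Distance to the next reference; pages never used again sort last.
            return future.index(resident) if resident in future else len(future)

        victim = max(frames, key=next_use)   # not used for the longest time
        frames[frames.index(victim)] = page
    return faults


# Assumed reference string (standard textbook example).
REFS = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]

faults = opt_faults(REFS, num_frames=3)      # 9 faults for this string
```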

Least Recently Used (LRU) Algorithm

 If the optimal algorithm is not feasible, an approximation to the optimal algorithm is
possible.
 The main difference between OPT and FIFO is that the FIFO algorithm uses the time when a
page was brought in, while OPT uses the time when a page is to be used.
 The LRU algorithm replaces the page that has not been used for the longest period of time.
 LRU associates with each page the time of that page’s last use.
 This strategy is the optimal page replacement algorithm looking backward in time rather than
forward.



 Two implementations are possible:

 Counters: We associate with each page table entry a time-of-use field, and add to the CPU
a logical clock or counter. The clock is incremented for each memory reference. When a
reference to a page is made, the contents of the clock register are copied to the time-of-use
field in the page table entry for that page. In this way we have the time of the last reference to
each page, and we replace the page with the smallest time value. The times must also be maintained
when page tables are changed.
 Stack: Another approach to implementing LRU replacement is to keep a stack of page numbers.
When a page is referenced, it is removed from the stack and put on the top of the stack. In this
way the top of the stack is always the most recently used page and the bottom is the least recently
used page. Since entries are removed from the middle of the stack, it is best implemented by a doubly
linked list with a head and a tail pointer.
 Neither optimal replacement nor LRU replacement
suffers from Belady’s anomaly. These are called stack algorithms.
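The stack implementation of LRU can be sketched with Python's `collections.OrderedDict`, which keeps its keys in a doubly linked list internally. The reference string is the assumed standard textbook string, and `lru_faults` is a name chosen for the example.

```python
from collections import OrderedDict


def lru_faults(reference_string, num_frames):
    """Count page faults under LRU, using an ordered dict as the stack."""
    stack = OrderedDict()            # last key = top of stack = most recently used
    faults = 0
    for page in reference_string:
        if page in stack:
            stack.move_to_end(page)  # referenced page goes to the top of the stack
            continue
        faults += 1
        if len(stack) == num_frames:
            stack.popitem(last=False)    # bottom of the stack = least recently used
        stack[page] = True
    return faults


# Assumed reference string (standard textbook example).
REFS = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]

faults = lru_faults(REFS, num_frames=3)  # 12 faults for this string
```

For this string LRU gives 12 faults, between the 9 of OPT and the 15 of FIFO.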

LRU Approximation
 An LRU page replacement algorithm must update the page-use information after
every page reference. If the updating is done by software, the cost increases.
 Hardware LRU mechanisms, however, tend to degrade execution performance and at the same time
substantially increase the cost. For this reason, simple and efficient algorithms that
approximate LRU have been developed. With hardware support, a reference bit is used.
A reference bit is associated with each memory block, and this bit is automatically set to 1 by the
hardware whenever the page is referenced. The single reference bit per block can be used to
approximate LRU removal. The page removal software periodically resets the reference bits to 0,
while the execution of the user’s job causes some reference bits to be set to 1.
If the reference bit is 0, then the page has not been referenced since the last time the reference
bit was reset to 0.
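The reference-bit idea can be modelled as below. This is a software model only: in a real system the bit is set by the hardware, and the class and method names are assumptions for the example.

```python
class ReferenceBits:
    """One reference bit per page, reset periodically by the page removal software."""

    def __init__(self, pages):
        self.bits = {page: 0 for page in pages}

    def reference(self, page):
        self.bits[page] = 1          # done by the hardware on every reference

    def reset(self):
        for page in self.bits:       # periodic reset by the software
            self.bits[page] = 0

    def removal_candidates(self):
        # Pages whose bit is still 0 were not referenced since the last reset.
        return [page for page, bit in self.bits.items() if bit == 0]


memory = ReferenceBits(["A", "B", "C"])
memory.reset()
memory.reference("A")
memory.reference("C")
candidates = memory.removal_candidates()   # only B was not used in this interval
```

The bits do not give the order of use, only which pages were used since the last reset, which is why this is an approximation of LRU.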

Count Based Page Replacement

 There are many other algorithms that can be used for page replacement. We can keep a counter
of the number of references that have been made to each page.
 a) Least Frequently Used (LFU): This causes the page with the smallest count to be replaced.
The reason for this selection is that an actively used page should have a large reference count.
 This algorithm suffers from the situation in which a page is used heavily during the initial
phase of a process but never used again. Since it was used heavily, it has a large count and
remains in memory even though it is no longer needed.
 b) Most Frequently Used (MFU): This replaces the page with the largest count. It is based on the
principle that the page with the smallest count was probably just brought in and has yet to be used.

Page Buffering Algorithm

 Systems commonly keep a pool of free frames.
 When a page fault occurs a victim frame is selected using any replacement algorithm.
 But before the victim is written out, the desired page is first read into a free frame from the
pool.
 This allows the process to restart as soon as possible, without waiting for the victim to be
swapped out.

 The allocation policy in a virtual memory controls the operating system decision regarding
the amount of real memory to be allocated to each active process.
 In a paging system, if more real pages are allocated to a process, its page fault frequency is reduced and
turnaround time and throughput improve.
 If too few pages are allocated to a process, its page fault frequency and turnaround time may
deteriorate to unacceptable levels.
 The minimum number of frames per process is defined by the architecture, and the maximum
number is defined by the amount of available physical memory.
Equal Allocation: The simplest way to split m frames among n processes is to give each process an
equal share, m/n frames. This scheme is called equal allocation.
Proportional Allocation: Allocate the memory according to the size of each process.
Let si be the size of the logical memory for process pi. Let the total number of frames be m.
And S = Σ si
If we allocate ai frames to process pi then
ai = (si / S) x m
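A small calculation shows how the formula behaves. The process sizes (10 and 127 pages) and the 62 available frames are assumed values for illustration. Each share is rounded down to a whole number of frames, which is one possible choice.

```python
def proportional_allocation(sizes, total_frames):
    """Return ai = (si / S) x m for each process, rounded down to whole frames."""
    total_size = sum(sizes)                       # S = sum of all si
    return [size * total_frames // total_size for size in sizes]


# Assumed example: two processes of 10 and 127 pages sharing m = 62 frames.
allocation = proportional_allocation([10, 127], total_frames=62)   # [4, 57]
leftover = 62 - sum(allocation)                                    # 1 frame unassigned
```

The small process gets 4 frames and the large one 57. Because of the rounding, one frame is left over and can be kept in the free pool.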

 With multiple processes competing for frames, we can classify page replacement into two
broad categories:
a) Local Replacement: requires that each process select frames only from
its own set of allocated frames.
b) Global Replacement: allows a process to select a frame
from the set of all frames, even if that frame is currently allocated to some other process; one
process can take a frame from another.
In local replacement the number of frames allocated
to a process does not change, but with global replacement the number of frames allocated to a
process may change. Global replacement generally results in greater system throughput.


Kernel memory, however, is often allocated from a free-memory pool different from the list used to
satisfy ordinary user-mode processes. There are two primary reasons for this:
1. The kernel requests memory for data structures of varying sizes, some of which are less than a
page in size. As a result, the kernel must use memory conservatively and attempt to minimize waste
due to fragmentation. This is especially important because many operating systems do not subject
kernel code or data to the paging system.
2. Pages allocated to user-mode processes do not necessarily have to be in contiguous physical
memory. However, certain hardware devices interact directly with physical memory, without the
benefit of a virtual memory interface, and consequently may require memory residing in physically
contiguous pages.
Buddy System



 Allocates memory from fixed-size segment consisting of physically-contiguous pages

 Memory allocated using power-of-2 allocator
 Satisfies requests in units sized as power of 2
 Request rounded up to next highest power of 2
 When smaller allocation needed than is available, current chunk split into two
buddies of next-lower power of 2
 Continue until appropriate sized chunk available.
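The splitting step can be sketched as follows. The 256 KB segment and the 21 KB request are assumed values, the segment size is assumed to be a power of 2, and the function names are chosen for the example. Coalescing of freed buddies is not modelled.

```python
def next_power_of_two(n):
    """Round n up to the next power of 2 (n itself if it already is one)."""
    return 1 << (n - 1).bit_length()


def buddy_allocate(request_kb, segment_kb):
    """Return the chunk sizes visited while splitting; the last one is allocated."""
    if request_kb <= 0 or request_kb > segment_kb:
        raise ValueError("request does not fit in the segment")
    chunk = segment_kb
    visited = [chunk]
    while chunk // 2 >= request_kb:   # a half-sized buddy is still large enough
        chunk //= 2                   # split the chunk into two buddies
        visited.append(chunk)
    return visited


splits = buddy_allocate(21, 256)      # 256 -> 128 -> 64 -> 32
allocated = splits[-1]                # 32 KB chunk satisfies the 21 KB request
wasted = allocated - 21               # 11 KB of internal fragmentation
```

The rounding to a power of 2 is what causes the internal fragmentation: here a third of the allocated chunk is unused.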
Slab Allocator

 Alternate strategy.
 Slab is one or more physically contiguous pages.
 Cache consists of one or more slabs.
 Single cache for each unique kernel data structure.
 Each cache filled with objects – instantiations of the data structure.
 When cache created, filled with objects marked as free.
 When structures stored, objects marked as used.
 If slab is full of used objects, next object allocated from empty slab.
 If no empty slabs, new slab allocated.
 Benefits include no fragmentation, fast memory request satisfaction.



1. File Concept.
2. Access Methods
3. Directory Structure
4. File System Mounting
5. File Sharing
6. Protection
7. Implementing File Systems: File System Structure
8. File System Implementation
9. Directory Implementation
10. Allocation Methods
11. Free Space Management
12. Efficiency and Performance
13. Recovery



File system is the most visible part of the OS. It has two parts: a collection of files each
storing related data and a directory structure for organizing the information about all files. Some
have the third component called partition used to logically separate large collection of directories.

1. File Concept
 A file is a named collection of related information recorded on the storage media. The
information stored either on disk, tape, optical media or any other storage device require
uniform logical view of the information. This view is provided by the file system.
 According to the type of file it has a defined structure.
o Text File: sequence of characters organized into lines or pages.
o Source File: sequence of subroutines and functions each of which is organized into
declarations and executable statements.
o Object File: is a sequence of bytes organized into blocks understandable by the system’s
linker.
o Executable File: is a series of code sections (binary) that the loader can bring into memory
and execute.
 A file has a name for convenience. Some systems are case-sensitive in file names and some
are not.

File attributes
 A file has attributes which vary from system to system.
 Some of the attributes are:
o Name: symbolic file name kept in human readable form.
o Identifier: is a unique tag which is usually a number and identifies the file within the file
system and is in the non-human readable format.
o Type: needed for systems that support different types of files.
o Location: a pointer to the device and to the location of the file on that device.
o Size: indicates the current size of the file in bytes, words or blocks, and also possibly
the maximum allowed size.
o Protection: represents the access control information like read/write/execute.
o Time, date, and user identification: records the creation time, last modification and last use.
Useful for usage monitoring, protection and security.

File operations
 Creating a file:
o space must be found for the file
o a new file entry in the directory must be made.
 Writing a file:
o System call is used to specify the name of the file and the information to be written.
o With the file name the system searches directory for the file location.
o A write pointer indicates the location from where writing is to be done in the file.
o The current operation location is kept in the current-file-position pointer.
 Reading a file:



o System call is used to specify the name. With the name the directory is searched for the
file location.
o Read pointer points to the location from where reading is to be done.
o The current operation location is kept in the current-file-position pointer.
 Repositioning within a file:
o Set the current-file-position pointer to a given value.
o This file operation is known as seek.
 Deleting a file:
o Search the directory for the file location with the name as index.
o Release the file space for reuse.
 Truncating a file:
o Erase desired contents of the file but still keep its attributes.
 Append: add new information at the end of the existing file.
 Renaming: giving a different name to the existing file.
 Copy: creating a new file, reading from the old file and writing those contents into the new
file.
 Open: the open system call is issued before a file is first used.
o Requires the file name and searches the directory for the location.
o Accepts access mode information: create, read, read-write etc.
o Makes an entry in the open-file table.
o Returns a pointer to the entry in the open-file table.
o Some systems open the file implicitly as soon as the file is referenced. Some systems
require that the user explicitly open the file.
 Close: when the file is no longer required the entry is removed from the open-file table.

 The open-file table keeps information about all opened files. This is to avoid constant search
through the directory.
 UNIX implementation:
o In UNIX the open and close are more complicated because it operates in a multiuser
environment. Here several users may open the file simultaneously.
o Two level internal tables are maintained:
 A per process table: keeps track of all the files opened by the process. Access
rights, accounting information read write pointers are also in the table. Each entry
points to the system wide table.
 System wide table: contains process independent information like location of file
on disk, access dates and file size. It has an open count associated with each file
indicating the number of processes that have opened that file. A close of the file
decreases the count and an open of the file increases the count.
 Information associated with opening a file
o File Pointer: on systems that do not include an offset as part of the read and write system calls,
the current-file-position pointer is used to track the last read or write location. The pointer is
unique to each process operating on the file.
o File Open Count: tracks the number of opens and closes of the file. The counter reaches
zero on the last close, and the system can then remove the file’s entry from the open-file table.
o Disk Location of the File: the information needed to locate the file on disk is kept in
memory to avoid having to read it from disk for each operation.
o Access Rights: the access mode of each process is stored in its per-process table.

File Locks:
 Some OS provide a locking mechanism for open files. File locks are used when the file is shared
by several processes.
Ex. a system log file can be accessed and modified by a number of processes in the system.
 A shared lock is a reader lock which can be acquired by several processes concurrently. An
exclusive lock is like a writer lock. Only one process at a time can acquire it. Systems can
provide either mandatory or advisory file locking mechanisms.
Mandatory file lock:
 If a lock is mandatory, then once a process acquires an exclusive lock the OS will prevent any
other process from accessing the locked file. In this mechanism the OS ensures locking
integrity.
Ex. Windows OS.
Advisory file lock:
 In this case the OS will not prevent another process from accessing a locked file. It is the
responsibility of the software developers to ensure that the locks are appropriately acquired and
released.
Ex. UNIX.
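Advisory locking can be tried out with Python's `fcntl.flock`, which wraps the UNIX flock() call. This sketch runs only on UNIX-like systems and assumes a local file system. The file name `system.log` and the helper `try_exclusive_lock` are chosen for the example, and two separate opens of the file in one program stand in for two processes.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(file_obj):
    """Try to take an exclusive (writer) lock without blocking."""
    try:
        fcntl.flock(file_obj, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False                  # some other open file already holds a lock


path = os.path.join(tempfile.mkdtemp(), "system.log")
first = open(path, "w")
second = open(path, "a")              # a separate open of the same file

first_got_lock = try_exclusive_lock(first)      # succeeds
second_got_lock = try_exclusive_lock(second)    # refused while first holds the lock

fcntl.flock(first, fcntl.LOCK_UN)               # release the lock
second_after_unlock = try_exclusive_lock(second)

first.close()
second.close()
```

Because the lock is advisory, nothing stops `second` from writing to the file without asking for the lock. The lock only protects programs that agree to call `flock` first.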

File types
 The common technique for implementing the file types is to include it as a part of the file
name. The name is split into name and extension separated by a period.

file type        usual extension             function

executable       exe, com, bin               ready-to-run machine-language program
object           obj, o                      compiled, machine language, not linked
source code      c, cc, java, pas, asm, a    source code in various languages
batch            bat, sh                     commands to the command interpreter
text             txt, doc                    textual data or documents
word processor   wp, tex, rtf, doc           various word-processor formats
library          lib, a, so, dll             libraries of routines for programmers
print or view    ps, pdf, jpg                ASCII or binary file in a format for printing or viewing
archive          arc, zip, tar               related files grouped into one file, sometimes compressed, for archiving or storage
multimedia       mpeg, mov, rm               binary file containing audio or A/V information

File Structures
 The type of a file may indicate its internal structure. Source and object files have structures
that match the expectations of the programs that read them. Some files must have a structure that
is understood by the OS itself. Some systems support several file structures, each with a set of
special operations for manipulating files of that structure.
 If the OS supports several file structures, it must contain the code for each of them, so the
complexity of the OS increases.
 New applications also face problems if they need information structured in a way that the OS
does not support.
 For this reason some OS impose a minimal number of file structures.
Ex. In UNIX and MS-DOS each file is a sequence of 8-bit bytes, and the OS makes no interpretation
of these bits. Each application must include its own code to interpret a file as the appropriate
structure. This scheme gives maximum flexibility but little support. Every OS must still support at
least one structure, that of an executable file, so that it can load and run programs.
 Macintosh supports a limited number of structures. It expects a file to contain two parts: a
resource fork and a data fork. The resource fork contains information of interest to the user, and
the OS provides tools for modifying the data in it. Ex. a user may want to relabel a program's
buttons in their own language. The data fork contains program code or data, that is, the
traditional file contents.

Internal file structure:

 Disk I/O is performed in units of block or physical record.
 Disk systems have a well-defined block size determined by the size of the sector.
 The size of the physical record may not be the same as the size of the logical record. To solve
this problem several logical records are packed into a physical record (block).
 Packing can be done by the user program or by the OS.
 The packing technique, the size of the physical block and the size of the logical record
determine the number of logical records in each physical block.
 In UNIX files are defined as streams of bytes. Each byte is individually addressable by its
offset from the beginning of the file, so the logical record size is 1 byte. The file system
automatically packs and unpacks bytes into physical disk blocks.
 Since disk space is always allocated in blocks, some portion of the last block of each file
may be wasted. This is internal fragmentation.
 All file systems suffer from internal fragmentation. The larger the block size, the greater the
internal fragmentation.
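
The packing and fragmentation arithmetic above can be written out as a short sketch. The function names and the sample sizes (a 1,949-byte file, 512-byte and 4,096-byte blocks) are illustrative assumptions, not values from these notes.

```python
def records_per_block(block_size, record_size):
    """Number of whole logical records that fit in one physical block."""
    return block_size // record_size

def blocks_needed(file_size, block_size):
    """Blocks allocated to a file: space is handed out in whole blocks."""
    return -(-file_size // block_size)   # ceiling division

def internal_fragmentation(file_size, block_size):
    """Bytes wasted in the last block of the file."""
    return blocks_needed(file_size, block_size) * block_size - file_size
```

With 512-byte blocks a 1,949-byte file occupies 4 blocks and wastes 99 bytes; with 4,096-byte blocks the same file wastes 2,147 bytes, which shows how larger blocks increase internal fragmentation.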

2. Access Methods
The information stored in files has to be accessed and read into computer memory. The access
can be done in several ways. Some systems provide only one access method for files, while
others, such as those of IBM, support many access methods.

Sequential Access:
 Sequential access is the simplest access method. It is based on the tape model of a file.
 Information in the file is processed in order, one record after another.
 It is the most common method.
Ex. editors and compilers usually access files by this method.

 During the read operation (read next) after reading one portion the file pointer automatically
moves to the next portion in order.
 Write (write next) appends at the end of the file and advances the pointer to the end.
Advantage: Simple
Disadvantage: No random access.

Direct Access/Relative Access:

 The file is made up of fixed length logical records.
 It is based on the disk model which allows random access to any file block.
 Records can be read and written rapidly in no particular order.
 The file is viewed as a numbered sequence of blocks or records.
 A direct access file allows arbitrary blocks to be read or written.
Ex. a user may read block 13, then read block 99, then write block 12.
 Direct access is suitable for searching large amounts of information with immediate results.
 Not all OS support both sequential and direct access. Some OS allow only sequential access and
some allow only direct access.
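
Because the logical records have a fixed length, the position of record n is simply n times the record size. The sketch below illustrates this with an in-memory file; the class name DirectFile and the 16-byte record size are assumptions made for the example.

```python
import io

RECORD_SIZE = 16  # assumed fixed logical record length, in bytes

class DirectFile:
    """In-memory file of fixed-length records, accessed by record number."""

    def __init__(self, num_records):
        self._f = io.BytesIO(bytes(RECORD_SIZE * num_records))

    def write_record(self, n, data):
        if len(data) > RECORD_SIZE:
            raise ValueError("record too long")
        self._f.seek(n * RECORD_SIZE)            # jump straight to record n
        self._f.write(data.ljust(RECORD_SIZE, b"\0"))

    def read_record(self, n):
        self._f.seek(n * RECORD_SIZE)
        return self._f.read(RECORD_SIZE).rstrip(b"\0")

    def read_next(self):
        # Sequential access: read at the current pointer, which then advances.
        return self._f.read(RECORD_SIZE).rstrip(b"\0")
```

After read_record(12) the file pointer sits at record 13, so read_next() continues sequentially from there. This is how sequential access can be simulated on a direct access file.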

Other Access Methods:

 Other methods are built on the top of direct access method.
 They involve index to the file.
 To find a record in the file, the index is searched first and the pointer found there is used to
access the file directly and locate the desired record.
 The problem is that for large files the index itself may become too large to be kept in main
memory.
 The solution is to create an index for the index file. The primary index file contains pointers
to the secondary index file.
Ex. IBM's indexed sequential access method uses a small master index that points to disk blocks
of a secondary index. The secondary index blocks point to the actual file blocks. The file is kept
sorted on a defined key. To find an item, a binary search of the master index is made, which gives
the block number of the secondary index. This block is read in and again a binary search is used to
find the block containing the desired record. Finally this block is searched sequentially.
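
The two-level lookup can be sketched as follows. This is only an illustration of the idea, not IBM's actual on-disk format: the function names, the use of Python lists for blocks, and the choice to store the highest key of each block in the index are all assumptions.

```python
from bisect import bisect_left

def build_index(sorted_keys, per_block):
    """Split sorted keys into data blocks and build two index levels.

    Each index entry is (highest key in the block it points to, block number).
    """
    data = [sorted_keys[i:i + per_block]
            for i in range(0, len(sorted_keys), per_block)]
    entries = [(block[-1], n) for n, block in enumerate(data)]
    secondary = [entries[i:i + per_block]
                 for i in range(0, len(entries), per_block)]
    master = [(block[-1][0], n) for n, block in enumerate(secondary)]
    return master, secondary, data

def find(key, master, secondary, data):
    """Return (data block number, position in block) or None."""
    i = bisect_left([high for high, _ in master], key)   # binary search 1
    if i == len(master):
        return None                                      # key beyond last block
    block = secondary[master[i][1]]
    j = bisect_left([high for high, _ in block], key)    # binary search 2
    data_no = block[j][1]
    for pos, k in enumerate(data[data_no]):              # sequential search
        if k == key:
            return data_no, pos
    return None
```

Only the small master index needs to stay in memory; one secondary index block and one data block are read per lookup.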

3. Directory Structure
 The number of files on disks today is very large, so they need to be organized.
 The organization is done in two parts.
 First, the disk is split into one or more partitions, also called minidisks or volumes. Each
partition can be treated as a separate storage device. Partitions can also be grouped to build a
larger logical structure.
 Second, each partition contains information about the files within it. This information is kept
in the device directory or volume table of contents.
 The directory records the name, location, size and type of every file on that partition.

Directory overview
 Different operations are performed on directories:
o Search for a file: search the directory for the entry with a particular symbolic name. We
should also be able to find all files whose names match a particular pattern.
o Create a file: new files need to be created and added to the directory.
o Delete a file: when a file is no longer needed it has to be removed from the directory.
o List a directory: list the entries in the directory.
o Rename a file: change the name of the file. Renaming may also allow the position of the file
in the directory structure to be changed.
o Traverse the file system: access every directory and every file within the directory structure.

Single-Level Directory
 Simplest, easy to support and understand.
 All files of all users are contained in the same directory.
 Because all files are in the same directory, they must all have unique names. As the number of
files increases it becomes difficult to remember all the file names, and name clashes between
different users become likely.

Two-Level Directory
 A separate directory, called the user file directory (UFD), is created for each user. Each UFD
has a similar structure but lists only the files of a single user.
 The master file directory (MFD) is indexed by user name or account number, and each entry
points to the UFD of that user. Several UFDs may contain files with the same name, but within a
UFD each name has to be unique.
 When a file is to be created the OS searches only the UFD of that user. Similarly, to delete a
file only the user's own UFD is searched. User directories themselves can also be created and
deleted.

 The two-level directory solves the name collision problem.
 It isolates one user from another.
 Isolation is an advantage when users are independent, but a disadvantage when users need to
cooperate on some file.
 Some systems do not allow this sharing at all.

To allow access to other user’s files

 One user must be able to name the files in the other user’s directory.

 To name that file both the user name and the file name must be given.
 A two-level directory is a tree of height 2. The root of the tree is the MFD, its children are
the UFDs, and the descendants of the UFDs are the files.
 To specify a file we give the names on the path from the root to the leaf. This is the path name.
 Every file in the system has a path name.
Ex. system programs such as loaders, compilers, utility routines and libraries are stored as files.
When a command is given, the corresponding file is read by the loader and executed. If the file
name were searched only in the local UFD, a copy of the system files would be needed in each UFD,
which wastes space. Instead a special user directory is created to contain all system files. When a
file name is given to be loaded, the user's UFD is searched first. If the file is not found there,
the system searches the special user directory. The sequence of directories searched is called the
search path.
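
A minimal sketch of a two-level directory with a search path is given below. The class name TwoLevelDirectory, the special user name "system" and the dictionary representation are assumptions chosen for illustration.

```python
class TwoLevelDirectory:
    """MFD maps each user name to a UFD; a UFD maps file names to contents."""

    def __init__(self, system_user="system"):
        self.system_user = system_user
        self.mfd = {system_user: {}}      # special directory for system files

    def add_user(self, user):
        self.mfd.setdefault(user, {})

    def create(self, user, name, contents):
        ufd = self.mfd[user]              # only this user's UFD is searched
        if name in ufd:
            raise FileExistsError(name)
        ufd[name] = contents

    def lookup(self, user, name):
        # Search path: the user's own UFD first, then the system directory.
        for owner in (user, self.system_user):
            if name in self.mfd[owner]:
                return owner, self.mfd[owner][name]
        raise FileNotFoundError(name)
```

Two users can each own a file called notes without collision, and both can find the compiler stored once in the system directory.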

Tree-Structured Directories
 The two-level directory, a tree of height 2, can be extended to a tree of arbitrary height. This
allows users to create subdirectories and organize their files in them. This is the most common
approach.
 There is a root directory. A directory contains files or subdirectories.
 A tree structure does not allow the same file or directory to appear in more than one directory,
so files and directories cannot be shared in that way. Creating separate subdirectories gives a
better structure and keeps related files together. Ex. the subdirectory programs may contain
source programs and the subdirectory bin may contain all binaries.
 Every file has a unique path from the root, through the subdirectories, to the file. One bit in
each directory entry indicates whether the entry is a file or a directory.
 Special system calls for creating and deleting directories.
 Each user has a current directory, which should contain most of the files of current interest.
 When a file name is given, the current directory is searched. If the file is not there, a path
name has to be specified.
 Path names can be absolute (starting from the root and following a path down to the specified
file) or relative (defining a path from the current directory).
 When a directory that still has entries is to be deleted, two approaches are possible. Systems
like MS-DOS will not delete a directory unless it is empty, so the user must first delete
everything in it, which may be a lot of work.
 Alternatively, when a delete directory command is given, all entries in the directory are
deleted as well, as with the UNIX rm command used with its recursive option. This is a convenient
policy but dangerous.
 With tree structure users can access other user files also by specifying either absolute or relative
path names.
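
Absolute and relative path names can be resolved with the same walk down the tree; only the starting directory differs. In the sketch below the tree is modelled with nested dictionaries and the function name resolve is an assumption. Special names such as "." and ".." are not handled.

```python
def resolve(root, current_dir, path):
    """Return the node named by `path`.

    root        -- nested dict: directory name -> dict, file name -> contents
    current_dir -- list of directory names leading from the root
    path        -- absolute if it starts with "/", otherwise relative
    """
    parts = [] if path.startswith("/") else list(current_dir)
    parts += [p for p in path.split("/") if p]
    node = root
    for name in parts:
        node = node[name]     # KeyError if a component does not exist
    return node
```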

Acyclic-Graph Directories
 It allows directories to have shared subdirectories and files.
 Same file or directory may be in two different directories.
 A graph with no cycles is a generalization of the tree structure subdirectories scheme.
 Shared files and subdirectories can be implemented by using links. A link is a pointer to
another file or a subdirectory. A link is implemented as absolute or relative path.
 An acyclic graph directory structure is more flexible than a simple tree structure, but it is
also more complex.
Advantage of acyclic:
 Simplicity of the algorithms to traverse the graph.
 More flexible.
Disadvantage of acyclic:
 More complex than tree structure.
 The structure must be kept free of cycles.
 Shared sections of the acyclic graph should not be traversed twice, for performance reasons.
 Files may have multiple absolute path names, so distinct file names may refer to the same
file. This is similar to the aliasing problem. When traversing the whole file system, say to
collect statistics, we have to make sure the same file is not processed twice.
 Another problem is deletion of shared files.
o Solution 1: remove the file whenever someone deletes it. But this leaves dangling pointers to
the file that no longer exists.
o Solution 2: use links. Deleting a link does not affect the original file. But if the file entry
itself is deleted, the links are left dangling. We can search for the associated links and
remove them too, but that is expensive.
o Alternatively, we can leave the links as they are and, when one is referenced later, find that
the file does not exist and treat the name as an illegal file name. UNIX implements symbolic
links in this manner. It is up to the user to realize that the original file is gone.
o Solution 3: deletion of the file is left pending until all references to it are deleted. One
way is to keep a list of all references to the file and delete the file when the list is empty,
but the list can become large. Hence only a count of the number of references is kept, and the
file is deleted when the count reaches 0. UNIX uses this approach for nonsymbolic links, called
hard links, keeping the reference count in the file information block (inode).
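
The reference count kept for hard links can be observed directly on a UNIX-like system. The sketch below assumes a file system that supports hard links and uses Python's os.link and the st_nlink field of os.stat; the file names are illustrative.

```python
import os
import tempfile

def link_count_demo():
    """Return the link counts seen at each step and the data read at the end."""
    counts = []
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "report.txt")
        alias = os.path.join(d, "alias.txt")
        with open(original, "w") as f:
            f.write("shared data")
        counts.append(os.stat(original).st_nlink)   # one name for the file
        os.link(original, alias)                    # add a hard link
        counts.append(os.stat(original).st_nlink)   # two names, same file
        os.remove(original)                         # delete one name
        counts.append(os.stat(alias).st_nlink)      # file survives via alias
        with open(alias) as f:
            data = f.read()
    return counts, data
```

The data remains readable through the second name after the first is deleted; the space is released only when the count drops to zero.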

General Graph Directory

 When users are allowed to create subdirectories under a two-level directory, a tree structure
results. Adding new files and subdirectories at the leaves preserves the tree structure.
 When links are added to an existing tree structure, the tree structure is destroyed and a
general graph results.

4. File System Mounting

A file system must be mounted before it can be accessed by processes. The directory structure can
be built from several partitions, each of which is mounted to make it available within the
file-system name space.
 The OS is given the name of the device and the location within the file structure at which to
attach the file system, called the mount point.
 A mount point is typically an empty directory at which the mounted file system is attached.
Some systems do not allow mounting over a directory that contains files. Depending on the
system, the same file system may be mounted at several different mount points or at only one.
 The OS verifies that the device contains a valid file system. If it does, the OS notes in its
directory structure that a file system is mounted at the specified mount point. This allows the OS
to traverse its directory structure, switching among file systems as appropriate.
[Figure: an existing file system containing the directory users, and an unmounted partition
containing sue, jane, prog and doc. After mounting, sue, jane, prog and doc appear under the mount
point.]

Ex. On the Macintosh, whenever the system encounters a disk for the first time it searches the
device for a file system. If it finds one, the OS automatically mounts the file system at the root
level and adds a folder icon on the screen labelled with the name of the file system. The user can
then click on the icon to display the newly mounted file system.

In Microsoft Windows the OS maintains an extended two-level directory structure, with devices and
partitions assigned a drive letter. A file is named by the drive letter, then the path, then the
file name, as in drive-letter:\path\to\file. The OS automatically discovers all devices and mounts
all located file systems at boot time.

In UNIX the mount commands are explicit. A system configuration file contains a list of devices and
mount points for automatic mounting at boot time.
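
One way to picture how the OS switches among file systems while traversing a path is a mount table searched for the longest matching mount point. The sketch below is a simplification under that assumption; the function name resolve_mount and the device names disk0 and disk1 are hypothetical.

```python
def resolve_mount(mount_table, path):
    """Return (device, path within that device's file system).

    mount_table maps mount points to devices and must contain "/".
    """
    best = "/"
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        if inside and len(mount_point) > len(best):
            best = mount_point            # keep the longest matching mount point
    remainder = path[len(best):].lstrip("/")
    return mount_table[best], "/" + remainder
```

For example, with {"/": "disk0", "/users": "disk1"} the path /users/sue/prog is served by disk1 as /sue/prog, while /etc/passwd stays on disk0.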

5. File Sharing
File sharing helps users to collaborate and reduces the effort required to achieve a common goal.
It has several inherent difficulties but is needed in many situations.

Multiple users
 When multiple users share files, file sharing, file naming and file protection become
important issues.
 When the system supports sharing among multiple users, it must mediate the sharing. Access to
other users' files may be allowed by default, or may require that the owner specifically grant
access. To support sharing, more file attributes are required.
 To incorporate sharing and protection the concept of owner and group is used. The owner can
change file attributes (perform all operations on files) whereas a group defines a subset of users
who can share access to files (perform a subset of operations). The owner and group id are stored
with the file attributes.
 When a user requests an operation on a file, the user ID is compared with the owner attribute to
determine whether the requesting user is the owner of the file. Likewise, the user's group IDs are
compared with the group attribute. The result indicates which operations can be performed.
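
The comparison of user and group IDs against the file attributes can be sketched as follows. The FileAttrs record, its field names and the third class for all other users are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class FileAttrs:
    owner: str
    group: str
    owner_ops: frozenset     # operations allowed to the owner
    group_ops: frozenset     # operations allowed to group members
    other_ops: frozenset     # operations allowed to everyone else

def allowed_ops(attrs, user, user_groups):
    """Compare the requesting user with the owner, then with the group."""
    if user == attrs.owner:
        return attrs.owner_ops
    if attrs.group in user_groups:
        return attrs.group_ops
    return attrs.other_ops
```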

Remote file systems:

Through the evolution of network and file technology, remote file-sharing methods have changed.

1. Manually transfer files between machines using programs like ftp. ftp supports both anonymous
and authenticated access. With anonymous access, files can be transferred without having an
account on the remote machine.
2. Use a DFS (distributed file system), in which remote directories are visible from the local
machine.
3. Use the World Wide Web: a browser is needed to gain access to the remote files, and separate
operations are used to transfer them.

The client-server model:

 The machine containing the files is the server and the machine seeking access to the files is the
client. The servers declare the resources that are available in them.
 A server can serve multiple clients and clients can use multiple servers.
 The servers specify the files at the volume or directory level.
 Clients are identified by their IP address. The problem is that an IP address can be spoofed
(imitated), which results in unauthorized access.
 More security can be provided through encryption algorithms.
 Once the remote file system is mounted, the DFS protocol is used to send file operation requests
across the network.

DIS: Distributed Information Systems, also called Distributed Naming Services.

 DIS provides uniform access to the information needed for remote computing.
 Ex. DNS (Domain Name System) provides host-name-to-network-address translation for the entire
Internet.
 Ex. UNIX systems use several DIS methods.
 Ex. Sun Microsystems introduced yellow pages, also called NIS (Network Information Service),
and most of the industry adopted it.
 It centralizes the storage of user names, host names etc., but its authentication mechanism is
not secure.
 Ex. In Microsoft's CIFS (Common Internet File System), network information is used in
conjunction with user authentication to form a network login that the server uses to decide
whether to allow or deny access.
 Today the trend is to use LDAP (Lightweight Directory Access Protocol).

Consistency Semantics:
 It is an important criterion for evaluating any file system that supports file sharing.
 The semantics specify how the multiple users of the system are to access shared files.
 The semantics are implemented as code in the file system.
 They specify when the modifications of data by one user will be observable by the other users.

UNIX Semantics:
 Writes to an open file by a user are visible immediately to other users that have this file open.
 Users can share the current location pointer, so advancing the pointer by one user affects all
sharing users.
 Here the file has a single image shared by all users.
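
Both properties can be observed on a UNIX-like system with low-level file descriptors. The sketch below is an illustration using Python's os module: a second independent open stands in for another user, and os.dup stands in for users sharing the location pointer.

```python
import os
import tempfile

def unix_semantics_demo():
    """Return (bytes seen by a second opener, position seen via a shared pointer)."""
    fd, path = tempfile.mkstemp()            # read/write descriptor
    reader = os.open(path, os.O_RDONLY)      # second, independent open
    shared = os.dup(fd)                      # shares fd's location pointer
    try:
        os.write(fd, b"hello")               # writer has not closed the file
        seen = os.read(reader, 5)            # the write is visible immediately
        os.lseek(fd, 2, os.SEEK_SET)         # one user moves the pointer
        shared_pos = os.lseek(shared, 0, os.SEEK_CUR)   # the other sees the move
        return seen, shared_pos
    finally:
        for d in (fd, reader, shared):
            os.close(d)
        os.remove(path)
```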

Session Semantics:
The AFS (Andrew file system) uses this semantics.
 Writes to an open file are not immediately visible to other users who have the same file opened.

 Once the file is closed the changes made to it are visible only in sessions starting later. Already
opened instances of the file do not reflect the changes.

Immutable Shared file semantics:

 An immutable file has two key properties: its name may not be reused and its contents may not
be altered. Once a file has been declared as shared by its creator, it cannot be modified. This
scheme is used in distributed systems.

6. Protection
Protection can be provided in many ways.

Types of Access
 Access is permitted or denied depending on the type of access requested. The operations that
may be controlled include:
 Read. Read from the file.
 Write. Write or rewrite the file.
 Execute. Load the file into memory and execute it.
 Append. Write new information at the end of the file.
 Delete. Delete the file and free its space for possible reuse.
 List. List the name and attributes of the file.

Access control can be implemented with an access-control list (ACL), which specifies user names
and the types of access allowed for each user. The disadvantage is the length of the list.
 To reduce the length of the list many systems classify users into three classes for each file:
owner, group and universe (all other users).
 Another protection mechanism is to associate a password with each file.

7. Implementing File Systems: File System Structure

The OS provides a file system so that data can be stored, located and accessed easily.
There are two design problems concerning file systems:
 Defining a file and its attributes, operations allowed on a file, and the required directory
structure for organizing the files.
 Creating algorithms and data structures to map the logical file system into the physical
secondary storage devices.
Different levels in a file system (highest to lowest):
Application programs
Logical file system
File-organization module
Basic file system
I/O control

 The file system consists of different levels as shown above.

 This is a layered structure.
 Each level uses the features of the lower levels and creates new features for use by the higher
levels.
 At the lowest level the I/O control consists of device drivers and interrupt handlers to transfer
information between main memory and the disk system. A device driver is like a translator. It
inputs high-level commands and outputs low level hardware specific instructions that are
used by the hardware controller which interfaces the I/O device to the rest of the system.
 The basic file system issues generic commands to the appropriate device drivers to read and
write physical blocks to the disk.
 The file organization module knows about files and their logical blocks as well as their
physical blocks. It translates logical block addresses to physical block addresses for the basic
file system to transfer. The logical blocks of a file are numbered 0 through N. The physical
blocks holding the data do not match the logical block numbers, so translation is needed to
locate each block. The file organization module also includes the free space manager, which
tracks unallocated blocks.
 The logical file system manages the metadata, which includes all of the file system structure
except the actual data (the contents of the files). It manages the directory structure for
providing file organization, and it is also responsible for protection and security. It
maintains the file structure via file-control blocks (FCBs). An FCB contains information about
a file, including ownership, permissions and location.

8. File System Implementation

 File systems store several important data structures on the disk.
 Boot control block: contains the information needed by the system to boot an OS from that
volume. It is usually the first block of a volume. In UFS it is called the boot block and in
NTFS the partition boot sector.
 Volume control block: contains volume or partition details such as the number of blocks in the
partition, the size of the blocks, the free block count and free block pointers, and the free
FCB count and FCB pointers. In UFS it is called the superblock and in NTFS it is stored in the
master file table.

In-memory structures:
 In-memory mount table: contains information about each mounted volume.
 In-memory directory structure cache: holds the directory information of recently accessed
directories.
 The system-wide open file table: holds a copy of the FCB of each open file.
 The per-process open file table: contains a pointer to the appropriate entry in the system-wide
open file table.
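
The relationship between the two open file tables, together with the file open count, can be sketched as follows. The class OpenFileTables, its dictionary layout and the use of small integers as per-process handles are assumptions for illustration only.

```python
class OpenFileTables:
    def __init__(self):
        self.system_wide = {}   # file name -> {"fcb": ..., "open_count": n}
        self.per_process = {}   # pid -> {handle: file name}

    def open(self, pid, name, fcb):
        # One system-wide entry per open file, shared by all processes.
        entry = self.system_wide.setdefault(name, {"fcb": fcb, "open_count": 0})
        entry["open_count"] += 1
        table = self.per_process.setdefault(pid, {})
        handle = 0
        while handle in table:          # lowest unused handle for this process
            handle += 1
        table[handle] = name            # "pointer" to the system-wide entry
        return handle

    def close(self, pid, handle):
        name = self.per_process[pid].pop(handle)
        entry = self.system_wide[name]
        entry["open_count"] -= 1
        if entry["open_count"] == 0:    # last close: drop the shared entry
            del self.system_wide[name]
```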

When a new file is to be created the logical file system is called. It allocates a new FCB. Then
the appropriate directory is read into memory, updated with the new file name and FCB, and
written back to the disk.

A typical file-control block contains:
File permissions
File dates (create, access, write)
File owner, group, ACL
File size
File data blocks or pointers to file data blocks

Partitions and Mounting

The layout varies depending on the OS. A disk can be sliced into multiple partitions. Each
partition can be either ‘raw’, meaning it does not contain a file system, or ‘cooked’, meaning it
contains a file system. Each partition can contain a different type of file system or a different
operating system.
Boot information can be stored in a separate partition. It has its own format because at boot
time the system does not yet have file-system code loaded and hence cannot interpret a
file-system format. The boot information is a sequential series of blocks loaded as an image into
memory. Execution of the image starts at a predefined location. If the disk holds multiple file
systems and multiple operating systems, the boot loader boots one of the OS on the disk. The root
partition, which contains the OS kernel, is mounted at boot time.

Virtual file system

The file system consists of three layers:

1. File system interface: this is based on open( ), close( ), read( ), write( ) calls.
2. VFS (Virtual File System) layer: it has two functions:
 It separates file system generic operations from their implementation by defining a clean VFS
interface. Several implementations for the VFS interface may coexist on the same machine
allowing transparent access to different types of file systems mounted locally.
 It provides a mechanism for uniquely representing a file throughout a network. The VFS is based
on a file structure called a vnode that contains the numerical designator for a network wide
unique file.
3. Local file systems: can be of different types, located either on the local disk or remotely
across a network.

Hence the VFS distinguishes local files from remote ones and local files are further distinguished
according to their file system types.

The VFS activates file system specific operations to handle local requests according to the file types.
A virtual file system (VFS) or virtual file system switch is an abstraction layer on top of a more
concrete file system. The purpose of a VFS is to allow client applications to access different types of
concrete file systems in a uniform way. A VFS can, for example, be used to access local and
network storage devices transparently without the client application noticing the difference.

VFS architecture in Linux:

There are four main object types defined:
1. The inode object: it represents the individual file.
2. The file object: which represents an open file.
3. The superblock object: which represents the entire file system.
4. The dentry object: which represents the individual directory entry.
For each of these objects the VFS defines a set of operations that must be implemented. Every
object has a pointer to a function table. The function table lists the addresses of the actual
functions that implement the defined operations for that particular object. An implementation of
the file object for a specific file type is required to implement each function specified in the
definition of the file object.
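
The function-table idea can be sketched in a few lines. This is not the Linux kernel's data structure: the dictionaries, the two stand-in read functions and the name vfs_read are all assumptions chosen to show the dispatch.

```python
def plain_read(state, n):
    """read() for one hypothetical file-system type."""
    data = state["data"][state["pos"]:state["pos"] + n]
    state["pos"] += len(data)
    return data

def upper_read(state, n):
    """read() for a second hypothetical type with different behaviour."""
    return plain_read(state, n).upper()

# One function table per file-system type.
PLAIN_OPS = {"read": plain_read}
UPPER_OPS = {"read": upper_read}

def vfs_read(file_obj, n):
    # The caller does not know which implementation it is using:
    # the call goes through the object's function table.
    return file_obj["ops"]["read"](file_obj["state"], n)
```

The caller uses vfs_read for every file object; which function actually runs depends only on the table the object points to.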

9. Directory Implementation
Linear List:
 The directory is implemented using a linear list of file names.
 To create a new file we search the directory to be sure that there is no name collision. Then the
new entry is added at the end of the directory.
 To delete a file the directory is searched for the file name then the space allocated is released.
 To reuse the directory entry the following can be done:
o Mark the entry as unused.
o Attach it to the list of free directory entries.
o Copy the last entry of the directory into the freed location.
 Advantage: Simplest method.
 Disadvantage: finding a file requires a linear search, which can be time consuming, so access
may be noticeably slow.
 Solution: keep the list sorted and use binary search, or use a cache.

Hash Table:
 This is another data structure used for implementing directories.
 The hash table takes a value computed from the file name and returns a pointer to the file name
in the linear list.
 This greatly decreases the search time.
 Provision must be made for collisions, where two file names hash to the same location.
 Another problem is choosing the size of the hash table and the corresponding hash function.
 A chained overflow hash table can be used. Each hash entry is a linked list, and collisions are
resolved by adding the new entry to the linked list.
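
A chained overflow hash table for a directory can be sketched as below. The class name HashDirectory and the very simple hash function (sum of the bytes of the name) are assumptions; the weak hash is chosen deliberately so that collisions are easy to produce.

```python
class HashDirectory:
    def __init__(self, buckets=8):
        self.table = [[] for _ in range(buckets)]   # each bucket is a chain

    def _bucket(self, name):
        # Simple illustrative hash: sum of the bytes of the file name.
        return self.table[sum(name.encode()) % len(self.table)]

    def create(self, name, first_block):
        bucket = self._bucket(name)
        if any(n == name for n, _ in bucket):
            raise FileExistsError(name)
        bucket.append((name, first_block))          # collision: extend the chain

    def lookup(self, name):
        for n, first_block in self._bucket(name):   # search only one chain
            if n == name:
                return first_block
        raise FileNotFoundError(name)
```

The names ab and ba hash to the same bucket here, yet both can be stored and found because each bucket holds a chain of entries.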

10. Allocation Methods

Space must be allocated to files so that the disk space is utilized effectively and files can be
accessed quickly. There are three methods:
Contiguous Allocation
A single contiguous set of blocks is allocated to a file at the time of file creation. This is a
pre-allocation strategy that uses portions of variable size. The file allocation table needs just
a single entry for each file, showing the starting block and the length of the file.

If the file is n blocks long and starts at location b, then it occupies blocks b, b+1, b+2, ..., b+n-1.
 The file allocation table entry for each file indicates the address of the starting block and
the length of the area allocated for this file. Contiguous allocation is the best from the point
of view of an individual sequential file.
 It is easy to retrieve a single block. Multiple blocks can be read in at a time to improve I/O
performance for sequential processing. Both sequential and direct access can be supported by
contiguous allocation.
 Contiguous allocation suffers from external fragmentation.
 Another problem with contiguous allocation is determining how much space is needed for a
file. When the file is created, the total amount of space it will need must be found and
allocated.
Characteristics:
 Supports variable size portions.
 Pre-allocation is required.
 Requires only a single entry for a file.
 Allocation frequency is only once.
Advantages:
 Supports variable size portions.
 Easy to retrieve a single block.
 Accessing a file is easy.
 It provides good performance.
Disadvantages:
 Pre-allocation is required.
 It suffers from external fragmentation.
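
The sketch below illustrates contiguous allocation and why it suffers from external fragmentation. The first-fit search is an assumption (the notes do not name a placement strategy), as are the function names.

```python
def first_fit(free_map, n):
    """Allocate n contiguous blocks; return the start block b, or None.

    free_map is a list of booleans, True meaning the block is free.
    """
    run = 0
    for i, free in enumerate(free_map):
        run = run + 1 if free else 0
        if run == n:
            start = i - n + 1
            for j in range(start, i + 1):
                free_map[j] = False      # mark blocks b .. b+n-1 as allocated
            return start
    return None                          # no hole is large enough

def block_of(start, length, i):
    """Direct access: logical block i of the file is physical block b + i."""
    if not 0 <= i < length:
        raise IndexError(i)
    return start + i
```

With ten blocks of which only block 4 is in use, nine blocks are free but a request for six fails, because the free space is split into holes of four and five blocks.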

Linked Allocation
It solves the problem of external fragmentation. Allocation is on the basis of individual
blocks. Each block contains a pointer to the next block in the chain. The disk blocks can be
scattered anywhere on the disk. For example, a file of five blocks might start at block 9 and
continue at block 16, then block 1, then block 10, and finally block 25. The pointers are not made
available to the user. Thus, if each block is 512 bytes in size and a disk address (the pointer)
requires 4 bytes, then the user sees blocks of 508 bytes. The directory contains a pointer to the
first and the last blocks of the file.

There is no external fragmentation, since any free block can be used and only one block is needed
at a time. The size of a file need not be declared when it is created. A file can continue to grow
as long as free blocks are available.
Advantages:
 No external fragmentation.
 Compaction is never required.
 Pre-allocation is not required.
Disadvantages:
 Files can be accessed only sequentially.
 Space is required for the pointers.
 Reliability is poor, since a lost or damaged pointer breaks the chain.
 Cannot support direct access efficiently.
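
Using the example chain 9, 16, 1, 10, 25 and the 508 usable bytes per block, the sketch below shows why reaching a given byte requires following the chain from the start. The dictionary representation of the pointers and the function name find_block are assumptions.

```python
BLOCK_SIZE = 512
POINTER_SIZE = 4
DATA_PER_BLOCK = BLOCK_SIZE - POINTER_SIZE   # 508 bytes visible to the user

# Pointer stored in each block of the example file: 9 -> 16 -> 1 -> 10 -> 25.
NEXT_BLOCK = {9: 16, 16: 1, 1: 10, 10: 25, 25: None}

def find_block(first, byte_offset, next_block=NEXT_BLOCK):
    """Return (disk block, offset within its data area) for a file offset."""
    hops = byte_offset // DATA_PER_BLOCK
    block = first
    for _ in range(hops):                # one pointer (one disk read) per hop
        block = next_block[block]
        if block is None:
            raise IndexError("offset beyond end of file")
    return block, byte_offset % DATA_PER_BLOCK
```

Reaching the fifth block costs four pointer hops, each of which would be a disk read, which is why linked allocation is poor for direct access.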

Indexed Allocation
The file allocation table contains a separate one level index for each file. The index has one entry
for each portion allocated to the file. The i th entry in the index block points to the i th block of the file.

The index is not stored as part of the file allocation table. Instead it is kept in a separate
block, and the entry in the file allocation table points to that block. Allocation can be made in
either fixed size blocks or variable size portions. When the file is created all pointers in the
index block are set to nil. When the i th block is first written, a block is obtained from the
free space manager and its address is put in the i th index entry. Allocation by fixed size
blocks eliminates external fragmentation, whereas allocation by variable size portions improves
locality. Indexed allocation supports both direct access and sequential access to the file.
Advantages:
 Supports both sequential and direct access.
 No external fragmentation. Direct access is faster than with linked allocation.
 Supports fixed size and variable sized blocks.
Disadvantages:
 Suffers from wasted space, since a whole index block is needed even for a small file.
 Pointer overhead is generally greater than with linked allocation.
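
A minimal sketch of an index block is given below. The class name IndexedFile is an assumption, and a plain Python list stands in for the free space manager.

```python
class IndexedFile:
    def __init__(self, index_size, free_blocks):
        self.index = [None] * index_size   # all pointers start as nil
        self.free = free_blocks            # stand-in for the free space manager

    def block_for(self, i, allocate=False):
        """Return the disk block holding logical block i (None if unallocated)."""
        if self.index[i] is None and allocate:
            self.index[i] = self.free.pop(0)   # take a block from the free list
        return self.index[i]
```

Any logical block is found with a single look-up in the index, without following a chain.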

11. Free Space Management

Since disk space is limited we need to reuse the space from deleted files. To keep track of free
disk space the system maintains a free space list. To create a file we search the list for the
required amount of space and allocate that space to the new file. The free space list can be
implemented in different ways:
Bit Vector:
 The free space list is implemented as a bit map or a bit vector.
 Each block is represented by one bit. If the block is free the bit is 1 else 0.
 The advantage is the relative simplicity.
 Ex. if the disk blocks 2,3,4,5,8,9,10,11,12,13,17,18,25,26 and 27 are free then the bit vector is as
follows: 0011110011111100011000000111….
 Unless the entire bit vector is kept in the memory the approach is inefficient.
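The bit vector example above can be reproduced with a short Python sketch. The function names and the choice of 28 blocks (just enough to cover block 27) are assumptions made for illustration only.

```python
def build_bit_vector(free_blocks, total_blocks):
    """Return a list of bits: 1 marks a free block, 0 an allocated block."""
    free = set(free_blocks)
    return [1 if block in free else 0 for block in range(total_blocks)]


def first_free_block(bits):
    """Return the number of the first free block, or None if no block is free."""
    for block, bit in enumerate(bits):
        if bit == 1:
            return block
    return None


free = [2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, 27]
bits = build_bit_vector(free, 28)
bit_string = "".join(str(bit) for bit in bits)
print(bit_string)              # 0011110011111100011000000111
print(first_free_block(bits))  # 2
```

Finding a free block is a simple scan of the vector, which is why the approach is only practical when the whole vector fits in memory.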
Linked List:
 The free disk blocks are linked together keeping the pointer to the first free block in a special
location on the disk.
 The first block contains a pointer to the next free block and so on.
 It is not an efficient mechanism, since to traverse the list we have to read every block, which
requires a lot of I/O time.
Grouping:
 This is a modification of the free list approach.
 It stores the addresses of n free blocks in the first free block. The first n-1 of these blocks are
actually free. The last block again holds the addresses of the next n free blocks.
 The advantage is that the addresses of a large number of free blocks can be found quickly.
Counting:
 This is based on the fact that several contiguous blocks may be allocated or freed simultaneously,
especially when contiguous allocation or clustering is used.
 Instead of keeping a list of n free addresses, only the address of the first free block and the
number n of contiguous free blocks that follow it are stored.
 Hence each entry is a disk address and a count.

 Hence the overall size of the list reduces.
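A minimal Python sketch of the address-and-count idea, reusing the free blocks from the bit vector example. The function name `to_runs` is an assumption for illustration; it collapses a list of free block numbers into (first block, count) entries.

```python
def to_runs(free_blocks):
    """Collapse free block numbers into (first_block, count) entries."""
    runs = []
    for block in sorted(free_blocks):
        if runs and block == runs[-1][0] + runs[-1][1]:
            # The block directly follows the current run, so extend that run.
            start, count = runs[-1]
            runs[-1] = (start, count + 1)
        else:
            # Otherwise a new run of length 1 begins here.
            runs.append((block, 1))
    return runs


free = [2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, 27]
runs = to_runs(free)
print(runs)  # [(2, 4), (8, 6), (17, 2), (25, 3)]
```

Fifteen free block addresses shrink to four entries, which is the reduction in list size described above.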

12. Efficiency and Performance

Efficiency depends on:
 disk allocation and directory algorithms
 types of data kept in a file’s directory entry
Performance can be improved by:
 disk cache – a separate section of main memory for frequently used blocks
 free-behind and read-ahead – techniques to optimize sequential access
 dedicating a section of memory as a virtual disk, or RAM disk
13. Recovery
 Consistency checking – compares data in directory structure with data blocks on disk, and
tries to fix inconsistencies
 Use system programs to back up data from disk to another storage device (floppy disk,
magnetic tape, other magnetic disk, optical)
 Recover lost file or disk by restoring data from backup.



1. Mass storage structures
2. Disk structure
3. Disk Attachment
4. Disk Scheduling Methods
5. Disk management
6. Swap-Space Management.
7. Protection: Goals of protection
8. Principles of protection
9. Domain of protection
10. Access matrix
11. Implementation of access matrix
12. Access control
13. Revocation of access rights
14. Capability-Based systems.



Secondary-Storage Structure
The lowest level of the file structures is the secondary and the tertiary storage.

1. Mass storage structures

Magnetic disk

 Magnetic disks provide the bulk of secondary storage in today’s machines. Conceptually they are simple.
 Each disk platter has a flat circular shape. Platter diameters range from 1.8 inches to 5.25 inches.
 The two surfaces of the platter are covered with magnetic material. A read-write head flies
just above each surface of the platter. The heads are attached to a disk arm that moves all the heads
as a unit.
 The surface of the platter is logically divided into circular tracks. Tracks are divided into sectors.
There are hundreds of sectors in each track. The set of tracks at one arm position makes up a cylinder.
There are many concentric cylinders in a disk drive.
 When the disk is in use the drive motor spins it at high speed. Most drives rotate 60 to 200 times
per second.
 Disk speed has two parts:
 Transfer rate: is the rate at which the data flow between the drive and the computer;
typical rates are several mega bytes per second.
 Positioning time (also called random access time) has two parts. The seek time is the time
to move the disk arm to the desired cylinder and can be several milliseconds. The rotational
latency is the time for the desired sector to rotate to the disk head.
 Since the head flies on an extremely thin cushion of air (measured in microns) there is a danger that
the head will make contact with the disk surface, thereby damaging the magnetic surface. This accident
is called a head crash.
 Disks can be removable, which allows different disks to be mounted as needed. Removable disks
generally consist of one platter held in a plastic case to prevent damage.
 Floppy disks are inexpensive removable magnetic disks that have a soft plastic case containing a
flexible platter. The head generally sits directly on the disk surface, so they rotate more slowly than
a hard disk. Storage capacity is 1.44 MB or more.
 A disk drive is attached to the computer by a set of wires called the I/O bus. Different types of buses are:
EIDE: Enhanced Integrated Drive Electronics
ATA: Advanced Technology Attachment
SATA: serial ATA
USB: Universal Serial Bus
FC: Fiber Channel and
SCSI: Small Computer System Interface.

 The data transfers on a bus are carried out by special electronic processors called controllers. The
host controller is the controller at the computer end and the disk controller is built into each disk
drive. To perform a disk I/O the computer places a command into the host controller using
memory-mapped I/O ports. The host controller then sends the command via messages to the disk
controller and the disk controller operates the disk drive to carry out the command.
Magnetic Tapes:
 Tapes are used mainly for backup, for storing infrequently used information and for storing large
quantities of data that are too big for the disk.
 They also serve as a medium for transferring information from one system to another. They are not
used for the normal working of the machine since they are very slow.
 A tape is kept in a spool and is wound or rewound past a read-write head. Moving to the correct spot
on the tape may take several minutes.
 Tape capacities are generally 20 GB to 200 GB. Tapes and their drives are categorized by width.
Tapes are named according to technology.

2. Disk structure
 Disks are addressed as a large one-dimensional array of logical blocks, where the logical block is the
smallest unit of transfer. The logical block size is usually 512 bytes, although some disks can be low
level formatted to a different logical block size such as 1024 bytes.
 The logical blocks are mapped to sectors sequentially. Sector 0 is the first sector of the first track
on the outermost cylinder.
 The mapping proceeds in order through that track, then through the rest of the tracks on that cylinder,
and then through the other cylinders from the outermost to the innermost.
 The farther a track is from the centre the more sectors it can hold. The number
of sectors per track decreases as we move inwards.
 The outermost tracks typically hold 40% more sectors than the innermost ones.
 In storage media that use CLV (Constant Linear Velocity) the density of bits per track is uniform.
 The drive increases the rotation speed as the head moves from outer to inner tracks to keep the
same rate of data moving under the head. Ex. CD-ROM and DVD-ROM drives.
 In some drives the rotation speed is constant but the bit density decreases from inner tracks to
outer tracks to keep the data rate constant. This method is used in hard disks and is known as CAV
(Constant Angular Velocity).

3. Disk Attachment
Host-Attached Storage:
 Common to all small systems.
 Storage access is via local I/O ports.
 The ports use several technologies:
o Desktop PCs use an I/O bus architecture called IDE or ATA. This architecture supports only
two drives per bus.
o A newer technology is SATA.
o High-end workstations and servers use more sophisticated architectures like SCSI and FC.
SCSI:
 SCSI is a bus architecture.

 Its physical medium is a ribbon cable having many (50 or 68) conductors.
 The protocol supports a maximum of 16 devices on the bus, i.e. one controller card called the
SCSI initiator and up to 15 storage devices called SCSI targets.
 The SCSI targets are normally disks.
 Each target can address up to 8 logical units, such as the components of a RAID (Redundant Array of
Inexpensive Disks) array.
FC (Fibre Channel):
 It is a high-speed serial architecture.
 It can operate over optical fiber or over a four-conductor copper cable.
 It has two variants:
 1. A large switched fabric having a 24-bit address space. This is the basis of SANs (Storage Area
Networks). It supports a large address space and switched communication, which gives
greater flexibility.
 2. An arbitrated loop (FC-AL) that can address 126 devices.

Network-Attached Storage (NAS):

 It is a special purpose storage system that is accessed remotely over a network.

 Access is done through remote procedure call (RPC) interface like NFS (Network File System)
in UNIX and CIFS (Common Internet File System) in Windows.
 RPC is carried via TCP (Transmission Control Protocol) or UDP (User Datagram Protocol) over
IP network.
 The NAS is usually implemented as RAID array with software to implement RPC.
 It is a convenient way for all computers to share a pool of storage in a LAN.
 NAS tends to be less efficient and to have lower performance than direct-attached storage.
 iSCSI is the latest NAS protocol. It uses IP to carry the SCSI protocol.
 The storage I/O operations consume bandwidth on the data network and hence increase the
latency of network communication.
 This problem becomes acute in a client-server setup, because communication between servers and
clients competes for bandwidth with communication between servers and storage devices.

Storage-Area Network:

 A SAN attempts to overcome this bandwidth problem of NAS.

 A SAN is a private network using storage protocols rather than network protocols. It connects
servers and storage units.
 Multiple hosts and multiple storage arrays can attach to the same SAN and storage can be
dynamically allocated to hosts. Hence a SAN is highly flexible.
 It also allows a cluster of servers to share the same storage, and it allows storage arrays to have
multiple direct host connections.
 A SAN switch allows or prohibits access between hosts and storage.

4. Disk Scheduling Methods

 The seek time is the time for the disk arm to move the heads to the cylinder containing the
desired sector.
 The rotational latency is the additional time encountered in waiting for the disk to rotate the
desired sector to the disk head.
 The disk bandwidth is the ratio of total number of bytes transferred to the total time between the
first request for service and the completion of the last transfer.
1. FCFS Scheduling:
 Simplest.
 Is a fair algorithm.
 Generally does not provide fast service.
 Could have wild swings. This causes substantial head movement.

Consider the following disk queue with the requests for I/O blocks on cylinders 98, 183, 37, 122, 14,
124, 65, 67. Let the head start at 53. What is the total head movement? Apply FCFS.

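Figure: FCFS head movement over cylinders 0 to 199, visiting 53, 98, 183, 37, 122, 14, 124, 65, 67 in that order.

The total head movement is 45 + 85 + 146 + 85 + 108 + 110 + 59 + 2 = 640 cylinders.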
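The FCFS total can be checked with a few lines of Python. This is a minimal sketch; the function name is an assumption for illustration.

```python
def fcfs_head_movement(start, requests):
    """Total cylinders travelled when requests are served in arrival order."""
    total = 0
    position = start
    for cylinder in requests:
        total += abs(cylinder - position)
        position = cylinder
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
fcfs_total = fcfs_head_movement(53, queue)
print(fcfs_total)  # 640
```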


2. SSTF Scheduling:
 Shortest Seek Time First scheduling.
 Service all requests close to the current head position before moving the head far away, i.e. select
the request with the minimum seek time from the current head position.
 Increase in performance.
 Like SJF scheduling.
 Not an optimal algorithm.
Problem: Consider the following disk queue with the requests for I/O blocks on cylinders 98, 183,
37, 122, 14, 124, 65, 67. Let the head start at 53. What is the total head movement? Apply SSTF.

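Figure: SSTF head movement over cylinders 0 to 199, visiting 53, 65, 67, 37, 14, 98, 122, 124, 183 in that order.

The total head movement is 12 + 2 + 30 + 23 + 84 + 24 + 2 + 59 = 236 cylinders.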
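A minimal Python sketch of SSTF on the same queue. The function name is an assumption; ties between equally distant requests are broken by arrival order, which is one possible choice and does not arise in this example.

```python
def sstf_schedule(start, requests):
    """Return (service order, total head movement) under shortest seek time first."""
    pending = list(requests)
    position = start
    order = []
    total = 0
    while pending:
        # Pick the pending request closest to the current head position.
        nearest = min(pending, key=lambda cylinder: abs(cylinder - position))
        pending.remove(nearest)
        total += abs(nearest - position)
        position = nearest
        order.append(nearest)
    return order, total


sstf_order, sstf_total = sstf_schedule(53, [98, 183, 37, 122, 14, 124, 65, 67])
print(sstf_order)  # [65, 67, 37, 14, 98, 122, 124, 183]
print(sstf_total)  # 236
```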

Note: SSTF may cause starvation. Say the head is currently servicing cylinder 14 and a request for 189
is pending. If a request for 17 arrives, the head moves from 14 to 17, keeping 189 waiting. If many more
requests near 14 arrive they will all be serviced before 189. This keeps 189 waiting for a long time.

3. SCAN Scheduling:
 The disk arm starts at one end of the disk and moves towards the other end servicing requests on
the way till it reaches the other end.
 At the other end the direction of the head is reversed servicing requests as it moves.
 The head SCANS back and forth and hence the name.
 Also called the elevator algorithm. A request arriving ahead of the head will get service soon, and
a request arriving just behind the head will have to wait until the arm returns.
Consider the following disk queue with the requests for I/O blocks on cylinders 98, 183, 37, 122, 14,
124, 65, 67. Let the head start at 53. What is the total head movement? Apply SCAN. Say the head is
moving towards 0.

Figure: SCAN head movement over cylinders 0 to 199, visiting 53, 37, 14, 0, 65, 67, 98, 122, 124, 183 in that order.

Total head movement is 53 + 183 = 236 cylinders.
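A minimal Python sketch of SCAN, assuming (as in the worked answer) that the arm travels all the way to the end of the disk before reversing. The function names and the `last_cylinder` parameter are assumptions for illustration.

```python
def head_movement(start, path):
    """Sum the cylinder distances travelled along path, beginning at start."""
    total, position = 0, start
    for cylinder in path:
        total += abs(cylinder - position)
        position = cylinder
    return total


def scan_path(start, requests, toward_zero=True, last_cylinder=199):
    """Cylinders visited by SCAN, including the disk end where the arm reverses."""
    here = [c for c in requests if c == start]
    lower = sorted((c for c in requests if c < start), reverse=True)
    upper = sorted(c for c in requests if c > start)
    if toward_zero:
        return here + lower + [0] + upper
    return here + upper + [last_cylinder] + lower


queue = [98, 183, 37, 122, 14, 124, 65, 67]
scan_total = head_movement(53, scan_path(53, queue, toward_zero=True))
print(scan_total)  # 236
```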



4. C-SCAN Scheduling:
 Circular SCAN.
 Provides more uniform wait time.
 When the head reaches one end it immediately returns to the beginning without servicing any
requests in this journey.
 It treats the cylinders as a circular list that wrap around.

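Consider the following disk queue with the requests for I/O blocks on cylinders 98, 183, 37, 122, 14,
124, 65, 67. Let the head start at 53. What is the total head movement? Apply C-SCAN. Say the head
is moving towards 0.

Figure: C-SCAN head movement over cylinders 0 to 199, visiting 53, 37, 14, 0, then returning to 199 and visiting 183, 124, 122, 98, 67, 65.

Total head movement is 53 + 199 + 134 = 386 cylinders, counting the return sweep from 0 to 199. (If the
head instead moves towards 199, the total is 146 + 199 + 37 = 382 cylinders.)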
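A minimal Python sketch of C-SCAN that counts the return sweep as head movement (an assumption; some treatments ignore it). The function names are illustrative. Moving towards 0 gives 386 cylinders for this queue, and moving towards 199 gives 382.

```python
def head_movement(start, path):
    """Sum the cylinder distances travelled along path, beginning at start."""
    total, position = 0, start
    for cylinder in path:
        total += abs(cylinder - position)
        position = cylinder
    return total


def cscan_path(start, requests, toward_zero=True, last_cylinder=199):
    """Cylinders visited by C-SCAN, including both disk ends of the return sweep."""
    here = [c for c in requests if c == start]
    lower = sorted(c for c in requests if c < start)
    upper = sorted(c for c in requests if c > start)
    if toward_zero:
        # Sweep down to 0, jump to the last cylinder, then keep sweeping down.
        return here + lower[::-1] + [0, last_cylinder] + upper[::-1]
    # Sweep up to the last cylinder, jump to 0, then keep sweeping up.
    return here + upper + [last_cylinder, 0] + lower


queue = [98, 183, 37, 122, 14, 124, 65, 67]
cscan_down = head_movement(53, cscan_path(53, queue, toward_zero=True))
cscan_up = head_movement(53, cscan_path(53, queue, toward_zero=False))
print(cscan_down, cscan_up)  # 386 382
```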

5. LOOK Scheduling:
 LOOK is the same as SCAN, but the arm goes only as far as the farthest request in each direction.
 C-LOOK is the same as C-SCAN, but the arm goes only as far as the farthest request.

Figure: LOOK head movement over cylinders 0 to 199.

Figure: C-LOOK head movement over cylinders 0 to 199.
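A minimal Python sketch of LOOK and C-LOOK on the queue used in the earlier examples (98, 183, 37, 122, 14, 124, 65, 67 with the head at 53). The function names are assumptions, and the C-LOOK jump between the two farthest requests is counted as head movement.

```python
def head_movement(start, path):
    """Sum the cylinder distances travelled along path, beginning at start."""
    total, position = 0, start
    for cylinder in path:
        total += abs(cylinder - position)
        position = cylinder
    return total


def look_path(start, requests, toward_zero=True, circular=False):
    """Service order for LOOK, or for C-LOOK when circular is True."""
    here = [c for c in requests if c == start]
    lower = sorted(c for c in requests if c < start)
    upper = sorted(c for c in requests if c > start)
    if toward_zero:
        rest = upper[::-1] if circular else upper
        return here + lower[::-1] + rest
    rest = lower if circular else lower[::-1]
    return here + upper + rest


queue = [98, 183, 37, 122, 14, 124, 65, 67]
look_total = head_movement(53, look_path(53, queue, toward_zero=True))
clook_total = head_movement(53, look_path(53, queue, toward_zero=True, circular=True))
print(look_total, clook_total)  # 208 326
```

With the head moving towards 0, LOOK travels 39 + 169 = 208 cylinders and C-LOOK travels 39 + 169 + 118 = 326 cylinders.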



Assignment Problem:

 Work Queue: 23, 89, 132, 42, 55, 13, 75, 144, 189, 12, 187
 There are 200 cylinders numbered from 0 - 199
 The disk head starts at number 100.
 The disk head moves towards 0.

Calculate the total head movement using FCFS, SSTF, SCAN, C-SCAN, LOOK, C-LOOK.

Refer class notes for problems

Selection of disk scheduling algorithms:

 How to select from so many algorithms?
 The performance depends on the number and types of requests.
 SSTF has natural appeal because it increases performance over FCFS.
 SCAN and C-SCAN perform well for systems that have heavy load on the disk.
 The file allocation method has an influence on the disk service requests.
 A contiguously allocated file will generate several requests that are close to each other.
 A linked or indexed file may have blocks scattered far apart, hence more head movement.
 The location of the directories also has a role. If the directory entry is in the first cylinder and the
file's data is in the last cylinder the head has to move across the entire disk.
 All the algorithms discussed here consider only seek time.
 Rotational latency can also be large. It is difficult for the OS to schedule for improved
rotational latency because modern disks do not disclose the physical location of the logical blocks.

5. Disk management
Low Level formatting:
 Before the disk can store data it must be divided into sectors that the disk controller can read and
write. This process is called low level formatting or physical formatting. This formatting fills the
disk with a special data structure for each sector.
 The data structure consists of a header, a data area and a trailer. The header and the trailer contain
information such as the sector number and the ECC (error correcting code) used by the disk controller.
 When the disk controller writes to a sector's data area the ECC is updated with a value calculated
from all the bytes in the data area.
 When a sector is read the ECC is recalculated and compared with the stored value. If the
numbers mismatch, the data area has been corrupted and the sector may be bad.
 Most disks are low level formatted at the factory, which also allows bad sectors to be detected.
 When the disk controller is instructed to low level format the disk it can be told how many bytes of
data area to leave between the header and the trailer of each sector.
 Sizes are 256, 512 and 1024 bytes. The larger the sector size, the fewer the sectors per track,
the fewer the headers and trailers, and the more space there is for the data area.

To hold files on the disk the OS puts its own data structures on the disk.
The OS does two things:
1. Partition:
 Partition the disk into one or more groups of cylinders each acting like a separate disk.
2. Logical Formatting:
 The OS stores the initial file system data structures on the disk which represent maps of free and
allocated space.

To increase efficiency most file systems group blocks together into larger chunks called clusters.
Boot Block
bootstrap program
 When the computer is powered up or rebooted it needs an initial program to run. This initial
program is called the bootstrap program.
 It initializes all aspects of the system, including CPU registers, device controllers and the contents
of main memory, and then starts the OS.
 It finds the OS kernel on the disk, loads it into memory and jumps to an initial address to begin
the OS execution.
 Most systems keep only a tiny bootstrap loader in ROM. Its job is to bring in the full bootstrap
program from the disk.
 The location where the full bootstrap program is stored in a partition is called the boot block and is
at a fixed location on the disk.
 The disk is divided into partitions (in Windows 2000) and the partition that contains the OS and the
device drivers is called the boot partition.
 A disk that has a boot partition is called a boot disk or system disk.
 Master Boot Record (MBR): Windows 2000 places its boot code in the first sector of the hard
disk, called the MBR.
Bad Blocks
 Since disks have moving parts and small tolerances they are prone to failure.
 Sometimes the whole disk has to be replaced. More often only some sectors get corrupted.
 These corrupted sectors are called bad blocks.
 Generally data is lost in bad blocks.
How are bad blocks handled?
 On some disks the bad blocks are manually handled.
 Ex. MS DOS format command checks for bad blocks. If it finds one it writes special value into
the corresponding FAT (File Allocation Table) entry.

Sector sparing or forwarding

 In SCSI disks used in PCs and workstations the controller maintains the list of bad blocks. The
list is initialized in the factory and updated during the lifetime of the disk. The controller can be
told to replace each bad sector logically with one of the spare sectors. This scheme is called
sector sparing.
 A typical bad-sector transaction might be as follows:
 The OS tries to read a logical block.
 The device controller calculates the ECC and finds that the sector is bad.
 It reports this to the OS.
 The next time the system is rebooted a special command is run to tell the SCSI (Small
Computer System Interface) controller to replace the bad sector with a spare.
 Later, whenever the bad sector is accessed, the request is translated into the replacement sector's
address by the controller. (This redirection by the controller can invalidate optimisation
by the OS's disk scheduling algorithm.)
 As an alternative to sector sparing some controllers replace the bad sector by sector slipping.

6. Swap-Space Management.
Swap space management is a low level OS task. Virtual memory (VM) uses disk space as an extension
of physical memory. Since disk access is much slower than memory access, swapping decreases
performance, so swap space management tries to provide the best possible throughput.
Swap space is an area on disk that temporarily holds a process memory image. How it is used depends
on the memory management algorithm. Systems that use swapping hold the entire process image in the
swap space, while paging systems store pages pushed out of memory. The amount of swap space varies
based on the amount of physical memory. It can vary from a few MB to a few GB. In UNIX multiple swap
spaces are allowed. They are usually put on separate disks.
The swap space can reside in one of two places:
 It can be carved out of the file system or put in a separate disk partition.
 If it is in the file system then the normal file system routines for creation, naming and allocation
of space can be used.
This is an easy approach but inefficient, since searching directories takes time. When the swap space
is on a separate partition a separate swap manager is used to allocate and deallocate blocks. A fixed
amount of swap space is allocated in each system. This manager uses algorithms optimized for speed
rather than storage efficiency. Internal fragmentation may increase, but this can be ignored as the
data in the swap space is temporary.

Swap-Space Management: Example

Consider the swapping and paging in UNIX 4.3 BSD:
 The swap space is allocated when the process is started.
 Enough space is allocated for the text section and the data section of the process.
 Pre-allocation prevents a process from running out of swap space when it is required.
 Pages are brought in from the file system, swapped out to swap space when not required and
brought back from swap space when required. Hence the file system is consulted only once for each page.
 Processes with identical text pages share these pages both in physical memory and in swap space.
 The kernel uses two swap maps per process to track swap space use: one for text and one for data.
 Text sections are given 512KB chunks, except the last chunk, which holds the remainder of the pages.
 The data section swap map is different because the data segments can grow over time.
 Hence the map is of fixed size but contains addresses of blocks of varying size.
 When the data segment grows beyond its allocated blocks the OS allocates another block.

In SOLARIS the swap space is allocated only when the page is forced out of the physical memory.
This gives better performance.

7. Protection: Goals of protection

Processes in the system must be protected from one another’s activities. Else there may be
disruption in the normal working. Protection refers to a mechanism for controlling the access of
programs, processes or users to resources defined by the computer system. It includes specification
of controls and means of enforcement. Security is a measure of confidence that the integrity of a
system and the data will be preserved.

Goals of protection
 Provides a means to distinguish between an authorized and unauthorized usage.
 To prevent mischievous, intentional violation of an access restriction by the user.
 To ensure that each program component which is active in a system uses system resources only
in ways consistent with stated policies. (This gives a reliable system).
 To detect latent errors at the interfaces between the component subsystems. (This can improve
reliability). Early detection helps in preventing malfunctioning of subsystems.
 To enforce policies governing resource usage.

A mechanism and a policy

 A mechanism tells how something should be done whereas a policy says what is to be done.
 Policies decide what is to be done during resource usage. The policies can be fixed in design of
the system or can be formulated by the management of the system. Also the protection system
must provide flexibility to enforce a variety of policies. Policies can change from time to time
and from place to place.

8. Principles of protection
 The time-tested guiding principle used for protection is the principle of least privilege. It states
that programs, users and even systems be given just enough privileges to perform their tasks.
 An OS following this principle implements its features, programs, system calls and data
structures so that failure or compromise of a component does the minimum damage and allows
the minimum damage to be done. Such an OS has fine grained access control.
 It provides mechanisms to enable privileges when they are needed and to disable them when not needed.
 Access to privileged functions leaves audit trails that enable a programmer, systems administrator or
law-enforcement officer to trace all protection and security activities of the system.
 We can create separate accounts for each user with just the privileges that the user needs.

A computer system is made up of processes and objects.

Objects can be:
Hardware Objects: CPU, memory segments, printers, disks etc.
Software Objects: files, programs, semaphores etc.
Characteristics of these objects:
 Objects are abstract data types.
 Each object has a unique name and can be accessed only through well-defined and meaningful
operations. The type of operation depends on the type of object.



At any given time a process should be able to access only those resources that it currently requires.
This is called need to know principle.

9. Domain of protection
Protection domain is the domain/set that specifies the resources that the process may access.
The domain defines the set of objects and the type of operation that may be invoked on each object.
The ability to execute an operation on an object is called an access right. A domain is a collection of
access rights, each of which is an ordered pair <object-name, rights-set>.
Ex. if domain D has access right <file F, {read, write}> then a process executing in domain D can
both read and write file F but cannot perform any other operation on it.
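The example can be written as a small Python sketch. Representing a domain as a dictionary from object names to rights sets is an assumption made here for illustration, as are the names `domain_d` and `may_invoke`.

```python
# A domain modelled as a mapping: object name -> set of permitted operations.
domain_d = {"F": {"read", "write"}}


def may_invoke(domain, object_name, operation):
    """True if the domain holds an access right allowing the operation on the object."""
    return operation in domain.get(object_name, set())


print(may_invoke(domain_d, "F", "read"))     # True
print(may_invoke(domain_d, "F", "execute"))  # False
```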

Some features of domain:

1. Domains need not be disjoint.
 Domains that overlap share access rights.
 The association of processes and domains may be static or dynamic.

D1: <O3, {read, write}>, <O1, {read, write}>, <O2, {read, write}>
D2: <O2, {write}>, <O4, {print}>
D3: <O4, {print}>, <O1, {execute}>, <O3, {read}>

 There are three domains D1, D2, D3.

 The access right <O4, {print}> is shared by D2 and D3, i.e. processes in either domain can
print object O4.
 To read and write O1 the process must be in D1.
 Only processes in D3 can execute O1.
2. The association between a process and a domain may be static or dynamic:
 Static:
 The set of resources available to a process is fixed throughout the process’s lifetime.
 The process sticks to the same domain.
 If a process executes in two different phases, say a read phase and a write phase, then both read
and write access must be put in the domain. Even though each right is needed in only one
phase, the process holds both throughout. This violates the need-to-know principle.
 Hence, to follow the need-to-know principle, the domain has to be modified so that it always
reflects the minimum necessary rights.
 Dynamic: the set of resources available to a process varies during the process’s lifetime.
 Here a mechanism is used to allow a process to switch from one domain to another.
 We may also allow the contents of a domain to change.
Domains can be implemented in the following ways:
1. Each user may be a domain. Here the set of objects that can be accessed depends on the identity of
the user. Domain switching occurs when the user is changed generally when one user logs out and
another logs in.

2. Each process may be a domain. Here the set of objects that can be accessed depends on the
identity of the process. Domain switching occurs when one process sends a message to another
process and is waiting for response.
3. Each procedure may be a domain. Here the set of objects that can be accessed corresponds to the
local variables defined within the procedure. Domain switching occurs when a procedure call is made.

The dual-mode of operation of the OS

 The dual mode consists of the monitor mode and the user mode.
 In monitor mode the OS can execute privileged instructions to gain complete control over the
computer system.
 In the user mode the processes can only invoke non-privileged instructions.
 These modes protect the OS from the user processes.

In a multiprogrammed environment the processes have to be protected from each other.

This can be illustrated in the following two examples:
1. UNIX:
The domain is associated with the user, and switching domains means temporarily changing the user id.
1. Domain change is accomplished through the file system as follows:
 Each file has an owner and a setuid bit. When the setuid bit is on and a user executes the file, the
user id is set to that of the owner of the file.
 When network access is required the setuid bit is set on a networking program, causing
the user id to change and provide access. Here the user id changes to that of a user with network
access privilege. If the user id becomes root, the most powerful one, the user can do
anything on the system.

2. To prevent this, in some systems the privileged programs are put in a special directory.
 The OS is designed to change the user id of any program run from this directory.
 This eliminates hidden setuid programs.
 This method is less flexible than the one used in UNIX.
3. Some systems are more restrictive and protective:
 They do not allow any change of user id.
 Special techniques are used to provide access to privileged facilities.
 A daemon process is started at boot time and is run as a special user-id.
 Users run a special program which sends a special request to the daemon whenever they need
the facility.




2. MULTICS:
 The protection domains are organized hierarchically in a ring structure numbered from 0 to 7.
 Each ring corresponds to a single domain.
 A process executing in D0 has the most privileges. If j < i then Di is a subset of Dj, so a process
executing in Dj has more privileges than a process in Di.
 MULTICS has a segmented address space and each segment is a file.
 Each segment is associated with one of the rings.
 The segment descriptor includes the ring number and three access bits for read/write/execute.
 Each process is associated with a current ring number counter identifying the ring in which it is
currently executing.
 A process executing in ring i can access a segment associated with ring k only if k >= i.
 Domain switching in MULTICS happens when a process crosses from one ring to another.
 This switching is done in a controlled manner.
 To allow controlled switching the ring field of the segment descriptor is modified to include the following:
o Access bracket: a pair of integers b1, b2 such that b1 <= b2.
o Limit: an integer b3 such that b3 > b2.
o List of gates: identifies the entry points at which the segments may be called.
 If a process executing in ring i calls a procedure with access bracket (b1, b2) then the call
is allowed if b1 <= i <= b2, and the ring number remains i. Otherwise a trap is issued and handled
as follows:
 If i < b1 the call is allowed, because control transfers to a ring with fewer privileges.
 If i > b2 the call is allowed only if b3 >= i and the call is directed to one of the
designated entry points in the list of gates. This limits access rights.
 The disadvantage of the ring structure is that it does not enforce the need-to-know principle.
 The protection scheme in MULTICS is more complex and less efficient. The performance is also affected.
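The ring-crossing rules can be sketched as a small Python decision function. This is an illustration only: the function name, the returned strings and the `through_gate` flag are assumptions, the limit is taken to satisfy b3 > b2, and the case i < b1 (a call into a less privileged ring, which is allowed after a trap) is included.

```python
def ring_call(i, b1, b2, b3, through_gate):
    """Outcome of a call from ring i to a segment with access bracket (b1, b2) and limit b3."""
    if b1 <= i <= b2:
        # Inside the access bracket: the call proceeds and the ring number stays i.
        return "allowed"
    if i < b1:
        # Trap: transfer to a ring with fewer privileges, which is permitted.
        return "allowed after trap"
    # Here i > b2: only permitted up to the limit, and only through a gate.
    if i <= b3 and through_gate:
        return "allowed through gate"
    return "denied"


print(ring_call(3, 2, 4, 6, through_gate=False))  # allowed
print(ring_call(5, 2, 4, 6, through_gate=True))   # allowed through gate
print(ring_call(7, 2, 4, 6, through_gate=True))   # denied
```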

10. Access matrix

The model of protection viewed abstractly as a matrix is called the access matrix. Rows represent
domains, columns represent objects. Each entry in the matrix is a set of access rights.
 The entry access(i, j) defines the set of operations that a process executing in domain Di can invoke
on object Oj.
Access Matrix

Object   F1            F2      F3            Printer
D1       read                  read
D2                                           print
D3                     read    execute
D4       read, write           read, write
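A minimal Python sketch of this access matrix as a nested dictionary. The column placement is an assumption recovered from the flattened table (D1 and D4 hold rights on F1 and F3, D3 on F2 and F3, D2 on the printer), and the names `access` and `allowed` are illustrative.

```python
# Rows are domains, columns are objects, entries are sets of access rights.
access = {
    "D1": {"F1": {"read"}, "F3": {"read"}},
    "D2": {"Printer": {"print"}},
    "D3": {"F2": {"read"}, "F3": {"execute"}},
    "D4": {"F1": {"read", "write"}, "F3": {"read", "write"}},
}


def allowed(domain, obj, operation):
    """True if the operation belongs to the entry access(domain, obj)."""
    return operation in access.get(domain, {}).get(obj, set())


print(allowed("D4", "F1", "write"))  # True
print(allowed("D1", "F1", "write"))  # False
```

Missing entries are treated as empty rights sets, so only the non-empty cells of the matrix need to be stored.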

 It allows a variety of policies to be specified.

 The access matrix can implement policy decisions concerning protection.
 A policy decision is which rights to include in the (i,j)th entry.
 It provides an appropriate mechanism for defining and implementing strict control for both static
and dynamic association between processes and domains.
 Domain switching can be done by including domains among the objects of the access matrix.
 Operations can also be changed by changing the entry in the matrix.
 The matrix itself can be defined as an object for protection.
 Process switching from domain Di to Dj is allowed only if the access right switch belongs to
access(i, j).
 Controlled changes to the access matrix require three additional operations: copy, owner and
control.
 The copy right allows the copying of the access right only within the column for which the right
is defined.
 The ability to copy an access right from one domain to another is denoted by * on the right side
of the access right.
 Ex. a process executing in D2 can copy the read operation into any entry associated with F2.
Access matrix with copy rights, before the copy:

Object   F1        F2      F3
D1       execute           write*
D2       execute   read*
D3       execute           execute

Access matrix with copy rights, after the copy:

Object   F1        F2      F3
D1       execute           write*
D2       execute   read*
D3       execute   read    execute

There are two variants here:

 A right is copied from access(i,j) to access(k,j). It is then removed from access(i,j). This action is
called a transfer of a right rather than a copy.
 Propagation of the copy right may be limited; this is called limited copy. I.e. when the right R* is
copied from access(i,j) to access(k,j) only the right R and not R* is created. A process executing in
domain Dk cannot further copy the right R.
 A system may have only one of these copy rights or may have all of them.
 The owner right allows the addition of new rights and the removal of some rights.
 If access(i,j) includes the owner right then a process in Di can add or remove any right in any entry
in column j.
Note: both copy and owner allow changes only to the entries in a column.
To change the entries in a row we use the control right.
 If access(i,j) includes the control right then a process in Di can remove any access right from row j.

The copy and owner rights limit the propagation of access rights, but they do not guarantee that
information does not migrate outside its execution environment. This is called the confinement problem.
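
The copy, owner and control operations described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, a starred string such as "read*" stands for a right that carries the copy right, and a plain copy is assumed to propagate the starred right (which is what distinguishes it from a limited copy).

```python
from collections import defaultdict


def new_matrix():
    """access[(domain, obj)] is a set of rights; "read*" is read plus the copy right."""
    return defaultdict(set)


def copy_right(access, src, dst, obj, right, mode="copy"):
    """Propagate `right` within the column of `obj`, from domain src to dst.

    mode="copy":     dst receives the starred right R* (assumed behaviour).
    mode="limited":  dst receives only R and cannot copy it further.
    mode="transfer": dst receives R* and src loses it.
    """
    starred = right + "*"
    if starred not in access[(src, obj)]:
        raise PermissionError(f"{src} holds no copy right for {right} on {obj}")
    if mode == "limited":
        access[(dst, obj)].add(right)
    else:
        access[(dst, obj)].add(starred)
        if mode == "transfer":
            access[(src, obj)].discard(starred)


def owner_set(access, actor, target, obj, right, add=True):
    """With 'owner' in access(actor, obj), add or remove a right in that column."""
    if "owner" not in access[(actor, obj)]:
        raise PermissionError(f"{actor} does not own {obj}")
    if add:
        access[(target, obj)].add(right)
    else:
        access[(target, obj)].discard(right)


def control_remove(access, actor, target, obj, right):
    """With 'control' in access(actor, target), remove a right from target's row."""
    if "control" not in access[(actor, target)]:
        raise PermissionError(f"{actor} does not control {target}")
    access[(target, obj)].discard(right)


# D2 holds read* on F2 and makes a limited copy into D3's entry for F2.
m = new_matrix()
m[("D2", "F2")].add("read*")
copy_right(m, "D2", "D3", "F2", "read", mode="limited")
print(m[("D3", "F2")])  # {'read'}
```

Because D3 received only the unstarred right, a later attempt by D3 to copy read on F2 raises `PermissionError`.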

11. Implementation of access matrix

 Matrix is mostly sparse.
 Several implementation techniques are used:
1. Global Table:
 Simplest implementation.
 A global table consists of a set of ordered triples <domain, object, rights-set>.
 When an operation M is executed on object Oj in domain Di, the global table is searched
for a triple <Di, Oj, Rk> with M belonging to Rk.
 If the triple is found the operation is allowed, else an error is raised.
 Disadvantages of this method:
o The table can be very large and hence cannot be kept in the main memory.
o It cannot take advantage of special groupings of objects or domains.
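
A minimal sketch of the global-table lookup, assuming a small in-memory list of triples; `GLOBAL_TABLE` and `check` are hypothetical names, and the entries are sample data.

```python
# Global table: a list of <domain, object, rights-set> triples (sample data).
GLOBAL_TABLE = [
    ("D1", "F1", frozenset({"read"})),
    ("D2", "Printer", frozenset({"print"})),
    ("D4", "F1", frozenset({"read", "write"})),
]


def check(domain, obj, operation):
    """Search the table for <domain, obj, Rk> with operation belonging to Rk."""
    for d, o, rights in GLOBAL_TABLE:
        if d == domain and o == obj and operation in rights:
            return True
    raise PermissionError(f"{operation} on {obj} denied in {domain}")


print(check("D4", "F1", "write"))  # True
```

The linear scan over every triple hints at why a large table is costly to search and to keep in memory.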

2. Access Lists for Objects:

 Each column in the matrix can be implemented as an access list for one object.
 Empty entries are discarded. Each entry here is an ordered pair <domain, rights-set>.
 When an operation M is executed on object Oj in domain Di, the access list for Oj is searched
for an entry <Di, Rk> with M belonging to Rk.
 If the entry is found the operation is allowed, else an error is raised.
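
The same check with per-object access lists might look like this; `ACCESS_LISTS` and `check_acl` are hypothetical names and the entries are sample data.

```python
# One access list per object (one column of the matrix); empty entries are omitted.
ACCESS_LISTS = {
    "F1": [("D1", {"read"}), ("D4", {"read", "write"})],
    "Printer": [("D2", {"print"})],
}


def check_acl(domain, obj, operation):
    """Search the access list of `obj` for <domain, Rk> with operation in Rk."""
    for d, rights in ACCESS_LISTS.get(obj, []):
        if d == domain and operation in rights:
            return True
    raise PermissionError(f"{operation} on {obj} denied in {domain}")


print(check_acl("D1", "F1", "read"))  # True
```

Only the list attached to the requested object is searched, not the whole matrix.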

3. Capability lists for Domains:

 In this method each row of the matrix is associated with its domain.
 A capability list for a domain is a list of objects together with the operations allowed
on those objects.
 An object is represented by a capability.

 Mere possession of a capability means that access is allowed.
 The capability list is associated with a domain but is never directly accessible to a process
executing in that domain. Capabilities are not allowed to migrate into user space.
 The capability list itself is a protected object.
 Capabilities are distinguished from other data in one of the following ways:
o 1. Each object has a tag to denote its type as either a capability or accessible data.
Tags are not directly accessible by any application.
o 2. The address space associated with a program can be split into two parts:
one part is accessible to the program and contains the program's normal data, and
the other part contains the capability list and is accessible only by the OS.
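
A toy sketch of capability lists, assuming the second approach above: the lists live in a `Kernel` object and a process only ever sees an integer handle. The class and method names are hypothetical, and Python cannot truly hide `_clists`, so the protection here is by convention only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    obj: str
    rights: frozenset


class Kernel:
    """Holds each domain's capability list; processes only see integer handles."""

    def __init__(self):
        self._clists = {}  # domain -> list of Capability

    def grant(self, domain, obj, rights):
        clist = self._clists.setdefault(domain, [])
        clist.append(Capability(obj, frozenset(rights)))
        return len(clist) - 1  # handle = index into the capability list

    def invoke(self, domain, handle, operation):
        clist = self._clists.get(domain, [])
        if not 0 <= handle < len(clist):
            raise PermissionError("no such capability")
        cap = clist[handle]
        if operation not in cap.rights:
            raise PermissionError(f"{operation} not permitted on {cap.obj}")
        return f"{operation} on {cap.obj} allowed"


kernel = Kernel()
h = kernel.grant("D3", "F3", {"execute"})
print(kernel.invoke("D3", h, "execute"))  # execute on F3 allowed
```

No search is needed at access time: possession of a valid handle leads directly to the capability and its rights.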
4. A lock-key mechanism:
 It is a compromise between access lists and capability lists.
 Each object has a list of unique bit patterns called locks.
 Each domain has a list of unique bit patterns called keys.
 A process executing in a domain can access an object only if that domain has a key that
matches one of the locks of the object.
 The list of keys is managed by the OS on behalf of the domain.
 Users are not allowed to examine or modify the list of keys or locks directly.
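
The lock-key check reduces to a set intersection. The sketch below uses made-up bit patterns, and `LOCKS`, `KEYS` and `can_access` are hypothetical names.

```python
# Sample bit patterns; in a real system these tables are managed by the OS.
LOCKS = {"F1": {0b1010, 0b0110}, "Printer": {0b0001}}  # object -> its locks
KEYS = {"D1": {0b1010}, "D2": {0b0001}}                # domain -> its keys


def can_access(domain, obj):
    """Allow access if any key of the domain matches any lock of the object."""
    return bool(KEYS.get(domain, set()) & LOCKS.get(obj, set()))


print(can_access("D1", "F1"))       # True
print(can_access("D2", "F1"))       # False
print(can_access("D2", "Printer"))  # True
```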

12. Access control

Role-Based Access Control (RBAC)
The access controls described so far are applied to files. Each file and directory is
assigned an owner, a group or possibly a list of users, and for each of these, access control
information is assigned.
In Solaris the principle of least privilege is implemented via RBAC, which is based on privileges.
A privilege is the right to execute a system call or to use an option within a system call (like opening
a file with write access). Privileges can be assigned to processes, thereby restricting them to exactly
the access that is required to perform their work. Privileges and programs can also be assigned to
roles. Users are assigned roles or can take roles based on passwords to the roles. In this way a user can
take a role that enables a privilege, allowing the user to run a program to accomplish a specific task.
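
A toy sketch of roles and privileges follows. The role names, privilege names and the `Session` class are hypothetical and are not actual Solaris identifiers; password checks for taking a role are left out.

```python
# Hypothetical privilege and role names; not the actual Solaris identifiers.
ROLE_PRIVILEGES = {
    "backup_operator": {"read_any_file", "write_tape"},
    "printer_admin": {"manage_print_queue"},
}
USER_ROLES = {"alice": {"backup_operator"}, "bob": {"printer_admin"}}


class Session:
    def __init__(self, user):
        self.user = user
        self.active_role = None  # no role taken yet, hence no extra privileges

    def assume_role(self, role):
        """Take a role, but only if it has been assigned to this user."""
        if role not in USER_ROLES.get(self.user, set()):
            raise PermissionError(f"{self.user} is not assigned role {role}")
        self.active_role = role

    def has_privilege(self, privilege):
        return privilege in ROLE_PRIVILEGES.get(self.active_role, set())


session = Session("alice")
print(session.has_privilege("write_tape"))  # False
session.assume_role("backup_operator")
print(session.has_privilege("write_tape"))  # True
```

The user gains the privilege only while the role is active, which keeps access limited to what the task needs.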

13. Revocation of access rights

In a dynamic protection system, we may sometimes need to revoke access rights to objects shared by
different users. Various questions about revocation may arise:
 Immediate versus delayed. Does revocation occur immediately, or is it delayed? If revocation
is delayed, can we find out when it will take place?
 Selective versus general. When an access right to an object is revoked, does it affect all the
users who have an access right to that object, or can we specify a select group of users whose
access rights should be revoked?
 Partial versus total. Can a subset of the rights associated with an object be revoked, or must
we revoke all access rights for this object?
 Temporary versus permanent. Can access be revoked permanently (that is, the revoked
access right will never again be available), or can access be revoked and later be obtained
again?
With an access-list scheme, revocation is easy. The access list is searched for any access rights to be
revoked, and they are deleted from the list.
Since the capabilities are distributed throughout the system, we must find them before we can revoke
them. Schemes that implement revocation for capabilities include the following:
 Reacquisition. Periodically, capabilities are deleted from each domain. If a process wants to
use a capability, it may find that the capability has been deleted. The process may then try to
reacquire the capability. If access has been revoked, the process will not be able to reacquire it.
 Back-pointers. A list of pointers is maintained with each object, pointing to all capabilities
associated with that object. When revocation is required, we can follow these pointers,
changing the capabilities as necessary. This scheme was adopted in the MULTICS system. It
is quite general, but its implementation is costly.
 Indirection. The capabilities point indirectly to the objects. Each capability points to a
unique entry in a global table, which in turn points to the object. We implement revocation
by searching the global table for the desired entry and deleting it. Then, when an access is
attempted, the capability is found to point to an illegal table entry. Table entries can be
reused for other capabilities without difficulty, since both the capability and the table entry
contain the unique name of the object. The object for a capability and its table entry must
match. This scheme was adopted in the CAL system. It does not allow selective revocation.
 Keys. A key is a unique bit pattern that can be associated with a capability. This key is
defined when the capability is created, and it can be neither modified nor inspected by the
process owning the capability. A master key is associated with each object; it can be defined
or replaced with the set-key operation. When a capability is created, the current value of the
master key is associated with the capability. When the capability is exercised, its key is
compared with the master key. If the keys match, the operation is allowed to continue;
otherwise, an exception condition is raised.
Revocation replaces the master key with a new value via the set-key operation,
invalidating all previous capabilities for this object. This scheme does not allow selective
revocation, since only one master key is associated with each object. If we associate a list of
keys with each object, then selective revocation can be implemented. Finally, we can group
all keys into one global table of keys. A capability is valid only if its key matches some key
in the global table. We implement revocation by removing the matching key from the table.
With this scheme, a key can be associated with several objects, and several keys can be
associated with each object, providing maximum flexibility. In key-based schemes, the
operations of defining keys, inserting them into lists, and deleting them from lists should not
be available to all users. In particular, it would be reasonable to allow only the owner of an
object to set the keys for that object. This choice, however, is a policy decision that the
protection system can implement but should not define.
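
Two of the schemes above, indirection and keys, can be sketched as follows. The class names are hypothetical, the master key is modelled as a simple counter, and capabilities are plain values that a real system would keep hidden from the process.

```python
class IndirectionTable:
    """Capabilities are slot numbers; each slot points to an object name."""

    def __init__(self):
        self._entries = {}
        self._next_slot = 0

    def create_capability(self, obj_name):
        slot = self._next_slot
        self._next_slot += 1
        self._entries[slot] = obj_name
        return slot

    def revoke(self, slot):
        """Delete the table entry; the capability now points to nothing."""
        self._entries.pop(slot, None)

    def resolve(self, slot):
        if slot not in self._entries:
            raise PermissionError("capability points to an illegal table entry")
        return self._entries[slot]


class KeyedObject:
    """Object with one master key; a capability carries the key current at creation."""

    def __init__(self, name):
        self.name = name
        self._master_key = 0

    def set_key(self):
        """Replace the master key, invalidating all earlier capabilities."""
        self._master_key += 1

    def create_capability(self):
        return (self.name, self._master_key)

    def exercise(self, capability):
        name, key = capability
        if name != self.name or key != self._master_key:
            raise PermissionError("capability revoked or invalid")
        return True


f1 = KeyedObject("F1")
cap = f1.create_capability()
print(f1.exercise(cap))  # True
f1.set_key()             # after this, exercising cap raises PermissionError
```

With a single master key, `set_key` revokes every outstanding capability at once, which is why this variant cannot revoke selectively.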

14. Capability-Based systems.

 Hydra is a capability-based protection system that provides considerable flexibility.
 It provides a fixed set of possible access rights that are known to and interpreted by the system.
These rights include such basic forms of access as the right to read, write, or execute a memory segment.
 In addition, a user (of the protection system) can declare other rights. The interpretation of
user-defined rights is performed solely by the user's program, but the system provides access
protection for the use of these rights.
 Operations on objects are defined procedurally. The procedures that implement such
operations are themselves a form of object, and they are accessed indirectly by capabilities.
 Auxiliary rights can be described in a capability for an instance of the type.
 Hydra also provides rights amplification. This scheme allows a procedure to be certified as
trustworthy to act on a formal parameter of a specified type on behalf of any process that
holds a right to execute the procedure.
Cambridge CAP System
 CAP's capability system is simpler and superficially less powerful than that of Hydra.
 CAP has two kinds of capabilities:
 Data capability: provides standard read, write and execute access to the individual storage
segments associated with the object.
 Software capability: its interpretation is left to the subsystem, through its protected procedures.
 Specification of protection in a programming language allows the high-level description of
policies for the allocation and use of resources
 Language implementation can provide software for protection enforcement when automatic
hardware- supported checking is unavailable.
