
I/O Management – Part 1

Department of CSE/IT, PSIT, Kanpur


Common Overview

◼ I/O management is a major component of operating system
design and operation
⚫ Important aspect of computer operation
⚫ I/O devices vary greatly
⚫ Various methods to control them
⚫ Performance management
⚫ New types of devices appear frequently
◼ Ports, busses, device controllers connect to various devices
◼ Device drivers encapsulate device details
⚫ Present uniform device-access interface to I/O subsystem



I/O Hardware Devices
◼ Incredible variety of I/O devices
⚫ Storage
⚫ Transmission
⚫ Human-interface
◼ Common concepts – signals from I/O devices interface with computer
⚫ Port – connection point for device
⚫ Bus - daisy chain or shared direct access
 PCI bus common in PCs and servers, PCI Express (PCIe)
 expansion bus connects relatively slow devices
⚫ Controller (host adapter) – electronics that operate port, bus, device
 Sometimes integrated
 Sometimes separate circuit board (host adapter)
 Contains processor, microcode, private memory, bus controller, etc
– Some talk to per-device controller with bus controller, microcode,
memory, etc



I/O Hardware Devices (Cont.)
◼ I/O instructions control devices
◼ Devices usually have registers where device driver places
commands, addresses, and data to write, or read data from
registers after command execution
⚫ Data-in register, data-out register, status register, control
register
⚫ Typically 1-4 bytes, or FIFO buffer
◼ Devices have addresses, used by
⚫ Direct I/O instructions
⚫ Memory-mapped I/O
 Device data and command registers mapped to
processor address space
 Especially for large address spaces (graphics)



A Typical PC Bus Structure



Device I/O Port Locations on PCs (partial)



Polling
◼ For each byte of I/O
1. Read busy bit from status register until 0
2. Host sets read or write bit and if write copies data into data-out
register
3. Host sets command-ready bit
4. Controller sets busy bit, executes transfer
5. Controller clears busy bit, error bit, command-ready bit when
transfer done
◼ Step 1 is busy-wait cycle to wait for I/O from device
⚫ Reasonable if device is fast
⚫ But inefficient if device slow
⚫ CPU switches to other tasks?
 But if miss a cycle data overwritten / lost
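
◼ A minimal C sketch of this handshake, assuming a hypothetical
memory-mapped device with one-byte status, control, and data-out
registers (the register layout and bit positions are illustrative,
not from any real controller):

#include <stdint.h>

/* Hypothetical register layout and bit masks -- illustrative only */
#define STATUS_BUSY   0x01
#define CTRL_WRITE    0x01
#define CTRL_CMD_RDY  0x02

typedef struct {
    volatile uint8_t status;    /* busy/error bits (set by controller)     */
    volatile uint8_t control;   /* write, command-ready bits (set by host) */
    volatile uint8_t data_out;  /* host -> device data                     */
} dev_regs;

/* Write one byte using the five-step polling handshake above */
void polled_write_byte(dev_regs *r, uint8_t byte)
{
    while (r->status & STATUS_BUSY)   /* 1. busy-wait until busy bit is 0 */
        ;
    r->control |= CTRL_WRITE;         /* 2. set write bit ...             */
    r->data_out = byte;               /*    ... and copy data to data-out */
    r->control |= CTRL_CMD_RDY;       /* 3. set command-ready bit         */
    while (r->status & STATUS_BUSY)   /* 4-5. controller sets busy, does  */
        ;                             /*      the transfer, clears bits   */
}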



Interrupts
◼ Polling can happen in 3 instruction cycles
⚫ Read status, logical-and to extract status bit, branch if not zero
⚫ How to be more efficient if non-zero infrequently?
◼ CPU Interrupt-request line triggered by I/O device
⚫ Checked by processor after each instruction
◼ Interrupt handler receives interrupts
⚫ Maskable to ignore or delay some interrupts
◼ Interrupt vector to dispatch interrupt to correct handler
⚫ Context switch at start and end
⚫ Based on priority
⚫ Some nonmaskable
⚫ Interrupt chaining if more than one device at same interrupt
number
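
◼ A sketch of vector dispatch with chaining, assuming a simplified
table in which each vector number holds a linked list of handlers
sharing that interrupt line (all names here are invented for
illustration):

#include <stdbool.h>
#include <stddef.h>

#define NVECTORS 256

/* Handlers sharing a vector are chained; each returns true only if
 * its device actually raised the interrupt. */
struct handler {
    bool (*service)(void);
    struct handler *next;
};

static struct handler *vector_table[NVECTORS];

/* Called by low-level interrupt entry code with the vector number */
void dispatch_interrupt(int vector)
{
    for (struct handler *h = vector_table[vector]; h != NULL; h = h->next)
        if (h->service())   /* walk the chain until a handler claims it */
            return;
    /* no handler claimed it: spurious interrupt */
}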



Interrupts (Cont.)

◼ Interrupt mechanism also used for exceptions


⚫ Terminate process, crash system due to hardware error
◼ Page-fault handler executes on a memory-access error
◼ System call executes via trap, triggering the kernel to execute the
request
◼ Multi-CPU systems can process interrupts concurrently
⚫ If operating system designed to handle it
◼ Used for time-sensitive processing, frequent, must be fast



Interrupt-Driven I/O Cycle

◼ Rather than using busy waiting, the device can interrupt the
CPU when it completes an I/O operation.
◼ On an I/O interrupt:
⚫ Determine which device caused the interrupt.
⚫ If the last command was an input operation, retrieve the
data from the device register.
⚫ Start the next operation for that device.



Intel Pentium Processor Event-Vector Table



I/O Services provided by OS
◼ Naming of files and devices. (On Unix, devices appear as files in the
/dev directory)
◼ Access control.
◼ Operations appropriate to the files and devices.
◼ Device allocation.
◼ Buffering, caching, and spooling to allow efficient communication
with devices.
◼ I/O scheduling.
◼ Error handling and failure recovery associated with devices
(command retries, for example).
◼ Device drivers to implement device-specific behaviors.



Direct Memory Access
◼ Used to avoid programmed I/O (one byte at a time) for large data
movement
◼ Requires DMA controller
◼ Bypasses CPU to transfer data directly between I/O device and
memory
◼ OS writes DMA command block into memory
⚫ Source and destination addresses
⚫ Read or write mode
⚫ Count of bytes
⚫ Writes location of command block to DMA controller
⚫ Bus mastering of DMA controller – grabs bus from CPU
 Cycle stealing from CPU but still much more efficient
⚫ When done, interrupts to signal completion
◼ Version that is aware of virtual addresses can be even more efficient -
DVMA
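
◼ A hedged sketch of the handoff: the OS fills a command block in
memory and writes its address to the controller. The field names,
flag value, and register address below are invented for
illustration; real DMA engines differ.

#include <stdint.h>

/* Illustrative DMA command block -- layout is hypothetical */
struct dma_cmd {
    uint64_t src;      /* source address            */
    uint64_t dst;      /* destination address       */
    uint32_t nbytes;   /* count of bytes to move    */
    uint32_t flags;    /* e.g., read vs. write mode */
};

#define DMA_READ 0x1

/* Hypothetical controller register that receives the block's address */
static volatile uint64_t * const DMA_CMD_REG =
    (volatile uint64_t *)0xFED00000;

void start_dma_read(uint64_t dev_addr, uint64_t mem_addr, uint32_t len)
{
    static struct dma_cmd cmd;
    cmd.src    = dev_addr;
    cmd.dst    = mem_addr;
    cmd.nbytes = len;
    cmd.flags  = DMA_READ;
    *DMA_CMD_REG = (uint64_t)&cmd;  /* controller bus-masters the transfer
                                       and interrupts the CPU when done  */
}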



Six Step Process to Perform DMA Transfer



Application I/O Interface
◼ I/O system calls encapsulate device behaviors in generic classes
◼ Device-driver layer hides differences among I/O controllers from kernel
◼ New devices talking already-implemented protocols need no extra
work
◼ Each OS has its own I/O subsystem structures and device driver
frameworks
◼ Devices vary in many dimensions
⚫ Character-stream or block
⚫ Sequential or random-access
⚫ Synchronous or asynchronous (or both)
⚫ Sharable or dedicated
⚫ Speed of operation
⚫ Read-write, read-only, or write-only



A Kernel I/O Structure



Characteristics of I/O Devices



Characteristics of I/O Devices (Cont.)

◼ Subtleties of devices handled by device drivers


◼ Broadly I/O devices can be grouped by the OS into
⚫ Block I/O
⚫ Character I/O (Stream)
⚫ Memory-mapped file access
⚫ Network sockets
◼ For direct manipulation of I/O device specific characteristics,
usually an escape / back door
⚫ Unix ioctl() call to send arbitrary bits to a device control
register and data to device data register
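
◼ A small sketch of the ioctl() escape hatch; the device node and
request code are hypothetical, but the open()/ioctl() call pattern
is the standard Unix one:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define MYDEV_SET_MODE 0xA001   /* hypothetical request code */

int main(void)
{
    int fd = open("/dev/mydev", O_RDWR);   /* hypothetical device node */
    if (fd < 0) { perror("open"); return 1; }

    int mode = 3;   /* arbitrary bits destined for the control register */
    if (ioctl(fd, MYDEV_SET_MODE, &mode) < 0)
        perror("ioctl");

    close(fd);
    return 0;
}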



Block and Character Devices

◼ Block devices include disk drives


⚫ Commands include read, write, seek
⚫ Raw I/O, direct I/O, or file-system access
⚫ Memory-mapped file access possible
 File mapped to virtual memory and clusters brought in via
demand paging
⚫ DMA
◼ Character devices include keyboards, mice, serial ports
⚫ Commands include get(), put()
⚫ Libraries layered on top allow line editing



I/O Management – Part 2



I/O Buffering
◼ I/O devices typically contain a small on-board memory where
they can store data temporarily before transferring to/from the
CPU.
⚫ A disk buffer stores a block when it is read from the disk.
⚫ It is transferred over the bus by the DMA controller into a
buffer in physical memory.
⚫ The DMA controller interrupts the CPU when the transfer is
done.



Need of Buffer in OS
◼ To cope with speed mismatches between device and CPU.
⚫ Example: Compute the contents of a display in a buffer (slow)
and then zap the buffer to the screen (fast)
◼ To cope with devices that have different data transfer sizes.
⚫ Example: ftp brings the file over the network one packet at a
time. Stores to disk happen one block at a time.
◼ To minimize the time a user process is blocked on a write.
⚫ Writes => copy data to a kernel buffer and return control to the
user program. The write from the kernel buffer to the disk is
done later.



Caching
◼ Improve disk performance by reducing the number of disk accesses.
◼ Idea: keep recently used disk blocks in main memory after the I/O call that
brought them into memory completes.
◼ Example: Read (disk Address)
⚫ If (block in memory) return value from memory Else Read Sector (disk
Address)
◼ Example: Write (disk Address)
⚫ If (block in memory) update value in memory
⚫ Else Allocate space in memory, read block from disk, and update value
in memory
◼ What should happen when we write to a cache?
⚫ write-through policy (write to all levels of memory containing the block,
including to disk). High reliability.
⚫ write-back policy (write only to the fastest memory containing the block,
write to slower memories and disk sometime later). Faster.
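
◼ A minimal sketch contrasting the two policies with a toy one-entry
cache; disk_read()/disk_write() are stand-ins for real device I/O:

#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE 512

void disk_read(int addr, char *buf);          /* stand-in prototypes */
void disk_write(int addr, const char *buf);

static struct {
    int  addr;
    bool valid, dirty;
    char data[BLOCK_SIZE];
} cache;                                       /* toy one-entry cache */

void cached_write(int addr, const char *buf, bool write_through)
{
    if (!(cache.valid && cache.addr == addr)) {
        if (cache.valid && cache.dirty)        /* evict: flush old block  */
            disk_write(cache.addr, cache.data);
        disk_read(addr, cache.data);           /* allocate: read block in */
        cache.addr = addr; cache.valid = true; cache.dirty = false;
    }
    memcpy(cache.data, buf, BLOCK_SIZE);       /* update value in memory  */
    if (write_through)
        disk_write(addr, cache.data);          /* reliable: on disk now   */
    else
        cache.dirty = true;                    /* faster: flushed later   */
}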



A typical process of Read Call
◼ Step 1- User process requests a read from a device.
◼ Step 2- OS checks if data is in a buffer. If not,
⚫ OS tells the device driver to perform input.
⚫ Device driver tells the DMA controller what to do and blocks itself.
⚫ DMA controller transfers the data to the kernel buffer when it has
all been retrieved from the device.
⚫ DMA controller interrupts the CPU when the transfer is complete.
◼ Step 3- OS transfers the data to the user process and places the
process in the ready queue.
◼ Step 4- When the process gets the CPU, it begins execution following
the system call.



Network Devices

◼ Varying enough from block and character devices to have their
own interface
◼ Linux, Unix, Windows and many others include socket
interface
⚫ Separates network protocol from network operation
⚫ Includes select() functionality
◼ Approaches vary widely (pipes, FIFOs, streams, queues,
mailboxes)



Clocks and Timers

◼ Provide current time, elapsed time, timer


◼ Normal resolution about 1/60 second
◼ Some systems provide higher-resolution timers
◼ Programmable interval timer used for timings, periodic
interrupts
◼ ioctl() (on UNIX) covers odd aspects of I/O such as
clocks and timers



Nonblocking and Asynchronous I/O
◼ Blocking - process suspended until I/O completed
⚫ Easy to use and understand
⚫ Insufficient for some needs
◼ Nonblocking - I/O call returns as much as available
⚫ User interface, data copy (buffered I/O)
⚫ Implemented via multi-threading
⚫ Returns quickly with count of bytes read or written
⚫ select() to find if data ready then read() or write()
to transfer
◼ Asynchronous - process runs while I/O executes
⚫ Difficult to use
⚫ I/O subsystem signals process when I/O completed
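
◼ A short sketch of the select()-then-read() pattern described above,
applied to standard input, using only standard POSIX calls:

#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

int main(void)
{
    char buf[256];
    fd_set readfds;
    struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };

    FD_ZERO(&readfds);
    FD_SET(STDIN_FILENO, &readfds);

    /* Wait at most 5 seconds for data to become ready */
    int n = select(STDIN_FILENO + 1, &readfds, NULL, NULL, &tv);
    if (n > 0 && FD_ISSET(STDIN_FILENO, &readfds)) {
        ssize_t got = read(STDIN_FILENO, buf, sizeof buf);
        printf("read %zd bytes\n", got);  /* returns quickly with count */
    } else if (n == 0) {
        printf("no data ready\n");
    }
    return 0;
}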



Two I/O Methods

Synchronous Asynchronous



Vectored I/O
◼ Vectored I/O allows one system call to perform multiple I/O
operations
◼ For example, Unix readv() accepts a vector of multiple
buffers to read into or write from
◼ This scatter-gather method is better than multiple individual I/O
calls
⚫ Decreases context switching and system call overhead
⚫ Some versions provide atomicity
 Avoids, for example, worry about multiple threads
changing data as reads / writes are occurring
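
◼ For instance, a standard readv() call gathering one read into two
buffers (POSIX as-is; data.bin is just an example input file):

#include <fcntl.h>
#include <stdio.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
    char header[16], body[64];
    struct iovec iov[2] = {
        { .iov_base = header, .iov_len = sizeof header },
        { .iov_base = body,   .iov_len = sizeof body   },
    };

    int fd = open("data.bin", O_RDONLY);   /* example input file */
    if (fd < 0) { perror("open"); return 1; }

    /* One system call fills both buffers in order (scatter input) */
    ssize_t n = readv(fd, iov, 2);
    printf("read %zd bytes into 2 buffers\n", n);
    close(fd);
    return 0;
}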



Kernel I/O Subsystem
◼ Scheduling
⚫ Some I/O request ordering via per-device queue
⚫ Some OSs try fairness
⚫ Some implement Quality Of Service (e.g., IPQoS)
◼ Buffering - store data in memory while transferring between devices
⚫ To cope with device speed mismatch
⚫ To cope with device transfer size mismatch
⚫ To maintain “copy semantics”
⚫ Double buffering – two copies of the data
 Kernel and user
 Varying sizes
 Full / being processed and not-full / being used
 Copy-on-write can be used for efficiency in some cases



Device-status Table



Sun Enterprise 6000 Device-Transfer Rates



Kernel I/O Subsystem

◼ Caching - faster device holding copy of data


⚫ Always just a copy
⚫ Key to performance
⚫ Sometimes combined with buffering
◼ Spooling - hold output for a device
⚫ If device can serve only one request at a time
⚫ e.g., printing
◼ Device reservation - provides exclusive access to a device
⚫ System calls for allocation and de-allocation
⚫ Watch out for deadlock



Error Handling

◼ OS can recover from disk read, device unavailable, transient
write failures
⚫ Retry a read or write, for example
⚫ Some systems more advanced – Solaris FMA, AIX
 Track error frequencies, stop using device with
increasing frequency of retry-able errors
◼ Most return an error number or code when I/O request fails
◼ System error logs hold problem reports



I/O Protection

◼ User process may accidentally or purposefully attempt to
disrupt normal operation via illegal I/O instructions
⚫ All I/O instructions defined to be privileged
⚫ I/O must be performed via system calls
 Memory-mapped and I/O port memory locations must
be protected too



Use of a System Call to Perform I/O



Kernel Data Structures
◼ Kernel keeps state info for I/O components, including open file
tables, network connections, character device state
◼ Many, many complex data structures to track buffers, memory
allocation, “dirty” blocks
◼ Some use object-oriented methods and message passing to
implement I/O
⚫ Windows uses message passing
 Message with I/O information passed from user mode
into kernel
 Message modified as it flows through to device driver
and back to process
 Pros / cons?



UNIX I/O Kernel Structure



Power Management
◼ Not strictly domain of I/O, but much is I/O related
◼ Computers and devices use electricity, generate heat, frequently
require cooling
◼ OSes can help manage and improve use
⚫ Cloud computing environments move virtual machines
between servers
 Can end up evacuating whole systems and shutting them
down
◼ Mobile computing has power management as first class OS
aspect



Power Management (Cont.)
◼ For example, Android implements
⚫ Component-level power management
 Understands relationship between components
 Build device tree representing physical device topology
 System bus -> I/O subsystem -> {flash, USB storage}
 Device driver tracks state of device, whether in use
 Unused component – turn it off
 All devices in tree branch unused – turn off branch
⚫ Wake locks – like other locks but prevent sleep of device when lock
is held
⚫ Power collapse – put a device into very deep sleep
 Marginal power use
 Only awake enough to respond to external stimuli (button
press, incoming call)



I/O Requests to Hardware Operations
◼ Consider reading a file from disk for a process:
⚫ Determine device holding file
⚫ Translate name to device representation
⚫ Physically read data from disk into buffer
⚫ Make data available to requesting process
⚫ Return control to process



Life Cycle of An I/O Request



STREAMS

◼ STREAM – a full-duplex communication channel between a
user-level process and a device in Unix System V and beyond
◼ A STREAM consists of:
⚫ STREAM head interfaces with the user process
⚫ driver end interfaces with the device
⚫ zero or more STREAM modules between them
◼ Each module contains a read queue and a write queue

◼ Message passing is used to communicate between queues


⚫ Flow control option to indicate available or busy
◼ Asynchronous internally, synchronous where user process
communicates with stream head



The STREAMS Structure



Performance

◼ I/O a major factor in system performance:


⚫ Demands CPU to execute device driver, kernel I/O
code
⚫ Context switches due to interrupts
⚫ Data copying
⚫ Network traffic especially stressful



Intercomputer Communications



How to Improve Performance of OS
◼ Reduce number of context switches
◼ Reduce data copying
◼ Reduce interrupts by using large transfers, smart controllers,
polling
◼ Use DMA
◼ Use smarter hardware devices
◼ Balance CPU, memory, bus, and I/O performance for highest
throughput
◼ Move user-mode processes / daemons to kernel threads



Disk Structure and
Disk Scheduling
(Part 1)



Storage-Device Hierarchy



Overview of Mass-Storage Structure
◼ The bulk of secondary storage for modern computers is provided by
⚫ Hard disk drives (HDD)
⚫ Nonvolatile Memory (NVM)
◼ The most common tertiary storage is magnetic tape.
◼ In this chapter we describe:
⚫ The basic mechanism of those devices
⚫ How operating systems translate their physical
properties to logical storage via address mapping.



Hard Disk Drive Moving-head Mechanism

◼ Platters range from .85” to 14” (historically)


⚫ Commonly 3.5”, 2.5”, and 1.8”
◼ Range from gigabytes through terabytes per drive



Hard Disk Drives
◼ HDDs rotate at 60 to 250 times per second
◼ Transfer rate is rate at which data flow between drive and
computer
◼ Positioning time (random-access time) is time to move disk
arm to desired cylinder (seek time) and time for desired sector to
rotate under the disk head (rotational latency)
◼ Head crash results from disk head making contact with the disk
surface
⚫ That’s bad
◼ Disks can be removable
◼ Other types of storage media include CDs, DVDs, Blu-ray discs,
and magnetic tape



A 3.5" HDD with Cover Removed.



The First Commercial Disk Drive

1956
IBM RAMAC computer included the
IBM Model 350 disk storage system

5 million (7 bit) characters


50 x 24” platters
Access time < 1 second



Performance of Magnetic Disks
◼ Performance
⚫ Transfer Rate – theoretical – 6 Gb/sec
⚫ Effective Transfer Rate – real – 1Gb/sec
⚫ Seek time from 3ms to 12ms – 9ms common for desktop drives
⚫ Average seek time measured or calculated based on 1/3 of tracks
⚫ Latency based on spindle speed
 1 / (RPM / 60) = 60 / RPM
⚫ Average latency = ½ of a full rotation
◼ From Wikipedia
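
◼ As a worked instance of these formulas, for a 7200 RPM drive:

\text{full rotation} = \frac{60}{7200}\,\text{s} \approx 8.33\,\text{ms},
\qquad
\text{average latency} = \tfrac{1}{2} \times 8.33\,\text{ms} \approx 4.17\,\text{ms}

which is the 4.17ms figure used in the example on the next slide.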



Performance of Magnetic Disks (Cont.)

◼ Access Latency = Average access time = average seek
time + average latency
⚫ For fastest disk 3ms + 2ms = 5ms
⚫ For slow disk 9ms + 5.56ms = 14.56ms
◼ Average I/O time = average access time + (amount to
transfer / transfer rate) + controller overhead
◼ For example to transfer a 4KB block on a 7200 RPM disk
with a 5ms average seek time, 1Gb/sec transfer rate with a
.1ms controller overhead =
⚫ 5ms + 4.17ms + 0.1ms + transfer time
⚫ Transfer time = 4KB / 1Gb/s × 8Gb/GB × 1GB/1024²KB
= 32 / (1024²) s ≈ 0.031 ms
⚫ Average I/O time for 4KB block = 9.27ms + 0.031ms =
9.301ms



Disk Structure
◼ Disk drives are addressed as large 1-dimensional arrays of logical
blocks, where the logical block is the smallest unit of transfer
⚫ Low-level formatting creates logical blocks on physical media
◼ The 1-dimensional array of logical blocks is mapped into the
sectors of the disk sequentially
⚫ Sector 0 is the first sector of the first track on the outermost
cylinder
⚫ Mapping proceeds in order through that track, then the rest of
the tracks in that cylinder, and then through the rest of the
cylinders from outermost to innermost
⚫ Logical to physical address should be easy
 Except for bad sectors
 Non-constant # of sectors per track via constant angular
velocity
⚫ Each sector 512B or 4KB – smallest I/O the drive can do
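
◼ A hedged sketch of the sequential logical-to-physical mapping just
described, assuming an idealized drive with a constant number of
sectors per track and no bad-sector remapping:

/* Idealized geometry: real drives vary sectors per track and remap
 * bad sectors, so this is only the textbook mapping. */
struct chs { int cylinder, head, sector; };

struct chs lba_to_chs(int lba, int heads, int sectors_per_track)
{
    struct chs p;
    p.cylinder = lba / (heads * sectors_per_track);
    p.head     = (lba / sectors_per_track) % heads;
    p.sector   = lba % sectors_per_track;   /* sector within the track */
    return p;
}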



Nonvolatile Memory (NVM)
◼ NVM is electrical vs. mechanical (HDD)
◼ Commonly composed of a controller and flash NAND die
semiconductor chips (a die is a small block of semiconducting
material on which a given functional circuit is fabricated)
⚫ Other forms include DRAM with battery backup, non-NAND
die like 3D XPoint
◼ Flash-memory-based NVM is frequently used in disk-drive like
container -> solid-state disk (SSD)

◼ Also can be in other formats like USB drive



NVM (Cont.)
◼ In all forms, acts and treated the same way
◼ More reliable than HDD (no moving parts), can be faster (no
seek time or latency), consumes less power
◼ More expensive per MB, lower capacity, may have shorter
lifespan (writes wear it out)
◼ Higher speed means new connection methods
⚫ Direct to PCIe bus (called NVMe)
◼ Can be used in variety of ways – replacement for disk, caching
tier, etc



NVM Details
◼ Read and written in “page” increments
⚫ Page size varies based on device
◼ Cannot overwrite data, must erase it first
◼ Erase occurs in “block” increments composed of several
pages
◼ Erase operation takes much longer than read or write
◼ NAND wears out a bit with every erase
⚫ ~100,000 program-erase cycles until cells no longer
retain data



NVM Device Controller
◼ Several algorithms, usually implemented in NVM device
controller
⚫ So operating system blissfully just reads and writes
blocks and device deals with the physics
⚫ But can impact performance, so worth knowing about
◼ NAND block with valid and invalid pages



NVM Controller Algorithms
◼ NAND cannot be overwritten, therefore:
⚫ There are usually pages containing invalid data.
⚫ To track which logical blocks contain the valid data, the
controller maintains a Flash Translation Layer (FTL).
⚫ This table maintains the mapping of which physical page
contains the currently valid logical block.
⚫ This table is also used to track physical block state
(does the block contain only invalid pages and can it
therefore be erased?).
⚫ Each logical block can have many versions, each stored
in a physical page, one valid and the others invalid.
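
◼ A toy sketch of an FTL mapping table, assuming fixed-size arrays
and a naive log-structured page allocator; real FTLs add caching,
journaling, and wear-leveling state:

#include <stdint.h>

#define N_LOGICAL 1024
#define INVALID   UINT32_MAX

/* logical block -> physical page currently holding its valid data */
static uint32_t ftl[N_LOGICAL];
static uint32_t next_free_page;   /* naive bump allocator */

void ftl_init(void)
{
    for (int i = 0; i < N_LOGICAL; i++)
        ftl[i] = INVALID;         /* nothing written yet */
}

/* A write never overwrites in place: it goes to a fresh page, and the
 * page holding the old version (if any) becomes invalid, to be
 * reclaimed later by garbage collection. */
uint32_t ftl_write(uint32_t lblock)
{
    uint32_t new_page = next_free_page++;
    ftl[lblock] = new_page;
    return new_page;
}

uint32_t ftl_read(uint32_t lblock)
{
    return ftl[lblock];           /* INVALID if never written */
}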



Full SSD Controller Algorithms
◼ Full SSD (all pages written to, some hold valid data and others
invalid)
⚫ Logically space is available, but physically there is nowhere to write data
⚫ Garbage collection copies good data from blocks with mix
of valid and invalid pages to other locations, freeing up
blocks to erase
 But if device full, nowhere to copy good pages
 Over-provisioning (20% of media set aside) as target
for GC writes
 As blocks invalid or made invalid by GC are erased,
placed into overprovision pool if device full or returned
to free pool



NVM Controller Algorithms

◼ Overprovision helps with wear leveling


⚫ Want media to wear out at the same time, not have a
hot spot (repeated writes) wear out early
⚫ Track # of writes per block; GC and over-provisioning
space are used to direct writes to less-worn blocks
◼ Data on NVM protected via ECC
⚫ Too many errors and the block is marked bad and no longer used
⚫ Uncorrectable errors need to be recovered via RAID



Magnetic Tape

◼ Was early secondary-storage medium


⚫ Evolved from open spools to cartridges
◼ Relatively permanent and holds large quantities of data
◼ Access time slow
◼ Random access ~1000 times slower than disk
◼ Mainly used for backup, storage of infrequently-used data,
transfer medium between systems



Magnetic Tape

◼ Kept in spool and wound or rewound past read-write
head
◼ Once data under head, transfer rates comparable to
disk
⚫ 140MB/sec and greater
◼ GB to TB typical storage
◼ Common technologies are LTO-{5,6}, SDLT, 4, 8, and
19mm, and ¼ and ½” widths



Secondary Storage Connection Methods

◼ Host-attached storage accessed through I/O ports talking to
I/O busses
◼ Several bus technologies including advanced technology
attachment (ATA), serial ATA (SATA), eSATA, universal
serial bus (USB), fibre channel (FC), serial attached SCSI
(SAS)
◼ Data transfers are carried out by special electronic processors:
controllers
⚫ Host controller is in computer, device controller built into
storage device
⚫ Talk to each other, usually via memory-mapped I/O ports
◼ Device controllers have built in caches
⚫ Data transfer from media into cache, then over bus to host
(and vice versa)



Address Mapping

◼ Storage devices addressed as one-dimensional array of logical
blocks
⚫ Logical block is smallest unit of transfer
⚫ Each logical block maps to physical sectors or pages on
device
 For example, logical block 0 might be sector 0 on cylinder
0 on platter 0 of an HDD
⚫ Easier to use logical address <0> - <N> than address
containing <sector, cylinder, head> or <chip, block, page>
⚫ Also, defective sectors / blocks can be mapped out of use by
logical to physical mapping and logical addresses still
sequential
⚫ Number of sectors per track is not constant on some drives,
also requiring logical addressing to hide complexity



Address Mapping
◼ Some media / devices use constant linear velocity
(CLV) (CD, DVD)
⚫ Density of bits per track uniform
⚫ Farther a track from the center of the disk -> greater
its length -> more sectors
⚫ Drive decreases rotational speed as the head moves
outward to keep the same rate of data under the
read/write head
◼ Alternatively disk rotation speed can stay constant –
constant angular velocity (CAV) (HDD)
⚫ Density of bits decreases from inner to outer tracks
⚫ (Blu-ray drives can do both CAV and CLV depending
on media etc.)



Disk Structure and
Disk Scheduling
(Part 2)



Disk Scheduling
◼ The operating system is responsible for using hardware
efficiently — for the disk drives, this means having a fast
access time and disk bandwidth
◼ Minimize seek time
◼ Seek time ≈ seek distance
◼ Disk bandwidth is the total number of bytes transferred,
divided by the total time between the first request for
service and the completion of the last transfer



Disk Scheduling (Cont.)
◼ There are many sources of disk I/O request
⚫ OS
⚫ System processes
⚫ User processes
◼ I/O request includes input or output mode, disk address,
memory address, number of sectors to transfer
◼ OS maintains queue of requests, per disk or device
◼ Idle disk can immediately work on I/O request; busy disk
means work must be queued.
⚫ Optimization algorithms only make sense when a
queue exists



Disk Scheduling (Cont.)
◼ Note that drive controllers have small buffers and can manage a
queue of I/O requests (of varying “depth”)
◼ Several algorithms exist to schedule the servicing of disk I/O
requests
◼ The analysis is true for one or many platters
◼ We illustrate scheduling algorithms with a request queue (0-
199)

98, 183, 37, 122, 14, 124, 65, 67

Head pointer 53

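◼ A short C sketch that reproduces the totals quoted on the next two
slides (640 cylinders for FCFS, 236 for SSTF) from this queue:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static const int queue[] = {98, 183, 37, 122, 14, 124, 65, 67};
enum { N = 8, START = 53 };

static int fcfs(void)
{
    int head = START, total = 0;
    for (int i = 0; i < N; i++) {
        total += abs(queue[i] - head);   /* serve strictly in arrival order */
        head = queue[i];
    }
    return total;
}

static int sstf(void)
{
    bool done[N] = {false};
    int head = START, total = 0;
    for (int i = 0; i < N; i++) {
        int best = -1;
        for (int j = 0; j < N; j++)      /* pick nearest pending request */
            if (!done[j] &&
                (best < 0 || abs(queue[j] - head) < abs(queue[best] - head)))
                best = j;
        total += abs(queue[best] - head);
        head = queue[best];
        done[best] = true;
    }
    return total;
}

int main(void)
{
    printf("FCFS: %d\n", fcfs());   /* prints 640 */
    printf("SSTF: %d\n", sstf());   /* prints 236 */
    return 0;
}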


FCFS
Illustration shows total head movement of 640 cylinders



SSTF
◼ Shortest Seek Time First -- selects the request with the
minimum seek time from the current head position
◼ SSTF scheduling is a form of SJF scheduling; may cause
starvation of some requests
◼ Illustration shows total head movement of 236 cylinders



SCAN

◼ SCAN algorithm: The disk arm starts at one end of the
disk and moves toward the other end, servicing requests
until it gets to the other end of the disk, where the head
movement is reversed and servicing continues.
◼ Sometimes it is called the elevator algorithm.
◼ Illustration shows total head movement of 236 cylinders
(53 down to 0, then up to 183).
◼ But note that if requests are uniformly dense, largest density
at other end of disk and those wait the longest



SCAN (Cont.)



C-SCAN

◼ Provides a more uniform wait time than SCAN


◼ The head moves from one end of the disk to the other,
servicing requests as it goes
⚫ When it reaches the other end, however, it immediately
returns to the beginning of the disk, without servicing
any requests on the return trip
◼ Treats the cylinders as a circular list that wraps around from
the last cylinder to the first one



C-SCAN (Cont.)



LOOK and C-LOOK
◼ LOOK is a version of SCAN. Arm only goes as far as
the last request in each direction, then reverses direction
immediately, without first going all the way to the end of
the disk
◼ C-LOOK a version of C-SCAN. Arm only goes as far as
the last request in one direction, then reverses direction
immediately, without first going all the way to the end of
the disk



C-LOOK (Cont.)



Selecting a Disk-Scheduling Algorithm

◼ SSTF is common and has a natural appeal


◼ SCAN and C-SCAN perform better for systems that place
a heavy load on the disk
⚫ Less starvation
◼ Performance depends on the number and types of
requests
◼ Requests for disk service can be influenced by the file-
allocation method
⚫ And metadata layout



Disk-Scheduling Algorithm
Practice Problem
◼ A disk contains 200 tracks (i.e., 0-199). With a given request queue of
track nos. 82, 170, 43, 140, 24, 16, and 190, calculate the total number of
track movements (i.e., seek time) by the R/W head with the help of the
following disk scheduling algorithms. Assume that the current position of
the R/W head is 50.
◼ FCFS
◼ SSTF
◼ SCAN (Move towards right)
◼ SCAN (Move towards left)
◼ CSCAN (Move towards right)
◼ CSCAN (Move towards left)
◼ LOOK (Move towards right)
◼ LOOK (Move towards left)
◼ CLOOK (Move towards right)
◼ CLOOK (Move towards left)
◼ If the R/W head takes 1ns to move from one track to another, then
calculate the total time in each case.
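
◼ A sketch for the SCAN and LOOK cases of this exercise (FCFS and
SSTF follow the pattern of the earlier sketch, and the C-variants add
the wrap-around jump). The closed-form expressions assume, as here,
that requests exist on both sides of the starting head position:

#include <stdio.h>

static const int q[] = {82, 170, 43, 140, 24, 16, 190};
enum { N = 7, START = 50, MAXTRK = 199 };

static void bounds(int *lo, int *hi)   /* lowest / highest request */
{
    *lo = *hi = q[0];
    for (int i = 1; i < N; i++) {
        if (q[i] < *lo) *lo = q[i];
        if (q[i] > *hi) *hi = q[i];
    }
}

static int scan(int right)   /* go to the disk edge, then reverse */
{
    int lo, hi; bounds(&lo, &hi);
    return right ? (MAXTRK - START) + (MAXTRK - lo)
                 : START + hi;
}

static int look(int right)   /* reverse at the last request instead */
{
    int lo, hi; bounds(&lo, &hi);
    return right ? (hi - START) + (hi - lo)
                 : (START - lo) + (hi - lo);
}

int main(void)
{
    printf("SCAN right: %d  SCAN left: %d\n", scan(1), scan(0));
    printf("LOOK right: %d  LOOK left: %d\n", look(1), look(0));
    return 0;
}

◼ With 1ns per track movement, the total time in each case is simply
the track-movement count multiplied by 1ns.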



Disk-Scheduling Algorithm (Cont.)

◼ The disk-scheduling algorithm should be written as a
separate module of the operating system, allowing it to be
replaced with a different algorithm if necessary
◼ Either SSTF or LOOK is a reasonable choice for the default
algorithm
◼ What about rotational latency?
⚫ Difficult for OS to calculate
◼ How does disk-based queuing affect OS queue ordering
efforts?



Disk-Scheduling Algorithm (Cont.)

◼ Problem 1
◼ Suppose a disk has 201 cylinders, numbered from 0 to 200. At some time
the disk arm is at cylinder 100, and there is a queue of disk access requests
for cylinders 30, 85, 90, 100, 105, 110, 135 and 145. If Shortest-Seek Time
First (SSTF) is being used for scheduling the disk access, the request for
cylinder 90 is serviced after servicing ______ requests.
(A) 1
(B) 2
(C) 3
(D) 4



Disk-Scheduling Algorithm (Cont.)

◼ Problem 1
◼ Suppose a disk has 201 cylinders, numbered from 0 to 200. At some time
the disk arm is at cylinder 100, and there is a queue of disk access requests
for cylinders 30, 85, 90, 100, 105, 110, 135 and 145. If Shortest-Seek Time
First (SSTF) is being used for scheduling the disk access, the request for
cylinder 90 is serviced after servicing ______ requests.
(A) 1
(B) 2
(C) 3
(D) 4
◼ Answer
◼ The disk will service that request first whose cylinder number is closest to its
arm. Hence 1st serviced request is for cylinder no 100 ( as the arm is itself
pointing to it ), then 105, then 110, and then the arm comes to service
request for cylinder 90. Hence before servicing request for cylinder 90, the
disk would have serviced 3 requests.
◼ Hence option C.





Disk-Scheduling Algorithm (Cont.)
◼ Problem 2
◼ Suppose the following disk request sequence (track numbers) for a disk
with 100 tracks is given: 45, 20, 90, 10, 50, 60, 80, 25, 70. Assume that the
initial position of the R/W head is on track 50. The additional distance that
will be traversed by the R/W head when the Shortest Seek Time First
(SSTF) algorithm is used compared to the SCAN (Elevator) algorithm
(assuming that SCAN algorithm moves towards 100 when it starts
execution) is ______ tracks

(A) 8 (B) 9 (C) 10 (D) 11



Disk-Scheduling Algorithm (Cont.)
◼ Problem 2
◼ Suppose the following disk request sequence (track numbers) for a disk
with 100 tracks is given: 45, 20, 90, 10, 50, 60, 80, 25, 70. Assume that the
initial position of the R/W head is on track 50. The additional distance that
will be traversed by the R/W head when the Shortest Seek Time First
(SSTF) algorithm is used compared to the SCAN (Elevator) algorithm
(assuming that SCAN algorithm moves towards 100 when it starts
execution) is ______ tracks

(A) 8 (B) 9 (C) 10 (D) 11


◼ Solution
◼ Given a disk with 100 tracks and sequence 45, 20, 90, 10, 50, 60, 80, 25,
70. Initial position of the R/W head is on track 50.
◼ By using SSTF, requests are served and the total distance traveled is
130
◼ If simple SCAN is used, requests are served and the total distance
traveled is 140
◼ Hence less distance traveled by SSTF = 140 - 130 = 10
◼ So answer is C
Disk-Scheduling Algorithm (Cont.)

◼ Problem 3
◼ Consider an operating system capable of loading and executing a
single sequential user process at a time. The disk head scheduling
algorithm used is First Come First Served (FCFS). If FCFS is replaced
by Shortest Seek Time First (SSTF), claimed by the vendor to give 50%
better benchmark results, what is the expected improvement in the I/O
performance of user programs?
(A) 50% (B) 40% (C) 25% (D) 0%



Disk-Scheduling Algorithm (Cont.)

◼ Problem 3
◼ Consider an operating system capable of loading and executing a
single sequential user process at a time. The disk head scheduling
algorithm used is First Come First Served (FCFS). If FCFS is replaced
by Shortest Seek Time First (SSTF), claimed by the vendor to give 50%
better benchmark results, what is the expected improvement in the I/O
performance of user programs?
(A) 50% (B) 40% (C) 25% (D) 0%
◼ Answer
◼ Since the operating system can execute only a single sequential user
process at a time, the disk is always accessed in FCFS manner. The OS
never has a choice to pick an I/O from multiple I/Os, as there is always
only one I/O at a time.
◼ Hence answer is D



Disk-Scheduling Algorithm (Cont.)

◼ Problem 4
◼ Suppose that we want to store a file with 60,000 fixed-length data records
where each record requires 80 bytes and records are not allowed to span
two sectors. How many cylinders are required for this file?
◼ The disk drive has the following specifications:
⚫ bytes per sector = 512
⚫ tracks per cylinder = 16
⚫ sectors per track = 63
⚫ cylinders = 1654



Disk-Scheduling Algorithm (Cont.)

◼ Problem 4
◼ Suppose that we want to store a file with 60,000 fixed-length data records
where each record requires 80 bytes and records are not allowed to span
two sectors. How many cylinders are required for this file?
◼ The disk drive has the following specifications:
⚫ bytes per sector = 512
⚫ tracks per cylinder = 16
⚫ sectors per track = 63
⚫ cylinders = 1654
◼ Answer:
⚫ Each sector can hold 512/80 = 6 records
⚫ The file requires 60,000/6 = 10,000 sectors
⚫ One cylinder can hold 63 × 16 = 1008 sectors
⚫ So the number of cylinders required is 10,000/1008 ≈ 9.93, i.e.,
10 cylinders (rounding up).



Network-Attached Storage
◼ Network-attached storage (NAS) is storage made available
over a network rather than over a local connection (such as
a bus)
⚫ Remotely attaching to file systems
⚫ Manages storage, provides data management, has
network interfaces, and talks network storage protocols
◼ NFS and CIFS are common protocols
◼ Implemented via remote procedure calls (RPCs) between
host and storage, typically over TCP or UDP on an IP network



Network-Attached Storage (Cont.)

◼ iSCSI protocol uses IP network to carry the SCSI protocol


⚫ Remotely attaching to devices (blocks)



Cloud Storage
◼ Similar to NAS
⚫ Provides storage across the network
⚫ Usually not owned by the company or user, but
provided for a fee (based on time, storage capacity
used, I/O done, etc.)
⚫ But across a WAN rather than a LAN
⚫ Frequently slower and more prone to connection
interruption than NAS, so CIFS, NFS, and iSCSI are
possible but less used
⚫ Frequently use their own APIs, and apps that use those
APIs to do I/O
 Dropbox, Microsoft OneDrive, Apple iCloud, etc.



Storage Area Network
◼ Common in large storage environments
◼ Multiple hosts attached to multiple storage arrays - flexible



Reliability and Redundancy
◼ Mean time to failure. The average time it takes a disk to fail.
◼ Mean time to repair. The time it takes (on average) to replace a
failed disk and restore the data on it.
◼ Mirroring. Copy of a disk is duplicated on another disk.
⚫ Consider a disk with a 100,000-hour mean time to failure and a
10-hour mean time to repair
 If the mirrored disks fail independently, the mean time to data loss
is 100,000² / (2 × 10) = 500 × 10⁶ hours, or about 57,000 years
◼ Several improvements in disk-use techniques involve the use of
multiple disks working cooperatively



Redundant Array of
Independent Disks(RAID)



RAID
◼ What is RAID?

◼ RAID is an acronym for:

Redundant Array of Independent Disks
or
Redundant Array of Inexpensive Disks.
◼ In fact, RAID is a way of combining several independent and
relatively small disks into a single large storage unit.
◼ The disks included into the array are called array members. The
disks can be combined into the array in different ways which are
known as RAID levels.



RAID
◼ Key evaluation points for a RAID System

◼ Reliability: How many disk faults can the system tolerate?

◼ Availability: What fraction of the total session time is a system
in uptime mode, i.e. how available is the system for actual use?
◼ Performance: How good is the response time? How high is the
throughput (rate of processing work)? Note that performance
contains a lot of parameters and not just the two.
◼ Capacity: Given a set of N disks each with B blocks, how much
useful capacity is available to the user?



RAID Characteristics
◼ Each of RAID levels has its own characteristics of:
⚫ Fault tolerance, which is the ability to survive one or more
disk failures.
⚫ Performance which shows the change in the read and write
speed of the entire array as compared to a single disk.
⚫ The capacity of the array which is determined by the amount of
user data that can be written to the array. The array capacity
depends on the RAID level and does not always match the sum of
the sizes of the RAID member disks. To calculate the capacity of
the particular RAID type and a set of the member disks you can
use a free online RAID calculator.



RAID Organisation
◼ Two independent aspects are clearly distinguished in the
RAID organization.
⚫ The organization of data in the array (RAID storage techniques:
striping, mirroring, parity, combination of them).
⚫ Implementation of each particular RAID installation -
hardware or software.



RAID Storage Structure
◼ The main methods of storing data in the array are:
⚫ Striping - splitting the flow of data into blocks of a certain size
(called "block size") and then writing these blocks across the RAID
one by one. This way of data storage affects performance.
⚫ Mirroring - is a storage technique in which the identical copies of
data are stored on the RAID members simultaneously. This type
of data placement affects the fault tolerance as well as the
performance.
⚫ Parity - a storage technique which utilizes striping and
checksum (error checking) methods. In the parity technique, a certain
parity function is calculated for the data blocks. If a drive fails, the
missing blocks are recalculated from the checksum, providing the
RAID fault tolerance.
◼ All the existing RAID types are based on striping, mirroring, parity, or
combination of these storage techniques.



RAID LEVELS
◼ RAID 0 (stripes) (also known as a stripe set or striped volume) splits
("stripes") data evenly across two or more disks,
without parity information, redundancy, or fault tolerance.

RAID 0

Disk 1 Disk 2 Disk 3 Disk 4


A1 A2 A3 A4
B1 B2 B3 B4
C1 C2 C3 C4
D1 D2 D3 D4
E1 E2 E3 E4
◼ Evaluation:
◼ Reliability: 0
There is no duplication of data. Hence, a block once lost cannot be recovered.
◼ Capacity: N*B
The entire space is being used to store data. Since there is no duplication, N disks each
having B blocks are fully utilized.



RAID LEVELS
◼ RAID 1 (Mirroring): Blocks are mirrored (duplicated) across disks.

RAID 1

Disk 1 Disk 2
A1 A1
B1 B1
C1 C1
D1 D1
E1 E1

◼ Evaluation:
◼ Assume a RAID system with mirroring level 2.
◼ Reliability: 1 to N/2
Capacity: N*B/2



RAID LEVELS
◼ RAID 1+0 (Mirrored and Striped):
Blocks are first mirrored and then striped. Otherwise known
as nested RAID
RAID 0

RAID 1 RAID 1

Disk 1 Disk 2 Disk 3 Disk 4


A1 A1 A2 A2
B1 B1 B2 B2
C1 C1 C2 C2
D1 D1 D2 D2
E1 E1 E2 E2

◼ Evaluation:
◼ Assume a RAID system with mirroring level 1+0.
◼ Reliability: 1 to N/2 Capacity: N*B/2



RAID LEVELS

◼ RAID-2 consists of bit-level striping using a Hamming Code parity.


◼ RAID-3 consists of byte-level striping with a dedicated parity.

These two are nowadays obsolete and less commonly used.



RAID LEVELS
◼ RAID 4 (Block-Level Striping with Dedicated Parity) : Instead of
duplicating data, this adopts a parity-based approach.
◼ Parity is calculated using a simple XOR function. If the data bits are
0,0,0,1 the parity bit is XOR(0,0,0,1) = 1. If the data bits are 0,1,1,0
the parity bit is XOR(0,1,1,0) = 0. A simple approach is that even
number of ones results in parity 0, and an odd number of ones
results in parity 1.
RAID 4

Disk 1 Disk 2 Disk 3 Disk 4 Disk 5

A1 A2 A3 A4 Ap
B1 B2 B3 B4 Bp
C1 C2 C3 C4 Cp
D1 D2 D3 D4 Dp
E1 E2 E3 E4 Ep

Parity example:
D1 D2 D3 D4 P
0  0  0  1  1
0  1  1  0  0

◼ Evaluation:
⚫ Reliability: 1
⚫ Capacity: (N-1)*B
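
◼ A small sketch of XOR parity over four data blocks, showing how a
lost block is rebuilt from the survivors plus parity (block size is
shrunk to 4 bytes just for the example):

#include <stdint.h>
#include <stdio.h>

enum { NDATA = 4, BLK = 4 };   /* tiny blocks for the example */

/* parity[i] = XOR of byte i across all data blocks */
static void compute_parity(uint8_t d[NDATA][BLK], uint8_t parity[BLK])
{
    for (int i = 0; i < BLK; i++) {
        parity[i] = 0;
        for (int j = 0; j < NDATA; j++)
            parity[i] ^= d[j][i];
    }
}

/* Rebuild a failed block as the XOR of parity and surviving blocks */
static void rebuild(uint8_t d[NDATA][BLK], const uint8_t parity[BLK],
                    int failed)
{
    for (int i = 0; i < BLK; i++) {
        uint8_t v = parity[i];
        for (int j = 0; j < NDATA; j++)
            if (j != failed)
                v ^= d[j][i];
        d[failed][i] = v;
    }
}

int main(void)
{
    uint8_t d[NDATA][BLK] = {{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}};
    uint8_t p[BLK];
    compute_parity(d, p);
    for (int i = 0; i < BLK; i++) d[2][i] = 0;   /* simulate disk 3 failing */
    rebuild(d, p, 2);
    printf("%d %d %d %d\n", d[2][0], d[2][1], d[2][2], d[2][3]); /* 9 10 11 12 */
    return 0;
}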
RAID LEVELS
◼ RAID 5 (Block-Level Striping with Distributed Parity) : This is a
slight modification of the RAID-4 system where the only difference is
that the parity rotates among the drives.
RAID 5

Disk 1 Disk 2 Disk 3 Disk 4 Disk 5


A1 A2 A3 A4 Ap
B1 B2 BP B3 B4
C1 Cp C2 C3 C4
Dp D1 D2 D3 D4
E1 E2 E3 Ep E4
◼ Evaluation:
⚫ Reliability: 1
⚫ Capacity: (N-1)*B



RAID LEVELS
◼ RAID 6 (Block-Level Striping with Two Distributed Parity Blocks): This
is an extension of RAID-5 with a second, independent parity block per
stripe, so the array can survive the failure of two drives.
RAID 6

Disk 1 Disk 2 Disk 3 Disk 4 Disk 5


A1 A2 A3 Ap Aq
B1 B2 BP Bq B3
C1 Cp Cq C2 C3
Dp Dq D1 D2 D3
E1 Ep E2 Eq E3
◼ Evaluation:
⚫ Reliability: 2
⚫ Capacity: (N-2)*B



RAID Levels

