
I/O Management – Part 1

Department of CSE/IT, PSIT, Kanpur


Common Overview

◼ I/O management is a major component of operating system
design and operation
⚫ Important aspect of computer operation
⚫ I/O devices vary greatly
⚫ Various methods to control them
⚫ Performance management
⚫ New types of devices appear frequently
◼ Ports, busses, device controllers connect to various devices
◼ Device drivers encapsulate device details
⚫ Present uniform device-access interface to I/O subsystem



I/O Hardware Devices
◼ Incredible variety of I/O devices
⚫ Storage
⚫ Transmission
⚫ Human-interface
◼ Common concepts – signals from I/O devices interface with computer
⚫ Port – connection point for device
⚫ Bus - daisy chain or shared direct access
 PCI bus common in PCs and servers, PCI Express (PCIe)
 expansion bus connects relatively slow devices
⚫ Controller (host adapter) – electronics that operate port, bus, device
 Sometimes integrated
 Sometimes separate circuit board (host adapter)
 Contains processor, microcode, private memory, bus controller, etc
– Some talk to per-device controller with bus controller, microcode,
memory, etc



I/O Hardware Devices (Cont.)
◼ I/O instructions control devices
◼ Devices usually have registers where device driver places
commands, addresses, and data to write, or read data from
registers after command execution
⚫ Data-in register, data-out register, status register, control
register
⚫ Typically 1-4 bytes, or FIFO buffer
◼ Devices have addresses, used by
⚫ Direct I/O instructions
⚫ Memory-mapped I/O
 Device data and command registers mapped to
processor address space
 Especially for large address spaces (graphics)



A Typical PC Bus Structure



Device I/O Port Locations on PCs (partial)



Polling
◼ For each byte of I/O
1. Read busy bit from status register until 0
2. Host sets read or write bit and if write copies data into data-out
register
3. Host sets command-ready bit
4. Controller sets busy bit, executes transfer
5. Controller clears busy bit, error bit, command-ready bit when
transfer done
◼ Step 1 is busy-wait cycle to wait for I/O from device
⚫ Reasonable if device is fast
⚫ But inefficient if device slow
⚫ CPU switches to other tasks?
 But if miss a cycle data overwritten / lost
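
◼ A minimal C sketch of this handshake, assuming a hypothetical
memory-mapped device with one-byte status, control, and data-out
registers (the register layout and bit positions are illustrative,
not from any real controller):

#include <stdint.h>

/* Hypothetical register layout and bit masks -- illustrative only */
#define STATUS_BUSY   0x01
#define CTRL_WRITE    0x01
#define CTRL_CMD_RDY  0x02

typedef struct {
    volatile uint8_t status;    /* busy/error bits (set by controller)     */
    volatile uint8_t control;   /* write, command-ready bits (set by host) */
    volatile uint8_t data_out;  /* host -> device data                     */
} dev_regs;

/* Write one byte using the five-step polling handshake above */
void polled_write_byte(dev_regs *r, uint8_t byte)
{
    while (r->status & STATUS_BUSY)   /* 1. busy-wait until busy bit is 0 */
        ;
    r->control |= CTRL_WRITE;         /* 2. set write bit ...             */
    r->data_out = byte;               /*    ... and copy data to data-out */
    r->control |= CTRL_CMD_RDY;       /* 3. set command-ready bit         */
    while (r->status & STATUS_BUSY)   /* 4-5. controller sets busy, does  */
        ;                             /*      the transfer, clears bits   */
}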



Interrupts
◼ Polling can happen in 3 instruction cycles
⚫ Read status, logical-and to extract status bit, branch if not zero
⚫ How to be more efficient if non-zero infrequently?
◼ CPU Interrupt-request line triggered by I/O device
⚫ Checked by processor after each instruction
◼ Interrupt handler receives interrupts
⚫ Maskable to ignore or delay some interrupts
◼ Interrupt vector to dispatch interrupt to correct handler
⚫ Context switch at start and end
⚫ Based on priority
⚫ Some nonmaskable
⚫ Interrupt chaining if more than one device at same interrupt
number
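
◼ A sketch of vector dispatch with chaining, assuming a simplified
table in which each vector number holds a linked list of handlers
sharing that interrupt line (all names here are invented for
illustration):

#include <stdbool.h>
#include <stddef.h>

#define NVECTORS 256

/* Handlers sharing a vector are chained; each returns true only if
 * its device actually raised the interrupt. */
struct handler {
    bool (*service)(void);
    struct handler *next;
};

static struct handler *vector_table[NVECTORS];

/* Called by low-level interrupt entry code with the vector number */
void dispatch_interrupt(int vector)
{
    for (struct handler *h = vector_table[vector]; h != NULL; h = h->next)
        if (h->service())   /* walk the chain until a handler claims it */
            return;
    /* no handler claimed it: spurious interrupt */
}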



Interrupts (Cont.)

◼ Interrupt mechanism also used for exceptions


⚫ Terminate process, crash system due to hardware error
◼ Page-fault handler executes on a memory-access error
◼ System call executes via trap, triggering the kernel to execute the
request
◼ Multi-CPU systems can process interrupts concurrently
⚫ If operating system designed to handle it
◼ Used for time-sensitive processing, frequent, must be fast



Interrupt-Driven I/O Cycle

◼ Rather than using busy waiting, the device can interrupt the
CPU when it completes an I/O operation.
◼ On an I/O interrupt:
⚫ Determine which device caused the interrupt.
⚫ If the last command was an input operation, retrieve the
data from the device register.
⚫ Start the next operation for that device.



Intel Pentium Processor Event-Vector Table



I/O Services provided by OS
◼ Naming of files and devices. (On Unix, devices appear as files in the
/dev directory)
◼ Access control.
◼ Operations appropriate to the files and devices.
◼ Device allocation.
◼ Buffering, caching, and spooling to allow efficient communication
with devices.
◼ I/O scheduling.
◼ Error handling and failure recovery associated with devices
(command retries, for example).
◼ Device drivers to implement device-specific behaviors.



Direct Memory Access
◼ Used to avoid programmed I/O (one byte at a time) for large data
movement
◼ Requires DMA controller
◼ Bypasses CPU to transfer data directly between I/O device and
memory
◼ OS writes DMA command block into memory
⚫ Source and destination addresses
⚫ Read or write mode
⚫ Count of bytes
⚫ Writes location of command block to DMA controller
⚫ Bus mastering of DMA controller – grabs bus from CPU
 Cycle stealing from CPU but still much more efficient
⚫ When done, interrupts to signal completion
◼ Version that is aware of virtual addresses can be even more efficient -
DVMA
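
◼ A hedged sketch of the handoff: the OS fills a command block in
memory and writes its address to the controller. The field names,
flag value, and register address below are invented for
illustration; real DMA engines differ.

#include <stdint.h>

/* Illustrative DMA command block -- layout is hypothetical */
struct dma_cmd {
    uint64_t src;      /* source address            */
    uint64_t dst;      /* destination address       */
    uint32_t nbytes;   /* count of bytes to move    */
    uint32_t flags;    /* e.g., read vs. write mode */
};

#define DMA_READ 0x1

/* Hypothetical controller register that receives the block's address */
static volatile uint64_t * const DMA_CMD_REG =
    (volatile uint64_t *)0xFED00000;

void start_dma_read(uint64_t dev_addr, uint64_t mem_addr, uint32_t len)
{
    static struct dma_cmd cmd;
    cmd.src    = dev_addr;
    cmd.dst    = mem_addr;
    cmd.nbytes = len;
    cmd.flags  = DMA_READ;
    *DMA_CMD_REG = (uint64_t)&cmd;  /* controller bus-masters the transfer
                                       and interrupts the CPU when done  */
}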



Six Step Process to Perform DMA Transfer



Application I/O Interface
◼ I/O system calls encapsulate device behaviors in generic classes
◼ Device-driver layer hides differences among I/O controllers from kernel
◼ New devices talking already-implemented protocols need no extra
work
◼ Each OS has its own I/O subsystem structures and device driver
frameworks
◼ Devices vary in many dimensions
⚫ Character-stream or block
⚫ Sequential or random-access
⚫ Synchronous or asynchronous (or both)
⚫ Sharable or dedicated
⚫ Speed of operation
⚫ Read-write, read-only, or write-only



A Kernel I/O Structure



Characteristics of I/O Devices



Characteristics of I/O Devices (Cont.)

◼ Subtleties of devices handled by device drivers


◼ Broadly I/O devices can be grouped by the OS into
⚫ Block I/O
⚫ Character I/O (Stream)
⚫ Memory-mapped file access
⚫ Network sockets
◼ For direct manipulation of I/O device specific characteristics,
usually an escape / back door
⚫ Unix ioctl() call to send arbitrary bits to a device control
register and data to device data register
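
◼ A small sketch of the ioctl() escape hatch; the device node and
request code are hypothetical, but the open()/ioctl() call pattern
is the standard Unix one:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define MYDEV_SET_MODE 0xA001   /* hypothetical request code */

int main(void)
{
    int fd = open("/dev/mydev", O_RDWR);   /* hypothetical device node */
    if (fd < 0) { perror("open"); return 1; }

    int mode = 3;   /* arbitrary bits destined for the control register */
    if (ioctl(fd, MYDEV_SET_MODE, &mode) < 0)
        perror("ioctl");

    close(fd);
    return 0;
}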



Block and Character Devices

◼ Block devices include disk drives


⚫ Commands include read, write, seek
⚫ Raw I/O, direct I/O, or file-system access
⚫ Memory-mapped file access possible
 File mapped to virtual memory and clusters brought in via
demand paging
⚫ DMA
◼ Character devices include keyboards, mice, serial ports
⚫ Commands include get(), put()
⚫ Libraries layered on top allow line editing



I/O Management – Part 2



I/O Buffering
◼ I/O devices typically contain a small on-board memory where
they can store data temporarily before transferring to/from the
CPU.
⚫ A disk buffer stores a block when it is read from the disk.
⚫ It is transferred over the bus by the DMA controller into a
buffer in physical memory.
⚫ The DMA controller interrupts the CPU when the transfer is
done.



Need of Buffer in OS
◼ To cope with speed mismatches between device and CPU.
⚫ Example: Compute the contents of a display in a buffer (slow)
and then zap the buffer to the screen (fast)
◼ To cope with devices that have different data transfer sizes.
⚫ Example: ftp brings the file over the network one packet at a
time. Stores to disk happen one block at a time.
◼ To minimize the time a user process is blocked on a write.
⚫ Writes => copy data to a kernel buffer and return control to the
user program. The write from the kernel buffer to the disk is
done later.



Caching
◼ Improve disk performance by reducing the number of disk accesses.
◼ Idea: keep recently used disk blocks in main memory after the I/O call that
brought them into memory completes.
◼ Example: Read (disk Address)
⚫ If (block in memory) return value from memory Else Read Sector (disk
Address)
◼ Example: Write (disk Address)
⚫ If (block in memory) update value in memory
⚫ Else Allocate space in memory, read block from disk, and update value
in memory
◼ What should happen when we write to a cache?
⚫ write-through policy (write to all levels of memory containing the block,
including to disk). High reliability.
⚫ write-back policy (write only to the fastest memory containing the block,
write to slower memories and disk sometime later). Faster.
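
◼ A minimal sketch contrasting the two policies with a toy one-entry
cache; disk_read()/disk_write() are stand-ins for real device I/O:

#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE 512

void disk_read(int addr, char *buf);          /* stand-in prototypes */
void disk_write(int addr, const char *buf);

static struct {
    int  addr;
    bool valid, dirty;
    char data[BLOCK_SIZE];
} cache;                                       /* toy one-entry cache */

void cached_write(int addr, const char *buf, bool write_through)
{
    if (!(cache.valid && cache.addr == addr)) {
        if (cache.valid && cache.dirty)        /* evict: flush old block  */
            disk_write(cache.addr, cache.data);
        disk_read(addr, cache.data);           /* allocate: read block in */
        cache.addr = addr; cache.valid = true; cache.dirty = false;
    }
    memcpy(cache.data, buf, BLOCK_SIZE);       /* update value in memory  */
    if (write_through)
        disk_write(addr, cache.data);          /* reliable: on disk now   */
    else
        cache.dirty = true;                    /* faster: flushed later   */
}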



A typical process of Read Call
◼ Step 1- User process requests a read from a device.
◼ Step 2- OS checks if data is in a buffer. If not,
⚫ OS tells the device driver to perform input.
⚫ Device driver tells the DMA controller what to do and blocks itself.
⚫ DMA controller transfers the data to the kernel buffer when it has
all been retrieved from the device.
⚫ DMA controller interrupts the CPU when the transfer is complete.
◼ Step 3- OS transfers the data to the user process and places the
process in the ready queue.
◼ Step 4- When the process gets the CPU, it begins execution following
the system call.



Network Devices

◼ Varying enough from block and character devices to have their
own interface
◼ Linux, Unix, Windows and many others include socket
interface
⚫ Separates network protocol from network operation
⚫ Includes select() functionality
◼ Approaches vary widely (pipes, FIFOs, streams, queues,
mailboxes)



Clocks and Timers

◼ Provide current time, elapsed time, timer


◼ Normal resolution about 1/60 second
◼ Some systems provide higher-resolution timers
◼ Programmable interval timer used for timings, periodic
interrupts
◼ ioctl() (on UNIX) covers odd aspects of I/O such as
clocks and timers



Nonblocking and Asynchronous I/O
◼ Blocking - process suspended until I/O completed
⚫ Easy to use and understand
⚫ Insufficient for some needs
◼ Nonblocking - I/O call returns as much as available
⚫ User interface, data copy (buffered I/O)
⚫ Implemented via multi-threading
⚫ Returns quickly with count of bytes read or written
⚫ select() to find if data ready then read() or write()
to transfer
◼ Asynchronous - process runs while I/O executes
⚫ Difficult to use
⚫ I/O subsystem signals process when I/O completed
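
◼ A short sketch of the select()-then-read() pattern described above,
applied to standard input, using only standard POSIX calls:

#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

int main(void)
{
    char buf[256];
    fd_set readfds;
    struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };

    FD_ZERO(&readfds);
    FD_SET(STDIN_FILENO, &readfds);

    /* Wait at most 5 seconds for data to become ready */
    int n = select(STDIN_FILENO + 1, &readfds, NULL, NULL, &tv);
    if (n > 0 && FD_ISSET(STDIN_FILENO, &readfds)) {
        ssize_t got = read(STDIN_FILENO, buf, sizeof buf);
        printf("read %zd bytes\n", got);  /* returns quickly with count */
    } else if (n == 0) {
        printf("no data ready\n");
    }
    return 0;
}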



Two I/O Methods

Synchronous Asynchronous



Vectored I/O
◼ Vectored I/O allows one system call to perform multiple I/O
operations
◼ For example, Unix readv() accepts a vector of multiple
buffers to read into or write from
◼ This scatter-gather method is better than multiple individual I/O
calls
⚫ Decreases context switching and system call overhead
⚫ Some versions provide atomicity
 Avoids, for example, worry about multiple threads
changing data as reads / writes are occurring
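
◼ For instance, a standard readv() call gathering one read into two
buffers (POSIX as-is; data.bin is just an example input file):

#include <fcntl.h>
#include <stdio.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
    char header[16], body[64];
    struct iovec iov[2] = {
        { .iov_base = header, .iov_len = sizeof header },
        { .iov_base = body,   .iov_len = sizeof body   },
    };

    int fd = open("data.bin", O_RDONLY);   /* example input file */
    if (fd < 0) { perror("open"); return 1; }

    /* One system call fills both buffers in order (scatter input) */
    ssize_t n = readv(fd, iov, 2);
    printf("read %zd bytes into 2 buffers\n", n);
    close(fd);
    return 0;
}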



Kernel I/O Subsystem
◼ Scheduling
⚫ Some I/O request ordering via per-device queue
⚫ Some OSs try fairness
⚫ Some implement Quality Of Service (e.g., IPQoS)
◼ Buffering - store data in memory while transferring between devices
⚫ To cope with device speed mismatch
⚫ To cope with device transfer size mismatch
⚫ To maintain “copy semantics”
⚫ Double buffering – two copies of the data
 Kernel and user
 Varying sizes
 Full / being processed and not-full / being used
 Copy-on-write can be used for efficiency in some cases



Device-status Table



Sun Enterprise 6000 Device-Transfer Rates



Kernel I/O Subsystem

◼ Caching - faster device holding copy of data


⚫ Always just a copy
⚫ Key to performance
⚫ Sometimes combined with buffering
◼ Spooling - hold output for a device
⚫ If device can serve only one request at a time
⚫ e.g., printing
◼ Device reservation - provides exclusive access to a device
⚫ System calls for allocation and de-allocation
⚫ Watch out for deadlock



Error Handling

◼ OS can recover from disk read, device unavailable, transient
write failures
⚫ Retry a read or write, for example
⚫ Some systems more advanced – Solaris FMA, AIX
 Track error frequencies, stop using device with
increasing frequency of retry-able errors
◼ Most return an error number or code when I/O request fails
◼ System error logs hold problem reports



I/O Protection

◼ User process may accidentally or purposefully attempt to
disrupt normal operation via illegal I/O instructions
⚫ All I/O instructions defined to be privileged
⚫ I/O must be performed via system calls
 Memory-mapped and I/O port memory locations must
be protected too



Use of a System Call to Perform I/O



Kernel Data Structures
◼ Kernel keeps state info for I/O components, including open file
tables, network connections, character device state
◼ Many, many complex data structures to track buffers, memory
allocation, “dirty” blocks
◼ Some use object-oriented methods and message passing to
implement I/O
⚫ Windows uses message passing
 Message with I/O information passed from user mode
into kernel
 Message modified as it flows through to device driver
and back to process
 Pros / cons?



UNIX I/O Kernel Structure



Power Management
◼ Not strictly domain of I/O, but much is I/O related
◼ Computers and devices use electricity, generate heat, frequently
require cooling
◼ OSes can help manage and improve use
⚫ Cloud computing environments move virtual machines
between servers
 Can end up evacuating whole systems and shutting them
down
◼ Mobile computing has power management as first class OS
aspect



Power Management (Cont.)
◼ For example, Android implements
⚫ Component-level power management
 Understands relationship between components
 Build device tree representing physical device topology
 System bus -> I/O subsystem -> {flash, USB storage}
 Device driver tracks state of device, whether in use
 Unused component – turn it off
 All devices in tree branch unused – turn off branch
⚫ Wake locks – like other locks but prevent sleep of device when lock
is held
⚫ Power collapse – put a device into very deep sleep
 Marginal power use
 Only awake enough to respond to external stimuli (button
press, incoming call)



I/O Requests to Hardware Operations
◼ Consider reading a file from disk for a process:
⚫ Determine device holding file
⚫ Translate name to device representation
⚫ Physically read data from disk into buffer
⚫ Make data available to requesting process
⚫ Return control to process



Life Cycle of An I/O Request



STREAMS

◼ STREAM – a full-duplex communication channel between a
user-level process and a device in Unix System V and beyond
◼ A STREAM consists of:
⚫ STREAM head interfaces with the user process
⚫ driver end interfaces with the device
⚫ zero or more STREAM modules between them
◼ Each module contains a read queue and a write queue

◼ Message passing is used to communicate between queues


⚫ Flow control option to indicate available or busy
◼ Asynchronous internally, synchronous where user process
communicates with stream head



The STREAMS Structure



Performance

◼ I/O a major factor in system performance:


⚫ Demands CPU to execute device driver, kernel I/O
code
⚫ Context switches due to interrupts
⚫ Data copying
⚫ Network traffic especially stressful



Intercomputer Communications



How to Improve Performance of OS
◼ Reduce number of context switches
◼ Reduce data copying
◼ Reduce interrupts by using large transfers, smart controllers,
polling
◼ Use DMA
◼ Use smarter hardware devices
◼ Balance CPU, memory, bus, and I/O performance for highest
throughput
◼ Move user-mode processes / daemons to kernel threads



Disk Structure and
Disk Scheduling
(Part 1)



Storage-Device Hierarchy



Overview of Mass-Storage Structure
◼ The bulk of secondary storage for modern computers is provided by
⚫ Hard disk drives (HDD)
⚫ Nonvolatile Memory (NVM)
◼ The most common tertiary storage is magnetic tape.
◼ In this chapter we describe:
⚫ The basic mechanism of those devices
⚫ How operating systems translate their physical
properties to logical storage via address mapping.



Hard Disk Drive Moving-head Mechanism

◼ Platters range from .85” to 14” (historically)


⚫ Commonly 3.5”, 2.5”, and 1.8”
◼ Range from gigabytes through terabytes per drive



Hard Disk Drives
◼ HDDs rotate at 60 to 250 times per second
◼ Transfer rate is rate at which data flow between drive and
computer
◼ Positioning time (random-access time) is time to move disk
arm to desired cylinder (seek time) and time for desired sector to
rotate under the disk head (rotational latency)
◼ Head crash results from disk head making contact with the disk
surface
⚫ That’s bad
◼ Disks can be removable
◼ Other types of storage media include CDs, DVDs, Blu-ray discs,
and magnetic tape



A 3.5" HDD with Cover Removed.



The First Commercial Disk Drive

1956
IBM RAMAC computer included the
IBM Model 350 disk storage system

5 million (7 bit) characters


50 x 24” platters
Access time < 1 second



Performance of Magnetic Disks
◼ Performance
⚫ Transfer Rate – theoretical – 6 Gb/sec
⚫ Effective Transfer Rate – real – 1Gb/sec
⚫ Seek time from 3ms to 12ms – 9ms common for desktop drives
⚫ Average seek time measured or calculated based on 1/3 of tracks
⚫ Latency based on spindle speed
 1 / (RPM / 60) = 60 / RPM
⚫ Average latency = ½ of a full rotation
◼ From Wikipedia
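
◼ As a worked instance of these formulas, for a 7200 RPM drive:

\text{full rotation} = \frac{60}{7200}\,\text{s} \approx 8.33\,\text{ms},
\qquad
\text{average latency} = \tfrac{1}{2} \times 8.33\,\text{ms} \approx 4.17\,\text{ms}

which is the 4.17ms figure used in the example on the next slide.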



Performance of Magnetic Disks (Cont.)

◼ Access Latency = Average access time = average seek
time + average latency
⚫ For fastest disk 3ms + 2ms = 5ms
⚫ For slow disk 9ms + 5.56ms = 14.56ms
◼ Average I/O time = average access time + (amount to
transfer / transfer rate) + controller overhead
◼ For example to transfer a 4KB block on a 7200 RPM disk
with a 5ms average seek time, 1Gb/sec transfer rate with a
.1ms controller overhead =
⚫ 5ms + 4.17ms + 0.1ms + transfer time
⚫ Transfer time = 4KB / 1Gb/s × 8Gb/GB × 1GB/1024²KB
= 32 / (1024²) s ≈ 0.031 ms
⚫ Average I/O time for 4KB block = 9.27ms + 0.031ms =
9.301ms



Disk Structure
◼ Disk drives are addressed as large 1-dimensional arrays of logical
blocks, where the logical block is the smallest unit of transfer
⚫ Low-level formatting creates logical blocks on physical media
◼ The 1-dimensional array of logical blocks is mapped into the
sectors of the disk sequentially
⚫ Sector 0 is the first sector of the first track on the outermost
cylinder
⚫ Mapping proceeds in order through that track, then the rest of
the tracks in that cylinder, and then through the rest of the
cylinders from outermost to innermost
⚫ Logical to physical address should be easy
 Except for bad sectors
 Non-constant # of sectors per track via constant angular
velocity
⚫ Each sector 512B or 4KB – smallest I/O the drive can do
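
◼ A hedged sketch of the sequential logical-to-physical mapping just
described, assuming an idealized drive with a constant number of
sectors per track and no bad-sector remapping:

/* Idealized geometry: real drives vary sectors per track and remap
 * bad sectors, so this is only the textbook mapping. */
struct chs { int cylinder, head, sector; };

struct chs lba_to_chs(int lba, int heads, int sectors_per_track)
{
    struct chs p;
    p.cylinder = lba / (heads * sectors_per_track);
    p.head     = (lba / sectors_per_track) % heads;
    p.sector   = lba % sectors_per_track;   /* sector within the track */
    return p;
}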



Nonvolatile Memory (NVM)
◼ NVM is electrical vs. mechanical (HDD)
◼ Commonly composed of a controller and flash NAND die
semiconductor chips (a die is a small block of semiconducting
material on which a given functional circuit is fabricated)
⚫ Other forms include DRAM with battery backup, non-NAND
die like 3D XPoint
◼ Flash-memory-based NVM is frequently used in disk-drive like
container -> solid-state disk (SSD)

◼ Also can be in other formats like USB drive



NVM (Cont.)
◼ In all forms, acts and treated the same way
◼ More reliable than HDD (no moving parts), can be faster (no
seek time or latency), consumes less power
◼ More expensive per MB, lower capacity, may have shorter
lifespan (writes wear it out)
◼ Higher speed means new connection methods
⚫ Direct to PCIe bus (called NVMe)
◼ Can be used in variety of ways – replacement for disk, caching
tier, etc



NVM Details
◼ Read and written in “page” increments
⚫ Page size varies based on device
◼ Cannot overwrite data, must erase it first
◼ Erase occurs in “block” increments composed of several
pages
◼ Erase operation takes much longer than read or write
◼ NAND wears out a bit with every erase
⚫ ~100,000 program-erase cycles until cells no longer
retain data



NVM Device Controller
◼ Several algorithms, usually implemented in NVM device
controller
⚫ So operating system blissfully just reads and writes
blocks and device deals with the physics
⚫ But can impact performance, so worth knowing about
◼ NAND block with valid and invalid pages



NVM Controller Algorithms
◼ NAND cannot be overwritten, therefore:
⚫ There are usually pages containing invalid data.
⚫ To track which logical blocks contain the valid data, the
controller maintains a Flash Translation Layer (FTL).
⚫ This table maintains the mapping of which physical page
contains the currently valid logical block.
⚫ This table is also used to track physical block state
(does the block contain only invalid pages and can it
therefore be erased?).
⚫ Each logical block can have many versions, each stored
in a physical page, one valid and the others invalid.
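
◼ A toy sketch of an FTL mapping table, assuming fixed-size arrays
and a naive log-structured page allocator; real FTLs add caching,
journaling, and wear-leveling state:

#include <stdint.h>

#define N_LOGICAL 1024
#define INVALID   UINT32_MAX

/* logical block -> physical page currently holding its valid data */
static uint32_t ftl[N_LOGICAL];
static uint32_t next_free_page;   /* naive bump allocator */

void ftl_init(void)
{
    for (int i = 0; i < N_LOGICAL; i++)
        ftl[i] = INVALID;         /* nothing written yet */
}

/* A write never overwrites in place: it goes to a fresh page, and the
 * page holding the old version (if any) becomes invalid, to be
 * reclaimed later by garbage collection. */
uint32_t ftl_write(uint32_t lblock)
{
    uint32_t new_page = next_free_page++;
    ftl[lblock] = new_page;
    return new_page;
}

uint32_t ftl_read(uint32_t lblock)
{
    return ftl[lblock];           /* INVALID if never written */
}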



Full SSD Controller Algorithms
◼ Full SSD (all pages written to, some hold valid data and others
invalid)
⚫ Logically space is available, but physically there is nowhere to write data
⚫ Garbage collection copies good data from blocks with mix
of valid and invalid pages to other locations, freeing up
blocks to erase
 But if device full, nowhere to copy good pages
 Over-provisioning (20% of media set aside) as target
for GC writes
 As blocks invalid or made invalid by GC are erased,
placed into overprovision pool if device full or returned
to free pool



NVM Controller Algorithms

◼ Overprovision helps with wear leveling


⚫ Want media to wear out at the same time, not have a
hot spot (repeated writes) wear out early
⚫ Track # of writes per block; GC and over-provisioning
space are used to direct writes to less-worn blocks
◼ Data on NVM protected via ECC
⚫ Too many errors and the block is marked bad and no longer used
⚫ Uncorrectable errors need to be recovered via RAID



Magnetic Tape

◼ Was early secondary-storage medium


⚫ Evolved from open spools to cartridges
◼ Relatively permanent and holds large quantities of data
◼ Access time slow
◼ Random access ~1000 times slower than disk
◼ Mainly used for backup, storage of infrequently-used data,
transfer medium between systems



Magnetic Tape

◼ Kept in spool and wound or rewound past read-write
head
◼ Once data under head, transfer rates comparable to
disk
⚫ 140MB/sec and greater
◼ GB to TB typical storage
◼ Common technologies are LTO-{5,6}, SDLT, 4, 8, and
19mm, and ¼ and ½” widths



Secondary Storage Connection Methods

◼ Host-attached storage accessed through I/O ports talking to
I/O busses
◼ Several bus technologies including advanced technology
attachment (ATA), serial ATA (SATA), eSATA, universal
serial bus (USB), fibre channel (FC), serial attached SCSI
(SAS)
◼ Data transfers are carried out by special electronic processors:
controllers
⚫ Host controller is in computer, device controller built into
storage device
⚫ Talk to each other, usually via memory-mapped I/O ports
◼ Device controllers have built in caches
⚫ Data transfer from media into cache, then over bus to host
(and vice versa)



Address Mapping

◼ Storage devices addressed as one-dimensional array of logical
blocks
⚫ Logical block is smallest unit of transfer
⚫ Each logical block maps to physical sectors or pages on
device
 For example, logical block 0 might be sector 0 on cylinder
0 on platter 0 of an HDD
⚫ Easier to use logical address <0> - <N> than address
containing <sector, cylinder, head> or <chip, block, page>
⚫ Also, defective sectors / blocks can be mapped out of use by
logical to physical mapping and logical addresses still
sequential
⚫ Number of sectors per track is not constant on some drives,
also requiring logical addressing to hide complexity



Address Mapping
◼ Some media / devices use constant linear velocity
(CLV) (CD, DVD)
⚫ Density of bits per track uniform
⚫ Farther a track from the center of the disk -> greater
its length -> more sectors
⚫ Drive decreases rotational speed as the head moves
outward to keep the same rate of data under the
read/write head
◼ Alternatively disk rotation speed can stay constant –
constant angular velocity (CAV) (HDD)
⚫ Density of bits decreases from inner to outer tracks
⚫ (Blu-ray drives can do both CAV and CLV depending
on media etc.)



Disk Structure and
Disk Scheduling
(Part 2)



Disk Scheduling
◼ The operating system is responsible for using hardware
efficiently — for the disk drives, this means having a fast
access time and disk bandwidth
◼ Minimize seek time
◼ Seek time ≈ seek distance
◼ Disk bandwidth is the total number of bytes transferred,
divided by the total time between the first request for
service and the completion of the last transfer



Disk Scheduling (Cont.)
◼ There are many sources of disk I/O request
⚫ OS
⚫ System processes
⚫ User processes
◼ I/O request includes input or output mode, disk address,
memory address, number of sectors to transfer
◼ OS maintains queue of requests, per disk or device
◼ Idle disk can immediately work on I/O request; busy disk
means work must be queued.
⚫ Optimization algorithms only make sense when a
queue exists



Disk Scheduling (Cont.)
◼ Note that drive controllers have small buffers and can manage a
queue of I/O requests (of varying “depth”)
◼ Several algorithms exist to schedule the servicing of disk I/O
requests
◼ The analysis is true for one or many platters
◼ We illustrate scheduling algorithms with a request queue (0-
199)

98, 183, 37, 122, 14, 124, 65, 67

Head pointer 53

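◼ A short C sketch that reproduces the totals quoted on the next two
slides (640 cylinders for FCFS, 236 for SSTF) from this queue:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static const int queue[] = {98, 183, 37, 122, 14, 124, 65, 67};
enum { N = 8, START = 53 };

static int fcfs(void)
{
    int head = START, total = 0;
    for (int i = 0; i < N; i++) {
        total += abs(queue[i] - head);   /* serve strictly in arrival order */
        head = queue[i];
    }
    return total;
}

static int sstf(void)
{
    bool done[N] = {false};
    int head = START, total = 0;
    for (int i = 0; i < N; i++) {
        int best = -1;
        for (int j = 0; j < N; j++)      /* pick nearest pending request */
            if (!done[j] &&
                (best < 0 || abs(queue[j] - head) < abs(queue[best] - head)))
                best = j;
        total += abs(queue[best] - head);
        head = queue[best];
        done[best] = true;
    }
    return total;
}

int main(void)
{
    printf("FCFS: %d\n", fcfs());   /* prints 640 */
    printf("SSTF: %d\n", sstf());   /* prints 236 */
    return 0;
}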


FCFS
Illustration shows total head movement of 640 cylinders



SSTF
◼ Shortest Seek Time First -- selects the request with the
minimum seek time from the current head position
◼ SSTF scheduling is a form of SJF scheduling; may cause
starvation of some requests
◼ Illustration shows total head movement of 236 cylinders



SCAN

◼ SCAN algorithm: The disk arm starts at one end of the
disk and moves toward the other end, servicing requests
until it gets to the other end of the disk, where the head
movement is reversed and servicing continues.
◼ Sometimes it is called the elevator algorithm.
◼ Illustration shows total head movement of 236 cylinders
(53 down to 0, then up to 183).
◼ But note that if requests are uniformly dense, largest density
at other end of disk and those wait the longest



SCAN (Cont.)



C-SCAN

◼ Provides a more uniform wait time than SCAN


◼ The head moves from one end of the disk to the other,
servicing requests as it goes
⚫ When it reaches the other end, however, it immediately
returns to the beginning of the disk, without servicing
any requests on the return trip
◼ Treats the cylinders as a circular list that wraps around from
the last cylinder to the first one



C-SCAN (Cont.)



LOOK and C-LOOK
◼ LOOK is a version of SCAN. Arm only goes as far as
the last request in each direction, then reverses direction
immediately, without first going all the way to the end of
the disk
◼ C-LOOK a version of C-SCAN. Arm only goes as far as
the last request in one direction, then reverses direction
immediately, without first going all the way to the end of
the disk



C-LOOK (Cont.)



Selecting a Disk-Scheduling Algorithm

◼ SSTF is common and has a natural appeal


◼ SCAN and C-SCAN perform better for systems that place
a heavy load on the disk
⚫ Less starvation
◼ Performance depends on the number and types of
requests
◼ Requests for disk service can be influenced by the file-
allocation method
⚫ And metadata layout



Disk-Scheduling Algorithm
Practice Problem
◼ A disk contains 200 tracks (i.e., 0-199). With a given request queue of
track nos. 82, 170, 43, 140, 24, 16, and 190, calculate the total number of
track movements (i.e., seek time) by the R/W head with the help of the
following disk scheduling algorithms. Assume that the current position of
the R/W head is 50.
◼ FCFS
◼ SSTF
◼ SCAN (Move towards right)
◼ SCAN (Move towards left)
◼ CSCAN (Move towards right)
◼ CSCAN (Move towards left)
◼ LOOK (Move towards right)
◼ LOOK (Move towards left)
◼ CLOOK (Move towards right)
◼ CLOOK (Move towards left)
◼ If the R/W head takes 1ns to move from one track to another, then
calculate the total time in each case.
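
◼ A sketch for the SCAN and LOOK cases of this exercise (FCFS and
SSTF follow the pattern of the earlier sketch, and the C-variants add
the wrap-around jump). The closed-form expressions assume, as here,
that requests exist on both sides of the starting head position:

#include <stdio.h>

static const int q[] = {82, 170, 43, 140, 24, 16, 190};
enum { N = 7, START = 50, MAXTRK = 199 };

static void bounds(int *lo, int *hi)   /* lowest / highest request */
{
    *lo = *hi = q[0];
    for (int i = 1; i < N; i++) {
        if (q[i] < *lo) *lo = q[i];
        if (q[i] > *hi) *hi = q[i];
    }
}

static int scan(int right)   /* go to the disk edge, then reverse */
{
    int lo, hi; bounds(&lo, &hi);
    return right ? (MAXTRK - START) + (MAXTRK - lo)
                 : START + hi;
}

static int look(int right)   /* reverse at the last request instead */
{
    int lo, hi; bounds(&lo, &hi);
    return right ? (hi - START) + (hi - lo)
                 : (START - lo) + (hi - lo);
}

int main(void)
{
    printf("SCAN right: %d  SCAN left: %d\n", scan(1), scan(0));
    printf("LOOK right: %d  LOOK left: %d\n", look(1), look(0));
    return 0;
}

◼ With 1ns per track movement, the total time in each case is simply
the track-movement count multiplied by 1ns.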



Disk-Scheduling Algorithm (Cont.)

◼ The disk-scheduling algorithm should be written as a
separate module of the operating system, allowing it to be
replaced with a different algorithm if necessary
◼ Either SSTF or LOOK is a reasonable choice for the default
algorithm
◼ What about rotational latency?
⚫ Difficult for OS to calculate
◼ How does disk-based queuing affect OS queue ordering
efforts?



Disk-Scheduling Algorithm (Cont.)

◼ Problem 1
◼ Suppose a disk has 201 cylinders, numbered from 0 to 200. At some time
the disk arm is at cylinder 100, and there is a queue of disk access requests
for cylinders 30, 85, 90, 100, 105, 110, 135 and 145. If Shortest-Seek Time
First (SSTF) is being used for scheduling the disk access, the request for
cylinder 90 is serviced after servicing ______ requests.
(A) 1
(B) 2
(C) 3
(D) 4



Disk-Scheduling Algorithm (Cont.)

◼ Problem 1
◼ Suppose a disk has 201 cylinders, numbered from 0 to 200. At some time
the disk arm is at cylinder 100, and there is a queue of disk access requests
for cylinders 30, 85, 90, 100, 105, 110, 135 and 145. If Shortest-Seek Time
First (SSTF) is being used for scheduling the disk access, the request for
cylinder 90 is serviced after servicing ______ requests.
(A) 1
(B) 2
(C) 3
(D) 4
◼ Answer
◼ The disk will service that request first whose cylinder number is closest to its
arm. Hence 1st serviced request is for cylinder no 100 ( as the arm is itself
pointing to it ), then 105, then 110, and then the arm comes to service
request for cylinder 90. Hence before servicing request for cylinder 90, the
disk would have serviced 3 requests.
◼ Hence option C.





Disk-Scheduling Algorithm (Cont.)
◼ Problem 2
◼ Suppose the following disk request sequence (track numbers) for a disk
with 100 tracks is given: 45, 20, 90, 10, 50, 60, 80, 25, 70. Assume that the
initial position of the R/W head is on track 50. The additional distance that
will be traversed by the R/W head when the Shortest Seek Time First
(SSTF) algorithm is used compared to the SCAN (Elevator) algorithm
(assuming that SCAN algorithm moves towards 100 when it starts
execution) is ______ tracks

(A) 8 (B) 9 (C) 10 (D) 11



Disk-Scheduling Algorithm (Cont.)
◼ Problem 2
◼ Suppose the following disk request sequence (track numbers) for a disk
with 100 tracks is given: 45, 20, 90, 10, 50, 60, 80, 25, 70. Assume that the
initial position of the R/W head is on track 50. The additional distance that
will be traversed by the R/W head when the Shortest Seek Time First
(SSTF) algorithm is used compared to the SCAN (Elevator) algorithm
(assuming that SCAN algorithm moves towards 100 when it starts
execution) is ______ tracks

(A) 8 (B) 9 (C) 10 (D) 11


◼ Solution
◼ Given a disk with 100 tracks and sequence 45, 20, 90, 10, 50, 60, 80, 25,
70. Initial position of the R/W head is on track 50.
◼ By using SSTF, requests are served and the total distance traveled is
130
◼ If simple SCAN is used, requests are served and the total distance
traveled is 140
◼ Hence less distance traveled by SSTF = 140 - 130 = 10
◼ So answer is C
Disk-Scheduling Algorithm (Cont.)

◼ Problem 3
◼ Consider an operating system capable of loading and executing a
single sequential user process at a time. The disk head scheduling
algorithm used is First Come First Served (FCFS). If FCFS is replaced
by Shortest Seek Time First (SSTF), claimed by the vendor to give 50%
better benchmark results, what is the expected improvement in the I/O
performance of user programs?
(A) 50% (B) 40% (C) 25% (D) 0%



Disk-Scheduling Algorithm (Cont.)

◼ Problem 3
◼ Consider an operating system capable of loading and executing a
single sequential user process at a time. The disk head scheduling
algorithm used is First Come First Served (FCFS). If FCFS is replaced
by Shortest Seek Time First (SSTF), claimed by the vendor to give 50%
better benchmark results, what is the expected improvement in the I/O
performance of user programs?
(A) 50% (B) 40% (C) 25% (D) 0%
◼ Answer
◼ Since the operating system can execute only a single sequential user
process at a time, the disk is always accessed in FCFS manner. The OS
never has a choice to pick an I/O from multiple I/Os, as there is always
only one I/O at a time.
◼ Hence answer is D



Disk-Scheduling Algorithm (Cont.)

◼ Problem 4
◼ Suppose that we want to store a file with 60,000 fixed-length data records
where each record requires 80 bytes and records are not allowed to span
two sectors. How many cylinders are required for this file?
◼ The disk drive has the following specifications:
⚫ bytes per sector = 512
⚫ tracks per cylinder = 16
⚫ sectors per track = 63
⚫ cylinders = 1654



Disk-Scheduling Algorithm (Cont.)

◼ Problem 4
◼ Suppose that we want to store a file with 60,000 fixed-length data records
where each record requires 80 bytes and records are not allowed to span
two sectors. How many cylinders are required for this file?
◼ The disk drive has the following specifications:
⚫ bytes per sector = 512
⚫ tracks per cylinder = 16
⚫ sectors per track = 63
⚫ cylinders = 1654
◼ Answer:
⚫ Each sector can hold 512/80 = 6 records
⚫ The file requires 60,000/6 = 10,000 sectors
⚫ One cylinder can hold 63 × 16 = 1008 sectors
⚫ So the number of cylinders required is 10,000/1008 ≈ 9.93, i.e.,
10 cylinders (rounding up).



Network-Attached Storage
◼ Network-attached storage (NAS) is storage made available
over a network rather than over a local connection (such as
a bus)
⚫ Remotely attaching to file systems
⚫ Manages storage, provides data management, has
network interfaces, and talks network storage protocols
◼ NFS and CIFS are common protocols
◼ Implemented via remote procedure calls (RPCs) between
host and storage, typically over TCP or UDP on an IP network



Network-Attached Storage (Cont.)

◼ iSCSI protocol uses IP network to carry the SCSI protocol


⚫ Remotely attaching to devices (blocks)



Cloud Storage
◼ Similar to NAS
⚫ Provides storage across the network
⚫ Usually not owned by the company or user, but
provided for a fee (based on time, storage capacity
used, I/O done, etc.)
⚫ But across a WAN rather than a LAN
⚫ Frequently slower and more prone to connection
interruption than NAS, so CIFS, NFS, and iSCSI are
possible but less used
⚫ Frequently use their own APIs, and apps that use those
APIs to do I/O
 Dropbox, Microsoft OneDrive, Apple iCloud, etc.



Storage Area Network
◼ Common in large storage environments
◼ Multiple hosts attached to multiple storage arrays - flexible



Reliability and Redundancy
◼ Mean time to failure. The average time it takes a disk to fail.
◼ Mean time to repair. The time it takes (on average) to replace a
failed disk and restore the data on it.
◼ Mirroring. Copy of a disk is duplicated on another disk.
⚫ Consider a disk with a 100,000-hour mean time to failure and a
10-hour mean time to repair
 If the mirrored disks fail independently, the mean time to data loss
is 100,000² / (2 × 10) = 500 × 10⁶ hours, or about 57,000 years
◼ Several improvements in disk-use techniques involve the use of
multiple disks working cooperatively



Redundant Array of
Independent Disks(RAID)



RAID
◼ What is RAID?

◼ RAID is an acronym for:

Redundant Array of Independent Disks
or
Redundant Array of Inexpensive Disks.
◼ In fact, RAID is a way of combining several independent and
relatively small disks into a single large storage unit.
◼ The disks included into the array are called array members. The
disks can be combined into the array in different ways which are
known as RAID levels.



RAID
◼ Key evaluation points for a RAID System

◼ Reliability: How many disk faults can the system tolerate?

◼ Availability: What fraction of the total session time is a system
in uptime mode, i.e. how available is the system for actual use?
◼ Performance: How good is the response time? How high is the
throughput (rate of processing work)? Note that performance
contains a lot of parameters and not just the two.
◼ Capacity: Given a set of N disks each with B blocks, how much
useful capacity is available to the user?



RAID Characteristics
◼ Each of RAID levels has its own characteristics of:
⚫ Fault tolerance, which is the ability to survive one or more
disk failures.
⚫ Performance which shows the change in the read and write
speed of the entire array as compared to a single disk.
⚫ The capacity of the array which is determined by the amount of
user data that can be written to the array. The array capacity
depends on the RAID level and does not always match the sum of
the sizes of the RAID member disks. To calculate the capacity of
the particular RAID type and a set of the member disks you can
use a free online RAID calculator.



RAID Organisation
◼ Two independent aspects are clearly distinguished in the
RAID organization.
⚫ The organization of data in the array (RAID storage techniques:
striping, mirroring, parity, combination of them).
⚫ Implementation of each particular RAID installation -
hardware or software.



RAID Storage Structure
◼ The main methods of storing data in the array are:
⚫ Striping - splitting the flow of data into blocks of a certain size
(called "block size") and then writing these blocks across the RAID
one by one. This way of data storage affects performance.
⚫ Mirroring - is a storage technique in which the identical copies of
data are stored on the RAID members simultaneously. This type
of data placement affects the fault tolerance as well as the
performance.
⚫ Parity - a storage technique which utilizes striping and
checksum (error checking) methods. In the parity technique, a certain
parity function is calculated for the data blocks. If a drive fails, the
missing blocks are recalculated from the checksum, providing the
RAID fault tolerance.
◼ All the existing RAID types are based on striping, mirroring, parity, or
combination of these storage techniques.



RAID LEVELS
◼ RAID 0 (stripes) (also known as a stripe set or striped volume) splits
("stripes") data evenly across two or more disks,
without parity information, redundancy, or fault tolerance.

RAID 0

Disk 1 Disk 2 Disk 3 Disk 4


A1 A2 A3 A4
B1 B2 B3 B4
C1 C2 C3 C4
D1 D2 D3 D4
E1 E2 E3 E4
◼ Evaluation:
◼ Reliability: 0
There is no duplication of data. Hence, a block once lost cannot be recovered.
◼ Capacity: N*B
The entire space is being used to store data. Since there is no duplication, N disks each
having B blocks are fully utilized.



RAID LEVELS
◼ RAID 1 (Mirroring): Blocks are mirrored (duplicated) across disks.

RAID 1

Disk 1 Disk 2
A1 A1
B1 B1
C1 C1
D1 D1
E1 E1

◼ Evaluation:
◼ Assume a RAID system with mirroring level 2.
◼ Reliability: 1 to N/2
Capacity: N*B/2



RAID LEVELS
◼ RAID 1+0 (Mirrored and Striped):
Blocks are first mirrored and then striped. Otherwise known
as nested RAID
RAID 0

RAID 1 RAID 1

Disk 1 Disk 2 Disk 3 Disk 4


A1 A1 A2 A2
B1 B1 B2 B2
C1 C1 C2 C2
D1 D1 D2 D2
E1 E1 E2 E2

◼ Evaluation:
◼ Assume a RAID system with mirroring level 1+0.
◼ Reliability: 1 to N/2 Capacity: N*B/2



RAID LEVELS

◼ RAID-2 consists of bit-level striping using a Hamming Code parity.


◼ RAID-3 consists of byte-level striping with a dedicated parity.

These two are nowadays obsolete and less commonly used.



RAID LEVELS
◼ RAID 4 (Block-Level Striping with Dedicated Parity) : Instead of
duplicating data, this adopts a parity-based approach.
◼ Parity is calculated using a simple XOR function. If the data bits are
0,0,0,1 the parity bit is XOR(0,0,0,1) = 1. If the data bits are 0,1,1,0
the parity bit is XOR(0,1,1,0) = 0. A simple approach is that even
number of ones results in parity 0, and an odd number of ones
results in parity 1.
RAID 4

Disk 1 Disk 2 Disk 3 Disk 4 Disk 5

A1 A2 A3 A4 Ap
B1 B2 B3 B4 Bp
C1 C2 C3 C4 Cp
D1 D2 D3 D4 Dp
E1 E2 E3 E4 Ep

Parity example:
D1 D2 D3 D4 P
0  0  0  1  1
0  1  1  0  0

◼ Evaluation:
⚫ Reliability: 1
⚫ Capacity: (N-1)*B
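
◼ A small sketch of XOR parity over four data blocks, showing how a
lost block is rebuilt from the survivors plus parity (block size is
shrunk to 4 bytes just for the example):

#include <stdint.h>
#include <stdio.h>

enum { NDATA = 4, BLK = 4 };   /* tiny blocks for the example */

/* parity[i] = XOR of byte i across all data blocks */
static void compute_parity(uint8_t d[NDATA][BLK], uint8_t parity[BLK])
{
    for (int i = 0; i < BLK; i++) {
        parity[i] = 0;
        for (int j = 0; j < NDATA; j++)
            parity[i] ^= d[j][i];
    }
}

/* Rebuild a failed block as the XOR of parity and surviving blocks */
static void rebuild(uint8_t d[NDATA][BLK], const uint8_t parity[BLK],
                    int failed)
{
    for (int i = 0; i < BLK; i++) {
        uint8_t v = parity[i];
        for (int j = 0; j < NDATA; j++)
            if (j != failed)
                v ^= d[j][i];
        d[failed][i] = v;
    }
}

int main(void)
{
    uint8_t d[NDATA][BLK] = {{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}};
    uint8_t p[BLK];
    compute_parity(d, p);
    for (int i = 0; i < BLK; i++) d[2][i] = 0;   /* simulate disk 3 failing */
    rebuild(d, p, 2);
    printf("%d %d %d %d\n", d[2][0], d[2][1], d[2][2], d[2][3]); /* 9 10 11 12 */
    return 0;
}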
RAID LEVELS
◼ RAID 5 (Block-Level Striping with Distributed Parity) : This is a
slight modification of the RAID-4 system where the only difference is
that the parity rotates among the drives.
RAID 5

Disk 1 Disk 2 Disk 3 Disk 4 Disk 5


A1 A2 A3 A4 Ap
B1 B2 BP B3 B4
C1 Cp C2 C3 C4
Dp D1 D2 D3 D4
E1 E2 E3 Ep E4
◼ Evaluation:
⚫ Reliability: 1
⚫ Capacity: (N-1)*B



RAID LEVELS
◼ RAID 6 (Block-Level Striping with Two Distributed Parity Blocks): This
is an extension of RAID-5 with a second, independent parity block per
stripe, so the array can survive the failure of two drives.
RAID 6

Disk 1 Disk 2 Disk 3 Disk 4 Disk 5


A1 A2 A3 Ap Aq
B1 B2 BP Bq B3
C1 Cp Cq C2 C3
Dp Dq D1 D2 D3
E1 Ep E2 Eq E3
◼ Evaluation:
⚫ Reliability: 2
⚫ Capacity: (N-2)*B



RAID Levels

