Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 65

Chapter 14: File System

Implementation

Operating System Concepts – 10 th Edition Silberschatz, Galvin and Gagne


Chapter 14: File System Implementation
 File-System Structure
 File-System Operations
 Directory Implementation
 Allocation Methods
 Free-Space Management
 Efficiency and Performance
 Recovery
 Example: WAFL File System

Operating System Concepts – 10 th Edition 14.2 Silberschatz, Galvin and Gagne


Objectives

 To describe the details of implementing local file


systems and directory structures
 To describe the implementation of remote file
systems
 To discuss block allocation and free-block algorithms
and trade-offs

Operating System Concepts – 10 th Edition 14.3 Silberschatz, Galvin and Gagne


File-System Structure
 File structure
 Logical storage unit
 Collection of related information
 File system resides on secondary storage (disks)
 Two characteristics of disks make them convenient for file system:
1. A disk can be rewritten in place; it is possible to read a block
from the disk, modify the block, and write it back into the same
block.
2. A disk can access directly any block of information it contains.
 Thus, it is simple to access any file either sequentially or
randomly, and switching from one file to another requires
the drive moving the read–write heads and waiting for the
media to rotate.
 Nonvolatile memory (NVM) devices are increasingly used for file
storage and thus as a location for file systems.
 They differ from hard disks in that they cannot be rewritten in-
place and they have different performance characteristics.
Operating System Concepts – 10 th Edition 14.4 Silberschatz, Galvin and Gagne
File-System Structure
 Disk provides in-place rewrite and random access
 I/O transfers performed in blocks of sectors (usually 512 or 4096
bytes)
 NVM devices usually have blocks of 4,096 bytes
 File systems
 Provides efficient and convenient access to disk by allowing data to
be stored, located, and retrieved easily
 Interface to user involves defining a file and its attributes, the
operations allowed on a file, and the directory structure for
organizing files.
 Algorithms and data structures to map the logical file system to
physical secondary storage devices
 File control block – storage structure consisting of information about a
file
 File system organized into layers
 Each level in the design uses the features of lower levels to create new
features for use by higher levels.
 Device driver controls the physical device
Operating System Concepts – 10 th Edition 14.5 Silberschatz, Galvin and Gagne
Layered File System
 The I/O Control level consists of device
drivers and interrupt handlers to transfer
information between the main memory and
the disk system.
 A device driver can be thought of as a
translator.
 Its input consists of high-level commands,
such as “retrieve block 123” , “read drive1,
cylinder 72, track 2, sector 10, into memory
location 1060”
 Its output consists of low-level, hardware-
specific instructions that are used by the
hardware controller, which interfaces the I/O
device to the rest of the system.
 The device driver usually writes specific bit
patterns to special locations in the I/O
controller's memory to tell the controller
which device location to act on and what
actions to take.
Operating System Concepts – 10 th Edition 14.6 Silberschatz, Galvin and Gagne
File System Layers
 Basic file system (called the “block I/O subsystem” in Linux)
needs only to issue generic commands to the appropriate device
driver to read and write blocks on the storage device.
 It is also concerned with I/O request scheduling.
 Also manages memory buffers and caches (allocation,
freeing, replacement)
 Buffers hold data in transit
 Caches hold frequently used file-system metadata to improve
performance
 File organization module understands files, logical address, and
physical blocks
 Translates logical block # to physical block #
 Manages free space, disk allocation

Operating System Concepts – 10 th Edition 14.7 Silberschatz, Galvin and Gagne


File System Layers (Cont.)
 Logical file system manages metadata information
 Metadata includes all of the file-system structure except
the actual data (or contents of the files).
 Translates file name into file number, file handle, location
by maintaining file control blocks (inodes in UNIX)
 A file-control block (FCB) (an inode in UNIX file systems)
contains information about the file, including ownership,
permissions, and location of the file contents.
 Directory management
 Protection
 Layering useful for reducing complexity, redundancy, and
duplication of code, but adds overhead and can decrease
performance
 Logical layers can be implemented by any coding method
according to OS designer

Operating System Concepts – 10 th Edition 14.8 Silberschatz, Galvin and Gagne


File System Layers (Cont.)
 Many file systems, sometimes many within an operating
system
 Each with its own format
 CD-ROM is ISO 9660;
 Unix has UNIX File System (UFS), based on the Berkley
Fast File System (FFS);
 Windows has FAT, FAT32, NTFS (or WindowsNT File
System), as well as CD-ROM, DVD and Blu-ray file-system
formats
 Linux supports more than 130 types, the standard Linux
file system is known as the extended file system, with
the most common versions being ext3 and ext4;
 There are also distributed file systems in which a file
system on a server is mounted by one or more client
computers across a network.
 New ones still arriving – ZFS, GoogleFS, Oracle ASM, FUSE

Operating System Concepts – 10 th Edition 14.9 Silberschatz, Galvin and Gagne


File-System Operations
 We have system calls at the API level, but how do we implement
their functions?
 On-storage and in-memory structures
 Boot control block (per volume) contains info needed by system to
boot OS from that volume
 Needed if volume contains OS, usually first block of volume
 In UFS, it is called the boot block. In NTFS, it is the partition
boot sector.
 Volume control block (per volume) contains volume details
 Total # of blocks, # of free blocks, block size, free block
pointers or array, and a free-FCB count and FCB pointers.
 In UFS, this is called a superblock. In NTFS, it is stored in the
master file table.
 A directory structure (per file system) is used to organize the files.
 In UFS, this includes file names and associated inode numbers.
 In NTFS, it is stored in the master file table.

Operating System Concepts – 10 th Edition 14.10 Silberschatz, Galvin and Gagne


File-System Operation (Cont.)
 A per-file File Control Block (FCB) contains many details
about the file
 inode number, permissions, size, dates
 NFTS stores info in master file table using relational DB
structures
 To create a new file, a process calls the logical file system.
 The logical file system knows the format of the directory
structures.
 It allocates a new FCB.
 The system then reads the appropriate directory into
memory, updates it with the new file name and FCB, and
writes it back to the file system.

File Control Block (FCB)


Operating System Concepts – 10 th Edition 14.11 Silberschatz, Galvin and Gagne
In-Memory File System Structures

 The in-memory information is used for both file-system


management and performance improvement via caching.
 The data are loaded at mount time, updated during file-system
operations, and discarded at dismount.
 An in-memory mount table contains information about each
mounted volume.
 An in-memory directory-structure cache holds the directory
information of recently accessed directories.
 The system-wide open-file table contains a copy of the FCB of
each open file, as well as other information.
 The per-process open-file table contains pointers to the
appropriate entries in the system-wide open-file table, as well
as other information, for all files the process has open.
 Buffers hold file-system blocks when they are being read from
or written to a file system.

Operating System Concepts – 10 th Edition 14.12 Silberschatz, Galvin and Gagne


In-Memory File System Structures
 UNIX treats a directory exactly the same as a file, one with a
“type” field indicating that it is a directory.
 Windows implement separate system calls for files and
directories and treat directories as entities separate from files.
 The logical file system can call the file-organization module to
map the directory I/O into storage block locations, which are
passed on to the basic file system and I/O control system.
 The open() call passes a file name to the logical file system.
 Search the system-wide open-file table to see if the file is
already in use
 If it is, a per-process open-file table entry is created pointing
to the existing system-wide open-file table.
 If the file is not already open, the directory structure is
searched for the given file name.
 FCB of the file is copied into a system-wide open-file table in
memory, which also tracks the number of processes that
have the file open.
 Parts of the directory structure are usually cached in memory
Operating System Concepts – 10 th Edition 14.13 Silberschatz, Galvin and Gagne
In-Memory File System Structures
 Next, an entry is made in the per-process open-file table, with
a pointer to the entry in the system-wide open-file table and
some other fields.
 The open() call returns a pointer to the appropriate entry in
the per-process file-system table.
 UNIX systems refer to the entry as a file descriptor; Windows
refers to it as a file handle.
 When a process closes the file, the per-process table entry is
removed, and the system-wide entry's open count is
decremented.
 When all users that have opened the file close it, any updated
metadata are copied back to the disk-based directory structure,
and the system-wide open-file table entry is removed.

Operating System Concepts – 10 th Edition 14.14 Silberschatz, Galvin and Gagne


In-Memory File System Structures

In-memory file-system structures. (a) File open. (b) File read.

Operating System Concepts – 10 th Edition 14.15 Silberschatz, Galvin and Gagne


Directory Implementation
 Linear list of file names with pointer to the data blocks
 Simple to program
 Time-consuming to execute
 Linear search time
 Could keep ordered alphabetically via linked list or use B+
tree
 create a new file - search the directory, then add a new entry at
the end of the directory.
 delete a file - search the directory for the named file and then
release the space allocated to it.
 To reuse the directory entry:
 mark the entry as unused, by assigning it a special name,
such as an all-blank name, assigning it an invalid inode
number (such as 0), or by including a used–unused bit in
each entry
 attach it to a list of free directory entries.
 copy the last entry in the directory into the freed location
and to decrease the length
Operating System Concepts – 10 Edition
th 14.16
of the directory. Silberschatz, Galvin and Gagne
Directory Implementation
 Hash Table – linear list with hash data structure
 Decreases directory search time
 Collisions – situations where two file names hash to the
same location
 Only good if entries are fixed size.
 For larger hash table, a new hash function required to
map file names to the larger range, and reorganize the
existing directory entries to reflect their new hash-
function values.
 Chained-overflow method, where each hash entry can be
a linked list instead of an individual value

Operating System Concepts – 10 th Edition 14.17 Silberschatz, Galvin and Gagne


Allocation Methods - Contiguous
 An allocation method refers to how disk blocks are allocated for
files:
 Contiguous allocation – each file occupies set of contiguous blocks
 Best performance in most cases
 The number of disk seeks and seek time is minimal
 Simple – only starting location (block #) and length (number of
blocks) are required
 Fast sequential access, easy direct (random) access
 Problems include finding space for file, knowing file size,
external fragmentation, need for compaction off-line (downtime)
or on-line (performance penalty)
 How much file size – difficult to estimate, if file size known, pre-
allocation is inefficient as it may lead to internal fragmentation
 Modified scheme – initial allocation, then add extents
what happens if
file c needs 2
sectors???
file a (base=1,len=3) file b (base=5,len=2)
Operating System Concepts – 10 th Edition 14.18 Silberschatz, Galvin and Gagne
Contiguous Allocation

 Mapping from logical to


physical

LA/512

Block to be accessed = Q +
starting address
Displacement into block = R

Operating System Concepts – 10 th Edition 14.19 Silberschatz, Galvin and Gagne


Extent-Based Systems

 Many newer file systems (i.e., Veritas File System)


use a modified contiguous allocation scheme

 Extent-based file systems allocate disk blocks in


extents

 An extent is a contiguous block of disks


 Extents are allocated for file allocation
 A file consists of one or more extents

Operating System Concepts – 10 th Edition 14.20 Silberschatz, Galvin and Gagne


Allocation Methods - Linked
 Linked allocation – each file a linked list of blocks
 File ends at nil pointer
 No external fragmentation
 Each block contains pointer to next block
 No compaction, external fragmentation
 Free space management system called when new block needed
 Improve efficiency by clustering blocks into groups but
increases internal fragmentation
 Reliability can be a problem, lose block, lose rest of the file
 Locating a block can take many I/Os and disk seeks
 Can grow files dynamically and free list is managed same as file
 Sequential access: seek between each block; Random access:
horrible

how do you find


the last block in a?

file a (base=1) file b (base=5)


Operating System Concepts – 10 th Edition 14.21 Silberschatz, Galvin and Gagne
Example: DOS FS (simplified)
 Used linked list; but instead of embedding links in pages, they used
a separate structure called the File Allocation Table (FAT).

FAT (16-bit entries)


0 free
file a
Directory (5) 1 eof
6 4 3
2 1
a: 6
3 eof file b
b: 2
4 3 2 1
5 eof
6 4
... block on the disk and the entries
 The FAT has an entry for each
corresponding to the blocks of a particular file are linked up.

Operating System Concepts – 10 th Edition 14.22 Silberschatz, Galvin and Gagne


FAT discussion
 Space overhead of FAT is trivial:
 2 bytes / 512 byte block = ~.4% (Compare to Unix)
 Reliability: how to protect against errors?
 Create duplicate copies of FAT on disk.
 State duplication a very common theme in reliability
 Bootstrapping: where is root directory?
 Fixed location on disk:

FAT (opt) FAT root dir …

FAT (MS-DOS)

Operating System Concepts – 10 th Edition 14.23 Silberschatz, Galvin and Gagne


Linked Allocation
 Each file is a linked list of disk blocks: blocks may be
scattered anywhere on the disk
block = pointer

 Mapping
Q
LA/511
R
Block to be accessed is the Qth block in the linked chain
of blocks representing the file.

Displacement into block = R + 1

Operating System Concepts – 10 th Edition 14.24 Silberschatz, Galvin and Gagne


Linked Allocation

Operating System Concepts – 10 th Edition 14.25 Silberschatz, Galvin and Gagne


File-Allocation Table

Operating System Concepts – 10 th Edition 14.26 Silberschatz, Galvin and Gagne


Allocation Methods - Indexed

 Indexed allocation
 Each file has its own index block(s) of pointers to its data
blocks

 Logical view

index table

Operating System Concepts – 10 th Edition 14.27 Silberschatz, Galvin and Gagne


Example of Indexed Allocation

Operating System Concepts – 10 th Edition 14.28 Silberschatz, Galvin and Gagne


Indexed files (Nachos, VMS)
 system allocates a file header to hold an array of
pointers big enough to point to file size number of
blocks.
File header Disk blocks

Null

 Pros & Cons:


 + Can easily grow up to space allocated for descriptor
 + Random access is fast
 – Clumsy to grow file bigger than table size
 – Still lots of seeks: blocks can be spread all over the
disk,
Operating System so
Concepts
th
– 10sequential
Edition access
14.29 is slow. Silberschatz, Galvin and Gagne
Indexed Allocation (Cont.)
 Need index table

 Random access

 Dynamic access without external fragmentation, but


have overhead of index block

 Mapping from logical to physical in a file of maximum


size of 256K bytes and block size of 512 bytes. We
need only 1 block for index table
Q
LA/512
R

Q = displacement into index table


R = displacement into block

Operating System Concepts – 10 th Edition 14.30 Silberschatz, Galvin and Gagne


Indexed Allocation – Mapping (Cont.)
 Mapping from logical to physical in a file of unbounded length
(block size of 512 words)

 Linked scheme – Link blocks of index table (no limit on size)

Q1
LA / (512 x 511)
R1
Q1 = block of index table
R1 is used as follows:
Q2
R1 / 512
R2

Q2 = displacement into block of index table


R2 displacement into block of file:

Operating System Concepts – 10 th Edition 14.31 Silberschatz, Galvin and Gagne


Indexed Allocation – Mapping (Cont.)
 Two-level index (4,096-byte blocks could store 1,024 four-byte pointers
in outer index -> 1,048,576 data blocks and file size of up to 4GB)

Q1
LA / (512 x 512)
R1

Q1 = displacement into outer-index


R1 is used as follows:
Q2
R1 / 512
R2

Q2 = displacement into block of index table


R2 displacement into block of file:

Operating System Concepts – 10 th Edition 14.32 Silberschatz, Galvin and Gagne


Indexed Allocation – Mapping (Cont.)

Operating System Concepts – 10 th Edition 14.33 Silberschatz, Galvin and Gagne


Multilevel indexed (UNIX 4.1)
 Key idea: efficient for small files, but still allow big files
 File header contains 13 pointers
 fixed size table, pointers not all equivalent
 the header is called an “inode” in UNIX
 First 10 are pointers to data blocks.
 If file is small enough, some pointers will be NULL.

Operating System Concepts – 10 th Edition 14.34 Silberschatz, Galvin and Gagne


Multilevel indexed (UNIX 4.1)

1 Disk blocks
File header 2
1
2
3 11

11 266
267
12 256
13

256
256

256

256 256
256
256

Operating System Concepts – 10 th Edition 14.35 Silberschatz, Galvin and Gagne


Multilevel indexed (UNIX 4.1)
 What if you allocate 11th block?
 Pointer to an indirect block – a block of pointers to data
blocks.
 Gives us 256 blocks, + 10 (from file header) = 1/4 MB
 What if you allocate a 267th block?
 Pointer to a doubly indirect block – a block of pointers to
indirect blocks (in turn, block of pointers to data blocks).
 Gives us about 64K blocks => 64MB
 What if want a file bigger than this?
 One last pointer – what should it point to?
 Instead, pointer to triply indirect block – block of pointers
to doubly indirect blocks (which are ...)

Operating System Concepts – 10 th Edition 14.36 Silberschatz, Galvin and Gagne


Multilevel indexed (UNIX 4.1)
 Thus, file header is:
 First 10 data block pointers – point to one block each, so
10 blocks
 11 indirect block pointer – points to 256 blocks
 12 doubly indirect block pointer – points to 64K blocks
 13 triply indirect block pointer – points to 16M blocks
1. Bad news: Still an upper limit on file size ~ 16 GB.
2. Pointers get filled in dynamically: need to allocate indirect
block only when file grows > 10 blocks.
 If small file, no indirection needed.
3. How many disk accesses to reach block #23? (2)

Operating System Concepts – 10 th Edition 14.37 Silberschatz, Galvin and Gagne


Multilevel indexed (UNIX 4.1)
 How about block # 5? (1)
 How about block # 340? (3)
 UNIX Pros & Cons:
 + Simple (more or less)
 + Files can easily expand (up to a point)
 + Small files particularly cheap and easy
 – Very large files, spend lots of time reading indirect
blocks
 – Lots of seeks

Operating System Concepts – 10 th Edition 14.38 Silberschatz, Galvin and Gagne


DEMOS
 OS for Cray-1, mid to late 70’s.
 File system approach corresponds to segmentation.

 Cray-1 had 12 ns cycle time, so CPU:disk speed ratio about


the same as today (a few million instructions = 1 seek).

 Idea: reduce disk seeks by using contiguous allocation in


normal case, but allow flexibility to have non-contiguous
allocation.

Operating System Concepts – 10 th Edition 14.39 Silberschatz, Galvin and Gagne


DEMOS
 File header: table of base & size (10 “block group” pointers)

On disk
Base size

File header

Block group
 Each “block group” – a contiguous region of blocks
 Are 10 block group pointers enough?
 No. If need more than 10 block groups, set flag in file header:
BIGFILE.
Operating System Concepts – 10 th Edition 14.40 Silberschatz, Galvin and Gagne
DEMOS
 Each table entry now points to an indirect block group – a block
group of pointers to block groups.

On disk
Base size

File header
Block group
Indirect block group
 Can get huge files this way: Suppose 1000 blocks in a block
group, each block 8K size, each entry 8 byte => (10 ptrs1024
groups/ptr1000 blocks/group)*8K =80 GB max file size
 How do you find an available block group?
 Use bit map to find block of 0’s
Operating System Concepts – 10 th Edition 14.41 Silberschatz, Galvin and Gagne
DEMOS
 Pros & cons:
 + Fast sequential access
 + Easy to find free block groups
 + Free areas merge automatically
 – When disk fills up:
 a. No long runs of blocks (fragmentation)
 b. High CPU overhead to find free block
 Solution:
 Don’t let disk get full – keep portion in reserve
 Free count = # blocks free in bitmap.
 Normally, don’t even try allocate if free count = 0.
 Change this to: don’t allocate if free count < reserve
 Why do this?
 Tradeoff: pay for more disk space, get contiguous allocation
 How much of a reserve do you need?
 In practice, 10% seems like enough.

Operating System Concepts – 10 th Edition 14.42 Silberschatz, Galvin and Gagne


UNIX BSD 4.2
 Exactly the same as BSD 4.1 (same file header and triply indirect
blocks), except incorporated some ideas from DEMOS:
 Uses bitmap allocation in place of free list
 Attempt to allocate files contiguously
 10% reserved disk space
 Skip sector positioning
 Problem: when you create a file, don’t know how big it will
become (in UNIX, most writes are by appending to the file).
 So how much contiguous space do you allocate for a file, when
it’s created?

Operating System Concepts – 10 th Edition 14.43 Silberschatz, Galvin and Gagne


UNIX BSD 4.2
 In Demos, power of 2 growth: once it grows past 1 MB, allocate
2MB, etc.
 In BSD 4.2, just find some range of free blocks, put each new file
at the front of a different range.
 When need to expand a file, you first try successive blocks in
bitmap, then choose new range of blocks.
 Also, rotational delay can cause a problem, even with sequential
files.

Operating System Concepts – 10 th Edition 14.44 Silberschatz, Galvin and Gagne


Combined Scheme: UNIX UFS
4K bytes per block, 32-bit addresses

More index blocks than can be addressed with 32-bit file pointer

Operating System Concepts – 10 th Edition 14.45 Silberschatz, Galvin and Gagne


Performance
 Best method depends on file access type
 Contiguous great for sequential and random
 Linked good for sequential, not random
 Declare access type at creation -> select either contiguous or
linked
 Indexed more complex
 Single block access could require 2 index block reads
then data block read
 Clustering can help improve throughput, reduce CPU
overhead

Operating System Concepts – 10 th Edition 14.46 Silberschatz, Galvin and Gagne


Performance (Cont.)
 Adding instructions to the execution path to save one disk I/O
is reasonable
 Intel Core i7 Extreme Edition 990x (2011) at 3.46Ghz =
159,000 MIPS
 http://en.wikipedia.org/wiki/Instructions_per_second
 Typical disk drive at 250 I/Os per second
 159,000 MIPS / 250 = 630 million instructions during one
disk I/O
 Fast SSD drives provide 60,000 IOPS
 159,000 MIPS / 60,000 = 2.65 millions instructions during
one disk I/O

Operating System Concepts – 10 th Edition 14.47 Silberschatz, Galvin and Gagne


Free-Space Management
 File system maintains free-space list to track available blocks/clusters
 (Using term “block” for simplicity)
 Bit vector or bit map (n blocks)

0 1 2 n-1

1  block[i] free
bit[i] =


0  block[i] occupied

Block number calculation

(number of bits per word) *


(number of 0-value words) +
offset of first 1 bit
CPUs have instructions to return offset within word of first “1” bit

Operating System Concepts – 10 th Edition 14.48 Silberschatz, Galvin and Gagne


Free-Space Management (Cont.)
 Bit map requires extra space
 Example:
block size = 4KB = 212 bytes
disk size = 240 bytes (1 terabyte)
n = 240/212 = 228 bits (or 32MB)
if clusters of 4 blocks -> 8MB of
memory

 Easy to get contiguous files

Operating System Concepts – 10 th Edition 14.49 Silberschatz, Galvin and Gagne


Linked Free Space List on Disk
 Linked list (free list)
 pointer to the first free
block in a special
location in the file
system and caching it in
memory.
 This first block contains
a pointer to the next free
block, and so on
 Cannot get contiguous
space easily
 No waste of space
 No need to traverse the
entire list (if # free
blocks recorded)

Operating System Concepts – 10 th Edition 14.50 Silberschatz, Galvin and Gagne


Free-Space Management (Cont.)
 Grouping
 Modify linked list to store address of next n-1 free blocks in
first free block, plus a pointer to next block that contains
free-block-pointers (like this one)

 Counting
 Because space is frequently contiguously used and freed,
with contiguous-allocation allocation, extents, or clustering
 Keep address of first free block and count of following
free blocks
 Free space list then has entries containing addresses
and counts
 These entries can be stored in a balanced tree, rather
than a linked list, for efficient lookup, insertion, and
deletion.

Operating System Concepts – 10 th Edition 14.51 Silberschatz, Galvin and Gagne


Space Maps
 Oracle’s ZFS file system (Solaris, etc.) was designed to encompass
huge numbers of files, directories, and even file systems (in ZFS,
can create file-system hierarchies).
 Consider meta-data I/O on very large file systems
 Full data structures like bitmaps couldn’t fit in memory ->
thousands of I/Os and bitmaps scattered on disk
 Divides device space into metaslab, chunks of manageable size
 Given volume can contain hundreds of metaslabs
 Each metaslab has an associated space map
 Uses counting algorithm to store information about free blocks
 But records counting structures to log file rather than file system
on disk
 Space map is the log of all block activity (allocating and
freeing), in time order, in counting format
 Metaslab activity -> load space map into memory in balanced-tree
structure, indexed by offset and replays the log into that structure
 Combine contiguous free blocks into single entry to condense
the space map
 The free-space list is updated on disk as part of the transaction-
oriented operations of ZFS
Operating System Concepts – 10 th Edition 14.52 Silberschatz, Galvin and Gagne
Efficiency and Performance
 Efficiency of storage devices dependent on:
 Disk allocation and directory algorithms
 UNIX inodes are preallocated on a volume and by spreading
them across the volume improves file system’s performance
 Types of data kept in file’s directory (or inode) entry
 a “last write date” determines if the file needs to be backed up.
 a “last access date,” determines when the file was last read.
 Thus, whenever the file is read, a field in the directory structure
must be written to.
 This requirement can be inefficient for frequently accessed files
 Size of the pointers to access the data affects efficiency
 32-bit pointers limits the size of a file to 232 (4 GB), 64-bit
pointers allows file size of 264, but require more space to store.
 As a result, the allocation and free-space-management methods
(linked lists, indexes, and so on) use more storage space.

Operating System Concepts – 10 th Edition 14.53 Silberschatz, Galvin and Gagne


Efficiency and Performance (Cont.)
 Performance
 Keeping data and metadata close together
 Buffer cache – separate section of main memory for frequently
used blocks
 A page cache caches file data as pages rather than disk
blocks using virtual memory techniques and addresses
 More efficient;
 Solaris, Linux, and Windows, use page caching to cache
both process pages and file data, known as unified virtual
memory.
 Some versions of UNIX and Linux provide a unified buffer
cache
 Two alternatives for opening and accessing a file
 Memory mapping
 System calls read() and write()

Operating System Concepts – 10 th Edition 14.54 Silberschatz, Galvin and Gagne


I/O Without a Unified Buffer Cache
 read() and write() system
calls go through the buffer
cache.
 Memory-mapping call
requires using the page
cache and the buffer cache.
 A memory mapping reads in
disk blocks from the file
system and stores them in
the buffer cache.
 Double caching - the contents
of the file in the buffer cache
must be copied into the page
cache, caching file-system
data twice.
 Wastes memory, CPU and I/O
cycles.
 Inconsistencies between the
two caches can result in
corrupt files.
Operating System Concepts – 10 th Edition 14.55 Silberschatz, Galvin and Gagne
I/O Using a Unified Buffer Cache
 A unified buffer
cache uses the same
page cache to cache
both memory-mapped
pages and the read()
and write() system
calls to avoid double
caching
 allows the virtual
memory system to
manage file-
system data

Operating System Concepts – 10 th Edition 14.56 Silberschatz, Galvin and Gagne


Page Replacement Policies
 Caching storage blocks or pages (or both), least recently used (LRU)
seems a reasonable general-purpose algorithm for block or page
replacement.
 Evolution of Solaris
 Solaris 2.5.1 or earlier, allocated pages from the same pool to a
process and to the page cache.
 A system performing many I/O operations used most of the
available memory for caching pages.
 Because of the high rates of I/O, the page scanner reclaimed
pages from processes, rather than from the page cache, when
free memory ran low.
 Solaris 2.6 and Solaris 7 optionally implemented priority paging,
in which the page scanner gave priority to process pages over
the page cache.
 Solaris 8 applied a fixed limit to process pages and the file-
system page cache, preventing either from forcing the other out
of memory.
 Solaris 9 and 10 again changed the algorithms to maximize
memory use and minimize thrashing.
Operating System Concepts – 10 th Edition 14.57 Silberschatz, Galvin and Gagne
Performance of I/O on writes
 Synchronous writes occur in the order in which the storage
subsystem receives them, and the writes are not buffered.
 Thus, the calling routine must wait for the data to reach the
drive before it can proceed.
 In an asynchronous write, the data are stored in the cache, and
control returns to the caller.
 Most writes are asynchronous, however, metadata writes,
among others, can be synchronous.
 A flag in the open() system call allows a process to perform
writes synchronously
 E.g. databases use this feature for atomic transactions, to
assure that data reach stable storage in the required order.

Operating System Concepts – 10 th Edition 14.58 Silberschatz, Galvin and Gagne


Optimizing page cache
 Page cache can be optimized by using different replacement
algorithms, depending on the access type of the file.
 For sequential access, pages should not be replaced in LRU order,
because the most recently used page will be used last, or perhaps
never again.
 Sequential access can be optimized by techniques known as free-
behind and read-ahead.
 Free-behind removes a page from the buffer as soon as the next page
is requested.
 The previous pages are not likely to be used again and waste
buffer space.
 Read-ahead - a requested page and several subsequent pages are
read and cached.
 These pages are likely to be requested after the current page is
processed, caching them saves a considerable amount of time.
 Because of the high latency and overhead involved in making
many small transfers from the track cache to main memory, read-
ahead is beneficial.

Operating System Concepts – 10 th Edition 14.59 Silberschatz, Galvin and Gagne


Recovery
 Files and directories are kept both in main memory and on the
storage volume
 ensure that a system failure does not result in loss of data or in
data inconsistency.
 A system crash can cause inconsistencies among on-storage file-
system data structures, such as directory structures, free-block
pointers, and free FCB pointers.
 Consistency checking – compares data in directory structure and
other metadata with data blocks on disk, and tries to fix
inconsistencies
 Can be slow and sometimes fails

 Use system programs to back up data from disk to another storage


device (magnetic tape, other magnetic disk, optical)

 Recover lost file or disk by restoring data from backup

Operating System Concepts – 10 th Edition 14.60 Silberschatz, Galvin and Gagne


Log Structured File Systems
 Log structured (or journaling) file systems record each
metadata update to the file system as a transaction
 All transactions are written to a log
 A transaction is considered committed once it is written to
the log (sequentially)
 Sometimes to a separate device or section of disk
 However, the file system may not yet be updated
 The transactions in the log are asynchronously written to the
file system structures
 When the file system structures are modified, the
transaction is removed from the log
 If the file system crashes, all remaining transactions in the log
must still be performed
 Faster recovery from crash, removes chance of inconsistency
of metadata
 NTFS, Veritas file system, recent versions of UFS on Solaris
(ext3, ext4, and ZFS) use this method
Operating System Concepts – 10 th Edition 14.61 Silberschatz, Galvin and Gagne
Example: WAFL File System
 write-anywhere file layout (WAFL) from NetApp, Inc. is a
powerful, elegant file system optimized for random writes.
 WAFL is used exclusively on network file servers produced by
NetApp and is meant for use as a distributed file system.
 It can provide files to clients via the NFS, CIFS, iSCSI, ftp, and
http protocols.
 The NFS and CIFS protocols cache data from read operations,
so writes are of the greatest concern to file-server creators.
 WAFL is used on file servers that include an NVRAM cache for
writes.
 Similar to Berkeley Fast File System, with extensive
modifications
 It is block-based and uses inodes to describe files.
 Each inode contains 16 pointers to blocks (or indirect blocks)
belonging to the file described by the inode.

Operating System Concepts – 10 th Edition 14.62 Silberschatz, Galvin and Gagne


The WAFL File Layout

 Each file system has a root inode.


 All of the metadata lives in files.
 All inodes are in one file, the free-block map in another, and the
free-inode map in a third
 The snapshot is a view of the file system at a specific point in
time (before any updates after that time were applied)
 To take a snapshot, WAFL creates a copy of the root inode.
 Any file or metadata updates after that go to new blocks rather
than overwriting their existing blocks.
Operating System Concepts – 10 th Edition 14.63 Silberschatz, Galvin and Gagne
Snapshots in WAFL
 Many snapshots can exist simultaneously,
so one can be taken each hour of the day
and each day of the month, for example.
 A user with access to these snapshots can
access files as they were at any of the
times the snapshots were taken.
 The snapshot facility is also useful for
backups, testing, versioning, and so on.
 Newer versions of WAFL actually allow
read–write snapshots, known as clones.
 Another feature that naturally results from
the WAFL file system implementation is
replication, the duplication and
synchronization of a set of data over a
network to another system.
 ZFS file system supports similarly efficient
snapshots, clones, and replication

Operating System Concepts – 10 th Edition 14.64 Silberschatz, Galvin and Gagne


End of Chapter 14

Operating System Concepts – 10 th Edition Silberschatz, Galvin and Gagne

You might also like