Understanding Shared File Systems


Understanding Differences Between Types of Shared File Systems

Three primary shared file systems are available on the Solaris OS: NFS, QFS Shared
Writer (QFS/SW), and the cluster Proxy File System (PxFS). Each of these file systems
is designed for a different purpose, and their characteristics differ widely as a result.

Sharing Data With NFS

To Solaris OS users, NFS is by far the most familiar file system. It is an explicit over-the-
wire file sharing protocol that has been a part of the Solaris OS since 1986. Its manifest
purpose is to permit safe, deterministic access to files located on a server with reasonable
security. Although NFS is media independent, it is most commonly seen operating over
TCP/IP networks.

NFS is specifically designed to operate in multiclient environments and to provide a
reasonable tradeoff between performance, consistency, and ease of administration.
Although NFS has historically been neither particularly fast nor particularly secure,
recent enhancements address both of these areas. Performance improved by 50–60
percent between the Solaris 8 and Solaris 9 OSs, primarily due to greatly increased
efficiency in processing attribute-oriented operations. Data-intensive operations don't
improve by the same margin because they are dominated by data transfer times rather
than attribute operations.

Security, particularly authentication, has been addressed through the use of much
stronger authentication mechanisms such as those available using Kerberos. NFS clients
now need to trust only their servers, rather than their servers and their client peers.

Understanding the Sharing Limitations of UFS

UFS is not a shared file system. Despite a fairly widespread interest in a limited-use
configuration (specifically, mounted for read/write operation on one system, while
mounted read-only on one or more "secondary" systems), UFS is not sharable without the
use of an explicit file sharing protocol such as NFS. Although read-only sharing seems as
though it should work, it doesn't. This is due to fairly fundamental decisions made in the
UFS implementation many years ago, specifically in the caching of metadata. UFS was
designed with only a single system in mind and it also has a relatively complex data
structure for files, notably including "indirect blocks," which are blocks of metadata that
contain the addresses of real user data.

To maintain reasonable performance, UFS caches metadata in memory, even though it
writes metadata to disk synchronously. This way, it is not required to re-read inodes,
indirect blocks, and double-indirect blocks to follow an advancing file pointer. In a
single-system environment, this is a safe assumption. However, when another system has
access to the metadata, assuming that cached metadata is valid is unsafe at best and
catastrophic at worst.
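
The following simplified sketch illustrates the structures involved. The names and
sizes are hypothetical and the real UFS layout differs in detail, but it shows why
following an advancing file pointer would force extra metadata reads if indirect blocks
were not cached, and why that cache is only trustworthy while a single system owns the
metadata.

    /*
     * Simplified sketch of the structures discussed above.  The names and
     * sizes are hypothetical and the real UFS layout differs in detail;
     * metadata_block() stands in for "fetch this metadata block, from the
     * in-memory cache if possible."  The point is that mapping a logical
     * file block to a disk address can require one or two extra metadata
     * reads, which UFS avoids repeating by caching that metadata -- a cache
     * that is only safe while no other system can change the metadata.
     */
    #include <stdint.h>

    #define NDIRECT   12             /* direct block pointers in the inode   */
    #define NINDIRECT 2048           /* data addresses per indirect block    */

    struct toy_inode {
        uint64_t size;               /* file length in bytes                  */
        uint32_t direct[NDIRECT];    /* addresses of the first data blocks    */
        uint32_t indirect;           /* block holding NINDIRECT addresses     */
        uint32_t double_indirect;    /* block holding NINDIRECT indirect blks */
    };

    /* Hypothetical helper: return the contents of a metadata block, reading
     * it from disk on a cache miss and from memory thereafter. */
    extern const uint32_t *metadata_block(uint32_t disk_addr);

    /* Translate a logical block number into a disk block address. */
    uint32_t block_for(const struct toy_inode *ip, uint32_t lbn)
    {
        if (lbn < NDIRECT)
            return ip->direct[lbn];               /* no extra metadata read  */

        lbn -= NDIRECT;
        if (lbn < NINDIRECT)                      /* one metadata block      */
            return metadata_block(ip->indirect)[lbn];

        lbn -= NINDIRECT;                         /* two metadata blocks     */
        const uint32_t *outer = metadata_block(ip->double_indirect);
        return metadata_block(outer[lbn / NINDIRECT])[lbn % NINDIRECT];
    }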

A writable UFS file system can change the metadata and write it to disk. Meanwhile, a
read-only UFS file system on another node holds a cached copy of that metadata. If the
writable system creates a new file or removes or extends an existing file, the metadata
changes to reflect the request. Unfortunately, the read-only system does not see these
changes and, therefore, has a stale view of the system. This is nearly always a serious
problem, with the consequences ranging from corrupted data to a system crash. For
example, if the writable system removes a file, its blocks are placed on the free list. The
read-only system isn't provided with this information; therefore, a read of the same file
will cause the read-only system to follow the original data pointers and read blocks that
are now on the free list!

Rather than risk such extreme consequences, it is better to use one of the many other
options that exist. The choice among them is driven by how often
updated data must be made available to the other systems and by the size of the data sets
involved. If the data is not updated too often, the most logical option is to make a copy of
the file system and to provide the copy to other nodes. With point-in-time copy facilities
such as Sun Instant Image, HDS ShadowImage, and EMC TimeFinder, copying a file
system does not need to be an expensive operation. It is entirely reasonable to export a
point-in-time copy of a UFS file system from storage to another node (for example, for
backup) without risk because neither the original nor the copy is being shared. If the data
changes frequently, the most practical alternative is to use NFS. Although performance is
usually cited as a reason not to do this, the requirements are usually not demanding
enough to warrant other solutions. NFS is far faster than most users realize, especially in
environments that involve typical files smaller than 5–10 megabytes. If the application
involves distributing rapidly changing large streams of bulk data to multiple clients,
QFS/SW is a more suitable solution, albeit not bundled with the operating system.

Maintaining Metadata Consistency Among Multiple Systems With QFS Shared Writer

The architectural problem that prevents UFS file systems from being mounted on
multiple systems is the lack of any provision for maintaining metadata consistency
among those systems. QFS Shared Writer (QFS/SW) implements precisely such a
mechanism by centralizing access to metadata in a metadata server located in the
network. Typically, this is accomplished using a split data and metadata path. Metadata is
accessed through IP networks, while user data is transferred over a SAN.

All access to metadata is required to go over regular networks for arbitration by the
metadata server. The metadata server is responsible for coordinating possibly conflicting
access to metadata from varying clients. Assured by the protocol and the centralized
server that metadata are consistent, all client systems are free to cache metadata without
fear of catastrophic changes. Clients then use the metadata to access user data directly
from the underlying disk resources, providing the most efficient available path to user
data.
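
The read path implied by this split design can be sketched roughly as follows. The
function and structure names are invented for illustration; the real QFS/SW client
performs these steps transparently inside the file system rather than exposing them as
an application-level API.

    /*
     * Hypothetical sketch of the split data/metadata read path described
     * above.  The names (mds_lookup_extent, block_map, and so on) are
     * invented for illustration; the real QFS/SW client performs these
     * steps transparently inside the file system, not through an
     * application-level API.
     */
    #include <stdint.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Answer from the metadata server: where the data lives on the SAN and
     * a lease that keeps this cached mapping valid until it is revoked. */
    struct block_map {
        int      san_fd;       /* open descriptor for the shared SAN LUN  */
        off_t    disk_offset;  /* byte offset of the extent on that LUN   */
        size_t   length;       /* bytes covered by this mapping           */
        uint64_t lease_id;     /* server-granted right to use the mapping */
    };

    /* Hypothetical RPC to the metadata server over the IP network. */
    extern int mds_lookup_extent(const char *path, off_t file_offset,
                                 struct block_map *out);

    /* Read file data: metadata over IP, user data directly over the SAN. */
    ssize_t shared_read(const char *path, off_t file_offset,
                        void *buf, size_t len)
    {
        struct block_map map;

        /* Step 1: arbitration.  Only the metadata server hands out block
         * mappings, so every client sees consistent metadata. */
        if (mds_lookup_extent(path, file_offset, &map) != 0)
            return -1;

        if (len > map.length)
            len = map.length;

        /* Step 2: bulk transfer, straight from the disk array into this
         * client's memory, with no file server in the data path. */
        return pread(map.san_fd, buf, len, map.disk_offset);
    }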

Direct Access Shared Storage

The direct access architecture offers vastly higher performance than existing network-
sharing protocols when it comes to manipulating bulk data. This arrangement eliminates
or greatly reduces two completely different types of overhead. First, data is transferred
using the semantic-free SCSI block protocol. Transferring the data between the disk array
and the client system requires no interpretation; no semantics are implied in the nature of
the protocol, thus eliminating any interpretation overhead. By comparison, the equivalent
NFS operations must use the NFS, RPC, XDR, TCP, and IP protocols, all of which are
normally interpreted by the main processors.

The QFS/SW arrangement also eliminates most of the copies involved in traditional file
sharing protocols such as NFS and CIFS. These file systems transfer data several times to
get from the disk to the client's memory. A typical NFS configuration transfers from the
disk array to the server's memory, then from the server's memory to the NIC, then across
the network, and then from the NIC to the client's memory. (This description overlooks
many implementation details.) In contrast, QFS/SW simply transfers data directly from
the disk to the client's memory, once the metadata operations are completed and the client
is given permission to access data. For these reasons, QFS/SW handles bulk data
transfers at vastly higher performance than traditional file sharing techniques.

Furthermore, QFS/SW shares the on-disk format with QFS/local. In particular, user data
can be configured with all of the options available with QFS/local disk groups, including
striped and round-robin organizations. These capabilities make it far easier to aggregate
data transfer bandwidth with QFS than with NFS, further increasing the achievable
throughput for bulk data operations. User installations have measured single-stream
transfers in excess of 800 megabytes per second using QFS or QFS/SW. One system has
been observed transferring in excess of 3 gigabytes per second. (Obviously, such transfer
rates require non-trivial underlying storage configurations, usually requiring 10–32 Fibre-
Channel disk arrays, depending on both the file system parameters and array capability
and configuration.) For comparison, the maximum currently achievable single-stream
throughput with NFS is roughly 70 megabytes per second.
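
A back-of-the-envelope calculation suggests why configurations of that size are
required. The per-array rate below is an assumption chosen purely for illustration:

    /*
     * Back-of-the-envelope sketch of why striped configurations of that
     * size are needed.  The per-array rate is an assumption chosen for
     * illustration, not a figure from the article.
     */
    #include <stdio.h>

    int main(void)
    {
        double target_mb_s    = 800.0; /* single-stream rate cited above      */
        double per_array_mb_s =  80.0; /* assumed sustained rate per FC array */

        /* Every array in the stripe contributes in parallel; round up. */
        int arrays = (int)((target_mb_s + per_array_mb_s - 1.0) / per_array_mb_s);
        printf("about %d arrays striped together to sustain %.0f MB/s\n",
               arrays, target_mb_s);
        return 0;
    }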

Handling of Metadata

The performance advantages of direct-client access to bulk data are so compelling that
one might reasonably ask why data isn't always handled this way. There are several
reasons. Oddly enough, one is performance, particularly scalability. Although
performance is vastly improved when accessing user data using direct storage access,
metadata operations are essentially the same speed for NFS and QFS/SW. However, the
metadata protocols are not equivalent because they were designed for quite different
applications. NFS scales well with the number of clients. NFS servers are able to support
hundreds or thousands of clients with essentially the same performance. NFS was
designed with sharing many files to many clients in mind, and it scales accordingly in this
dimension, even when multiple clients are accessing the same files.

Scalability

QFS/SW was designed primarily for environments in which data sets are accessed by
only a few clients, and engineering tradeoffs favor high-performance bulk transfer over
linear client scalability. Empirical studies report that while QFS/SW scales well for small
numbers of nodes (four to eight), scalability diminishes rapidly thereafter. To a large
degree, this is to be expected: the bulk transfer capabilities provided are so high that a
few clients can easily exhaust the capabilities of even high performance disk arrays.

Another consideration is that the efficiency of QFS/SW is fundamentally derived from its
direct, low-overhead access to the storage. This necessarily limits the storage
configurations to which it can be applied. QFS/SW is distinctly not a wide-area sharing
protocol. Furthermore, as noted elsewhere in this article, the NFS and QFS/SW trust
models are completely different.

One of the key considerations in the efficiency of shared
file systems is the relative weight of metadata operations when compared to the weight of
data transfer operations. Most file sharing processes involve a number of metadata
operations. Opening a file typically requires reading and identifying the file itself and all
of its containing directories, as well as identifying access rights to each. The process of
finding and opening /home/user/.csh requires an average of seven metadata lookup
operations and nine client/server exchanges with NFS; QFS/SW is of similar complexity.
For the 70-kilobyte files typical of user directories, these metadata
operations so dominate the cost of data transfer that even completely eliminating transfer
overhead would have little material impact on the efficiency of the client/server system.
The efficiency advantages of direct access storage are only meaningful when the storage
is sufficiently accessible and when the data is large enough for the transfer overhead to
overwhelm the cost of the required metadata operations.
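
A rough worked example makes the point. The round-trip latency and transfer rate below
are assumptions chosen for illustration; the nine exchanges and the 70-kilobyte file
size come from the discussion above:

    /*
     * Worked example of the point above.  The round-trip latency and wire
     * speed are assumptions chosen for illustration, not measurements from
     * the article; the file size and exchange count come from the text.
     */
    #include <stdio.h>

    int main(void)
    {
        double exchanges = 9.0;     /* client/server exchanges to open the file */
        double rtt_ms    = 1.0;     /* assumed round-trip time per exchange     */
        double file_kb   = 70.0;    /* typical small-file size cited above      */
        double wire_mb_s = 100.0;   /* assumed sustained data-transfer rate     */

        double metadata_ms = exchanges * rtt_ms;
        double transfer_ms = file_kb / (wire_mb_s * 1024.0) * 1000.0;

        printf("metadata: %.1f ms, data transfer: %.2f ms (%.0f%% of the total)\n",
               metadata_ms, transfer_ms,
               100.0 * transfer_ms / (metadata_ms + transfer_ms));
        return 0;
    }

With these assumptions, the metadata exchanges account for more than 90 percent of the
elapsed time, so even a free data transfer would shorten the operation only marginally.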
