Distributed File Systems

Pablo de la Fuente
Departamento de Informática
E.T.S. de Ingeniería Informática
Campus Miguel Delibes
47011 Valladolid (Spain)
pfuente@infor.uva.es
Organization

General concepts.
NFS.
AFS.
Coda.
Enhancements to NFS
Introduction (1)

The main purposes of using files in operating systems are:

1. Permanent storage of information.
2. Sharing of information. A file can be created by one application
and then shared with different applications.

A file system is a subsystem of an operating system that performs file
management activities such as organization, storing, retrieval, naming,
sharing and protection of files. (Sinha)
A file system provides an abstraction of a storage device.
Introduction (2)

A distributed file system (DFS) provides a similar abstraction to users of a
distributed system. In addition to the general purposes, a DFS normally
supports the following:
1. Remote information sharing.
2. User mobility.
3. Availability.
4. Diskless workstations.
A DFS provides three types of services, each of which can be thought of
as a component of the system:
1. Storage service.
2. True file service.
3. Name service. Also called directory service.
Desirable features (1)

A good distributed file system should have the following features.

• Transparency. Among the different types of transparency previously
detailed, the following are desirable:
1. Structure transparency.
2. Access transparency.
3. Naming transparency. Some authors speak about location
transparency.
4. Replication transparency. The clients do not need to know the
existence or locations of multiple file copies.
Desirable features (2)

Other desirable features are:

• User mobility.
• Performance.
• Simplicity and ease of use.
• Scalability.
• High availability.
• High reliability.
• Data integrity.
• Security.
• Heterogeneity.
Accessing Remote Files

There are several models for servicing a client's file access request
when the accessed file is a remote file:
• Remote service model. The client's request is performed at the
server's node.
• Data-caching model. It attempts to reduce the amount of network
traffic by taking advantage of the locality found in file accesses.
The client's request is performed at the client's node itself by using the
cached data. Compared with the remote service model, this model greatly
reduces network traffic.
Almost all existing distributed file systems implement some form of
caching.
One problem, referred to as the cache consistency problem, is keeping
the cached data consistent with the original file content.
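To make the contrast concrete, here is a minimal sketch of the two models
(the server table and helper names are hypothetical, for illustration only):

server_files = {"/etc/motd": b"welcome"}     # stands in for the server's disk

def remote_read(path):
    # Remote service model: every request is performed at the server's node.
    return server_files[path]

cache = {}                                   # client-side cache

def cached_read(path):
    # Data-caching model: repeated accesses are served at the client's node.
    if path not in cache:                    # miss: one network transfer
        cache[path] = remote_read(path)
    return cache[path]                       # hit: no network traffic

cached_read("/etc/motd")                     # fetched from the server
print(cached_read("/etc/motd"))              # served from the local cache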
Units of Data Transfer

In systems that use the data-caching model, an important issue is to
decide the unit of data transfer. The commonly used models are:
• File-level transfer model. In this model, when an operation requires file
data to be transferred across the network in either direction between a
client and a server, the whole file is moved.
• Block-level transfer model. In this model, file data transfers across the
network between a client and a server take place in units of file blocks.
• Byte-level transfer model. In this model, file data transfers across the
network between a client and a server take place in units of bytes.
• Record-level transfer model. In this model, the unit of transfer is the
record.
Semantics of File Sharing (1)
a) On a single processor, when a read
follows a write, the value returned by the
read is the value just written.
b) In a distributed system with caching,
obsolete values may be returned.
Semantics of File Sharing (2)

Method             Comment
UNIX semantics     Every operation on a file is instantly visible to all processes
Session semantics  No changes are visible to other processes until the file is closed
Immutable files    No updates are possible; simplifies sharing and replication
Transactions       All changes occur atomically

Four ways of dealing with shared files in a distributed system.


Semantics of File Sharing (3). Unix
•Every operation on a file is instantly visible to all parties
•A Read following a Write will return the value just written
» For all users of the file
•Enforces (requires) a total global order on all file operations to return most
recent value
» On a single physical machine this results from using a shared I-Node
to control all file operations
» File data is thus a shared data structure among all users
» Distributed file server must reproduce this behavior
•Performance implications of “instant updates”
•Fine grain operations increase overhead
Semantics of File Sharing (4). Unix

•Distributed UNIX Semantics
– Could use a single centralized server, which would thus serialize all
file operations
– Provides poor performance under many use patterns
•Performance constraints require that the clients cache file blocks, but the
system must manage consistency among cached blocks to produce UNIX
semantics
– Writes invalidate cached blocks
– A read of a local copy that occurs "after" a write according to a
global clock must not return the value from "before" the write
•Like serializable operations in transaction systems
•A global virtual clock orders all writes, but not reads
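A minimal sketch of this write-invalidation discipline (a toy server that
tracks which clients cache each block; the classes are hypothetical, not
any particular system's protocol):

# Minimal sketch of write-invalidation for UNIX semantics.

class Client:
    def __init__(self):
        self.cached = {}                      # block id -> data

    def invalidate(self, block):
        self.cached.pop(block, None)          # drop the stale copy

class Server:
    def __init__(self):
        self.blocks = {}                      # block id -> data
        self.cachers = {}                     # block id -> clients caching it

    def read(self, client, block):
        self.cachers.setdefault(block, set()).add(client)
        client.cached[block] = self.blocks.get(block, b"")
        return client.cached[block]

    def write(self, writer, block, data):
        self.blocks[block] = data
        # Invalidate every other cached copy before the write completes,
        # so a read that follows the write returns the value just written.
        for c in self.cachers.get(block, set()) - {writer}:
            c.invalidate(block)

srv, a, b = Server(), Client(), Client()
srv.write(a, "blk0", b"old")
srv.read(a, "blk0"); srv.read(b, "blk0")      # both clients cache the block
srv.write(a, "blk0", b"new")                  # B's copy is invalidated
print(srv.read(b, "blk0"))                    # b"new"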
Semantics of File Sharing (5). Session

•UNIX semantics are still expensive
» Write invalidation of all cached blocks slows write operations and
reduces read performance
» Relaxing the file interaction semantics helps
» Make changes to local copies and propagate them when the file is
closed
•Called session semantics because the changes become visible when the
session is finished
•Final file state depends on who closes last
» OK for processes whose file modification is transaction oriented:
open-modify-close
» Very bad for files kept open for a long series of operations
Semantics of File Sharing (6). Session

•Semantics could arbitrarily choose an update order
» No real guidelines or obvious reason to formulate a rule
» Modification of a file by a process is monolithic
•Violates the familiar UNIX semantics implied by a single file pointer
shared among parents and children
» Two processes appending to a file should produce cumulative
results, interleaved by write operation order
» Session semantics would produce one process's changes or the
other's, not both
•Many processes keep files open for long periods
•Usable, but differs from many programmers' previous experience, so it
must be approached with caution
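A minimal sketch of session semantics (hypothetical open/write/close API;
whichever session closes last wins):

# Minimal sketch of session semantics. Server/Session are hypothetical.

class Server:
    def __init__(self):
        self.files = {"f": b"v0"}

class Session:
    """Open-modify-close: changes stay local until close()."""
    def __init__(self, server, name):
        self.server, self.name = server, name
        self.copy = server.files[name]        # private working copy at open

    def write(self, data):
        self.copy = data                      # invisible to other sessions

    def close(self):
        self.server.files[self.name] = self.copy   # propagate on close

srv = Server()
s1, s2 = Session(srv, "f"), Session(srv, "f")
s1.write(b"from s1"); s2.write(b"from s2")
s1.close(); s2.close()
print(srv.files["f"])     # b"from s2": the last close wins, s1's update is lost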
Semantics of File Sharing (7). Immutable files

•No updates are possible


» Simplifies sharing and replication
•No way to open a file for writing or appending
•Only directory entries may be modified
•Create a new file to replace an old one
•Also fine for many applications
» Again, though, different enough that it must be approached with
caution
•Design Principle:
» Many applications of distribution involve porting existing non-
distributed code along with its assumptions
Semantics of File Sharing (8). Atomic Transactions

• Changes are all or nothing


» Begin-Transaction
» End-Transaction
• System responsible for enforcing serialization
» Ensuring that concurrent transactions produce results consistent with
some serial execution
» Transaction systems commonly track the read/write component
operations
• The atomicity provided by the transaction model is a familiar aid to
implementers of distributed systems
» Commit and rollback are both very useful in simplifying implementation
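A minimal sketch of the all-or-nothing behavior (an in-memory toy; the
serialization enforcement described above is omitted, and all names are
hypothetical):

# Minimal sketch of transactional (all-or-nothing) file updates.

class FileStore:
    def __init__(self):
        self.files = {}

class Transaction:
    def __init__(self, store):                # Begin-Transaction
        self.store, self.writes = store, {}

    def write(self, name, data):
        self.writes[name] = data              # tracked, not yet visible

    def commit(self):                         # End-Transaction
        self.store.files.update(self.writes)  # all changes appear at once

    def rollback(self):
        self.writes.clear()                   # nothing ever becomes visible

store = FileStore()
t1 = Transaction(store)
t1.write("a", b"1"); t1.write("b", b"2")
t1.rollback()
print(store.files)                            # {}: the abort left no trace
t2 = Transaction(store)
t2.write("a", b"1"); t2.commit()
print(store.files)                            # {'a': b'1'}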
Caching (1)

• Cache path translations on client to speed operation


» Files frequently used
» Frequent prefixes (/usr/local/bin)
• Cache misses default to basic lookup behavior
• Cache hits give binary file references
» BUT the reference may be stale so the file server must be able to
reject such a reference and tell the client that it should do a regular
lookup
» This is MORE expensive (latency and messages) so hints must be
right most of the time
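A minimal sketch of this hint-based scheme (hypothetical handle
generations stand in for staleness; the server rejects a stale reference
and the client falls back to a regular lookup):

# Minimal sketch of cached path translations used as hints.

class StaleHandle(Exception):
    pass

class Server:
    def __init__(self):
        self.generation = {"/usr/local/bin/cc": 1}   # bumped on change

    def lookup(self, path):                  # the basic (slower) behavior
        return (path, self.generation[path])

    def read(self, handle):
        path, gen = handle
        if self.generation[path] != gen:
            raise StaleHandle()              # hint was wrong: reject it
        return b"contents of " + path.encode()

class Client:
    def __init__(self, server):
        self.server, self.hints = server, {}

    def read(self, path):
        handle = self.hints.get(path) or self.server.lookup(path)
        try:
            data = self.server.read(handle)  # hit: binary file reference
        except StaleHandle:
            handle = self.server.lookup(path)    # fall back to lookup
            data = self.server.read(handle)
        self.hints[path] = handle
        return data

srv = Server(); cli = Client(srv)
cli.read("/usr/local/bin/cc")                # lookup, then read
srv.generation["/usr/local/bin/cc"] = 2      # file replaced on the server
print(cli.read("/usr/local/bin/cc"))         # stale hint rejected, recovered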
Caching (2)

[Figure: caching between clients and the server.]
Caching (3)

Concerning caching there are several key decisions, both in centralized
and in distributed systems:
» Granularity of cached data (large versus small)
» Cache size (large versus small, fixed versus dynamically changing)
» Replacement policy
In distributed systems a file-caching scheme should also address the
following key decisions:
» Cache location
» Modification propagation
» Cache validation
Caching (4)

Possible cache locations:
1. No caching; the file remains at its original location, the server's disk.
2. Cache located in the server's main memory.
3. Cache located in the client's disk. Not available in diskless
workstations.
4. Cache located in the client's main memory.

Sinha. Distributed operating systems
Caching (5)

• Server's main memory. Access cost on cache hit: one network access.
Advantages: easy to implement; totally transparent to the clients; easy
to keep consistency; easy to support UNIX semantics.
• Client's disk. Access cost on cache hit: one disk access. Advantages:
reliability against crashes; large storage capacity; supports
disconnected operation; contributes to scalability and reliability.
• Client's main memory. Access cost on cache hit: none. Advantages:
maximum performance gain; permits workstations to be diskless;
contributes to scalability and reliability.

Sinha. Distributed operating systems
Caching (6)

Modification propagation.-
The aim is to keep file data cached at multiple client nodes consistent.
There are several approaches, related to:
1. When to propagate modifications made to cached data to the
corresponding file server.
2. How to verify the validity of cached data.

The modification propagation scheme used has a critical effect on the
system's performance and reliability, and the file-sharing semantics
supported depend greatly on it.
Caching (7)

Write-through scheme.-
When a cache entry is modified, the new value is immediately sent to the
server to update the original copy of the file. Advantages: reliability
and suitability for UNIX-like semantics. Drawback: poor write
performance. Suitable for situations where the ratio of read-to-write
accesses is large.
Delayed-write scheme.-
The aim is to reduce network traffic for writes. When a cache entry is
modified, the new value is written only to the cache and the client just
makes a note of it. Variants:
• Write on ejection from cache
• Periodic write
• Write on close

Sinha. Distributed operating systems


Caching (8)

The delayed-write scheme improves performance for write accesses for
the following reasons:

1. Write accesses complete more quickly because the new value is
written only in the cache of the client performing the write.
2. Modified data may be deleted before it is time to send them to the
server (for example, temporary files).
3. Gathering all file updates and sending them together to the server
is more efficient than sending each update separately.

However, there can be reliability problems: modifications not yet sent to
the server from a client's cache will be lost if the client crashes, and
the file-sharing semantics can become somewhat fuzzy.

Sinha. Distributed operating systems
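A minimal sketch contrasting the two propagation schemes (hypothetical
cache classes; the delayed variant shown uses write-on-close):

# Minimal sketch of write-through vs. delayed-write caching.

class Server:
    def __init__(self):
        self.files, self.writes_received = {}, 0

    def store(self, name, data):
        self.files[name] = data
        self.writes_received += 1

class WriteThroughCache:
    def __init__(self, server):
        self.server, self.cache = server, {}

    def write(self, name, data):
        self.cache[name] = data
        self.server.store(name, data)        # every write hits the server

class DelayedWriteCache:
    def __init__(self, server):
        self.server, self.cache, self.dirty = server, {}, set()

    def write(self, name, data):
        self.cache[name] = data
        self.dirty.add(name)                 # just make a note of it

    def close(self, name):                   # write-on-close variant
        if name in self.dirty:
            self.server.store(name, self.cache[name])
            self.dirty.discard(name)

srv = Server()
dw = DelayedWriteCache(srv)
for i in range(100):
    dw.write("log", b"version %d" % i)       # 100 fast local writes
dw.close("log")
print(srv.writes_received)                   # 1: updates gathered into one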


Caching (9)

Cache validation schemes.-
File data may simultaneously reside in the caches of multiple nodes. The
modification propagation policy only specifies when the master copy of a
file at the server node is updated upon modification of a cache entry.
Client-initiated approach.-
1. Checking before every access. This approach defeats the main
purpose of caching, but it is suitable for supporting UNIX-like semantics.
2. Periodic checking. A check is initiated every fixed interval of time.
3. Check on file open. With this option, a client's cache entry is validated
only when the client opens the corresponding file for use. It is suitable
for supporting session semantics.
Server-initiated approach.-
The server keeps a record of which clients have cached which files and
notifies them when their cached data become invalid.
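A minimal sketch of the client-initiated, check-on-file-open option
(hypothetical modification timestamps stand in for the validity check):

# Minimal sketch of client-initiated cache validation (check on open).

class Server:
    def __init__(self):
        self.data, self.mtime = {"f": b"v1"}, {"f": 1}

    def getattr(self, name):                 # cheap validity check
        return self.mtime[name]

    def fetch(self, name):                   # full transfer
        return self.data[name], self.mtime[name]

class Client:
    def __init__(self, server):
        self.server, self.cache = server, {}  # name -> (data, mtime)

    def open(self, name):
        entry = self.cache.get(name)
        # Validate only at open time, which suits session semantics.
        if entry is None or entry[1] != self.server.getattr(name):
            entry = self.server.fetch(name)   # stale or missing: refetch
            self.cache[name] = entry
        return entry[0]

srv = Server(); cli = Client(srv)
print(cli.open("f"))                          # b"v1", fetched and cached
srv.data["f"], srv.mtime["f"] = b"v2", 2      # another client updated it
print(cli.open("f"))                          # b"v2": revalidated on open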
File Replication (1)

Differences between replication and caching:

1. A replica is associated with a server, whereas a cached copy is
normally associated with a client.
2. The existence of a cached copy primarily depends on locality in
file access patterns, whereas the existence of a replica normally
depends on availability and performance requirements.
3. As compared to a cached copy, a replica is more persistent, widely
known, secure, available, complete and accurate.
4. A cached copy is contingent upon a replica. Only by periodic
revalidation with respect to a replica can a cached copy be useful.

Sinha. Distributed operating systems


File Replication (2)

The possible benefits offered by data replication are:

1. Increased availability
2. Increased reliability
3. Improved response time
4. Reduced network traffic
5. Improved system throughput
6. Better scalability
7. Autonomous operation

Sinha. Distributed operating systems


Fault tolerance

Fault tolerance is an important issue in the design of a distributed file
system. The characteristics of such systems make several fault
situations possible. The primary file properties that directly influence
the ability of a distributed file system to tolerate faults are:

1. Availability. The fraction of the time for which the file is available
for use. Replication is a mechanism for improving the availability
of a file.
2. Robustness. The file's power to survive crashes of the storage
device and decay of the storage medium on which it is stored.
3. Recoverability. The ability to be rolled back to an earlier, consistent
state when an operation on the file fails or is aborted by the client.

Sinha. Distributed operating systems


Stateful vs. Stateless file server
Stateful server. The server keeps a file-table entry (fid, mode, R/W pointer)
for each open file:

Client: Open(filename, mode)          Server: creates table entry; Return(fid)
Client: Read(fid, 100, buf)           Server: Return(bytes 0 to 99)
Client: Read(fid, 100, buf)           Server: Return(bytes 100 to 199)

Stateless server. The file state information (file name, mode, R/W pointer)
is kept by the client and travels with every request:

Client: Read(filename, 0, 100, buf)   Server: Return(bytes 0 to 99)
Client: Read(filename, 100, 100, buf) Server: Return(bytes 100 to 199)

Sinha. Distributed operating systems


Distinctions between Stateful vs. Stateless service (1)
Failure Recovery.
- A stateful server loses all its volatile state in a crash.
• Restore state by recovery protocol based on a dialog with
clients, or abort operations that were underway when the crash
occurred.
• Server needs to be aware of client failures in order to reclaim
space allocated to record the state of crashed client processes
(orphan detection and elimination).
- With stateless server, the effects of server failures and recovery are
almost unnoticeable. A newly reincarnated server can respond to a
self-contained request without any difficulty.
Distinctions between Stateful vs. Stateless service (2)
• Penalties for using the robust stateless service:
- Longer request messages.
- Slower request processing.
- Difficulty in providing UNIX file semantics.
• Some environments require stateful service.
- A server employing server-initiated cache validation cannot
provide stateless service, since it maintains a record of which files
are cached by which clients.
- UNIX use of file descriptors and implicit offsets is inherently
stateful; servers must maintain tables to map the file descriptors to
inodes, and store the current offset within a file.
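A minimal sketch of the two request styles (toy servers; note how the
stateless read carries the offset itself, so a newly restarted server can
answer it without recovery):

# Minimal sketch contrasting stateful and stateless read requests.

FILE = b"abcdefghij" * 30                    # the file being served

class StatefulServer:
    def __init__(self):
        self.table = {}                      # fid -> R/W pointer (volatile)

    def open(self, filename, mode):
        fid = len(self.table)
        self.table[fid] = 0                  # server remembers the offset
        return fid

    def read(self, fid, n):
        off = self.table[fid]                # lost if the server crashes
        self.table[fid] = off + n
        return FILE[off:off + n]

class StatelessServer:
    def read(self, filename, offset, n):
        # Self-contained request: no per-client state is kept or needed.
        return FILE[offset:offset + n]

sf = StatefulServer()
fid = sf.open("f", "r")
sf.read(fid, 100); sf.read(fid, 100)         # implicit offset: 0, then 100

sl = StatelessServer()
sl.read("f", 0, 100)
print(len(sl.read("f", 100, 100)))           # 100; the client tracks offsets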
Organization

General concepts.
NFS.
AFS.
Coda.
Enhancements to NFS
NFS. Introduction

• An industry standard for file sharing on local networks since


the 1980s
• An open standard with clear and simple interfaces
• Supports many of the design requirements already
mentioned:
– transparency
– heterogeneity
– efficiency
– fault tolerance
• Limited achievement of:
– concurrency
– replication
– consistency
– security
NFS Architecture (1)

a) The remote access model.


b) The upload/download model
NFS Architecture (2)

NFS Architecture (3)

[Figure: NFS architecture. On the client computer, application programs
issue UNIX system calls to the UNIX kernel. A virtual file system layer
routes operations on local files to the UNIX file system and operations
on remote files to the NFS client module, which talks to the NFS server
module on the server computer using the NFS protocol (remote operations);
the server's virtual file system passes those operations to its own UNIX
file system.]

Coulouris. Distributed systems
NFS Architecture (4)

Does the implementation have to be in the system kernel?


No:
– there are examples of NFS clients and servers that run at application-
level as libraries or processes (e.g. early Windows and MacOS
implementations, current PocketPC, etc.)
But, for a Unix implementation there are advantages:
– Binary code compatible - no need to recompile applications
• Standard system calls that access remote files can be routed
through the NFS client module by the kernel
– Shared cache of recently-used blocks at client
– Kernel-level server can access i-nodes and file blocks directly
• but a privileged (root) application program could do almost the
same.
– Security of the encryption key used for authentication.
File System Model
Operation v3 v4 Description
Create Yes No Create a regular file
Create No Yes Create a nonregular file
Link Yes Yes Create a hard link to a file
Symlink Yes No Create a symbolic link to a file
Mkdir Yes No Create a subdirectory in a given directory
Mknod Yes No Create a special file
Rename Yes Yes Change the name of a file
Rmdir Yes No Remove an empty subdirectory from a directory
Open No Yes Open a file
Close No Yes Close a file
Lookup Yes Yes Look up a file by means of a file name
Readdir Yes Yes Read the entries in a directory
Readlink Yes Yes Read the path name stored in a symbolic link
Getattr Yes Yes Read the attribute values for a file
Setattr Yes Yes Set one or more attribute values for a file
Read Yes Yes Read the data contained in a file
Write Yes Yes Write data to a file

An incomplete list of file system operations supported by NFS.


Tanenbaum. Distributed systems
Communication

a) Reading data from a file in NFS version 3.


b) Reading data using a compound procedure in version 4.

Tanenbaum. Distributed systems
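A minimal sketch of the difference the compound procedure makes (a toy
RPC counter; the operations are hypothetical stand-ins for LOOKUP, OPEN
and READ):

# Minimal sketch: per-operation RPCs (v3 style) vs. one compound (v4 style).

class Server:
    def __init__(self):
        self.round_trips = 0

    def call(self, *ops):
        """One RPC carrying one or more operations."""
        self.round_trips += 1
        return [op() for op in ops]

def lookup(): return "file handle"
def open_(): return "stateid"
def read(): return b"data"

srv = Server()
srv.call(lookup)                    # v3 style: one round trip per operation
srv.call(read)
srv.call(lookup, open_, read)       # v4 style: one compound round trip
print(srv.round_trips)              # 3: two v3 trips vs. a single compound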


NFS. Characteristics

• Stateless server, so the user's identity and access rights must be checked
by the server on each request.
– In the local file system they are checked only on open()
• Every client request is accompanied by the userID and groupID
• Server is exposed to imposter attacks unless the userID and groupID are
protected by encryption
• Kerberos has been integrated with NFS to provide a stronger and more
comprehensive security solution
• Mount operation:
mount(remotehost, remotedirectory, localdirectory)
• Server maintains a table of clients who have mounted filesystems at that
server
• Each client maintains a table of mounted file systems holding:
<IP address, port number, file handle>
• Hard versus soft mounts
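A minimal sketch of the client-side mount table described above
(hypothetical structures; a real NFS client keeps this inside the kernel
and obtains the file handle from the server's mount service):

# Minimal sketch of NFS-style mount bookkeeping on the client.

class NFSClient:
    def __init__(self):
        # local mount point -> (server host, port, remote dir handle)
        self.mount_table = {}

    def mount(self, remotehost, remotedirectory, localdirectory):
        # Fabricated handle, standing in for the one the server returns.
        handle = hash((remotehost, remotedirectory)) & 0xFFFFFFFF
        self.mount_table[localdirectory] = (remotehost, 2049, handle)

    def resolve(self, path):
        """Route a path to the mounted remote filesystem that serves it."""
        for mp, (host, port, fh) in self.mount_table.items():
            if path.startswith(mp + "/"):
                return host, port, fh, path[len(mp):]
        return None                          # a purely local path

c = NFSClient()
c.mount("filesrv", "/export/home", "/users")
print(c.resolve("/users/ana/notes.txt"))     # routed to filesrv, port 2049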
NFS. Naming (1)

Tanenbaum. Distributed systems


NFS. Naming (2)

Tanenbaum. Distributed systems


NFS. Automounting (1)

NFS client catches attempts to access 'empty' mount points and routes
them to the Automounter
– Automounter has a table of mount points and multiple candidate
servers for each
– it sends a probe message to each candidate server and then uses
the mount service to mount the filesystem at the first server to
respond
• Keeps the mount table small
• Provides a simple form of replication for read-only filesystems
– E.g. if there are several servers with identical copies of /usr/lib then
each server will have a chance of being mounted at some clients.
NFS. Automounting (2)

Tanenbaum. Distributed systems


NFS. Automounting (3)

Using symbolic links with automounting.


Tanenbaum. Distributed systems
NFS. File Attributes (1)

Attribute  Description
TYPE       The type of the file (regular, directory, symbolic link)
SIZE       The length of the file in bytes
CHANGE     Indicator for a client to see if and/or when the file has changed
FSID       Server-unique identifier of the file's file system

Some general mandatory file attributes in NFS.

Tanenbaum. Distributed systems


NFS. File Attributes (2)

Attribute Description
ACL an access control list associated with the file
FILEHANDLE The server-provided file handle of this file
FILEID A file-system unique identifier for this file
FS_LOCATIONS Locations in the network where this file system may be found
OWNER The character-string name of the file's owner
TIME_ACCESS Time when the file data were last accessed
TIME_MODIFY Time when the file data were last modified
TIME_CREATE Time when the file was created

Some general recommended file attributes.

Tanenbaum. Distributed systems


File Locking in NFS (1)

Operation Description
Lock Creates a lock for a range of bytes
Lockt Test whether a conflicting lock has been granted
Locku Remove a lock from a range of bytes
Renew Renew the lease on a specified lock

NFS version 4 operations related to file locking.

Tanenbaum. Distributed systems


File Locking in NFS (2)

(a) When the client requests shared access, given the current denial state:

                      Current file denial state
                  NONE     READ     WRITE    BOTH
Requested READ    Succeed  Fail     Succeed  Fail
access    WRITE   Succeed  Succeed  Fail     Fail
          BOTH    Succeed  Fail     Fail     Fail

(b) When the client requests a denial state, given the current access state:

                      Requested file denial state
                  NONE     READ     WRITE    BOTH
Current   READ    Succeed  Fail     Succeed  Fail
access    WRITE   Succeed  Succeed  Fail     Fail
state     BOTH    Succeed  Fail     Fail     Fail

The result of an open operation with share reservations in NFS.

Tanenbaum. Distributed systems
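A minimal sketch of the compatibility check behind table (a) (the set
encoding is hypothetical, for illustration only):

# Minimal sketch of a share-reservation check at open time.

READ, WRITE = "read", "write"

def expand(state):
    """Map NONE/READ/WRITE/BOTH to the set of modes it names."""
    return {"NONE": set(), "READ": {READ}, "WRITE": {WRITE},
            "BOTH": {READ, WRITE}}[state]

def open_allowed(requested_access, current_denial):
    # The open fails if any requested access mode is currently denied.
    return not (expand(requested_access) & expand(current_denial))

print(open_allowed("READ", "WRITE"))         # True: reading is not denied
print(open_allowed("READ", "BOTH"))          # False: all access is denied
print(open_allowed("BOTH", "READ"))          # False: reading is denied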


NFS. Client Caching (1)

• Client-side caching in NFS.

Tanenbaum. Distributed systems


NFS. Client Caching (2)

Tanenbaum. Distributed systems


NFS. The security architecture

Tanenbaum. Distributed systems


NFS. Access Control
Operation Description
Read_data Permission to read the data contained in a file
Write_data Permission to modify a file's data
Append_data Permission to append data to a file
Execute Permission to execute a file
List_directory Permission to list the contents of a directory
Add_file Permission to add a new file to a directory
Add_subdirectory Permission to create a subdirectory in a directory
Delete Permission to delete a file
Delete_child Permission to delete a file or directory within a directory
Read_acl Permission to read the ACL
Write_acl Permission to write the ACL
Read_attributes The ability to read the other basic attributes of a file
Write_attributes Permission to change the other basic attributes of a file
Read_named_attrs Permission to read the named attributes of a file
Write_named_attrs Permission to write the named attributes of a file
Write_owner Permission to change the owner
Synchronize Permission to access a file locally at the server with synchronous reads and writes

The classification of operations recognized by NFS with respect to access control.


Tanenbaum. Distributed systems
NFS. Scalability

• The performance of a single server can be increased by the addition of
processors, disks and controllers.
• When the limits of that process are reached, additional servers must be
installed and the filesystems must be reallocated between them.
• The effectiveness of that strategy is limited by the existence of 'hot spot'
files.
• When loads exceed the maximum performance, a distributed file system
that supports replication of updatable files, or one that reduces the protocol
traffic by the caching of whole files, may offer a better solution.
Organization

General concepts.
NFS.
AFS.
Coda.
Enhancements to NFS
AFS. Introduction
• A distributed computing environment under development since 1983 at
Carnegie-Mellon University.
• Andrew is highly scalable; the system is targeted to span over 5000
workstations.
• Andrew distinguishes between client machines (workstations) and
dedicated server machines. Servers and clients run the 4.2BSD UNIX
OS and are interconnected by an inter-net of LANs.
• NFS compatible.

Design characteristics:
Whole-file serving. (In AFS-3 files larger than 64 kbytes are transferred
in 64-kbyte chunks).
Whole-file caching. The copy or chunk transferred to client is stored in
a cache on local disk. The cache is permanent. On open request the
local copies are preferred to remote copies.
AFS. Initial considerations
• most files are small--transfer files rather than disk blocks?
• reading more common than writing
• most access is sequential
• most files have a short lifetime--lots of applications generate
temporary files (such as a compiler).
• file sharing is unusual (in terms of reads and writes)--argues for
client caching
• processes use few files
• files can be divided into classes--handle “system” files and “user”
files differently.
AFS. Characteristics
• Clients are presented with a partitioned space of file names: a local name
space and a shared name space.
• Dedicated servers, called Vice, present the shared name space to the
clients as a homogeneous, identical, and location-transparent file
hierarchy.
• The local name space is the root file system of a workstation, from which
the shared name space descends.
• Workstations run the Virtue (Venus) protocol to communicate with Vice, and
are required to have local disks where they store their local name space.
• Servers collectively are responsible for the storage and management of the
shared name space.
• Clients and servers are structured in clusters interconnected by a backbone
LAN.
• A cluster consists of a collection of workstations and a cluster server and is
connected to the backbone by a router.
• A key mechanism selected for remote file operations is whole file caching.
Opening a file causes it to be cached, in its entirety, on the local disk.
AFS. Processes distribution

[Figure: distribution of processes in AFS. Each workstation runs user
programs and the Venus process on a UNIX kernel; each server runs the
Vice process on a UNIX kernel; workstations and servers communicate over
the network.]

Coulouris. Distributed systems


AFS. System call interception

[Figure: system call interception in a workstation. A user program issues
UNIX file system calls to the UNIX kernel; the kernel's UNIX file system
serves local files from the local disk, while non-local file operations
are passed to Venus.]

Coulouris. Distributed systems


AFS. The main components of the Vice service interface

Fetch(fid) -> attr, data Returns the attributes (status) and, optionally, the contents of the file
identified by fid and records a callback promise on it.
Store(fid, attr, data) Updates the attributes and (optionally) the contents of a specified
file.
Create() -> fid Creates a new file and records a callback promise on it.
Remove(fid) Deletes the specified file.
SetLock(fid, mode) Sets a lock on the specified file or directory. The mode of the
lock may be shared or exclusive. Locks that are not removed
expire after 30 minutes.
ReleaseLock(fid) Unlocks the specified file or directory.
RemoveCallback(fid) Informs server that a Venus process has flushed a file from its
cache.
BreakCallback(fid) This call is made by a Vice server to a Venus process. It cancels
the callback promise on the relevant file.

Coulouris. Distributed systems


AFS. Implementation of file system calls

open(FileName, mode):
– UNIX kernel: if FileName refers to a file in shared file space, pass
the request to Venus.
– Venus (Virtue): check the list of files in the local cache. If the file
is not present, or there is no valid callback promise, send a request
for the file to the Vice server that is the custodian of the volume
containing the file.
– Vice: transfer a copy of the file and a callback promise to the
workstation; log the callback promise.
– Venus: place the copy of the file in the local file system, enter its
local name in the local cache list, and return the local name to UNIX.
– UNIX kernel: open the local file and return the file descriptor to
the application.

read(FileDescriptor, Buffer, length):
– UNIX kernel: perform a normal UNIX read operation on the local copy.

write(FileDescriptor, Buffer, length):
– UNIX kernel: perform a normal UNIX write operation on the local copy.

close(FileDescriptor):
– UNIX kernel: close the local copy and notify Venus that the file has
been closed.
– Venus: if the local copy has been changed, send a copy to the Vice
server that is the custodian of the file.
– Vice: replace the file contents and send a callback to all other
clients holding callback promises on the file.

Coulouris. Distributed systems
AFS. Questions about implementation
There are many open questions about the implementation of AFS:

• How does AFS gain control when an open or close system call
referring to a file in the shared file space is issued by a client?
• How is the server holding the required file located?
• What space is allocated for cached files in workstations?
• How does AFS ensure that the cached copies are up to date when
files may be updated by several clients?

One of the file partitions on the local disk of each workstation is used
as a cache holding the cached copies of files from shared space. Venus
manages the cache. The workstation cache is usually large enough to
accommodate several hundred average-sized files. If the user does not
modify the cached files, the workstations are largely independent of the
Vice servers.
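A minimal sketch of this whole-file caching with callback promises (Vice
and Venus reduced to toy classes; hypothetical, not the real RPC
interface):

# Minimal sketch of AFS-style whole-file caching with callback promises.

class Vice:
    def __init__(self):
        self.files = {"fid1": b"v1"}
        self.promises = {}                   # fid -> Venus processes

    def fetch(self, venus, fid):
        self.promises.setdefault(fid, set()).add(venus)  # log the promise
        return self.files[fid]

    def store(self, writer, fid, data):
        self.files[fid] = data
        for v in self.promises.get(fid, set()) - {writer}:
            v.break_callback(fid)            # cancel the others' promises

class Venus:
    def __init__(self, server):
        self.server, self.cache, self.valid = server, {}, set()

    def break_callback(self, fid):
        self.valid.discard(fid)              # promise no longer valid

    def open(self, fid):
        if fid not in self.valid:            # no valid callback promise
            self.cache[fid] = self.server.fetch(self, fid)
            self.valid.add(fid)
        return self.cache[fid]               # whole file, used locally

    def close(self, fid, new_data=None):
        if new_data is not None:             # changed: send back to Vice
            self.cache[fid] = new_data
            self.server.store(self, fid, new_data)

vice = Vice(); v1, v2 = Venus(vice), Venus(vice)
v1.open("fid1"); v2.open("fid1")
v1.close("fid1", b"v2")                      # breaks v2's callback promise
print(v2.open("fid1"))                       # b"v2": refetched on next open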
Organization

General concepts.
NFS.
AFS.
Coda.
Enhancements to NFS
Coda. Introduction (I)

• Built by M. Satyanarayanan. Based on the Andrew File System


(AFS)
• Relatively few servers
– simpler administration
– centralized storage/backup
• Only servers are trusted
– improved security
– no user code on servers
• Location-transparent file system
– easy data sharing
– user mobility
– workstation independence
• Preserve AFS strengths
– shared UNIX file system model
– scalability and performance
– security
Coda. Introduction (II)

• Salient features:
– Support for disconnected operations
• Desirable for mobile users
– Support for a large number of users

• Disconnected operation
– a temporary deviation from normal operation as a client of a
shared repository

• Why?
– enhance availability
• How?
– data cache
Coda. Design overview (1)

Two strategies for availability:
– make the shared repository more robust
– enhance local autonomy
– neither is adequate by itself
Server replication:
– improves availability of data
– useless if the client is isolated
Disconnected operation:
– system remains useful when isolated
– limited to data in the local cache
Coda. Design overview (2)
• Application area: academic and research; not for highly
concurrent, fine-granularity data access or safety-critical systems
• Volume: the unit of replication; a subtree of the Coda namespace
mapped to individual file servers
– VSG, Volume Storage Group: the set of replication sites for a
volume
– AVSG, Accessible Volume Storage Group: the currently
accessible subset of the VSG
• Callback: when a workstation caches a file or directory, the
server promises to notify it before allowing modification by others
• Venus: the cache manager
– caches latest data from AVSG
– propagates changes to AVSG
– detects AVSG membership changes
Coda. Consistency guarantees

• AFS guarantees
– open: result of last close anywhere
– close: immediate propagation everywhere
– failure: server or network failure

• Coda guarantees
– open: result of last close in accessible universe
– close: immediate propagation to accessible universe;
eventual propagation everywhere
– failure: cache miss when disconnected
Coda

[Figure: the overall organization of AFS, on which Coda is based.]

Replication strategy: read-one-data, read-all-status, write-all.

[Figures: Read (cache miss on open); Write (update).]


Coda. Naming

• Clients in Coda have access to a single shared name space


• Files are grouped into volumes [partial subtree in the directory structure]
– Volume is the basic unit of mounting
– Namespace: /afs/filesrv.cs.umass.edu [same namespace on all clients; different from
NFS]
– Name lookup can cross mount points: support for detecting crossing and automounts
Sharing Files in Coda

• Transactional behavior for sharing files: similar to share reservations in


NFS
– File open: transfer entire file to client machine [similar to delegation]
– Uses session semantics: each session is like a transaction
• Updates are sent back to the server only when the file is closed
Transactional Semantics

File-associated data Read? Modified?


File identifier Yes No
Access rights Yes No
Last modification time Yes Yes
File length Yes Yes
File contents Yes Yes

• Network partition: part of network isolated from rest


– Allow conflicting operations on replicas in different partitions
– Reconcile upon reconnection
– Transactional semantics => operations must be serializable
• Ensure that operations were serializable after they have executed
– Conflict => force manual reconciliation
Coda. Client Caching

• Cache consistency maintained using callbacks


– Server tracks all clients that have a copy of the file [provide
callback promise]
– Upon modification: send invalidate to clients
Coda. Server Replication

• Use replicated writes: read-one, write-all


– Writes are sent to all AVSG (all accessible replicas)
• How to handle network partitions?
– Use optimistic strategy for replication
– Detect conflicts using a Coda version vector
– Example: [2,2,1] and [1,1,2] is a conflict => manual reconciliation
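A minimal sketch of that conflict test using version vectors (standard
vector-clock comparison; the function name is hypothetical):

# Minimal sketch of conflict detection with Coda-style version vectors.

def compare(v1, v2):
    """Return 'equal', 'dominates', 'dominated', or 'conflict'."""
    ge = all(a >= b for a, b in zip(v1, v2))  # v1 saw all of v2's updates
    le = all(a <= b for a, b in zip(v1, v2))  # v2 saw all of v1's updates
    if ge and le:
        return "equal"
    if ge:
        return "dominates"
    if le:
        return "dominated"
    return "conflict"                         # concurrent updates

print(compare([2, 2, 1], [1, 1, 1]))          # 'dominates': safe to propagate
print(compare([2, 2, 1], [1, 1, 2]))          # 'conflict': manual reconciliation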
Disconnected Operation

• The state-transition diagram of a Coda client with respect to a


volume.
• Use hoarding to provide file access during disconnection
– Prefetch all files that may be accessed and cache (hoard) locally
– If AVSG=0, go to emulation mode and reintegrate upon
reconnection

Distributed Systems. Tanenbaum, Van Steen. © Prentice-Hall 2002


Coda. Hoarding and Emulation

Hoarding
• Prioritized Cache Management
– Hoard Profiles specify user interest (directories allowed)
– Recent usage
– Hoard priority based on above two

• Hoard walking
– Since priority based on recent usage, every once in a while need to update
file system to reflect priorities
– 10 min default. Can be changed.

Emulation
• Allow updating without contacting file server
• All updates logged in a per volume “replay log”
• Log optimizations to reduce log size
• Persistence achieved using Recoverable Virtual Memory (RVM)
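A minimal sketch of an emulation-mode replay log with one log
optimization (a later store to a file cancels earlier stores to it; the
structures are hypothetical):

# Minimal sketch of a Coda-style replay log for disconnected operation.

class Server:
    def __init__(self):
        self.files, self.stores = {}, 0

    def store(self, path, data):
        self.files[path] = data
        self.stores += 1

class ReplayLog:
    def __init__(self):
        self.entries = []                     # (op, path, data)

    def log_store(self, path, data):
        # Optimization: only the final contents matter, so drop any
        # earlier store to the same file before appending the new one.
        self.entries = [e for e in self.entries
                        if not (e[0] == "store" and e[1] == path)]
        self.entries.append(("store", path, data))

    def replay(self, server):                 # reintegration upon reconnect
        for op, path, data in self.entries:
            server.store(path, data)

log = ReplayLog()
log.log_store("/doc", b"draft 1")             # while disconnected
log.log_store("/doc", b"draft 2")             # cancels the first entry
srv = Server()
log.replay(srv)
print(srv.stores, srv.files["/doc"])          # 1 b'draft 2'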
Coda. Reintegration

• Updates propagated to servers and vice versa (one volume at a time)


– 4 stages (of a transaction)
• Log parsed, files locked
• Validation: conflict detection, disk-space check, integrity
• Fetching: updated files from client
• Commit: locks released and changes finalized
– Failures must be manually examined from logs

Conflict resolution
• Unresolved conflict represented as dangling symbolic link

• Application specific resolvers (ASRs)


– executed at clients
Organization

General concepts.
NFS.
AFS.
Coda.
Enhancements to NFS
NFS enhancement - Spritely NFS

• An implementation of the NFS protocol with the addition of
open and close calls.

• The parameters of the Sprite open operation specify a mode


and include counts of the number of local processes that
currently have the file open for reading and for writing.

• Spritely NFS implements a recovery protocol that interrogates a


list of clients to recover the full open files table.
NFS enhancement - NQNFS

• maintains similar client-related state concerning open files, but it


uses leases to aid recovery after a server crash.
• Callbacks are used in a similar manner to Spritely NFS to request
clients to flush their caches when a write request occurs.
NFS enhancement - WebNFS

• makes it possible for application programs to become clients of


NFS servers anywhere in the Internet (using the NFS protocol
directly)
• enables Internet applications that share data directly, such
as multi-user games or clients of large dynamic databases.
NFS enhancement - NFS version 4

• will include the features of WebNFS


• the use of callbacks or leases to maintain consistency
• on-the-fly recovery
• Scalability will be improved by using proxy servers in a manner
analogous to their use in the Web.
Bibliography
George Coulouris, Jean Dollimore and Tim Kindberg. Distributed Systems:
Concepts and Design (3rd edition). Addison-Wesley, 2001. http://www.cdk3.net/
Andrew S. Tanenbaum, Maarten van Steen. Distributed Systems: Principles and
Paradigms. Prentice-Hall, 2002.
http://www.cs.vu.nl/~ast/books/ds1/
P. K. Sinha. Distributed Operating Systems: Concepts and Design. IEEE
Press, 1993.
Sape J. Mullender, editor. Distributed Systems, 2nd edition. ACM Press, 1993.
http://wwwhome.cs.utwente.nl/~sape/gos0102.html
Jean Bacon. Concurrent Systems: Operating Systems, Database and Distributed
Systems, an Integrated Approach. Addison-Wesley, 2nd edition, 1998.
http://www.cl.cam.ac.uk/users/jmb/cs.html
