Distributed File Systems

Pablo de la Fuente
Departamento de Informática
E.T.S. de Ingeniería Informática
Campus Miguel Delibes
47011 Valladolid (Spain)
pfuente@infor.uva.es
Organization

General concepts.
NFS.
AFS.
Coda.
Enhancements to NFS
Introduction (1)

The main purposes of using files in operating systems are:

1. Permanent storage of information.
2. Sharing of information. A file can be created by one application
and then shared with different applications.

A file system is a subsystem of an operating system that performs file
management activities such as organization, storing, retrieval, naming,
sharing and protection of files. (Sinha)
A file system provides an abstraction of a storage device.
Introduction (2)

A distributed file system (DFS) provides a similar abstraction to users of a
distributed system. In addition to the general purposes, a DFS normally
supports the following:
1. Remote information sharing.
2. User mobility.
3. Availability.
4. Diskless workstations.
A DFS provides three types of services, each of which can be thought of
as a component of the system:
1. Storage service.
2. True file service.
3. Name service. Also called directory service.
Desirable features (1)

A good distributed file system should have the following features.

• Transparency. Among the different types of transparency previously
detailed, the following are desirable:
1. Structure transparency.
2. Access transparency.
3. Naming transparency. Some authors speak about location
transparency.
4. Replication transparency. The clients do not need to know the
existence or locations of multiple file copies.
Desirable features (2)

Other desirable features are:

• User mobility.
• Performance.
• Simplicity and ease of use.
• Scalability.
• High availability.
• High reliability.
• Data integrity.
• Security.
• Heterogeneity.
Accessing Remote Files

There are several models for servicing a client's file access request
when the accessed file is a remote file:
• Remote service model. The client's request is performed at the
server's node.
• Data-caching model. It attempts to reduce the amount of network
traffic by taking advantage of the locality found in file accesses.
The client's request is performed at the client's node itself by using the
cached data. Compared with the remote service model, this model greatly
reduces network traffic.
Almost all existing distributed file systems implement some form of
caching.
One problem, referred to as the cache consistency problem, is keeping
the cached data consistent with the original file content.
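To make the contrast concrete, here is a minimal sketch of the two models
(the server table and helper names are hypothetical, for illustration only):

server_files = {"/etc/motd": b"welcome"}     # stands in for the server's disk

def remote_read(path):
    # Remote service model: every request is performed at the server's node.
    return server_files[path]

cache = {}                                   # client-side cache

def cached_read(path):
    # Data-caching model: repeated accesses are served at the client's node.
    if path not in cache:                    # miss: one network transfer
        cache[path] = remote_read(path)
    return cache[path]                       # hit: no network traffic

cached_read("/etc/motd")                     # fetched from the server
print(cached_read("/etc/motd"))              # served from the local cache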
Units of Data Transfer

In systems that use the data-caching model, an important issue is to
decide the unit of data transfer. The commonly used models are:
• File-level transfer model. In this model, when an operation requires file
data to be transferred across the network in either direction between a
client and a server, the whole file is moved.
• Block-level transfer model. In this model, file data transfers across the
network between a client and a server take place in units of file blocks.
• Byte-level transfer model. In this model, file data transfers across the
network between a client and a server take place in units of bytes.
• Record-level transfer model. In this model, the unit of transfer is the
record.
Semantics of File Sharing (1)
a) On a single processor, when a read
follows a write, the value returned by the
read is the value just written.
b) In a distributed system with caching,
obsolete values may be returned.
Semantics of File Sharing (2)

Method             Comment
UNIX semantics     Every operation on a file is instantly visible to all processes
Session semantics  No changes are visible to other processes until the file is closed
Immutable files    No updates are possible; simplifies sharing and replication
Transactions       All changes occur atomically

Four ways of dealing with shared files in a distributed system.


Semantics of File Sharing (3). Unix
•Every operation on a file is instantly visible to all parties
•A Read following a Write will return the value just written
» For all users of the file
•Enforces (requires) a total global order on all file operations to return most
recent value
» On a single physical machine this results from using a shared I-Node
to control all file operations
» File data is thus a shared data structure among all users
» Distributed file server must reproduce this behavior
•Performance implications of “instant updates”
•Fine grain operations increase overhead
Semantics of File Sharing (4). Unix

•Distributed UNIX Semantics
– Could use a single centralized server, which would thus serialize all
file operations
– Provides poor performance under many use patterns
•Performance constraints require that the clients cache file blocks, but the
system must manage consistency among cached blocks to produce UNIX
semantics
– Writes invalidate cached blocks
– A read of a local copy that occurs "after" a write according to a
global clock must not return the value from "before" the write
•Like serializable operations in transaction systems
•A global virtual clock orders all writes, but not reads
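A minimal sketch of this write-invalidation discipline (a toy server that
tracks which clients cache each block; the classes are hypothetical, not
any particular system's protocol):

# Minimal sketch of write-invalidation for UNIX semantics.

class Client:
    def __init__(self):
        self.cached = {}                      # block id -> data

    def invalidate(self, block):
        self.cached.pop(block, None)          # drop the stale copy

class Server:
    def __init__(self):
        self.blocks = {}                      # block id -> data
        self.cachers = {}                     # block id -> clients caching it

    def read(self, client, block):
        self.cachers.setdefault(block, set()).add(client)
        client.cached[block] = self.blocks.get(block, b"")
        return client.cached[block]

    def write(self, writer, block, data):
        self.blocks[block] = data
        # Invalidate every other cached copy before the write completes,
        # so a read that follows the write returns the value just written.
        for c in self.cachers.get(block, set()) - {writer}:
            c.invalidate(block)

srv, a, b = Server(), Client(), Client()
srv.write(a, "blk0", b"old")
srv.read(a, "blk0"); srv.read(b, "blk0")      # both clients cache the block
srv.write(a, "blk0", b"new")                  # B's copy is invalidated
print(srv.read(b, "blk0"))                    # b"new"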
Semantics of File Sharing (5). Session

•UNIX semantics are still expensive
» Write invalidation of all cached blocks slows write operations and
reduces read performance
» Relaxing the file interaction semantics helps
» Make changes to local copies and propagate them when the file is
closed
•Called session semantics because the changes become visible when the
session is finished
•Final file state depends on who closes last
» OK for processes whose file modification is transaction oriented:
open-modify-close
» Very bad for files kept open for a long series of operations
Semantics of File Sharing (6). Session

•Semantics could arbitrarily choose an update order
» No real guidelines or obvious reason to formulate a rule
» Modification of a file by a process is monolithic
•Violates the familiar UNIX semantics implied by a single file pointer
shared among parents and children
» Two processes appending to a file should produce cumulative
results, interleaved by write operation order
» Session semantics would produce one process's changes or the
other's, not both
•Many processes keep files open for long periods
•Usable, but differs from many programmers' previous experience, so it
must be approached with caution
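A minimal sketch of session semantics (hypothetical open/write/close API;
whichever session closes last wins):

# Minimal sketch of session semantics. Server/Session are hypothetical.

class Server:
    def __init__(self):
        self.files = {"f": b"v0"}

class Session:
    """Open-modify-close: changes stay local until close()."""
    def __init__(self, server, name):
        self.server, self.name = server, name
        self.copy = server.files[name]        # private working copy at open

    def write(self, data):
        self.copy = data                      # invisible to other sessions

    def close(self):
        self.server.files[self.name] = self.copy   # propagate on close

srv = Server()
s1, s2 = Session(srv, "f"), Session(srv, "f")
s1.write(b"from s1"); s2.write(b"from s2")
s1.close(); s2.close()
print(srv.files["f"])     # b"from s2": the last close wins, s1's update is lost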
Semantics of File Sharing (7). Immutable files

•No updates are possible


» Simplifies sharing and replication
•No way to open a file for writing or appending
•Only directory entries may be modified
•Create a new file to replace an old one
•Also fine for many applications
» Again, though, different enough that it must be approached with
caution
•Design Principle:
» Many applications of distribution involve porting existing non-
distributed code along with its assumptions
Semantics of File Sharing (8). Atomic Transactions

• Changes are all or nothing


» Begin-Transaction
» End-Transaction
• System responsible for enforcing serialization
» Ensuring that concurrent transactions produce results consistent with
some serial execution
» Transaction systems commonly track the read/write component
operations
• The atomicity provided by the transaction model is a familiar aid to
implementers of distributed systems
» Commit and rollback are both very useful in simplifying implementation
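A minimal sketch of the all-or-nothing behavior (an in-memory toy; the
serialization enforcement described above is omitted, and all names are
hypothetical):

# Minimal sketch of transactional (all-or-nothing) file updates.

class FileStore:
    def __init__(self):
        self.files = {}

class Transaction:
    def __init__(self, store):                # Begin-Transaction
        self.store, self.writes = store, {}

    def write(self, name, data):
        self.writes[name] = data              # tracked, not yet visible

    def commit(self):                         # End-Transaction
        self.store.files.update(self.writes)  # all changes appear at once

    def rollback(self):
        self.writes.clear()                   # nothing ever becomes visible

store = FileStore()
t1 = Transaction(store)
t1.write("a", b"1"); t1.write("b", b"2")
t1.rollback()
print(store.files)                            # {}: the abort left no trace
t2 = Transaction(store)
t2.write("a", b"1"); t2.commit()
print(store.files)                            # {'a': b'1'}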
Caching (1)

• Cache path translations on client to speed operation


» Files frequently used
» Frequent prefixes (/usr/local/bin)
• Cache misses default to basic lookup behavior
• Cache hits give binary file references
» BUT the reference may be stale so the file server must be able to
reject such a reference and tell the client that it should do a regular
lookup
» This is MORE expensive (latency and messages) so hints must be
right most of the time
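A minimal sketch of this hint-based scheme (hypothetical handle
generations stand in for staleness; the server rejects a stale reference
and the client falls back to a regular lookup):

# Minimal sketch of cached path translations used as hints.

class StaleHandle(Exception):
    pass

class Server:
    def __init__(self):
        self.generation = {"/usr/local/bin/cc": 1}   # bumped on change

    def lookup(self, path):                  # the basic (slower) behavior
        return (path, self.generation[path])

    def read(self, handle):
        path, gen = handle
        if self.generation[path] != gen:
            raise StaleHandle()              # hint was wrong: reject it
        return b"contents of " + path.encode()

class Client:
    def __init__(self, server):
        self.server, self.hints = server, {}

    def read(self, path):
        handle = self.hints.get(path) or self.server.lookup(path)
        try:
            data = self.server.read(handle)  # hit: binary file reference
        except StaleHandle:
            handle = self.server.lookup(path)    # fall back to lookup
            data = self.server.read(handle)
        self.hints[path] = handle
        return data

srv = Server(); cli = Client(srv)
cli.read("/usr/local/bin/cc")                # lookup, then read
srv.generation["/usr/local/bin/cc"] = 2      # file replaced on the server
print(cli.read("/usr/local/bin/cc"))         # stale hint rejected, recovered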
Caching (2)

[Figure: caching between clients and the server.]
Caching (3)

Concerning caching there are several key decisions, both in centralized
and in distributed systems:
» Granularity of cached data (large versus small)
» Cache size (large versus small, fixed versus dynamically changing)
» Replacement policy
In distributed systems a file-caching scheme should also address the
following key decisions:
» Cache location
» Modification propagation
» Cache validation
Caching (4)

Possible cache locations:
1. No caching; the file remains at its original location, the server's disk.
2. Cache located in the server's main memory.
3. Cache located in the client's disk. Not available in diskless
workstations.
4. Cache located in the client's main memory.

Sinha. Distributed operating systems
Caching (5)

• Server's main memory. Access cost on cache hit: one network access.
Advantages: easy to implement; totally transparent to the clients; easy
to keep consistency; easy to support UNIX semantics.
• Client's disk. Access cost on cache hit: one disk access. Advantages:
reliability against crashes; large storage capacity; supports
disconnected operation; contributes to scalability and reliability.
• Client's main memory. Access cost on cache hit: none. Advantages:
maximum performance gain; permits workstations to be diskless;
contributes to scalability and reliability.

Sinha. Distributed operating systems
Caching (6)

Modification propagation.-
The aim is to keep file data cached at multiple client nodes consistent.
There are several approaches, related to:
1. When to propagate modifications made to cached data to the
corresponding file server.
2. How to verify the validity of cached data.

The modification propagation scheme used has a critical effect on the
system's performance and reliability, and the file-sharing semantics
supported depend greatly on it.
Caching (7)

Write-through scheme.-
When a cache entry is modified, the new value is immediately sent to the
server to update the original copy of the file. Advantages: reliability
and suitability for UNIX-like semantics. Drawback: poor write
performance. Suitable for situations where the ratio of read-to-write
accesses is large.
Delayed-write scheme.-
The aim is to reduce network traffic for writes. When a cache entry is
modified, the new value is written only to the cache and the client just
makes a note of it. Variants:
• Write on ejection from cache
• Periodic write
• Write on close

Sinha. Distributed operating systems


Caching (8)

The delayed-write scheme improves performance for write accesses for
the following reasons:

1. Write accesses complete more quickly because the new value is
written only in the cache of the client performing the write.
2. Modified data may be deleted before it is time to send them to the
server (for example, temporary files).
3. Gathering all file updates and sending them together to the server
is more efficient than sending each update separately.

However, there can be reliability problems: modifications not yet sent to
the server from a client's cache will be lost if the client crashes, and
the file-sharing semantics can become somewhat fuzzy.

Sinha. Distributed operating systems
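A minimal sketch contrasting the two propagation schemes (hypothetical
cache classes; the delayed variant shown uses write-on-close):

# Minimal sketch of write-through vs. delayed-write caching.

class Server:
    def __init__(self):
        self.files, self.writes_received = {}, 0

    def store(self, name, data):
        self.files[name] = data
        self.writes_received += 1

class WriteThroughCache:
    def __init__(self, server):
        self.server, self.cache = server, {}

    def write(self, name, data):
        self.cache[name] = data
        self.server.store(name, data)        # every write hits the server

class DelayedWriteCache:
    def __init__(self, server):
        self.server, self.cache, self.dirty = server, {}, set()

    def write(self, name, data):
        self.cache[name] = data
        self.dirty.add(name)                 # just make a note of it

    def close(self, name):                   # write-on-close variant
        if name in self.dirty:
            self.server.store(name, self.cache[name])
            self.dirty.discard(name)

srv = Server()
dw = DelayedWriteCache(srv)
for i in range(100):
    dw.write("log", b"version %d" % i)       # 100 fast local writes
dw.close("log")
print(srv.writes_received)                   # 1: updates gathered into one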


Caching (9)

Cache validation schemes.-
File data may simultaneously reside in the caches of multiple nodes. The
modification propagation policy only specifies when the master copy of a
file at the server node is updated upon modification of a cache entry.
Client-initiated approach.-
1. Checking before every access. This approach defeats the main
purpose of caching, but it is suitable for supporting UNIX-like semantics.
2. Periodic checking. A check is initiated every fixed interval of time.
3. Check on file open. With this option, a client's cache entry is validated
only when the client opens the corresponding file for use. It is suitable
for supporting session semantics.
Server-initiated approach.-
The server keeps a record of which clients have cached which files and
notifies them when their cached data become invalid.
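A minimal sketch of the client-initiated, check-on-file-open option
(hypothetical modification timestamps stand in for the validity check):

# Minimal sketch of client-initiated cache validation (check on open).

class Server:
    def __init__(self):
        self.data, self.mtime = {"f": b"v1"}, {"f": 1}

    def getattr(self, name):                 # cheap validity check
        return self.mtime[name]

    def fetch(self, name):                   # full transfer
        return self.data[name], self.mtime[name]

class Client:
    def __init__(self, server):
        self.server, self.cache = server, {}  # name -> (data, mtime)

    def open(self, name):
        entry = self.cache.get(name)
        # Validate only at open time, which suits session semantics.
        if entry is None or entry[1] != self.server.getattr(name):
            entry = self.server.fetch(name)   # stale or missing: refetch
            self.cache[name] = entry
        return entry[0]

srv = Server(); cli = Client(srv)
print(cli.open("f"))                          # b"v1", fetched and cached
srv.data["f"], srv.mtime["f"] = b"v2", 2      # another client updated it
print(cli.open("f"))                          # b"v2": revalidated on open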
File Replication (1)

Differences between replication and caching:

1. A replica is associated with a server, whereas a cached copy is
normally associated with a client.
2. The existence of a cached copy primarily depends on locality in
file access patterns, whereas the existence of a replica normally
depends on availability and performance requirements.
3. As compared to a cached copy, a replica is more persistent, widely
known, secure, available, complete and accurate.
4. A cached copy is contingent upon a replica. Only by periodic
revalidation with respect to a replica can a cached copy be useful.

Sinha. Distributed operating systems


File Replication (2)

The possible benefits offered by data replication are:

1. Increased availability
2. Increased reliability
3. Improved response time
4. Reduced network traffic
5. Improved system throughput
6. Better scalability
7. Autonomous operation

Sinha. Distributed operating systems


Fault tolerance

Fault tolerance is an important issue in the design of a distributed file
system. The characteristics of such systems make several fault
situations possible. The primary file properties that directly influence
the ability of a distributed file system to tolerate faults are:

1. Availability. The fraction of the time for which the file is available
for use. Replication is a mechanism for improving the availability
of a file.
2. Robustness. The file's power to survive crashes of the storage
device and decay of the storage medium on which it is stored.
3. Recoverability. The ability to be rolled back to an earlier, consistent
state when an operation on the file fails or is aborted by the client.

Sinha. Distributed operating systems


Stateful vs. Stateless file server
Stateful server. The server keeps a file-table entry (fid, mode, R/W pointer)
for each open file:

Client: Open(filename, mode)          Server: creates table entry; Return(fid)
Client: Read(fid, 100, buf)           Server: Return(bytes 0 to 99)
Client: Read(fid, 100, buf)           Server: Return(bytes 100 to 199)

Stateless server. The file state information (file name, mode, R/W pointer)
is kept by the client and travels with every request:

Client: Read(filename, 0, 100, buf)   Server: Return(bytes 0 to 99)
Client: Read(filename, 100, 100, buf) Server: Return(bytes 100 to 199)

Sinha. Distributed operating systems


Distinctions between Stateful vs. Stateless service (1)
Failure Recovery.
- A stateful server loses all its volatile state in a crash.
• Restore state by recovery protocol based on a dialog with
clients, or abort operations that were underway when the crash
occurred.
• Server needs to be aware of client failures in order to reclaim
space allocated to record the state of crashed client processes
(orphan detection and elimination).
- With stateless server, the effects of server failures and recovery are
almost unnoticeable. A newly reincarnated server can respond to a
self-contained request without any difficulty.
Distinctions between Stateful vs. Stateless service (2)
• Penalties for using the robust stateless service:
- Longer request messages.
- Slower request processing.
- Difficulty in providing UNIX file semantics.
• Some environments require stateful service.
- A server employing server-initiated cache validation cannot
provide stateless service, since it maintains a record of which files
are cached by which clients.
- UNIX use of file descriptors and implicit offsets is inherently
stateful; servers must maintain tables to map the file descriptors to
inodes, and store the current offset within a file.
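A minimal sketch of the two request styles (toy servers; note how the
stateless read carries the offset itself, so a newly restarted server can
answer it without recovery):

# Minimal sketch contrasting stateful and stateless read requests.

FILE = b"abcdefghij" * 30                    # the file being served

class StatefulServer:
    def __init__(self):
        self.table = {}                      # fid -> R/W pointer (volatile)

    def open(self, filename, mode):
        fid = len(self.table)
        self.table[fid] = 0                  # server remembers the offset
        return fid

    def read(self, fid, n):
        off = self.table[fid]                # lost if the server crashes
        self.table[fid] = off + n
        return FILE[off:off + n]

class StatelessServer:
    def read(self, filename, offset, n):
        # Self-contained request: no per-client state is kept or needed.
        return FILE[offset:offset + n]

sf = StatefulServer()
fid = sf.open("f", "r")
sf.read(fid, 100); sf.read(fid, 100)         # implicit offset: 0, then 100

sl = StatelessServer()
sl.read("f", 0, 100)
print(len(sl.read("f", 100, 100)))           # 100; the client tracks offsets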
Organization

General concepts.
NFS.
AFS.
Coda.
Enhancements to NFS
NFS. Introduction

• An industry standard for file sharing on local networks since


the 1980s
• An open standard with clear and simple interfaces
• Supports many of the design requirements already
mentioned:
– transparency
– heterogeneity
– efficiency
– fault tolerance
• Limited achievement of:
– concurrency
– replication
– consistency
– security
NFS Architecture (1)

a) The remote access model.


b) The upload/download model
NFS Architecture (2)

NFS Architecture (3)

[Figure: NFS architecture. On the client computer, application programs
issue UNIX system calls to the UNIX kernel. A virtual file system layer
routes operations on local files to the UNIX file system and operations
on remote files to the NFS client module, which talks to the NFS server
module on the server computer using the NFS protocol (remote operations);
the server's virtual file system passes those operations to its own UNIX
file system.]

Coulouris. Distributed systems
NFS Architecture (4)

Does the implementation have to be in the system kernel?


No:
– there are examples of NFS clients and servers that run at application-
level as libraries or processes (e.g. early Windows and MacOS
implementations, current PocketPC, etc.)
But, for a Unix implementation there are advantages:
– Binary code compatible - no need to recompile applications
• Standard system calls that access remote files can be routed
through the NFS client module by the kernel
– Shared cache of recently-used blocks at client
– Kernel-level server can access i-nodes and file blocks directly
• but a privileged (root) application program could do almost the
same.
– Security of the encryption key used for authentication.
File System Model
Operation v3 v4 Description
Create Yes No Create a regular file
Create No Yes Create a nonregular file
Link Yes Yes Create a hard link to a file
Symlink Yes No Create a symbolic link to a file
Mkdir Yes No Create a subdirectory in a given directory
Mknod Yes No Create a special file
Rename Yes Yes Change the name of a file
Rmdir Yes No Remove an empty subdirectory from a directory
Open No Yes Open a file
Close No Yes Close a file
Lookup Yes Yes Look up a file by means of a file name
Readdir Yes Yes Read the entries in a directory
Readlink Yes Yes Read the path name stored in a symbolic link
Getattr Yes Yes Read the attribute values for a file
Setattr Yes Yes Set one or more attribute values for a file
Read Yes Yes Read the data contained in a file
Write Yes Yes Write data to a file

An incomplete list of file system operations supported by NFS.


Tanenbaum. Distributed systems
Communication

a) Reading data from a file in NFS version 3.


b) Reading data using a compound procedure in version 4.

Tanenbaum. Distributed systems
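A minimal sketch of the difference the compound procedure makes (a toy
RPC counter; the operations are hypothetical stand-ins for LOOKUP, OPEN
and READ):

# Minimal sketch: per-operation RPCs (v3 style) vs. one compound (v4 style).

class Server:
    def __init__(self):
        self.round_trips = 0

    def call(self, *ops):
        """One RPC carrying one or more operations."""
        self.round_trips += 1
        return [op() for op in ops]

def lookup(): return "file handle"
def open_(): return "stateid"
def read(): return b"data"

srv = Server()
srv.call(lookup)                    # v3 style: one round trip per operation
srv.call(read)
srv.call(lookup, open_, read)       # v4 style: one compound round trip
print(srv.round_trips)              # 3: two v3 trips vs. a single compound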


NFS. Characteristics

• Stateless server, so the user's identity and access rights must be checked
by the server on each request.
– In the local file system they are checked only on open()
• Every client request is accompanied by the userID and groupID
• Server is exposed to imposter attacks unless the userID and groupID are
protected by encryption
• Kerberos has been integrated with NFS to provide a stronger and more
comprehensive security solution
• Mount operation:
mount(remotehost, remotedirectory, localdirectory)
• Server maintains a table of clients who have mounted filesystems at that
server
• Each client maintains a table of mounted file systems holding:
<IP address, port number, file handle>
• Hard versus soft mounts
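A minimal sketch of the client-side mount table described above
(hypothetical structures; a real NFS client keeps this inside the kernel
and obtains the file handle from the server's mount service):

# Minimal sketch of NFS-style mount bookkeeping on the client.

class NFSClient:
    def __init__(self):
        # local mount point -> (server host, port, remote dir handle)
        self.mount_table = {}

    def mount(self, remotehost, remotedirectory, localdirectory):
        # Fabricated handle, standing in for the one the server returns.
        handle = hash((remotehost, remotedirectory)) & 0xFFFFFFFF
        self.mount_table[localdirectory] = (remotehost, 2049, handle)

    def resolve(self, path):
        """Route a path to the mounted remote filesystem that serves it."""
        for mp, (host, port, fh) in self.mount_table.items():
            if path.startswith(mp + "/"):
                return host, port, fh, path[len(mp):]
        return None                          # a purely local path

c = NFSClient()
c.mount("filesrv", "/export/home", "/users")
print(c.resolve("/users/ana/notes.txt"))     # routed to filesrv, port 2049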
NFS. Naming (1)

Tanenbaum. Distributed systems


NFS. Naming (2)

Tanenbaum. Distributed systems


NFS. Automounting (1)

NFS client catches attempts to access 'empty' mount points and routes
them to the Automounter
– Automounter has a table of mount points and multiple candidate
servers for each
– it sends a probe message to each candidate server and then uses
the mount service to mount the filesystem at the first server to
respond
• Keeps the mount table small
• Provides a simple form of replication for read-only filesystems
– E.g. if there are several servers with identical copies of /usr/lib then
each server will have a chance of being mounted at some clients.
NFS. Automounting (2)

Tanenbaum. Distributed systems


NFS. Automounting (3)

Using symbolic links with automounting.


Tanenbaum. Distributed systems
NFS. File Attributes (1)

Attribute  Description
TYPE       The type of the file (regular, directory, symbolic link)
SIZE       The length of the file in bytes
CHANGE     Indicator for a client to see if and/or when the file has changed
FSID       Server-unique identifier of the file's file system

Some general mandatory file attributes in NFS.

Tanenbaum. Distributed systems


NFS. File Attributes (2)

Attribute Description
ACL an access control list associated with the file
FILEHANDLE The server-provided file handle of this file
FILEID A file-system unique identifier for this file
FS_LOCATIONS Locations in the network where this file system may be found
OWNER The character-string name of the file's owner
TIME_ACCESS Time when the file data were last accessed
TIME_MODIFY Time when the file data were last modified
TIME_CREATE Time when the file was created

Some general recommended file attributes.

Tanenbaum. Distributed systems


File Locking in NFS (1)

Operation Description
Lock Creates a lock for a range of bytes
Lockt Test whether a conflicting lock has been granted
Locku Remove a lock from a range of bytes
Renew Renew the lease on a specified lock

NFS version 4 operations related to file locking.

Tanenbaum. Distributed systems


File Locking in NFS (2)

(a) When the client requests shared access, given the current denial state:

                      Current file denial state
                  NONE     READ     WRITE    BOTH
Requested READ    Succeed  Fail     Succeed  Fail
access    WRITE   Succeed  Succeed  Fail     Fail
          BOTH    Succeed  Fail     Fail     Fail

(b) When the client requests a denial state, given the current access state:

                      Requested file denial state
                  NONE     READ     WRITE    BOTH
Current   READ    Succeed  Fail     Succeed  Fail
access    WRITE   Succeed  Succeed  Fail     Fail
state     BOTH    Succeed  Fail     Fail     Fail

The result of an open operation with share reservations in NFS.

Tanenbaum. Distributed systems
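A minimal sketch of the compatibility check behind table (a) (the set
encoding is hypothetical, for illustration only):

# Minimal sketch of a share-reservation check at open time.

READ, WRITE = "read", "write"

def expand(state):
    """Map NONE/READ/WRITE/BOTH to the set of modes it names."""
    return {"NONE": set(), "READ": {READ}, "WRITE": {WRITE},
            "BOTH": {READ, WRITE}}[state]

def open_allowed(requested_access, current_denial):
    # The open fails if any requested access mode is currently denied.
    return not (expand(requested_access) & expand(current_denial))

print(open_allowed("READ", "WRITE"))         # True: reading is not denied
print(open_allowed("READ", "BOTH"))          # False: all access is denied
print(open_allowed("BOTH", "READ"))          # False: reading is denied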


NFS. Client Caching (1)

• Client-side caching in NFS.

Tanenbaum. Distributed systems


NFS. Client Caching (2)

Tanenbaum. Distributed systems


NFS. The security architecture

Tanenbaum. Distributed systems


NFS. Access Control
Operation Description
Read_data Permission to read the data contained in a file
Write_data Permission to modify a file's data
Append_data Permission to append data to a file
Execute Permission to execute a file
List_directory Permission to list the contents of a directory
Add_file Permission to add a new file to a directory
Add_subdirectory Permission to create a subdirectory in a directory
Delete Permission to delete a file
Delete_child Permission to delete a file or directory within a directory
Read_acl Permission to read the ACL
Write_acl Permission to write the ACL
Read_attributes The ability to read the other basic attributes of a file
Write_attributes Permission to change the other basic attributes of a file
Read_named_attrs Permission to read the named attributes of a file
Write_named_attrs Permission to write the named attributes of a file
Write_owner Permission to change the owner
Synchronize Permission to access a file locally at the server with synchronous reads and writes

The classification of operations recognized by NFS with respect to access control.


Tanenbaum. Distributed systems
NFS. Scalability

• The performance of a single server can be increased by the addition of
processors, disks and controllers.
• When the limits of that process are reached, additional servers must be
installed and the filesystems must be reallocated between them.
• The effectiveness of that strategy is limited by the existence of 'hot spot'
files.
• When loads exceed the maximum performance, a distributed file system
that supports replication of updatable files, or one that reduces the protocol
traffic by the caching of whole files, may offer a better solution.
Organization

General concepts.
NFS.
AFS.
Coda.
Enhancements to NFS
AFS. Introduction
• A distributed computing environment under development since 1983 at
Carnegie-Mellon University.
• Andrew is highly scalable; the system is targeted to span over 5000
workstations.
• Andrew distinguishes between client machines (workstations) and
dedicated server machines. Servers and clients run the 4.2BSD UNIX
OS and are interconnected by an inter-net of LANs.
• NFS compatible.

Design characteristics:
Whole-file serving. (In AFS-3 files larger than 64 kbytes are transferred
in 64-kbyte chunks).
Whole-file caching. The copy or chunk transferred to client is stored in
a cache on local disk. The cache is permanent. On open request the
local copies are preferred to remote copies.
AFS. Initial considerations
• most files are small--transfer files rather than disk blocks?
• reading more common than writing
• most access is sequential
• most files have a short lifetime--lots of applications generate
temporary files (such as a compiler).
• file sharing is unusual (in terms of reads and writes)--argues for
client caching
• processes use few files
• files can be divided into classes--handle “system” files and “user”
files differently.
AFS. Characteristics
• Clients are presented with a partitioned space of file names: a local name
space and a shared name space.
• Dedicated servers, called Vice, present the shared name space to the
clients as a homogeneous, identical, and location-transparent file
hierarchy.
• The local name space is the root file system of a workstation, from which
the shared name space descends.
• Workstations run the Virtue (Venus) protocol to communicate with Vice, and
are required to have local disks where they store their local name space.
• Servers collectively are responsible for the storage and management of the
shared name space.
• Clients and servers are structured in clusters interconnected by a backbone
LAN.
• A cluster consists of a collection of workstations and a cluster server and is
connected to the backbone by a router.
• A key mechanism selected for remote file operations is whole file caching.
Opening a file causes it to be cached, in its entirety, on the local disk.
AFS. Processes distribution

[Figure: distribution of processes in AFS. Each workstation runs user
programs and the Venus process on a UNIX kernel; each server runs the
Vice process on a UNIX kernel; workstations and servers communicate over
the network.]

Coulouris. Distributed systems


AFS. System call interception

[Figure: system call interception in a workstation. A user program issues
UNIX file system calls to the UNIX kernel; the kernel's UNIX file system
serves local files from the local disk, while non-local file operations
are passed to Venus.]

Coulouris. Distributed systems


AFS. The main components of the Vice service interface

Fetch(fid) -> attr, data Returns the attributes (status) and, optionally, the contents of the file
identified by fid and records a callback promise on it.
Store(fid, attr, data) Updates the attributes and (optionally) the contents of a specified
file.
Create() -> fid Creates a new file and records a callback promise on it.
Remove(fid) Deletes the specified file.
SetLock(fid, mode) Sets a lock on the specified file or directory. The mode of the
lock may be shared or exclusive. Locks that are not removed
expire after 30 minutes.
ReleaseLock(fid) Unlocks the specified file or directory.
RemoveCallback(fid) Informs server that a Venus process has flushed a file from its
cache.
BreakCallback(fid) This call is made by a Vice server to a Venus process. It cancels
the callback promise on the relevant file.

Coulouris. Distributed systems


AFS. Implementation of file system calls

open(FileName, mode):
– UNIX kernel: if FileName refers to a file in shared file space, pass
the request to Venus.
– Venus (Virtue): check the list of files in the local cache. If the file
is not present, or there is no valid callback promise, send a request
for the file to the Vice server that is the custodian of the volume
containing the file.
– Vice: transfer a copy of the file and a callback promise to the
workstation; log the callback promise.
– Venus: place the copy of the file in the local file system, enter its
local name in the local cache list, and return the local name to UNIX.
– UNIX kernel: open the local file and return the file descriptor to
the application.

read(FileDescriptor, Buffer, length):
– UNIX kernel: perform a normal UNIX read operation on the local copy.

write(FileDescriptor, Buffer, length):
– UNIX kernel: perform a normal UNIX write operation on the local copy.

close(FileDescriptor):
– UNIX kernel: close the local copy and notify Venus that the file has
been closed.
– Venus: if the local copy has been changed, send a copy to the Vice
server that is the custodian of the file.
– Vice: replace the file contents and send a callback to all other
clients holding callback promises on the file.

Coulouris. Distributed systems
AFS. Questions about implementation
There are many open questions about the implementation of AFS:

• How does AFS gain control when an open or close system call
referring to a file in the shared file space is issued by a client?
• How is the server holding the required file located?
• What space is allocated for cached files in workstations?
• How does AFS ensure that the cached copies are up to date when
files may be updated by several clients?

One of the file partitions on the local disk of each workstation is used
as a cache holding the cached copies of files from shared space. Venus
manages the cache. The workstation cache is usually large enough to
accommodate several hundred average-sized files. If the user does not
modify the cached files, the workstations are largely independent of the
Vice servers.
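A minimal sketch of this whole-file caching with callback promises (Vice
and Venus reduced to toy classes; hypothetical, not the real RPC
interface):

# Minimal sketch of AFS-style whole-file caching with callback promises.

class Vice:
    def __init__(self):
        self.files = {"fid1": b"v1"}
        self.promises = {}                   # fid -> Venus processes

    def fetch(self, venus, fid):
        self.promises.setdefault(fid, set()).add(venus)  # log the promise
        return self.files[fid]

    def store(self, writer, fid, data):
        self.files[fid] = data
        for v in self.promises.get(fid, set()) - {writer}:
            v.break_callback(fid)            # cancel the others' promises

class Venus:
    def __init__(self, server):
        self.server, self.cache, self.valid = server, {}, set()

    def break_callback(self, fid):
        self.valid.discard(fid)              # promise no longer valid

    def open(self, fid):
        if fid not in self.valid:            # no valid callback promise
            self.cache[fid] = self.server.fetch(self, fid)
            self.valid.add(fid)
        return self.cache[fid]               # whole file, used locally

    def close(self, fid, new_data=None):
        if new_data is not None:             # changed: send back to Vice
            self.cache[fid] = new_data
            self.server.store(self, fid, new_data)

vice = Vice(); v1, v2 = Venus(vice), Venus(vice)
v1.open("fid1"); v2.open("fid1")
v1.close("fid1", b"v2")                      # breaks v2's callback promise
print(v2.open("fid1"))                       # b"v2": refetched on next open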
Organization

General concepts.
NFS.
AFS.
Coda.
Enhancements to NFS
Coda. Introduction (I)

• Built by M. Satyanarayanan. Based on the Andrew File System


(AFS)
• Relatively few servers
– simpler administration
– centralized storage/backup
• Only servers are trusted
– improved security
– no user code on servers
• Location-transparent file system
– easy data sharing
– user mobility
– workstation independence
• Preserve AFS strengths
– shared UNIX file system model
– scalability and performance
– security
Coda. Introduction (II)

• Salient features:
– Support for disconnected operations
• Desirable for mobile users
– Support for a large number of users

• Disconnected operation
– a temporary deviation from normal operation as a client of a
shared repository

• Why?
– enhance availability
• How?
– data cache
Coda. Design overview (1)

Two strategies for availability:
– make the shared repository more robust
– enhance local autonomy
– neither is adequate by itself
Server replication:
– improves availability of data
– useless if the client is isolated
Disconnected operation:
– system remains useful when isolated
– limited to data in the local cache
Coda. Design overview (2)
• Application area: academic and research; not for highly
concurrent, fine-granularity data access or safety-critical systems
• Volume: the unit of replication; a subtree of the Coda namespace
mapped to individual file servers
– VSG, Volume Storage Group: the set of replication sites for a
volume
– AVSG, Accessible Volume Storage Group: the currently
accessible subset of the VSG
• Callback: when a workstation caches a file or directory, the
server promises to notify it before allowing modification by others
• Venus: the cache manager
– caches latest data from AVSG
– propagates changes to AVSG
– detects AVSG membership changes
Coda. Consistency guarantees

• AFS guarantees
– open: result of last close anywhere
– close: immediate propagation everywhere
– failure: server or network failure

• Coda guarantees
– open: result of last close in accessible universe
– close: immediate propagation to accessible universe;
eventual propagation everywhere
– failure: cache miss when disconnected
Coda

[Figure: the overall organization of AFS, on which Coda is based.]

Replication strategy: read-one-data, read-all-status, write-all.

[Figures: Read (cache miss on open); Write (update).]


Coda. Naming

• Clients in Coda have access to a single shared name space


• Files are grouped into volumes [partial subtree in the directory structure]
– Volume is the basic unit of mounting
– Namespace: /afs/filesrv.cs.umass.edu [same namespace on all clients; different from
NFS]
– Name lookup can cross mount points: support for detecting crossing and automounts
Sharing Files in Coda

• Transactional behavior for sharing files: similar to share reservations in


NFS
– File open: transfer entire file to client machine [similar to delegation]
– Uses session semantics: each session is like a transaction
• Updates are sent back to the server only when the file is closed
Transactional Semantics

File-associated data Read? Modified?


File identifier Yes No
Access rights Yes No
Last modification time Yes Yes
File length Yes Yes
File contents Yes Yes

• Network partition: part of network isolated from rest


– Allow conflicting operations on replicas in different partitions
– Reconcile upon reconnection
– Transactional semantics => operations must be serializable
• Ensure that operations were serializable after they have executed
– Conflict => force manual reconciliation
Coda. Client Caching

• Cache consistency maintained using callbacks


– Server tracks all clients that have a copy of the file [provide
callback promise]
– Upon modification: send invalidate to clients
Coda. Server Replication

• Use replicated writes: read-one, write-all


– Writes are sent to all AVSG (all accessible replicas)
• How to handle network partitions?
– Use optimistic strategy for replication
– Detect conflicts using a Coda version vector
– Example: [2,2,1] and [1,1,2] is a conflict => manual reconciliation
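A minimal sketch of that conflict test using version vectors (standard
vector-clock comparison; the function name is hypothetical):

# Minimal sketch of conflict detection with Coda-style version vectors.

def compare(v1, v2):
    """Return 'equal', 'dominates', 'dominated', or 'conflict'."""
    ge = all(a >= b for a, b in zip(v1, v2))  # v1 saw all of v2's updates
    le = all(a <= b for a, b in zip(v1, v2))  # v2 saw all of v1's updates
    if ge and le:
        return "equal"
    if ge:
        return "dominates"
    if le:
        return "dominated"
    return "conflict"                         # concurrent updates

print(compare([2, 2, 1], [1, 1, 1]))          # 'dominates': safe to propagate
print(compare([2, 2, 1], [1, 1, 2]))          # 'conflict': manual reconciliation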
Disconnected Operation

• The state-transition diagram of a Coda client with respect to a


volume.
• Use hoarding to provide file access during disconnection
– Prefetch all files that may be accessed and cache (hoard) locally
– If AVSG=0, go to emulation mode and reintegrate upon
reconnection

Distributed Systems. Tanenbaum, Van Steen. © Prentice-Hall 2002


Coda. Hoarding and Emulation

Hoarding
• Prioritized Cache Management
– Hoard Profiles specify user interest (directories allowed)
– Recent usage
– Hoard priority based on above two

• Hoard walking
– Since priority based on recent usage, every once in a while need to update
file system to reflect priorities
– 10 min default. Can be changed.

Emulation
• Allow updating without contacting file server
• All updates logged in a per volume “replay log”
• Log optimizations to reduce log size
• Persistence achieved using Recoverable Virtual Memory (RVM)
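A minimal sketch of an emulation-mode replay log with one log
optimization (a later store to a file cancels earlier stores to it; the
structures are hypothetical):

# Minimal sketch of a Coda-style replay log for disconnected operation.

class Server:
    def __init__(self):
        self.files, self.stores = {}, 0

    def store(self, path, data):
        self.files[path] = data
        self.stores += 1

class ReplayLog:
    def __init__(self):
        self.entries = []                     # (op, path, data)

    def log_store(self, path, data):
        # Optimization: only the final contents matter, so drop any
        # earlier store to the same file before appending the new one.
        self.entries = [e for e in self.entries
                        if not (e[0] == "store" and e[1] == path)]
        self.entries.append(("store", path, data))

    def replay(self, server):                 # reintegration upon reconnect
        for op, path, data in self.entries:
            server.store(path, data)

log = ReplayLog()
log.log_store("/doc", b"draft 1")             # while disconnected
log.log_store("/doc", b"draft 2")             # cancels the first entry
srv = Server()
log.replay(srv)
print(srv.stores, srv.files["/doc"])          # 1 b'draft 2'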
Coda. Reintegration

• Updates propagated to servers and vice versa (one volume at a time)


– 4 stages (of a transaction)
• Log parsed, files locked
• Validation: conflict detection, disk-space check, integrity
• Fetching: updated files from client
• Commit: locks released and changes finalized
– Failures must be manually examined from logs

Conflict resolution
• Unresolved conflict represented as dangling symbolic link

• Application specific resolvers (ASRs)


– executed at clients
Organization

General concepts.
NFS.
AFS.
Coda.
Enhancements to NFS
NFS enhancement - Spritely NFS

• An implementation of the NFS protocol with the addition of
open and close calls.

• The parameters of the Sprite open operation specify a mode


and include counts of the number of local processes that
currently have the file open for reading and for writing.

• Spritely NFS implements a recovery protocol that interrogates a


list of clients to recover the full open files table.
NFS enhancement - NQNFS

• maintains similar client-related state concerning open files, but it


uses leases to aid recovery after a server crash.
• Callbacks are used in a similar manner to Spritely NFS to request
clients to flush their caches when a write request occurs.
NFS enhancement - WebNFS

• makes it possible for application programs to become clients of


NFS servers anywhere in the Internet (using the NFS protocol
directly)
• enables Internet applications that share data directly, such
as multi-user games or clients of large dynamic databases.
NFS enhancement - NFS version 4

• will include the features of WebNFS


• the use of callbacks or leases to maintain consistency
• on-the-fly recovery
• Scalability will be improved by using proxy servers in a manner
analogous to their use in the Web.
Bibliography
George Coulouris, Jean Dollimore and Tim Kindberg. Distributed Systems:
Concepts and Design (3rd edition). Addison-Wesley, 2001. http://www.cdk3.net/
Andrew S. Tanenbaum, Maarten van Steen. Distributed Systems: Principles and
Paradigms. Prentice-Hall, 2002.
http://www.cs.vu.nl/~ast/books/ds1/
P. K. Sinha. Distributed Operating Systems: Concepts and Design. IEEE
Press, 1993.
Sape J. Mullender, editor. Distributed Systems, 2nd edition. ACM Press, 1993.
http://wwwhome.cs.utwente.nl/~sape/gos0102.html
Jean Bacon. Concurrent Systems: Operating Systems, Database and Distributed
Systems, an Integrated Approach. Addison-Wesley, 2nd edition, 1998.
http://www.cl.cam.ac.uk/users/jmb/cs.html
