DC Unit-6
1. Storage services
2. True file services
3. Name services
Distributed File system
❏ A special case of distributed system
❏ Allows multi-computer systems to share files, even when no other IPC or
RPC is needed
❏ Sharing devices: Special case of sharing files
❏ E.g., NFS (Sun’s Network File System), Windows NT, 2000, XP, Andrew File
System (AFS)
❏ One of the most common uses of distributed computing
❏ Goal: provide common view of centralized file system, but distributed
implementation.
1. Ability to open & update any file on any machine on network.
2. All the synchronization issues and capabilities of shared local files
DFS Architecture
❏ In general, files in a DFS can be located in “any” system. We call the
“source(s)” of files servers and those accessing them clients.
❏ Potentially, a server for a file can become a client for another file.
❏ However, most distributed systems distinguish between clients and
servers more strictly:
1. Clients simply access files and do not have/share local files.
2. Even if clients have disks, they (disks) are used for swapping, caching,
loading the OS, etc.
3. Servers are the actual sources of files.
4. In most cases, servers are more powerful machines (in terms of CPU,
physical memory, disk bandwidth, ..)
❏ Service: software entity running on one or more machines and providing
a particular type of function to a priori unknown clients.
❏ Server: service software running on a single machine.
❏ Client: process that can invoke a service using a set of operations that
forms its client interface.
1. A client interface for a file service is formed by a set of primitive file
operations (create, delete, read, write).
2. The interface of a DFS should be transparent, i.e., it should not distinguish
between local and remote files.
Features of DFS
❏ Transparency : Structure, Access, Naming, Replication
❏ User Mobility
❏ Performance
❏ Simplicity and ease of use
❏ Scalability
❏ High availability
❏ High reliability
❏ Data Integrity
❏ Security
❏ Heterogeneity
Design issues
❏ Naming: Locating the file/directory in a DFS based on name.
❏ Location of cache: disk, main memory, both.
❏ Writing policy: Updating original data source when cache content gets
modified.
❏ Cache consistency: Modifying cache when data source gets modified.
❏ Availability: More copies of files/resources.
❏ Scalability: Ability to handle more clients/users.
❏ Semantics: Meaning of different operations (read, write,…)
Naming of distributed files
Naming – mapping between logical and physical objects.
❏ A Transparent DFS hides the location where in the network the file is
stored.
❏ Location transparency – file name does not reveal the file’s physical
storage location.
1. File name denotes a specific, hidden, set of physical disk blocks.
2. Convenient way to share data.
3. Could expose correspondence between component units and machines.
❏ Location independence – filename does not need to be changed when
the file’s physical storage location changes.
1. Better file abstraction.
2. Promotes sharing the storage space itself.
3. Separates the naming hierarchy from the storage-devices hierarchy.
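Location independence can be illustrated with a small mapping table. The following is only a sketch under stated assumptions: `NameService`, its methods, the server names, and the physical paths are all hypothetical and not part of any real DFS. It shows that migrating a file changes the mapping, not the name that clients use.

```python
class NameService:
    """Hypothetical table mapping logical names to (server, physical path)."""

    def __init__(self):
        self._table = {}

    def bind(self, logical_name, server, physical_path):
        self._table[logical_name] = (server, physical_path)

    def resolve(self, logical_name):
        return self._table[logical_name]

    def migrate(self, logical_name, new_server, new_physical_path):
        # Only the mapping changes; the logical name stays the same.
        if logical_name not in self._table:
            raise KeyError(logical_name)
        self._table[logical_name] = (new_server, new_physical_path)


ns = NameService()
name = "/home/students/jack/notes.txt"
ns.bind(name, "serverA", "/disk1/blk/0042")      # made-up location
before = ns.resolve(name)
ns.migrate(name, "serverB", "/disk7/blk/0913")   # file moves to another server
after = ns.resolve(name)
print(name, before, "->", after)
```

With simple host-name concatenation, the same move would force every client to start using a different file name.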
Naming
❏ Name space: (e.g.,) /home/students/jack, /home/staff/jill.
❏ Namespace is a collection of names.
❏ Location transparency: file names do not indicate their physical locations.
❏ Name resolution: mapping namespace to an object/device/file/directory.
❏ Naming approaches:
❏ Simple Concatenation: add hostname to file names.
❏ Guarantees unique names.
❏ No transparency. Moving a file to another host involves a file name
change.
DFS: Three Naming Schemes
1. Mount remote directories to local directories, giving the appearance of a
coherent local directory tree
• Mounted remote directories can be accessed transparently.
• Unix/Linux with NFS; Windows with mapped drives
2. Files named by combination of host name and local name;
• Guarantees a unique system wide name
• Windows Network Places, Apollo Domain
3. Total integration of component file systems.
• A single global name structure spans all the files in the system.
• If a server is unavailable, some arbitrary set of directories on different
machines also becomes unavailable.
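The first scheme (mounting) can be sketched as a longest-prefix lookup in a mount table. This is a minimal illustration, not how any particular operating system implements mounts; the `MountTable` class, the server names, and the export paths are assumptions made for the example.

```python
class MountTable:
    """Hypothetical client-side table: local mount point -> (server, remote dir)."""

    def __init__(self):
        self._mounts = {}

    def mount(self, mount_point, server, remote_dir):
        self._mounts[mount_point.rstrip("/")] = (server, remote_dir.rstrip("/"))

    def resolve(self, path):
        # Choose the longest mount point that is a prefix of the path.
        best = None
        for mp in self._mounts:
            if path == mp or path.startswith(mp + "/"):
                if best is None or len(mp) > len(best):
                    best = mp
        if best is None:
            return ("local", path)  # not under any mount point
        server, remote_dir = self._mounts[best]
        return (server, remote_dir + path[len(best):])


mt = MountTable()
mt.mount("/home/students", "serverA", "/export/students")
# A mounted directory may itself contain another mount point.
mt.mount("/home/students/jack/proj", "serverB", "/export/proj")

print(mt.resolve("/home/students/jack/a.txt"))
print(mt.resolve("/home/students/jack/proj/x.c"))
print(mt.resolve("/tmp/scratch"))
```

The user always types a local-looking path; which server actually holds the file is decided by the table.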
Mechanism of DFS
❏ Mounting: to help in combining files/directories in different systems and
form a single file system structure.
❏ Caching: to reduce the response time in bringing data from remote
machines. Modified caching
❏ Bulk data transfer: helps in reducing the delay due to transfer of files
over the network. Bulk:
● Obtain multiple blocks with a single seek.
● Format and transfer a large number of packets in a single context switch.
● Reduce the number of acknowledgements to be sent.
● E.g., useful when downloading an OS onto a diskless client.
❏ Encryption: Establish a key for encryption with the help of an
authentication server.
File Access Model : Remote File access
❏ Reduce network traffic by retaining recently accessed disk blocks in a
cache, so that repeated accesses to the same information can be handled
locally.
–If needed data not already cached, a copy of data is brought from the server
to the user.
–Accesses are performed on the cached copy.
–Files identified with one master copy residing at the server machine, but
copies of (parts of) the file are scattered in different caches.
–Cache-consistency problem: keeping the cached copies consistent with the
master file.
Caching
❏ Performance of distributed file system, in terms of response time,
depends on the ability to “get” the files to the user.
❏ When files are in different servers, caching might be needed to improve
the response time.
❏ A copy of data (in files) is brought to the client (when referenced).
Subsequent data accesses are made on the client cache.
❏ Client cache can be on disk or main memory.
❏ Data cached may include future blocks that may be referenced too.
❏ Caching implies DFS needs to guarantee consistency of data.
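The caching idea above can be sketched as a small client-side block cache. This is an illustrative sketch only: the `BlockCache` class, the LRU eviction policy, the tiny capacity, and the `fake_server_fetch` stand-in for a network call are all assumptions, not something the notes prescribe.

```python
from collections import OrderedDict


class BlockCache:
    """Hypothetical client block cache with least-recently-used eviction."""

    def __init__(self, fetch_block, capacity=2):
        self._fetch = fetch_block        # callable standing in for a remote read
        self._capacity = capacity
        self._blocks = OrderedDict()     # (file_id, block_no) -> data
        self.remote_fetches = 0

    def read(self, file_id, block_no):
        key = (file_id, block_no)
        if key in self._blocks:
            self._blocks.move_to_end(key)    # mark as most recently used
            return self._blocks[key]
        data = self._fetch(file_id, block_no)
        self.remote_fetches += 1
        self._blocks[key] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)  # evict least recently used block
        return data


def fake_server_fetch(file_id, block_no):
    # Stand-in for the server; returns recognisable dummy bytes.
    return f"{file_id}:{block_no}".encode()


cache = BlockCache(fake_server_fetch, capacity=2)
cache.read("f", 0)   # miss
cache.read("f", 0)   # hit, served locally
cache.read("f", 1)   # miss
cache.read("f", 2)   # miss, evicts block 0
cache.read("f", 0)   # miss again, block 0 was evicted
print(cache.remote_fetches)
```

Only misses reach the server, which is where the response-time benefit comes from.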
DFS : File Caches
In client memory
❏ Performance speed up; faster access
❏ Good when local usage is transient
❏ Enables diskless workstations
On client disk
❏ Good when local usage dominates (e.g., AFS)
❏ Caches larger files
❏ Helps protect clients from server crashes
DFS : Cache update policy
❏ When does the client update the master file?
I.e. when is cached data written from the cache to the file?
❏ Write-through – write data through to disk ASAP
● I.e., following write() or put(), same as on local disks.
● Reliable, but poor performance.
❏ Delayed-write – data is cached and then written to the server later.
● Write operations complete quickly; some data may be overwritten in
cache, saving needless network I/O.
● Poor reliability
1. unwritten data may be lost when client machine crashes
2. Inconsistent data
● Variation – scan cache at regular intervals and flush dirty blocks.
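The two update policies can be contrasted with a short sketch. It assumes the server is modelled as a plain dictionary and that `flush()` is called explicitly rather than by a timer; both class names are hypothetical.

```python
class WriteThroughCache:
    """Every write goes to the server immediately."""

    def __init__(self, server):
        self.server = server
        self.blocks = {}
        self.server_writes = 0

    def write(self, key, data):
        self.blocks[key] = data
        self.server[key] = data
        self.server_writes += 1


class DelayedWriteCache:
    """Writes stay in the cache until flush() sends the dirty blocks."""

    def __init__(self, server):
        self.server = server
        self.blocks = {}
        self.dirty = set()
        self.server_writes = 0

    def write(self, key, data):
        self.blocks[key] = data   # overwrites any earlier unflushed data
        self.dirty.add(key)

    def flush(self):
        for key in sorted(self.dirty):
            self.server[key] = self.blocks[key]
            self.server_writes += 1
        self.dirty.clear()


server_a, server_b = {}, {}
wt = WriteThroughCache(server_a)
dw = DelayedWriteCache(server_b)
for version in (b"v1", b"v2", b"v3"):
    wt.write("blk0", version)
    dw.write("blk0", version)

visible_before_flush = "blk0" in server_b   # nothing has reached the server yet
dw.flush()
print(wt.server_writes, dw.server_writes)
```

Three overwrites cost three server writes under write-through but only one under delayed-write. If the delayed-write client crashed before `flush()`, all three updates would be lost.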
DFS Data access
DFS - File consistency
Is locally cached copy of the data consistent with the master copy?
Client-initiated approach
❏ Client initiates a validity check with server.
❏ Server verifies local data with the master copy
❏ E.g., time stamps, etc.
Server-initiated approach
❏ Server records (parts of) files cached in each client.
❏ When server detects a potential inconsistency, it reacts
When should a modified source content be transferred to the cache?
Server-initiated policy:
❏ Server cache manager informs client cache managers that can then
retrieve the data.
Client-initiated policy:
❏ Client cache manager checks the freshness of data before delivering to
users. Overhead for every data access.
❏ Concurrent-write sharing policy:
❏ Multiple clients open the file, at least one client is writing.
❏ File server asks other clients to purge/remove the cached data for the file,
to maintain consistency.
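A server-initiated scheme might look roughly like the sketch below. It is a simplification under stated assumptions: whole files are cached rather than blocks, calls are direct method invocations instead of network messages, and the class names `CallbackServer` and `CachingClient` are made up for the example.

```python
class CallbackServer:
    """Hypothetical server that records which clients cache each file."""

    def __init__(self):
        self.files = {}
        self.cachers = {}   # file name -> set of clients holding a cached copy

    def read(self, client, name):
        self.cachers.setdefault(name, set()).add(client)
        return self.files[name]

    def write(self, client, name, data):
        self.files[name] = data
        # Ask every other client to purge its now-stale copy.
        for other in self.cachers.get(name, set()) - {client}:
            other.invalidate(name)
        self.cachers[name] = {client}


class CachingClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.read(self, name)
        return self.cache[name]

    def write(self, name, data):
        self.cache[name] = data
        self.server.write(self, name, data)

    def invalidate(self, name):
        self.cache.pop(name, None)


srv = CallbackServer()
srv.files["f"] = b"v1"
a, b = CachingClient(srv), CachingClient(srv)
a.read("f")
b.read("f")
a.write("f", b"v2")            # server purges b's cached copy
purged = "f" not in b.cache
print(purged, b.read("f"))
```

Note that the server has to remember who caches what, which is why this approach requires a stateful server.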
Sequential-write sharing policy: a client opens a file that was
recently closed after writing.
1. This client may have outdated cache blocks of the file (since the other
client might have modified the file contents).
❏ Use time stamps for both cache and files. Compare the time stamps to
know the freshness of blocks.
2. The other client (which was writing previously) may still have modified
data in its cache that has not yet been updated on the server (e.g., due to
delayed writing).
❏ Server can force the previous client to flush its cache whenever a new
client opens the file.
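The time-stamp comparison for sequential-write sharing can be sketched as follows. As an assumption for the example, an integer version counter stands in for the time stamp, and the classes `VersionedServer` and `ValidatingClient` are hypothetical.

```python
class VersionedServer:
    """Hypothetical server that stamps every write with a new version."""

    def __init__(self):
        self._files = {}   # name -> (stamp, data)

    def write(self, name, data):
        stamp = self._files.get(name, (0, b""))[0] + 1
        self._files[name] = (stamp, data)

    def stamp_of(self, name):
        return self._files[name][0]

    def fetch(self, name):
        return self._files[name]


class ValidatingClient:
    """Checks its cached stamp against the server's on every open."""

    def __init__(self, server):
        self.server = server
        self.cache = {}    # name -> (stamp, data)
        self.refetches = 0

    def open(self, name):
        cached = self.cache.get(name)
        if cached is None or cached[0] != self.server.stamp_of(name):
            self.cache[name] = self.server.fetch(name)
            self.refetches += 1
        return self.cache[name][1]


server = VersionedServer()
server.write("report.txt", b"draft 1")
reader = ValidatingClient(server)
first = reader.open("report.txt")        # nothing cached: fetched
second = reader.open("report.txt")       # stamps match: served from cache
server.write("report.txt", b"draft 2")   # another client wrote and closed
third = reader.open("report.txt")        # stale stamp: refetched
print(first, second, third, reader.refetches)
```

The check costs one small request per open, in exchange for never reading blocks that a previous writer has replaced.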
DFS - Remote Service vs Caching
Remote Service – all file actions implemented by server.
❏ RPC functions
❏ Useful for small-memory diskless machines
❏ Particularly applicable if there is a large amount of write activity
Cached System
❏ Many “remote” accesses handled efficiently by the local cache
❏ Most served as fast as local ones.
❏ Servers contacted only occasionally
❏ Reduces server load and network traffic.
❏ Enhances potential for scalability.
❏ Reduces total network overhead
DFS - File Server semantics
Stateless Service
❏ Avoids state information in server by making each request self-contained.
❏ Each request identifies the file and position in the file.
❏ No need to establish and terminate a connection by open and close
operations.
❏ Poor support for locking or synchronization among concurrent accesses
Stateful Service
❏ Client opens a file (as in Unix & Windows).
❏ Server fetches information about file from disk, stores in server memory,
1. Returns to client a connection identifier unique to client and open file.
2. Identifier used for subsequent accesses until session ends.
❏ Server must reclaim space used by no longer active clients.
❏ Increased performance; fewer disk accesses.
❏ Server retains knowledge about the file
E.g., read ahead next blocks for sequential access.
E.g., file locking for managing writes, Windows
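The difference between the two services shows up in what a read request must carry. The sketch below is illustrative only: `FILES`, `stateless_read`, and `StatefulServer` are hypothetical names, and a `crash()` method simply simulates the loss of volatile server state.

```python
FILES = {"notes.txt": b"distributed file systems"}


def stateless_read(name, offset, count):
    # Self-contained request: it names the file and the position every time.
    return FILES[name][offset:offset + count]


class StatefulServer:
    """Keeps per-connection state (open file and current position) in memory."""

    def __init__(self):
        self._sessions = {}
        self._next_id = 1

    def open(self, name):
        conn_id = self._next_id
        self._next_id += 1
        self._sessions[conn_id] = {"name": name, "pos": 0}
        return conn_id

    def read(self, conn_id, count):
        s = self._sessions[conn_id]
        data = FILES[s["name"]][s["pos"]:s["pos"] + count]
        s["pos"] += len(data)   # server remembers where the client is
        return data

    def close(self, conn_id):
        del self._sessions[conn_id]

    def crash(self):
        self._sessions.clear()  # volatile state is lost


part1 = stateless_read("notes.txt", 0, 11)
part2 = stateless_read("notes.txt", 12, 4)

srv = StatefulServer()
conn = srv.open("notes.txt")
s1 = srv.read(conn, 11)
s2 = srv.read(conn, 5)
srv.crash()
try:
    srv.read(conn, 4)
    survived_crash = True
except KeyError:
    survived_crash = False   # the connection identifier is no longer known
print(part1, part2, s1, s2, survived_crash)
```

The stateless calls are longer but would work unchanged against a restarted server; the stateful connection identifier is meaningless after the crash.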
DFS - Server Semantics comparison
Failure Recovery: Stateful server loses all volatile state in a crash.
1. Restore state by recovery protocol based on a dialog with clients.
2. Server needs to be aware of crashed client processes
(orphan detection and elimination).
Failure Recovery: Stateless server failure and recovery are almost unnoticeable.
1. Newly restarted server responds to self-contained requests without difficulty.
Penalties for using the robust stateless service:
1. longer request messages
2. slower request processing
Some environments require stateful service.
1. Server-initiated cache validation cannot be provided by a stateless service.
2. File locking (one writer, many readers).
DFS - Replication
❏ Replicas of the same file reside on failure-independent machines.
❏ Improves availability and can shorten service time.
❏ Naming scheme maps a replicated file name to a particular replica.
1. Existence of replicas should be invisible to higher levels.
2. Replicas must be distinguished from one another by different lower-level
names.
❏ Updates
1. Replicas of a file denote the same logical entity.
2. Update to any replica must be reflected on all other replicas.
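The naming and update rules for replicas can be sketched as below. This is a deliberately simple model: it assumes every replica is reachable at write time and that writes are applied to all replicas synchronously. The classes and the lower-level names are hypothetical.

```python
class Replica:
    def __init__(self, low_level_name):
        self.low_level_name = low_level_name   # distinguishes this replica
        self.data = None
        self.up = True


class ReplicatedFile:
    """One logical file name backed by several replicas."""

    def __init__(self, name, replicas):
        self.name = name
        self.replicas = replicas

    def write(self, data):
        # An update must be reflected on all replicas.
        for replica in self.replicas:
            replica.data = data

    def read(self):
        # Callers never choose a replica; any available one will do.
        for replica in self.replicas:
            if replica.up:
                return replica.data
        raise IOError(f"no replica of {self.name} is available")


f = ReplicatedFile(
    "/home/staff/jill/paper.txt",
    [Replica("serverA:/r1/paper.txt"), Replica("serverB:/r2/paper.txt")],
)
f.write(b"final version")
f.replicas[0].up = False        # one machine fails
survivor_data = f.read()        # still served from the other replica
print(survivor_data)
```

A real system also has to handle replicas that are down during a write, which this sketch does not attempt.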
DFS - Case Studies
NFS (Network File System)
❏ Developed by Sun Microsystems (in 1985)
❏ Most popular, open, and widely used.
❏ NFS protocol standardised through IETF (RFC 1813)
AFS (Andrew File System)
❏ Developed by Carnegie Mellon University as part of Andrew distributed
computing environments (in 1986)
❏ A research project to create campus wide file system.
❏ Public domain implementation is available on Linux (LinuxAFS)
❏ It was adopted as a basis for the DCE/DFS file system in the Open
Software Foundation (OSF, www.opengroup.org) DCE (Distributed
Computing Environment)
Fundamentals
❏ NFS was developed to allow a machine to mount a disk partition on a
remote machine as if it were local.
❏ This allows for fast, seamless sharing of files across a network.
❏ There is no global naming hierarchy.
❏ NFS allows any machine to be client or server.
NFS Protocols:
❏ Mount Protocol
❏ File Access Protocol
❏ Their functionality is defined as a set of Remote Procedure Calls (RPC).
NFS
❏ NFS developed by SUN Microsystems for use on its UNIX-based
workstations.
❏ A distributed file system which allows users to access files and directories
located on remote computers as if they were local, although the data is
potentially stored on another machine.
❏ NFS builds on the Open Network Computing Remote Procedure Call (ONC
RPC) system.
❏ Mechanism for storing files on a network. Allows users to ‘Share’ a
directory.
❏ NFS most commonly used with UNIX systems. Other software platforms:
-Mac OS, Microsoft Windows, Novell NetWare, etc.
❏ NFS runs over a LAN, and even over a WAN (slowly)
❏ Any system may be both a client and server
❏ Basic idea:
1. Remote directory is mounted onto local directory.
2. Remote directory may contain mounted directories within.
Version 1 and Version 2
❏ Version 1: Sun used it only for in-house experimental purposes and did not
release it to the public.
❏ V2 of the protocol originally operated entirely over UDP and was meant to
keep the protocol stateless, with locking (for example) implemented
outside of the core protocol.
❏ Both suffered from performance problems.
❏ Both suffered from security problems (security dependent upon IP address).
Version 3
❏ NFS v3 can operate across TCP as well as UDP.
❏ Support for asynchronous writes on the server.
❏ Obtains multiple file names, handles, and attributes.
❏ Support for 64-bit file sizes and offsets; handles files larger than
4 gigabytes (GB).
❏ Improved performance, and allowed NFS to work more reliably across the
Internet.
Version 4
Solution :
❏ Global lock manager: Separate from NFS
❏ Typical locking operations
•Lock – acquire lock (non-blocking)
•Lockt – test a lock
•Locku – unlock a lock
•Renew – renew lease on a lock
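The four operations listed above can be illustrated with a toy lease-based lock table. This is not the NFS wire protocol: the per-lock lease, the 30-second lease length, the explicit `now` argument (used instead of a real clock to keep the example deterministic), and the class name are all assumptions.

```python
class LeaseLockManager:
    """Hypothetical lock table where each lock expires unless renewed."""

    def __init__(self, lease_seconds=30):
        self.lease = lease_seconds
        self._locks = {}   # file name -> (owner, expiry time)

    def _holder(self, name, now):
        entry = self._locks.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]
        return None        # unlocked, or the lease has expired

    def lock(self, name, owner, now):
        # Non-blocking: report failure immediately instead of waiting.
        holder = self._holder(name, now)
        if holder is not None and holder != owner:
            return False
        self._locks[name] = (owner, now + self.lease)
        return True

    def lockt(self, name, now):
        # Test only: report the current holder without acquiring anything.
        return self._holder(name, now)

    def locku(self, name, owner, now):
        if self._holder(name, now) == owner:
            del self._locks[name]
            return True
        return False

    def renew(self, name, owner, now):
        if self._holder(name, now) == owner:
            self._locks[name] = (owner, now + self.lease)
            return True
        return False


mgr = LeaseLockManager(lease_seconds=30)
got_a = mgr.lock("f", "A", now=0)        # A holds the lock until t=30
got_b = mgr.lock("f", "B", now=10)       # refused, A still holds it
renewed = mgr.renew("f", "A", now=20)    # lease now runs until t=50
holder_at_40 = mgr.lockt("f", now=40)
got_b_late = mgr.lock("f", "B", now=60)  # A's lease expired, B succeeds
print(got_a, got_b, renewed, holder_at_40, got_b_late)
```

Leases let the server reclaim locks from clients that crash without unlocking, since an unrenewed lock simply expires.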
NFS implementation
❏ Remote procedure calls for all operations:
1. Implemented in Sun ONC.
2. XDR is the interface definition language.
❏ Network communication is client-initiated:
1. RPC based on UDP (an unreliable protocol).
2. Response to remote procedure call is de facto acknowledgement.
❏ Lost requests are simply re-transmitted:
1. As many times as necessary to get a response.
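The retransmission behaviour can be sketched with a simulated lossy server. This example does not use real UDP or ONC RPC: `LossyServer`, the request tuple layout, and the `max_attempts` cap (added so the loop always terminates) are assumptions made for illustration.

```python
def call_with_retransmit(send, request, max_attempts=5):
    """Resend the same request until a reply arrives; the reply is the ack."""
    for attempt in range(1, max_attempts + 1):
        reply = send(request)      # None models a lost request or lost reply
        if reply is not None:
            return reply, attempt
    raise TimeoutError(f"no reply after {max_attempts} attempts")


class LossyServer:
    """Hypothetical server that drops the first few requests it receives."""

    def __init__(self, drop_first):
        self.drop_remaining = drop_first
        self.blocks = {("fh1", 0): b"hello"}

    def handle(self, request):
        if self.drop_remaining > 0:
            self.drop_remaining -= 1
            return None
        op, file_handle, offset = request
        # The request is self-contained, so repeating it is harmless.
        return self.blocks[(file_handle, offset)]


server = LossyServer(drop_first=2)
reply, attempts = call_with_retransmit(server.handle, ("READ", "fh1", 0))
print(reply, attempts)
```

Blind retransmission is safe here because a read that names its file handle and offset gives the same result no matter how many times the server executes it.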
NFS Caching
❏ On client open(), client asks server if its cached attribute blocks are up to
date.
❏ Once file is open, different client processes can write it and get
inconsistent data.
❏ Modified data is flushed back to the server every 30 seconds.
NFS Failure Recovery
Server crashes are transparent to client
❏ Each client request contains all information
❏ Server can re-fetch from disk if not in its caches
❏ Client retransmits request if interrupted by crash (i.e., no response)