Unit-3 (Bit-43)
In the case of failure or heavy load, these components together improve data availability by
allowing data shared from different locations to be logically grouped under one folder,
known as the “DFS root”.
It is not necessary to use both components of DFS together: it is possible to use the
namespace component without the file replication component, and it is equally possible to
use the file replication component between servers without the namespace component.
Working of DFS:
There are two ways in which DFS can be implemented:
Standalone DFS namespace –
It allows only DFS roots that exist on the local computer and do not use
Active Directory. A standalone DFS can only be accessed on the computer on
which it is created. It provides no fault tolerance and cannot be linked to any
other DFS. Standalone DFS roots are rarely encountered because of their limited
usefulness.
Domain-based DFS namespace –
It stores the DFS configuration in Active Directory, making the DFS namespace
root accessible at \\<domainname>\<dfsroot> or \\<FQDN>\<dfsroot>
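Conceptually, a DFS namespace is just a mapping from one logical root to the physical shares that back it. A minimal sketch of that idea (the domain, root, and server names are hypothetical, not real hosts):

```python
# Minimal sketch of a domain-based DFS namespace: one logical root
# maps each folder to one or more physical file-server shares.
# All domain, root, and server names below are hypothetical.
namespace = {
    r"\\corp.example.com\dfsroot\reports": [r"\\server1\reports",
                                            r"\\server2\reports"],
    r"\\corp.example.com\dfsroot\tools": [r"\\server3\tools"],
}

def resolve(logical_path):
    """Return the physical targets backing a logical DFS path."""
    return namespace.get(logical_path, [])

# Clients see one folder tree; the namespace hides which server holds it.
targets = resolve(r"\\corp.example.com\dfsroot\reports")  # two targets
```

Because the `reports` folder has two targets, a client can still reach the data if one server fails, which is the availability benefit described above.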
Advantages:
Distributed systems present a single system image to the users of the network.
The failure of one system in the network is not visible to the other users.
Every system can act in a dual role, as both client and server.
The distributed file system provides a similar abstraction to the users of a distributed
system and makes it convenient for them to use files in a distributed environment.
Characteristics of a distributed file system:
Remote data/file sharing: It allows a file to be transparently accessed by
processes on any node of the system, irrespective of the file’s location. Example:
a process ‘A’ can create a file and share it with processes ‘B’ or ‘C’,
and the same file can be accessed or modified by processes running on other nodes.
User mobility: Users of a distributed system are allowed to work on any
machine at any time, so they need not relocate secondary storage devices in
distributed file systems.
Availability: Distributed file systems keep multiple copies of the same file in
multiple places. Hence, the availability of the distributed file system is high and
it maintains a better fault tolerance for the system.
Data Integrity: A file system is typically shared by several users, and it must
protect the integrity of the data stored in a shared file. Correct
synchronisation of concurrent access requests from multiple users vying for
access to the same file requires a concurrency control mechanism. File systems
frequently provide users with atomic transactions, a high-level concurrency
control mechanism for data integrity.
Performance: Performance is evaluated using the average amount of time it
takes to satisfy a client request. It must perform comparably to a centralised
file system.
Diskless workstations: Distributed file systems allow the use of diskless
workstations to reduce noise and heat in the system. Diskless workstations
are also more economical than workstations with disks.
Use of File Models: The DFS uses different conceptual models of a file. The
two basic criteria for file modelling are file structure
and modifiability. Files can be unstructured or structured, based on the
applications used in the file system. The modifiability of a file can be
categorised as mutable or immutable.
Use of File Accessing Models: A distributed file system may use one of the
following models to service a client’s file access request when the accessed file is
a remote file. There are two such models: the remote service model
and the data-caching model.
Use of File-Sharing Semantics: A shared file may be simultaneously accessed
by multiple users. Several types of file-sharing semantics can be used, such as Unix
semantics, session semantics, immutable shared-file semantics, and transaction-
like semantics.
Use of File-Caching Schemes: The key criteria used in a file-caching
scheme are cache location, modification propagation, and cache
validation.
Use of File Replication: File replication is the primary mechanism for
improving file availability in a distributed system environment. A replicated file
is a file that has multiple copies, with each copy located on a separate file server.
The advent of distributed computing was marked by the introduction of distributed file
systems. Such systems involved multiple client machines and one or a few servers. The server
stores data on its disks and the clients may request data through some protocol messages.
File Attributes: “File attributes” is a term commonly used in NFS terminology. It is a
collective term for the tracked metadata of a file, including file creation time, last-modified
time, size, ownership, permissions, etc. These attributes can be accessed by calling stat() on
the file.

NFSv2 messages:

Message          | Description
NFSPROC_LOOKUP   | Given a file handle and the name of the file to look up, returns a file handle.
NFSPROC_READ     | Given a file handle, offset, count, data, and attributes, reads the data.
NFSPROC_WRITE    | Given a file handle, offset, count, data, and attributes, writes data into the file.
NFSPROC_REMOVE   | Given the directory handle and name of the file, deletes the file.
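The “file attributes” NFS tracks correspond closely to the stat structure exposed by POSIX systems; Python’s os.stat is a thin wrapper over it. A minimal, self-contained sketch (using a temporary file so it runs anywhere):

```python
import os
import stat
import tempfile

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    path = f.name

# stat() returns the metadata NFS calls "file attributes":
# size, mode/permissions, ownership, and timestamps.
info = os.stat(path)
size = info.st_size                      # size in bytes (5 here)
is_regular = stat.S_ISREG(info.st_mode)  # regular file, not a directory
mtime = info.st_mtime                    # last-modified timestamp

os.unlink(path)  # clean up the temporary file
```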
2. Sprite Formats: The SFS supports various sprite formats, such as PNG, JPEG, GIF, or
proprietary formats optimized for performance and storage efficiency. Each sprite file contains
the graphical data necessary to render the sprite, including pixel information, transparency data,
and metadata.
3. Metadata: Each sprite file may contain metadata providing additional information about the
sprite, such as its dimensions, position, animation frames, collision properties, and other
attributes relevant to its usage in the game or application.
4. Compression and Optimization: To minimize storage space and optimize loading times,
the SFS may employ compression techniques tailored for graphical data. Compression
algorithms like zlib or LZMA can be used to reduce the size of sprite files without significant
loss of quality. Additionally, the SFS may optimize sprite assets for specific target platforms or
display resolutions.
5. Texture Atlases: In many cases, sprites are combined into texture atlases or sprite sheets to
further optimize rendering performance. Texture atlases pack multiple sprites into a single
image, reducing the number of texture swaps during rendering and improving GPU efficiency.
6. Runtime Loading and Caching: The SFS provides functionality for loading sprite assets
at runtime as needed by the application. This allows for dynamic loading of sprites during
gameplay or level transitions, reducing initial loading times and memory usage. Additionally,
the SFS may implement caching mechanisms to store frequently accessed sprite data in
memory for faster retrieval.
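The load-on-demand-plus-caching behaviour described in point 6 can be sketched in a few lines. Here load_from_disk is a hypothetical stand-in for a real image decoder; only the caching logic is the point:

```python
# Sketch of runtime sprite loading with an in-memory cache: a sprite
# is decoded only on its first request and served from memory after.
_cache = {}

def load_from_disk(name):
    # Hypothetical placeholder for a real decoder (PNG, JPEG, ...).
    return {"name": name, "pixels": b"\x00" * 16}

def get_sprite(name):
    """Return a sprite, decoding it only on the first request."""
    if name not in _cache:
        _cache[name] = load_from_disk(name)  # slow path: decode once
    return _cache[name]                      # fast path: cached object

a = get_sprite("player")   # decoded and cached
b = get_sprite("player")   # served from the cache: the same object
```

A production system would bound the cache size and evict rarely used sprites, but the first-request/fast-path split is the core idea.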
7. Version Control and Collaboration: For collaborative projects, the SFS may integrate with
version control systems like Git or Subversion to track changes to sprite assets over time and
facilitate collaboration among team members. Version control ensures that changes to sprite
assets are documented, reversible, and synchronized across multiple development
environments.
Overall, a Sprite File System plays a crucial role in managing sprite assets efficiently,
optimizing performance, and streamlining the development workflow in graphics-intensive
applications and games.
Log-Structured File Systems were introduced by Rosenblum and Ousterhout in the early 1990s
to address the following issues.
Growing system memories: With growing main-memory sizes, the amount of data that
can be cached also increases. Since reads are serviced by the cache, file-system
performance begins to depend solely on write performance.
Sequential I/O performance trumps random I/O performance: Over
the years, the bandwidth of accessing bits off the hard drive has increased because
more bits can be packed into the same area. However, it is physically
difficult to move the disk head more quickly. Therefore, sequential
access improves disk performance significantly over random access.
Inefficiency of existing file systems: Existing file systems perform a large
number of writes for a task as small as creating a new file, including inode, bitmap, and
data-block writes and subsequent updates. The short seeks and rotational delays
incurred reduce bandwidth.
File systems are not RAID-aware: Further, file systems do not have any
mechanism to counter the small-write problem in RAID-4 and RAID-5.
Even though processor speeds and main memory sizes have increased at an exponential rate,
disk access costs have evolved much more slowly. This calls for a file system that focuses
on write performance, makes use of sequential bandwidth, and works efficiently for both
data writes and metadata updates. This is where the motivation for the Log-Structured File
System (LFS) is rooted.
The following are the data structures used in the LFS implementation.
Inodes: As in Unix, inodes contain physical block pointers to files.
Inode Map: This table indicates the location of each inode on the disk. The
inode map is written in the segment itself.
Segment Summary: This maintains information about each block in the segment.
Segment Usage Table: This tells us the amount of live data in each segment.
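A toy sketch of how these structures interact: every write is appended to the log, and the inode map is updated to point at the newest copy, so older versions simply become garbage in earlier parts of the log. This is an illustrative model with made-up names, not real LFS code:

```python
from dataclasses import dataclass, field

# Toy model of LFS bookkeeping: writes are appended to a log and an
# inode map records where the latest version of each inode lives.
@dataclass
class ToyLFS:
    log: list = field(default_factory=list)        # append-only log
    inode_map: dict = field(default_factory=dict)  # inode no -> log position

    def write(self, inode_no, data):
        self.log.append((inode_no, data))              # sequential write
        self.inode_map[inode_no] = len(self.log) - 1   # map points at new copy

    def read(self, inode_no):
        return self.log[self.inode_map[inode_no]][1]   # follow the map

fs = ToyLFS()
fs.write(1, "v1")
fs.write(1, "v2")   # the old copy stays in the log; only the map moves
```

The stale "v1" entry is exactly the garbage a real LFS cleaner would reclaim, guided by the segment usage table.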
DSM is a mechanism that manages memory across multiple nodes and makes inter-process
communication transparent to end-users. Applications behave as if they are running
on shared memory. DSM allows user processes to access shared data
without explicit inter-process communication. In DSM, every node has its own memory,
provides memory read and write services, and supports consistency protocols.
Distributed shared memory (DSM) implements the shared-memory model in distributed
systems that have no physical shared memory. All the nodes share the virtual address
space provided by the shared-memory model, and data moves between the main memories
of the different nodes.
On-Chip Memory:
Bus-Based Multiprocessors:
A set of parallel wires called a bus acts as a connection between CPU and
memory.
Simultaneous access to the same memory by multiple CPUs is prevented
using arbitration algorithms.
Cache memory is used to reduce network traffic.
Ring-Based Multiprocessors:
Apart from the above-mentioned advantages, DSM has furthermore advantages like:
Less expensive when compared to using a multiprocessor system.
No bottlenecks in data access.
Scalability, i.e., DSM scales well with a large number of nodes.
1. Central Server Algorithm:
In this, a central server maintains all shared data. It services read requests from other
nodes by returning the data items to them, and write requests by updating the data and
returning acknowledgement messages.
Time-outs can be used in case of failed acknowledgements, while sequence numbers can
be used to avoid duplicate write requests.
It is simpler to implement, but the central server can become a bottleneck. To
overcome this, shared data can be distributed among several servers. This distribution
can be by address or by using a mapping function to locate the appropriate server.
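The acknowledgement and sequence-number handling described above can be sketched as follows; the class and method names are illustrative only:

```python
# Sketch of the central-server algorithm: one node holds the shared
# data; others send read/write requests. Sequence numbers let the
# server ignore duplicate writes caused by client retransmissions
# after a lost acknowledgement.
class CentralServer:
    def __init__(self):
        self.data = {}
        self.last_seq = {}   # client id -> last applied sequence number

    def read(self, key):
        return self.data.get(key)

    def write(self, client, seq, key, value):
        if self.last_seq.get(client) == seq:
            return "duplicate ignored"   # retransmitted request
        self.data[key] = value
        self.last_seq[client] = seq
        return "ack"

s = CentralServer()
first = s.write("A", 1, "x", 10)    # applied, acknowledged
retry = s.write("A", 1, "x", 10)    # same seq: dropped as a duplicate
```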
2. Migration Algorithm:
In contrast to the central-server algorithm, where every data access request is forwarded to
the location of the data, here the data is shipped to the location of the data access request,
which allows subsequent accesses to be performed locally.
It allows only one node to access a shared data item at a time, and the whole block
containing the data item migrates instead of the individual item requested.
It is susceptible to thrashing, where pages frequently migrate between nodes while
servicing only a few requests.
This algorithm provides an opportunity to integrate DSM with the virtual memory provided
by the operating system at individual nodes.
3. Read Replication Algorithm:
This extends the migration algorithm by replicating data blocks and allowing multiple
nodes to have read access, or one node to have read-write access.
It improves system performance by allowing multiple nodes to access data
concurrently.
The write operation is expensive, as all copies of a shared block at the various nodes
must either be invalidated or updated with the current value to maintain the
consistency of the shared data block.
DSM must keep track of the location of all copies of data blocks.
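A toy model of the read-replication rule (many readers, one writer, with stale copies invalidated on a write); the class and names are illustrative:

```python
# Sketch of the read-replication algorithm: many nodes may hold a
# read copy of a block, but a write must first invalidate every
# other copy so no reader can see a stale value afterwards.
class ReadReplicatedBlock:
    def __init__(self, value):
        self.value = value
        self.copies = set()   # nodes currently holding a read replica

    def read(self, node):
        self.copies.add(node)   # the node fetches and caches a copy
        return self.value

    def write(self, node, value):
        invalidated = self.copies - {node}
        self.copies = {node}    # only the writer keeps a valid copy
        self.value = value
        return invalidated      # the copies that had to be invalidated

blk = ReadReplicatedBlock("old")
blk.read("n1")
blk.read("n2")
stale = blk.write("n1", "new")   # n2's copy is invalidated
```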
Sequential consistency: the result of any execution of the operations of all the processors
is the same as if they were executed in some sequential order.
General consistency: all the copies of a memory location eventually contain the same
data once all the writes issued by every processor have completed.
Weak consistency: synchronization accesses are sequentially consistent. All data accesses
must be performed before each synchronization.
Other consistency models: general consistency, processor consistency, release
consistency.
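Sequential consistency can be made concrete with a toy enumeration: with two processors issuing two operations each, exactly C(4,2) = 6 interleavings preserve each processor's program order, and any sequentially consistent outcome must correspond to one of them. A sketch (illustrative only):

```python
from itertools import permutations

# Two processors, each with two operations on a shared variable x.
p1 = ["P1:write x=1", "P1:read x"]
p2 = ["P2:write x=2", "P2:read x"]

def legal_interleavings(a, b):
    """All merges of a and b that keep each processor's own order:
    exactly the executions sequential consistency allows."""
    for perm in permutations(a + b):
        if [o for o in perm if o.startswith("P1")] == list(a) and \
           [o for o in perm if o.startswith("P2")] == list(b):
            yield list(perm)

orders = list(legal_interleavings(p1, p2))
# 4 operations, 2 per processor: C(4,2) = 6 legal sequential orders.
```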
Coherence Protocols
These are needed to make the data replicas consistent.
There are two basic types of protocols:
o Write-Invalidate Protocol: a write to shared data causes the invalidation of all
copies except one before the write.
o Write-Update Protocol: a write to shared data causes all copies of that data to
be updated.
o Case Study: Cache coherence in the PLUS system.
General consistency
Unit of replication: a page (4 KB)
Coherence is maintained at the granularity of one word.
A virtual page in PLUS corresponds to a list of replicas, one of which is the master
copy. The locations of the other replicas are maintained through a distributed linked list
(the copy list) (Figure 10.6).
On a read fault: if the data is in local memory, read it locally. Otherwise, send a request
to the specified remote node and get the data.
For a write: first update the master copy, then propagate the update to the copies linked
by the copy list. On a write fault, if the address indicates a remote node, the update
request is sent to that remote node. If the copy is not the master copy, the update request
is sent to the node containing the master copy for updating and then further propagation.
Writes are non-blocking.
A read blocks until all writes complete.
A write-fence is used to flush all previous writes.
Granularity
Page replacement
Least recently used (LRU) may not be appropriate -- data can be accessed in different
modes: shared, private, read-only, writable, etc.
The replacement policy needs to take access modes into consideration, e.g. private data
should be replaced before shared data, and a read-only page can simply be deleted.
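The mode-aware replacement idea can be sketched directly; the priority values below are an assumption for illustration, not from any real DSM system:

```python
# Sketch of a mode-aware page-replacement policy: evict private pages
# before shared ones, and delete read-only pages outright, since a
# clean read-only copy never needs to be written back anywhere.
PRIORITY = {"private": 0, "shared": 1}   # lower value = evicted first

def choose_victim(pages):
    """pages: list of (name, mode) tuples. Returns (victim, action)."""
    read_only = [p for p in pages if p[1] == "read-only"]
    if read_only:
        return read_only[0][0], "delete"        # no write-back needed
    victim = min(pages, key=lambda p: PRIORITY[p[1]])
    return victim[0], "evict"

# Private data is sacrificed before shared data:
choice = choose_victim([("a", "shared"), ("c", "private")])
```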
The techniques that are used for scheduling the processes in distributed systems are as
follows:
1. Task Assignment Approach: In the Task Assignment Approach, the user-submitted
process is composed of multiple related tasks which are scheduled to appropriate
nodes in a system to improve the performance of a system as a whole.
2. Load Balancing Approach: In the Load Balancing Approach, as the name implies,
the workload is balanced among the nodes of the system.
3. Load Sharing Approach: In the Load Sharing Approach, it is assured that no node
would be idle while processes are waiting for their processing.
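The load-balancing approach can be illustrated by the simplest possible policy, assigning each new process to the currently least-loaded node (a hypothetical sketch, ignoring process sizes and communication costs):

```python
# Sketch of load balancing: each new process goes to the node with
# the fewest processes currently running, evening out the workload.
def assign(loads, n_new):
    """loads: dict of node -> current process count.
    Assigns n_new processes one at a time; returns the updated dict."""
    for _ in range(n_new):
        least = min(loads, key=loads.get)   # currently least-loaded node
        loads[least] += 1
    return loads

loads = assign({"n1": 3, "n2": 0, "n3": 1}, 4)
```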