SSZG554 Merged DDS PDF
BITS Pilani
Pilani | Dubai | Goa | Hyderabad
Anil Kumar Ghadiyaram
T1 : M. Tamer Özsu, Patrick Valduriez, Principles of Distributed Database Systems, Third Edition
T2 : Pradeep K. Sinha, Distributed Operating Systems: Concepts and Design, PHI Learning Pvt. Ltd.
R1 : Ulf Troppens, Wolfgang Muller-Freidt, Rainer Wolafka (IBM Storage Software Development, Germany), Storage Networks Explained, Wiley
• Server-centric IT architecture and its limitations
• Storage-centric IT architecture and its advantages
• Architecture of intelligent disk subsystems
• Hard disks and internal I/O channels
• JBOD: Just a bunch of disks
• Storage virtualisation using RAID
• Different RAID levels
– RAID 0: block-by-block striping
– RAID 1: block-by-block mirroring
– RAID 0+1/RAID 10: striping and mirroring combined
• Storage devices are normally only connected to a single server
• To increase fault tolerance, storage devices are sometimes connected to two
servers, with only one server actually able to use the storage device at any
one time
• In both cases, the storage device exists only in relation to the server to which
it is connected. Other servers cannot directly access the data; they always
have to go through the server that is connected to the storage device
• Servers and storage devices are generally connected together by SCSI
cables
In a server-centric IT
architecture storage devices
exist only in relation to
servers.
• The failure of both of these computers would make it impossible to access the data, even though a company's data, for example patient files or websites, must be available around the clock
• Each computer can accommodate only a limited number of I/O cards (for
example, SCSI cards).
• The length of SCSI cables is limited to a maximum of 25 m. This means that the storage capacity that can be connected to a computer using conventional technologies is limited
• Conventional technologies are therefore no longer sufficient to satisfy the
growing demand for storage capacity.
• Computer cannot access storage devices that are connected to a different
computer
SSZG554 - Distributed Data Systems 8th Aug 2020 7 BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Server-Centric IT Architecture and Its Limitations
The storage capacity on server 2
is full. It cannot make use of the
fact that there is still storage
space free on server 1 and
server 3.
• Storage networks can solve the problems of server-centric IT architecture
• Storage networks open up new possibilities for data management.
• The idea behind storage networks is that the SCSI cable is replaced by a network that is installed in addition to the existing LAN and is primarily used for data exchange between computers and storage devices. In storage networks, storage devices exist completely independently of any computer.
• Several servers can access the same storage device directly over the
storage network without another server having to be involved
• When a storage network is introduced, the storage devices are usually also
consolidated.
• This involves replacing the many small hard disks attached to the computers
with a large disk subsystem.
• Disk subsystems currently (in the year 2009) have a maximum storage
capacity of up to a petabyte.
• The storage network permits all computers to access the disk subsystem
and share it.
• Free storage capacity can thus be flexibly assigned to the computer that
needs it at the time. In the same manner, many small tape libraries can be
replaced by one big one
• A disk subsystem can be visualised as a hard disk server.
• Servers are connected to the connection port of the disk subsystem using
standard I/O techniques such as Small Computer System Interface (SCSI),
Fibre Channel or Internet SCSI (iSCSI) and can thus use the storage
capacity that the disk subsystem provides
• The internal structure of the disk subsystem is completely hidden from the
server, which sees only the hard disks that the disk subsystem provides to
the server.
• The connection ports are extended to the hard disks of the disk subsystem
by means of internal I/O channels
• In most disk subsystems there is a controller between the connection ports
and the hard disks.
• The controller can significantly increase the data availability and data
access performance with the aid of a so-called RAID procedure.
• Furthermore, some controllers realize the copying services instant copy and
remote mirroring and further additional services.
• The controller uses a cache in an attempt to accelerate read and write
accesses to the server.
• Standard I/O techniques such as SCSI, Fibre Channel, increasingly Serial
ATA (SATA) and Serial Attached SCSI (SAS) and, still to a degree, Serial
Storage Architecture (SSA) are being used for internal I/O channels between
connection ports and controller as well as between controller and internal
hard disks
• The I/O channels can be designed with built-in redundancy in order to
increase the fault-tolerance of a disk subsystem.
The following cases can be differentiated here:
– Active
– Active/passive
– Active/active (no load sharing)
– Active/active (load sharing)
Disk subsystem – Active
• In active cabling each hard disk is connected via only a single I/O channel; if this channel fails, the data on the disks can no longer be accessed.
Disk subsystem – Active/passive
• In active/passive cabling the individual hard disks
are connected via two I/O channels
• In normal operation the controller communicates
with the hard disks via the first I/O channel and
the second I/O channel is not used.
• In the event of the failure of the first I/O channel,
the disk subsystem switches from the first to the
second I/O channel.
Disk subsystem – Active/active (no load sharing)
• In this cabling method the controller uses both I/O
channels in normal operation
• The hard disks are divided into two groups
• In normal operation the first group is addressed
via the first I/O channel and the second via the
second I/O channel.
• If one I/O channel fails, both groups are
addressed via the other I/O channel.
Disk subsystem – Active/active (load sharing)
• In this approach all hard disks are addressed via
both I/O channels in normal operation
• The controller divides the load dynamically
between the two I/O channels so that the available
hardware can be optimally utilised.
• If one I/O channel fails, then the communication
goes through the other channel only.
If we compare disk subsystems with regard to their controllers we can
differentiate between three levels of complexity:
1. No controller
2. RAID controller
3. Intelligent controller with additional services such as instant copy and
remote mirroring
• If the disk subsystem has no internal controller, it is only an enclosure full of disks (JBOD: "just a bunch of disks"). In this instance, the hard disks are permanently fitted into the enclosure and the connections for I/O channels and power supply are taken outwards at a single point.
• Therefore, a JBOD is simpler to manage than a few loose hard disks.
• Typical JBOD disk subsystems have space for 8 or 16 hard disks.
• A connected server recognizes all these hard disks as independent disks.
Therefore, 16 device addresses are required for a JBOD disk subsystem
incorporating 16 hard disks
• In some I/O techniques such as SCSI and Fibre Channel arbitrated loop this
can lead to a bottleneck at device addresses
• In contrast to intelligent disk subsystems, a JBOD disk subsystem is not capable of supporting RAID or other forms of virtualisation.
• If required, however, these can be realised outside the JBOD disk
subsystem, for example, as software in the server or as an independent
virtualisation entity in the storage network
• RAID has two main goals: to increase performance by striping and to
increase fault-tolerance by redundancy.
• Striping distributes the data over several hard disks and thus distributes the
load over more hardware.
• Redundancy means that additional information is stored so that the operation
of the application itself can continue in the event of the failure of a hard disk.
• The bundle of physical hard disks brought together by the RAID controller
are also known as virtual hard disks.
• A server that is connected to a RAID system sees only the virtual hard disk;
the fact that the RAID controller actually distributes the data over several
physical hard disks is completely hidden to the server
• Only the administrator can see this distribution from outside.
• The hot spare disks are not used in normal operation. If a disk fails, the RAID
controller immediately begins to copy the data of the remaining intact disk
onto a hot spare disk.
• After the defective disk has been replaced, the replacement is included in the pool of hot spare disks
• The RAID controller combines several physical hard disks to create a virtual
hard disk.
• The server sees only a single virtual hard disk. The controller hides the
assignment of the virtual hard disk to the individual physical hard disks.
• RAID has developed since its original definition in 1987. Due to technical
progress some RAID levels are now practically meaningless, whilst others
have been modified or added at a later date.
• RAID 0 distributes the data that the server writes to the virtual hard disk onto
one physical hard disk after another block-by-block
• RAID 0 increases the performance of the virtual hard disk as follows: the individual hard disks can exchange data with the RAID controller via the I/O channel significantly more quickly than they can write to or read from the rotating disk. Because consecutive blocks go to different disks, several disks read or write in parallel, making the virtual hard disk faster than any single physical disk.
• RAID 0 increases the performance of the virtual hard disk, but not its fault-
tolerance.
• If a physical hard disk is lost, all the data on the virtual hard disk is lost.
• To be precise, therefore, the ‘R’ for ‘Redundant’ in RAID is incorrect in the
case of RAID 0, with ‘RAID 0’ standing instead for ‘zero redundancy’.
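The block-by-block striping described above can be sketched in a few lines of Python. The function name and layout here are illustrative assumptions, not taken from the text:

```python
# A minimal sketch of RAID 0's block-by-block striping: which physical
# disk, and which block on that disk, a logical block of the virtual
# hard disk lands on.

def raid0_map(logical_block: int, num_disks: int):
    """Return (disk_index, block_on_disk) for round-robin striping."""
    return logical_block % num_disks, logical_block // num_disks

# With four disks, blocks 0..3 go to disks 0..3; block 4 wraps back to
# disk 0, so consecutive blocks keep several disks busy in parallel.
```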
• In contrast to RAID 0, in RAID 1 fault-tolerance is of primary importance.
• The basic form of RAID 1 brings together two physical hard disks to form a
virtual hard disk by mirroring the data on the two physical hard disks.
• If the server writes a block to the virtual hard disk, the RAID controller writes
this block to both physical hard disks
• The individual copies are also called mirrors. Normally, two or sometimes
three copies of the data are kept (three-way mirror).
• When writing with RAID 1, a slight reduction in performance may even have to be accepted, because the RAID controller has to send the data to both hard disks.
• This disadvantage can be disregarded for an individual write operation
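A toy model of the mirroring just described; the dict-backed "disks" and the function name are stand-ins of our own, not from the text:

```python
# Toy model of RAID 1 mirroring: the controller writes every block to
# both physical disks, so either disk alone still holds all the data.

def raid1_write(mirrors, block_no, data):
    for disk in mirrors:          # the data must go to every mirror,
        disk[block_no] = data     # which is why writes can be slower

disk_a, disk_b = {}, {}
raid1_write([disk_a, disk_b], 0, b"payload")
# If disk_a now fails, disk_b still serves block 0 unchanged.
```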
• The problem with RAID 0 and RAID 1 is that they increase either
performance (RAID 0) or fault-tolerance (RAID 1).
• In RAID 10 (striped mirrors) the sequence of RAID 0 (striping) and RAID 1 (mirroring) is reversed in relation to RAID 0+1 (mirrored stripes).
• Consider first the principle behind RAID 0+1 (mirrored stripes): in the example, eight physical hard disks are used.
• The RAID controller first combines four physical hard disks at a time into a total of two virtual hard disks by means of RAID 0 (striping); these virtual hard disks are visible only within the RAID controller.
• In the second level, it consolidates these two virtual hard disks into a single
virtual hard disk by means of RAID 1 (mirroring); only this virtual hard disk is
visible to the server.
• The principle underlying RAID 10 is again based on eight physical hard disks.
• In RAID 10 the RAID controller initially brings together the physical hard
disks in pairs by means of RAID 1 (mirroring) to form a total of four virtual
hard disks that are only visible within the RAID controller.
• In the second stage, the RAID controller consolidates these four virtual hard
disks into a virtual hard disk by means of RAID 0 (striping). Here too, only
this last virtual hard disk is visible to the server.
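The two-level RAID 10 mapping can be sketched as follows; the helper name and layout are illustrative assumptions, not from the text:

```python
# Sketch of the RAID 10 address mapping over eight disks: four mirrored
# pairs (RAID 1), striped block-by-block across the pairs (RAID 0).

def raid10_map(logical_block: int, num_pairs: int = 4):
    """Return (pair_index, offset): the block is stored on BOTH disks
    of that mirrored pair at the given offset."""
    return logical_block % num_pairs, logical_block // num_pairs

# In RAID 0+1 the order is reversed: blocks are first striped across
# the four disks of each RAID 0 set, and the two sets mirror each other.
```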
• RAID 4 and RAID 5
• RAID 6: double parity
• RAID 2
• RAID 3
• Comparison of the RAID levels
• Basic forms of storage
• RAID 10 provides excellent performance at a high level of fault-tolerance.
The problem with this is that mirroring using RAID 1 means that all data is
written to the physical hard disk twice.
• RAID 10 thus doubles the required storage capacity.
• The idea of RAID 4 and RAID 5 is to replace all mirror disks of RAID 10 with
a single parity hard disk.
• The RAID controller calculates a parity block for every four blocks and writes it onto the fifth physical hard disk.
• For example, the RAID controller calculates the parity block PABCD for the
blocks A, B, C and D.
• If one of the four data disks fails, the RAID controller can reconstruct the data of the defective disk using the three other data disks and the parity disk.
• From a mathematical point of view the parity block is calculated with the aid
of the logical XOR operator (Exclusive OR).
• For example, the equation PABCD = A XOR B XOR C XOR D applies.
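The XOR parity arithmetic can be checked directly in Python; the block values below are arbitrary small integers chosen for illustration:

```python
# Parity calculation and reconstruction with XOR, as in RAID 4 and RAID 5.

def parity(blocks):
    p = 0
    for b in blocks:
        p ^= b                  # P_ABCD = A XOR B XOR C XOR D
    return p

A, B, C, D = 0b1010, 0b0110, 0b1111, 0b0001
P = parity([A, B, C, D])

# If the disk holding C fails, XOR-ing the surviving blocks with the
# parity block recovers its contents.
recovered_C = parity([A, B, D, P])
```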
• The space saving offered by RAID 4 and RAID 5 comes at a price in relation to RAID 10.
• Changing a data block changes the value of the associated parity block.
This means that each write operation to the virtual hard disk requires
(1) the physical writing of the data block,
(2) the recalculation of the parity block and
(3) the physical writing of the newly calculated parity block.
This extra cost for write operations in RAID 4 and RAID 5 is called the write
penalty of RAID 4 or the write penalty of RAID 5
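One common way to realise step (2), assumed here for illustration rather than stated in the text, is to derive the new parity from the old data block and the old parity instead of re-reading all the other data disks:

```python
# The write penalty in miniature: changing one data block forces a
# parity update. The controller can compute the new parity as
#   P_new = P_old XOR D_old XOR D_new
# (read old data + old parity, then write new data + new parity).

def update_parity(p_old, d_old, d_new):
    return p_old ^ d_old ^ d_new
```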
• RAID 4 (parity disk) is designed to reduce the storage requirement of RAID
0+1 and RAID 10.
• In the example, the data blocks are distributed over four physical hard disks
by means of RAID 0 (striping).
• Instead of mirroring all data once again, only a parity block is stored for each
four blocks.
• RAID 6 offers a compromise between RAID 5 and RAID 10 by adding a
second parity hard disk to extend RAID 5, which then uses less storage
capacity than RAID 10.
• There are different approaches available today for calculating the two parity
blocks of a parity group.
• However, none of these procedures has been adopted yet as an industry
standard.
• Irrespective of an exact procedure, RAID 6 has a poor write performance
because the write penalty for RAID 5 strikes twice
• The early work on RAID began at a time when disks were not yet very reliable: bit errors were possible that could lead to a written ‘one’ being read as a ‘zero’ or a written ‘zero’ being read as a ‘one’.
• In RAID 2 the Hamming code is used, so that redundant information is stored
in addition to the actual data.
• This additional data permits the recognition of read errors and to some
degree also makes it possible to correct them.
• Today, comparable functions are performed by the controller of each
individual hard disk, which means that RAID 2 no longer has any practical
significance.
• Like RAID 4 or RAID 5, RAID 3 stores parity data. RAID 3 distributes the
data of a block amongst all the disks of the RAID 3 system so that, in
contrast to RAID 4 or RAID 5, all disks are involved in every read or write
access. RAID 3 only permits the reading and writing of whole blocks, thus
dispensing with the write penalty that occurs in RAID 4 and RAID 5.
• The writing of individual blocks of a parity group is thus not possible.
• In addition, in RAID 3 the rotation of the individual hard disks is synchronised
so that the data of a block can truly be written simultaneously.
• RAID 3 was for a long time the recommended RAID level for sequential write and read profiles such as data mining and video processing.
• Current hard disks come with a large cache of their own, which means that
they can temporarily store the data of an entire track, and they have
significantly higher rotation speeds than the hard disks of the past.
• As a result of these innovations, other RAID levels are now suitable for
sequential load profiles, meaning that RAID 3 is becoming less and less
important.
There are three basic forms of Storage
1. Direct access storage (DAS)
2. Network attached storage (NAS)
3. Storage area network (SAN)
And a number of variations on each (especially the last two)
Source Courtesy: Some of the contents of this PPT are sourced from materials provided by Publishers of T1 & T2
• File
• File System
• A distributed file system
• File Models
– Unstructured and Structured Files
– Mutable and Immutable Files
• File Accessing Models
– Accessing Remote Files
– Unit of Data Transfer
• File Sharing Semantics
• File Caching Schemes
• File Replication
• Fault Tolerance
• File Security
• Authentication
– Approaches to Authentication
– Proof by knowledge
– Proof by possession
– Proof by property
• Access Control
In a computer system, a file is a named object that comes into
existence by explicit creation, is immune to temporary failures in the system,
and persists until explicitly destroyed. The two main purposes of using files are
as follows:
1. Permanent storage of information. This is achieved by storing a file on a secondary storage medium such as a magnetic disk.
2. Sharing of information. Files provide a natural and easy means of
information sharing. That is, a file can be created by one application and
then shared with different applications at a later time.
• A file system is a subsystem of an operating system that performs file
management activities such as organization, storing, retrieval, naming,
sharing, and protection of files.
• It is designed to allow programs to use a set of operations that characterize
the file abstraction and free the programmers from concerns about the
details of space allocation and layout of the secondary storage device.
• Therefore, a file system provides an abstraction of a storage device; that is, it is a convenient mechanism for storing and retrieving information from the storage device.
• A distributed file system provides similar abstraction to the users of a
distributed system and makes it convenient for them to use files in a
distributed environment.
• In addition to the advantages of permanent storage and sharing of
information provided by the file system of a single-processor system, a
distributed file system normally supports the following:
1. Remote information sharing.
2. User mobility
3. Availability
4. Diskless workstations
A distributed file system typically provides the following three types of
services:
1. Storage Service
2. True File Service
3. Name Service
A good distributed file system should have the following features:
– Transparency
– User Mobility
– Performance
– Simplicity and Ease of Use
– Scalability
– High Availability
– High Reliability
– Data Integrity
– Security
– Heterogeneity
• Different file systems use different conceptual models of a file.
• The two most commonly used criteria for file modeling are
– Structure
– Modifiability.
• A file is an unstructured sequence of data.
• In this model, there is no substructure known to the file server, and the contents of each file of the file system appear to the file server as an uninterpreted sequence of bytes.
• The operating system is not interested in the information stored in the files.
• Hence, the interpretation of the meaning and structure of the data stored in
the files are entirely up to the application programs.
• UNIX and MS-DOS use this file model.
• Another file model that is rarely used nowadays is the structured file model.
• In this model, a file appears to the file server as an ordered sequence of
records.
• Structured files are again of two types
– Files with non-indexed records and
– Files with indexed records.
• In the non-indexed model, a file record is accessed by specifying its position
within the file, for example, the fifth record from the beginning of the file or
the second record from the end of the file.
• In the indexed model, records have one or more key fields and can be
addressed by specifying the values of the key fields.
• In file systems that allow indexed records, a file is maintained as a B-tree or
other suitable data structure or a hash table is used to locate records quickly.
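A minimal sketch of the indexed model, with a Python dict standing in for the hash-table index; the record layout is a made-up example:

```python
# Indexed records: each record carries a key field ("id"), and an
# in-memory index maps key values to record positions so that records
# can be addressed by key rather than by position.

records = [
    {"id": 7, "name": "alice"},
    {"id": 3, "name": "bob"},
    {"id": 9, "name": "carol"},
]
index = {rec["id"]: pos for pos, rec in enumerate(records)}

def lookup(key):
    """Locate a record quickly by its key field."""
    return records[index[key]]
```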
• According to the modifiability criteria, files are of two types
– Mutable
– Immutable.
• Most existing operating systems use the mutable file model.
• In this model, an update performed on a file overwrites its old contents to produce the new contents
• That is, a file is represented as a single stored sequence that is altered by
each update operation.
• On the other hand, some more recent file systems, such as the Cedar File
System (CFS) use the immutable file model.
• In this model, a file cannot be modified once it has been created except to be
deleted.
• The file versioning approach is normally used to implement file updates, and
each file is represented by a history of immutable versions.
• That is, rather than updating the same file, a new version of the file is
created each time a change is made to the file contents and the old version
is retained unchanged.
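The versioning idea behind the immutable model can be sketched as follows; the class and method names are ours, not those of any real system:

```python
# Immutable file model: an "update" never overwrites the file; it
# appends a new version and leaves every older version readable.

class ImmutableFile:
    def __init__(self, contents):
        self.versions = [contents]       # version history, oldest first

    def update(self, new_contents):
        self.versions.append(new_contents)  # old version kept unchanged

    def read(self, version=-1):
        """Read the latest version by default, or any older one."""
        return self.versions[version]
```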
• The manner in which a client's request to access a file is serviced depends
on the file accessing model used by the file system.
• The file-accessing model of a distributed file system mainly depends on two
factors
– The method used for accessing remote files and
– The unit of data access.
1. Remote service model
• In this model, the processing of the client's request is performed at the server's node: the client's request and the server's reply are exchanged across the network as messages.
• Notice that the data packing and communication overheads per message can be significant.
• Therefore, if the remote service model is used, the file server interface and the communication protocols must be designed carefully to minimize the overhead of generating messages as well as the number of messages that must be exchanged in order to satisfy a particular request.
2. Data-caching model
• In the remote service model, every remote file access request results in
network traffic.
• The data-caching model attempts to reduce the amount of network traffic by
taking advantage of the locality feature found in file accesses.
• In this model, if the data needed to satisfy the client's access request is not
present locally, it is copied from the server's node to the client’s node and is
cached there.
• The client's request is processed on the client's node itself by using the
cached data.
• Recently accessed data are retained in the cache for some time so that
repeated accesses to the same data can be handled locally.
• A replacement policy, such as the least recently used (LRU), is used to keep
the cache size bounded.
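A minimal LRU cache of the kind described, sketched with Python's OrderedDict; the capacity and names are chosen arbitrarily for illustration:

```python
from collections import OrderedDict

# Bounded client-side cache of file blocks for the data-caching model,
# using least-recently-used (LRU) replacement.

class LRUCache:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, block):
        if block in self.data:
            self.data.move_to_end(block)   # mark as most recently used
            return self.data[block]
        return None                        # miss: must fetch from server

    def put(self, block, value):
        self.data[block] = value
        self.data.move_to_end(block)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```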
• In file systems that use the data-caching model, an important design issue is
to decide the unit of data transfer
• Unit of data transfer refers to the fraction (or its multiples) of a file data that is
transferred to and from clients as a result of a single read or write operation.
• A shared file may be simultaneously accessed by multiple users.
• In such a situation, an important design issue for any file system is to clearly
define when modifications of file data made by a user are observable by
other users.
• This is defined by the type of file sharing semantics adopted by a file system.
1. UNIX semantics.
• This semantics enforces an absolute time ordering on all operations and ensures that every read operation on a file sees the effects of all previous write operations performed on that file
• In particular, writes to an open file by a user immediately become visible to
other users who have this file open at the same time.
• The UNIX semantics is commonly implemented in file systems for single
processor systems because it is the most desirable semantics and also
because it is easy to serialize all read/write requests.
2. Session semantics.
• A client opens a file, performs a series of read/write operations on the file,
and finally closes the file when he or she is done with the file.
• A session is a series of file accesses made between the open and close
operations.
• In session semantics, all changes made to a file during a session are initially
made visible only to the client process that opened the session and are
invisible to other remote processes who have the same file open
simultaneously.
• Once the session is closed, the changes made to the file are made visible to
remote processes only in later starting sessions.
• Already open instances of the file do not reflect these changes.
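Session semantics can be modelled in a few lines; the class names are ours, and a real distributed file system would of course enforce this across nodes rather than in one process:

```python
# Toy model of session semantics: writes made during a session stay
# private until the session is closed; only sessions opened later see
# the new contents.

class SessionFile:
    def __init__(self, contents):
        self.committed = contents

    def open(self):
        return Session(self)

class Session:
    def __init__(self, f):
        self.file = f
        self.view = f.committed     # private working copy

    def write(self, contents):
        self.view = contents        # invisible to other open sessions

    def read(self):
        return self.view

    def close(self):
        self.file.committed = self.view  # visible to later sessions
```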
3. Immutable shared-files semantics.
• With this approach, since shared files cannot be changed at all, the problem of when to make the changes made to a file by a user visible to other users simply disappears.
4. Transaction-like semantics.
• This semantics is based on the transaction mechanism, which is a high-level
mechanism for controlling concurrent access to shared, mutable data.
• A transaction is a set of operations enclosed between a pair of begin_transaction and end_transaction operations.
• The transaction mechanism ensures that the partial modifications made to
the shared data by a transaction will not be visible to other concurrently
executing transactions until the transaction ends
• Therefore, in multiple concurrent transactions operating on a file, the final file
content will be the same as if all the transactions were run in some
sequential order.
SSZG554 - Distributed Data Systems 5th Sep 2020 101 BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
File Caching Schemes
• File caching has been implemented in several file systems for centralized
time-sharing systems to improve file I/O performance
• In implementing a file-caching scheme for a centralized file system, one has
to make several key decisions, such as the granularity of cached data (large
versus small), cache size (large versus small, fixed versus dynamically
changing), and the replacement policy.
In addition to these issues, a file-caching scheme for a distributed file
system should also address the following key decisions:
1. Cache location
2. Modification propagation
3. Cache validation
• Cache location refers to the place where the cached data is stored.
• Assuming that the original location of a file is on its server's disk, there are
three possible cache locations in a distributed file system
– Server’s Main Memory
– Client’s Disk
– Client’s Main Memory
Modification propagation
• Keeping file data cached at multiple client nodes consistent is an important
design issue in those distributed file systems that use client caching.
Write-through Scheme
• In this scheme, when a cache entry is modified, the new value is immediately
sent to the server for updating the master copy of the file.
• This scheme has two main advantages—high degree of reliability and
suitability for UNIX-like semantics.
• Since every modification is immediately propagated to the server having the
master copy of the file, the risk of updated data getting lost (when a client
crashes) is very low.
• A major drawback of this scheme is its poor write performance.
Delayed-Write Scheme
• Although the write-through scheme helps on reads, it does not help in
reducing the network traffic for writes.
• Therefore, to reduce network traffic for writes as well, some systems use the
delayed-write scheme.
• In this scheme, when a cache entry is modified, the new value is written only
to the cache and the client just makes a note that the cache entry has been
updated.
• Some time later, all updated cache entries corresponding to a file are gathered together and sent to the server in a single batch.
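The two modification-propagation schemes can be contrasted in a small sketch; the `server` dict stands in for the master copy at the server node, and all names are ours:

```python
# Write-through vs. delayed-write, in miniature.

server = {}        # master copy of the file blocks at the server node
pending = set()    # dirty blocks not yet propagated (delayed-write)

def write_through(cache, block, value):
    cache[block] = value
    server[block] = value      # propagated immediately: reliable, slow

def delayed_write(cache, block, value):
    cache[block] = value
    pending.add(block)         # just note that the entry is dirty

def flush(cache):
    for block in pending:      # later, send all updates in one batch
        server[block] = cache[block]
    pending.clear()
```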
Cache validation
• The modification propagation policy only specifies when the master copy of a
file at the server node is updated upon modification of a cache entry.
• It does not tell anything about when the file data residing in the cache of
other nodes is updated.
• Obviously, a client's cache entry becomes stale as soon as some other client
modifies the data corresponding to the cache entry in the master copy of the
file.
• Therefore, it becomes necessary to verify if the data cached at a client node
is consistent with the master copy.
• If not, the cached data must be invalidated and the updated version of the
data must be fetched again from the server.
• There are basically two approaches to verify the validity of cached data
– the client-initiated approach and
– the server-initiated approach
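A sketch of the client-initiated approach using per-file version numbers; the version-number mechanism is one common choice, assumed here for illustration rather than taken from the text:

```python
# Client-initiated cache validation: before using a cached entry, the
# client compares its version number with the server's current one.

server_versions = {"f.txt": 3}
client_cache = {"f.txt": {"version": 2, "data": "old"}}

def validate(name):
    """Return True if the cached copy is still consistent."""
    entry = client_cache.get(name)
    if entry is None:
        return False               # nothing cached at all
    return entry["version"] == server_versions[name]

# Here validate("f.txt") is False: the stale copy must be invalidated
# and the updated version fetched again from the server.
```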
File Replication
• High availability is a desirable feature of a good distributed file system and
file replication is the primary mechanism for improving file availability.
• A replicated file is a file that has multiple copies, with each copy located on a
separate file server.
• Each copy of the set of copies that comprises a replicated file is referred to
as a replica of the replicated file
The replication of data in a distributed system offers the following potential
benefits:
– Increased Availability
– Increased Reliability
– Improved Response Time
– Reduced Network Traffic
– Improved System Throughput
– Better Scalability
– Autonomous Operation
Fault Tolerance
• Various types of faults could harm the integrity of the data stored by such a
system.
• For instance, a processor loses the contents of its main memory in the event
of a crash.
• Such a failure could result in logically complete but physically incomplete file
operations, making the data that are stored by the file system inconsistent.
• Similarly, during a request processing, the server or client machine may
crash, resulting in the loss of state information of the file being accessed.
• This may have an uncertain effect on the integrity of file data.
• Also, other adverse environmental phenomena such as transient faults
(caused by electromagnetic fluctuations) or decay of disk storage devices
may result in the loss or corruption of data stored by a file system.
• A portion of a disk storage device is said to be “decayed” if the data on that
portion of the device are irretrievable.
File Security
• Computer systems store large amounts of information, some of which is
highly sensitive and valuable to their users.
• Users can trust the system and rely on it only if the various resources and
information of a computer system are protected against destruction and
unauthorized access.
• Obviously, the security requirements are different for different computer
systems depending on the environment in which they are supposed to
operate.
File Security
• For example, security requirements for systems meant to operate in a
military environment are different from those for systems that are meant to
operate in an educational environment.
• The security goals of a computer system are decided by its security policies,
and the methods used to achieve these goals are called security
mechanisms.
File Security
Irrespective of the operation environment, some of the common goals
of computer security are as follows
1. Secrecy. Information within the system must be accessible only to
authorized users
2. Privacy. Misuse of information must be prevented. That is, a piece
of information given to a user should be used only for the purpose
for which it was given.
File Security
3. Authenticity. When a user receives some data, the user must be able to
verify its authenticity. That is, the data arrived indeed from its expected sender
and not from any other source.
4. Integrity. Information within the system must be protected against accidental
destruction or intentional corruption by an unauthorized user.
File Security
• A total approach to computer security involves both external and internal
security.
• External security deals with securing the computer system against external
factors such as fires, floods, earthquakes, stolen disks/tapes, leaking out of
stored information by a person who has access to the information, and so on.
File Security
Internal security, on the other hand, mainly deals with the following
two aspects:
1. User authentication. Once a user is allowed physical access to the
computer facility, the user's identification must be checked by the system
before the user can actually use the facility.
2. Access control. A computer system contains many resources and several
types of information. Obviously, not all resources and information are
meant for all users.
File Security
• In fact, a secure system requires that at any time a subject (person or
program) should be allowed to access only those resources that it currently
needs to complete its task. This requirement is commonly referred to as the
need-to-know principle or the principle of least privilege.
Authentication
• Authentication deals with the problem of verifying the identity of a user
(person or program) before permitting access to the requested resource.
• That is, an authentication mechanism prohibits the use of the system (or
some resource of the system) by unauthorized users by verifying the identity
of a user making a request.
• Authentication basically involves identification and verification.
Authentication
• Identification is the process of claiming a certain identity by a user, while
verification is the process of verifying the user's claimed identity.
• Thus, the correctness of an authentication process relies heavily on the
verification procedure employed.
Approaches to Authentication
• Proof by knowledge.
• Proof by Possession
• Proof by Property
Proof by knowledge
• In this approach, authentication involves verifying something that can only be
known by an authorized principal.
• Authentication of a user based on the password supplied by him or her is an
example of proof by knowledge.
• Authentication methods based on the concept of proof by knowledge are
again of two types
– Direct demonstration method and
– Challenge-response method.
Proof by knowledge
• In the direct demonstration method, a user claims his or her identity by
supplying information (like typing in a password) that the verifier checks
against prestored information.
• On the other hand, in the challenge-response method, a user proves his or
her identity by responding correctly to the challenge questions asked by the
verifier.
• For instance, when signing up as a user, the user picks a function, for
example, x + 18.
Proof by knowledge
• For further security improvement, several functions may be used by the
same user.
• At the time of login, the function to be used will depend on when the login is
made.
• For example, a user may use seven different functions, one for each day of
the week.
• In another variation of this method, a list of questions such as what is the
name of your father, what is the name of your mother, what is the name of
your street on which your house is located, is maintained by the system.
Proof by knowledge
• When signing up as a user, the user has to reply to all these questions and
the system stores the answers.
• At login, the system asks one of these questions at random and verifies the
answer supplied by the user.
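The challenge-response method described above can be sketched as follows. The function x + 18 (with a per-day variant) is the toy secret from the text, every other name here is illustrative, and a real system would use cryptographic challenge-response (for example, an HMAC over a nonce) rather than a guessable arithmetic function:

```python
import random

# Stored by the verifier at sign-up: one function per day of the week,
# as suggested in the text (here simply x + 18 + day).
stored_functions = {
    "alice": {day: (lambda d: (lambda x: x + 18 + d))(day) for day in range(7)},
}

def verify(user, day, answer_fn):
    """The verifier issues a random challenge and checks the response."""
    challenge = random.randint(1, 10_000)
    expected = stored_functions[user][day](challenge)
    return answer_fn(challenge) == expected

# A legitimate user knows the day's function; an impostor does not.
ok = verify("alice", day=2, answer_fn=lambda x: x + 18 + 2)   # True
bad = verify("alice", day=2, answer_fn=lambda x: x + 7)       # False
```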
Proof by possession
• In this approach, a user proves his or her identity by producing some item
that can only be possessed by an authorized principal.
• The system is designed to verify the produced item to confirm the claimed
identity.
• For example, a plastic card with a magnetic strip on it that has a user
identifier number written on it in invisible, electronic form may be used as the
item to be produced by the user.
Proof by possession
• The user inserts the card in a slot meant for this purpose in the system's
terminal, which then extracts the user identifier number from the card and
checks to see if the card produced belongs to an authorized user.
• Obviously, security can be ensured only if the item to be produced is
unforgeable and safely guarded.
Proof by property
• In this approach, the system is designed to verify the identity of a user by
measuring some physical characteristics of the user that are hard to forge.
• The measured property must be distinguishing, that is, unique among all
possible users.
• For example, a special device (known as a biometric device) may be
attached to each terminal of the system that verifies some physical
characteristic of the user, such as the person's appearance, fingerprints,
hand geometry, voice, signature.
Proof by property
• In deciding the physical characteristic to be measured, an important factor to
be considered is that the scheme must be psychologically acceptable to the
user community.
• Biometric systems offer the greatest degree of confidence that a user
actually is who he or she claims to be, but they are also generally the most
expensive to implement.
• Moreover, they often have user acceptance problems because users see
biometric devices as unduly intrusive.
• Of the three authentication approaches described above, proof by
knowledge and possession can be applied to all types of authentication
needs in a secure distributed system, while proof by property is generally
limited to the authentication of human users by a system equipped with
specialized measuring instruments.
• Moreover, in practice, a system may use a combination of two or more of
these authentication methods.
• For example, the authentication mechanism used by automated cash-
dispensing machines usually employs a combination of the first two
approaches.
• That is, a user is allowed to withdraw money only if he or she produces a
valid identification card and specifies the correct password corresponding to
the identification number on the card.
Access Control
• Once a user or a process has been authenticated, the next step in security is
to devise ways to prohibit the user or the process from accessing those
resources/information that he or she or it is not authorized to access.
• This issue is called authorization and is dealt with by using access control
mechanisms.
• Access control mechanisms (also known as protection mechanisms) used in
distributed systems are basically the same as those used in centralized
systems.
Access Control
• The main difference is that since all resources are centrally located in a
centralized system, access control can be performed by a central authority.
• However, in a distributed client-server environment, each server is
responsible for controlling access to its own resources.
Access Control
Objects
• An object is an entity to which access must be controlled.
• An object may be an abstract entity, such as a process, a file, a database, a
semaphore, a tree data structure, or a physical entity, such as a CPU, a
memory segment, a printer, a card reader, a tape drive, a site of a network.
• An object is referenced by its unique name. In addition, associated with each
object is a “type” that determines the set of operations that may be
performed on it.
Access Control
• For example, the set of operations possible on objects belonging to the type
“data file” may be Open, Close, Create, Delete, Read, and Write, whereas
for objects belonging to the type “program file,” the set of possible operations
may be Read, Write, and Execute.
• Similarly, for objects of type “semaphore,” the set of possible operations may
be Up and Down, and for objects of type "tape drive,” the set of possible
operations may be Read, Write, and Rewind.
Access Control
Subjects
• A subject is an active entity whose access to objects must be controlled.
• That is, entities wishing to access and perform operations on objects and to
which access authorizations are granted are called subjects.
• Examples of subjects are processes and users.
• Note that subjects are also objects since they too must be protected.
Therefore, each subject also has a unique name.
Access Control
Protection rules.
• Protection rules define the possible ways in which subjects and objects are
allowed to interact.
• Therefore, associated with each (subject, object) pair is an access right that
defines the subset of the set of possible operations for the object type that
the subject may perform on the object.
• The complete set of access rights of a system defines which subjects can
perform what operations on which objects.
• At any particular instance of time, this set defines the protection state of the
system at that time.
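A minimal sketch of these ideas: an access matrix records the access right for each (subject, object) pair, with the allowed operations constrained by the object's type. All subjects, objects, and rights below are made up for illustration:

```python
TYPE_OPERATIONS = {                     # operations allowed by object type
    "data file":    {"open", "close", "create", "delete", "read", "write"},
    "program file": {"read", "write", "execute"},
    "semaphore":    {"up", "down"},
}

object_types = {"payroll.dat": "data file", "report.exe": "program file"}

# The protection state at this instant: access rights per (subject, object).
access_matrix = {
    ("alice", "payroll.dat"): {"read", "write"},
    ("bob",   "payroll.dat"): {"read"},
    ("bob",   "report.exe"):  {"execute"},
}

def check_access(subject, obj, operation):
    # The operation must be meaningful for the object's type AND granted
    # to this subject (least privilege: the default is no access).
    if operation not in TYPE_OPERATIONS[object_types[obj]]:
        return False
    return operation in access_matrix.get((subject, obj), set())
```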
THANK YOU
BITS Pilani
Pilani | Dubai | Goa | Hyderabad
Anil Kumar Ghadiyaram
Source Courtesy: Some of the contents of this PPT are sourced from materials provided by
Text Books
R1 : Storage Networks Explained– by Ulf Troppens, Wolfgang Muller-
Freidt, Rainer Wolafka, IBM Storage Software Development, Germany.
Publishers: Wiley
Note: In order to broaden understanding of concepts as applied to Indian IT
industry, students are
advised to refer books of their choice and case-studies in their own organizations
SSZG554 - Distributed Data Systems 12th Sep 2020 14 BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
PRESENTATION OVERVIEW
1. Level of sharing
2. Behavior of access patterns
3. Level of knowledge on access pattern behavior
• Along the second dimension of access pattern behavior, it is possible to identify two
alternatives.
• The access patterns of user requests may be static, so that they do not change over
time, or dynamic
• The significant question, then, is not whether a system is static or dynamic, but how
dynamic it is
• The third dimension of classification is the level of knowledge about the access
pattern behavior. One possibility, of course, is that the designers do not have any
information about how users will access the database.
• The more practical alternatives are that the designers have complete information,
where the access patterns can reasonably be predicted and do not deviate
significantly from these predictions, or partial information, where there are
deviations from the predictions
• The view design activity deals with defining the interfaces for end users.
• The conceptual design , on the other hand, is the process by which the enterprise is
examined to determine entity types and relationships among these entities
• This process can be divided into two related activity groups
– Entity analysis and
– Functional analysis.
• Entity analysis is concerned with determining the entities, their attributes, and the
relationships among them.
• Functional analysis is concerned with determining the fundamental functions with
which the modeled enterprise is involved.
• The results of these two steps need to be cross-referenced to get a better
understanding of which functions deal with which entities.
• There is a relationship between the conceptual design and the view design.
• The conceptual design can be interpreted as being an integration of user views.
• Even though this view integration activity is very important, the conceptual model
should support not only the existing applications, but also future applications.
• View integration should be used to ensure that entity and relationship
requirements for all the views are covered in the conceptual schema.
• In conceptual design and view design activities the user needs to specify the data
entities and must determine the applications that will run on the database as well
as statistical information about these applications.
• Statistical information includes the specification of the frequency of user
applications, the volume of various information, and the like.
• So far we have not considered the implications of the distributed
environment.
• The global conceptual schema (GCS) and access pattern information collected as a
result of view design are inputs to the distribution design step.
• The objective is to design the local conceptual schemas (LCSs) by distributing the
entities over the sites of the distributed system.
• Rather than distributing relations, it is quite common to divide them into
subrelations, called fragments , which are then distributed.
• Thus, the distribution design activity consists of two steps: fragmentation and
allocation .
• The last step in the design process is the physical design, which maps the local
conceptual schemas to the physical storage devices available at the corresponding
sites.
• The inputs to this process are the local conceptual schema and the access pattern
information about the fragments in them.
• It is well known that design and development activity of any kind is an ongoing
process requiring constant monitoring and periodic adjustment and tuning;
observation and monitoring are therefore a major activity in this process.
• This covers not only the behavior of the database implementation but also the
suitability of user views.
• The result is some form of feedback, which may result in backing up to one of the
earlier steps in the design.
• Second, if the applications that have views defined on a given relation reside at
different sites, two alternatives can be followed, with the entire relation being the
unit of distribution.
• Either the relation is not replicated and is stored at only one site, or it is replicated
at all or some of the sites where the applications reside.
• Finally, the decomposition of a relation into fragments, each being treated as a unit,
permits a number of transactions to execute concurrently.
• In addition, the fragmentation of relations typically results in the parallel execution
of a single query by dividing it into a set of subqueries that operate on fragments.
• Thus fragmentation typically increases the level of concurrency and therefore the
system throughput.
• Relation instances are essentially tables, so the issue is one of finding alternative
ways of dividing a table into smaller ones.
• There are clearly two alternatives for this:
– dividing it horizontally or
– dividing it vertically
• The degree of fragmentation goes from one extreme, that is, not to fragment at all,
to the other extreme, to fragment to the level of individual tuples (in the case of
horizontal fragmentation) or to the level of individual attributes (in the case of
vertical fragmentation).
• This level can only be defined with respect to the applications that will run on the
database. The issue is, how?
• In general, the applications need to be characterized with respect to a number of
parameters.
• According to the values of these parameters, individual fragments can be identified
• Assuming that the database is fragmented properly, one has to decide on the
allocation of the fragments to various sites on the network.
• When data are allocated, it may either be replicated or maintained as a single copy.
• The reasons for replication are reliability and efficiency of read-only queries.
• If there are multiple copies of a data item, there is a good chance that some copy of
the data will be accessible somewhere even when system failures occur.
• On the other hand, the execution of update queries causes trouble since the system
has to ensure that all the copies of the data are updated properly.
• One aspect of distribution design is that too many factors contribute to an optimal
design.
• The logical organization of the database, the location of the applications, the access
characteristics of the applications to the database, and the properties of the
computer systems at each site all have an influence on distribution decisions
• The information needed for distribution design can be divided into four categories:
– Database information
– Application information
– Communication network information
– Computer system information
• Each horizontal fragment Ri is defined by a selection on the global relation R, that
is, Ri = σFi(R), where Fi is the selection formula used to obtain fragment Ri, also
called the fragmentation predicate.
• The decomposition of relation PROJ into horizontal fragments PROJ1 and PROJ2 is
defined as follows:
• If a new tuple with a BUDGET value of, say, $600,000 were to be inserted into PROJ,
one would have had to review the fragmentation to decide if the new tuple is to go
into PROJ2 or if the fragments need to be revised and a new fragment needs to be
defined as
• Consider relation PROJ we can define the following horizontal fragments based on
the project location.
• The resulting fragments are
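Assuming the sample PROJ tuples of the textbook's running example, the location-based horizontal fragments can be sketched as selections, one fragment per fragmentation predicate:

```python
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation",   "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM",           "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance",       "BUDGET": 310000, "LOC": "Paris"},
]

def select(relation, predicate):
    """sigma_F(R): keep the tuples satisfying the fragmentation predicate F."""
    return [t for t in relation if predicate(t)]

# Location-based horizontal fragments, one per site.
PROJ1 = select(PROJ, lambda t: t["LOC"] == "Montreal")
PROJ2 = select(PROJ, lambda t: t["LOC"] == "New York")
PROJ3 = select(PROJ, lambda t: t["LOC"] == "Paris")
```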
• PROJ relation is partitioned vertically into two subrelations, PROJ1 and PROJ2 .
• PROJ1 contains only the information about project budgets, whereas PROJ2
contains project names and locations.
• It is important to notice that the primary key to the relation (PNO) is included in
both fragments
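A sketch of this vertical fragmentation, again using illustrative PROJ tuples; keeping the primary key PNO in both fragments is what allows the original relation to be reconstructed by a join:

```python
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation",   "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
]

def project(relation, attrs):
    """pi_attrs(R): keep only the listed attributes of every tuple."""
    return [{a: t[a] for a in attrs} for t in relation]

PROJ1 = project(PROJ, ["PNO", "BUDGET"])          # budget information
PROJ2 = project(PROJ, ["PNO", "PNAME", "LOC"])    # names and locations

def join_on_pno(r1, r2):
    """Reconstruct PROJ by joining the fragments on the shared key PNO."""
    index = {t["PNO"]: t for t in r2}
    return [{**t, **index[t["PNO"]]} for t in r1]
```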
• Consider relation PROJ Assume that the following applications are defined to run on
this relation.
• According to these four applications, the attribute usage values can be defined.
• As a notational convenience, we let
– A1 = PNO,
– A2 = PNAME,
– A3 = BUDGET, and
– A4 = LOC.
• Attribute usage values are not sufficiently general to form the basis of attribute
splitting and fragmentation.
• This is because these values do not represent the weight of application frequencies.
• The frequency measure can be included in the definition of the attribute affinity
measure aff(Ai;Aj) , which measures the bond between two attributes of a relation
according to how they are accessed by applications
• The affinity measure is computed as aff(Ai, Aj) = Σk Σl refl(qk) · accl(qk), where the
outer sum ranges over all applications qk that access both Ai and Aj, refl(qk) is the
number of accesses to attributes (Ai, Aj) for each execution of application qk at site
Sl, and accl(qk) is the application access frequency measure previously defined,
modified to include frequencies at different sites.
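Using the usage and access-frequency values of the textbook's four-application example (assumed here, with refl(qk) taken as 1 for simplicity, a common textbook simplification), the affinity values can be computed as:

```python
ATTRS = ["A1", "A2", "A3", "A4"]          # PNO, PNAME, BUDGET, LOC

# use(qk, Ai) = 1 iff application qk references attribute Ai.
use = {
    "q1": {"A1", "A3"},
    "q2": {"A2", "A3"},
    "q3": {"A2", "A4"},
    "q4": {"A3", "A4"},
}
acc = {"q1": 45, "q2": 5, "q3": 75, "q4": 3}   # total access frequencies

def affinity(ai, aj):
    # Sum the frequencies of every application that uses both attributes.
    return sum(acc[q] for q, attrs in use.items() if ai in attrs and aj in attrs)

# The full attribute affinity matrix.
aff = {(ai, aj): affinity(ai, aj) for ai in ATTRS for aj in ATTRS}
```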
• Assume that there are a set of fragments F = {F1, F2, …, Fn} and a distributed
system consisting of sites S = {S1, S2, …, Sm} on which a set of applications Q =
{q1, q2, …, qq} is running.
• The allocation problem involves finding the “optimal” distribution of F to S.
Minimal cost
• The cost function consists of the cost of storing each Fi at a site Sj, the cost of
querying Fi at site Sj , the cost of updating Fi at all sites where it is stored, and the
cost of data communication.
• The allocation problem, then, attempts to find an allocation scheme that minimizes
a combined cost function.
• Why are such models not available?
Performance
• The allocation strategy is designed to maintain a performance metric.
• Two well-known ones are to minimize the response time and to maximize the
system throughput at each site.
• It is apparent that the “optimality” measure should include both the performance
and the cost factors.
• In other words, one should be looking for an allocation scheme that, for example,
answers user queries in minimal time while keeping the cost of processing minimal.
• A similar statement can be made for throughput maximization.
• In local-as-view (LAV) systems, the GCS definition exists, and each LCS is treated as a
view definition over it.
• In global-as-view systems (GAV), on the other hand, the GCS is defined as a set of
views over the LCSs.
• These views indicate how the elements of the GCS can be derived, when needed,
from the elements of LCSs
• In the first step, the component database schemas are translated to a common
intermediate canonical representation
• The use of a canonical representation facilitates the translation process by reducing
the number of translators that need to be written.
• The choice of the canonical model is important; it should be one that is sufficiently
expressive to incorporate the concepts available in all the databases that will later
be integrated.
• Alternatives that have been used include the entity-relationship model, object-
oriented model or a graph that may be simplified to a tree
• The translation step is necessary only if the component databases are
heterogeneous and local schemas are defined using different data models
• In the second step of bottom-up design, the intermediate schemas are used to
generate a GCS. In some methodologies, local external (or export) schemas are
considered for integration rather than full database schemas
• Local systems may only be willing to contribute some of their data to the
multidatabase
• Aside from schema heterogeneity, other issues that complicate the matching
process are the following
– Insufficient schema and instance information
– Unavailability of schema documentation
– Subjectivity of matching
Subjectivity of matching:
• Finally, we need to note (and admit) that matching schema elements can be highly
subjective; two designers may not agree on a single “correct” mapping.
• This makes the evaluation of a given algorithm’s accuracy significantly difficult.
• Once schema matching is done, the correspondences between the various LCSs
have been identified.
• The next step is to create the GCS, and this is referred to as schema integration.
• This step is only necessary if a GCS has not already been defined and matching was
performed on individual LCSs.
• If the GSC was defined up-front, then the matching step would determine
correspondences between it and each of the LCSs and there would be no need for
the integration step.
• N-ary integration mechanisms integrate more than two schemas at each iteration.
• One-pass integration occurs when all schemas are integrated at once, producing
the global conceptual schema after one iteration
• Once a GCS (or mediated schema) is defined, it is necessary to identify how the
data from each of the local databases (source) can be mapped to GCS (target) while
preserving semantic consistency as defined by both the source and the target
• In the case of data warehouses, schema mappings are used to explicitly extract data
from the sources, and translate them to the data warehouse schema for populating
it.
• In the case of data integration systems, these mappings are used in query
processing phase by both the query processor and the wrappers
• There are two issues related to schema mapping that we will be studying: mapping
creation and mapping maintenance.
• Mapping creation is the process of creating explicit queries that map data from a
local database to the global data.
• Mapping maintenance is the detection and correction of mapping inconsistencies
resulting from schema evolution.
• Errors in source databases can always occur, requiring cleaning in order to correctly
answer user queries.
• Data cleaning is a problem that arises in both data warehouses and data integration
systems, but in different contexts.
• In data warehouses where data are actually extracted from local operational
databases and materialized as a global database, cleaning is performed as the
global database is created.
• In the case of data integration systems, data cleaning is a process that needs to be
performed during query processing when data are returned from the source
databases
Data Cleaning
• The errors that are subject to data cleaning can generally be broken down into
either schema-level or instance-level concerns
• Schema-level problems can arise in each individual LCS due to violations of explicit
and implicit constraints.
Data Cleaning
• Instance level errors are those that exist at the data level.
• For example, the values of some attributes may be missing although they were
required, there could be misspellings and word transpositions or differences in
abbreviations embedded values that were erroneously placed in other fields,
duplicate values, and contradicting values
THANK YOU
BITS Pilani
Pilani | Dubai | Goa | Hyderabad
Anil Kumar Ghadiyaram
T1 : M. Tamer Özsu • Patrick Valduriez Principles
of Distributed Database Systems Third Edition
T2 : Distributed Operating Systems: Concepts And
Design By Pradeep K. Sinha, PHI Learning Pvt.
Ltd
R1 : Storage Networks Explained– by Ulf Troppens, Wolfgang Muller-
Freidt, Rainer Wolafka, IBM Storage Software Development, Germany.
Publishers: Wiley
• Introduction
• Consistency of Replicated Databases
• Mutual Consistency
• Mutual Consistency versus Transaction Consistency
• Update Management Strategies
– Eager Update Propagation
– Lazy Update Propagation
– Centralized Techniques
– Distributed Techniques
• Replication Protocols
– Eager Centralized Protocols
• Single Master with Limited Replication Transparency
• Single Master with Full Replication Transparency
• Primary Copy with Full Replication Transparency
– Eager Distributed Protocols
– Lazy Centralized Protocols
• Single Master with Limited Transparency
• Single Master or Primary Copy with Full Replication Transparency
– Lazy Distributed Protocols
• Replication and Failures
• Failures and Lazy Replication
• Failures and Eager Replication
– Available copies protocol
– Distributed ROWA-A protocol
• Replication Mediator Service
SSZG554 - Distributed Data Systems Date : 26th Sep 2020 249 BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Introduction
Consistency of Replicated Databases
• There are two issues related to consistency of a replicated database
• One is mutual consistency, which deals with the convergence of the values of
physical data items corresponding to one logical data item
• Second is transaction consistency
Mutual Consistency
• Mutual consistency criteria for replicated databases can either be strong or
weak
• Strong mutual consistency criteria require that all copies of a data item have
the same value at the end of the execution of an update transaction.
• This is achieved by a variety of means, but the execution of 2PC at the
commit point of an update transaction is a common way to achieve strong
mutual consistency
Mutual Consistency
• Weak mutual consistency criteria do not require the values of replicas of a
data item to be identical when an update transaction terminates.
• What is required is that, if the update activity ceases for some time, the
values eventually become identical.
• This is commonly referred to as eventual consistency, which refers to the fact
that replica values may diverge over time, but will eventually converge.
• It is hard to define this concept formally or precisely
Mutual Consistency versus Transaction Consistency
• Mutual consistency refers to the replicas converging to the same value, while
transaction consistency requires that the global execution history be
serializable.
• It is possible for a replicated DBMS to ensure that data items are mutually
consistent when a transaction commits, but the execution history may not be
globally serializable.
Mutual Consistency versus Transaction Consistency
• Consider three sites (A, B, and C) and three data items (x, y, z) that are distributed as follows:
– Site A hosts x,
– Site B hosts x and y,
– Site C hosts x, y, and z.
• We will use site identifiers as subscripts on the data items to refer to a
particular replica.
• Now consider the following three transactions:
Mutual Consistency versus Transaction Consistency
Mutual Consistency versus Transaction Consistency
• Note that T1’s Write has to be executed at all three sites (since x is replicated at all three sites),
• T2’s Write has to be executed at B and C, and
• T3’s Write has to be executed only at C.
• We are assuming a transaction execution model where transactions can
read their local replicas, but have to update all of the replicas.
Mutual Consistency versus Transaction Consistency
Mutual Consistency versus Transaction Consistency
• The serialization order in HB is T1 -> T2, while in HC it is T2 -> T3 -> T1.
• Therefore, the global history is not serializable. However, the database is
mutually consistent.
• Assume, for example, that initially xA = xB = xC = 10, yB = yC = 15, and zC = 7.
• With the above histories, the final values will be xA = xB = xC = 20, yB = yC = 35, and zC = 3.5. All the physical copies (replicas) have indeed converged to the same values.
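The convergence above can be checked with a small simulation. This is a minimal sketch; the transaction bodies (T1: x <- 20, T2: y <- x + y, T3: z <- (x * y)/100) are assumptions reconstructed so as to reproduce the initial and final values stated on the slides:

```python
# Physical copies per site (initial values from the slides)
A = {"x": 10}
B = {"x": 10, "y": 15}
C = {"x": 10, "y": 15, "z": 7}

# History at B: T1 -> T2
B["x"] = 20                      # T1's Write(x) applied at B
y_new = B["x"] + B["y"]          # T2 reads its local copies at B: 20 + 15
B["y"] = y_new

# History at C: T2 -> T3 -> T1 (a different serialization order)
C["y"] = y_new                   # T2's Write(y) propagated to C
z_new = C["x"] * C["y"] / 100    # T3 reads x=10 (T1 not yet applied), y=35
C["z"] = z_new
C["x"] = 20                      # T1's Write(x) finally applied at C

A["x"] = 20                      # T1's Write(x) applied at A

print(A, B, C)   # replicas converge: x=20 everywhere, y=35, z=3.5
```

Although the two sites serialize the transactions in different orders, every replica ends with the same value, illustrating mutual consistency without global serializability.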
Update Management Strategies
• The replication protocols can be classified according to when the updates
are propagated to copies (eager versus lazy) and where updates are allowed
to occur (centralized versus distributed).
• These two decisions are generally referred to as update management
strategies
Eager Update Propagation
• The eager update propagation approaches apply the changes to all the
replicas within the context of the update transaction.
• Eager propagation may use synchronous propagation of each update, applying it to all the replicas at the same time the Write is issued.
Eager Update Propagation
Eager Update Propagation
• Eager techniques typically enforce strong mutual consistency criteria.
• Since all the replicas are mutually consistent at the end of an update
transaction, a subsequent read can read from any copy
• However, a Write(x) has to be applied to all copies.
• Thus, protocols that follow eager update propagation are known as read-one/write-all (ROWA) protocols.
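The ROWA rule can be sketched in a few lines; the in-memory structures and function names below are illustrative assumptions, not part of the protocol's definition:

```python
# Minimal sketch of read-one/write-all (ROWA): a read may use any single
# replica, but a write must be applied to every replica before the
# transaction terminates.
replicas = [{"x": 10}, {"x": 10}, {"x": 10}]  # copies of x at three sites

def read(item):
    # Read-one: under eager propagation any replica is up to date.
    return replicas[0][item]

def write(item, value):
    # Write-all: the update is applied to every replica (in a real system,
    # within the update transaction, committed via 2PC).
    for copy in replicas:
        copy[item] = value

write("x", 20)
print([copy["x"] for copy in replicas])  # every copy holds 20
print(read("x"))                         # 20, read from a single replica
```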
Eager Update Propagation
• The advantages of eager update propagation are threefold.
• First, they typically ensure that mutual consistency is enforced; therefore, there are no transactional inconsistencies.
• Second, a transaction can read a local copy of the data item (if a local copy
is available) and be certain that an up-to-date value is read. Thus, there is no
need to do a remote read.
• Finally, the changes to replicas are done atomically; thus recovery from
failures can be managed.
Eager Update Propagation
• The main disadvantage of eager update propagation is that a transaction has to update all the copies before it can terminate. This has two consequences.
• First, the response time performance of the update transaction suffers, since it typically has to participate in a 2PC execution, and because the update speed is restricted by the slowest machine.
• Second, if one of the copies is unavailable, then the transaction cannot terminate, since all the copies need to be updated.
Lazy Update Propagation
• In lazy update propagation the replica updates are not all performed within
the context of the update transaction.
• The transaction does not wait until its updates are applied to all the copies
before it commits – it commits as soon as one replica is updated.
Lazy Update Propagation
• The propagation to other copies is done asynchronously from the original
transaction, by means of refresh transactions that are sent to the replica
sites some time after the update transaction commits.
• A refresh transaction carries the sequence of updates of the corresponding
update transaction
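The refresh-transaction mechanism can be sketched as follows, assuming one master and one slave copy; the queue-based propagation is an illustrative simplification:

```python
# Sketch of lazy update propagation: the update transaction commits after
# updating one replica; a refresh transaction carrying the same updates is
# queued and applied at the other sites some time later.
from collections import deque

master = {"x": 10}
slave = {"x": 10}
refresh_queue = deque()   # refresh transactions awaiting propagation

def update_transaction(item, value):
    master[item] = value                 # commit as soon as one copy updates
    refresh_queue.append((item, value))  # record the refresh transaction

def propagate_refresh():
    # Runs asynchronously, after the update transaction has committed.
    while refresh_queue:
        item, value = refresh_queue.popleft()
        slave[item] = value

update_transaction("x", 20)
print(slave["x"])     # 10: a local read at the slave still sees stale data
propagate_refresh()
print(slave["x"])     # 20: the replicas have converged
```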
Lazy Update Propagation
• Lazy propagation is used in those applications for which strong mutual
consistency may be unnecessary and too restrictive.
• These applications may be able to tolerate some inconsistency among the
replicas in return for better performance
Lazy Update Propagation
• The primary advantage of lazy update propagation techniques is that they
generally have lower response times for update transactions, since an
update transaction can commit as soon as it has updated one copy.
Lazy Update Propagation
• The disadvantages are that the replicas are not mutually consistent and some replicas may be out-of-date; consequently, a local read may return stale data and is not guaranteed to return the up-to-date value.
Centralized Techniques
Centralized Techniques
• In some techniques, there is a single master for all replicated data; these are known as single master centralized techniques.
• In other protocols, the master copy for each data item may be different; these are typically known as primary copy centralized techniques.
Centralized Techniques
• The advantages of centralized techniques are two-fold.
• First, application of the updates is easy since they happen at only the master
site, and they do not require synchronization among multiple replica sites.
Centralized Techniques
• Second, there is the assurance that at least one site – the site that holds the
master copy – has up-to-date values for a data item.
• These protocols are generally suitable in data warehouses and other
applications where data processing is centralized at one or a few master
sites.
Centralized Techniques
• The primary disadvantage is that, as in any centralized algorithm, if there is
one central site that hosts all of the masters, this site can be overloaded and
can become a bottleneck.
Distributed Techniques
• Distributed techniques apply the update on the local copy at the site where
the update transaction originates, and then the updates are propagated to
the other replica sites.
• These are called distributed techniques since different transactions can
update different copies of the same data item located at different sites.
• They can more evenly distribute the load, and may provide the highest
system availability if coupled with lazy propagation techniques.
Distributed Techniques
• A serious complication that arises in these systems is that different replicas
of a data item may be updated at different sites (masters) concurrently.
• If distributed techniques are coupled with eager propagation methods, then the distributed concurrency control methods can adequately address the concurrent updates problem.
Replication Protocols
• We discussed two dimensions along which update management techniques
can be classified.
• These dimensions are orthogonal; therefore, four combinations are possible:
– Eager Centralized
– Eager Distributed
– Lazy Centralized
– Lazy Distributed
Eager Centralized Protocols
• In eager centralized replica control, a master site controls the operations on
a data item.
• These protocols are coupled with strong consistency techniques, so that
updates to a logical data item are applied to all of its replicas within the
context of the update transaction, which is committed using the 2PC protocol
• Consequently, once the update transaction completes, all replicas have the
same values for the updated data items
Single Master with Limited Replication Transparency
• The simplest case is to have a single master for the entire database with
limited replication transparency so that user applications know the master
site.
• In this case, global update transactions are submitted directly to the
transaction manager (TM) at the master site.
Single Master with Limited Replication Transparency
• At the master, each Read(x) operation is performed on the master copy as follows:
• A read lock is obtained on the data item, the read is performed, and the result is returned to the user.
• Similarly, each Write(x) causes an update of the master copy by first
obtaining a write lock and then performing the write operation.
Single Master with Limited Replication Transparency
• The master TM then forwards the Write to the slave sites either
synchronously or in a deferred fashion
• In either case, it is important to propagate updates such that conflicting
updates are executed at the slaves in the same order they are executed at
the master.
• This can be achieved by timestamping or by some other ordering scheme.
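One simple ordering scheme is sketched below: each propagated write carries the master's timestamp, and a slave discards any write older than the last one it applied, so conflicting updates take effect in master order. This is an illustrative last-writer-wins variant; it is only one of several possible ordering schemes:

```python
# Sketch of timestamp-ordered propagation to a slave replica.
slave_copy = {"x": 10}
last_applied_ts = 0

def apply_at_slave(ts, item, value):
    # Enforce the master's execution order: ignore writes whose timestamp
    # is not newer than the last write already applied.
    global last_applied_ts
    if ts > last_applied_ts:
        slave_copy[item] = value
        last_applied_ts = ts

# Propagated writes may arrive out of order over the network:
apply_at_slave(2, "x", 30)   # the later write arrives first
apply_at_slave(1, "x", 20)   # the stale write is discarded
print(slave_copy["x"])       # 30: master order is preserved
```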
Single Master with Limited Replication Transparency
(1) A Write is applied on the master copy;
(2) Write is then propagated to the other replicas;
(3) Updates become permanent at commit time;
(4) Read-only transaction’s Read goes to any slave copy
Single Master with Full Replication Transparency
• Single master eager centralized protocols require each user application to
know the master site, and they put significant load on the master that has to
deal with the Read operations within update transactions as well as acting as
the coordinator for these transactions during 2PC execution.
• These issues can be addressed, to some extent, by involving, in the
execution of the update transactions, the TM at the site where the application
runs.
Single Master with Full Replication Transparency
• Thus, the update transactions are not submitted to the master, but to the TM
at the site where the application runs
• This TM can act as the coordinating TM for both update and read-only
transactions.
• Applications can simply submit their transactions to their local TM, providing
full transparency.
Single Master with Full Replication Transparency
Primary Copy with Full Replication Transparency
• Each data item can have a different master.
• In this case, for each replicated data item, one of the replicas is designated
as the primary copy.
• Consequently, there is no single master to determine the global serialization
order
Primary Copy with Full Replication Transparency
• In the case of fully replicated databases, any replica can be the primary copy for a data item; however, for partially replicated databases, the limited replication transparency option makes sense only if an update transaction accesses only data items whose primary copies are at the same site.
• Otherwise, the application program cannot forward the update transaction to a single master; it will have to do it operation-by-operation, and, furthermore, it is not clear which primary copy master would serve as the coordinator for 2PC execution.
Primary Copy with Full Replication Transparency
(1) Operations (Read or Write) for each data item are routed to that data item’s master, and a Write is first applied at the master;
(2) Write is then propagated to the other replicas;
(3) Updates become permanent at commit time.
Eager Distributed Protocols
• In eager distributed replica control, the updates can originate anywhere, and
they are first applied on the local replica, then the updates are propagated to
other replicas.
• If the update originates at a site where a replica of the data item does not
exist, it is forwarded to one of the replica sites, which coordinates its
execution.
• Again, all of these are done within the context of the update transaction, and
when the transaction commits, the user is notified and the updates are made
permanent.
Eager Distributed Protocols
(1) Two Write operations are applied on two local replicas of the same data item;
(2) The Write operations are independently propagated to the other replicas;
(3) Updates become permanent at commit time
Lazy Centralized Protocols
Lazy Centralized Protocols
• Lazy centralized techniques, like eager centralized ones, apply updates first at a master copy; the difference is that the updates are propagated to the slaves only after the update transaction commits.
• Consequently, if a slave site performs a Read(x) operation on its local copy, it may read stale (non-fresh) data, since x may have been updated at the master, but the update may not have yet been propagated to the slaves.
Single Master with Limited Transparency
• In Single Master with Limited Transparency the update transactions are
submitted and executed directly at the master site (as in the eager single
master); once the update transaction commits, the refresh transaction is sent
to the slaves
Single Master with Limited Transparency
(1) Update is applied on the local replica;
(2) Transaction commit makes the updates permanent at the master;
(3) Update is propagated to the other replicas in refresh transactions;
(4) Transaction 2 reads from local copy.
Single Master or Primary Copy with Full Replication Transparency
• Alternatives that provide full transparency allow (both read and update) transactions to be submitted at any site, forwarding their operations to either the single master or to the appropriate primary master site.
• This involves two problems: the first is that, unless one is careful, a one-copy serializable (1SR) global history may not be guaranteed; the second is that a transaction may not see its own updates.
Lazy Distributed Protocols
• Lazy distributed replication protocols are the most complex ones owing to
the fact that updates can occur on any replica and they are propagated to
the other replicas
• The operation of the protocol at the site where the transaction is submitted is
straightforward: both Read and Write operations are executed on the local
copy, and the transaction commits locally.
• Sometime after the commit, the updates are propagated to the other sites by
means of refresh transactions.
Lazy Distributed Protocols
(1) Two updates are applied on two local replicas;
(2) Transaction commit makes the updates permanent;
(3) The updates are independently propagated to the other replicas.
Replication and Failures
• So far, we have focused on replication protocols in the absence of failures.
• What happens to mutual consistency concerns if there are system failures?
• The handling of failures differs between eager replication and lazy replication
approaches.
Failures and Lazy Replication
• Dealing with failures in lazy replication techniques is relatively easy, since these protocols allow divergence between the master copies and the replicas.
• Consequently, when communication failures make one or more sites unreachable (the latter due to network partitioning), the sites that are available can simply continue processing.
• Even in the case of network partitioning, one can allow operations to proceed in multiple partitions independently and then worry about the convergence of the database states upon repair.
Failures and Lazy Replication
• Before the merge, databases at multiple partitions diverge, but they are
reconciled at merge time.
Failures and Eager Replication
• All eager techniques implement some sort of ROWA protocol, ensuring that,
when the update transaction commits, all of the replicas have the same
value.
• ROWA family of protocols is attractive and elegant.
• It has one significant drawback: if even one of the replicas is unavailable, the update transaction cannot be terminated.
• So, ROWA fails in meeting one of the fundamental goals of replication,
namely providing higher availability.
Failures and Eager Replication
• An alternative to ROWA which attempts to address the low availability
problem is the Read-One/Write-All Available (ROWA-A) protocol.
• The general idea is that the write commands are executed on all the
available copies and the transaction terminates.
• The copies that were unavailable at the time will have to “catch up” when
they become available.
Failures and Eager Replication
• There have been various versions of this protocol two of which are popular.
– Available copies protocol
– Distributed ROWA-A protocol
Available copies protocol
• The coordinator of an update transaction Ti (i.e., the master where the transaction is executing) sends each Wi(x) to all the slave sites where replicas of x reside, and waits for confirmation of execution or rejection.
• If it times out before it gets acknowledgements from all the sites, it considers those that have not replied to be unavailable and continues with the update on the available sites.
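The coordinator's behaviour can be sketched as follows; the site records and the stand-in for a timed-out network call are illustrative assumptions:

```python
# Sketch of the available copies idea: the coordinator sends the write to
# all slave replica sites, and any site that fails to acknowledge before
# the timeout is treated as unavailable.

def send_write(site, item, value):
    # Stand-in for a network call; returns True if the site acknowledged
    # within the timeout.
    if site["up"]:
        site["data"][item] = value
        return True
    return False     # timed out: no acknowledgement received

sites = [
    {"name": "S1", "up": True,  "data": {"x": 10}},
    {"name": "S2", "up": False, "data": {"x": 10}},  # currently unreachable
    {"name": "S3", "up": True,  "data": {"x": 10}},
]

available = [s["name"] for s in sites if send_write(s, "x", 20)]
unavailable = [s["name"] for s in sites if s["name"] not in available]
print(available)     # ['S1', 'S3']: the update continues on these sites
print(unavailable)   # ['S2']: must catch up when it recovers
```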
Available copies protocol
• The unavailable slave sites update their databases to the latest state when
they recover.
• These sites may not even be aware of the existence of Ti and the update to
x that Ti has made if they had become unavailable before Ti started.
Available copies protocol
• There are two complications that need to be addressed.
• The first one is the possibility that the sites that the coordinator thought were
unavailable were in fact up and running and may have already updated x
but their acknowledgement may not have reached the coordinator before its
timer ran out.
Available copies protocol
• Second, some of these sites may have been unavailable when Ti started
and may have recovered since then and have started executing transactions.
• Therefore, the coordinator undertakes a validation procedure before
committing Ti
Distributed ROWA-A protocol
• In the distributed ROWA-A protocol, each site S maintains a set, Vs, of sites that it believes to be available; this is the “view” that S has of the system configuration.
• In particular, when a transaction Ti is submitted, its coordinator’s view reflects all the sites that the coordinator knows to be available; let this view be denoted VC(Ti).
Distributed ROWA-A protocol
• A Ri(x) is performed on any replica in VC(Ti) and a Wi(x) updates all copies in VC(Ti).
• The coordinator checks its view at the end of Ti , and if the view has changed
since Ti ’s start, then Ti is aborted.
• To modify V , a special atomic transaction is run at all sites, ensuring that no
concurrent views are generated.
• This can be achieved by assigning timestamps to each V when it is
generated and ensuring that a site only accepts a new view if its version
number is greater than the version number of that site’s current view.
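A minimal sketch of this view check follows; the dictionary-based view representation and function names are illustrative assumptions:

```python
# Sketch of the distributed ROWA-A view check: a transaction records the
# coordinator's view when it starts and aborts at commit time if the view
# has changed. Views carry version numbers so a site only installs a
# newer view.

current_view = {"version": 3, "sites": {"A", "B", "C"}}

def install_view(site_view, new_view):
    # Accept a new view only if its version number is greater.
    return new_view if new_view["version"] > site_view["version"] else site_view

def begin_transaction():
    return dict(current_view)        # VC(Ti): view at transaction start

def try_commit(start_view):
    # Commit succeeds only if membership is unchanged since Ti started.
    return start_view["sites"] == current_view["sites"]

t_view = begin_transaction()
current_view = install_view(current_view,
                            {"version": 4, "sites": {"A", "B"}})  # C failed
print(try_commit(t_view))   # False: Ti must abort
```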
Replication Mediator Service
• The replication protocols we have covered so far are suitable for tightly
integrated distributed database systems where we can insert the protocols
into each component DBMS.
• In multidatabase systems, replication has to be supported outside the DBMSs by mediators.
Replication Mediator Service
Replication Mediator Service
Replication Mediator Service
• This fact is exploited by NODO to overlap the total ordering of the transaction
request with the transaction execution at the master node, thus masking the
latency of total ordering.
• The protocol also executes transactions optimistically, and may abort them if
necessary.
Replication Mediator Service
Replication Mediator Service
• Similar eager replication protocols have been proposed to support partial replication, where copies can be stored at subsets of nodes.
• Unlike full replication, partial replication increases access locality and
reduces the number of messages for propagating updates to replicas.
THANK YOU
BITS Pilani Presentation
BITS Pilani
Pilani | Dubai | Goa | Hyderabad
Anil Kumar Ghadiyaram
Source Courtesy: Some of the contents of this PPT are sourced from materials provided by Publishers of T1 & T2
Text Books
T1 : M. Tamer Özsu • Patrick Valduriez Principles
of Distributed Database Systems Third Edition
T2 : Distributed Operating Systems: Concepts And
Design By Pradeep K. Sinha, PHI Learning Pvt.
Ltd
R1 : Storage Networks Explained– by Ulf Troppens, Wolfgang Muller-
Freidt, Rainer Wolafka, IBM Storage Software Development, Germany.
Publishers: Wiley
• Parallel Database Systems
– Objectives
– Functional Architecture
• Parallel DBMS Architectures
• Shared-Memory
• Shared-Disk
• Shared-Nothing
• Hybrid Architectures
– NUMA
– Cluster
• Parallel Data Placement
• Load Balancing
– Parallel Execution Problems
– Intra-Operator Load Balancing
– Inter-Operator Load Balancing
– Intra-Query Load Balancing
• Database Clusters
SSZG554 - Distributed Data Systems Date : 24th Oct 2020 321 BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Parallel Database Systems
Parallel Database Systems
• The first kind of access is representative of On-Line Transaction Processing
(OLTP) applications while the second is representative of On-Line Analytical
Processing (OLAP) applications.
• Supporting very large databases efficiently for either OLTP or OLAP can be
addressed by combining parallel computing and distributed database
management
Parallel Database Systems
Objectives
Objectives
• An important result, however, is that the general solution to the I/O bottleneck is to increase the I/O bandwidth through parallelism.
• For instance, if we store a database of size D on a single disk with throughput T, the system throughput is bounded by T.
• On the contrary, if we partition the database across n disks, each with capacity D/n and throughput T' (hopefully equivalent to T), we get an ideal throughput of n * T' that can be better consumed by multiple processors (ideally n).
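Plugging in hypothetical numbers makes the speed-up concrete; all values below are assumptions for illustration only:

```python
# Ideal I/O throughput from partitioning a database across n disks.
T = 100          # single-disk throughput, MB/s (assumed)
n = 8            # number of disks (assumed)
T_prime = T      # per-disk throughput T', ideally equivalent to T

single_disk_bound = T          # system throughput bounded by T on one disk
ideal_parallel = n * T_prime   # ideal aggregate throughput with n disks

print(single_disk_bound, ideal_parallel)  # 100 800
```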
Objectives
• Note that the main memory database system solution, which tries to maintain the database in main memory, is complementary rather than an alternative.
• In particular, the “memory access bottleneck” in main memory systems can
also be tackled using parallelism in a similar way.
• Therefore, parallel database system designers have strived to develop
software-oriented solutions in order to exploit parallel computers.
Objectives
• The objectives of parallel database systems are subsumed by those of distributed DBMSs. Ideally, a parallel database system should provide the following advantages:
– High-performance
– High-availability
– Extensibility
Functional Architecture
Functional Architecture
Session Manager
• It plays the role of a transaction monitor, providing support for client
interactions with the server.
• In particular, it performs the connections and disconnections between the
client processes and the two other subsystems.
• Therefore, it initiates and closes user sessions which may contain multiple
transactions.
• In case of OLTP sessions, the session manager is able to trigger the
execution of pre-loaded transaction code within data manager modules.
Transaction Manager
• It receives client transactions related to query compilation and execution.
• It can access the database directory that holds all meta-information about
data and programs.
• The directory itself should be managed as a database in the server.
• Depending on the transaction, it activates the various compilation phases,
triggers query execution, and returns the results as well as error codes to the
client application.
Data Manager
• It provides all the low-level functions needed to run compiled queries in
parallel, i.e., database operator execution, parallel transaction support,
cache management, etc.
• If the transaction manager is able to compile dataflow control, then
synchronization and communication among data manager modules is
possible.
• Otherwise, transaction control and synchronization must be done by a transaction manager module.
Parallel DBMS Architectures
• There are three basic parallel computer architectures depending on how
main memory or disk is shared:
– shared-memory
– shared-disk
– shared-nothing
• Hybrid architectures such as NUMA or cluster try to combine the benefits of
the basic architectures.
Shared-Memory
• In the shared-memory approach, any processor has access to any memory
module or disk unit through a fast interconnect
• All the processors are under the control of a single operating system.
• All shared-memory parallel database products today can exploit inter-query
parallelism to provide high transaction throughput and intra-query parallelism
to reduce response time of decision-support queries
• Shared-memory has two strong advantages: simplicity and load balancing.
• Since meta-information (directory) and control information (e.g., lock tables) can be shared by all processors, writing database software is not very different than for single-processor computers.
• In particular, inter-query parallelism comes for free.
• Intra-query parallelism requires some parallelization but remains rather
simple.
• Load balancing is easy to achieve at run-time using the shared-memory, by allocating each new task to the least busy processor.
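A minimal sketch of this run-time policy, assuming task costs are known in advance; the heap stands in for the shared control information, and the processor count and costs are illustrative:

```python
import heapq

# Hedged sketch of shared-memory load balancing: every new task is
# allocated to the least busy processor. Task costs are illustrative.

def assign_tasks(n_procs, task_costs):
    heap = [(0.0, p) for p in range(n_procs)]   # (current load, processor id)
    heapq.heapify(heap)
    assignment = {}
    for task, cost in enumerate(task_costs):
        load, proc = heapq.heappop(heap)        # least busy processor
        assignment[task] = proc
        heapq.heappush(heap, (load + cost, proc))
    return assignment

placement = assign_tasks(2, [5.0, 3.0, 2.0])
```

After the first two tasks go to the two idle processors, the third goes to processor 1, whose accumulated load (3.0) is lower than processor 0's (5.0).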
• Shared-memory has three problems: high cost, limited extensibility and low
availability.
• High cost is incurred by the interconnect that requires fairly complex
hardware because of the need to link each processor to each memory
module or disk.
• With faster processors (even with larger caches), conflicting accesses to the
shared-memory increase rapidly and degrade performance
• Therefore, extensibility is limited to a few tens of processors
• Finally, since the memory space is shared by all processors, a memory fault
may affect most processors thereby hurting availability. The solution is to use
duplex memory with a redundant interconnect.
Shared-Disk
• In the shared-disk approach, any processor has access to any disk unit
through the interconnect but exclusive (non-shared) access to its main
memory.
• Each processor-memory node is under the control of its own copy of the
operating system.
• Then, each processor can access database pages on the shared disk and
cache them into its own memory.
• Since different processors can access the same page in conflicting update
modes, global cache consistency is needed.
• This is typically achieved using a distributed lock manager
• The first parallel DBMS that used shared-disk is Oracle with an efficient
implementation of a distributed lock manager for cache consistency.
• Other major DBMS vendors such as IBM, Microsoft and Sybase provide
shared-disk implementations.
• Shared-disk has a number of advantages: lower cost, high extensibility, load
balancing, availability, and easy migration from centralized systems.
• The cost of the interconnect is significantly less than with shared-memory
since standard bus technology may be used.
• Given that each processor has enough main memory, interference on the
shared disk can be minimized.
• Thus, extensibility can be better, typically up to a hundred processors. Since
memory faults can be isolated from other nodes, availability can be higher.
• Finally, migrating from a centralized system to shared-disk is relatively
straightforward since the data on disk need not be reorganized.
Shared-Nothing
• In the shared-nothing approach, each processor has exclusive access to its
main memory and disk unit(s).
• Similar to shared-disk, each processor memory-disk node is under the
control of its own copy of the operating system.
• Then, each node can be viewed as a local site (with its own database and
software) in a distributed database system.
• Shared-nothing has three main virtues: lower cost, high extensibility, and
high availability.
• The cost advantage is better than that of shared-disk, which requires a special interconnect for the disks.
• By implementing a distributed database design that favors the smooth
incremental growth of the system by the addition of new nodes, extensibility
can be better (in the thousands of nodes).
• With careful partitioning of the data on multiple disks, almost linear speedup
and linear scaleup could be achieved for simple workloads. Finally, by
replicating data on multiple nodes, high availability can also be achieved.
• Shared-nothing is much more complex to manage than either shared-memory or shared-disk.
• Higher complexity is due to the necessary implementation of distributed
database functions assuming large numbers of nodes.
• In addition, load balancing is more difficult to achieve because it relies on the
effectiveness of database partitioning for the query workloads.
• Unlike shared-memory and shared-disk, load balancing is decided based on
data location and not the actual load of the system.
• Furthermore, the addition of new nodes in the system presumably requires
reorganizing the database to deal with the load balancing issues.
Hybrid Architectures
• There are two popular hybrid architectures:
– NUMA
– Cluster
NUMA
• The objective of NUMA is to provide a shared-memory programming model
and all its benefits, in a scalable architecture with distributed memory.
• The term NUMA reflects the fact that an access to the (virtually) shared
memory may have a different cost depending on whether the physical
memory is local or remote to the processor
• The most successful class of NUMA multiprocessors is Cache Coherent
NUMA (CC-NUMA)
• With CC-NUMA, the main memory is physically distributed among the nodes
as with shared-nothing or shared-disk.
• However, any processor has access to all other processors’ memories
• Most SMP manufacturers are now offering NUMA systems that can scale up
to a hundred processors.
• The strong argument for NUMA is that it does not require any rewriting of the
application software
Cluster
• A cluster is a set of independent server nodes interconnected to share
resources and form a single system.
• The shared resources, called clustered resources, can be hardware such as
disk or software such as data management services.
• The server nodes are made of off-the-shelf components ranging from simple
PC components to more powerful SMP.
• Using many off-the-shelf components is essential to obtain the best
cost/performance ratio while exploiting continuing progress in hardware
components.
• However, because each disk is directly connected to a computer via a bus,
adding or replacing cluster nodes requires disk and data reorganization.
• Shared-disk avoids such reorganization but requires disks to be globally
accessible by the cluster nodes.
There are two main technologies to share disks in a cluster:
– Network-attached storage (NAS) and
– Storage-area network (SAN)
• A NAS is a dedicated device that shares disks over a network (usually TCP/IP) using a distributed file system protocol such as Network File System (NFS).
• NAS is well suited for low throughput applications such as data backup and
archiving from PC’s hard disks.
• However, it is relatively slow and not appropriate for database management
as it quickly becomes a bottleneck with many nodes.
• A storage area network (SAN) provides similar functionality but with a lower
level interface.
• For efficiency, it uses a block-based protocol thus making it easier to manage
cache consistency (at the block level).
• In fact, disks in a SAN are attached to the network instead of to the bus, as happens in Directly Attached Storage (DAS).
• A cluster architecture has important advantages.
• It combines the flexibility and performance of shared-memory at each node
with the extensibility and availability of shared-nothing or shared-disk.
Parallel Data Placement
• There are two important differences with the distributed database approach.
• First, there is no need to maximize local processing (at each node) since
users are not associated with particular nodes.
• Second, load balancing is much more difficult to achieve in the presence of a
large number of nodes.
• The main problem is to avoid resource contention, which may result in the entire system thrashing, e.g., one node ends up doing all the work while the others remain idle.
• Since programs are executed where the data reside, data placement is a
critical performance issue
• Data placement must be done so as to maximize system performance, which
can be measured by combining the total amount of work done by the system
and the response time of individual queries.
• In terms of data placement, we have the following trade-off: minimizing response time or maximizing inter-query parallelism leads to partitioning,
• whereas minimizing the total amount of work leads to clustering.
• An alternative solution to data placement is full partitioning, whereby each relation is horizontally fragmented across all the nodes in the system.
• There are three basic strategies for data partitioning:
– Round-Robin
– Hash
– Range partitioning
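The first of these strategies can be sketched directly: round-robin places the i-th tuple on node i mod n, giving uniform placement regardless of tuple content. Tuple values and node count here are illustrative:

```python
# Hedged sketch of round-robin partitioning: the i-th tuple is placed
# on node i mod n, spreading tuples evenly regardless of their content.

def round_robin_partition(tuples, n_nodes):
    partitions = [[] for _ in range(n_nodes)]
    for i, t in enumerate(tuples):
        partitions[i % n_nodes].append(t)
    return partitions

parts = round_robin_partition(["a", "b", "c", "d", "e"], 3)
```

The drawback, relative to hash and range partitioning, is that a query on a specific attribute value still has to visit every node.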
• Hash partitioning applies a hash function to some attribute that yields the
partition number.
• This strategy allows exact-match queries on the selection attribute to be
processed by exactly one node and all other queries to be processed by all
the nodes in parallel
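As a hedged sketch of this routing behavior (the node count and keys are illustrative, and Python's built-in hash stands in for the DBMS's hash function):

```python
# Hash partitioning sketch: a hash of the partitioning attribute yields
# the partition number. An exact-match query on that attribute maps to
# exactly one node; any other query is sent to all nodes in parallel.

N_NODES = 4

def node_for(key):
    return hash(key) % N_NODES          # partition number for this key

def route_exact_match(key):
    return [node_for(key)]              # exactly one node is involved

def route_scan():
    return list(range(N_NODES))         # all nodes process in parallel
```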
• Range partitioning distributes tuples based on the value intervals (ranges) of
some attribute.
• In addition to supporting exact-match queries (as in hashing), it is well-suited
for range queries
• However, range partitioning can result in high variation in partition size.
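A minimal sketch of range partitioning, assuming illustrative splitter values dividing the attribute domain into one interval per node:

```python
import bisect

# Range partitioning sketch: splitter values divide the attribute
# domain into intervals, one per node. Splitters are illustrative.

SPLITTERS = [100, 200, 300]    # node 0: v < 100, node 1: 100 <= v < 200, ...

def node_for_value(v):
    return bisect.bisect_right(SPLITTERS, v)

def nodes_for_range(lo, hi):
    """A range query touches only the nodes whose intervals overlap [lo, hi]."""
    return list(range(node_for_value(lo), node_for_value(hi) + 1))
```

Unlike hashing, a range query here visits only the overlapping nodes; but if most values fall in one interval, that node's partition grows much larger than the others, which is exactly the size-variation problem noted above.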
Load Balancing
• Good load balancing is crucial for the performance of a parallel system.
• The response time of a set of parallel operators is that of the longest one.
• Thus, minimizing the time of the longest one is important for minimizing
response time.
• Balancing the load of different transactions and queries among different
nodes is also essential to maximize throughput.
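The first two bullets amount to taking a max over operator times; a toy calculation with illustrative times shows why skew dominates response time even when total work is unchanged:

```python
# The response time of a set of parallel operators is that of the
# longest one, so a skewed partition dominates even when the total
# amount of work is the same. Operator times are illustrative.

def parallel_response_time(operator_times):
    return max(operator_times)

balanced = parallel_response_time([10, 10, 10, 10])  # total work 40
skewed = parallel_response_time([4, 4, 4, 28])       # same total work 40
```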
• Although the parallel query optimizer incorporates decisions on how to execute a parallel execution plan, load balancing can be hurt by several problems occurring at execution time.
• Solutions to these problems can be obtained at the intra- and inter-operator
levels.
Parallel Execution Problems
• The principal problems introduced by parallel query execution are
– Initialization,
– Interference and
– Skew.
Initialization
• Before the execution takes place, an initialization step is necessary.
• This first step is generally sequential. It includes process (or thread) creation and initialization, communication initialization, etc.
• The duration of this step is proportional to the degree of parallelism and can
actually dominate the execution time of simple queries
Interference
• A highly parallel execution can be slowed down by interference.
• Interference occurs when several processors simultaneously access the
same resource, hardware or software.
Skew
• Load balancing problems can appear with intra-operator parallelism
(variation in partition size), namely data skew, and inter-operator parallelism
(variation in the complexity of operators).
Intra-Operator Load Balancing
• Good intra-operator load balancing depends on the degree of parallelism
and the allocation of processors for the operator.
• For some algorithms, e.g., the parallel hash join algorithm, these parameters
are not constrained by the placement of the data.
• Thus, the home of the operator (the set of processors where it is executed)
must be carefully decided.
• The skew problem makes it hard for a parallel query optimizer to make this
decision statically (at compile-time) as it would require a very accurate and
detailed cost model.
• Therefore, the main solutions rely on adaptive or specialized techniques that
can be incorporated in a hybrid query optimizer.
Inter-Operator Load Balancing
• In order to obtain good load balancing at the inter-operator level, it is
necessary to choose, for each operator, how many and which processors to
assign for its execution.
• This should be done taking into account pipeline parallelism, which requires inter-operator communication.
Intra-Query Load Balancing
• Intra-query load balancing must combine intra- and inter-operator parallelism.
• To some extent, given a parallel architecture, the techniques for either intra- or inter-operator load balancing can be combined.
• In the context of hybrid systems such as NUMA or cluster, the problems of
load balancing are exacerbated because they must be addressed at two
levels, locally among the processors of each shared-memory node (SM-
node) and globally among all nodes.
• A general solution to load balancing in hybrid systems is the execution model
called Dynamic Processing (DP)
• The fundamental idea is that the query is decomposed into self-contained
units of sequential processing, each of which can be carried out by any
processor.
• A processor can migrate horizontally (intra-operator parallelism) and
vertically (inter-operator parallelism) along the query operators.
• This minimizes the communication overhead of inter-node load balancing by maximizing intra- and inter-operator load balancing within shared-memory nodes.
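A hedged toy sketch of this idea: the query is broken into self-contained units, and whichever processor is free next simply takes the next unit, whatever operator it belongs to. Unit names and costs are illustrative, and the sequential loop stands in for concurrent threads:

```python
from collections import deque

# Dynamic Processing sketch: self-contained units of sequential
# processing are pulled by whichever processor becomes free first,
# regardless of which operator the unit belongs to.

def dynamic_processing(units, n_procs):
    queue = deque(units)                       # (unit name, cost) pairs
    done_by = {p: [] for p in range(n_procs)}
    finish = [0.0] * n_procs                   # per-processor busy time
    while queue:
        p = min(range(n_procs), key=lambda i: finish[i])  # next free proc
        name, cost = queue.popleft()
        done_by[p].append(name)
        finish[p] += cost
    return done_by, max(finish)

done_by, makespan = dynamic_processing(
    [("scan", 3.0), ("join", 5.0), ("agg", 2.0)], n_procs=2)
```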
Database Clusters
• Clusters of PC servers are another form of parallel computer that provides a cost-effective alternative to supercomputers or tightly-coupled multiprocessors.
• They have been used successfully in scientific computing, web information
retrieval (e.g., Google search engine) and data warehousing.
• However, these applications are typically read-intensive, which makes it
easier to exploit parallelism
• In order to support update-intensive applications that are typical of business
data processing, full parallel database capabilities, including transaction
support, must be provided.
• This can be achieved using a parallel DBMS implemented over a cluster.
• In this case, all cluster nodes are homogeneous, under the full control of the
parallel DBMS.
• The parallel DBMS solution may not be viable for some businesses such as Application Service Providers (ASPs).
• In the ASP model, customers’ applications and databases (including data
and DBMS) are hosted at the provider site and need to be available, typically
through the Internet, as efficiently as if they were local to the customer site.
• A major requirement is that applications and databases remain autonomous,
i.e., remain unchanged when moved to the provider site’s cluster and under
the control of the customers.
• Thus, preserving autonomy is critical to avoid the high costs and problems
associated with application code modification.
• Using a parallel DBMS in this case is not appropriate as it is expensive,
requires heavy migration to the parallel DBMS and hurts database autonomy.
• A solution is to use a database cluster, which is a cluster of autonomous databases, each managed by an off-the-shelf DBMS.
• A major difference with a parallel DBMS implemented on a cluster is the use
of a “black-box” DBMS at each node.
• Since the DBMS source code is not necessarily available and cannot be
changed to be “cluster-aware”, parallel data management capabilities must
be implemented via middleware.
• In its simplest form, a database cluster can be viewed as a multidatabase
system on a cluster.
Source Courtesy: Some of the contents of this PPT are sourced from materials provided by Publishers of T1 & T2
• Web Graph Management
• Compressing Web Graphs
• Storing Web Graphs as S-Nodes
– Supernode graph
– Intranode graph
– Positive superedge graph
– Negative superedge graph
• Web Search
• Web Crawling
• Indexing
– Structure Index
– Text Index
• Ranking and Link Analysis
• Evaluation of Keyword Search
• Web Querying
– Semistructured Data Approach
– Web Query Language Approach
• Question Answering
• Searching and Querying the Hidden Web
• Crawling the Hidden Web
SSZG554 - Distributed Data Systems Date : 31st Aug 2020 404 BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Web Graph Management
• The web consists of “pages” that are connected by hyperlinks, and this
structure can be modelled as a directed graph that reflects the hyperlink
structure.
• In this graph, commonly referred to as the web graph, static HTML web
pages are the nodes and the links between them are represented as directed
edges
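A minimal sketch of this model, with pages as nodes and hyperlinks as directed edges in an adjacency list; the URLs are illustrative:

```python
# The web graph as an adjacency list: static pages are the nodes and
# hyperlinks between them are directed edges. URLs are illustrative.

web_graph = {
    "a.html": ["b.html", "c.html"],   # a links to b and c
    "b.html": ["c.html"],
    "c.html": [],                     # a page with no out-links
}

def out_degree(page):
    return len(web_graph.get(page, []))

def in_degree(page):
    return sum(page in links for links in web_graph.values())
```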
• This brings us to a discussion of the structure of the web graph, which has a
“bowtie” shape
• It has a strongly connected component (the knot in the middle) in which there
is a path between each pair of pages.
• The strongly connected component (SCC) accounts for about 28% of the
web pages.
• A further 21% of the pages constitute the “IN” component from which there
are paths to pages in SCC, but to which no paths exist from pages in SCC.
• Symmetrically, the “OUT” component has pages to which paths exist from pages in SCC but not vice versa; these also constitute 21% of the pages.
• What is referred to as “tendrils” consist of pages that cannot be reached from
SCC and from which SCC pages cannot be reached either.
• These constitute about 22% of the web pages.
• These are pages that have not yet been “discovered” and have not yet been
connected to the better known parts of the web.
• Finally, there are disconnected components that have no links to/from
anything except their own small communities.
• This makes up about 8% of the web.
• This structure is interesting in that it determines the results that one gets
from web searches and from querying the web.
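As a quick sanity check on the figures quoted above, the five components do account for (roughly) the whole web:

```python
# Bowtie decomposition percentages quoted above: SCC, IN, OUT,
# tendrils, and disconnected components.

components = {"SCC": 28, "IN": 21, "OUT": 21,
              "tendrils": 22, "disconnected": 8}
total_percent = sum(components.values())
```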
Compressing Web Graphs
• A specific proposal for compressing the web graph takes advantage of the
fact that we can attempt to find nodes that share several common out-edges,
corresponding to the case where one node might have copied links from
another node
• The main idea behind this technique is that when a new node is added to the
graph, it takes an existing page and copies some of the links from that page
to itself.
Storing Web Graphs as S-Nodes
• An alternative to compressing the web graph is to develop special storage
structures that allow efficient storage and querying.
• S-Nodes is one such structure that provides a two-level representation of the
web graph.
• In this scheme, the web graph is represented by a set of smaller directed
subgraphs.
• Given a web graph WG, the S-Node representation can be constructed as
follows.
• Let P = {N1, N2, ..., Nn} be a partition of the vertex set of WG.
• The following types of directed graphs can be defined
– Supernode graph
– Intranode graph
– Positive superedge graph
– Negative superedge graph
Supernode graph
• A supernode graph contains n vertices, one for each partition in P.
• There is a supernode for each of the partitions N1, N2 and N3.
• Supernodes are linked using superedges.
• A superedge Eij is created from Ni to Nj if there is at least one page in Ni that
points to some page in Nj .
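The superedge rule above can be sketched as follows (a minimal sketch; the function name, page names, and partition labels are illustrative, not from the slides). Intra-partition links are skipped because they belong to the intranode graphs:

```python
def build_supernode_graph(edges, partition):
    """edges: iterable of (src, dst) page links.
    partition: dict mapping each page to its partition label, e.g. "N1".
    Returns the set of superedges (Ni, Nj): a superedge exists when at
    least one page in Ni points to some page in Nj (i != j)."""
    superedges = set()
    for src, dst in edges:
        ni, nj = partition[src], partition[dst]
        if ni != nj:                     # intra-partition links go to the intranode graph
            superedges.add((ni, nj))
    return superedges
```

For example, with pages P1, P2 in N1 and P3, P4 in N2, a single link P2 to P3 is enough to create the superedge (N1, N2).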
Intranode graph
• Each partition Ni is associated with an intranode graph IntraNodei that
represents all the interconnections between the pages that belong to Ni.
• For example, IntraNode1 represents the hyperlinks between pages P1 and
P2.
Positive superedge graph
• For a superedge Eij, the positive superedge graph records the page-level
links that actually exist from pages in Ni to pages in Nj.
Negative superedge graph
• The negative superedge graph instead records the page pairs from Ni to Nj
that are not linked; it is the more compact choice when most pages in Ni
point to pages in Nj.
Storing Web Graphs as S-Nodes
• Given a partition P on the vertex set of WG, an S-Node representation
SNode(WG, P) can be constructed by using the supernode graph, which points
to the intranode graphs and to a set of positive and negative superedge
graphs.
• The decision as to whether to use the positive or the negative superedge
graph depends on which representation has the lower number of edges.
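This decision rule can be sketched as follows (a minimal sketch; the function and variable names are illustrative): for one pair of partitions, enumerate all possible page-level links and keep whichever of the existing-link set (positive) or missing-link set (negative) is smaller.

```python
from itertools import product

def choose_superedge_repr(edges, ni_pages, nj_pages):
    """edges: set of (src, dst) links that actually exist.
    Returns ("positive", links) or ("negative", missing_links),
    whichever representation has fewer edges."""
    all_pairs = set(product(ni_pages, nj_pages))  # every possible Ni -> Nj link
    positive = all_pairs & set(edges)             # links that exist
    negative = all_pairs - positive               # links that are absent
    if len(positive) <= len(negative):
        return "positive", positive
    return "negative", negative
```

When pages in Ni link to nearly every page in Nj, the negative set is much smaller, which is exactly the case the negative superedge graph is meant for.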
Web Search
• Web search involves finding “all” the web pages that are relevant, i.e.,
that have content related to the keyword(s) a user specifies.
• Naturally, it is not possible to find all the pages, or even to know if one has
retrieved all the pages; thus the search is performed on a database of web
pages that have been collected and indexed.
• There are usually multiple pages that are relevant to a query; these pages
are presented to the user in ranked order of relevance, as determined by the
search engine.
• In every search engine the crawler plays one of the most crucial roles.
• A crawler is a program used by a search engine to scan the web on its
behalf and collect data about web pages.
• A crawler is given a starting set of pages – more accurately, it is given a set
of Uniform Resource Locators (URLs) that identify these pages.
• The crawler retrieves and parses the page corresponding to each URL,
extracts any URLs in it, and adds these URLs to a queue.
• In the next cycle, the crawler extracts a URL from the queue (based on some
order) and retrieves the corresponding page.
• This process is repeated until the crawler stops.
• A control module is responsible for deciding which URLs should be visited
next.
• The retrieved pages are stored in a page repository
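The crawl cycle above can be sketched as follows. A real crawler fetches pages over HTTP and parses their HTML; this minimal sketch simulates the web as a dictionary mapping each URL to the links it contains (all names are illustrative):

```python
from collections import deque

def crawl(web, seeds):
    """web: dict mapping a URL to the list of URLs it links to.
    seeds: the starting set of URLs given to the crawler."""
    queue = deque(seeds)          # URLs waiting to be visited
    repository = []               # stands in for the page repository
    seen = set(seeds)
    while queue:                  # repeated "until the crawler stops"
        url = queue.popleft()     # control module: pick the next URL (FIFO here)
        repository.append(url)    # "retrieve" the page and store it
        for link in web.get(url, []):   # extract URLs from the page
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return repository
```

Using a FIFO queue visits URLs in the order they were discovered, i.e., the breadth-first ordering discussed later; swapping the queue discipline changes the crawl order.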
• The indexer module is responsible for constructing indexes on the pages
that have been downloaded by the crawler
• The ranking module is responsible for sorting the large number of results so
that those that are considered to be most relevant to the user’s search are
presented first
Web Crawling
• A crawler scans the web on behalf of a search engine to extract information
about the visited web pages.
• Given the size of the web, the changing nature of web pages, and the limited
computing and storage capabilities of crawlers, it is impossible to crawl the
entire web.
• Thus, a crawler must be designed to visit “most important” pages before
others.
• The issue, then, is to visit the pages in some ranked order of importance.
• Measures of the importance of a given page can be static, such that the
importance of a page is determined independently of the retrieval queries
that will run against it, or dynamic, in that they take the queries into
consideration.
• The PageRank of a page Pi (denoted r(Pi)) is simply the normalized sum of
the PageRanks of all of Pi's backlink pages (the set denoted BPi):
r(Pi) = Σ_{Pj ∈ BPi} r(Pj) / |Pj|
where |Pj| is the number of links going out of page Pj.
• This formula calculates the rank of a page based on the backlinks, but
normalizes the contribution of each backlinking page Pj using the number of
links that Pj has to other pages.
• The idea here is that it is more important to be pointed at by pages that
link conservatively to other pages than by those that link to others
indiscriminately.
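A minimal sketch of iterating this formula to a fixed point, using the simplified form given above (no damping factor; every page is assumed to have at least one out-link; function and page names are illustrative):

```python
def pagerank(out_links, iterations=50):
    """out_links: dict mapping each page to the list of pages it links to."""
    pages = list(out_links)
    rank = {p: 1.0 / len(pages) for p in pages}   # start from a uniform rank
    for _ in range(iterations):
        new = {p: 0.0 for p in pages}
        for pj, targets in out_links.items():
            share = rank[pj] / len(targets)       # normalize by |Pj| out-links
            for pi in targets:
                new[pi] += share                  # Pj is a backlink page of Pi
        rank = new
    return rank
```

Each iteration redistributes every page's rank equally over its out-links, so a page linked to by conservative linkers receives larger shares than one linked to indiscriminately.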
• A second issue is how the crawler chooses the next page to visit once it has
crawled a particular page.
• The crawler maintains a queue in which it stores the URLs for the pages that
it discovers as it analyzes each page.
• Thus, the issue is one of ordering the URLs in this queue.
• One possibility is to visit the URLs in the order in which they were
discovered; this is referred to as the breadth-first approach
• Another alternative is to use random ordering whereby the crawler chooses a
URL randomly from among those that are in its queue of unvisited pages.
• Other alternatives are to use metrics that combine ordering with importance
ranking such as backlink counts or PageRank
• Since many web pages change over time, crawling is a continuous activity
and pages need to be re-visited.
• Instead of restarting from scratch each time, it is preferable to selectively re-
visit web pages and update the gathered information.
• Crawlers that follow this approach are called incremental crawlers
• Some search engines specialize in searching pages belonging to a particular
topic.
• These engines use crawlers optimized for the target topic, and are referred
to as focused crawlers
Indexing
• In order to efficiently search the crawled pages and the gathered information,
a number of indexes are built.
• The two more important indexes are
– Structure (or link ) index
– Text (or content ) index
Structure Index
• The structure index is based on the graph model with the graph representing
the structure of the crawled portion of the web.
• The efficient storage and retrieval of these pages is important
• The structure index can be used to obtain important information about the
linkage of web pages such as information regarding the neighborhood of a
page and the siblings of a page.
Text Index
• The most important and most widely used index is the text index.
• Indexes to support text-based retrieval can be implemented using any of the
access methods traditionally used to search over text document collections.
• Examples include suffix arrays, inverted files or inverted indexes and
signature files
• An inverted index is a collection of inverted lists, where each list is
associated with a particular word.
• In general, an inverted list for a given word is a list of document identifiers in
which the particular word occurs
• If needed, the location of the word in a particular page can also be saved as
part of the inverted list.
• This information is usually needed in proximity queries and query result
ranking
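A minimal sketch of such an inverted index, storing (document id, position) pairs so that word locations are available for proximity queries and result ranking (the document contents and function name are illustrative):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs: dict mapping a document id to its text.
    Returns one inverted list per word: a list of
    (document id, word position) pairs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((doc_id, pos))   # record where the word occurs
    return index
```

Dropping the position and keeping only document ids gives the simpler form described first; keeping positions is what makes proximity queries answerable.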
Ranking and Link Analysis
• A typical search engine returns a large number of web pages that are
expected to be relevant to a user query.
• However, these pages are likely to be different in terms of their quality and
relevance.
• The user is not expected to browse through this large collection to find a high
quality page.
• Clearly, there is a need for algorithms to rank these pages so that
higher-quality web pages appear among the top results.
• Link-based algorithms can be used to rank a collection of pages.
• A page that has a large number of incoming links is expected to be of good
quality; hence, the number of incoming links to a page can be used as a
ranking criterion.
• This intuition is the basis of ranking algorithms
• HITS is also a link-based algorithm.
• It is based on identifying “authorities” and “hubs”.
• A good authority page receives a high rank.
• Hubs and authorities have a mutually reinforcing relationship: a good
authority is a page that is linked to by many good hubs, and a good hub is a
page that links to many good authorities.
• Thus, a page pointed to by many hubs (a good authority page) is likely to be
of high quality
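The mutually reinforcing update can be sketched with a small power iteration (a minimal sketch; function and page names are illustrative, and the graph is assumed to contain at least one link so the normalizations are well defined):

```python
def hits(out_links, iterations=50):
    """out_links: dict mapping each page to the list of pages it links to.
    Returns (hub, authority) score dicts."""
    pages = list(out_links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # authority of p = sum of hub scores of pages linking to p
        auth = {p: sum(hub[q] for q in pages if p in out_links[q]) for p in pages}
        norm = sum(v * v for v in auth.values()) ** 0.5
        auth = {p: v / norm for p, v in auth.items()}
        # hub of p = sum of authority scores of pages p links to
        hub = {p: sum(auth[q] for q in out_links[p]) for p in pages}
        norm = sum(v * v for v in hub.values()) ** 0.5
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth
```

On a graph where two hub pages both point at one page, that page ends up with the top authority score while the two pointers share the hub score, matching the intuition in the bullets above.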
Evaluation of Keyword Search
• Keyword-based search engines are the most popular tools to search
information on the web.
• They are simple, and one can specify fuzzy queries that may not have an
exact answer, but may only be answered approximately by finding facts that
are “similar” to the keywords.
• The limitation is that keyword search is not sufficiently powerful to express
complex queries
• A second limitation is that keyword search does not offer support for a
global view of information on the web in the way that database querying
exploits database schema information.
Web Querying
• Declarative querying and the efficient execution of queries have been a
major focus of database technology.
• It would be beneficial if these database techniques could be applied to the
web.
• In this way, accessing the web could be treated, to a certain extent,
similarly to accessing a large database.
• There are difficulties in carrying over traditional database querying concepts
to web data.
• Perhaps the most important difficulty is that database querying assumes the
existence of a strict schema.
• Web data are semistructured: the data may have some structure, but it may
not be as rigid, regular, or complete as that of databases, so that
different instances of the data may be similar but not identical.
• There are, obviously, inherent difficulties in querying schemaless data.
• A second issue is that the web is more than semistructured data.
• The links that exist between web data entities (e.g., pages) are important
and need to be considered. This requires links to be treated as first-class
objects.
• A third major difficulty is that there is no commonly accepted language,
similar to SQL, for querying web data.
• A standardized language for XML (XQuery) has emerged; as XML becomes more
prevalent on the web, this language is likely to become dominant and more
widely used.
Semistructured Data Approach
• One way to approach querying the web data is to treat it as a collection of
semi structured data.
• Then, models and languages that have been developed for this purpose can
be used to query the data.
• Semistructured data models and languages were not originally developed to
deal with web data; rather, they addressed the requirements of growing data
collections that did not have as strict a schema as their relational
counterparts.
• OEM (Object Exchange Model) is a self-describing semistructured data
model. Self-describing means that each object specifies the schema that it
follows.
• An OEM object is defined as a four-tuple (label, type, value, oid), where
– label - is a character string describing what the object represents,
– type - specifies the type of the object's value,
– value - is the object's value, and
– oid - is the object identifier that distinguishes it from other objects.
• The type of an object can be atomic, in which case the
object is called an atomic object, or complex, in which
case the object is called a complex object.
• An atomic object contains a primitive value such as an
integer, a real, or a string, while a complex object
contains a set of other objects, which can themselves be
atomic or complex.
• The value of a complex object is a set of oids.
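A minimal sketch of the OEM four-tuple in Python (the class name and the sample objects are illustrative, not from the text): two atomic objects hold primitive values, and a complex object's value is the set of oids of its components.

```python
class OEMObject:
    """An OEM object as a four-tuple: (label, type, value, oid)."""
    def __init__(self, label, type_, value, oid):
        self.label, self.type, self.value, self.oid = label, type_, value, oid

# atomic objects: primitive values (string, integer)
title = OEMObject("paper_title", "string", "An Example Paper", "o2")
year = OEMObject("pub_year", "integer", 1997, "o3")
# complex object: its value is the set of oids of its component objects
paper = OEMObject("paper", "complex", {"o2", "o3"}, "o1")
```

Because each object carries its own label and type, the "schema" travels with the data, which is what self-describing means here.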
Web Query Language Approach
• The approaches in this category aim to directly address the
characteristics of web data, particularly focusing on handling links
properly.
• Their starting point is to overcome the shortcomings of keyword search by
providing proper abstractions for capturing the content structure of
documents
• They combine the content-based queries (e.g., keyword expressions) and
structure-based queries (e.g., path expressions).
• A number of languages have been proposed specifically to deal with web
data; these can be categorized as first-generation and second-generation.
• The first-generation languages model the web as an interconnected
collection of atomic objects.
• Consequently, these languages can express queries that search the link
structure among web objects and their textual content, but they cannot
express queries that exploit the document structure of these web objects
• The second-generation languages model the web as a linked collection of
structured objects, allowing them to express queries that exploit document
structure, much as semistructured-data languages do.
• First-generation approaches include WebSQL, W3QL and WebLog.
• Second-generation approaches include WebOQL and StruQL.
Question Answering
• These systems accept natural language questions that are then analyzed to
determine the specific query that is being posed.
• They then conduct a search to find the appropriate answer.
• Question answering systems have grown within the context of IR systems,
where the objective is to determine the answer to posed queries within a
well-defined corpus of documents.
• These are usually referred to as closed-domain systems.
• They extend the capabilities of keyword search queries in two fundamental
ways.
• First, they allow users to specify complex queries in natural language that
may be difficult to specify as simple keyword search requests
• In the context of web querying, they also enable asking questions without a
full knowledge of the data organization.
• This does not mean that they return exact answers as traditional DBMSs do,
but they may return a (ranked) list of explicit responses to the query, rather
than a set of web pages.
General architecture of QA Systems
Searching and Querying the Hidden Web
• Most general-purpose search engines only operate on the publicly indexable
web (PIW), while a considerable amount of valuable data is kept in hidden
databases, either as relational data, as embedded documents, or in many
other forms.
• The current trend in web searching is to find ways to search the hidden web
as well as the PIW, for two main reasons.
• The first is size – the size of the hidden web (in terms of generated HTML
pages) is considerably larger than that of the PIW, so the probability of
finding answers to users' queries is much higher if the hidden web can also
be searched.
• The second is data quality – the data stored in the hidden web are usually
of much higher quality than those found on public web pages, since they are
properly curated.
• Ordinary crawlers cannot be used to search the hidden web, since there are
neither HTML pages, nor hyperlinks to crawl.
• Usually, the data in hidden databases can only be accessed through a
search interface or another special-purpose interface.
• In most cases, the underlying structure of the database is unknown, and
the data providers are usually reluctant to provide any information about
their data that might help in the search process; one has to work through
the interfaces provided by these data sources.
Crawling the Hidden Web
• One approach to address the issue of searching the hidden web is to try
crawling in a manner similar to that of the PIW.
• As already mentioned, the only way to deal with hidden web databases is
through their search interfaces.
A hidden web crawler should be able to perform two tasks:
(a) submit queries to the search interface of the database, and
(b) analyze the returned result pages and extract relevant information from
them.
• Hadoop Architecture
• HDFS
– Advantages
– Features
• Architecture
• HDFS Operations
• Big Data
• MapReduce
SSZG554 - Distributed Data Systems Date : 7th Nov 2020 473 BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Introduction
• Hadoop is an open-source framework that allows one to store and process
big data in a distributed environment across clusters of computers using
simple programming models.
• It is designed to scale up from single servers to thousands of machines,
each offering local computation and storage.
• All the modules in Hadoop are designed with a fundamental assumption that
hardware failures are common and should be automatically handled by the
framework
• The core of Apache Hadoop consists of a storage part, known as Hadoop
Distributed File System (HDFS), and a processing part called MapReduce
• Hadoop splits files into large blocks and distributes them across nodes in a
cluster.
• To process the data, Hadoop transfers packaged code to the nodes so that
the data can be processed in parallel where it resides.
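The split/process-in-parallel idea can be illustrated with the classic MapReduce word-count example, here run in a single process (a sketch of the programming model only; Hadoop would execute the map and reduce tasks on the cluster nodes holding the data blocks, and all function names are illustrative):

```python
from collections import defaultdict

def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    """Reduce: combine all counts emitted for one word."""
    return word, sum(counts)

def mapreduce(lines):
    groups = defaultdict(list)
    for line in lines:                     # map phase, one call per input split
        for key, value in map_fn(line):
            groups[key].append(value)      # shuffle: group values by key
    return dict(reduce_fn(k, v) for k, v in groups.items())   # reduce phase
```

The same map and reduce functions run unchanged whether the input is two strings in memory or terabytes of blocks spread across a cluster; only the framework around them changes.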
Hadoop Architecture
The base Apache Hadoop framework is composed of the following
modules:
– Hadoop Common: the common libraries and utilities needed by the other
modules
– Hadoop Distributed File System (HDFS): the distributed storage layer
– Hadoop YARN: the resource-management and job-scheduling platform
– Hadoop MapReduce: the programming model for large-scale data processing
Hadoop Distributed File System
• Hadoop can work directly with any mountable distributed file system such as
Local FS, HFTP FS, S3 FS, and others, but the most common file system
used by Hadoop is the Hadoop Distributed File System (HDFS).
• The Hadoop Distributed File System (HDFS) is based on the Google File
System (GFS) and provides a distributed file system that is designed to run
on large clusters (thousands of computers) of small computer machines in a
reliable, fault-tolerant manner.
• The DataNodes take care of read and write operations on the file system.
• They also take care of block creation, deletion, and replication based on
instructions given by the NameNode.
• HDFS provides a shell like any other file system, and a list of commands
is available to interact with the file system.
How Does Hadoop Work?
Stage 1
• A user/application can submit a job to Hadoop (via a Hadoop job client)
for the required process by specifying the following items:
• The location of the input and output files in the distributed file system.
• The Java classes, in the form of a JAR file, containing the implementation
of the map and reduce functions.
• The job configuration by setting different parameters specific to the job.
Stage 2
• The Hadoop job client then submits the job (jar/executable etc) and
configuration to the JobTracker which then assumes the responsibility of
distributing the software/configuration to the slaves, scheduling tasks and
monitoring them, providing status and diagnostic information to the job-client.
Stage 3
• The TaskTrackers on different nodes execute the task as per MapReduce
implementation and output of the reduce function is stored into the output
files on the file system.
Advantages of Hadoop
• Hadoop framework allows the user to quickly write and test distributed
systems.
• It is efficient: it automatically distributes the data and work across the
machines and, in turn, utilizes the underlying parallelism of the CPU cores.
• Hadoop does not rely on hardware to provide fault tolerance and high
availability (FTHA); rather, the Hadoop library itself has been designed to
detect and handle failures at the application layer.
• Servers can be added or removed from the cluster dynamically and Hadoop
continues to operate without interruption.
• Another big advantage of Hadoop is that, apart from being open source, it
is compatible with all platforms, since it is Java-based.
HDFS
• Hadoop File System was developed using distributed file system design.
• It is run on commodity hardware.
• Unlike other distributed systems, HDFS is highly fault-tolerant and
designed using low-cost hardware.
• HDFS holds very large amounts of data and provides easy access.
• To store such huge data, the files are stored across multiple machines.
• These files are stored in a redundant fashion to rescue the system from
possible data loss in case of failure.
• HDFS also makes applications available for parallel processing.
Features of HDFS
• It is suitable for distributed storage and processing.
• Hadoop provides a command interface to interact with HDFS.
• The built-in servers of the namenode and datanode help users easily check
the status of the cluster.
• Streaming access to file system data.
• HDFS provides file permissions and authentication.
HDFS Architecture
• HDFS follows a master-slave architecture and has the following elements
– Namenode
– Datanode
Namenode
• The namenode is commodity hardware that contains the GNU/Linux operating
system and the namenode software, which can be run on low-cost machines.
• The system having the namenode acts as the master server and it does the
following tasks:
– Manages the file system namespace.
– Regulates client’s access to files.
– It also executes file system operations such as renaming, closing, and
opening files and directories.
SSZG554 - Distributed Data Systems Date : 7th Nov 2020 492 BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
Datanode
• The datanode is commodity hardware having the GNU/Linux operating system
and the datanode software.
• For every node (commodity hardware/system) in a cluster, there will be a
datanode. These nodes manage the data storage of their system.
• Datanodes perform read-write operations on the file systems, as per client
request.
• They also perform operations such as block creation, deletion, and
replication according to the instructions of the namenode.
Block
• Generally, user data is stored in the files of HDFS.
• A file in the file system is divided into one or more segments and stored
on individual datanodes.
• These file segments are called blocks. In other words, the minimum amount
of data that HDFS can read or write is called a block. The default block
size is 64 MB, but it can be increased as needed by changing the HDFS
configuration.
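To make the block division concrete, a small sketch (assuming the 64 MB default above and a hypothetical 200 MB file; the variable names are illustrative, not part of any Hadoop API):

```python
import math

# How many 64 MB blocks does a hypothetical 200 MB file occupy?
BLOCK_SIZE = 64 * 1024 * 1024   # default HDFS block size (64 MB)
file_size = 200 * 1024 * 1024   # hypothetical 200 MB file

# The file is cut into full-size blocks plus one final, possibly
# smaller, block holding the remainder.
num_blocks = math.ceil(file_size / BLOCK_SIZE)
print(num_blocks)  # 4: three full 64 MB blocks plus one 8 MB block
```

Note that the last block occupies only as much physical storage as the remaining 8 MB of data, not a full 64 MB.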
Goals of HDFS
Huge datasets :
• HDFS should scale to hundreds of nodes per cluster to manage applications
having huge datasets.
Goals of HDFS
Hardware at data :
• A requested task can be done efficiently when the computation takes place
near the data. Especially where huge datasets are involved, this reduces
network traffic and increases throughput.
HDFS Operations
Starting HDFS
• Initially you have to format the configured HDFS file system, open the
namenode (HDFS server), and execute the following command.
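The command itself is missing from the slide; assuming a Hadoop 1.x installation (matching the 64 MB default block size used earlier in this deck) with the Hadoop binaries on the PATH, it would be:

```shell
$ hadoop namenode -format
```

On Hadoop 2.x and later the equivalent command is `hdfs namenode -format`.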
HDFS Operations
• After formatting the HDFS, start the distributed file system. The following
command will start the namenode as well as the datanodes as a cluster.
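The start command referenced on the slide is not shown; assuming a standard Hadoop installation with its scripts on the PATH, it would be:

```shell
$ start-dfs.sh
```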
HDFS Operations
Step 1
• You have to create an input directory.
Step 2
• Transfer and store a data file from the local system to the Hadoop file
system using the put command.
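The commands for these two steps are not shown on the slide; a minimal sketch, assuming a hypothetical HDFS directory `/user/input` and a local file `file.txt`:

```shell
$ hadoop fs -mkdir /user/input         # Step 1: create an input directory
$ hadoop fs -put file.txt /user/input  # Step 2: copy a local file into HDFS
```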
HDFS Operations
Step 3
• You can verify the file using ls command.
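The verification command is missing from the slide; reusing the hypothetical `/user/input` directory from the previous steps:

```shell
$ hadoop fs -ls /user/input
```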
HDFS Operations
Step 1
• Initially, view the data from HDFS using the cat command.
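The command is not shown on the slide; a sketch using the hypothetical file stored in the earlier steps:

```shell
$ hadoop fs -cat /user/input/file.txt
```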
HDFS Operations
Step 2
• Get the file from HDFS to the local file system using the get command.
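The command is not shown on the slide; a sketch with hypothetical source and destination paths:

```shell
$ hadoop fs -get /user/input/file.txt /home/user/
```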
Big Data
• Big data is a collection of large datasets that cannot be processed using
traditional computing techniques.
• Big data is not merely data; rather, it has become a complete subject, which
involves various tools, techniques and frameworks.
Big Data
• Due to the advent of new technologies, devices, and communication means
like social networking sites, the amount of data produced by mankind is
growing rapidly every year.
• The amount of data produced by us from the beginning of time till 2003 was
5 billion gigabytes.
• The same amount was created every two days in 2011, and every ten minutes
in 2013.
• This rate is still growing enormously. Though all this information is
meaningful and can be useful when processed, much of it is being neglected.
Big Data
• Big data involves the data produced by different devices and applications.
• Given below are some of the fields that come under the umbrella of big data.
Black Box Data :
It is a component of helicopters, airplanes, jets, etc. It captures the voices
of the flight crew, recordings of microphones and earphones, and the
performance information of the aircraft.
Big Data
Thus big data includes huge volume, high velocity, and an extensible
variety of data. The data in it will be of three types:
– Structured data : relational data.
– Semi-structured data : XML data.
– Unstructured data : Word, PDF, text, media logs.
Benefits of Big Data
• Using the information kept in social networks like Facebook, marketing
agencies are learning about the response to their campaigns, promotions,
and other advertising media.
• Using information in social media such as the preferences and product
perception of their consumers, product companies and retail organizations
are planning their production.
• Using the data regarding the previous medical history of patients, hospitals
are providing better and quicker service.
Big Data Technologies
• Big data technologies are important in providing more accurate analysis,
which may lead to more concrete decision-making, resulting in greater
operational efficiencies, cost reductions, and reduced risks for the business.
• To harness the power of big data, you require an infrastructure that
can manage and process huge volumes of structured and unstructured data
in real time and can protect data privacy and security.
Big Data Technologies
• There are various technologies in the market from different vendors including
Amazon, IBM, Microsoft, etc., to handle big data.
• While looking into the technologies that handle big data, we examine the
following two classes of technology:
– Operational Big Data
– Analytical Big Data
Operational Big Data
• These include systems like MongoDB that provide operational capabilities for
real-time, interactive workloads where data is primarily captured and stored.
• NoSQL big data systems are designed to take advantage of new cloud
computing architectures that have emerged over the past decade, allowing
massive computations to be run inexpensively and efficiently.
Operational Big Data
• This makes operational big data workloads much easier to manage, cheaper,
and faster to implement.
• Some NoSQL systems can provide insights into patterns and trends based
on real-time data with minimal coding and without the need for data
scientists and additional infrastructure.
Analytical Big Data
• These include systems like Massively Parallel Processing (MPP) database
systems and MapReduce that provide analytical capabilities for retrospective
and complex analysis that may touch most or all of the data.
Big Data Solutions
Traditional Approach
• In this approach, an enterprise will have a computer to store and process
big data.
• Here data is stored in an RDBMS like Oracle Database, MS SQL Server
or DB2, and sophisticated software can be written to interact with the
database, process the required data and present it to the users for analysis
purposes.
Big Data Solutions
• This approach works well where we have a smaller volume of data that can be
accommodated by standard database servers, or up to the limit of the
processor that is processing the data.
• But when it comes to dealing with huge amounts of data, it is really a
tedious task to process such data through a traditional database server.
Big Data Solutions
Google’s Solution
• Google solved this problem using an algorithm called MapReduce.
• This algorithm divides the task into small parts, assigns those parts to
many computers connected over the network, and collects the results to form
the final result dataset.
Big Data Solutions
• Doug Cutting, Mike Cafarella and their team took the solution provided by
Google and started an open source project called Hadoop in 2005; Doug
named it after his son's toy elephant.
• Now Apache Hadoop is a registered trademark of the Apache Software
Foundation.
Big Data Solutions
• Hadoop runs applications using the MapReduce algorithm, where the data is
processed in parallel on different CPU nodes.
• In short, the Hadoop framework is capable enough to develop applications
that run on clusters of computers and perform complete statistical analysis
for huge amounts of data.
MapReduce
• MapReduce is a programming model and an associated implementation for
processing and generating large data sets with a parallel, distributed
algorithm on a cluster.
• A MapReduce program is composed of a Map() procedure (method) that
performs filtering and sorting (such as sorting students by first name into
queues, one queue for each name) and a Reduce() method that performs a
summary operation (such as counting the number of students in each queue,
yielding name frequencies).
MapReduce
• Processing can occur on data stored either in a filesystem (unstructured) or
in a database (structured).
• MapReduce can take advantage of the locality of data, processing it near the
place it is stored in order to reduce the distance over which it must be
transmitted.
MapReduce
"Map" step:
Each worker node applies the "map()" function to its local data and writes the
output to temporary storage. A master node ensures that only one copy of
redundant input data is processed.
"Shuffle" step:
Worker nodes redistribute data based on the output keys (produced by the
"map()" function), such that all data belonging to one key is located on the
same worker node.
MapReduce
"Reduce" step:
Worker nodes now process each group of output data, per key, in parallel.
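The three steps above can be sketched as a single-process simulation. The word-count task, function names, and data below are illustrative only, not part of Hadoop's actual API:

```python
from collections import defaultdict

def map_step(document):
    # "Map" step: emit a (word, 1) pair for every word in the local data
    return [(word, 1) for word in document.split()]

def shuffle_step(mapped):
    # "Shuffle" step: group all values belonging to one key together
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_step(key, values):
    # "Reduce" step: summarize each key's group (here, a count)
    return (key, sum(values))

# Two "worker nodes", each holding one document of local data
docs = ["big data big cluster", "big data"]
mapped = [pair for d in docs for pair in map_step(d)]
grouped = shuffle_step(mapped)
result = dict(reduce_step(k, v) for k, v in grouped.items())
print(result)  # {'big': 3, 'data': 2, 'cluster': 1}
```

In a real cluster each list comprehension above would run on a different worker node, and the shuffle would move data across the network so that all pairs sharing a key land on the same reducer.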
MapReduce
• MapReduce allows for distributed processing of the map and reduction
operations.
• Provided that each mapping operation is independent of the others, all maps
can be performed in parallel – though in practice this is limited by the number
of independent data sources and/or the number of CPUs near each source.
• Similarly, a set of 'reducers' can perform the reduction phase, provided that
all outputs of the map operation that share the same key are presented to
the same reducer at the same time, or that the reduction function is
associative.
MapReduce
Another way to look at MapReduce is as a 5-step parallel and distributed
computation:
• Prepare the Map() input – the "MapReduce system" designates Map
processors, assigns the input key value K1 that each processor would work
on, and provides that processor with all the input data associated with that
key value.
• Run the user-provided Map() code – Map() is run exactly once for
each K1 key value, generating output organized by key values K2.
MapReduce
• "Shuffle" the Map output to the Reduce processors – the MapReduce system
designates Reduce processors, assigns the K2 key value each processor
should work on, and provides that processor with all the Map-generated data
associated with that key value.
• Run the user-provided Reduce() code – Reduce() is run exactly once for
each K2 key value produced by the Map step.
• Produce the final output – the MapReduce system collects all the Reduce
output and sorts it by K2 to produce the final outcome.
MapReduce
As an example, imagine that for a database of 1.1 billion people, one would
like to compute the average number of social contacts a person has according
to age. In SQL, such a query could be expressed as:
SELECT age, AVG(contacts)
FROM social.person
GROUP BY age
ORDER BY age
MapReduce
• Using MapReduce, the K1 key values could be the integers 1 through 1100,
each representing a batch of 1 million records; the K2 key value could be a
person's age in years; and this computation could be achieved using the
following functions.
MapReduce
function Map is
    input: integer K1 between 1 and 1100, representing a batch of 1 million social.person records
    for each social.person record in the K1 batch do
        let Y be the person's age
        let N be the number of contacts the person has
        produce one output record (Y, (N, 1))
    repeat
end function

function Reduce is
    input: age (in years) Y
    for each input record (Y, (N, C)) do
        Accumulate in S the sum of N*C
        Accumulate in Cnew the sum of C
    repeat
    let A be S/Cnew
    produce one output record (Y, (A, Cnew))
end function
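A runnable rendering of this pseudocode, as a sketch only: the small in-memory batches stand in for the 1-million-record batches, each "person" is reduced to an (age, contacts) tuple, and names like `map_batch` are illustrative:

```python
from collections import defaultdict

def map_batch(batch):
    # Map: emit (Y, (N, 1)) per person - age, (contacts, count)
    return [(age, (contacts, 1)) for age, contacts in batch]

def reduce_age(y, records):
    # Reduce: combine partial (N, C) pairs into a weighted average
    s = sum(n * c for n, c in records)       # Accumulate N*C in S
    cnew = sum(c for _, c in records)        # Accumulate C in Cnew
    return (y, (s / cnew, cnew))             # (Y, (A, Cnew))

# Three tiny "batches" of (age, contacts) records, matching the
# worked example later in the deck
batches = [[(10, 9), (10, 9), (10, 9)], [(10, 9), (10, 9)], [(10, 10)]]
mapped = [rec for b in batches for rec in map_batch(b)]

# Shuffle: group all (N, C) pairs by age Y
grouped = defaultdict(list)
for y, nc in mapped:
    grouped[y].append(nc)

result = [reduce_age(y, recs) for y, recs in grouped.items()]
print(result)  # [(10, (9.166..., 6))] - average 55/6 over 6 records
```

Carrying the count `Cnew` alongside the average is what keeps repeated reductions correct, as the worked example that follows demonstrates.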
MapReduce
• The MapReduce system would line up the 1100 Map processors, and would
provide each with its corresponding 1 million input records.
• The Map step would produce 1.1 billion (Y, (N, 1)) records, with Y values
ranging between, say, 8 and 103.
• The MapReduce system would then line up the 96 Reduce processors by
performing a shuffle operation on the key/value pairs (since we need one
average per age), and provide each with its millions of corresponding
input records.
MapReduce
• The Reduce step would result in the much reduced set of only 96 output
records (Y, A), which would be put in the final result file, sorted by Y.
• The count info in the record is important if the reduction is applied more
than once.
• If we did not add the count of the records, the computed average would be
wrong. For example:
MapReduce
-- map output #1: age, quantity of contacts
10, 9
10, 9
10, 9
-- map output #2: age, quantity of contacts
10, 9
10, 9
-- map output #3: age, quantity of contacts
10, 10
MapReduce
• If we reduce files #1 and #2, we will have a new file with an average of 9
contacts for a 10-year-old person ((9+9+9+9+9)/5):
-- reduce step #1: age, average of contacts
10, 9
• If we then reduce this result with file #3, we lose the count of how many
records we have already seen, so we end up with an average of 9.5 contacts
for a 10-year-old person ((9+10)/2), which is wrong.
• The correct answer is
9.166… = 55 / 6 = (9*3 + 9*2 + 10*1)/(3 + 2 + 1).
THANK YOU