Unit - 1 Distributed Systems Adbt 27 Pages
more direct where all the different components can interact directly with other components
through a direct method call.
As shown in the above image, communication between objects happens as method invocations.
These are generally called Remote Procedure Calls (RPC). Some popular examples are Java
RMI, Web Services and REST API calls. This style has the following properties.
This architecture style is less structured.
component = object
connector = RPC or RMI
When decoupling these processes in space, people wanted the components to be anonymous
and replaceable, and the synchronization needed to be asynchronous. This led to
Data Centered Architectures and Event Based Architectures.
Data Centered Architecture
As the title suggests, this architecture is centered on data: the primary
communication happens via a central data repository.
This common repository can be either active or passive.
This is more like a producer consumer problem.
The producers produce items to a common data store, and the consumers can request
data from it.
This common repository could even be a simple database. The idea is that the
communication between objects happens through this shared common storage.
This supports different components (or objects) by providing a persistent storage space
for those components (such as a MySQL database).
All the information related to the nodes in the system is stored in this persistent
storage. In event-based architectures, data is only sent and received by those
components that have already subscribed.
Some popular examples are distributed file systems, producer consumer, and web based data
services.

notified telling that such an event has occurred. So, if anyone is interested, that node
can pull the event from the bus and use it. Sometimes these events could be data, or
even URLs to resources. So the receiver can access whatever information is given in
the event and process it accordingly.
These events occasionally carry data. An advantage of this architectural style is that
components are loosely coupled, so it is easy to add, remove and modify components in
the system.
One major advantage is that these heterogeneous components can contact the bus
through any communication protocol. An ESB, or a specific bus, has the capability
to handle any type of incoming request and process it accordingly.
This architectural style is based on the publisher-subscriber architecture. Between the nodes
there is no direct communication or coordination. Instead, objects which are subscribed to the
service communicate through the event bus.
The event based architecture supports several communication styles.
Publisher-subscriber
Broadcast
Point-to-Point
The major advantage of this architecture is that the components are decoupled in space -
loosely coupled.
4) System Level Architecture
The two major system level architectures that we use today are Client-server and Peer-to-peer
(P2P). We use these two kinds of services in our day to day lives, but the difference
between these two is often misinterpreted.
Client Server Architecture
The client server architecture has two major components.
The client and
The server.
The Server is where all the processing, computing and data handling happens,
whereas the Client is where the user can access the services and resources given by the
Server (Remote Server).
The clients can make requests to the Server, and the Server will respond accordingly.
Generally, there is only one server that handles the remote side. But to be on the safe
side, we do use multiple servers with load balancing techniques.
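The event-based (publisher-subscriber) style described earlier in this section can be made concrete with a small sketch. This is only an in-process toy under stated assumptions: the class name EventBus, the topic strings and the event dictionaries are all hypothetical, and a real ESB would add networking, queuing and protocol adaptation. What it shows is the loose coupling: publishers and subscribers only know the bus, never each other.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Toy in-process event bus (hypothetical name, not a real ESB product)."""

    def __init__(self) -> None:
        # topic name -> handlers that subscribed to that topic
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver the event to every subscriber of the topic; return how many were notified."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = EventBus()
received = []

# Two independent subscribers; neither knows who publishes the event.
bus.subscribe("order.created", received.append)
bus.subscribe("order.created", lambda e: received.append({"audit": e["id"]}))

notified = bus.publish("order.created", {"id": 42})   # both subscribers are called
ignored = bus.publish("order.cancelled", {"id": 42})  # nobody subscribed to this topic
```

Adding or removing a subscriber does not require touching the publisher, which is the "easy to add, remove and modify components" property the notes mention.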
As one common design feature, the Client Server architecture has a centralized security
database. This database contains security details like credentials and access details. Users
can't log in to a server without the security credentials. So, it makes this architecture a bit
more stable and secure than Peer to Peer. The stability comes from the security database, which can
allow resource usage in a much more meaningful way. But on the other hand, the system
might get slow, as the server can only handle a limited amount of workload at a given time.
Advantages:
Easier to Build and Maintain
Better Security
Stable
Disadvantages:
Single point of failure
Less scalable
Peer to Peer (P2P)
The general idea behind peer to peer is that there is no central control in a distributed
system. The basic idea is that each node can either be a client or a server at a given time. If
the node is requesting something, it can be known as a client, and if some node is providing
something, it can be known as a server. In general, each node is referred to as a Peer.
In this network, any new node has to first join the network. After joining, it can either
request a service or provide a service. The initiation phase of a node (joining of a node) can
vary according to the implementation of the network. There are two ways in which a new node can
get to know what other nodes are providing.
Centralized Lookup Server - The new node has to register with the centralized lookup
server and mention the services it will be providing on the network. So, whenever
you want to have a service, you simply have to contact the centralized lookup server
and it will direct you to the relevant service provider.
Decentralized System - A node desiring specific services must broadcast and ask
every other node in the network, so that whoever is providing the service will respond.
5) A Comparison between Client Server and Peer to Peer Architectures
6) Middleware in Distributed Applications
If we look at distributed systems today, they lack uniformity and consistency. Various
heterogeneous devices have taken over the world, and distributed systems cater to all these
devices in a common way. One way distributed systems can achieve uniformity is through a
common layer to support the underlying hardware and operating systems. This common layer
is known as middleware. It provides services beyond what is already provided by
operating systems, to enable various features and components of a distributed system to
enhance its functionality. This layer provides certain data structures and operations
that allow processes and users on far-flung machines to inter-operate and work together in a
consistent way. The image given below depicts the usage of a middleware to inter-connect
various kinds of nodes together.
7) Centralized vs Decentralized Architectures
The two main structures that we see within distributed system overlays are Centralized and
Decentralized architectures. The centralized architecture can be explained by a simple
client-server architecture where the server acts as a central unit. This can also be considered as
a centralized lookup table with the following characteristics.
Low overhead
Single point of failure
Easy to Track
Additional Overhead.
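The difference between a centralized lookup server and a decentralized broadcast, as described for peer discovery in this section, can be sketched as follows. Everything here is an illustrative assumption: the class LookupServer, the function broadcast_lookup, the peer addresses and the service names are made up, and the "messages sent" count is a simplification that just counts the peers contacted.

```python
from typing import Dict, List, Tuple


class LookupServer:
    """Centralized lookup: one registry that maps a service name to provider addresses."""

    def __init__(self) -> None:
        self._providers: Dict[str, List[str]] = {}

    def register(self, address: str, services: List[str]) -> None:
        # A joining node announces which services it will provide.
        for service in services:
            self._providers.setdefault(service, []).append(address)

    def lookup(self, service: str) -> List[str]:
        # One request to the server is enough to find the providers.
        return list(self._providers.get(service, []))


def broadcast_lookup(peers: Dict[str, List[str]], service: str) -> Tuple[List[str], int]:
    """Decentralized lookup: ask every peer; return (providers, number of peers asked)."""
    providers = [addr for addr, services in peers.items() if service in services]
    return providers, len(peers)


# Hypothetical peers and the services each one offers.
peers = {
    "10.0.0.1": ["files"],
    "10.0.0.2": ["files", "print"],
    "10.0.0.3": ["chat"],
}

server = LookupServer()
for address, services in peers.items():
    server.register(address, services)

central = server.lookup("print")                       # single query to the registry
flooded, messages = broadcast_lookup(peers, "print")   # every peer is contacted
```

Both approaches find the same provider, but the broadcast contacts all peers, while the registry answers in one step and stops working entirely if the server is down.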
When it comes to distributed systems, we are more interested in studying the overlay
and unstructured network topologies that we can see today. In general, the peer to peer
systems that we see today can be separated into three unique sections.
Structured P2P: nodes are organized following a specific distributed data structure
Unstructured P2P: nodes have randomly selected neighbors
Hybrid P2P: some nodes are appointed special functions in a well-organized fashion
Structured P2P Architecture
The meaning of the word structured is that the system already has a predefined structure that
other nodes will follow. Every structured network inherently suffers from poor scalability, due
to the need for structure maintenance. In general, the nodes in a structured overlay network
are formed in a logical ring, with nodes being connected to this ring. In this ring, certain
nodes are responsible for certain services.
A common approach that can be used to tackle the coordination between nodes is to use
distributed hash tables (DHTs). A traditional hash function converts a unique key into a hash
value that will represent an object in the network. The hash function value is used to insert an
object in the hash table and to retrieve it.
In a DHT, each key is assigned a unique hash, where the random hash value needs to be drawn from a
very large address space in order to ensure uniqueness. A mapping function is used to
assign objects to nodes based on the hash function value. A lookup based on the hash function
value returns the network address of the node that stores the requested object.
Hash Function: Takes a key and produces a unique hash value
Mapping Function: Map the hash value to a specific node in the system
Lookup table: Return the network address of the node represented by the unique hash
value.
Unstructured P2P Systems
There is no specific structure in these systems, hence the name "unstructured networks". Due
to this reason, the scalability of the unstructured p2p systems is very high. These systems rely
on randomized algorithms for constructing an overlay network. Unlike in structured p2p systems,
there is no specific path for a certain node. It's generally random, where every unstructured
system tries to maintain a random path. Due to this reason, the search for a certain file or node
is never guaranteed to succeed in unstructured systems.
The basic principle is that each node is required to randomly select another node, and contact
it.
Let each peer maintain a partial view of the network, consisting of n other nodes
Each node P periodically selects a node Q from its partial view
P and Q exchange information and exchange members from their respective partial
views
Hybrid P2P Systems
Hybrid systems are often based on both client server architectures and p2p networks. A
famous example is BitTorrent, which we use every day. The torrent search engines provide a
client server architecture, where the trackers provide a structured p2p overlay. The rest of the
nodes, which are also known as leechers and seeders, become the unstructured overlay of the
network, allowing it to scale itself as needed and further.
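The three DHT pieces named in this section (hash function, mapping function, lookup table) can be put together in a short sketch. It is a simplification under several assumptions that are not in the notes: SHA-1 truncated to a 32-bit ring stands in for the hash function, the mapping rule is "first node at or after the key's hash on the ring, wrapping around", and the class name TinyDHT and the node addresses are hypothetical. Real hash values can collide, so "unique" holds only with high probability.

```python
import hashlib
from bisect import bisect_left
from typing import Dict, List

RING_BITS = 32  # assumed size of the identifier space (2**32 positions on the ring)


def hash_key(key: str) -> int:
    """Hash function: turn a key (or a node address) into a position on the ring."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class TinyDHT:
    def __init__(self, addresses: List[str]) -> None:
        # Lookup table: ring position of a node -> its network address.
        self.lookup_table: Dict[int, str] = {hash_key(a): a for a in addresses}
        self.node_ids: List[int] = sorted(self.lookup_table)

    def map_to_node(self, key_hash: int) -> int:
        """Mapping function: first node id >= key_hash, wrapping around the ring."""
        i = bisect_left(self.node_ids, key_hash)
        return self.node_ids[i % len(self.node_ids)]

    def locate(self, key: str) -> str:
        """Return the address of the node responsible for the key."""
        return self.lookup_table[self.map_to_node(hash_key(key))]


# Hypothetical node addresses.
dht = TinyDHT(["10.0.0.1:4000", "10.0.0.2:4000", "10.0.0.3:4000"])
owner = dht.locate("movie.mkv")
```

Because every node applies the same hash and mapping functions, any node can compute which peer is responsible for a key without asking a central server.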
DISTRIBUTED DATABASE CONCEPTS
WHAT IS DISTRIBUTED DATABASE?
A Distributed database is defined as a logically related collection of shared data
that is physically distributed over a computer network on different sites. The Distributed
DBMS is defined as the software that allows for the management of the distributed database
and makes the distributed data available to the users.
A database is an ordered collection of related data that is built for a specific purpose. A
database may be organized as a collection of multiple tables, where a table represents a real
world element or entity. Each table has several different fields that represent the
characteristic features of the entity.
For example, a company database may include tables for projects, employees, departments,
products and financial records. The fields in the Employee table may be Name, Company_Id,
Date_of_Joining, and so forth.
A database management system is a collection of programs that enables creation and
maintenance of a database. DBMS is available as a software package that facilitates
definition, construction, manipulation and sharing of data in a database. Definition of a
database includes description of the structure of a database. Construction of a database
involves actual storing of the data in any storage medium. Manipulation refers to
retrieving information from the database, updating the database and generating reports.
Sharing of data allows the data to be accessed by different users or programs.
Examples of DBMS Application Areas
Automatic Teller Machines
Train Reservation System
Employee Management System
Student Information System
Examples of DBMS Packages
MySQL
Oracle
SQL Server
dBASE
FoxPro
PostgreSQL, etc.
Database Schemas
A database schema is a description of the database which is specified during database design
and subject to infrequent alterations. It defines the organization of the data, the relationships
among them, and the constraints associated with them.
Databases are often represented through the three-schema architecture or ANSI/SPARC
architecture. The goal of this architecture is to separate the user application from the
physical database. The three levels are −
Internal Level having Internal Schema − It describes the physical structure, details
of internal storage and access paths for the database.
Conceptual Level having Conceptual Schema − It describes the structure of the
whole database while hiding the details of physical storage of data. This illustrates the
entities, attributes with their data types and constraints, user operations and
relationships.
External or View Level having External Schemas or Views − It describes the
portion of a database relevant to a particular user or a group of users while hiding the
rest of the database.
Types of DBMS
2.Network DBMS
A network DBMS is one where the relationships among data in the database are of type
many-to-many in the form of a network. The structure is generally complicated due to the existence
of numerous many-to-many relationships. Network DBMS is modelled using “graph” data
structure.
3.Relational DBMS
In relational databases, the database is represented in the form of relations. Each relation
models an entity and is represented as a table of values. In the relation or table, a row is called
a tuple and denotes a single record. A column is called a field or an attribute and denotes a
characteristic property of the entity. RDBMS is the most popular database management
system.
For example − A Student Relation −
Distributed DBMS
A distributed database is a set of interconnected databases that is distributed over the
computer network or internet. A Distributed Database Management System (DDBMS)
manages the distributed database and provides mechanisms so as to make the databases
transparent to the users. In these systems, data is intentionally distributed among multiple
nodes so that all computing resources of the organization can be optimally used.
Operations on DBMS
The four basic operations on a database are Create, Retrieve, Update and Delete.
CREATE database structure and populate it with data − Creation of a database
relation involves specifying the data structures, data types and the constraints of the
data to be stored.
Example − SQL command to create a student table −
CREATE TABLE STUDENT (
    ROLL INTEGER PRIMARY KEY,
    NAME VARCHAR2(25),
    YEAR INTEGER,
    -- wide enough for values such as 'ELECTRONICS AND COMMUNICATIONS' used below
    STREAM VARCHAR2(30)
);
Once the data format is defined, the actual data is stored in accordance with the
format in some storage medium.
Example − SQL command to insert a single tuple into the student table −
INSERT INTO STUDENT ( ROLL, NAME, YEAR, STREAM)
VALUES ( 1, 'ANKIT JHA', 1, 'COMPUTER SCIENCE');
RETRIEVE information from the database – Retrieving information generally involves
selecting a subset of a table or displaying data from the table after some computations
have been done. It is done by querying the table.
Example − To retrieve the names of all students of the Computer Science stream, the
following SQL query needs to be executed −
SELECT NAME FROM STUDENT
WHERE STREAM = 'COMPUTER SCIENCE';
UPDATE information stored and modify database structure – Updating a table
involves changing old values in the existing table’s rows with new values.
Example − SQL command to change stream from Electronics to Electronics and
Communications −
UPDATE STUDENT
SET STREAM = 'ELECTRONICS AND COMMUNICATIONS'
WHERE STREAM = 'ELECTRONICS';
Modifying a database means changing the structure of the table. However, modification
of the table is subject to a number of restrictions.
Example − To add a new field or column, say address, to the Student table, we use the
following SQL command −
ALTER TABLE STUDENT
ADD ( ADDRESS VARCHAR2(50) );
DELETE information stored or delete a table as a whole – Deletion of specific
information involves removal of selected rows from the table that satisfy certain
conditions.
Example − To delete all students who are currently in 4th year when they are passing
out, we use the SQL command −
DELETE FROM STUDENT
WHERE YEAR = 4;
Alternatively, the whole table may be removed from the database.
Example − To remove the student table completely, the SQL command used is −
DROP TABLE STUDENT;
A distributed database is a collection of multiple interconnected databases, which are
spread physically across various locations that communicate via a computer network.
Features
Databases in the collection are logically interrelated with each other. Often they
represent a single logical database.
Data is physically stored across multiple sites. Data in each site can be managed by a
DBMS independent of the other sites.
The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
A distributed database is not a loosely connected file system.
A distributed database incorporates transaction processing, but it is not synonymous
with a transaction processing system.
Distributed Database Management System
A distributed database management system (DDBMS) is a centralized software system that
manages a distributed database in a manner as if it were all stored in a single location.
Features
It is used to create, retrieve, update and delete distributed databases.
It synchronizes the database periodically and provides access mechanisms by virtue
of which the distribution becomes transparent to the users.
It ensures that the data modified at any site is universally updated.
It is used in application areas where large volumes of data are processed and accessed
by numerous users simultaneously.
It is designed for heterogeneous database platforms.
It maintains confidentiality and data integrity of the databases.
Factors Encouraging DDBMS
The following factors encourage moving over to DDBMS −
Distributed Nature of Organizational Units − Most organizations in the current
times are subdivided into multiple units that are physically distributed over the globe.
Each unit requires its own set of local data. Thus, the overall database of the
organization becomes distributed.
Need for Sharing of Data − The multiple organizational units often need to
communicate with each other and share their data and resources. This demands
common databases or replicated databases that should be used in a synchronized
manner.
Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and
Online Analytical Processing (OLAP) work upon diversified systems which may have
common data. Distributed database systems aid both kinds of processing by providing
synchronized data.
Database Recovery − One of the common techniques used in DDBMS is replication of
data across different sites. Replication of data automatically helps in data recovery if
the database in any site is damaged. Users can access data from other sites while the
damaged site is being reconstructed. Thus, database failure may become almost
inconspicuous to users.
Support for Multiple Application Software − Most organizations use a variety of
application software each with its specific database support. DDBMS provides a uniform
functionality for using the same data among different platforms.
Advantages of Distributed Databases
Following are the advantages of distributed databases over centralized databases.
Modular Development − If the system needs to be expanded to new locations or new units,
in centralized database systems, the action requires substantial efforts and disruption in the
existing functioning. However, in distributed databases, the work simply requires adding new
computers and local data to the new site and finally connecting them to the distributed
system, with no interruption in current functions.
More Reliable − In case of database failures, the total system of centralized databases comes
to a halt. However, in distributed systems, when a component fails, the functioning of the
system continues, possibly at a reduced performance. Hence DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user requests can be met
from local data itself, thus providing faster response. On the other hand, in centralized
systems, all queries have to pass through the central computer for processing, which increases
the response time.
Lower Communication Cost − In distributed database systems, if data is located locally or
where it is mostly used, then the communication costs for data manipulation can be
minimized. This is not feasible in centralized systems.
Adversities of Distributed Databases
Following are some of the adversities associated with distributed databases.
Need for complex and expensive software − DDBMS demands complex and often
expensive software to provide data transparency and co-ordination across the several
sites.
Processing overhead − Even simple operations may require a large number of
communications and additional calculations to provide uniformity in data across the
sites.
Data integrity − The need for updating data in multiple sites poses problems of data
integrity.
Overheads for improper data distribution − Responsiveness of queries is largely
dependent upon proper data distribution. Improper data distribution often leads to very
slow response to user requests.
Types of Distributed Databases
Distributed databases can be broadly classified into homogeneous and heterogeneous
distributed database environments, each with further sub-divisions, as shown in the following
illustration.
Homogeneous Distributed Databases
In a homogeneous distributed database, all the sites use identical DBMS and operating
systems. Its properties are −
The sites use very similar software.
The sites use identical DBMS or DBMS from the same vendor.
Each site is aware of all other sites and cooperates with other sites to process user
requests.
The database is accessed through a single interface as if it is a single database.
Types of Homogeneous Distributed Database
There are two types of homogeneous distributed database −
Autonomous − Each database is independent and functions on its own. They are
integrated by a controlling application and use message passing to share data updates.
Non-autonomous − Data is distributed across the homogeneous nodes and a central
master DBMS co-ordinates data updates across the sites.
Heterogeneous Distributed Databases
In a heterogeneous distributed database, different sites have different operating systems,
DBMS products and data models. Its properties are −
Different sites use dissimilar schemas and software.
The system may be composed of a variety of DBMSs like relational, network,
hierarchical or object oriented.
Query processing is complex due to dissimilar schemas.
Transaction processing is complex due to dissimilar software.
A site may not be aware of other sites and so there is limited co-operation in processing
user requests.
Types of Heterogeneous Distributed Databases
Federated − The heterogeneous database systems are independent in nature and
integrated together so that they function as a single database system.
Un-federated − The database systems employ a central coordinating module through
which the databases are accessed.
Distributed DBMS Architectures
DDBMS architectures are generally developed depending on three parameters −
Distribution − It states the physical distribution of data across the different sites.
Autonomy − It indicates the distribution of control of the database system and the
degree to which each constituent DBMS can operate independently.
Heterogeneity − It refers to the uniformity or dissimilarity of the data models, system
components and databases.
Architectural Models
Client - Server Architecture for DDBMS
This is a two-level architecture where the functionality is divided into servers and clients. The
server functions primarily encompass data management, query processing, optimization and
transaction management. Client functions include mainly user interface. However, they have
some functions like consistency checking and transaction management.
The two different client - server architectures are −
Single Server Multiple Client
Multiple Server Multiple Client (shown in the following diagram)
A multi-DBMS is an integrated database system formed by a collection of two or more autonomous
database systems.
Multi-DBMS can be expressed through six levels of schemas −
Multi-database View Level − Depicts multiple user views comprising subsets of the
integrated distributed database.
Multi-database Conceptual Level − Depicts the integrated multi-database that
comprises global logical multi-database structure definitions.
Multi-database Internal Level − Depicts the data distribution across different sites
and multi-database to local data mapping.
Local database View Level − Depicts public view of local data.
Local database Conceptual Level − Depicts local data organization at each site.
Local database Internal Level − Depicts physical data organization at each site.
There are two design alternatives for multi-DBMS −
Model with multi-database conceptual level.
Model without multi-database conceptual level.
Distributed databases are used for horizontal scaling, and they are designed to meet the
workload requirements without having to make changes in the database application or
vertically scale a single machine.
A distributed database represents multiple interconnected databases spread out across
several sites connected by a network. Since the databases are all connected, they appear as a
single database to the users.
Distributed databases utilize multiple nodes. They scale horizontally and develop a distributed
system. More nodes in the system provide more computing power, offer greater availability,
and resolve the single point of failure issue.
Different parts of the distributed database are stored in several physical locations, and the
processing requirements are distributed among processors on multiple database nodes.
Homogeneous
Heterogeneous
Homogeneous:
The following diagram shows an example of a homogeneous database:
Distributed database storage is managed in two ways:
Replication
Fragmentation
Replication
In database replication, the systems store copies of data on different sites. If an entire
database is available on multiple sites, it is a fully redundant database.
The advantage of database replication is that it increases data availability on different sites
and allows for parallel query requests to be processed.
However, database replication means that data requires constant updates and
synchronization with other sites to maintain an exact database copy. Any changes made on
one site must be recorded on other sites, or else inconsistencies occur.
Constant updates cause a lot of server overhead and complicate concurrency control, as a lot
of concurrent queries must be checked in all available sites.
Fragmentation
The prerequisite for fragmentation is to make sure that the fragments can later be
reconstructed into the original relation without losing data.
The advantage of fragmentation is that there are no data copies, which prevents data
inconsistency.
There are two types of fragmentation:
Horizontal fragmentation - The relation is fragmented into groups of rows (tuples),
and each group is assigned to one fragment.
Vertical fragmentation - The relation schema is fragmented into smaller schemas,
and each fragment contains a common candidate key to guarantee a lossless join.
Distributed Database Advantages and Disadvantages
Advantages                    Disadvantages
Modular development           Costly software
Reliability                   Large overhead
Lower communication costs     Data integrity
Better response               Improper data distribution
What is a Distributed Transaction?
There are two possible outcomes: 1) all operations successfully complete, or 2) none
of the operations are performed at all due to a failure somewhere in the system. In
the latter case, if some work was completed prior to the failure, that work will be
reversed to ensure no net work was done. This type of operation is in compliance
with the “ACID” (atomicity-consistency-isolation-durability) principles of databases
that ensure data integrity. ACID is most commonly associated with transactions on a
single database server, but distributed transactions extend that guarantee across
multiple databases.
The operation known as a “two-phase commit” (2PC) is a form of a distributed
transaction. “XA transactions” are transactions using the XA protocol, which is one
implementation of a two-phase commit operation.
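The all-or-nothing behaviour of a two-phase commit, mentioned in this section, can be sketched in a few lines. This is a toy in-memory model under explicit assumptions: the names Participant and two_phase_commit, the can_commit flag and the sample data are hypothetical, and there is no networking, timeout handling or crash recovery, all of which a real 2PC or XA implementation must deal with.

```python
from typing import Dict, List


class Participant:
    """One site taking part in the transaction (hypothetical toy model)."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit          # stands in for conflicts, timeouts, etc.
        self.committed: Dict[str, int] = {}   # durable state of this site
        self._pending: Dict[str, int] = {}    # staged but not yet applied changes

    def prepare(self, changes: Dict[str, int]) -> bool:
        """Phase 1: stage the changes and vote yes (True) or no (False)."""
        if not self.can_commit:
            return False
        self._pending = dict(changes)
        return True

    def commit(self) -> None:
        """Phase 2, commit branch: apply the staged changes."""
        self.committed.update(self._pending)
        self._pending = {}

    def abort(self) -> None:
        """Phase 2, abort branch: discard the staged changes."""
        self._pending = {}


def two_phase_commit(participants: List[Participant], changes: Dict[str, int]) -> bool:
    # Phase 1: every participant is asked to prepare and vote.
    votes = [p.prepare(changes) for p in participants]
    # Phase 2: commit only if all voted yes, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


a, b = Participant("site-a"), Participant("site-b")
ok = two_phase_commit([a, b], {"balance": 100})       # both vote yes

c = Participant("site-c", can_commit=False)
failed = two_phase_commit([a, c], {"balance": 0})     # site-c votes no, so nothing changes
```

After the second call, site-a still holds the value from the first transaction, which is the "no net work was done" outcome described above.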
A distributed transaction spans multiple databases and guarantees data integrity.
How Do Distributed Transactions Work?
Distributed transactions have the same processing completion requirements as
regular database transactions, but they must be managed across multiple resources,
making them more challenging to implement for database developers. The multiple
resources add more points of failure, such as the separate software systems that run
the resources (e.g., the database software), the extra hardware servers, and network
failures. This makes distributed transactions susceptible to failures, which is why
safeguards must be put in place to retain data integrity.
For a distributed transaction to occur, transaction managers coordinate the
resources (either multiple databases or multiple nodes of a single database). The
transaction manager can be one of the data repositories that will be updated as part
of the transaction, or it can be a completely independent separate resource that is
only responsible for coordination. The transaction manager decides whether to
commit a successful transaction or rollback an unsuccessful transaction, the latter of
which leaves the database unchanged.
First, an application requests the distributed transaction from the transaction
manager. The transaction manager then branches to each resource, which will have
its own “resource manager” to help it participate in distributed transactions.
Distributed transactions are often done in two phases to safeguard against partial
updates that might occur when a failure is encountered. The first phase involves
acknowledging intent to commit, or a “prepare-to-commit” phase. After all resources
acknowledge, they are then asked to run a final commit, and then the transaction is
completed.
COMMIT PROTOCOLS
In a local database system, for committing a transaction, the transaction manager only has to
convey the decision to commit to the recovery manager. However, in a distributed system, the
transaction manager should convey the decision to commit to all the servers in the various
sites where the transaction is being executed and uniformly enforce the decision. When
processing is complete at each site, it reaches the partially committed transaction state and
waits for all other transactions to reach their partially committed states. When it receives the
message that all the sites are ready to commit, it starts to commit. In a distributed system,
either all sites commit or none of them does.
The different distributed commit protocols are −
One-phase commit
Two-phase commit
Three-phase commit
Distributed one-phase commit is the simplest commit protocol. Let us consider that there is a
controlling site and a number of slave sites where the transaction is being executed. The steps
in distributed commit are −
After each slave has locally completed its transaction, it sends a “DONE” message to the
controlling site.
The slaves wait for a “Commit” or “Abort” message from the controlling site. This waiting
time is called the window of vulnerability.
When the controlling site receives a “DONE” message from each slave, it makes a
decision to commit or abort. This is called the commit point. Then, it sends this message
to all the slaves.
On receiving this message, a slave either commits or aborts and then sends an
acknowledgement message to the controlling site.
Distributed Two-phase Commit
Distributed two-phase commit reduces the vulnerability of one-phase commit protocols. The
steps performed in the two phases are as follows −
Phase 1: Prepare Phase
After each slave has locally completed its transaction, it sends a “DONE” message to the
controlling site. When the controlling site has received a “DONE” message from all slaves,
it sends a “Prepare” message to the slaves.
The slaves vote on whether they still want to commit or not. If a slave wants to commit,
it sends a “Ready” message.
A slave that does not want to commit sends a “Not Ready” message. This may happen
when the slave has conflicting concurrent transactions or there is a timeout.
Phase 2: Commit/Abort Phase
After the controlling site has received a “Ready” message from all the slaves −
o The controlling site sends a “Global Commit” message to the slaves.
o The slaves apply the transaction and send a “Commit ACK” message to the
controlling site.
o When the controlling site receives a “Commit ACK” message from all the slaves, it
considers the transaction as committed.
After the controlling site has received the first “Not Ready” message from any slave −
o The controlling site sends a “Global Abort” message to the slaves.
o The slaves abort the transaction and send an “Abort ACK” message to the controlling site.
o When the controlling site receives an “Abort ACK” message from all the slaves, it considers the transaction as aborted.

Distributed Three-phase Commit

The steps in distributed three-phase commit are as follows −

Phase 1: Prepare Phase
The steps are the same as in distributed two-phase commit.

Phase 2: Prepare to Commit Phase
The controlling site issues an “Enter Prepared State” broadcast message.
The slave sites vote “OK” in response.

Phase 3: Commit / Abort Phase
The steps are the same as in two-phase commit, except that the “Commit ACK”/“Abort ACK” message is not required.

CONCURRENCY CONTROL

Concurrency control in a distributed system is achieved by a program called the scheduler. The scheduler orders the operations of transactions in such a way that the resulting log is serializable. There are two types of concurrency control: the locking approach and the non-locking approach.

Locking-based concurrency control systems can use either one-phase or two-phase locking protocols.
1. One-phase Locking Protocol: In this method, each transaction locks an item before use and releases the lock as soon as it has finished using it. This locking method provides for maximum concurrency but does not always enforce serializability.
2. Two-phase Locking Protocol: In this method, all locking operations precede the first lock-release or unlock operation. The transaction comprises two phases. In the first phase, a transaction only acquires the locks it needs and does not release any lock. This is called the expanding or growing phase. In the second phase, the transaction releases the locks and cannot request any new locks. This is called the shrinking phase.
Every schedule of transactions that follow the two-phase locking protocol is guaranteed to be serializable. However, this approach provides low parallelism between two conflicting transactions.

Timestamp Concurrency Control Algorithms: Timestamp-based concurrency control algorithms use a transaction’s timestamp to coordinate concurrent access to a data item to ensure serializability. A timestamp is a unique identifier given by the DBMS to a transaction that represents the transaction’s start time.
These algorithms ensure that transactions commit in the order dictated by their timestamps. An older transaction should commit before a younger transaction, since the older transaction enters the system before the younger one.
Timestamp-based concurrency control techniques generate serializable schedules such that the equivalent serial schedule is arranged in order of the age of the participating transactions.

Optimistic Concurrency Control Algorithm: In systems with low conflict rates, the task of validating every transaction for serializability may lower performance. In these cases, the test for serializability is postponed to just before commit. Since the conflict rate is low, the probability of aborting transactions which are not serializable is also low. This approach is called the optimistic concurrency control technique.
In this approach, a transaction’s life cycle is divided into the following three phases −
Execution Phase − A transaction fetches data items to memory and performs operations upon them.
Validation Phase − A transaction performs checks to ensure that committing its changes to the database passes the serializability test.
Commit Phase − A transaction writes the modified data items in memory back to the disk.
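The three phases of optimistic concurrency control can be illustrated with a minimal Python sketch. The notes do not say how the validation check is carried out, so the version-number comparison used here is an assumption, and the names Store and Transaction are hypothetical. The sketch is single-threaded; a real system would have to make validation and commit atomic.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Committed data (hypothetical): key -> (value, version)."""
    data: dict = field(default_factory=dict)

    def read(self, key):
        # Unknown keys are reported as value None at version 0.
        return self.data.get(key, (None, 0))


class Transaction:
    """One optimistic transaction working on a private copy of the data."""

    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version seen at first read
        self.writes = {}         # key -> new value, buffered until commit

    # Execution phase: read into memory, operate on private buffers.
    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        value, version = self.store.read(key)
        self.read_versions.setdefault(key, version)
        return value

    def write(self, key, value):
        self.writes[key] = value

    # Validation phase (assumed rule): every item read must still be at
    # the version that was seen during execution.
    def validate(self):
        return all(self.store.read(key)[1] == version
                   for key, version in self.read_versions.items())

    # Commit phase: write buffered changes back, or abort on a conflict.
    def commit(self):
        if not self.validate():
            return False  # aborted; the caller may retry
        for key, value in self.writes.items():
            _, version = self.store.read(key)
            self.store.data[key] = (value, version + 1)
        return True


# Two transactions read the same item; only the first to commit succeeds.
store = Store({"x": (10, 1)})
t1, t2 = Transaction(store), Transaction(store)
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 5)
ok1 = t1.commit()
ok2 = t2.commit()
```

Here t2 fails validation because t1 has already raised the version of x, which is the low-probability abort that the optimistic approach accepts. Writes to items that were never read are not validated in this sketch.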