Distributed Catalog Management

FACULTY OF ENGINEERING & TECHNOLOGY (CO EDU.)
3RD SEMESTER
M.TECH. 2023-24
ADVANCED DATABASE MANAGEMENT SYSTEM
Faculty: Dr. Savita Patil

Seminar Topic
DISTRIBUTED CATALOG MANAGEMENT
Presented By: Sandesh Mathpati
USN: SG22ADS014
TOPICS COVERED
• Fragmentation and Replication Intro
• What is DCM
• Why DCM is Used
• Naming Objects
• Catalog Structure
• Distributed Data Independence
Fragmentation and replication
FRAGMENTATION
The process of dividing the database into multiple smaller parts,
or sub-tables, is called fragmentation. The smaller parts are
called fragments and are stored at different locations.
REPLICATION
Data replication means that copies (replicas) of the same data are
kept at multiple locations to improve the availability of data.
The replicas must be kept consistent with one another, so that
users can do their work without interrupting the work of other
users.
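
The two ideas above can be illustrated with a minimal Python sketch. It assumes horizontal fragmentation (splitting by rows), which is only one way to form sub-tables. The sample relation, the `city` fragmentation key, the site names, and the helper functions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample relation; 'city' is an assumed fragmentation key.
EMPLOYEES = [
    {"id": 1, "name": "Asha", "city": "Pune"},
    {"id": 2, "name": "Ravi", "city": "Mumbai"},
    {"id": 3, "name": "Meera", "city": "Pune"},
]


def fragment_by(rows, key):
    """Horizontal fragmentation: split rows into sub-tables by the value of `key`."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[key]].append(row)
    return dict(fragments)


def replicate(fragments, placement):
    """Store a separate copy (replica) of each fragment at every site listed for it."""
    sites = defaultdict(dict)
    for fragment_name, site_list in placement.items():
        for site in site_list:
            # Copy the rows so that each site holds its own replica.
            sites[site][fragment_name] = [dict(row) for row in fragments[fragment_name]]
    return dict(sites)


fragments = fragment_by(EMPLOYEES, "city")
# Assumed placement: the Pune fragment is replicated at two sites.
placement = {"Pune": ["site_A", "site_B"], "Mumbai": ["site_B"]}
sites = replicate(fragments, placement)
```

Here the Pune fragment stays available even if one of its two sites is down, which is the availability benefit described above.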
What is a distributed catalog system?
A catalog is a database that contains information about the objects
present in a database, that is, its metadata. In a distributed database,
the catalog holds the metadata of the distributed database.

Distributed catalog management keeps track of how data is distributed
across the sites. If a relation is fragmented and replicated, the
distributed catalog lets us uniquely identify each replica of each
fragment.
Why Is Distributed Catalog Management Used?

LOCATION TRANSPARENCY
Users and applications interacting with the distributed database
shouldn't be concerned with the physical location of the data. The
catalog provides a layer of abstraction, allowing users to access data
without worrying about where it is stored.

DATA DISTRIBUTION
In a distributed database, data is not stored in a single, centralized
location but is distributed across different nodes or servers. The
catalog helps keep track of where each piece of data is located.

LOAD BALANCING
Distributed catalog management is essential for load balancing. It helps
in distributing the workload across different nodes by providing
information about the current state and usage of each node.

QUERY OPTIMIZATION
The catalog plays a role in query optimization by providing information
about the structure of the database and the distribution of data.
IMPORTANT FACTORS

NAMING OBJECTS
If a relation is fragmented and replicated, we must be able to uniquely
identify each replica of each fragment.

CATALOG STRUCTURE
In a distributed database system, the catalog structure is designed to
manage information about the structure and distribution of data across
multiple nodes or locations.

DISTRIBUTED DATA INDEPENDENCE
Distributed data independence means that users should be able to write
queries without regard to how a relation is fragmented or replicated; it
is the responsibility of the DBMS to compute the relation as needed.
NAMING OBJECTS
Each relation's name has two fields:

• A local name field, which is the name assigned locally at the site where
the relation is created. Two objects at different sites could possibly
have the same local name, but two objects at a given site cannot have
the same local name.

• A birth site field, which identifies the site where the relation was
created, and where information is maintained about all fragments and
replicas of the relation.

These two fields identify a relation uniquely; we call the combination a
global relation name.

To identify a replica, we take the global relation name and add a
replica-id field; we call the combination a global replica name.

If we used a global name server to assign globally unique names, local
autonomy would be compromised; we want (users at) each site to be able to
assign names to local objects without reference to names systemwide.
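
The naming scheme above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names `GlobalRelationName`, `GlobalReplicaName`, and `Site`, the `create_relation` method, and the integer replica ids are hypothetical and do not come from any particular DBMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalRelationName:
    local_name: str   # name assigned locally at the creating site
    birth_site: str   # site where the relation was created


@dataclass(frozen=True)
class GlobalReplicaName:
    relation: GlobalRelationName
    replica_id: int   # assumed to be an integer for this sketch


class Site:
    """A site that assigns local names without consulting any other site."""

    def __init__(self, site_id):
        self.site_id = site_id
        self._local_names = set()

    def create_relation(self, local_name):
        # Local names must be unique only within this site.
        if local_name in self._local_names:
            raise ValueError(f"{local_name!r} already exists at site {self.site_id}")
        self._local_names.add(local_name)
        return GlobalRelationName(local_name, self.site_id)


site_a, site_b = Site("A"), Site("B")
orders_a = site_a.create_relation("orders")
orders_b = site_b.create_relation("orders")  # same local name, different site
replica_1 = GlobalReplicaName(orders_a, 1)
replica_2 = GlobalReplicaName(orders_a, 2)
```

Both sites can use the local name `orders` because the birth site field keeps the two global relation names distinct, and no global name server is involved.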
CATALOG STRUCTURE

Centralized Catalog: The whole catalog is stored at a single site. This
makes it easy to use and understand. However, reliability, availability,
autonomy, and the distribution of processing load all suffer. For read
operations from noncentral sites, the required catalog data is locked at
the central site and then sent to the requesting site.

Fully Replicated Catalogs: Each site in this scheme has an identical copy
of the whole catalog. Queries can be answered locally, so reads are
faster. All changes must be propagated to all sites.

Partially Replicated Catalog: The centralized and fully replicated schemes
limit site autonomy, since they must maintain a consistent global view of
the catalog. In the partially replicated scheme, each site keeps a
complete catalog of the data that is stored locally at that site.
Additionally, each site is allowed to cache catalog entries obtained from
other sites. These cached copies might not always be the most recent
versions, though.
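
The partially replicated scheme can be sketched in Python as below. This is a minimal illustration, not how any specific system implements it: the `CatalogSite` class, its `register` and `lookup` methods, the version counter, and the `refresh` flag are all assumptions introduced for the example.

```python
class CatalogSite:
    """One site in an assumed partially replicated catalog."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.local = {}  # authoritative entries: name -> (version, replica sites)
        self.cache = {}  # cached entries from other sites; may be stale

    def register(self, name, replica_sites):
        """Create or update the authoritative entry for a locally stored relation."""
        version = self.local.get(name, (0, None))[0] + 1
        self.local[name] = (version, list(replica_sites))

    def lookup(self, name, owner, refresh=False):
        """Return the catalog entry for `name`, whose authoritative copy is at `owner`."""
        if owner is self:
            return self.local[name]
        key = (owner.site_id, name)
        if refresh or key not in self.cache:
            # Fetch from the owning site and remember the result locally.
            self.cache[key] = owner.local[name]
        return self.cache[key]


site_a, site_b = CatalogSite("A"), CatalogSite("B")
site_a.register("orders", ["A"])
first = site_b.lookup("orders", site_a)                # fetched, then cached at B
site_a.register("orders", ["A", "C"])                  # owner updates its entry
stale = site_b.lookup("orders", site_a)                # B still sees the old entry
fresh = site_b.lookup("orders", site_a, refresh=True)  # B refetches from A
```

The `stale` result shows the trade-off noted above: cached entries keep lookups local but can lag behind the owning site until they are refreshed.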
DISTRIBUTED DATA INDEPENDENCE

This property implies that users should not have to specify the full name for the
data objects accessed while evaluating a query.

Let us see how users can be enabled to access relations without considering how
the relations are distributed.

Local name of relation =
user name (added by the DBMS) + user-defined relation name

When a user writes a program or SQL statement that refers to a relation, he or she
simply uses the relation name.

Global relation name =
local name + user's site-id (birth site)

By looking up the global relation name, either in the local catalog if it is cached
there or in the catalog at the birth site, the DBMS can locate replicas of the relation.
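
The name resolution described above might look like the following Python sketch. The function names, the `.` separator between user name and relation name, the tuple representation of a global relation name, and the dictionary-based catalogs are all assumptions made for illustration.

```python
def local_name(user, relation):
    """Local name = user name (added by the DBMS) + user-defined relation name."""
    return f"{user}.{relation}"  # the '.' separator is an assumption


def global_relation_name(user, relation, site_id):
    """Global relation name = local name + the user's site-id (the birth site)."""
    return (local_name(user, relation), site_id)


def locate_replicas(global_name, local_catalog, catalogs_by_site):
    """Look in the local catalog first, then in the catalog at the birth site."""
    if global_name in local_catalog:
        return local_catalog[global_name]
    _, birth_site = global_name
    replicas = catalogs_by_site[birth_site][global_name]
    local_catalog[global_name] = replicas  # cache the entry for later lookups
    return replicas


# Hypothetical setup: user 'sandesh' at site 'A' created the relation 'orders'.
name = global_relation_name("sandesh", "orders", "A")
catalogs_by_site = {"A": {name: ["A", "C"]}, "B": {}}

# A query issued at site B mentions only 'orders'; the DBMS expands the name.
site_b_catalog = catalogs_by_site["B"]
replicas = locate_replicas(name, site_b_catalog, catalogs_by_site)
```

After the first lookup, site B holds a cached entry, so later queries on the same relation can be resolved without contacting the birth site.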
