DDBMS Pastpaper Solve by M.noman Tariq
• Multi-database View Level − Depicts multiple user views, each comprising a subset of the integrated
distributed database.
• Multi-database Internal Level − Depicts the data distribution across different sites and the
mapping from the multi-database to local data.
• Local database Conceptual Level − Depicts local data organization at each site.
• Local database Internal Level − Depicts physical data organization at each site.
1. Unstructured P2P:
In unstructured P2P networks, nodes connect randomly to other nodes, making them suitable
for applications like file-sharing (e.g., BitTorrent). Search queries may require flooding the
network, making it less efficient for structured searches.
2. Structured P2P:
Structured P2P networks impose a specific topology or structure, such as a Distributed Hash
Table (DHT). This enables efficient data retrieval and indexing, commonly used in distributed
databases or content delivery networks (a minimal sketch of the DHT idea follows this list).
3. Hybrid P2P:
Hybrid P2P networks combine elements of both structured and unstructured models to
leverage the strengths of each. They aim to strike a balance between efficient search and
scalability.
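
To make the structured case concrete, here is a minimal Python sketch of DHT-style key placement on a consistent-hash ring; the node names, the key, and the ring size are invented for illustration, and real DHTs such as Chord add routing tables and failure handling on top of this idea.

```python
# Minimal sketch of DHT-style key placement on a consistent-hash ring,
# as used by structured P2P networks (node and key names are invented).
import hashlib
from bisect import bisect_right

def ring_hash(value: str) -> int:
    # Map a string onto a fixed-size identifier ring.
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (2 ** 16)

class HashRing:
    def __init__(self, nodes):
        # Each node owns the arc of the ring ending at its hash position.
        self.ring = sorted((ring_hash(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        # A key lives on the first node at or after its hash (wrapping around).
        positions = [pos for pos, _ in self.ring]
        idx = bisect_right(positions, ring_hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("movie.mkv"))  # deterministic placement, no query flooding
```

Because every node can compute the same placement, a lookup is routed directly instead of being flooded, which is what makes structured searches efficient.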
Data Volume:
Large volumes of data can strain integration processes, impacting performance and scalability.
Data Variety:
Dealing with diverse data formats, such as structured, semi-structured, and unstructured data, can be
challenging.
Data Velocity:
Real-time or near-real-time data integration requirements can put pressure on systems and processes.
Data Governance:
Ensuring data security, compliance, and privacy is crucial.
Data Profiling:
Use data profiling tools to understand the structure and quality of data before integration.
Data Standardization:
Establish data standards and conventions to normalize data from different sources.
Change Management:
Implement effective change management practices to ensure that data integration processes adapt to
evolving business needs.
Query Transformation:
The query may undergo transformations to simplify or optimize it. This can involve rewriting subqueries,
using materialized views, or other techniques to enhance performance.
Query Rewrite:
Further query rewriting may be performed to optimize execution, such as converting subqueries into
joins or using available indexes.
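
As a hedged illustration of subquery-to-join rewriting, the sketch below creates two toy tables in SQLite (the EMP/ASG names and columns are assumptions for this example, not a real schema) and prints the plan SQLite chooses for the subquery form and for the equivalent join form:

```python
# Illustration of subquery vs. join formulations of the same predicate,
# inspected with SQLite's EXPLAIN QUERY PLAN. The schema is invented.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE EMP (ENO INTEGER PRIMARY KEY, ENAME TEXT);
    CREATE TABLE ASG (ENO INTEGER, PNO INTEGER, DUR INTEGER);
""")

subquery_form = """
    SELECT ENAME FROM EMP
    WHERE ENO IN (SELECT ENO FROM ASG WHERE DUR > 24)
"""
join_form = """
    SELECT DISTINCT E.ENAME FROM EMP E
    JOIN ASG A ON A.ENO = E.ENO WHERE A.DUR > 24
"""

for label, sql in [("subquery", subquery_form), ("join", join_form)]:
    print(label)
    for row in con.execute("EXPLAIN QUERY PLAN " + sql):
        print("  ", row[-1])  # human-readable description of each plan step
```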
Query Optimization:
This is the core phase where the DBMS generates various candidate execution plans. These plans
consider different access methods, join strategies, and operation orders. The goal is to estimate the cost
of each plan and choose the one with the lowest estimated cost.
Cost Estimation:
During query optimization, the DBMS estimates the cost of executing each candidate plan. The cost
includes factors like I/O cost, CPU cost, and memory usage. Accurate statistics about the data and
indexes are crucial for making these cost estimates.
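
The arithmetic behind such estimates can be sketched with a toy cost model; the weights and cardinalities below are invented for illustration and do not come from any real optimizer:

```python
# Toy cost model in the spirit of textbook optimizers: estimated cost is a
# weighted sum of I/O and CPU work. All constants here are invented.
IO_COST_PER_PAGE = 1.0     # reading one disk page
CPU_COST_PER_TUPLE = 0.01  # evaluating predicates on one tuple

def plan_cost(pages_read: int, tuples_processed: int) -> float:
    return pages_read * IO_COST_PER_PAGE + tuples_processed * CPU_COST_PER_TUPLE

# Candidate 1: full table scan of 10,000 tuples stored on 100 pages.
full_scan = plan_cost(pages_read=100, tuples_processed=10_000)

# Candidate 2: index lookup touching ~3 index pages plus 50 matching tuples.
index_lookup = plan_cost(pages_read=3 + 50, tuples_processed=50)

print(f"full scan = {full_scan}, index lookup = {index_lookup}")
# The optimizer picks the plan with the lower estimate (the index lookup here).
```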
Plan Execution:
The chosen execution plan is used to execute the query, which may involve accessing tables, performing
joins, and applying filter conditions.
Here is the query:
π(ENAME, SAL) ( σ((BUDGET ≥ 200000) ∨ (DUR > 24)) ( ((EMP ⨝(ENO = ENO) ASG) ⨝(PNO = PNO) PROJ) ⨝(PNAME = PNAME) PAY ) )
Distributed transaction management is the process of ensuring the consistency, isolation, durability, and
atomicity (often referred to as the ACID properties) of transactions that involve multiple resources or
databases in a distributed system. In a distributed environment, transactions may involve operations on
different nodes or databases, and coordinating these operations to maintain data integrity is a complex
task.
Concurrent Transactions
Concurrency control is a fundamental concept in database management systems (DBMS) that deals with
managing the execution of multiple transactions simultaneously while maintaining data consistency and
integrity. In a multi-user environment, where multiple transactions can be executing concurrently,
concurrency control mechanisms are employed to prevent conflicts and ensure that the final state of the
database remains consistent.
Features of a Transaction
Atomicity:
A transaction is atomic, which means it is treated as a single, indivisible unit of work. Either all the
operations within a transaction are completed successfully, or none of them are. If any part of a
transaction fails, the entire transaction is rolled back, ensuring data consistency.
Consistency:
Transactions must ensure that the database transitions from one consistent state to another. This means
that the database should adhere to predefined rules and constraints, maintaining data integrity.
Isolation:
Transactions should be isolated from each other to prevent interference or conflicts. Multiple
transactions can run concurrently, but they should not be aware of each other. Isolation mechanisms like
locking and concurrency control are used to manage this.
Concurrency Control:
In a system with concurrent transactions, concurrency control mechanisms are vital. These mechanisms
manage access to shared resources (e.g., data) to prevent conflicts and maintain data consistency.
Techniques like locking and timestamps help control concurrency.
Serializability:
Transactions should be serializable, meaning that their execution should be equivalent to some
sequential order of execution. This ensures that the final state of the database is consistent, regardless of
the order in which transactions are executed.
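
One standard way to test conflict serializability is to build a precedence graph from the schedule and check it for cycles; the following is a minimal sketch with an invented schedule, where each operation is a (transaction, operation, item) tuple:

```python
# Sketch of a conflict-serializability test: two operations conflict when they
# touch the same item, come from different transactions, and at least one is a
# write. The schedule is serializable iff the precedence graph is acyclic.
def precedence_edges(schedule):
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "W" in (op1, op2):
                edges.add((t1, t2))  # t1's conflicting op precedes t2's
    return edges

def has_cycle(edges):
    # Depth-first search for a cycle in the precedence graph.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    def visit(node, path):
        if node in path:
            return True
        return any(visit(nxt, path | {node}) for nxt in graph.get(node, ()))
    return any(visit(n, frozenset()) for n in graph)

# T1 finishes with x before T2 touches it: equivalent to the serial order T1, T2.
schedule = [("T1", "R", "x"), ("T1", "W", "x"), ("T2", "R", "x"), ("T2", "W", "x")]
print("serializable:", not has_cycle(precedence_edges(schedule)))  # True
```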
Document databases
A document database stores data in JSON, BSON, or XML documents. Documents in the database can be
nested. Particular elements can be indexed for faster querying. You can access, store, and
retrieve documents in a form that is much closer to the data objects used in applications,
which means less translation is required to use and access the data in an application. SQL data must
often be assembled and disassembled when moving between applications, storage, or more than one
network.
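
As a small illustration, the sketch below stores an invented order as one nested JSON document in Python; the whole object round-trips without joins or row-to-object translation:

```python
# Minimal sketch of the document model: a nested JSON document that maps
# directly onto an application object (all field values are invented).
import json

order = {
    "_id": "order-1001",
    "customer": {"name": "Ali", "city": "Lahore"},
    "items": [
        {"sku": "A17", "qty": 2},
        {"sku": "B42", "qty": 1},
    ],
}

# The whole object is stored and retrieved as one document.
stored = json.dumps(order)
loaded = json.loads(stored)
print(loaded["customer"]["city"])  # direct access into the nested structure
```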
Key-value stores
This is the simplest type of NoSQL database. Every element is stored as a key-value pair consisting of an
attribute name ("key") and a value. This database is like an RDBMS with two columns: the attribute name
(such as "state") and the value (such as "Alaska").
Column-oriented databases
While an RDBMS stores data in rows and reads it row by row, column-oriented databases are organized
as a set of columns. When you want to run analytics on a small number of columns, you
can read those columns directly without consuming memory with unwanted data. Columns are of the
same type and benefit from more efficient compression, making reads even faster.
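
The difference can be sketched in a few lines of Python; the rows and values below are invented, and the point is only that an analytic query over one column touches a single same-typed array:

```python
# Sketch contrasting row and column layouts (the data values are invented).
# Row store: one record per entry; an analytic query touches every field.
rows = [
    {"id": 1, "state": "Alaska", "pop": 733_000},
    {"id": 2, "state": "Texas",  "pop": 30_500_000},
]

# Column store: one array per column; a query over "pop" reads only that array.
columns = {
    "id":    [1, 2],
    "state": ["Alaska", "Texas"],
    "pop":   [733_000, 30_500_000],
}

print(sum(columns["pop"]))  # scans a single, compressible, same-typed array
```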
Features of NoSQL
Flexible data models
NoSQL databases typically have very flexible schemas. A flexible schema allows you to easily make
changes to your database as requirements change. You can iterate quickly and continuously integrate
new application features to provide value to your users faster.
Horizontal scaling
Most SQL databases require you to scale up vertically (migrate to a larger, more expensive server)
when you exceed the capacity of your current server. Conversely, most NoSQL databases allow
you to scale out horizontally, meaning you can add cheaper commodity servers whenever you need to.
Fast queries
Queries in NoSQL databases can be faster than SQL databases. Why? Data in SQL databases is typically
normalized, so queries for a single object or entity require you to join data from multiple tables. As your
tables grow in size, the joins can become expensive. However, data in NoSQL databases is typically stored
in a way that is optimized for queries. The rule of thumb when you use MongoDB is data that is accessed
together should be stored together. Queries typically do not require joins, so the queries are very fast.
• Authentication
• Access rights
• Integrity constraints
Authentication
In a distributed database system, authentication is the process through which only legitimate users can
gain access to the data resources.
Integrity Control
Semantic integrity control defines and enforces the integrity constraints of the database system.
3. Distributed Locking:
Distributed locking involves acquiring locks on data items to control access by multiple transactions.
Lock managers at each node coordinate lock requests and releases.
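
A minimal sketch of the lock-table idea a node-local lock manager might keep follows (exclusive locks only, with no waiting queues or deadlock detection, and invented transaction and item names):

```python
# Minimal per-item lock table: exclusive locks only, no deadlock handling.
class LockManager:
    def __init__(self):
        self.locks = {}  # item -> owning transaction

    def acquire(self, txn: str, item: str) -> bool:
        owner = self.locks.get(item)
        if owner is None or owner == txn:
            self.locks[item] = txn
            return True
        return False  # conflicting request; the caller must wait or abort

    def release(self, txn: str, item: str) -> None:
        if self.locks.get(item) == txn:
            del self.locks[item]

lm = LockManager()
print(lm.acquire("T1", "x"))  # True: T1 now holds the lock on x
print(lm.acquire("T2", "x"))  # False: T2 is blocked until T1 releases
lm.release("T1", "x")
print(lm.acquire("T2", "x"))  # True
```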
3. Tasks are performed with a more speedy process. | Tasks are performed with a less speedy process.
What are the main functions of DDBMS? Explain its network-based functions.
Data Transparency:
DDBMS provides data transparency to users and applications. This means that users can access and
manipulate distributed data as if it were a single, centralized database without needing to know the
physical location of the data.
Query Processing:
DDBMS supports distributed query processing, allowing users to execute queries that involve data from
multiple sites. The system optimizes query execution to minimize data transfer and processing overhead.
Transaction Management:
DDBMS ensures the consistency and integrity of data across distributed sites by supporting distributed
transactions. It manages distributed transactions through techniques like two-phase commit and ensures
that either all changes made by a transaction are applied at all sites or none at all (atomicity).
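
A hedged sketch of the two-phase commit idea follows; the participants are simulated in-process here, whereas a real DDBMS would exchange the prepare and commit/abort messages over the network and log each step for recovery:

```python
# Sketch of two-phase commit: the coordinator commits only if every
# participant votes yes in the prepare phase (participants are simulated).
def two_phase_commit(participants):
    # Phase 1 (prepare): ask every site whether it can commit.
    votes = [p["prepare"]() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2 (commit/abort): apply the same decision at every site.
    for p in participants:
        p[decision]()
    return decision

def make_site(name, can_commit):
    return {
        "prepare": lambda: can_commit,
        "commit":  lambda: print(f"{name}: commit"),
        "abort":   lambda: print(f"{name}: abort"),
    }

sites = [make_site("site-A", True), make_site("site-B", True)]
print(two_phase_commit(sites))  # commit: all sites voted yes
sites.append(make_site("site-C", False))
print(two_phase_commit(sites))  # abort: one no vote, so every site aborts
```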
Concurrency Control:
DDBMS provides mechanisms for handling concurrent access to distributed data to avoid conflicts and
ensure data consistency. This includes techniques like locking and timestamp-based concurrency control.
Data Communication:
The DDBMS relies heavily on network communication to exchange data and information between
distributed sites. This includes sending queries, updates, and transaction control messages across the
network.
Scalability:
Network-based functions facilitate the addition of new sites or nodes to the distributed database. The
DDBMS can dynamically adapt to changes in the network topology and scale its resources as needed.
• Master-Slave Replication
• Master-Master Replication
• Snapshot Replication
• Transactional Replication
• Merge Replication
• Peer-to-Peer Replication
• Lazy Replication
Master-Slave Replication:
In this technique, one database server (the master) is designated as the primary source of data, and one
or more other servers (slaves) replicate data from the master. The master handles write operations,
while the slaves handle read operations. This can improve read performance and provide fault tolerance.
However, updates to the master need to be replicated to the slaves to maintain consistency.
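
Here is a toy sketch of the read/write split with synchronous replication and invented names; a lazy variant, described further below, would instead queue the slave updates and apply them after the write returns:

```python
# Toy master-slave replication: writes go to the master and are pushed to the
# replicas; reads can be served by any replica.
class MasterSlave:
    def __init__(self, n_slaves: int):
        self.master = {}
        self.slaves = [{} for _ in range(n_slaves)]

    def write(self, key, value):
        self.master[key] = value
        for slave in self.slaves:  # replicate each write to every slave
            slave[key] = value

    def read(self, key, replica: int = 0):
        return self.slaves[replica].get(key)  # reads are offloaded to slaves

db = MasterSlave(n_slaves=2)
db.write("city", "Karachi")
print(db.read("city", replica=1))  # "Karachi": the update reached the replica
```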
Master-Master Replication:
In this approach, multiple database servers act as both masters and slaves. Each server can handle both
read and write operations, which can improve both read and write performance. However, it introduces
complexities in managing data conflicts and maintaining consistency between multiple masters.
Snapshot Replication:
This technique involves periodically taking snapshots of the entire database or specific portions of it and
replicating those snapshots to other servers. Snapshots capture a point-in-time view of the data, and
replication occurs by copying the snapshots to other nodes. This is useful for data warehousing and
reporting purposes.
Transactional Replication:
In this method, changes made to the data (transactions) on one server are replicated in real-time to
other servers. This ensures that the data on all replicas is consistent and up-to-date. It's commonly used
in scenarios where data consistency is critical.
Merge Replication:
Merge replication is used in scenarios where multiple nodes can make updates to the data. Changes are
tracked, and at predefined intervals, these changes are merged across nodes to ensure that each node
has the most recent data.
Lazy Replication:
Also known as asynchronous replication, this technique allows for a certain delay between the time a
change is made on the master node and when it's replicated to the slave nodes. This can improve
performance by reducing the immediate overhead of replication.
What are the integrity rules? Give its types. Explain with examples.
Integrity rules are needed to inform the DBMS about certain constraints in the real world. Specific
integrity rules apply to one specific database.
Referential Integrity
Referential integrity defines the procedures and rules enforced to ensure that data is stored and
used consistently. It is based on the notion of foreign keys.
Domain Integrity
Domain integrity is a set of rules and procedures that ensure all data items belong to the correct
domains. For instance, if a user types a birth date into a street address field, the system will display
an error message that prevents the user from filling that field with wrong information.
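
Both referential and domain integrity can be demonstrated with SQLite; the dept/emp schema below is invented for illustration:

```python
# A FOREIGN KEY enforces referential integrity; a CHECK constraint enforces a
# simple domain rule. The schema and values are invented for this example.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
con.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp (
        emp_id  INTEGER PRIMARY KEY,
        age     INTEGER CHECK (age BETWEEN 16 AND 100),  -- domain integrity
        dept_id INTEGER REFERENCES dept(dept_id)         -- referential integrity
    );
    INSERT INTO dept VALUES (1, 'Sales');
""")

con.execute("INSERT INTO emp VALUES (10, 30, 1)")    # accepted
for bad in ("INSERT INTO emp VALUES (11, 300, 1)",   # age outside its domain
            "INSERT INTO emp VALUES (12, 25, 99)"):  # no such department
    try:
        con.execute(bad)
    except sqlite3.IntegrityError as e:
        print("rejected:", e)
```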
Physical Integrity
Physical integrity refers to safeguarding the completeness and accuracy of data during storage and
retrieval. Physical integrity is at risk when natural disasters strike, electricity goes out, or hackers
disrupt database functions.
1. Hierarchical databases.
2. Network databases.
3. Relational databases.
4. Object-oriented databases.
1. Hierarchical databases
It is one of the oldest database models, developed by IBM for its Information Management System
(IMS). In a hierarchical database model, the data is organized into a tree-like structure; in simple
language, it is a set of data organized in a tree structure. This type of database model is rarely used
nowadays. Its structure is like a tree, with nodes representing records and branches representing fields.
3. Relational Database
The relational database model was developed by E. F. Codd in 1970. The software systems used to
maintain relational databases are known as relational database management systems (RDBMS). In this
model, data is organized in a row-and-column structure, i.e., two-dimensional tables, and relationships
are maintained by storing a common field. It consists of three major components.
4. Object-oriented databases
An object database is a system in which information is represented in the form of objects, as used in
object-oriented programming. Object-oriented databases are different from relational databases, which
are table-oriented.
Conceptual Design:
Create a high-level conceptual model of the database, often represented using Entity-Relationship
Diagrams (ERDs). This step focuses on defining entities, their attributes, and the relationships between
them.
Normalization:
Normalize the conceptual model to reduce data redundancy and improve data integrity. This involves
breaking entities and attributes down into more granular tables to minimize data duplication; for
example, a table that repeats customer details on every order row can be split into separate CUSTOMER
and ORDER tables linked by a customer ID.
Schema Design:
Design the schema for the database, specifying the tables, their attributes, data types, and relationships.
This can be represented using a schema diagram.
Physical Design:
Determine the physical storage structure of the database, including indexing strategies, file organization,
and access paths.
Implementation:
Create the database schema in the chosen database management system (DBMS). This step involves
writing SQL scripts or using a graphical interface to create tables, indexes, and other database objects.
Data Loading:
Populate the database with initial data. This may involve data migration from existing sources or manual
data entry.
Application Development:
Develop applications or software systems that will interact with the database. This includes writing code
for data insertion, retrieval, and manipulation.
Testing:
Thoroughly test the database and the applications that use it to ensure they meet the specified
requirements. This involves unit testing, integration testing, and performance testing.
Training:
Provide training to users and administrators who will interact with the database and its applications.
Deployment:
Deploy the database and associated applications in the production environment.
Optimization:
Identify and address performance bottlenecks, optimize queries, and refine the database design as
necessary to improve efficiency.
Scaling:
If the database grows or experiences increased usage, scale it by adding hardware resources or using
database scaling techniques.
End-user Feedback:
Gather feedback from end-users and stakeholders to make improvements and adjustments to the
database and its applications.
Distribution transparency lets the user perceive the database as a single logical entity; if a DDBMS
provides distribution transparency, the user does not need to know that the data is fragmented.
Fragmentation transparency
With this type of transparency, the user does not need to know that the data is fragmented, which is
why database accesses can be based on the global schema. This is much like SQL views, where the user
may not know that they are employing a view of a table rather than the table itself.
Location transparency
If this type of transparency is provided by the DDBMS, the user needs to know how the data has been
fragmented, but not where it is located.
Replication transparency-
In replication transparency, the user does not know that fragments are copied. Replication
transparency is related to concurrency transparency and failure transparency. Whenever a user modifies
a data item, the update is reflected in all copies of the table; however, this operation should not be
visible to the user.
a) Global Query:
In a Distributed Database Management System (DDBMS), a global query refers to a database query that
is executed across multiple distributed databases as if they were a single, unified database. Unlike a
query in a centralized database management system (DBMS), where all data is stored in one location, a
global query in a DDBMS involves retrieving or manipulating data that is distributed across multiple
nodes or locations.
Global queries in DDBMS often require a mechanism to coordinate and distribute the query to the
relevant database nodes, retrieve results, and consolidate them into a coherent response for the user or
application. This coordination is necessary because each node in the distributed database may have its
own schema and data, and the DDBMS must handle the complexities of routing and aggregating the
query results.
b) Global Dictionary
In DDBMS, a global dictionary, sometimes referred to as a global schema or directory, serves as a central
repository of metadata and information about the structure and location of data in the distributed
database. It provides a standardized way to reference and access data distributed across different
database nodes. Key functions of a global dictionary in DDBMS include:
Schema Mapping:
It maintains mappings between the global schema (the way data is logically organized across the
distributed database) and the local schemas of individual database nodes.
Location Transparency:
It keeps track of the physical locations of data within the distributed system, allowing the DDBMS to
route queries to the appropriate nodes.
Data Description:
It stores metadata about tables, attributes, relationships, and other schema-related information, helping
users and applications understand the structure of the distributed database.
Query Optimization:
The global dictionary can be used by the DDBMS to optimize query execution by determining the most
efficient way to access and retrieve data across the distributed nodes.
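
A toy sketch of the mapping such a dictionary maintains follows (all site and table names are invented): the user refers to the global name EMP, and the catalog resolves which site the subquery should be routed to and what the table is called there.

```python
# Toy global dictionary: a catalog mapping each global table to the site
# holding it and its local name. All names are invented for illustration.
GLOBAL_CATALOG = {
    "EMP":  {"site": "karachi-node", "local_name": "employees"},
    "PROJ": {"site": "lahore-node",  "local_name": "projects"},
}

def route(global_table: str):
    # Location transparency: the user names EMP; the DDBMS resolves the site.
    entry = GLOBAL_CATALOG[global_table]
    return entry["site"], entry["local_name"]

site, local = route("EMP")
print(f"send the subquery for EMP to {site}, against table {local}")
```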
Already Explained