
LECTURE 1

a. Define a distributed database management system (DDBMS) and explain how it differs from a
traditional centralized database management system.

A Distributed Database Management System (DDBMS) is software that manages a
distributed database. This essentially means the data isn't stored in one central
location, but rather scattered across different computers or servers connected by a
network. Despite this physical separation, the DDBMS cleverly presents a unified view
of the data to users, making it appear like everything is in one place. Users can then
access and manipulate this data seamlessly.
Now, how does this differ from a traditional, centralized DBMS? Let's break it down:
• Data Location: Centralized systems keep everything in one spot, while distributed
systems spread data across various locations.
• Access: Centralized systems require users to access data from a single point.
Distributed systems offer access from various locations on the network.
• Scalability: Scaling up a centralized system can be tricky. Distributed systems,
however, are highly scalable – you can simply add more machines to the network as
your data needs grow.
• Availability: A single point of failure can bring down a centralized system. Distributed
systems are more available because a failure at one site doesn't affect others.
• Complexity: Centralized systems are generally simpler to manage. Distributed systems
involve managing data distribution and synchronization, making them more complex.
In short, distributed databases offer benefits like scalability, improved availability, and
potentially better performance for geographically spread users. However, they come
with the trade-off of increased complexity and cost.

b. Discuss the benefits and challenges associated with implementing a distributed DBMS in
organizations.

Organizations contemplating a switch to a distributed database management system
(DDBMS) should carefully evaluate both the potential benefits and the inherent
challenges. Here's a breakdown of what to consider:
Benefits:
• Scalability: Distributed databases shine in their ability to grow seamlessly. As data
volumes increase, adding more servers to the network is a breeze. This makes them
perfect for organizations anticipating massive data growth or already managing large
datasets.
• Availability: Say goodbye to single points of failure! By replicating data across multiple
locations, distributed systems ensure continued operation even if one server encounters
an issue. Users can still access and manage data even during outages at a specific site.
• Performance Boost: Distributed databases can significantly improve user experience,
especially for geographically dispersed teams. Data can be accessed from the closest
server, minimizing latency and wait times.
• Location Transparency: Distributed systems eliminate the need for users to be
physically close to the data. Users can access and manage data from any authorized
location within the network, fostering collaboration and remote work.
• Data Organization Flexibility: Unlike centralized systems, distributed databases allow
for data organization based on specific needs. For example, customer data for a
particular region can be stored on a nearby server for faster access and improved
compliance with local regulations.
• Potential Cost Savings: While the initial setup might seem expensive, distributed
databases can be cost-effective in the long run. You avoid the limitations of a single
server that might require frequent upgrades to handle growing data demands.
Challenges:
• Complexity Overload: Managing a distributed system is a whole new ball game
compared to a centralized one. The DDBMS needs to constantly track data location and
ensure all copies across servers are synchronized. This requires a skilled IT team with
expertise in managing complex systems and robust management procedures.
• Cost Considerations: Be prepared for potentially higher initial costs. The hardware,
software, and skilled personnel needed for a distributed system can be more expensive
than their centralized counterparts. Ongoing maintenance can also add to the overall
cost.
• Data Consistency Tango: Maintaining consistency across multiple data copies can be
a juggling act. The DDBMS needs to ensure all copies reflect any updates made,
introducing complexities with concurrency control mechanisms that prevent conflicting
edits.
• Security Tightrope Walk: Distributing data across locations opens up new security
concerns. Organizations need to implement robust security measures like encryption
and access controls to protect sensitive data across the network.
• Network Dependence: A distributed system is only as good as its network. Network
outages or slowdowns can significantly impact system performance and user
experience. Organizations need to invest in a reliable and high-performance network
infrastructure.
The Takeaway
Distributed DBMS can be a game-changer for organizations with large, geographically
distributed data needs. However, the decision to implement one requires careful
consideration of the associated complexities and costs. Organizations should weigh the
benefits against their specific needs and technical capabilities before making the switch.
It's crucial to ensure they have the resources and expertise to manage the additional
complexities introduced by a distributed database system.

c. Identify and discuss the key design issues and challenges involved in building a distributed
database management system.

Building a Distributed Database Management System (DDBMS) comes with a unique
set of design issues and challenges compared to its centralized counterpart. Here's a
breakdown of some key areas to consider:
Data Fragmentation and Replication:
• Fragmentation: Dividing the database into smaller, more manageable units (fragments)
spread across different servers.
• Challenge: Determining the optimal fragmentation strategy to ensure efficient data
access and minimize redundant data movement across the network.
• Replication: Copying specific data fragments across multiple servers for improved
availability and performance.
• Challenge: Managing the trade-off between consistency and availability. Fully
replicated data ensures consistency but comes at a storage cost. Partially replicated
data offers better performance but can lead to inconsistencies if updates aren't
synchronized effectively.
Data Consistency:
• Challenge: Ensuring all copies of a data fragment remain consistent across all servers,
especially when concurrent updates occur.
• Solutions: Implementing concurrency control mechanisms like locking or timestamps to
prevent conflicting modifications. Choosing between strong consistency (all nodes
reflect updates immediately) or eventual consistency (updates eventually propagate
across all nodes) based on application needs.
Transaction Management:
• Challenge: Ensuring ACID (Atomicity, Consistency, Isolation, Durability) properties of
transactions even when data is distributed across multiple servers.
• Solutions: Distributed transaction protocols like two-phase commit (2PC) or Paxos
algorithm ensure coordinated updates across all involved servers.
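The two-phase commit protocol mentioned above can be sketched in a few lines of Python. This is a minimal, in-memory illustration with hypothetical class and function names; a real DDBMS would also write each phase to a durable log so it can recover from coordinator or participant crashes:

```python
# Minimal sketch of two-phase commit (2PC). Participant and
# two_phase_commit are illustrative names, not a real library API.

class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1 (voting): vote YES only if the local work can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1: ask every participant to prepare and vote.
    if all(p.prepare() for p in participants):
        # Phase 2: unanimous YES -> global commit on all servers.
        for p in participants:
            p.commit()
        return "committed"
    # Any NO vote -> global abort, preserving atomicity.
    for p in participants:
        p.abort()
    return "aborted"
```

The key property is that the decision is all-or-nothing: a single NO vote in phase 1 forces every participant to roll back, which is exactly the atomicity guarantee the text describes.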
Concurrency Control:
• Challenge: Preventing conflicts when multiple users attempt to access and modify the
same data fragment concurrently on different servers.
• Solutions: Implementing locking mechanisms (pessimistic or optimistic) to ensure
exclusive access during updates and prevent inconsistencies.
Query Processing and Optimization:
• Challenge: Efficiently processing queries that might involve data fragments stored
across multiple servers.
• Solutions: Developing query optimization techniques that minimize data movement
across the network and leverage the distributed nature for parallel processing where
possible.
Security:
• Challenge: Protecting sensitive data spread across multiple servers and ensuring
secure communication channels across the network.
• Solutions: Implementing robust security measures like encryption, access controls, and
intrusion detection systems to safeguard data integrity and confidentiality.
Failure Management:
• Challenge: Ensuring the system remains operational and data remains available even if
a server or network link fails.
• Solutions: Implementing fault tolerance mechanisms like data replication, failover
techniques, and data recovery procedures to minimize downtime and data loss.
These challenges highlight the need for careful design considerations when
building a DDBMS. The chosen solutions will depend on factors like the type of data,
access patterns, consistency requirements, and performance needs of the specific
application.

Using books and the Internet resources, explain how design issues such as data
distribution, concurrency control, data replication, query optimization, and security are
addressed in distributed DBMS architectures.
Here's a deeper dive into how distributed DBMS architectures address key design
issues:
1. Data Distribution and Fragmentation:
• Concepts:
o Data Distribution: Spreading the database across multiple servers in a network.
o Fragmentation: Dividing the database into logical units (fragments) for efficient storage
and retrieval.
• Addressing the Challenge:
o Horizontal Fragmentation: Divides tables based on rows, often by a specific range of
values in a key column. This is useful for queries that involve filtering based on that key.
(Reference: https://www.youtube.com/watch?v=nuR0p-ZF3tU)
o Vertical Fragmentation: Divides tables based on columns, separating them into
related subsets based on access patterns. This reduces data movement for specific
queries that only require a subset of columns.
o Hashing: Assigns data fragments to servers based on a hash function applied to a key
attribute. This ensures even distribution and efficient retrieval for queries involving that
key. (https://docs.oracle.com/en/database/oracle/oracle-
database/19/sqlrf/ORA_HASH.html)
o Replication: Creating copies of frequently accessed data fragments on multiple servers
for improved availability and performance.
2. Concurrency Control:
• Challenge: Preventing conflicts when multiple users attempt to modify the same data
fragment concurrently on different servers.
• Solutions:
o Locking Mechanisms:
▪ Pessimistic Locking: Acquires a lock on a data fragment before any modification,
preventing other users from accessing it until the lock is released. (Reference:
https://docs.oracle.com/middleware/1212/toplink/TLJPA/q_pessimistic_lock.htm)
▪ Optimistic Locking: Allows concurrent access but validates modifications before
committing. Conflicts are detected and resolved if necessary.
(https://docs.oracle.com/cd/B14099_19/web.1012/b15901/dataaccs008.htm)
o Timestamp Ordering: Assigns timestamps to transactions and ensures they are
executed in that order across all servers, preventing conflicts.
(https://web.mit.edu/6.1800/www/recitations/r18.pdf)
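The optimistic variant can be sketched with a version number per row: a writer records the version it read, and the update is accepted only if that version is still current at commit time. This is a minimal in-memory illustration; the store layout and function names are hypothetical:

```python
# Sketch of optimistic locking via a version column.
# "store" stands in for one server's data fragment.

store = {"balance": {"value": 100, "version": 1}}

def read(key):
    row = store[key]
    return row["value"], row["version"]

def update(key, new_value, expected_version):
    row = store[key]
    if row["version"] != expected_version:
        # Another transaction committed first: reject, caller must retry.
        return False
    row["value"] = new_value
    row["version"] += 1
    return True
```

Two concurrent writers that both read version 1 cannot both succeed: the first commit bumps the version to 2, so the second writer's stale version is detected and its update is rejected rather than silently overwriting the first.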
3. Data Replication:
• Challenge: Balancing consistency and availability when replicating data fragments
across servers.
• Solutions:
o Synchronous Replication: Updates are committed on all replicas only after successful
completion on the primary server. Ensures strong consistency but can impact
performance. (https://docs.oracle.com/cd/E19359-01/819-6148-10/chap2.html)
o Asynchronous Replication: Updates are propagated to replicas eventually. Offers
high availability but can lead to temporary inconsistencies.
(https://docs.oracle.com/cd/E19359-01/819-6148-10/chap2.html)
o Hybrid Approaches: Combine synchronous and asynchronous replication for specific
data fragments based on their consistency requirements.
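The synchronous/asynchronous trade-off can be made concrete with a small sketch (in-memory replicas and illustrative names only): a synchronous write applies to every replica before returning, while an asynchronous write acknowledges immediately and lets replicas catch up later from a log:

```python
# Sketch contrasting synchronous and asynchronous replication.

class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

def write_sync(primary, replicas, key, value):
    # Synchronous: the write completes only after every replica
    # has applied it, so all copies agree (strong consistency).
    primary.apply(key, value)
    for r in replicas:
        r.apply(key, value)

def write_async(primary, log, key, value):
    # Asynchronous: acknowledge after the primary write; replicas
    # apply the logged update later, so their reads may be stale.
    primary.apply(key, value)
    log.append((key, value))

def drain(replicas, log):
    # Replicas catch up by replaying the pending update log.
    for key, value in log:
        for r in replicas:
            r.apply(key, value)
    log.clear()
```

The window between write_async and drain is exactly the "temporary inconsistency" the text describes: a read served by the replica in that window returns the old value.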
4. Query Optimization:
• Challenge: Optimizing query execution when data fragments are spread across
multiple servers.
• Solutions:
o Query Decomposition: Breaking down complex queries into smaller subqueries that
can be executed on individual servers holding relevant data fragments. Results are then
combined to form the final answer. (https://www.geeksforgeeks.org/query-processing-in-
distributed-dbms/)
o Cost-based Optimization: The DDBMS estimates the cost (network traffic, processing
time) of executing a query on different server combinations. It then chooses the most
efficient execution plan. (https://cs186berkeley.net/notes/note10/)
o Parallel Processing: Distributing subqueries or operations involved in a complex query
across multiple servers for faster execution and improved scalability.
(https://techcommunity.microsoft.com/t5/azure-database-support-blog/lesson-learned-
487-identifying-parallel-and-high-volume-queries/ba-p/4141248)
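Query decomposition is often implemented as a scatter-gather: the same subquery runs locally on each server's horizontal fragment, and the partial results are merged at the coordinator. A minimal sketch, with made-up fragment data and function names:

```python
# Sketch of scatter-gather query decomposition over horizontal
# fragments. Server names and rows are illustrative.

fragments = {
    "server-eu": [{"id": 1, "region": "EU", "total": 120}],
    "server-us": [{"id": 2, "region": "US", "total": 80},
                  {"id": 3, "region": "US", "total": 40}],
}

def subquery(rows, min_total):
    # Runs locally on each server, against only its own fragment.
    return [r for r in rows if r["total"] >= min_total]

def distributed_query(min_total):
    # Scatter the subquery to every fragment, then gather and merge.
    partials = [subquery(rows, min_total) for rows in fragments.values()]
    return sorted((r for part in partials for r in part),
                  key=lambda r: r["id"])
```

Because filtering happens on each server, only the qualifying rows cross the network, which is the data-movement saving the cost-based optimizer is trying to maximize.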
5. Security:
• Challenge: Protecting sensitive data spread across multiple servers and ensuring
secure communication across the network.
• Solutions:
o Data Encryption: Encrypting data at rest (stored on servers) and in transit (transmitted
across the network) to prevent unauthorized access.
(https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=908084)
o Access Control: Implementing user authentication and authorization mechanisms to
restrict access to sensitive data based on user roles and permissions.
o Secure Communication Protocols: Using secure protocols like HTTPS for
communication between the DDBMS and client applications, ensuring data
confidentiality and integrity. (https://en.wikipedia.org/wiki/HTTPS)
o Intrusion Detection Systems (IDS): Monitoring network traffic and server activity for
suspicious behavior to identify and prevent potential security breaches.
(https://en.wikipedia.org/wiki/Intrusion_detection_system)
These are just some of the key design considerations and solutions employed in
distributed database architectures. The specific techniques used will vary depending on
the chosen DBMS software and the unique requirements of the application.

Describe the architecture of a distributed database management system, including its
components, layers, and functionalities. Compare it with multidatabase systems
integration.
Distributed Database Management System (DDBMS) Architecture

A Distributed Database Management System (DDBMS) manages a database that's
physically scattered across multiple servers in a network. Here's a breakdown of its
architecture:
Components:
• Client Applications: These are applications that users interact with to access and
manipulate data in the distributed database.
• Database Servers: These are individual computers or servers that store specific
portions (fragments) of the distributed database. They run software specifically
designed to manage and access their assigned data fragments.
• Distributed Database Manager (DDBM): This software layer sits on top of the
database servers and coordinates communication and data access across the entire
distributed system. It performs functions like:
o Query Processing: Breaks down user queries and distributes them to the relevant
database servers for execution.
o Data Fragmentation and Replication Management: Determines how data is
distributed and replicated across servers.
o Concurrency Control: Ensures data consistency by preventing conflicts when multiple
users attempt to modify the same data concurrently.
o Transaction Management: Manages the execution of transactions (series of database
operations) across multiple servers, ensuring ACID properties (Atomicity, Consistency,
Isolation, Durability).
Layers:
• User Interface Layer: Provides users with a way to interact with the distributed
database through client applications.
• Application Logic Layer: Handles the business logic of the application and interacts
with the DDBMS to access and manipulate data.
• Data Access Layer: Translates user requests or application logic actions into queries
that the DDBMS can understand and execute.
• Distributed Database Manager Layer: The core layer responsible for coordinating
communication and data access across all database servers.
• Database Server Layer: The layer where the actual data storage and retrieval occur on
individual servers.
Functionalities:
• Data Distribution and Management: The DDBMS fragments and distributes data
across multiple servers based on chosen strategies. It manages data movement,
replication, and access across the network.
• Query Processing: The DDBMS breaks down user queries and distributes them to
relevant servers for execution. It then combines the results and presents them to the
user as if they came from a single location.
• Concurrency Control: The DDBMS ensures that multiple users accessing and
modifying the same data concurrently do not cause inconsistencies. It uses locking
mechanisms or other techniques to prevent conflicts.
• Transaction Management: The DDBMS manages the execution of transactions across
multiple servers, ensuring that all operations within a transaction are completed
successfully or rolled back if any part fails (ACID properties).
• Security: The DDBMS implements security measures like access control, data
encryption, and secure communication protocols to safeguard sensitive data across the
distributed system.

Multidatabase Systems Integration vs. Distributed Database Systems

While both deal with accessing data from multiple sources, there's a key distinction:
• Distributed Database Systems (DDBMS): Here, data is logically interrelated and
physically stored across multiple servers under a single, unified management system
(the DDBMS). Users see and interact with the data as if it were all in one place.
• Multidatabase Systems Integration: This involves integrating data from multiple,
independent databases that may use different schemas and management systems.
Users often need to interact with each database system separately or through a
complex middleware layer that translates queries and manages data access across the
different systems.
Here's a table summarizing the key differences:

Feature           Distributed Database (DDBMS)         Multidatabase Systems Integration
Data Storage      Physically distributed across        Independent databases, possibly
                  multiple servers                     different locations
Data Management   Unified management by DDBMS          Independent management systems
User View         Unified view of all data             Separate views for each database
Complexity        More complex to set up and manage    Less complex, but requires middleware
Scalability       Highly scalable                      Limited scalability
Consistency       Ensured by DDBMS                     Requires additional effort to maintain

In essence, DDBMS offers a more tightly integrated and cohesive data management
experience, while multidatabase system integration provides a way to connect and
access data from disparate sources.

Discuss the role of distributed data storage, query processing, concurrency control
mechanisms, transaction management, and data replication in distributed DBMS
architectures
In a distributed database management system (DDBMS), where data is scattered
across multiple servers, these key functionalities work together to ensure efficient and
reliable data management:
1. Distributed Data Storage:
• Function: Breaks down the logical database into smaller chunks (fragments) and
distributes them strategically across various servers in the network. This allows for
efficient storage, retrieval, and scalability.
• Benefits: Improves performance by enabling parallel processing of queries and
reducing network traffic. Allows for easier horizontal scaling by adding more servers to
handle growing data volumes.
2. Query Processing:
• Function: Takes user queries and intelligently breaks them down into subqueries that
can be executed on the relevant servers holding the required data fragments. The
DDBMS then combines the results and presents them to the user as if they came from a
single location.
• Importance: Ensures efficient execution of queries, even when data is spread across
multiple servers. Optimizes the query execution plan to minimize network traffic and
processing time.
3. Concurrency Control Mechanisms:
• Function: Prevents conflicts that could arise when multiple users attempt to modify the
same data fragment concurrently on different servers. This ensures data consistency
and prevents corrupted results.
• Methods: Techniques like locking (pessimistic or optimistic) or timestamp ordering are
used to serialize access to data fragments and prevent conflicts during updates.
4. Transaction Management:
• Function: Coordinates and manages the execution of complex operations
(transactions) that involve multiple data fragments potentially located on different
servers. Ensures the ACID properties (Atomicity, Consistency, Isolation, Durability) of
transactions.
• Importance: Guarantees that either all operations within a transaction are completed
successfully (committed) or none are (rolled back). Prevents inconsistencies and
ensures data integrity across the distributed system.
5. Data Replication:
• Function: Creates copies of frequently accessed or critical data fragments on multiple
servers. This enhances availability by ensuring data remains accessible even if a server
fails. Can also improve performance by allowing users to access data from the closest
replica.
• Trade-off: Increases storage requirements and introduces complexity in maintaining
consistency across replicas. The DDBMS needs to ensure all replicas reflect updates
made to the original data fragment.
These functionalities work in concert within the DDBMS architecture:
1. Data is distributed strategically using distributed data storage.
2. Queries are processed by breaking them down and executing them on relevant servers.
3. Concurrency control mechanisms prevent conflicts during concurrent access.
4. Transaction management ensures the integrity of complex operations.
5. Data replication improves availability and potentially performance.
By working together, they ensure that a distributed database can function efficiently,
reliably, and offer a unified view of data to users despite its physical separation across
multiple servers.

i. Using books and the Internet resources, expound on the following characteristics
of the distributed database systems. NB: characteristics define the inherent
qualities or attributes of distributed database systems
data distribution, autonomy, heterogeneity, transparency, concurrency control, data
replication, scalability, fault tolerance, consistency and consensus, and security.

Here's a detailed explanation of the characteristics of distributed database systems:
Data Distribution
• Concept: In a DDBMS, the database is not stored in one central location but rather
divided and scattered across multiple servers in a network. These divisions are called
fragments or partitions.
• Benefits:
o Scalability: As data grows, you can easily add more servers to the network to
accommodate the increased storage needs.
o Performance: Distributing data allows for parallel processing of queries, potentially
improving response times for geographically dispersed users who can access data from
the closest server.
o Availability: If one server fails, other servers can still handle requests, improving
overall system availability.
Autonomy
• Concept: Individual servers in a distributed database system can have a certain degree
of autonomy. This means they may have their own local management systems and may
be able to operate independently to some extent, even if the central DDBMS is
unavailable for a short period.
• Benefits:
o Flexibility: Local autonomy allows for individual server management based on specific
requirements.
o Fault Tolerance: Even if the DDBMS fails, some level of data access might still be
possible on individual servers.
Heterogeneity
• Concept: A distributed database system can potentially accommodate data stored in
different formats or managed by different database management systems (DBMS) on
different servers.
• Challenges: Managing a heterogeneous system can be complex due to the need to
ensure compatibility and data integrity across diverse platforms. Additional software or
tools might be required to bridge these differences.
Transparency
• Concept: A DDBMS strives to provide users with a unified view of the data, regardless
of its physical location across different servers. Users should be able to interact with the
data as if it were all stored in a single central location.
• Benefits: Transparency simplifies application development and data access for users.
They don't need to be aware of the underlying data distribution details.
Concurrency Control
• Concept: In a multi-user environment, concurrency control mechanisms prevent
conflicts that can arise when multiple users try to modify the same data fragment
concurrently on different servers. These mechanisms ensure data consistency.
• Common Techniques:
o Locking: Acquiring a temporary lock on a data fragment before modifying it prevents
other users from making changes until the lock is released. There are two main types of
locking:
▪ Pessimistic Locking: Locks are acquired before any modifications, potentially reducing
concurrency.
▪ Optimistic Locking: Allows concurrent access but checks for conflicts before
committing changes.
o Timestamp Ordering: Assigns timestamps to transactions and ensures they are
executed in that order across all servers.
Data Replication
• Concept: Creating copies of frequently accessed data fragments or critical data on
multiple servers. This improves availability by ensuring continued data access even if
the server storing the original data fragment fails. Replication can also enhance
performance by allowing users to access data from the closest replica.
• Trade-offs:
o Increased Storage Requirements: Maintaining replicas requires additional storage
space.
o Maintaining Consistency: The DDBMS needs to ensure all replicas are updated
consistently to reflect changes made to the original data fragment.
Scalability
• Concept: The ability of a distributed database system to grow and adapt to increasing
data volumes and user demands. Horizontal scalability is a key advantage, allowing you
to add more servers to the network as needed.
Fault Tolerance
• Concept: The ability of a distributed database system to withstand failures of individual
servers or network components and continue operating with minimal data loss or
downtime. Techniques like data replication and failover mechanisms contribute to fault
tolerance.
Consistency and Consensus
• Concept: Maintaining data consistency across all replicas in a distributed system is
crucial. This ensures that all users see the same consistent data, regardless of which
server they access. Consensus algorithms are used to ensure all participants in the
distributed system agree on the current state of the data, especially when replicating
data or coordinating updates.
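One simple way consensus-style agreement is achieved is with majority quorums: an update must reach a majority of replicas, and reads consult a majority, so any read quorum overlaps the latest successful write quorum in at least one replica. A minimal sketch with hypothetical names:

```python
# Sketch of majority-quorum reads and writes over versioned replicas.

def majority(n):
    return n // 2 + 1

def quorum_write(replicas, value, version, reachable):
    # Refuse the write unless a majority of replicas is reachable;
    # this is what keeps two conflicting writes from both succeeding.
    if len(reachable) < majority(len(replicas)):
        return False
    for i in reachable:
        replicas[i] = (value, version)
    return True

def quorum_read(replicas, reachable):
    # Any majority must include at least one replica that saw the
    # newest version, so return the highest-versioned value.
    assert len(reachable) >= majority(len(replicas))
    return max((replicas[i] for i in reachable), key=lambda vv: vv[1])[0]
```

Full consensus protocols like Paxos build on the same overlap property while also handling competing proposers and failures mid-protocol; the quorum arithmetic above is the core invariant they rely on.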
Security
• Concept: Distributed database systems require robust security measures to protect
sensitive data spread across multiple servers and secure communication channels
across the network. This includes user authentication, authorization, data encryption,
and intrusion detection systems.
These characteristics all play a vital role in how a distributed database system functions.
By carefully considering these factors during design and implementation, you can create
a robust, scalable, and secure system that meets the specific needs of your
organization.

LECTURE 2
a. Define vertical fragmentation and explain its significance in
distributed database management.
Vertical Fragmentation in Distributed Databases

Vertical fragmentation, also known as attribute partitioning, is a technique used in
distributed database management systems (DDBMS) to split a table (relation) vertically
by its columns (attributes). This means instead of storing all the columns of a table on a
single server, you distribute them across multiple servers based on specific criteria.
Here's a breakdown of vertical fragmentation and its significance in DDBMS:
• Concept: Imagine a table storing customer information, including customer ID, name,
address, phone number, and order history. With vertical fragmentation, you could:
o Create one fragment storing customer ID, name, and address on one server.
o Create another fragment storing phone number and order history on a different server.
• Benefits:
o Improved Performance: By separating frequently accessed columns from less
frequently accessed ones, you can optimize query processing. Users accessing
customer names and addresses won't need to retrieve the entire table, including order
history, which might be less frequently accessed.
o Enhanced Security: Sensitive data, like order history, can be stored on separate
servers with stricter access controls, improving overall data security.
o Reduced Data Movement: When queries only involve specific columns, only the
relevant fragment needs to be accessed, reducing network traffic and improving
performance.
• Significance in DDBMS:
o Vertical fragmentation allows for a more flexible and efficient data distribution strategy in
a DDBMS. It complements horizontal fragmentation (splitting tables by rows) to further
optimize data access and storage based on specific access patterns.
o By strategically distributing data vertically, DDBMS can improve query performance,
enhance data security, and optimize network traffic across the distributed system.
Here are some additional points to consider:
• Vertical fragmentation can introduce complexity in managing joins between fragments
during query processing. The DDBMS needs to combine data from multiple fragments
to fulfill user requests.
• The decision to vertically fragment a table depends on the specific access patterns and
data security requirements of your application. Careful analysis is needed to determine
the optimal fragmentation strategy.
In conclusion, vertical fragmentation is a valuable technique for optimizing data
distribution and access in distributed database systems. It offers improved performance,
enhanced security, and reduced data movement, contributing to the overall efficiency
and scalability of your DDBMS.
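The customer-table example above can be sketched in a few lines. Note that each vertical fragment keeps the primary key (customer ID) so the DDBMS can rejoin fragments when a query needs columns from both; the row data and function names here are illustrative:

```python
# Sketch of vertical fragmentation and the join that reassembles rows.

customers = [
    {"customer_id": 1, "name": "Ada", "address": "12 Oak St",
     "phone": "555-0101", "order_history": ["o-17", "o-42"]},
]

def vsplit(rows, cols_a, cols_b, key="customer_id"):
    # Each fragment carries the key plus its own column subset.
    frag_a = [{k: r[k] for k in (key, *cols_a)} for r in rows]
    frag_b = [{k: r[k] for k in (key, *cols_b)} for r in rows]
    return frag_a, frag_b

def vjoin(frag_a, frag_b, key="customer_id"):
    # Reconstruct full rows by matching the key across fragments.
    by_key = {r[key]: r for r in frag_b}
    return [{**r, **by_key[r[key]]} for r in frag_a]
```

A query that only needs names and addresses touches frag_a alone; only queries spanning both column groups pay the cost of vjoin, which is the join overhead the text warns about.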
b. Discuss the advantages and challenges associated with vertical
fragmentation compared to other fragmentation techniques,
such as horizontal fragmentation.

Vertical Fragmentation vs. Horizontal Fragmentation: Advantages and Challenges

Both vertical and horizontal fragmentation are techniques used in distributed database
management systems (DDBMS) to optimize data storage and access patterns. Let's
explore their advantages and challenges to understand when each might be a better
choice.
Vertical Fragmentation
Advantages:
• Improved Performance: By separating frequently accessed columns from less
frequently used ones, queries that only involve specific columns can be executed faster,
reducing network traffic and processing time.
• Enhanced Security: Sensitive data can be stored on separate servers with stricter
access controls, improving overall data security. Users who don't need access to
sensitive information (e.g., order history) won't have access to the fragment containing
it.
• Reduced Data Redundancy: If certain columns are not frequently used together,
vertical fragmentation can eliminate redundancy and reduce storage requirements.
Challenges:
• Increased Query Complexity: Joins across multiple fragments might be required to
fulfill user queries that involve columns from different fragments. This can add
complexity to query processing and potentially impact performance.
• Data Integrity Maintenance: Ensuring data consistency across multiple fragments
requires additional effort during updates or deletes. The DDBMS needs to coordinate
updates across relevant fragments to maintain referential integrity.
• Limited Scalability: Vertical fragmentation might not be as scalable as horizontal
fragmentation for very large tables. Adding new columns might require further
fragmentation or data redistribution across existing fragments.
Horizontal Fragmentation
Advantages:
• Improved Scalability: Horizontal fragmentation allows for easier scaling by adding
more servers to distribute data rows. This is particularly beneficial for very large tables
that would become cumbersome to manage on a single server.
• Reduced Network Traffic: Queries involving specific ranges of data (e.g., customer
records for a particular region) only need to access the relevant fragment, minimizing
data movement across the network.
• Simplified Query Processing: Since relevant data is already grouped together in a
fragment, queries often don't require complex joins, simplifying query processing.
Challenges:
• Data Redundancy: Certain columns might be duplicated across multiple fragments to
maintain referential integrity and avoid complex joins. This can increase storage
requirements.
• Potential Performance Impact: If a query involves data from multiple horizontal
fragments, it might require accessing and joining data from several servers, potentially
impacting performance.
• Limited Security Benefits: Horizontal fragmentation by itself doesn't offer significant
security advantages compared to vertical fragmentation.
Choosing the Right Technique:
The optimal fragmentation technique depends on several factors:
• Access Patterns: Analyze how users typically access data. Vertical fragmentation is
better suited when queries frequently involve specific subsets of columns.
• Data Security Requirements: If certain data needs stricter access controls, vertical
fragmentation allows for separate storage with appropriate security measures.
• Data Volume and Growth: Horizontal fragmentation is more scalable for very large
tables with anticipated growth.
In conclusion:
Vertical fragmentation is ideal for optimizing performance, enhancing data security, and
reducing data movement for queries involving specific column subsets. However, it can
introduce complexity with joins and might not scale well for very large tables.
Horizontal fragmentation excels in scalability, reducing network traffic for specific data
ranges, and simplifying queries. However, it can lead to data redundancy and doesn't
offer inherent security advantages.
For optimal results, consider a hybrid approach combining both vertical and horizontal
fragmentation based on your specific data access patterns, security needs, and
scalability requirements.
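To make the contrast concrete, here is a small self-contained sketch in SQLite (via Python's sqlite3 module) showing both techniques on a hypothetical Customer table; all table and column names are illustrative:

```python
import sqlite3

# Hypothetical single-site table; all table and column names are illustrative.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT, card_no TEXT)")
cur.executemany("INSERT INTO Customer VALUES (?, ?, ?, ?)",
                [(1, "Ann", "EU", "1111"), (2, "Bob", "US", "2222"), (3, "Cyd", "EU", "3333")])

# Vertical fragmentation: each fragment keeps the primary key so the
# original relation can be rebuilt losslessly with a join.
cur.execute("CREATE TABLE Cust_Public AS SELECT id, name, region FROM Customer")
cur.execute("CREATE TABLE Cust_Sensitive AS SELECT id, card_no FROM Customer")

# Horizontal fragmentation: disjoint row subsets, rebuilt with UNION ALL.
cur.execute("CREATE TABLE Cust_EU AS SELECT * FROM Customer WHERE region = 'EU'")
cur.execute("CREATE TABLE Cust_US AS SELECT * FROM Customer WHERE region = 'US'")

# Both reconstructions reproduce the original relation exactly.
vertical = cur.execute("""SELECT p.id, p.name, p.region, s.card_no
                          FROM Cust_Public p JOIN Cust_Sensitive s ON p.id = s.id
                          ORDER BY p.id""").fetchall()
horizontal = cur.execute("SELECT * FROM Cust_EU UNION ALL SELECT * FROM Cust_US ORDER BY id").fetchall()
original = cur.execute("SELECT * FROM Customer ORDER BY id").fetchall()
print(vertical == original, horizontal == original)  # True True
```

Note how the vertical fragments replicate only the primary key, while the horizontal fragments replicate the full schema but partition the rows.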

c. Explore real-world scenarios or application domains where vertical fragmentation can
provide significant benefits in terms of data management, query optimization, and
scalability.

Here are some real-world scenarios and application domains where vertical
fragmentation can offer significant benefits:
1. E-commerce Platform:
• Scenario: A large online retail store has a "Customer" table containing details like
name, address, email, phone number, order history, and product reviews.
• Benefits of Vertical Fragmentation:
o Data Management and Security:
▪ Fragment the table into two:
▪ Customer Details Fragment: Stores name, address, email, and phone number.
▪ Order and Review Fragment: Stores order history and product reviews.
▪ This allows for stricter access controls on the Order and Review fragment, potentially
containing sensitive purchase information.
o Query Optimization:
▪ Most queries will likely involve customer details (name, address) for order processing or
marketing campaigns.
▪ Vertical fragmentation allows faster retrieval of frequently accessed customer details
without needing to access the entire table, including order history, which might be less
frequently accessed.
o Scalability:
▪ Vertical fragmentation might not directly improve horizontal scalability for the Customer
table itself. However, it can potentially improve scalability for the Order and Review
fragment, which could grow significantly with increasing order volume. This fragment
can be further horizontally fragmented (e.g., by year) as needed.
2. Healthcare Management System:
• Scenario: A hospital information system stores patient data, including demographics,
medical history, allergies, medications, treatment plans, and billing information.
• Benefits of Vertical Fragmentation:
o Data Management and Security:
▪ Fragment the table into three:
▪ Patient Demographics Fragment: Stores basic information like name, address, and
date of birth.
▪ Medical History Fragment: Stores allergies, medical history, and treatment plans.
▪ Billing Fragment: Stores billing information and insurance details.
▪ This allows for stricter access controls on the Medical History and Billing fragments
containing sensitive patient information. Only authorized personnel (e.g., doctors) would
have access to the complete medical history.
o Query Optimization:
▪ Different user groups (doctors, nurses, billing department) typically access different
subsets of data.
▪ Vertical fragmentation allows for faster retrieval of relevant data fragments. Doctors
primarily need medical history, nurses might need demographics and current treatment
plans, and billing needs access to the Billing fragment.
o Scalability:
▪ While vertical fragmentation might not directly improve horizontal scalability for the
overall patient data, it can be beneficial for specific fragments. For example, the Medical
History fragment could be further horizontally fragmented by department (cardiology,
oncology) to improve scalability as medical records grow.
3. Educational Institution Management System:
• Scenario: A university management system stores student data, including personal
information, academic records (grades, courses taken), financial aid details, and
disciplinary records.
• Benefits of Vertical Fragmentation:
o Data Management and Security:
▪ Fragment the table into three:
▪ Student Information Fragment: Stores name, address, and contact details.
▪ Academic Records Fragment: Stores grades, courses taken, and transcripts.
▪ Financial Aid and Disciplinary Fragment: Stores financial aid information and
disciplinary records.
▪ This allows for stricter access controls on sensitive data like financial aid and
disciplinary records. Only authorized personnel (financial aid department, disciplinary
committee) would have access to the relevant fragment.
o Query Optimization:
▪ Different user groups (faculty, advisors, financial aid) typically access different data
subsets.
▪ Vertical fragmentation allows for faster retrieval of relevant data. Faculty need academic
records for grading, advisors need student information and academic records, and
financial aid needs access to the dedicated fragment.
o Scalability:
▪ Similar to other examples, vertical fragmentation can improve scalability for specific
fragments. The Academic Records fragment could be further horizontally fragmented by
semester or year to handle increasing student enrollment and course data.
These are just a few examples. Vertical fragmentation can be beneficial in any scenario
where data access patterns involve specific subsets of columns frequently, security
considerations require separation of sensitive data, and potential scalability
improvements are desired for specific data segments within a larger table.
a. Select a specific scenario or application domain (e.g., e-
commerce, healthcare, finance) for which vertical fragmentation
is suitable.
Scenario: Online Food Ordering Platform

An online food ordering platform like DoorDash or Grubhub manages a large amount of
data related to restaurants, menus, customer orders, and deliveries. Here's how vertical
fragmentation can be beneficial:
Tables and Data:
• Restaurants: Restaurant ID, Name, Cuisine, Location, Average Rating, Reviews
• MenuItems: Menu Item ID, Restaurant ID (foreign key), Name, Description, Price,
Category (Appetizer, Main Course, Dessert)
• Customer Orders: Order ID, Customer ID (foreign key), Restaurant ID (foreign key),
Order Date/Time, Items Ordered (list of menu item IDs), Delivery Address
• Delivery: Delivery ID, Order ID (foreign key), Driver ID (foreign key), Delivery Time,
Status (Pending, In Progress, Delivered)
Benefits of Vertical Fragmentation:
1. Improved Performance and Scalability:
o Restaurant Fragment: Stores Restaurant ID, Name, Cuisine, Location, and Average
Rating.
▪ This fragment is frequently accessed for browsing restaurants and doesn't change
frequently. Vertical fragmentation allows for faster retrieval and potential horizontal
fragmentation by location for scalability as the platform expands to new regions.
o Menu Fragment: Stores Menu Item ID, Restaurant ID, Name, Description, Price, and
Category.
▪ This fragment is accessed when viewing menus and might change more frequently with
seasonal updates or new offerings. Vertical fragmentation allows for faster retrieval of
menu items based on restaurant ID and potential horizontal fragmentation by category
for scalability as menus grow.
2. Enhanced Data Security:
o Customer Order Fragment: Stores Order ID, Customer ID, Restaurant ID, Order
Date/Time, and Delivery Address.
▪ This fragment contains sensitive customer information like delivery address. Vertical
fragmentation allows for storing it on a separate server with stricter access controls.
o Delivery Fragment: Stores Delivery ID, Order ID, Driver ID, Delivery Time, and Status.
▪ This fragment contains driver information and might have different security needs than
customer data. Vertical separation allows for appropriate access controls.
3. Optimized Query Processing:
o Most user queries will likely involve browsing restaurants (Restaurant fragment) or
viewing menus (Menu fragment) without needing full order details.
o Vertical fragmentation reduces the data retrieved for these frequent queries, improving
performance.
o Queries involving specific orders or delivery status can access the relevant Customer
Order or Delivery fragment without needing to access the entire dataset.
Overall:
Vertical fragmentation in this scenario offers significant advantages in terms of
performance, scalability, data security, and optimized query processing for a large
online food ordering platform.

b. Identify a relation schema relevant to the chosen scenario and propose a vertical
fragmentation strategy based on the attributes of the relation.

Scenario: Online Food Ordering Platform

Relation Schema: Customer Orders


Attributes:
• OrderID (Primary Key)
• CustomerID (Foreign Key referencing Customers table)
• RestaurantID (Foreign Key referencing Restaurants table)
• OrderDate (Date and Time)
• ItemsOrdered (List of Menu Item IDs) - This could be a separate table with a foreign key
referencing CustomerOrders.OrderID
• DeliveryAddress (Text)
• Payment Method (e.g., Credit Card, Cash)
• OrderStatus (e.g., Pending, Preparing, Delivered)
• Total Amount
Vertical Fragmentation Strategy:
Based on access patterns and data security considerations, we propose the following
vertical fragmentation strategy for the Customer Orders relation:
1. Order Summary Fragment:
o Attributes: OrderID (Primary Key), CustomerID (Foreign Key referencing Customers
table), RestaurantID (Foreign Key referencing Restaurants table), OrderDate (Date and
Time), OrderStatus (e.g., Pending, Preparing, Delivered), Total Amount
o Rationale: This fragment stores core order details frequently accessed for order
tracking, history, and overall order management. It doesn't contain sensitive customer
information like address or payment method.
2. Order Details Fragment:
o Attributes: OrderID (Primary Key in this fragment; Foreign Key referencing Customer
Orders table), ItemsOrdered (List of Menu Item IDs, possibly a separate table within this
fragment), DeliveryAddress (Text), Payment Method (e.g., Credit Card, Cash)
o Rationale: This fragment stores potentially sensitive information like delivery address
and payment method. It can be stored on a separate server with stricter access
controls. It also allows for efficient retrieval of order details without accessing the entire
order summary.
Benefits:
• Improved Performance: Separating frequently accessed order summary information
from potentially less frequently accessed details like delivery address and payment
method can optimize query processing.
• Enhanced Data Security: Sensitive customer information is stored on a separate
fragment with potentially stricter access controls.
• Scalability: The Order Summary fragment can be horizontally fragmented by date or
customer ID for scalability as the number of orders grows. Similarly, the Order Details
fragment could be horizontally fragmented by OrderID for better scalability.
Note: This is a sample fragmentation strategy. Depending on the specific needs of the
platform, further fragmentation or adjustments might be required.
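As an illustration of this strategy, the two fragments and a reconstructing view can be sketched in SQLite via Python's sqlite3 module; the sample order values are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Order Summary fragment: core attributes accessed most frequently.
cur.execute("""CREATE TABLE OrderSummary (
    OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, RestaurantID INTEGER,
    OrderDate TEXT, OrderStatus TEXT, TotalAmount REAL)""")
# Order Details fragment: sensitive attributes, keyed by the same OrderID.
cur.execute("""CREATE TABLE OrderDetails (
    OrderID INTEGER PRIMARY KEY REFERENCES OrderSummary(OrderID),
    DeliveryAddress TEXT, PaymentMethod TEXT)""")
cur.execute("INSERT INTO OrderSummary VALUES (101, 7, 3, '2024-05-01', 'Delivered', 28.50)")
cur.execute("INSERT INTO OrderDetails VALUES (101, '12 Main St', 'Credit Card')")
# The full relation is reconstructed by a join only when a query needs both fragments.
cur.execute("""CREATE VIEW CustomerOrders AS
    SELECT s.OrderID, s.CustomerID, s.RestaurantID, s.OrderDate,
           s.OrderStatus, s.TotalAmount, d.DeliveryAddress, d.PaymentMethod
    FROM OrderSummary s JOIN OrderDetails d ON s.OrderID = d.OrderID""")
row = cur.execute("SELECT OrderStatus, DeliveryAddress FROM CustomerOrders WHERE OrderID = 101").fetchone()
print(row)  # ('Delivered', '12 Main St')
```

Queries touching only order status hit OrderSummary alone; the join with OrderDetails is paid only when sensitive attributes are actually requested.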

c. Justify your fragmentation strategy by analyzing the specific requirements, access
patterns, and scalability considerations of the chosen scenario.

The proposed vertical fragmentation strategy for the "Customer Orders" relation in the
online food ordering platform scenario is justified based on the following factors:
Specific Requirements:
• Performance: Fast retrieval of order summaries for order tracking, history, and
management is crucial.
• Data Security: Protecting sensitive customer information like delivery address and
payment method is essential.
• Scalability: The platform needs to handle a growing number of orders efficiently.
Access Patterns:
• Users (customers, restaurant staff, platform administrators) frequently access order
summaries (OrderID, CustomerID, RestaurantID, OrderDate, OrderStatus, Total
Amount) to track orders, manage order flow, and view order history.
• Delivery personnel and customer support might need access to delivery address and
potentially payment method for specific orders.
• Order details like specific menu items ordered are likely accessed less frequently
compared to core order summaries.
Scalability Considerations:
• The number of orders is expected to grow as the platform gains popularity.
• Horizontally fragmenting the Order Summary fragment by date or customer ID allows for
efficient scaling to accommodate a larger number of orders.
• Order Details fragmentation by OrderID ensures efficient retrieval of details for specific
orders without impacting the scalability of the Order Summary fragment.
Justification for Fragmentation:
By vertically fragmenting the "Customer Orders" relation, we achieve the following:
• Improved Performance: Separating frequently accessed order summaries (Order
Summary fragment) from potentially less frequently accessed details (Order Details
fragment) optimizes query processing. Users can retrieve order summaries faster
without needing to access the entire relation, including potentially sensitive information.
• Enhanced Data Security: The Order Details fragment, containing sensitive data like
delivery address and payment method, can be stored on a separate server with stricter
access controls. This reduces the risk of unauthorized access to this sensitive
information.
• Scalability: Both fragments can be further horizontally fragmented to handle a growing
number of orders efficiently. The Order Summary fragment can be fragmented by date
or customer ID, while the Order Details fragment can be fragmented by OrderID.
This fragmentation strategy balances performance, security, and scalability
requirements effectively. It ensures fast retrieval of frequently accessed order
information while safeguarding sensitive customer data and allowing the platform to
handle increasing order volumes efficiently.
1. Define in SQL-like syntax a view of the engineering database
V(ENO, ENAME, PNO, RESP), where the duration is 24. Is view V
updatable? Assume that relations EMP and ASG are horizontally
fragmented based on access frequencies as follows:
Site 1: EMP1, ASG1
Site 2: EMP2, ASG2
Site 3: (no fragments)
where
EMP1 = σTITLE ≠ "Engineer" (EMP)
EMP2 = σTITLE = "Engineer" (EMP)
ASG1 = σ0<DUR<36 (ASG)
ASG2 = σDUR≥36 (ASG)
At which site(s) should the definition of V be stored without
being fully replicated, to increase locality of reference?

Here's the solution for the scenario:


1. View Definition:
SQL
CREATE VIEW V AS
SELECT EMP.ENO, EMP.ENAME, ASG.PNO, ASG.RESP
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO
AND ASG.DUR = 24; -- Only assignments with duration exactly 24

Note that the predicate DUR = 24 restricts only ASG. Since 0 < 24 < 36, it can be
localized to fragment ASG1. The predicate says nothing about TITLE, so both EMP1
and EMP2 may contribute tuples to V.

2. Updatability of View V:
This view is not updatable. An updatable view generally must be defined over a
single base relation, with a one-to-one mapping between view tuples and base tuples.
Here, V is defined over a join of EMP and ASG with a selection on DUR. An update
through V would have to be translated into updates on two relations while preserving
the join condition and the DUR = 24 filter; this translation is ambiguous, so most
database systems reject it.
3. Locality of Reference:
The view references EMP and the ASG tuples with DUR = 24. Since 0 < 24 < 36, only
fragment ASG1 (stored at Site 1) is relevant; ASG2 can be ignored entirely. The
predicate says nothing about TITLE, so both EMP1 (Site 1) and EMP2 (Site 2) may
contribute tuples.
Here's the reasoning for placing the definition:
• Site 1 holds EMP1 and ASG1, so most of the view can be evaluated locally there.
• Site 2 holds EMP2, whose tuples also qualify for V, so storing the definition there
localizes access for users at that site.
• Site 3 holds no relevant fragment, so storing the definition there would add no locality
benefit.
Therefore, to increase locality of reference without full replication, the definition of V
should be stored at Sites 1 and 2.
Additional Considerations:
• If write operations (updates) on view V are a requirement, you might need to investigate
alternative approaches like materialized views or triggers on the underlying tables, but
these can introduce additional complexity.
• Depending on the specific database system, there might be ways to define updatable
views with certain limitations. Consult your database system's documentation for details.
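As a quick sanity check, the view and its non-updatability can be reproduced in SQLite (via Python's sqlite3 module); the sample tuples are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Invented sample tuples; schemas follow EMP(ENO, ENAME, TITLE) and
# ASG(ENO, PNO, RESP, DUR) from the exercise.
cur.execute("CREATE TABLE EMP (ENO INTEGER, ENAME TEXT, TITLE TEXT)")
cur.execute("CREATE TABLE ASG (ENO INTEGER, PNO INTEGER, RESP TEXT, DUR INTEGER)")
cur.execute("INSERT INTO EMP VALUES (1, 'J. Doe', 'Engineer')")
cur.execute("INSERT INTO ASG VALUES (1, 10, 'Analyst', 24)")
cur.execute("INSERT INTO ASG VALUES (1, 20, 'Manager', 12)")  # excluded: DUR <> 24
cur.execute("""CREATE VIEW V AS
    SELECT EMP.ENO, EMP.ENAME, ASG.PNO, ASG.RESP
    FROM EMP, ASG
    WHERE EMP.ENO = ASG.ENO AND ASG.DUR = 24""")
print(cur.execute("SELECT * FROM V").fetchall())  # [(1, 'J. Doe', 10, 'Analyst')]
# SQLite, like most systems, rejects direct modifications through a join view.
try:
    cur.execute("UPDATE V SET RESP = 'Lead' WHERE ENO = 1")
except sqlite3.OperationalError as err:
    print("not updatable:", err)
```

The UPDATE fails because the view is defined over a join, matching the updatability argument above.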

2. Express the following query: names of employees in view V who work on the CAD
project.

Here's the query to find the names of employees in view V who work on the CAD
project:
SQL
SELECT V.ENAME
FROM V
INNER JOIN PROJ p ON V.PNO = p.PNO -- Join view V with the PROJ relation
WHERE p.PNAME = 'CAD'; -- Filter for the project named 'CAD'

Explanation:
1. Join: View V does not include the project name, so it must be joined with the PROJ
relation to access project information.
2. Join Condition: V.PNO = p.PNO links each view tuple to its project based on the
project number.
3. Filter: The WHERE clause keeps only tuples whose project is named 'CAD' by
checking the p.PNAME attribute.
This query retrieves the names (ENAME) of employees in view V who are assigned to
the 'CAD' project.

Consider the multilevel relation PROJ** below. Assuming that there are only two
classification levels for attributes (S and C), propose an allocation of PROJ** on two
sites using fragmentation and replication that avoids covert channels on read queries.
Discuss the constraints on updates for this allocation to work.
1. Apply the dynamic query optimization algorithm in the query
given below, and illustrate the successive detachments and
substitutions by giving the monorelation subqueries generated.
SELECT EMP.ENAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO=ASG.ENO
AND ASG.PNO=PROJ.PNO
AND PNAME="CAD/CAM"

Here's the application of the dynamic query optimization algorithm to the provided
query, illustrating the detachments and substitutions with monorelation subqueries:
Original Query:
SQL
SELECT EMP.ENAME
FROM EMP, ASG, PROJ
WHERE EMP.ENO=ASG.ENO
AND ASG.PNO=PROJ.PNO
AND PNAME="CAD/CAM"

Step 1: Detachment (Monorelation Subquery on PROJ):

The selection PNAME = "CAD/CAM" involves only the PROJ relation, so it is detached
first and its result is stored in a temporary relation JVAR.
Monorelation Subquery 1:
SQL
SELECT PROJ.PNO INTO JVAR
FROM PROJ
WHERE PNAME = "CAD/CAM"

The remaining query then references JVAR in place of PROJ:
SQL
SELECT EMP.ENAME
FROM EMP, ASG, JVAR
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = JVAR.PNO

Step 2: Detachment and Substitution (Reduction on ASG):

Substituting the tuples of JVAR reduces the join between ASG and JVAR to a
monorelation subquery on ASG, whose result (the matching employee numbers) is
stored in GVAR.
Monorelation Subquery 2:
SQL
SELECT ASG.ENO INTO GVAR
FROM ASG, JVAR
WHERE ASG.PNO = JVAR.PNO

Step 3: Final Monorelation Subquery on EMP:

Substituting GVAR leaves a query over the single relation EMP.
Monorelation Subquery 3:
SQL
SELECT EMP.ENAME
FROM EMP, GVAR
WHERE EMP.ENO = GVAR.ENO

Explanation:
1. The selection on PROJ is detached first because it is the only restriction involving a
single relation; evaluating it early shrinks the data carried into the joins.
2. Tuple substitution then reduces the remaining multirelation query one relation at a
time: first ASG (producing GVAR), then EMP.
3. The original three-relation query is thus answered by a sequence of monorelation
subqueries, which is exactly the goal of the dynamic (INGRES-style) optimization
algorithm.
Note:
The specific implementation details of dynamic query optimization algorithms might vary
depending on the database system used. However, the general principles of pushing
down selections and joins to reduce data movement remain consistent.
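The effect of this decomposition, evaluating one relation at a time and feeding each result into the next step, can be mimicked with plain Python set comprehensions; the sample tuples are invented:

```python
# Invented sample tuples; schemas: EMP(ENO, ENAME), ASG(ENO, PNO), PROJ(PNO, PNAME).
EMP = [(1, "J. Doe"), (2, "M. Smith"), (3, "A. Lee")]
ASG = [(1, 10), (2, 20), (3, 10)]
PROJ = [(10, "CAD/CAM"), (20, "Database")]

# Monorelation subquery on PROJ: PNOs of projects named "CAD/CAM" (temporary JVAR).
jvar = {pno for (pno, pname) in PROJ if pname == "CAD/CAM"}

# Monorelation subquery on ASG after substituting JVAR: matching ENOs (temporary GVAR).
gvar = {eno for (eno, pno) in ASG if pno in jvar}

# Final monorelation subquery on EMP after substituting GVAR.
result = sorted(ename for (eno, ename) in EMP if eno in gvar)
print(result)  # ['A. Lee', 'J. Doe']
```

Each comprehension touches a single relation, mirroring the sequence of monorelation subqueries the algorithm produces.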
2. Consider the join graph of Fig 4-1 next slide and the following
information: size(EMP) = 100, size(ASG) = 200, size(PROJ) = 300,
size(EMP ⋈ ASG) = 300, and size(ASG ⋈ PROJ) = 200. Describe an
optimal join program based on the objective function of total
transmission time
i. EMP → site 2; Site 2 computes EMP’ = EMP ⋈ ASG; EMP’ → site 3; Site 3 computes
EMP’ ⋈ PROJ.
ii. ASG → site 1; Site 1 computes EMP’ = EMP ⋈ ASG; EMP’ → site 3; Site 3 computes
EMP’ ⋈ PROJ.

Analyzing the Options:
The scenario presents two possible join programs (i and ii) for the given join graph. Our
objective is to determine the optimal program based on the total transmission time,
considering the following information:
• Table sizes:
o EMP = 100
o ASG = 200
o PROJ = 300
o EMP ⋈ ASG = 300 (size of the join result between EMP and ASG)
o ASG ⋈ PROJ = 200 (size of the join result between ASG and PROJ)
• Transmission cost assumed to be constant per unit of data size.
Option i:
1. EMP (100) -> Site 2
2. Site 2: EMP ⋈ ASG (300)
3. EMP' (300) -> Site 3
4. Site 3: EMP' ⋈ PROJ (unknown size)
Option ii:
1. ASG (200) -> Site 1
2. Site 1: EMP ⋈ ASG (300)
3. EMP' (300) -> Site 3
4. Site 3: EMP' ⋈ PROJ (unknown size)
Calculating Transmission Costs:
In both options, the final step joins EMP' with PROJ at Site 3 and produces the same
result, so the comparison reduces to the data shipped beforehand (steps 1 and 3).
Option i:
• Transmission cost = size(EMP) + size(EMP ⋈ ASG)
• Transmission cost = 100 + 300 = 400 (assuming a constant transmission cost per unit
of data)
Option ii:
• Transmission cost = size(ASG) + size(EMP ⋈ ASG)
• Transmission cost = 200 + 300 = 500
Optimal Program:
Based on the calculated transmission costs, Option i (EMP -> Site 2, then EMP' -> Site
3) has a lower cost (400) compared to Option ii (ASG -> Site 1, then EMP' -> Site 3)
with a cost of 500.
Therefore, the optimal join program to minimize total transmission time is Option i.
Explanation:
In Option i, transmitting the smaller table (EMP) first followed by the joined result (EMP'
⋈ ASG) minimizes data movement across sites. This approach leads to a lower overall
transmission cost compared to Option ii, where the larger table (ASG) is transmitted
initially.
Note:
This analysis considers only the initial data transmission costs. The actual processing
time at each site for the join operations might also be a factor depending on the specific
database system. However, in most cases, minimizing data movement between sites is
a crucial factor for optimizing overall query performance.

iii. ASG → site 3; Site 3 computes ASG’ = ASG ⋈ PROJ; ASG’ → site 1; Site 1
computes ASG’ ⋈ EMP.
iv. PROJ → site 2; Site 2 computes PROJ’ = PROJ ⋈ ASG; PROJ’ → site 1; Site 1
computes PROJ’ ⋈ EMP.

Following the same approach as before, let's analyze options iii and iv for the join
program:
iii. ASG -> Site 3; Site 3 computes ASG' = ASG ⋈ PROJ; ASG' -> Site 1; Site 1
computes ASG' ⋈ EMP
Transmission Costs:
1. ASG (200) -> Site 3
2. Site 3: ASG' = ASG ⋈ PROJ (size 200, given as size(ASG ⋈ PROJ))
3. ASG' (200) -> Site 1
4. Site 1: ASG' ⋈ EMP (final result, computed locally)
Total Cost:
• Cost = 200 + 200 = 400 (assuming a constant transmission cost per unit of data)
iv. PROJ -> Site 2; Site 2 computes PROJ' = PROJ ⋈ ASG; PROJ' -> Site 1; Site 1
computes PROJ' ⋈ EMP
Transmission Costs:
1. PROJ (300) -> Site 2
2. Site 2: PROJ' = PROJ ⋈ ASG (size 200; the join is commutative, so
size(PROJ ⋈ ASG) = size(ASG ⋈ PROJ) = 200)
3. PROJ' (200) -> Site 1
4. Site 1: PROJ' ⋈ EMP (final result, computed locally)
Total Cost:
• Cost = 300 + 200 = 500
Comparison and Conclusion:
• Option iii transmits 400 units of data, matching option i as the cheapest program.
• Option iv transmits 500 units, matching option ii.
• The final join with EMP is computed at its destination site in every option, so only the
intermediate transmissions differ between programs.
• Under the total transmission time objective, options i and iii are therefore optimal
(cost 400). Option iv is the most expensive opening move because it ships the largest
base relation (PROJ), and the small ASG ⋈ PROJ result does not compensate for that
initial cost.
Note: local join processing costs could shift the choice between the two 400-unit
programs, but they do not change the transmission-time ranking above.
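Since all base and intermediate result sizes are given in the exercise (and size(PROJ ⋈ ASG) = size(ASG ⋈ PROJ), as join is commutative), the transmission cost of all four programs can be tallied directly; this sketch assumes a constant cost per unit of data:

```python
# Sizes given in the exercise; the cost model counts total units of data shipped,
# assuming a constant transmission cost per unit.
size = {"EMP": 100, "ASG": 200, "PROJ": 300,
        "EMP_ASG": 300,   # size(EMP join ASG)
        "ASG_PROJ": 200}  # size(ASG join PROJ) = size(PROJ join ASG)

# Each program ships one base relation, then the intermediate join result.
strategies = {
    "i":   size["EMP"]  + size["EMP_ASG"],   # EMP -> site 2, then EMP' -> site 3
    "ii":  size["ASG"]  + size["EMP_ASG"],   # ASG -> site 1, then EMP' -> site 3
    "iii": size["ASG"]  + size["ASG_PROJ"],  # ASG -> site 3, then ASG' -> site 1
    "iv":  size["PROJ"] + size["ASG_PROJ"],  # PROJ -> site 2, then PROJ' -> site 1
}
best = min(strategies, key=strategies.get)
for name, cost in strategies.items():
    print(name, cost)  # i 400, ii 500, iii 400, iv 500
```

The tally confirms that the two programs starting from the smaller shipments (i and iii) tie at 400 units, while ii and iv cost 500.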
3. Summarize the layers of query processing

Query processing involves a series of steps to transform a user's query, expressed in a
high-level language (like SQL), into machine-readable instructions and retrieve the
desired data from a database system. Here's a summary of the key layers involved:
1. Parsing and Translation:
• This layer takes the user's query written in a high-level language like SQL and parses it
into a tree-like structure representing the query's syntax and meaning.
• It then translates this internal representation into a lower-level format understandable by
the query optimizer.
2. Optimization:
• The optimizer analyzes the translated query and the database schema to determine the
most efficient way to execute it.
• It considers factors like available indexes, join order, and filtering conditions to choose
an optimal execution plan.
• This might involve techniques like rewriting queries, pushing down operations closer to
the data source, and choosing appropriate join algorithms.
3. Code Generation:
• Based on the optimized plan, this layer generates the actual code that the database
system can execute.
• This code might involve instructions for accessing data files, performing joins, applying
filters, and sorting results.
• The specific code format depends on the database system being used.
4. Evaluation:
• The generated code is then executed by the database engine.
• It interacts with the storage manager to access the relevant data files and performs the
necessary operations as specified in the code.
Additional Layer (Distributed Systems):
• In distributed database systems, an additional layer might be present for query
decomposition and optimization across multiple sites.
• This layer breaks down the query into smaller sub-queries that can be executed on
individual sites holding relevant data fragments.
• It then coordinates the execution of these sub-queries and combines the results to
provide the final answer to the user's original query.
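The optimization layer can be observed directly in SQLite through EXPLAIN QUERY PLAN, which reports the access paths the optimizer chose for a query; the schema below is illustrative, and the exact plan text varies by SQLite version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EMP (ENO INTEGER PRIMARY KEY, ENAME TEXT)")
cur.execute("CREATE TABLE ASG (ENO INTEGER, PNO INTEGER)")
cur.execute("CREATE INDEX idx_asg_eno ON ASG(ENO)")
# EXPLAIN QUERY PLAN exposes the output of the optimization layer:
# the scan order and index usage chosen for this join.
plan = cur.execute("""EXPLAIN QUERY PLAN
    SELECT EMP.ENAME
    FROM EMP JOIN ASG ON EMP.ENO = ASG.ENO
    WHERE ASG.PNO = 10""").fetchall()
for row in plan:
    print(row)  # e.g. rows mentioning a scan of ASG and a primary-key search on EMP
```

This corresponds to the optimization layer described above: the plan is produced before any data is touched, and the evaluation layer then executes it.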

4. Provide at least four examples of iterative algorithms for global query optimization.
Next to these algorithms, define brute-force search and provide two examples of the
algorithms involved.
