SHADAN COLLEGE OF ENGG. & TECH.

DBMS PPT

NAME : ABDUL SULEMAN
ROLL NO : 22081A1202
BRANCH : I.T. II YR. – II SEM.
Q1. Explain the difference between 3NF & BCNF

Third Normal Form (3NF)


A relation is in Third Normal Form (3NF) if it is in 2NF and no non-key attribute is
transitively dependent on the primary key, i.e., there is no transitive dependency.
Equivalently, every functional dependency C->D must satisfy at least one of the following conditions:

 C is a super key, or

 D is a prime attribute, i.e., D is part of some candidate key.

3NF is used to reduce data duplication and to attain data integrity.


Example of 3NF

For the relation R(L, M, N, O, P) with functional dependencies {L->M, MN->P, PO->L}:
The candidate keys are {LNO, MNO, NOP},
since the closure of LNO = {L, M, N, O, P},
the closure of MNO = {L, M, N, O, P}, and
the closure of NOP = {L, M, N, O, P}.
This relation is in 3NF: it is already in 2NF and has no transitive dependency. Every attribute
here is prime, so no non-prime attribute derives a non-prime attribute.
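The closures above can be checked mechanically. Below is a small Python sketch of the standard attribute-closure computation applied to this example (the helper name is illustrative):

def closure(attrs, fds):
    # Attribute closure of `attrs` under the functional dependencies `fds`.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [("L", "M"), ("MN", "P"), ("PO", "L")]
for key in ("LNO", "MNO", "NOP"):
    print(key, "->", sorted(closure(key, fds)))
# each closure prints ['L', 'M', 'N', 'O', 'P'], confirming all three are candidate keys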

Advantages of 3NF

• Reduce Data Redundancy


• Enhance Data Integrity
• Provide Data Flexibility
• Improve Data Organization.

Disadvantages of 3NF

• Increased Complexity
• Potential Performance Issues
• Overhead for Maintenance
Boyce-Codd Normal Form (BCNF)
BCNF stands for Boyce-Codd Normal Form and was proposed by R. F. Boyce and E. F. Codd in 1974.
A relation is in BCNF if these properties hold:

• It is already in 3NF.

• For every functional dependency P->Q, P is a super key.

BCNF is an extension of 3NF with stricter rules, and it is therefore considered stronger
than 3NF.

Example of BCNF

For the relation R(A, B, C, D) with functional dependencies {A->B, A->C, C->D, C->A}:

The candidate keys are {A, C},

since the closure of A = {A, B, C, D} and
the closure of C = {A, B, C, D}.
This relation is in BCNF: it is already in 3NF (no non-prime attribute derives a non-prime
attribute), and the left-hand side of every functional dependency is a candidate key.
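To verify the BCNF condition it is enough to check that the closure of every dependency's left-hand side covers the whole relation. A small Python sketch of that check for this example (helper names are illustrative):

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

relation = set("ABCD")
fds = [("A", "B"), ("A", "C"), ("C", "D"), ("C", "A")]

# BCNF: the left-hand side of every functional dependency must be a super key.
for lhs, rhs in fds:
    print(f"{lhs} -> {rhs}: LHS is super key = {closure(lhs, fds) == relation}")
# every left-hand side (A or C) determines all of ABCD, so the relation is in BCNF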

Advantages of BCNF

• Eliminates Anomalies More Effectively


• Higher Data Integrity
• Ensures that all functional dependencies are represented clearly

Disadvantages of BCNF:

• Greater Complexity
• Performance Trade-offs
• Increased Design and Maintenance Cost
Comparison of 3NF and BCNF:

1. 3NF stands for Third Normal Form; BCNF stands for Boyce-Codd Normal Form.
2. In 3NF there should be no transitive dependency, i.e., no non-prime attribute should be transitively dependent on a candidate key; in BCNF, for every dependency A->B, A must be a super key of the relation.
3. 3NF is less strict than BCNF; BCNF is comparatively stricter than 3NF.
4. A relation in 3NF already satisfies 1NF and 2NF; a relation in BCNF already satisfies 1NF, 2NF and 3NF.
5. Redundancy is higher in 3NF; redundancy is comparatively lower in BCNF.
6. 3NF preserves all functional dependencies; BCNF may or may not preserve all functional dependencies.
7. 3NF is comparatively easier to achieve; BCNF is harder to achieve.
8. A lossless, dependency-preserving decomposition into 3NF is always possible; for BCNF, a lossless decomposition that also preserves dependencies is harder to achieve.
9. A table is in 3NF if it is in 2NF and, for each functional dependency X->Y, at least one of the following holds: (i) X is a super key, or (ii) Y is a prime attribute of the table. A table is in BCNF if it is in 3NF and, for each functional dependency X->Y, X is a super key.
10. 3NF can be obtained without sacrificing any dependencies; in BCNF, dependencies may not be preserved.
11. 3NF decomposition never loses information from the original table; decomposing to BCNF may sacrifice some dependency information from the original table.
Q2. What is a Locking Protocol? Describe
the Strict Two-Phase Locking Protocol.
Locking Protocol :
A locking protocol is a mechanism used to manage concurrent access to shared resources, such as data in a
database, to ensure consistency and prevent conflicts between multiple transactions.

Strict Two-Phase Locking Protocol (Strict 2PL): In the strict two-phase locking protocol, transactions
follow these rules:

1. Growing Phase (Locking Phase):

 Transactions can acquire locks (shared or exclusive) on data items as needed during execution.
 Once a transaction releases a lock, it cannot acquire any new locks.
 Acquiring all needed locks before releasing any of them is what guarantees serializability
(note that this by itself does not prevent deadlocks).
2. Shrinking Phase (Unlocking Phase):

• Transactions can release locks but cannot acquire any new locks after releasing any lock.

• This phase ensures that a transaction only relinquishes locks and does not block other
transactions from acquiring them.

• In strict 2PL, all exclusive (write) locks are additionally held until the transaction commits or
aborts, so other transactions never read uncommitted data and cascading rollbacks are avoided.
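As a rough illustration of the bookkeeping these rules imply, here is a simplified, single-threaded Python sketch (class and method names are assumptions; a real lock manager would make conflicting requests wait and detect deadlocks rather than raise an error):

class StrictTwoPhaseLockManager:
    def __init__(self):
        self.locks = {}   # item -> {"mode": "S" or "X", "holders": set of transaction ids}
        self.held = {}    # transaction -> set of items it has locked

    def acquire(self, txn, item, mode):
        entry = self.locks.get(item)
        # Only shared-with-shared is compatible; a sole holder may upgrade its own lock.
        if entry and entry["holders"] != {txn} and not (mode == "S" and entry["mode"] == "S"):
            raise RuntimeError(f"{txn} must wait for {item}")
        if entry is None:
            self.locks[item] = {"mode": mode, "holders": {txn}}
        else:
            entry["holders"].add(txn)
            if mode == "X":
                entry["mode"] = "X"
        self.held.setdefault(txn, set()).add(item)

    def commit_or_abort(self, txn):
        # Strict 2PL: every lock is kept until the transaction ends, then all are
        # released at once, so no other transaction ever sees uncommitted data.
        for item in self.held.pop(txn, set()):
            entry = self.locks[item]
            entry["holders"].discard(txn)
            if not entry["holders"]:
                del self.locks[item]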

Importance and Functionality:

• Serializability: Ensures that the outcome of concurrent transactions is equivalent to some


serial execution of those transactions.

• Cascading Rollback Prevention: Because every exclusive lock is held until the transaction
commits or aborts, no other transaction can read uncommitted data, so cascading rollbacks are
avoided. Strict 2PL does not by itself prevent deadlocks; transactions can still wait indefinitely
for locks held by each other, so separate deadlock detection or prevention is required.

• Consistency: Prevents issues like lost updates and dirty reads by ensuring that transactions do
not interfere with each other improperly.
Q3. Explain Multiple Granularity
concurrency control scheme.

The Multiple Granularity Locking (MGL) concurrency control scheme is designed to


optimize locking in database systems by allowing locks to be taken on various
levels of granularity within a hierarchy of data items. Here’s an explanation of
how it works:
Hierarchy of Data Items:
In many database systems, data items are organized in a hierarchical structure.
For example, in a relational database, data might be organized into tables,
pages, and individual rows. The MGL scheme takes advantage of this hierarchy to
allow locks to be taken at different levels:
1. Coarse-granularity Locks: These locks cover large portions of the database, such
as entire tables or large segments of data. They are efficient for operations that
affect large parts of the database.
2. Fine-granularity Locks: These locks cover smaller portions of the database, such as individual
rows or small groups of rows. They are useful for operations that affect specific data items.

Key Concepts of Multiple Granularity Lock:

• Lock Compatibility: Just like in other locking schemes, MGL ensures that locks are
compatible to avoid conflicts between transactions. For example, a transaction holding a
shared lock at a higher level in the hierarchy can coexist with transactions holding locks on
lower levels, but exclusive locks at any level conflict with all other locks.

• Lock Conversion: Transactions can upgrade or downgrade locks within the hierarchy as
needed. For example, a transaction holding a shared lock at a higher level might upgrade it to
an exclusive lock at a lower level to perform an update.

• Hierarchical Navigation: Transactions must navigate the hierarchy correctly when acquiring
and releasing locks. For instance, a transaction might need to acquire a table-level lock before
acquiring finer-granularity locks on specific rows within that table.
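The hierarchical-navigation rule can be sketched in a few lines of Python. The sketch below is a simplification that uses only shared/exclusive modes and takes a shared lock on every ancestor before locking the target node; real MGL implementations use intention modes (IS, IX) on ancestors instead, and all names here are illustrative:

COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

class Node:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.locks = name, parent, []   # locks: (txn, mode)

def _grant(txn, node, mode):
    for other_txn, other_mode in node.locks:
        if other_txn != txn and not COMPATIBLE[(other_mode, mode)]:
            raise RuntimeError(f"{txn} conflicts on {node.name}")
    node.locks.append((txn, mode))

def lock(txn, node, mode):
    # Navigate the hierarchy top-down: lock every ancestor before the target itself.
    path = []
    n = node
    while n:
        path.append(n)
        n = n.parent
    for ancestor in reversed(path[1:]):      # root ... immediate parent of the target
        _grant(txn, ancestor, "S")
    _grant(txn, node, mode)

db = Node("database")
table = Node("Employees", parent=db)
row = Node("row:42", parent=table)
lock("T1", row, "X")   # implicitly locks "database" and "Employees" first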
Advantages of MGL:

• Reduced Locking Overhead: By allowing transactions to lock at various levels of


granularity, MGL can reduce contention and improve concurrency compared to simpler
locking schemes that only allow locks on a single granularity level

• Flexible and Efficient: MGL provides flexibility in choosing the appropriate level of
granularity for locks, allowing transactions to balance between fine-grained locking for
precise control and coarse-grained locking for efficiency

Implementation Considerations

Implementing MGL requires careful management of the lock hierarchy and ensuring that
transactions correctly acquire and release locks in the appropriate order to prevent deadlocks
and ensure data consistency.
In summary, the Multiple Granularity Locking scheme is a sophisticated approach to
concurrency control in database systems, leveraging hierarchical data structures to optimize
locking efficiency and maintain data integrity in multi-user environments.
Q4. Explain the ACID properties of
transactions.
In database management systems (DBMS), the ACID properties are fundamental
characteristics that guarantee reliability and consistency of transactions. Here’s an
explanation of each ACID property:

ACID Properties:
1. Atomicity:
 Definition: Atomicity ensures that a transaction is treated as a single unit of work,
which either completes fully or not at all.
 Key Points:
 If any part of a transaction fails (due to an error, system crash, etc.), the entire
transaction is aborted (rolled back), and any changes made by the transaction are
undone.
 Atomicity ensures that transactions maintain consistency in the face of failures and
errors.
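A quick way to see atomicity in practice is a small Python/sqlite3 sketch (the table and the simulated failure are made up for illustration): when the transaction fails partway through, none of its changes survive.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(amount):
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'A'", (amount,))
        raise RuntimeError("simulated crash before the matching credit")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'B'", (amount,))

try:
    transfer(70)
except RuntimeError:
    pass

print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# [('A', 100), ('B', 50)] -- the debit was rolled back, so the failed unit of work left no partial effect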
2. Consistency:

• Definition: Consistency ensures that a transaction transforms the database from one
consistent state to another consistent state.

• Key Points:
• A transaction must preserve all integrity constraints and business rules before and after its
execution.
• Consistency ensures that the database remains in a valid state at all times.

3. Isolation:
• Definition: Isolation ensures that the execution of concurrent transactions results in a
system state that would be obtained if transactions were executed sequentially, one after
another.

• Key Points:
• Transactions should operate independently of each other, as if no other transactions are
concurrently executing.
• Isolation prevents transactions from affecting each other’s intermediate states, ensuring they
maintain data integrity and consistency.
4. Durability:
• Definition: Durability guarantees that once a transaction is committed, its changes are
permanent and survive system failures.

• Key Points:
• Committed changes are stored permanently in non-volatile storage (e.g., disk) and are not lost
even in the event of a system crash or restart.
• Durability ensures that the effects of committed transactions persist, allowing for recovery
and restoration of the database to its last consistent state after a failure.

Importance of ACID Properties:

• Data Integrity: ACID properties ensure that transactions maintain the correctness and
reliability of data operations.
• Concurrency Control: Isolation and atomicity prevent issues like lost updates and dirty reads
that can occur when multiple transactions access and modify data concurrently.
• Recovery and Fault Tolerance: Durability guarantees that committed changes are resilient
against system failures, allowing for reliable recovery mechanisms.
Q5. What is a log file? Explain the following log-based recovery
schemes:
(i) Deferred DB modifications
(ii) Immediate DB modifications
Log File
A log file, often referred to as a transaction log or redo log, is a crucial component of
database systems used to ensure durability and support recovery mechanisms. It records all
changes made to the database, typically including:
 Transaction Begin and End: Recording when transactions start and commit or abort.
 Data Modifications: Details of changes (inserts, updates, deletes) made by
transactions.
 System Changes: Metadata changes, such as schema modifications.
 Checkpoints: Points in the log that signify stable states of the database.
The log file allows the database management system to recover the database to a
consistent state after a crash or failure, using techniques like redoing committed
transactions and undoing incomplete transactions.
Log-Based Recovery Schemes
In log-based recovery, the system uses the log file to ensure that all committed transactions are
durably stored (through redoing) and that any incomplete transactions are rolled back (through
undoing) after a crash or failure. There are two main strategies for handling database
modifications in a log-based recovery scheme:

(i) Deferred DB Modifications


• Explanation:

• Deferred Updates: In this approach, the modifications to the database (the actual data) made by a
transaction are deferred until the transaction commits.
• Log Record Structure: The log records contain the new values of the modified data items; no
before-images are needed, because the database itself is not changed before commit.
• Commit Phase: When a transaction commits, its log records are flushed to disk to ensure
durability. Only then are the actual database modifications (writes) performed, by applying
(redoing) the logged new values.
• Rollback Phase: If a transaction aborts (due to a failure or a rollback request), its log records are
simply ignored; no undo is needed because none of its changes ever reached the database.
Advantages:

• Recovery never needs an undo pass, since changes from uncommitted transactions never reach the database.
• Reduces the need for immediate disk writes, which can improve performance.

Disadvantages:

• Requires careful management of transaction states and log records to ensure consistency.
• Updates must be buffered until commit, and committed transactions must still be redone after a
crash, which can slow recovery for long or large transactions.

(ii) Immediate DB Modifications


• Explanation:

• Immediate Updates: In this approach, database modifications may be applied (written to
disk) as soon as a transaction performs them, even before the transaction commits.
• Log Record Structure: The log records contain both the before image (old value), used for
undo, and the after image (new value), used for redo of committed transactions.
• Commit Phase: Following the write-ahead logging rule, a transaction's log records are flushed
to disk before the corresponding data pages; by the time it commits, its changes may already be
present in the database file.
• Rollback Phase: If a transaction aborts, the system uses the log to revert its changes by
applying the before images stored in the log records.
Advantages:

• Changes can be written out as they occur, so a transaction does not have to buffer all of its
updates in memory until commit.
• Gives the system more freedom in scheduling when data pages are written to disk.

Disadvantages:

• Recovery is more complex: both undo (for uncommitted transactions) and redo (for committed
ones) may be required after a crash.
• Requires more frequent log and data writes, which can impact performance.
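To make the undo/redo idea concrete, here is a minimal Python sketch of a recovery pass over an immediate-modification log (the record layout is an assumption; real systems use write-ahead logging with LSNs and checkpoints):

log = [
    ("start", "T1"), ("write", "T1", "A", 100, 70), ("commit", "T1"),
    ("start", "T2"), ("write", "T2", "B", 50, 90),        # T2 never committed
]

db = {"A": 100, "B": 90}   # crash state: T2's change already reached the database

committed = {rec[1] for rec in log if rec[0] == "commit"}

# Redo phase: reapply the after-images of committed transactions, in log order.
for rec in log:
    if rec[0] == "write" and rec[1] in committed:
        _, txn, item, old, new = rec
        db[item] = new

# Undo phase: apply the before-images of uncommitted transactions, in reverse order.
for rec in reversed(log):
    if rec[0] == "write" and rec[1] not in committed:
        _, txn, item, old, new = rec
        db[item] = old

print(db)   # {'A': 70, 'B': 50}: T1's update survives, T2's update is rolled back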

Summary

Both deferred and immediate database modification strategies within a log-based recovery
scheme ensure that transactions adhere to ACID properties and that the database remains
consistent and recoverable in case of failures. The choice between deferred and immediate
modifications often depends on performance requirements, concurrency considerations, and
the recovery time objectives of the database system.
Q6. Explain clustered index, primary
index, and secondary index with examples.
Clustered Index
A clustered index is a type of index in which the rows of the table are physically organized in the same
order as the index key. This means that the data rows are stored on disk in a sorted order based on the
clustered index key. Each table can have only one clustered index because the data rows themselves can
only be sorted in one order.
Example:
Consider a table Employees with columns EmployeeID (primary key) and LastName. If EmployeeID is chosen
as the clustered index, then the rows in the Employees table will be physically sorted by EmployeeID.
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY CLUSTERED,
    LastName VARCHAR(50),
    FirstName VARCHAR(50),
    ...
);
Primary Index
A primary index is an index that is automatically created when a primary key constraint is
defined on a table. It enforces uniqueness for the primary key column(s) and provides fast
access to data rows based on the primary key values.

Example:

In the Employees table example above, EmployeeID is specified as the primary key. This
automatically creates a primary index on EmployeeID. The primary index structure would
typically store a mapping of EmployeeID values to the physical locations (disk addresses) of the
corresponding data rows in the table.
Secondary Index

A secondary index is an index created on one or more non-key columns of a table to improve the

efficiency of queries that do not search on the primary key. Unlike a clustered index, a
secondary index does not dictate the physical order of rows on disk.

Example:

Continuing with the Employees table example, let's say we frequently search employees by their
LastName. We can create a secondary index on the LastName column to speed up such queries:

CREATE INDEX IX_LastName ON Employees(LastName);

This IX_LastName index will contain entries for each LastName value, pointing to the disk
locations of corresponding rows in the Employees table.
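As a hedged illustration (using Python's built-in sqlite3 rather than the SQL Server-style syntax above, and an invented lookup value), the query planner can be asked whether such a secondary index would actually be used for an equality search on LastName:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT)")
conn.execute("CREATE INDEX IX_LastName ON Employees(LastName)")

# Ask the query planner how it would evaluate an equality search on the non-key column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Employees WHERE LastName = ?", ("Khan",)
).fetchall()
print(plan)   # the plan normally reports a SEARCH ... USING INDEX IX_LastName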
• Key Differences:

• Clustered Index: Defines the physical order of data rows on disk based on the index key.
There can only be one clustered index per table.
• Primary Index: Automatically created when a primary key constraint is defined. It enforces
uniqueness and provides fast access to data rows based on the primary key.
• Secondary Index: Created on non-key columns to improve the performance of queries that
do not use the primary key. It does not affect the physical order of data rows on disk.

• Summary

Clustered indexes are directly tied to the physical order of data rows, primary indexes enforce
uniqueness and are automatically created with a primary key constraint, and secondary indexes
provide efficient access paths based on non-key columns. Each serves specific purposes in
optimizing data retrieval and maintaining data integrity in database systems.
Q7. Explain deletion & insertion
operations in B+ Trees?
Insertion in B+ Trees in DBMS

1. Search for the Correct Leaf Node:


 Start from the root and traverse the tree to find the appropriate leaf
node where the new key should be inserted. This traversal is guided
by comparing the new key with the keys in the internal nodes to
decide the path.
2. Insert the Key in the Leaf Node:
 Insert the new key into the appropriate position in the leaf node to
maintain the sorted order.
3. Handle Overflow:

• If the leaf node becomes full (i.e., it exceeds the maximum number of keys it can hold), it
must be split:
• Create a new leaf node.
• Distribute the keys evenly between the original leaf node and the new leaf node.
• Update the linked list pointers of the leaf nodes.
• Promote the middle key (median) to the parent node to act as a separator.

4. Split the Parent Node (if necessary):


• If the parent node also overflows due to the promoted key, split it similarly and promote
the median key to the next higher level.
• Repeat this process until no overflow occurs or a new root is created.

Example:

Consider inserting key 30 into a B+ tree where the leaf node can hold a maximum of 4 keys.
• Before insertion: [10, 20, 40, 50]
• Insert 30, resulting in: [10, 20, 30, 40, 50] (overflow)
• Split into two nodes: [10, 20] and [30, 40, 50]
• Promote 30 to the parent node.
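The overflow handling in steps 3 and 4 can be sketched for a single leaf in a few lines of Python (function and constant names are illustrative; a full B+ tree would also update parent nodes and the leaf-chain pointers):

import bisect

MAX_KEYS = 4   # matches the example above

def insert_into_leaf(keys, new_key):
    # Insert new_key into a sorted leaf; on overflow split the leaf and return
    # (left, right, promoted_key), otherwise return (keys, None, None).
    bisect.insort(keys, new_key)
    if len(keys) <= MAX_KEYS:
        return keys, None, None
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]     # in a B+ tree the separator key is copied up

print(insert_into_leaf([10, 20, 40, 50], 30))
# ([10, 20], [30, 40, 50], 30): the leaf splits and 30 is promoted to the parent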
Deletion in B+ Trees in DBMS
1. Search for the Key:
• Traverse the tree from the root to find the leaf node containing the key to be deleted.

2. Delete the Key from the Leaf Node:


• Remove the key from the leaf node.

3. Handle Underflow:
• If the leaf node has fewer keys than the minimum required, it must be rebalanced:
• Redistribution (Borrowing): If a sibling node has more than the minimum number of
keys, borrow a key from the sibling. Adjust the keys and pointers accordingly.
• Merging: If redistribution is not possible, merge the underflowed node with a sibling. Remove
the separator key from the parent node and merge the two nodes' keys.

4. Adjust the Parent Node:


• If the parent node has fewer keys than required due to the removal of the separator key, it
might also need rebalancing.
• If the root node becomes empty, replace it with its only child, reducing the tree's height.
Example:
Consider deleting keys from a B+ tree whose leaf nodes must hold a minimum of 2 keys.
• Before deletion: [10, 20] and [30, 40, 50]
• Delete 30 from the second node, resulting in: [10, 20] and [40, 50] (no underflow, since two keys remain)
• Now delete 40 as well, resulting in: [10, 20] and [50] (underflow)
• The sibling [10, 20] holds only the minimum number of keys, so borrowing is not possible; the
two leaves are merged into [10, 20, 50] and the separator key is removed from the parent.
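The underflow handling can likewise be sketched for a single leaf and its right-hand sibling (an illustrative simplification with made-up key values; a real implementation would also consider the left sibling and rebalance the parent recursively):

MIN_KEYS = 2   # matches the example above

def delete_from_leaf(leaf, sibling, separator, key):
    # Delete key from leaf; on underflow borrow from the right sibling or merge.
    # Returns (leaf, sibling, new_separator); sibling becomes None after a merge.
    leaf.remove(key)
    if len(leaf) >= MIN_KEYS:
        return leaf, sibling, separator              # no underflow
    if len(sibling) > MIN_KEYS:                      # redistribution (borrowing)
        leaf.append(sibling.pop(0))
        return leaf, sibling, sibling[0]             # parent separator is updated
    merged = leaf + sibling                          # merging: separator is dropped
    return merged, None, None

print(delete_from_leaf([30, 40, 50], [60, 70], 60, 40))
# ([30, 50], [60, 70], 60): two keys remain, so there is no underflow
print(delete_from_leaf([40, 50], [60, 70, 80], 60, 40))
# ([50, 60], [70, 80], 70): underflow fixed by borrowing 60 from the sibling
print(delete_from_leaf([40, 50], [60, 70], 60, 40))
# ([50, 60, 70], None, None): the sibling cannot lend a key, so the nodes merge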
Q8. Explain Time-based protocol?

Time-based protocols are concurrency control mechanisms used in


database management systems (DBMS) to ensure that transactions are
executed in a way that maintains the consistency and isolation
properties of the database. One of the most well-known time-based
protocols is the Timestamp Ordering Protocol. This protocol assigns
timestamps to transactions to order their operations in a manner that
avoids conflicts and ensures serializability.
Timestamp Ordering Protocol
In the Timestamp Ordering Protocol, each transaction is assigned a
unique timestamp when it begins. This timestamp determines the order
in which transactions should logically occur. The protocol ensures that
transactions are executed in a way that respects these timestamps.
Key Components
Timestamps:
• Each transaction Ti is assigned a unique timestamp TS(Ti) at the start.
• These timestamps are typically generated using the system clock or a logical counter
to ensure uniqueness and monotonicity.
Read and Write Timestamps:
• Each data item Q in the database has two timestamps:
• RTS(Q): the largest timestamp of any transaction that has successfully read Q.
• WTS(Q): the largest timestamp of any transaction that has successfully written Q.

Rules for Operation Execution


1. Read Operation:
• When a transaction Ti issues a read(Q):
• If TS(Ti) < WTS(Q): the read operation is rejected and Ti is rolled back, because a
younger transaction has already written Q, violating the timestamp order.
• If TS(Ti) >= WTS(Q): the read operation is allowed, and RTS(Q) is updated to
max(RTS(Q), TS(Ti)).
2. Write Operation:
• When a transaction Ti issues a write(Q):
• If TS(Ti) < RTS(Q): the write operation is rejected and Ti is rolled back, because a younger
transaction has already read Q, violating the timestamp order.
• If TS(Ti) < WTS(Q): the write operation is rejected and Ti is rolled back, because a younger
transaction has already written Q.
• If TS(Ti) >= RTS(Q) and TS(Ti) >= WTS(Q): the write operation is allowed, and WTS(Q) is
updated to TS(Ti).
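The read and write rules above translate almost directly into code. Below is a compact Python sketch (class and method names are assumptions; a real scheduler would also restart rolled-back transactions with a new timestamp):

import itertools

class TimestampScheduler:
    def __init__(self):
        self.counter = itertools.count(1)
        self.ts = {}        # transaction -> timestamp
        self.rts = {}       # data item -> largest read timestamp
        self.wts = {}       # data item -> largest write timestamp
        self.db = {}

    def begin(self, txn):
        self.ts[txn] = next(self.counter)

    def read(self, txn, item):
        if self.ts[txn] < self.wts.get(item, 0):
            raise RuntimeError(f"rollback {txn}: a younger transaction already wrote {item}")
        self.rts[item] = max(self.rts.get(item, 0), self.ts[txn])
        return self.db.get(item)

    def write(self, txn, item, value):
        if self.ts[txn] < self.rts.get(item, 0) or self.ts[txn] < self.wts.get(item, 0):
            raise RuntimeError(f"rollback {txn}: timestamp order violated on {item}")
        self.wts[item] = self.ts[txn]
        self.db[item] = value

s = TimestampScheduler()
s.begin("T1"); s.begin("T2")        # TS(T1) = 1 < TS(T2) = 2
s.write("T2", "Q", 10)              # WTS(Q) = 2
try:
    s.read("T1", "Q")               # rejected: TS(T1) < WTS(Q)
except RuntimeError as e:
    print(e)                        # T1 must be rolled back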
Advantages
• Ensures Serializability: The protocol guarantees that the schedule of transactions is
serializable, meaning it is equivalent to some serial execution of transactions.
• No Deadlocks: Since transactions are rolled back instead of waiting for locks, deadlocks
are avoided.
Disadvantages
• Cascading Rollbacks: If a transaction is rolled back, it may cause other transactions to be
rolled back as well, leading to cascading rollbacks.
• Resource Intensive: Frequent rollbacks and the need to manage timestamps can be
resource-intensive, especially in high-concurrency environments.
• Starvation: Transactions with lower timestamps may be repeatedly rolled back if they
conflict with transactions with higher timestamps, leading to starvation.
Q9. Explain different types of file
operation methods
In a Database Management System (DBMS), file operations are fundamental to managing and
accessing the stored data. These operations ensure that data is stored efficiently, retrieved
quickly, and modified safely. Here are the different types of file operations typically performed in
a DBMS:
1. File Organization
 Heap (Unordered) Files: Records are placed in the file in no particular order. Retrieval requires
scanning the entire file, but insertion is efficient.
 Sequential (Ordered) Files: Records are stored in a specific order based on a key field. This
allows for efficient range queries and sequential access but can be slow for insertions and
deletions.
 Hashed Files: Records are placed based on a hash function applied to a key field. This allows
for fast access for equality searches but is not efficient for range queries.
 Clustered Files: Related records from different tables are stored together to improve the
performance of join operations.
2. File Operations

• Creation: Creating a new file involves allocating space for the file and setting up the necessary
metadata.
• Insertion: Adding a new record to a file. The method of insertion depends on the file organization.
• Deletion: Removing a record from a file. This may involve marking the record as deleted or physically
removing it.
• Modification: Updating an existing record in a file.
• Retrieval: Accessing records from a file based on some criteria (e.g., key, range).
• Scanning: Reading all records in a file. This is often used for operations like generating reports or performing backups.
• Sorting: Arranging records in a file in a specific order based on one or more fields.

3. File Access Methods

• Sequential Access: Reading records one by one in the order they are stored. Efficient for processing
entire files but not for random access.
• Random Access: Accessing records directly using an index or a key. Useful for quick retrieval but may
require additional structures like indexes.
• Indexed Access: Using an index to locate records quickly. The index can be based on primary or
secondary keys and can be implemented using structures like B+ trees or hash tables.
4. File Indexing

• Single-Level Indexing: A single index file is maintained to map keys to the locations of the
records.
• Primary Index: Based on the primary key of the file.
• Secondary Index: Based on non-primary key attributes.
• Multi-Level Indexing: A hierarchy of indexes is used to improve access times, especially for
large datasets.
• Dense Index: An index entry for every record in the data file.
• Sparse Index: Index entries only for some records. This reduces the size of the index but
may require more time to locate a record.
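The dense/sparse trade-off is easy to see in a toy example: with a sparse index holding only the first key of each block, a lookup binary-searches the index and then scans a single block. A small Python sketch (the block layout and names are assumptions):

import bisect

blocks = [[5, 9, 12], [17, 21, 30], [34, 40, 47]]    # data file blocks, sorted by key
sparse_index = [blk[0] for blk in blocks]            # one entry per block: [5, 17, 34]

def lookup(key):
    # Find the last index entry <= key, then scan only inside that block.
    pos = bisect.bisect_right(sparse_index, key) - 1
    if pos < 0:
        return None
    return key if key in blocks[pos] else None

print(lookup(21))   # 21: found by scanning only the second block
print(lookup(22))   # None: the sparse index still narrows the search to one block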

5. File Allocation Methods

• Contiguous Allocation: All the blocks of a file are contiguous. Simple and fast for sequential
access but can lead to fragmentation.
• Linked Allocation: Each file block points to the next block. This allows for flexible file sizes
but can be slower for random access.
• Indexed Allocation: An index block contains pointers to the actual data blocks of the file.
This combines the benefits of contiguous and linked allocation.
6. File Maintenance

• Defragmentation: Reorganizing the file to place records contiguously, reducing


fragmentation.
• Compaction: Removing deleted records and compressing the file to free up space.
• Backup and Recovery: Regularly copying files to backup storage and restoring them in case
of data loss.

7. Transaction Logging

• Write-Ahead Logging (WAL): Ensuring that all modifications are logged before they are
applied to the database. This helps in recovering the database to a consistent state after a
failure.
• Checkpointing: Periodically saving the current state of the database to reduce the amount
of work needed during recovery.
Q10. What is Serializability? Explain
conflict and view serializability
Serializability
Serializability is a concept in database systems used to ensure that the concurrent execution
of transactions results in a database state that is consistent with the state obtained by some
serial execution of the same transactions. In other words, even though transactions are
executed concurrently, the final outcome should be the same as if the transactions had been
executed one after the other in some order.

Conflict Serializability
Conflict serializability is a type of serializability that is based on the concept of conflicts
between operations. Two operations are said to be in conflict if:
1. They belong to different transactions.
2. They access the same data item.
3. At least one of the operations is a write.
For a schedule (sequence of operations) to be conflict-serializable, it must be conflict-
equivalent to some serial schedule. Conflict equivalence means that the order of conflicting
operations is the same in both schedules. The steps to check for conflict serializability are:

1. Construct a Precedence Graph (Serialization Graph):

• Nodes represent transactions.


• Directed edges represent conflicts between transactions.

2. Check for Cycles:

• If the precedence graph has no cycles, the schedule is conflict-serializable.


• If there is a cycle, the schedule is not conflict-serializable.
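These two steps can be sketched directly in Python: build the precedence graph from a schedule of (transaction, operation, data item) triples, then run a depth-first search for a cycle (the schedule format is an assumed simplification):

schedule = [("T1", "R", "A"), ("T2", "W", "A"), ("T2", "R", "B"), ("T1", "W", "B")]

edges = set()
for i, (ti, op_i, item_i) in enumerate(schedule):
    for tj, op_j, item_j in schedule[i + 1:]:
        # Conflict: different transactions, same item, at least one write.
        if ti != tj and item_i == item_j and "W" in (op_i, op_j):
            edges.add((ti, tj))          # earlier operation's txn -> later one's txn

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visiting, done = set(), set()
    def dfs(node):
        if node in visiting:
            return True
        if node in done:
            return False
        visiting.add(node)
        if any(dfs(nxt) for nxt in graph.get(node, ())):
            return True
        visiting.discard(node)
        done.add(node)
        return False
    return any(dfs(n) for n in graph)

print(edges)               # edges in both directions between T1 and T2
print(has_cycle(edges))    # True -> this schedule is not conflict-serializable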
View Serializability
View serializability is a broader concept than conflict serializability. A schedule is view-
serializable if it is view-equivalent to some serial schedule. Two schedules are view-equivalent
if:

1. Initial Read:
• For each data item, the transaction that reads the initial value of the data item is the same
in both schedules.

2. Final Write:
• For each data item, the transaction that performs the final write is the same in both
schedules.

3. Read-From:
• For each data item, if a transaction reads a value written by another transaction in one
schedule, it must read the same value written by the same transaction in the other
schedule.
Q11. Compare and contrast the hash
based indexing and tree based indexing?
Hash-Based Indexing vs. Tree-Based Indexing
Both hash-based indexing and tree-based indexing are common methods used in database management
systems to improve the speed of data retrieval. Here’s a detailed comparison of the two:
Hash-Based Indexing
1. Structure:
 Uses a hash function to convert a key into a location in an array (hash table).
 Each entry in the hash table points to a bucket containing records with keys that hash to the same
value.
2. Search Efficiency:
 Equality Searches: Extremely fast (O(1) average time complexity) because the hash function directly
maps keys to locations.
 Range Searches: Inefficient because hash functions do not preserve order. To perform a range search, a
full table scan is often required.
3. Insertion/Deletion:
• Generally fast (O(1) average time complexity) since it involves calculating a hash and
adding or removing a record from a bucket.
• Collisions (two keys hashing to the same value) require additional handling, such as
chaining or open addressing, which can affect performance.

4. Space Utilization:
• Can be inefficient if the hash table is sparse or if many
collisions occur, requiring additional storage for collision
resolution.

5. Complexity:
• Simpler to implement compared to tree-based indexes.
Tree-Based Indexing
1. Structure:
• Uses balanced tree structures, commonly B-trees or B+ trees, to maintain sorted order of
keys.
• Nodes contain key-value pairs and pointers to child nodes.
2. Search Efficiency:
• Equality Searches: Efficient (O(log n) time complexity) as the tree structure allows binary
search.
• Range Searches: Very efficient because the sorted nature of trees allows for in-order
traversal.
3. Insertion/Deletion:
• Involves rebalancing the tree to maintain its balanced property (O(log n) time complexity).
This can be slower compared to hash-based indexing but ensures that search operations
remain efficient.
4. Space Utilization:
• Typically more efficient in terms of space as it maintains a balanced structure with fewer
empty nodes compared to hash tables with many collisions.
5. Complexity:
• More complex to implement due to the need to maintain balance and handle various tree
operations (insert, delete, split, merge).
Comparison of hash-based and tree-based indexing:

• Data Structure: hash-based indexing uses a hash table; tree-based indexing uses B-trees or B+ trees.
• Search Efficiency: hash-based indexing is O(1) for equality searches but O(n) for range searches; tree-based indexing is O(log n) for both equality and range searches.
• Insertion/Deletion: hash-based indexing is O(1) on average but may degrade with collisions; tree-based indexing is O(log n).
• Space Utilization: hash-based indexing may have high overhead due to collisions; tree-based indexing is typically more space-efficient.
• Complexity: hash-based indexing is simpler to implement; tree-based indexing is more complex.
• Best Use Case: hash-based indexing suits equality searches; tree-based indexing suits range queries and ordered data.
When to Use Which?
Hash-Based Indexing is best suited for situations where you have a large number of
equality searches and relatively few range queries. For example, looking up user records
by unique user ID in a large user database.

Tree-Based Indexing is ideal for scenarios that require frequent range queries or
ordered traversals. For example, searching for records within a certain date range in a log
file.
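The trade-off can be illustrated with in-memory stand-ins: a Python dict plays the role of a hash index, and a sorted key list with bisect plays the role of an ordered (tree-like) index. All names and data below are made up for the example:

import bisect

records = {17: "log-17", 3: "log-3", 42: "log-42", 25: "log-25"}

hash_index = dict(records)          # hash-based stand-in: key -> record
ordered_keys = sorted(records)      # tree-like stand-in: keys kept in sorted order

print(hash_index[25])               # equality lookup, O(1) on average

lo, hi = 10, 30                     # range query: 10 <= key <= 30
start = bisect.bisect_left(ordered_keys, lo)
end = bisect.bisect_right(ordered_keys, hi)
print([records[k] for k in ordered_keys[start:end]])    # ['log-17', 'log-25']

# The same range query on the hash stand-in degenerates to a full scan:
print([v for k, v in hash_index.items() if lo <= k <= hi])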

By understanding the differences and appropriate use cases for hash-based and tree-based
indexing, database designers can choose the best indexing method to optimize
performance for their specific applications.
THANK YOU
