For the relation R(L, M, N, O, P) with functional dependencies {L->M, MN->P, PO->L}:
The candidate keys are {LNO, MNO, NOP}, since each closure contains every attribute:
(LNO)+ = {L, M, N, O, P}
(MNO)+ = {L, M, N, O, P}
(NOP)+ = {L, M, N, O, P}
(N and O appear on no right-hand side, so every key must contain both; adding any one of L, M or P then gives a minimal key.)
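These closures can be computed by repeatedly applying any dependency whose left-hand side is already contained in the result. A minimal Python sketch of that computation (the closure helper and the FD encoding are illustrative assumptions, not from the source):

def closure(attrs, fds):
    # Return the closure of `attrs` under `fds`, a list of (lhs, rhs) attribute sets.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:  # lhs covered, rhs adds something
                result |= rhs
                changed = True
    return result

fds = [({"L"}, {"M"}), ({"M", "N"}, {"P"}), ({"P", "O"}, {"L"})]
print(closure({"L", "N", "O"}, fds))  # {'L','M','N','O','P'}: LNO is a candidate key
print(closure({"M", "N", "O"}, fds))  # same full set for MNO
print(closure({"N", "O", "P"}, fds))  # same full set for NOP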
This relation is in 3NF: it is already in 2NF, and for every dependency either the left-hand side is a superkey or the right-hand side consists of prime attributes. In fact, every attribute of R appears in some candidate key, so there are no non-prime attributes at all, and no transitive dependency of a non-prime attribute on a candidate key is possible. (It is not in BCNF, since none of L, MN or PO is a superkey.)
Advantages of 3NF
• Reduced Redundancy: removing transitive dependencies eliminates repeated data.
• Improved Data Integrity: insertion, update and deletion anomalies are far less likely.
• Dependency Preservation: a lossless, dependency-preserving 3NF decomposition always exists.
Disadvantages of 3NF
• Increased Complexity
• Potential Performance Issues
• Overhead for Maintenance
Boyce-Codd Normal Form (BCNF)
BCNF stands for Boyce-Codd normal form and was introduced by R. F. Boyce and E. F. Codd in 1974. A relation is said to be in BCNF if these properties hold:
• The relation is in 3NF.
• For every non-trivial functional dependency X->Y, X is a superkey of the relation.
Example of BCNF
For the relation R(A, B, C, D) with functional dependencies {A->B, A->C, C->D, C->A}:
The candidate keys are A and C, since A+ = C+ = {A, B, C, D}. Every dependency has a candidate key (and hence a superkey) on its left-hand side, so the relation is in BCNF.
Advantages of BCNF
• Eliminates all redundancy that can be detected from functional dependencies alone.
• Provides a stronger guarantee than 3NF: every determinant is a superkey, so FD-based update anomalies cannot occur.
Disadvantages of BCNF:
• Greater Complexity
• Performance Trade-offs
• Increased Design and Maintenance Cost
S.No | 3NF | BCNF
1. | 3NF stands for Third Normal Form. | BCNF stands for Boyce-Codd Normal Form.
2. | In 3NF there should be no transitive dependency; that is, no non-prime attribute should be transitively dependent on the candidate key. | In BCNF, for any dependency A->B, A should be a superkey of the relation.
3. | A relation in 3NF must already be in 1NF and 2NF. | A relation in BCNF must already be in 1NF, 2NF and 3NF.
4. | 3NF preserves all functional dependencies. | BCNF may or may not preserve all functional dependencies.
5. | A table is in 3NF if it is in 2NF and, for each functional dependency X->Y, at least one of the following holds: (i) X is a superkey, or (ii) Y is a prime attribute of the table. | A table is in BCNF if it is in 3NF and, for each dependency X->Y, X is a superkey.
6. | 3NF can be obtained without sacrificing any dependencies. | Dependencies may not be preserved in BCNF.
7. | A lossless, dependency-preserving decomposition into 3NF always exists. | A lossless BCNF decomposition always exists, but it may not preserve all dependencies.
Q2. What is a Locking Protocol? Describe the
Strict Two-Phase Locking Protocol.
Locking Protocol:
A locking protocol is a mechanism used to manage concurrent access to shared resources, such as data in a
database, to ensure consistency and prevent conflicts between multiple transactions.
Strict Two-Phase Locking Protocol (Strict 2PL): In the strict two-phase locking protocol, transactions
follow these rules:
1. Growing Phase (Locking Phase):
• Transactions can acquire locks (shared or exclusive) on data items as needed during execution.
• Once a transaction releases a lock, it cannot acquire any new locks.
• This phase ensures that a transaction acquires all the locks it needs before releasing any; it is this two-phase discipline that guarantees conflict serializability (it does not, by itself, prevent deadlock).
2. Shrinking Phase (Unlocking Phase):
• Transactions can release locks but cannot acquire any new locks after releasing any lock.
• This phase ensures that a transaction only relinquishes locks and does not interfere with other
transactions acquiring locks.
• In the strict variant, all exclusive (write) locks are additionally held until the transaction commits or aborts, so no other transaction ever sees uncommitted data.
Benefits of Strict 2PL:
• Cascading-Abort Prevention: Because exclusive locks are held until commit, no transaction reads or overwrites another transaction's uncommitted values, so aborting one transaction never forces others to abort. (Deadlocks, however, are still possible and must be handled separately.)
• Consistency: Prevents issues like lost updates and dirty reads by ensuring that transactions do
not interfere with each other improperly.
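A minimal, single-threaded Python sketch of this bookkeeping (the LockManager and Transaction classes and the simplified conflict rule are illustrative assumptions; a real lock manager would queue and block waiters instead of raising):

class LockManager:
    def __init__(self):
        self.locks = {}  # item -> {"mode": "S" or "X", "holders": set of txn names}

    def acquire(self, txn, item, mode):
        entry = self.locks.setdefault(item, {"mode": "S", "holders": set()})
        others = entry["holders"] - {txn}
        # shared locks are compatible with each other; X conflicts with everything
        if others and (mode == "X" or entry["mode"] == "X"):
            raise RuntimeError(f"{txn} must wait for {item}")
        entry["holders"].add(txn)
        if mode == "X":
            entry["mode"] = "X"

    def release_all(self, txn):
        for entry in self.locks.values():
            entry["holders"].discard(txn)
            if not entry["holders"]:
                entry["mode"] = "S"

class Transaction:
    # Strict 2PL: locks are taken as needed (growing phase) and all of them,
    # including exclusive locks, are released only at commit or abort.
    def __init__(self, name, manager):
        self.name, self.manager, self.finished = name, manager, False

    def read(self, item):
        assert not self.finished, "no new locks after commit/abort"
        self.manager.acquire(self.name, item, "S")

    def write(self, item):
        assert not self.finished, "no new locks after commit/abort"
        self.manager.acquire(self.name, item, "X")

    def commit(self):
        self.manager.release_all(self.name)  # everything released together
        self.finished = True

manager = LockManager()
t1, t2 = Transaction("T1", manager), Transaction("T2", manager)
t1.write("A")            # T1 holds X(A) until it commits
try:
    t2.read("A")         # conflicts: T2 cannot see T1's uncommitted write (no dirty read)
except RuntimeError as e:
    print(e)
t1.commit()
t2.read("A")             # succeeds after T1 commits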
Q3. Explain the Multiple Granularity
concurrency control scheme.
The Multiple Granularity Locking (MGL) concurrency control scheme is designed to
optimize locking in database systems by allowing locks to be taken on various
levels of granularity within a hierarchy of data items. Here’s an explanation of
how it works:
Hierarchy of Data Items:
In many database systems, data items are organized in a hierarchical structure.
For example, in a relational database, data might be organized into tables,
pages, and individual rows. The MGL scheme takes advantage of this hierarchy to
allow locks to be taken at different levels:
1. Coarse-granularity Locks: These locks cover large portions of the database, such
as entire tables or large segments of data. They are efficient for operations that
affect large parts of the database.
2. Fine-granularity Locks: These locks cover smaller portions of the database, such as individual
rows or small groups of rows. They are useful for operations that affect specific data items.
• Intention Locks: In practice, MGL is implemented with intention lock modes, intention-shared (IS),
intention-exclusive (IX) and shared-intention-exclusive (SIX), which a transaction places on the
ancestors of a node before locking the node itself in shared (S) or exclusive (X) mode.
• Lock Compatibility: Just like in other locking schemes, MGL uses a compatibility matrix to avoid
conflicts between transactions. For example, a transaction holding a shared lock at a higher level in
the hierarchy can coexist with other readers and with IS requests at that level, but an exclusive lock
at any level conflicts with all other modes.
• Lock Conversion: Transactions can upgrade or downgrade locks within the hierarchy as
needed. For example, a transaction holding a shared lock at a higher level might instead take an
exclusive lock at a lower level to perform an update on just that item.
• Hierarchical Navigation: Transactions must navigate the hierarchy top-down when acquiring locks
(and release bottom-up). For instance, a transaction must acquire an intention lock on the table before
acquiring finer-granularity locks on specific rows within that table.
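A small Python sketch of the standard MGL compatibility matrix and the top-down intention-locking rule (the locks_needed helper and the path encoding are assumptions for illustration):

COMPAT = {  # given a held mode (key), the set of request modes compatible with it
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),
}

def compatible(held, requested):
    return requested in COMPAT[held]

def locks_needed(path, mode):
    # Locks a transaction must take, top-down, to lock the last node of `path`
    # in `mode`: intention locks on every ancestor, the real lock on the target.
    intent = "IS" if mode == "S" else "IX"
    return [(node, intent) for node in path[:-1]] + [(path[-1], mode)]

print(locks_needed(["database", "Employees", "page7", "row42"], "X"))
# [('database', 'IX'), ('Employees', 'IX'), ('page7', 'IX'), ('row42', 'X')]
print(compatible("IX", "IS"))   # True: two transactions may both intend to lock below
print(compatible("S", "IX"))    # False: a reader of the whole table blocks row writers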
Advantages of MGL:
• Flexible and Efficient: MGL provides flexibility in choosing the appropriate level of
granularity for locks, allowing transactions to balance fine-grained locking for
precise control against coarse-grained locking for low overhead.
Implementation Considerations
Implementing MGL requires careful management of the lock hierarchy and ensuring that
transactions correctly acquire and release locks in the appropriate order to prevent deadlocks
and ensure data consistency.
In summary, the Multiple Granularity Locking scheme is a sophisticated approach to
concurrency control in database systems, leveraging hierarchical data structures to optimize
locking efficiency and maintain data integrity in multi-user environments.
Q4. Explain the ACID properties of
transactions.
In database management systems (DBMS), the ACID properties are fundamental
characteristics that guarantee reliability and consistency of transactions. Here’s
an explanation of each ACID property:
ACID Properties:
1. Atomicity:
• Definition: Atomicity ensures that a transaction is treated as a single unit of
work, which either completes fully or not at all.
• Key Points:
• If any part of a transaction fails (due to an error, system crash, etc.), the entire
transaction is aborted (rolled back), and any changes made by the transaction
are undone.
• Atomicity ensures that transactions maintain consistency in the face of failures
and errors.
2. Consistency:
• Definition: Consistency ensures that a transaction transforms the database from one
consistent state to another consistent state.
• Key Points:
• A transaction must preserve all integrity constraints and business rules before and after its
execution.
• Consistency ensures that the database remains in a valid state at all times.
3. Isolation:
• Definition: Isolation ensures that the execution of concurrent transactions results in a
system state that would be obtained if transactions were executed sequentially, one after
another.
• Key Points:
• Transactions should operate independently of each other, as if no other transactions are
concurrently executing.
• Isolation prevents transactions from affecting each other’s intermediate states, ensuring they
maintain data integrity and consistency.
4. Durability:
• Definition: Durability guarantees that once a transaction is committed, its changes are
permanent and survive system failures.
• Key Points:
• Committed changes are stored permanently in non-volatile storage (e.g., disk) and are not lost
even in the event of a system crash or restart.
• Durability ensures that the effects of committed transactions persist, allowing for recovery
and restoration of the database to its last consistent state after a failure.
Importance of ACID Properties:
• Data Integrity: ACID properties ensure that transactions maintain the correctness and
reliability of data operations.
• Concurrency Control: Isolation and atomicity prevent issues like lost updates and dirty reads
that can occur when multiple transactions access and modify data concurrently.
• Recovery and Fault Tolerance: Durability guarantees that committed changes are resilient
against system failures, allowing for reliable recovery mechanisms.
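To make atomicity concrete, here is a small Python sketch using the standard-library sqlite3 module (the accounts table and transfer helper are illustrative assumptions; sqlite3's connection context manager commits on success and rolls back on an exception):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    # Atomicity: both updates commit together, or neither is applied.
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?",
                                      (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # aborts: triggers rollback
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
    except ValueError:
        pass  # the failed transfer left no partial update behind

transfer(conn, 1, 2, 500)  # fails: account 1 would go negative
print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# [(1, 100), (2, 50)]: the database is unchanged, as atomicity requires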
Q5. What is a log file? Explain the following log-based recovery
schemes:
(i) Deferred DB modifications
(ii) Immediate DB modifications
Log File
A log file, often referred to as a transaction log or redo log, is a crucial component of
database systems used to ensure durability and support recovery mechanisms. It records all
changes made to the database, typically including:
• Transaction Begin and End: Recording when transactions start and commit or abort.
• Data Modifications: Details of changes (inserts, updates, deletes) made by transactions.
• System Changes: Metadata changes, such as schema modifications.
• Checkpoints: Points in the log that signify stable states of the database.
The log file allows the database management system to recover the database to a
consistent state after a crash or failure, using techniques like redoing committed
transactions and undoing incomplete transactions.
Log-Based Recovery Schemes
In log-based recovery, the system uses the log file to ensure that all committed transactions are
durably stored (by redoing them) and that any incomplete transactions are rolled back (by
undoing them) after a crash or failure. There are two main strategies for handling database
modifications:
(i) Deferred Database Modification
• Deferred Updates: In this approach, the modifications to the database (actual data) by a
transaction are deferred until the transaction commits; until then, changes exist only in the log.
• Log Record Structure: The log records need to contain only the new values of modified data
items, since nothing is changed in the database before commit.
• Commit Phase: When a transaction commits, its log records are flushed to disk to ensure
durability. After this, the actual database modifications (writes) are performed.
• Rollback Phase: If a transaction aborts (due to failure or a rollback request), its log records are
simply discarded; no undo is needed because the database was never modified.
Advantages:
• Simplifies the recovery process, since there is no need to undo changes that were never applied.
• Allows for simpler management of transaction states and recovery logs.
Disadvantages:
• Updates must be buffered until commit, which delays writes and increases memory use for
long transactions.
(ii) Immediate Database Modification
• Immediate Updates: Modifications may be written to the database while the transaction is
still active, before it commits.
• Log Record Structure: The log records contain both the old and new values of modified data
items, and are written to the log before the data is actually changed (write-ahead logging).
• Recovery Phase: After a crash, committed transactions are redone using the logged new
values, and uncommitted transactions are undone using the logged old values.
Advantages:
• Changes can be flushed as they occur, so less work is pending at commit time.
Disadvantages:
• Requires careful management of transaction states and log records to ensure consistency.
• Recovery might be slower, since undoing changes is necessary after a crash.
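A tiny in-memory Python sketch of the immediate-modification scheme (the log format and the write/recover helpers are assumptions for illustration, not a real DBMS log):

db = {"A": 100, "B": 200}   # the "database"
log = []                    # append-only list standing in for the on-disk log file

def write(txn, item, new_value):
    log.append((txn, item, db[item], new_value))  # old and new values logged first (WAL)
    db[item] = new_value                          # only then is the database modified

def recover(committed):
    # Undo losers in reverse log order, then redo winners in forward order.
    for txn, item, old, new in reversed(log):
        if txn not in committed:
            db[item] = old
    for txn, item, old, new in log:
        if txn in committed:
            db[item] = new

write("T1", "A", 150)            # T1 later commits
write("T2", "B", 999)            # T2 is still active when the system crashes
recover(committed={"T1"})
print(db)                        # {'A': 150, 'B': 200}: T1 redone, T2 undone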
Summary
Both deferred and immediate database modification strategies within a log-based recovery
schema ensure that transactions adhere to ACID properties and that the database remains
consistent and recoverable in case of failures. The choice between deferred and immediate
modifications often depends on performance requirements, concurrency considerations, and
the recovery time objectives of the database system.
Q6. Explain clustered index, primary
index and secondary index with examples.
Clustered Index
A clustered index is a type of index in which the rows of the table are physically organized in the same order as the
index key. This means that the data rows are stored on disk in a sorted order based on the clustered index key. Each
table can have only one clustered index because the data rows themselves can only be sorted in one order.
Example:
Consider a table Employees with columns EmployeeID (primary key) and LastName. If EmployeeID is chosen as the
clustered index, then the rows in the Employees table will be physically sorted by EmployeeID.
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY CLUSTERED,
    LastName VARCHAR(50),
    FirstName VARCHAR(50),
    ...
);
Primary Index
A primary index is an index that is automatically created when a primary key constraint is
defined on a table. It enforces uniqueness for the primary key column(s) and provides fast
access to data rows based on the primary key values.
Example:
In the Employees table example above, EmployeeID is specified as the primary key. This
automatically creates a primary index on EmployeeID. The primary index structure would
typically store a mapping of EmployeeID values to the physical locations (disk addresses) of the
corresponding data rows in the table.
Secondary Index
A secondary index is an index defined on one or more non-key (or non-ordering) columns to speed up
queries that do not use the primary key. It does not change the physical order of the data rows, so a
table can have many secondary indexes.
Example:
Continuing with the Employees table example, let's say we frequently search employees by their
LastName. We can create a secondary index on the LastName column to speed up such queries:
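CREATE INDEX IX_LastName ON Employees (LastName);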
This IX_LastName index will contain entries for each LastName value, pointing to the disk
locations of corresponding rows in the Employees table.
Key Differences:
• Clustered Index: Defines the physical order of data rows on disk based on the index key.
There can only be one clustered index per table.
• Primary Index: Automatically created when a primary key constraint is defined. It enforces
uniqueness and provides fast access to data rows based on the primary key.
• Secondary Index: Created on non-key columns to improve the performance of queries that
do not use the primary key. It does not affect the physical order of data rows on disk.
Summary
Clustered indexes are directly tied to the physical order of data rows, primary indexes enforce
uniqueness and are automatically created with a primary key constraint, and secondary indexes
provide efficient access paths based on non-key columns. Each serves specific purposes in
optimizing data retrieval and maintaining data integrity in database systems.
Q7. Explain deletion & insertion operations in
B+ Trees?
Insertion in B+ Trees
1. Search for the Leaf:
• Traverse the tree from the root to the leaf node where the new key belongs, and insert the key
into the leaf in sorted order.
2. Handle Overflow:
• If the leaf node becomes full (i.e., it exceeds the maximum number of keys it can hold), it
must be split:
• Create a new leaf node.
• Distribute the keys evenly between the original leaf node and the new leaf node.
• Update the linked-list pointers of the leaf nodes.
• Copy the first key of the new (right) leaf up to the parent node to act as a separator. (Unlike in
a B-tree, the separator also remains in the leaf.)
Example:
Consider inserting key 30 into a B+ tree where the leaf node can hold a maximum of 4 keys.
• Before insertion: [10, 20, 40, 50]
• Insert 30, resulting in: [10, 20, 30, 40, 50]
• Overflow: split into two nodes, [10, 20] and [30, 40, 50].
• Copy 30 up to the parent node as the separator (it also remains in the right leaf).
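A minimal Python sketch of the leaf-split step from the insertion algorithm above (the list-based leaf representation and insert_into_leaf helper are assumptions):

import bisect

def insert_into_leaf(keys, key, max_keys):
    # Insert `key` into the sorted leaf `keys`. On overflow, split the leaf and
    # return (left, right, separator); the separator is copied up to the parent.
    bisect.insort(keys, key)
    if len(keys) <= max_keys:
        return keys, None, None               # fits: no split needed
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]              # B+ tree: separator stays in the right leaf

left, right, sep = insert_into_leaf([10, 20, 40, 50], 30, max_keys=4)
print(left, right, sep)   # [10, 20] [30, 40, 50] 30  (matches the example above)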
Deletion in B+ Trees
1. Search for the Key:
• Traverse the tree from the root to find the leaf node containing the key to be deleted.
2. Delete the Key:
• Remove the key (and its associated record pointer) from the leaf node.
3. Handle Underflow:
• If the leaf node now has fewer keys than the minimum required, it must be rebalanced:
• Redistribution (Borrowing): If a sibling node has more than the minimum number of
keys, borrow a key from the sibling. Adjust the keys and pointers accordingly.
• Merging: If redistribution is not possible, merge the underflowed node with a sibling. Remove
the separator key from the parent node and merge the two nodes' keys.
1. File Operations
• Creation: Creating a new file involves allocating space for the file and setting up the necessary
metadata.
• Insertion: Adding a new record to a file. The method of insertion depends on the file organization.
• Deletion: Removing a record from a file. This may involve marking the record as deleted or physically
removing it.
• Modification: Updating an existing record in a file.
• Retrieval: Accessing records from a file based on some criteria (e.g., key, range).
• Scanning: Reading all records in a file. This is often used for operations like generating reports
or performing backups.
• Sorting: Arranging records in a file in a specific order based on one or more fields.
3. File Access Methods
• Sequential Access: Reading records one by one in the order they are stored. Efficient for processing
entire files but not for random access.
• Random Access: Accessing records directly using an index or a key. Useful for quick retrieval but may
require additional structures like indexes.
• Indexed Access: Using an index to locate records quickly. The index can be based on primary or
secondary keys and can be implemented using structures like B+ trees or hash tables.
4. File Indexing
• Single-Level Indexing: A single index file is maintained to map keys to the locations of the
records.
• Primary Index: Based on the primary key of the file.
• Secondary Index: Based on non-primary key attributes.
• Multi-Level Indexing: A hierarchy of indexes is used to improve access times, especially for
large datasets.
• Dense Index: An index entry for every record in the data file.
• Sparse Index: Index entries only for some records. This reduces the size of the index but
may require more time to locate a record.
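As an illustration, a sparse index can be sketched in Python with one entry per block of a sorted file (the block layout and lookup helper are assumptions):

import bisect

blocks = [[(1, "a"), (3, "b")], [(5, "c"), (8, "d")], [(9, "e"), (12, "f")]]
sparse_index = [block[0][0] for block in blocks]   # first key of each block only

def lookup(key):
    i = bisect.bisect_right(sparse_index, key) - 1  # block that may hold the key
    if i < 0:
        return None
    return dict(blocks[i]).get(key)                 # scan only that block

print(lookup(8))   # 'd': found in block 1
print(lookup(2))   # None: falls in block 0's range but is absent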
5. File Allocation
• Contiguous Allocation: All the blocks of a file are contiguous. Simple and fast for sequential
access but can lead to fragmentation.
• Linked Allocation: Each file block points to the next block. This allows for flexible file sizes
but can be slower for random access.
• Indexed Allocation: An index block contains pointers to the actual data blocks of the file.
This combines the benefits of contiguous and linked allocation.
6. File Maintenance
• Reorganization: Periodically rewriting a file to reclaim space from deleted records and keep
access efficient.
• Backup: Copying files so they can be restored after corruption or loss.
7. Transaction Logging
• Write-Ahead Logging (WAL): Ensuring that all modifications are logged before they are
applied to the database. This helps in recovering the database to a consistent state after a
failure.
• Checkpointing: Periodically saving the current state of the database to reduce the amount
of work needed during recovery.
Q10. What is Serializability? Explain conflict
and view serializability
Serializability
Serializability is a concept in database systems used to ensure that the concurrent execution of
transactions results in a database state that is consistent with the state obtained by some serial
execution of the same transactions. In other words, even though transactions are executed concurrently,
the final outcome should be the same as if the transactions had been executed one after the other in
some order.
Conflict Serializability
Conflict serializability is a type of serializability that is based on the concept of conflicts between
operations. Two operations are said to be in conflict if:
1. They belong to different transactions.
2. They access the same data item.
3. At least one of the operations is a write.
For a schedule (a sequence of operations) to be conflict-serializable, it must be conflict-
equivalent to some serial schedule. Conflict equivalence means that the order of every pair of
conflicting operations is the same in both schedules. Conflict serializability is usually tested with a
precedence graph: one node per transaction, and an edge Ti -> Tj whenever an operation of Ti
conflicts with and precedes an operation of Tj. The schedule is conflict-serializable if and only if
this graph contains no cycle.
View Serializability
View serializability is a weaker (more permissive) form: a schedule is view-serializable if it is
view-equivalent to some serial schedule. Two schedules are view-equivalent if the following
three conditions hold:
1. Initial Read:
• For each data item, the transaction that reads the initial value of the data item is the same
in both schedules.
2. Final Write:
• For each data item, the transaction that performs the final write is the same in both
schedules.
3. Read-From:
• For each data item, if a transaction reads a value written by another transaction in one
schedule, it must read the same value written by the same transaction in the other
schedule.
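A short Python sketch of the precedence-graph test described above (the schedule encoding and conflict_serializable helper are assumptions for illustration):

from collections import defaultdict

def conflict_serializable(schedule):
    # schedule: list of (txn, op, item) with op in {"R", "W"}.
    # Build edges Ti -> Tj for each pair of conflicting operations, then check for cycles.
    edges = defaultdict(set)
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op_i, op_j):
                edges[ti].add(tj)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)
    def dfs(u):
        color[u] = GRAY
        for v in edges[u]:
            if color[v] == GRAY or (color[v] == WHITE and dfs(v)):
                return True    # back edge found: the graph has a cycle
        color[u] = BLACK
        return False
    return not any(dfs(t) for t in list(edges) if color[t] == WHITE)

# R1(A) W2(A) W1(A): edges T1->T2 and T2->T1 form a cycle, so not serializable
print(conflict_serializable([("T1","R","A"), ("T2","W","A"), ("T1","W","A")]))  # False
print(conflict_serializable([("T1","R","A"), ("T1","W","A"), ("T2","W","A")]))  # True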
Q11. Compare and contrast the hash based
indexing and tree based indexing?
Hash-Based Indexing vs. Tree-Based Indexing
Both hash-based indexing and tree-based indexing are common methods used in database management
systems to improve the speed of data retrieval. Here’s a detailed comparison of the two:
Hash-Based Indexing
1. Structure:
• Uses a hash function to convert a key into a location in an array (hash table).
• Each entry in the hash table points to a bucket containing records with keys that hash to the same
value.
2. Search Efficiency:
• Equality Searches: Extremely fast (O(1) average time complexity) because the hash function directly
maps keys to locations.
• Range Searches: Inefficient because hash functions do not preserve order. To perform a range search,
a full table scan is often required.
3. Insertion/Deletion:
• Generally fast (O(1) average time complexity) since it involves
calculating a hash and adding/removing a record from a
bucket.
• Collisions (two keys hashing to the same value) require additional
handling, such as chaining or open addressing, which can affect
performance.
4. Space Utilization:
• Can be inefficient if the hash table is sparse or if many
collisions occur, requiring additional storage for collision
resolution.
5. Complexity:
• Simpler to implement compared to tree-based indexes.
Tree-Based Indexing
1. Structure:
• Uses balanced tree structures, commonly B-trees or B+ trees, to maintain sorted order of
keys.
• Nodes contain key-value pairs and pointers to child nodes.
2. Search Efficiency:
• Equality Searches: Efficient (O(log n) time complexity) as the tree structure allows binary
search.
• Range Searches: Very efficient because the sorted nature of trees allows for in-order
traversal.
3. Insertion/Deletion:
• Involves rebalancing the tree to maintain its balanced property (O(log n) time complexity).
This can be slower compared to hash-based indexing but ensures that search operations
remain efficient.
4. Space Utilization:
• Typically more efficient in terms of space as it maintains a balanced structure with fewer
empty nodes compared to hash tables with many collisions.
5. Complexity:
• More complex to implement due to the need to maintain balance and handle various tree
operations (insert, delete, split, merge).
Feature | Hash-Based Indexing | Tree-Based Indexing
Search Efficiency | O(1) for equality, O(n) for range | O(log n) for equality and range
Hash-Based Indexing is ideal for workloads dominated by equality lookups, for example fetching a
record by its exact key. Tree-Based Indexing is ideal for scenarios that require frequent range queries
or ordered traversals, for example searching for records within a certain date range in a log file.
By understanding the differences and appropriate use cases for hash-based and tree-based
indexing, database designers can choose the best indexing method to optimize
performance for their specific applications.
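To make the trade-off concrete, a small Python sketch using a dict as a stand-in for a hash index and a sorted key list (via bisect) as a stand-in for a tree index (both stand-ins are illustrative assumptions):

import bisect

records = {17: "alice", 3: "bob", 42: "carol", 8: "dave"}

hash_index = dict(records)        # O(1) average equality lookup, no key order
tree_keys = sorted(records)       # ordered, like the leaf level of a B+ tree

print(hash_index[42])             # equality search: one direct hash lookup

def range_query(lo, hi):
    # Range scan: two binary searches plus a slice on the ordered structure;
    # the hash index would need a full scan for the same query.
    i = bisect.bisect_left(tree_keys, lo)
    j = bisect.bisect_right(tree_keys, hi)
    return [(k, records[k]) for k in tree_keys[i:j]]

print(range_query(5, 20))         # [(8, 'dave'), (17, 'alice')]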
THANK YOU