DD Final
Ques: Write down the steps of the query decomposition process, including transforming the query into a normalized form.
Ans:
1. Normalization: Transform the query into a normalized form
2. Analysis: Detect and reject "incorrect" queries; possible only for a subset of relational
calculus
3. Elimination of redundancy: Eliminate redundant predicates
4. Rewriting: Transform the query into relational algebra (RA) and optimize it
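The normalization step can be sketched as a rewrite of the query qualification into conjunctive normal form (CNF). This is only an illustrative sketch: predicates are modeled as nested tuples, and the attribute names are invented.

```python
# Minimal sketch of the normalization step: rewriting a WHERE-clause
# predicate into conjunctive normal form (CNF). Predicates are nested
# tuples ('and'/'or', left, right); atoms are plain strings.

def to_cnf(expr):
    """Recursively distribute OR over AND: (p AND q) OR r -> (p OR r) AND (q OR r)."""
    if not isinstance(expr, tuple):
        return expr                      # atomic predicate, e.g. 'DUR=12'
    op, left, right = expr
    left, right = to_cnf(left), to_cnf(right)
    if op == 'or':
        if isinstance(left, tuple) and left[0] == 'and':
            return ('and', to_cnf(('or', left[1], right)),
                           to_cnf(('or', left[2], right)))
        if isinstance(right, tuple) and right[0] == 'and':
            return ('and', to_cnf(('or', left, right[1])),
                           to_cnf(('or', left, right[2])))
    return (op, left, right)

# (DUR=12 AND DUR=24) OR PNAME="CAD"  becomes a conjunction of disjunctions
query = ('or', ('and', 'DUR=12', 'DUR=24'), 'PNAME="CAD"')
print(to_cnf(query))
# ('and', ('or', 'DUR=12', 'PNAME="CAD"'), ('or', 'DUR=24', 'PNAME="CAD"'))
```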
Ques: Write down the properties of transaction. [OR]
Ques: Write down the ACID properties of Transaction.
Ans:
➢ Atomicity − This property states that a transaction is an atomic unit of processing, that is,
either it is performed in its entirety or not performed at all. No partial update should exist.
➢ Consistency − A transaction should take the database from one consistent state to another
consistent state. It should not adversely affect any data item in the database.
➢ Isolation − A transaction should be executed as if it is the only one in the system. There
should not be any interference from the other concurrent transactions that are
simultaneously running.
➢ Durability − If a committed transaction brings about a change, that change should be
durable in the database and not lost in case of any failure.
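Atomicity in particular can be demonstrated with Python's built-in sqlite3 module: if any step of the transaction fails, a rollback leaves no partial update behind. The table and account names below are invented for the example.

```python
# Sketch of atomicity using sqlite3: an aborted transfer leaves
# the database exactly as it was before the transaction began.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    conn.execute("UPDATE account SET balance = balance - 70 WHERE name = 'alice'")
    # simulate a failure partway through the transfer
    raise RuntimeError("transfer target missing")
except RuntimeError:
    conn.rollback()                      # atomicity: no partial update survives

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)                          # {'alice': 100, 'bob': 50}
```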
Ques: Why do we need data localization? Also mention its issues.
Ans: We need data localization to
- Apply data distribution information to the algebra operations and determine which
fragments are involved
- Substitute global query with queries on fragments
- Optimize the global query
Data Localization Issues
➢ Various more advanced reduction techniques are possible to generate simpler and
optimized queries.
➢ Reduction of horizontal fragmentation (HF)
– Reduction with selection
– Reduction with join
➢ Reduction of vertical fragmentation (VF)
– Find empty relations
➢ Reduction with selection for HF
– Can be applied if fragmentation predicate is inconsistent with the query selection
predicate
➢ Reduction with join for HF
– Joins on horizontally fragmented relations can be simplified when the joined relations
are fragmented according to the join attributes
➢ Reduction with join for derived HF
– The horizontal fragmentation of one relation is derived from the horizontal
fragmentation of another relation by using semijoins.
➢ Reduction for Vertical Fragmentation
– Recall, VF distributes a relation based on projection, and the reconstruction operator
is the join.
– Similar to HF, it is possible to identify useless intermediate relations, i.e., fragments
that do not contribute to the result.
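Reduction with selection can be sketched in a few lines, assuming a relation horizontally fragmented by value ranges on one attribute; a fragment whose defining predicate contradicts the query's selection predicate can only produce an empty result and is dropped. The fragment names and ranges below are invented.

```python
# Illustrative sketch of "reduction with selection" for horizontal
# fragmentation: keep only fragments whose defining predicate can
# overlap the query's selection predicate.

FRAGMENTS = {
    "EMP1": (0, 10),      # EMP1 = sigma(ENO <= 10)(EMP)
    "EMP2": (11, 20),     # EMP2 = sigma(11 <= ENO <= 20)(EMP)
    "EMP3": (21, 30),     # EMP3 = sigma(21 <= ENO <= 30)(EMP)
}

def relevant_fragments(query_lo, query_hi):
    """Keep only fragments whose range intersects the query's selection range."""
    return [name for name, (lo, hi) in FRAGMENTS.items()
            if lo <= query_hi and query_lo <= hi]

# Query sigma(ENO = 5)(EMP): only EMP1 can contribute tuples,
# so the localized query touches one fragment instead of three.
print(relevant_fragments(5, 5))    # ['EMP1']
print(relevant_fragments(8, 15))   # ['EMP1', 'EMP2']
```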
Ques: Write down the steps of the Two-Phase Locking (2PL) Protocol.
Ans:
1. Growing phase: the transaction acquires the locks it needs; it may obtain new locks but may not release any lock.
2. Shrinking phase: the transaction releases its locks; once it has released a lock, it may not acquire any new lock.
The point at which the transaction holds all of its locks is called the lock point. Executing all transactions under 2PL guarantees serializable schedules.
Ques: What do you mean by Local Recovery Management (LRM) in DDBMS? [OR]
Ques: Discuss about the architecture of the Local Recovery Management System.
Ans: The Local Recovery Manager (LRM) is the component at each site that maintains the atomicity and durability of local transactions. Architecturally, it works together with the database buffer manager: data pages move between stable storage (the stable database) and volatile storage (the database buffer), and the LRM records changes in a stable log so that, after a failure, it can redo the updates of committed transactions and undo those of uncommitted ones.
Ques: In how many ways can the LRM deal with update/write operations? Describe them.
Ans: The LRM can execute updates in two ways:
- In-place update: the update physically changes the value in the stable database, so the old value is lost; a log is required in order to undo or redo the change after a failure.
- Out-of-place update: the new value is stored separately from the old one, and the set of out-of-place values is periodically integrated into the stable database (e.g., shadowing).
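A minimal sketch of the out-of-place (shadow-copy) strategy, under the usual textbook reading: updates go to a separate copy, commit is an atomic pointer switch, and abort simply discards the copy. All class and variable names here are invented for illustration.

```python
# Sketch of out-of-place update: the stable state is never modified
# directly, so abort requires no undo work and commit is a pointer swap.

class ShadowStore:
    def __init__(self, data):
        self.current = dict(data)     # stable database state
        self.shadow = None            # uncommitted out-of-place copy

    def begin(self):
        self.shadow = dict(self.current)   # updates go to the copy

    def write(self, key, value):
        self.shadow[key] = value           # stable state untouched

    def commit(self):
        self.current, self.shadow = self.shadow, None  # atomic switch

    def abort(self):
        self.shadow = None                 # just discard the copy

db = ShadowStore({"x": 1})
db.begin(); db.write("x", 99); db.abort()
print(db.current["x"])    # 1 -- abort needed no undo work
db.begin(); db.write("x", 99); db.commit()
print(db.current["x"])    # 99
```

An in-place scheme would instead overwrite `current` directly and rely on a log to restore the old value on abort.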
Ques: Is there any difference between Two-Phase “Locking” and Two-Phase “Commit”?
Ans: They are largely unrelated; it just happens that they have two-phase in their name.
2PL is a scheme for acquiring locks for records in a transaction; it is useful in both non-distributed
and distributed settings.
2PC is a scheme to execute a transaction across multiple machines, where each machine has some
of the records used in the transaction.
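The 2PC side of this distinction can be illustrated with a toy coordinator loop: collect votes in phase one, broadcast the global decision in phase two. This is a deliberately simplified sketch with invented names; timeouts, logging, and failure handling are omitted.

```python
# Minimal sketch of centralized two-phase commit. Participants are
# plain functions that receive coordinator messages and reply.

def two_phase_commit(participants):
    # Phase 1 (voting): every participant must vote "ready"
    votes = [p("prepare") for p in participants]
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    # Phase 2 (decision): broadcast the global decision to everyone
    for p in participants:
        p(decision)
    return decision

def make_participant(will_vote_ready, log):
    def participant(msg):
        log.append(msg)                  # record every message received
        return "ready" if (msg == "prepare" and will_vote_ready) else msg
    return participant

log = []
ps = [make_participant(True, log), make_participant(False, log)]
print(two_phase_commit(ps))   # 'abort' -- one participant refused
```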
Ques: What do you know about the data localization?
Ans: Data localization takes as input the decomposed query on global relations and applies data
distribution information to the query in order to localize its data.
To increase the locality of reference and/or parallel execution, relations are fragmented and then
stored in disjoint subsets, called fragments, each being placed at a different site.
Data localization determines which fragments are involved in the query and thereby transforms
the distributed query into a fragment query.
Ques: What do you know about query optimization?
Ans: Query Optimization
➢ Query optimization is a crucial and difficult part of the overall query processing
➢ Objective of query optimization is to minimize the following cost function:
I/O cost + CPU cost + communication cost
➢ Two different scenarios are considered:
- Wide area networks
∗ Communication cost dominates: Low bandwidth, low speed, and high protocol
overhead
∗ Most algorithms ignore all other cost components
- Local area networks
∗ Communication cost not that dominant
∗ Total cost function should be considered
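The two scenarios can be sketched as two instantiations of the same cost function; the concrete numbers and the simplification of "ignore all other components" to dropping the I/O and CPU terms are illustrative assumptions, not a real optimizer's model.

```python
# Sketch of the optimization objective: total cost = I/O + CPU + communication.
# In a WAN the communication term usually dominates, so many algorithms
# minimize it alone; in a LAN the full cost function is considered.

def total_cost(io, cpu, comm, network="lan"):
    if network == "wan":
        return comm            # WAN: other components often ignored
    return io + cpu + comm     # LAN: consider the total cost function

plan = {"io": 120.0, "cpu": 8.0, "comm": 300.0}
print(total_cost(**plan, network="wan"))   # 300.0
print(total_cost(**plan, network="lan"))   # 428.0
```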
Ques: How do we identify and reject type-incorrect or semantically incorrect queries in
query decomposition?
Ans:
Type Incorrect
– Checks whether the attributes and relation names of a query are defined in the global schema
– Checks whether the operations on attributes do not conflict with the types of the attributes, e.g.,
a comparison > operation with an attribute of type string
Semantically Incorrect
– Checks whether the components contribute in any way to the generation of the result
– Only a subset of relational calculus queries can be tested for correctness, i.e., those that do not
contain disjunction and negation
– Typical data structures used to detect the semantically incorrect queries are:
∗ Connection graph (query graph)
∗ Join graph
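The connection-graph test can be sketched as a simple connectivity check: if the graph of relations linked by join predicates is disconnected, some relation cannot contribute to the result, so the query is flagged as semantically incorrect. The relation names below are illustrative.

```python
# Sketch of semantic checking with a connection (query) graph:
# a disconnected graph signals a semantically incorrect query.
from collections import deque

def is_connected(nodes, edges):
    """BFS over the undirected query graph of relations and join predicates."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    seen, queue = {nodes[0]}, deque([nodes[0]])
    while queue:
        for nxt in adj[queue.popleft()] - seen:
            seen.add(nxt); queue.append(nxt)
    return seen == set(nodes)

# SELECT ... FROM EMP, ASG, PROJ WHERE EMP.ENO = ASG.ENO
# PROJ is never joined, so the query graph is disconnected.
print(is_connected(["EMP", "ASG", "PROJ"], [("EMP", "ASG")]))   # False
```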
Ques: What do you know about Transaction?
Ans: A transaction consists of a series of operations performed on a database. The important issue
in transaction management is that if a database was in a consistent state prior to the initiation of a
transaction, then the database should return to a consistent state after the transaction is completed.
Ques: What do you mean by dirty read?
Ans: A dirty read occurs when a transaction reads data that has not yet been committed.
For example, suppose transaction 1 updates a row. Transaction 2 reads the updated row before
transaction 1 commits the update. If transaction 1 rolls back the change, transaction 2 will have
read data that is considered never to have existed.
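The example above can be simulated with a toy row object that deliberately lacks isolation; a real DBMS at READ COMMITTED or stricter would never return the uncommitted value. The class and variable names are invented.

```python
# Toy simulation of the dirty-read anomaly: T2 reads a value written by
# T1 before T1 commits; after T1 rolls back, T2 has seen data that
# officially never existed.

class Row:
    def __init__(self, value):
        self.committed = value
        self.uncommitted = None   # pending write by an open transaction

    def write(self, value):       # T1 updates but does not commit yet
        self.uncommitted = value

    def dirty_read(self):         # T2 reads without any isolation
        return self.uncommitted if self.uncommitted is not None else self.committed

    def rollback(self):           # T1 aborts its update
        self.uncommitted = None

row = Row(100)
row.write(999)                    # T1's uncommitted update
seen_by_t2 = row.dirty_read()     # T2 reads the dirty value
row.rollback()                    # T1 aborts
print(seen_by_t2, row.dirty_read())   # 999 100 -- T2 saw phantom data
```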
Ques: Write down the properties of Strict 2PL (Two Phase Locking) protocol.
Ans:
➢ In Strict 2PL, a transaction holds all of its locks until it commits or aborts; locks are released only after the commit/abort point, not during a shrinking phase.
➢ It avoids cascading aborts, because no other transaction can read or overwrite a value written by an uncommitted transaction.
➢ It guarantees both serializability (like basic 2PL) and strict, hence recoverable, schedules.
Ques: What do you know about Time-Stamp Ordering?
Ans: Timestamp ordering (TO) assigns each transaction a unique timestamp when it starts and requires conflicting operations to execute in timestamp order. An operation that arrives "too late" (i.e., it conflicts with an already-executed operation of a younger transaction) causes its transaction to be aborted and restarted with a new timestamp.

Schedule
A series of operations from one transaction to another is known as a schedule. It is used
to preserve the order of the operations in each of the individual transactions.
Example: S = r1(A) w1(A) r2(A) w2(A) is a serial schedule in which T1 runs to completion before T2.
Ques: Write down the difference between in-place and out-place update.
Ans:
- In-place update: directly changes the value in the stable database, so the old value is lost; a log of the changes is needed for recovery.
- Out-of-place update: the new value is written to a different location and the old value is kept; the new values are later merged into the stable database, so an abort can simply discard the uncommitted copies.
Ques: What is the concept of conceptual design and logical design?
Ans: Data Warehousing Conceptual Design
➢ It is the first step towards the design of a Data Warehouse
➢ It starts from the documentation related to the integrated database and consists of:
1. Facts definition
2. For each fact:
- attribute tree definition
- attribute tree editing
- dimensions definition
- measures definition
- hierarchies’ definition
- fact schemata creation
- glossary definition
Data Warehousing Logical Design
➢ Starting from the conceptual design it is necessary to determine the logical schema of data
➢ We use ROLAP (Relational On-Line Analytical Processing) model to represent
multidimensional data
➢ ROLAP uses the relational data model, which means that data is stored in relations
➢ Given the DFM representation of multidimensional data, two schemas are used:
- star schema
- snowflake schema
Ques: What is the difference between star schema and snowflake schema?
Ans: Difference between Star Schema and Snowflake Schema
➢ Definition and meaning: A star schema contains dimension tables and fact tables. A snowflake schema contains all three: dimension tables, fact tables, and sub-dimension tables.
➢ Type of model: The star schema is a top-down model; the snowflake schema is a bottom-up model.
➢ Space occupied: The star schema uses more allotted space; the snowflake schema uses less.
➢ Time taken for queries: Query execution takes less time with the star schema and more time with the snowflake schema.
➢ Use of normalization: The star schema does not use normalization; the snowflake schema uses both normalization and denormalization.
➢ Complexity of design: The design of a star schema is very simple; the design of a snowflake schema is very complex.
➢ Query complexity: Very low for a star schema; comparatively much higher for a snowflake schema.
➢ Complexity of understanding: A star schema is very easy to understand; a snowflake schema is comparatively more difficult.
➢ Total number of foreign keys: Fewer in a star schema; more in a snowflake schema.
➢ Data redundancy: Comparatively higher in a star schema; comparatively lower in a snowflake schema.
Ques: Why and when will we use the Three Phase Commit Protocol (3PC)?
Ans: There is a problem with the 2PC protocol: it is blocking, which means:
- A participant in the READY state must wait for the coordinator's decision.
- If the coordinator fails, participant sites remain blocked until it recovers; independent recovery is not possible.
- The problem is that, during recovery, the operational sites cannot tell whether the failed sites had moved to the commit or the abort phase.
To overcome this blocking problem, the Three-Phase Commit Protocol (3PC) is used: its extra pre-commit phase makes non-blocking termination possible despite coordinator failures.
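The participant's side of 3PC can be sketched as a small state machine; the extra PRE-COMMIT state guarantees that no participant can be in COMMIT while another is still able to abort, which is what lets survivors agree after a coordinator crash. The transition table below is a simplified sketch, not a full protocol implementation.

```python
# Sketch of the 3PC participant state machine. There is always a
# buffer state (PRE-COMMIT) between READY (where abort is possible)
# and COMMIT, which is what makes the protocol non-blocking.

TRANSITIONS = {
    ("INITIAL", "prepare"): "READY",
    ("READY", "pre-commit"): "PRE-COMMIT",
    ("READY", "abort"): "ABORT",
    ("PRE-COMMIT", "commit"): "COMMIT",
}

def run(messages):
    """Feed coordinator messages to a participant and return its final state."""
    state = "INITIAL"
    for msg in messages:
        state = TRANSITIONS[(state, msg)]
    return state

print(run(["prepare", "pre-commit", "commit"]))   # COMMIT
print(run(["prepare", "abort"]))                  # ABORT
# Note there is no transition from READY directly to COMMIT:
# commit is only reachable through the PRE-COMMIT buffer state.
```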
Ques: What do you know about anomalies? Explain different types of anomalies with
examples.
Ans: Anomalies are incorrect behaviors that arise when concurrent transactions interleave without proper concurrency control:
- Lost update: two transactions read the same value and both update it, so one update overwrites the other. Example: T1 and T2 both read a balance of 100; T1 writes 150 and then T2 writes 80, losing T1's update.
- Dirty read: a transaction reads a value written by another transaction that has not yet committed and may later abort.
- Unrepeatable read: a transaction reads the same item twice and gets different values because another transaction updated it in between.
- Phantom read: re-executing the same query returns new rows inserted by another concurrent transaction.
Ques: What do you know about the elimination of redundancy in Query Optimization?
Ans:
➢ Queries written by users (or produced by the analysis step) may contain redundant predicates, which cause redundant work during evaluation.
➢ Redundancy is eliminated by simplifying the query qualification with the well-known idempotency rules, e.g., p ∧ p ⇔ p, p ∨ p ⇔ p, p ∧ true ⇔ p, p ∧ false ⇔ false, p ∧ ¬p ⇔ false, p ∨ ¬p ⇔ true.

Serializability of Schedules
➢ The serializability of schedules is used to find non-serial schedules that allow the
transaction to execute concurrently without interfering with one another.
➢ It identifies which schedules are correct when executions of the transaction have
interleaving of their operations.
➢ A non-serial schedule will be serializable if its result is equal to the result of its transactions
executed serially.
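The standard test sketched above uses a precedence graph: add an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj on the same item, and accept the schedule iff the graph is acyclic. The representation of operations as (transaction, operation, item) tuples is an illustrative choice.

```python
# Sketch of conflict-serializability testing via a precedence graph.
# Two operations conflict if they are from different transactions,
# touch the same item, and at least one of them is a write.

def is_conflict_serializable(schedule):
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and "w" in (op_i, op_j):
                edges.add((ti, tj))
    # cycle detection: repeatedly remove nodes with no incoming edges
    nodes = {t for t, _, _ in schedule}
    while nodes:
        no_incoming = {n for n in nodes
                       if not any(b == n and a in nodes for a, b in edges)}
        if not no_incoming:
            return False          # a cycle remains -> not serializable
        nodes -= no_incoming
    return True

# r1(A) w2(A) w1(A): edges T1->T2 and T2->T1 form a cycle
print(is_conflict_serializable([("T1","r","A"), ("T2","w","A"), ("T1","w","A")]))  # False
print(is_conflict_serializable([("T1","r","A"), ("T1","w","A"), ("T2","w","A")]))  # True
```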
Commit Protocols
Centralized Two-Phase Commit Protocol
[Figure: message flow of centralized 2PC between the coordinator and the participants]
Data Warehouse
➢ A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection
of data in support of management’s decision-making process.
➢ The four keywords: subject-oriented, integrated, time-variant, and non-volatile distinguish
data warehouses from other data repository systems such as relational database systems,
transaction processing systems and file systems.
- Subject-oriented: Focuses on modeling and analysis of data for decision makers
[exclude data that are not useful]
- Integrated: Constructed by integrating multiple heterogeneous sources such as
relational databases, flat files, and online transaction records.
- Time-variant: Data are stored to provide information from a historical perspective
[e.g., the past 5-10 years]
- Nonvolatile: Doesn’t require transaction processing, recovery, and concurrency
control
Data Warehousing Schemas
1. Star schema: The most common modeling paradigm is the star schema, in which the data
warehouse contains a large central table (fact table) and a set of smaller attendant tables
(dimension tables).
2. Snowflake schema: The snowflake schema is a variant of the star schema model, where
some dimension tables are normalized, thereby further splitting the data into additional
tables. The resulting schema graph forms a shape similar to a snowflake.
3. Fact constellation: Sophisticated applications may require multiple fact tables to share
dimension tables. This kind of schema can be viewed as a collection of stars, and hence is
called a galaxy schema or a fact constellation.
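A star schema can be sketched concretely with Python's sqlite3 module: one central fact table with foreign keys to denormalized dimension tables. All table and column names below are invented for the example; in a snowflake schema, an attribute such as the product category would be split out into its own table.

```python
# Illustrative star schema in SQLite: a fact table referencing
# two denormalized dimension tables, plus a typical OLAP query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales  (
    time_id    INTEGER REFERENCES dim_time(time_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    units_sold INTEGER,
    revenue    REAL
);
""")
conn.execute("INSERT INTO dim_time VALUES (1, '2024-01-01', 'Jan', 2024)")
conn.execute("INSERT INTO dim_product VALUES (10, 'Laptop', 'Electronics')")
conn.execute("INSERT INTO fact_sales VALUES (1, 10, 3, 2999.97)")

# A typical OLAP query: aggregate the fact table, grouped by a dimension.
row = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales f JOIN dim_product p USING (product_id)
    GROUP BY p.category
""").fetchone()
print(row)
```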
Dimension of Data Warehouse