
CHAPTER 8

DATABASES

Syllabus: ER model, relational model (relational algebra, tuple calculus), database design (integrity constraints,
normal forms), query languages (SQL), file structures (sequential files, indexing, B and B+ trees), transactions and
concurrency control.

8.1 INTRODUCTION

A database is an organized collection of information that can easily be searched, retrieved, managed and updated. A database management system (DBMS) is a set of programs designed to manage a database. It enables users to store, retrieve and modify information in a database efficiently and with security features. DBMSs are used in many day-to-day fields, such as banking transactions; airline, railway and hotel reservations; maintenance of student information in schools and universities; online retail; and marketing and sales. A DBMS also allows its users to create their own databases. Different types of DBMS are available, such as hierarchical, network, relational and object-oriented.

8.1.1 Traditional File Processing Approach

The file processing approach is generally more accurate and faster than a manual record-keeping system. Each user is responsible for defining and implementing the files required for a specific application. This sometimes creates redundancy of data: for example, one user keeps records for the savings account of a customer and another user creates the loan account of the same customer, so the records of the same customer are duplicated. This practice is therefore not feasible for real-time applications. Data needs to be managed centrally: it is created once and then accessed by different users, and it is shared among different transactions. The database should also be self-describing, which means that the database system contains not only the data but also a description of the database structure.

8.1.2 Database Management System

A database is a collection of related data. Data is a collection of raw facts or figures, which are processed to form information. A database management system is a collection of programs for the creation and maintenance of a database. It is an efficient and reliable approach to retrieving data for many users. It provides various functions such as:

1. Redundancy control: It controls redundancy by removing duplication of data through the rules of normalization.
2. Data independence: It makes application programs independent of the details of data representation and storage. It also provides an abstract view of the data to insulate application code from such details.
3. Data integrity: It promotes and enforces integrity rules that reduce data redundancy and increase data consistency.
4. Concurrency control: Because it supports sharing of data, it has to provide a way of managing concurrent access to the database, thereby preventing inconsistent states and preserving the integrity of the data.
5. Transaction management: It provides an approach to ensure that either all the updates of a given transaction are executed or none of them is.
6. Backup and recovery: It provides mechanisms for backing up data periodically and recovering from different types of failures, thus preventing loss of data.
7. Non-procedural query language: It provides a query language for the retrieval and manipulation of data.
8. Security: It protects the database against unauthorized access and ensures that only authorized users can access it.

8.2 COMPONENTS OF DATABASE SYSTEMS

A DBMS consists of several components, namely software, hardware, data, procedures, data access language and people. These components are responsible for the definition, collection, management and use of data within the environment. Figure 8.1 shows the components of a database system. The description of each component is as follows:

1. Software: It is the collection of programs used by the computers within the database system. It is used to handle, control and manage the database. It includes the following software:
   • Operating system software such as Microsoft Windows, Linux and Mac OS.
   • DBMS software such as Oracle 8i, MySQL, Access (Jet, MSDE) and SQL Server.
   • Network software, which is used for sharing the data of the database among multiple users.
   • Application programs, developed with tools such as C++, VB and .NET, which are used to access and manipulate the data in the database.
2. Hardware: It consists of all the physical devices of the system, such as computers, storage devices, I/O channels and electromechanical devices. It also includes peripherals such as keyboards, mice, modems and printers.
3. Data: It is the collection of facts. The database contains the data and the metadata.
4. Procedures: These are the instructions and rules for designing and using the database system. They include the following:
   • Steps for the installation of the DBMS
   • Steps to use the DBMS or an application program
   • Steps for the backup of the DBMS
   • Steps to change the structure of the DBMS
   • Steps for the generation of reports
5. Data access language: Users use it to move data into and out of the database. Its functions are the entry of new data, the manipulation of existing data and the retrieval of existing data in the database. The most popular database access language is SQL (Structured Query Language). Users perform these functions with the help of commands. The role of the administrator is to access, create and maintain the database.
6. People: Persons involved in accessing, creating and maintaining the database are called users. They are of various types according to the role they perform (Fig. 8.1):
   • System Administrator: supervises the general operations of the DBMS.
   • Database Administrator (DBA): manages the DBMS.
   • Database Designer: designs the structure of the database.
   • Application Programmer: creates the data entry forms, reports and procedures.
   • End User: uses the application programs by entering new data and by manipulating and accessing existing data.
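As an illustration of the data access language (item 5 above) and of the all-or-nothing behaviour listed under transaction management in Section 8.1.2, the following is a minimal sketch using Python's built-in sqlite3 module. The account table, its columns, the sample rows and the transfer helper are assumptions made only for this example; they do not come from the text.

```python
import sqlite3

# In-memory database; the "account" table and its rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance REAL NOT NULL CHECK (balance >= 0))"
)

# Entry of new data.
with conn:
    conn.executemany(
        "INSERT INTO account (acc_no, owner, balance) VALUES (?, ?, ?)",
        [(1, "Asha", 500.0), (2, "Ravi", 100.0)],
    )


def balances(conn):
    """Retrieval of existing data: return {acc_no: balance}."""
    return dict(conn.execute("SELECT acc_no, balance FROM account ORDER BY acc_no"))


def transfer(conn, src, dst, amount):
    """Move money between two accounts as one transaction (hypothetical helper).

    Either both UPDATE statements take effect or neither does.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit made by
        # the first UPDATE has been rolled back as well.
        return False


ok = transfer(conn, src=1, dst=2, amount=200.0)        # succeeds
failed = transfer(conn, src=2, dst=1, amount=1000.0)   # would overdraw account 2
print(ok, failed, balances(conn))   # True False {1: 300.0, 2: 300.0}
```

The second transfer credits account 1 before the debit fails, yet the final balances show no trace of that credit, which is the behaviour the text describes as "either all the updates execute or none of them".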


Figure 8.1 | Components of database systems. (Diagram labels: End user, Database administrator, Application programmer, Query processing, Database manager, File manager, Database, Metadata.)

8.3 DBMS ARCHITECTURE

DBMS architecture describes how users view the database. It is a means of representing data to the users in an understandable way. The architecture divides the whole system into related but independent modules. It can be 1-tier, 2-tier, 3-tier or n-tier.

8.3.1 3-Tier Architecture

The 3-tier architecture is the most widely used DBMS architecture. In this architecture, the database is divided into three tiers depending upon the kind of users (see Fig. 8.2).

Figure 8.2 | The 3-tier architecture of DBMS. (Diagram labels, top to bottom: View level, Logical level, Physical level.)

1. Internal schema or physical level: It is also called the Database Tier. The database exists in this tier. It also includes the query processing languages, all relations and their corresponding constraints. It describes the physical storage structure of the database.
2. Conceptual schema or logical level: It is also called the Application Tier. The server and the programs exist in this tier. It also describes the structure of the database.
3. External schema or view level: It is also called the Presentation Tier. The end user exists in this tier. An end user can have multiple views of the database. This level also includes all views generated by applications.

8.3.2 Data Independence

Data independence means that a change at one level does not affect the level above it. It is of two types:

1. Logical data independence: A change in the conceptual schema does not affect the external schema. For example, if the format of a table changes, the data that lies in the table should not be changed.
2. Physical data independence: A change in the internal schema does not affect the conceptual schema and, consequently, does not affect the external schema either. For example, if the storage system is changed, the logical structure of the database is not affected.

8.4 DATA MODELS

A data model is a collection of tools that describe data, relationships, constraints and semantics. It gives the logical structure of the database and describes how the data in the database are related. Data models are of various types:

1. Relational model: It is a collection of relations (tables). In the relational model, each table is stored as a separate file.
2. Entity-Relationship data model: This model is based on the notion of real-world entities and the relationships among them.
3. Object-based data model: It defines the database in terms of objects, their properties and their operations. In this model, objects with similar structures comprise a class. The classes are organized into hierarchies. The operations on these classes are performed through methods.
4. Semi-structured data model: It is also known as the XML model. This model is used to exchange data over the web. It uses hierarchical tree structures. In this model, data can be represented as elements by using tags.


5. Network model: In this model, data is represented as record types. The data in this model can have many-to-many relationships.
6. Hierarchical model: In this model, the data is represented as a hierarchical tree structure.

8.4.1 Relational Model

The database in the relational model is represented as a collection of relations (tables). A relation is a kind of set: it is a subset of the Cartesian product of its attribute domains, that is, an unordered set of ordered tuples. The relational model was proposed by E. F. Codd and stores data in tabular form, where rows represent records and columns represent attributes. Its terminology is as follows:

1. Tuple: It represents a single row of a table, which contains a single record for that relation.
2. Relation instance: It represents a finite set of tuples in the relational database system.
3. Relation schema: It describes the relation name (that is, the table name) together with its attributes and their names.
4. Relation key: It represents the unique key for the relation or table. Each row has one or more attributes that can identify the row in the table uniquely.
5. Attribute domain: It represents the predefined value scope of each attribute.

8.4.1.1 Constraints in Relational Model

Constraints are the restrictions that one wishes to apply on a database. The following constraints are applied on the relational model.

1. Key constraints: Each relation has at least one minimal subset of attributes that can identify a tuple uniquely.
   • No two tuples have identical values for the key attributes.
   • A key attribute does not have a NULL value.
2. Domain constraints: Attributes have specific domain values in the real world. For example, the value of age can only be positive.
3. Referential integrity constraints: If a relation refers to a key attribute of a different relation, then that key element must exist.

8.4.1.2 Relational Algebra

It is a procedural query language. It takes instances of relations as input and returns instances of relations as output. Operators are used to perform queries.

Table 8.1 | Notations of fundamental operations of relational algebra

Fundamental Operation    Symbol
Select                   σ
Project                  π
Union                    ∪
Intersection             ∩
Set difference           −
Cartesian product        ×
Rename                   ρ
Natural join             ⋈

Fundamental operations of relational algebra are as follows (see Table 8.1):

1. Select: It is used to select rows from a relation. It is denoted by σ.
   Syntax of select: σ_p(r), where r is a relation and p is a propositional-logic predicate.
   p uses the connectives ∧ and ∨ and the comparison operators =, ≠, <, >, ≤, ≥.
   For example, σ_{empname = "John"}(emp).
2. Project: It is used to project columns of a relation. It is denoted by π. Duplicate tuples are automatically eliminated.
   Syntax of project: π_A(r), where r is a relation and A is a list of attribute names of that relation.
   For example, π_{empname, sal}(emp).
3. Union: It returns a relation instance that contains all tuples occurring in the first relation or in the second relation. It is denoted as R ∪ S, where R and S are two relations. Duplicate tuples are automatically eliminated.
   This operation is valid only if:
   • Both relations have the same number of attributes.
   • The attribute domains are compatible.
4. Intersection: It returns a relation instance that contains all tuples occurring in both relations. It is denoted as R ∩ S, where R and S are two relations.
5. Set difference: It returns a relation instance that contains all tuples that occur in the first relation but not in the second relation. It is denoted as R − S, where R and S are two relations.
6. Cartesian product: It returns a relation instance whose tuples contain all the fields of the first relation followed by all the fields of the second relation, with one tuple for every pair of tuples from the two relations. It is denoted as R × S, where R and S are two relations.


7. Rename: The result of a relational algebra expression is a relation without any name. Rename is used to give a name to the output relation. It is denoted as ρ.
8. Joins: A join returns combined information from two or more relations.
   • Condition join: It accepts a join condition c and a pair of relation instances as arguments, and returns a relation instance. It is denoted as σ_c(R × S).
   • Equijoin: It is a special case of the condition join in which the condition c contains only equalities.
   • Natural join: It is the Cartesian product of two relations restricted to the tuples that agree on all common attributes, with each common attribute kept only once. It is denoted by ⋈.

8.4.1.3 Tuple Calculus

It is a non-procedural query language. A query is specified in terms of a number of tuple variables. It is represented as { t | Condition }, where t is a tuple variable and Condition is a conditional expression.
Preliminaries of tuple calculus are as follows:

1. Constants
2. Predicates
3. Boolean and, or, not
4. ∃ (there exists)
5. ∀ (for all)

8.4.2 ER Model

The ER model represents the conceptual view of a database. It describes how data items are related to each other (Table 8.2). It was developed by Peter Chen in 1976. It views real-world data as systems of entities and relationships. The ER model has three basic elements: entity, attribute and relationship. These are discussed in the following sections.
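Before turning to the ER elements, the relational algebra operators of Section 8.4.1.2 can be made concrete with a short sketch. This is only an illustration on small in-memory relations, not how a DBMS implements them: the Relation class, the function names and the emp/dept sample rows are assumptions introduced here (only the attribute names empname and sal come from the examples in the text). Rows are kept in a set, so duplicate tuples are eliminated automatically, as stated for project and union.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    attrs: tuple       # attribute names, in column order
    rows: frozenset    # set of value tuples, so duplicates are impossible


def relation(attrs, rows):
    return Relation(tuple(attrs), frozenset(tuple(r) for r in rows))


def select(r, pred):
    """Select: keep the rows for which pred(row-as-dict) is true."""
    keep = (t for t in r.rows if pred(dict(zip(r.attrs, t))))
    return Relation(r.attrs, frozenset(keep))


def project(r, attrs):
    """Project: keep only the listed columns."""
    idx = [r.attrs.index(a) for a in attrs]
    return Relation(tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in r.rows))


def rename(r, mapping):
    """Rename attributes according to {old_name: new_name}."""
    return Relation(tuple(mapping.get(a, a) for a in r.attrs), r.rows)


def union(r, s):
    assert len(r.attrs) == len(s.attrs), "same number of attributes required"
    return Relation(r.attrs, r.rows | s.rows)


def intersection(r, s):
    return Relation(r.attrs, r.rows & s.rows)


def difference(r, s):
    return Relation(r.attrs, r.rows - s.rows)


def product(r, s):
    """Cartesian product: each row of r followed by each row of s."""
    assert not set(r.attrs) & set(s.attrs), "rename clashing attributes first"
    return Relation(r.attrs + s.attrs, frozenset(a + b for a in r.rows for b in s.rows))


def natural_join(r, s):
    """Rows of r and s that agree on all common attributes; common columns kept once."""
    common = [a for a in r.attrs if a in s.attrs]
    extra = [a for a in s.attrs if a not in common]
    rows = set()
    for a in r.rows:
        ra = dict(zip(r.attrs, a))
        for b in s.rows:
            sb = dict(zip(s.attrs, b))
            if all(ra[c] == sb[c] for c in common):
                rows.add(a + tuple(sb[x] for x in extra))
    return Relation(r.attrs + tuple(extra), frozenset(rows))


# Hypothetical sample data.
emp = relation(("empname", "sal", "dept"),
               [("John", 3000, "Sales"), ("Mary", 4000, "HR"), ("Anil", 3000, "Sales")])
dept = relation(("dept", "city"), [("Sales", "Pune"), ("HR", "Delhi")])

john = select(emp, lambda t: t["empname"] == "John")
salaries = project(emp, ["sal"])          # the duplicate 3000 collapses
joined = natural_join(emp, dept)
# The same join written as a condition join: a select over a Cartesian product.
cond = select(product(emp, rename(dept, {"dept": "d_dept"})),
              lambda t: t["dept"] == t["d_dept"])

print(sorted(john.rows))                  # [('John', 3000, 'Sales')]
print(sorted(salaries.rows))              # [(3000,), (4000,)]
print(len(joined.rows), len(cond.rows))   # 3 3
```

The condition join needs the rename step because both relations carry an attribute called dept; the natural join handles the shared attribute itself and keeps it only once.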

Table 8.2 | Entity-relationship objects

Element                      Symbol                                Example
Entity                       Rectangle                             Student, Employee
Weak entity                  Double rectangle                      Order item (depends on Order)
Attribute                    Oval                                  Name
Simple attribute             Oval                                  Order
Composite attribute          Oval linked to its component ovals    Name (First_name, Last_name)
Derived attribute            Dotted oval                           Age (derived from Date-of-birth)
Single-valued attribute      Oval                                  Roll_number
Multi-value attribute        Double oval                           Phone_number
Relationship                 Diamond                               works (between Employee and Department)
One-to-One relationship      Diamond, edges labelled 1 and 1       Department (1) has (1) Manager
One-to-Many relationship     Diamond, edges labelled 1 and N       Manager (1) manages (N) Employee
Many-to-One relationship     Diamond, edges labelled N and 1       Employee (N) works (1) Department
Many-to-Many relationship    Diamond, edges labelled N and N       Employee (N) handles (N) Projects

8.4.2.1 Entity

Entities represent real-world things. They are data objects that maintain different relationships with each other, for example, Employee, Department, etc. They are represented by means of rectangles. For example, Student and Employee are each drawn as a rectangle.

An entity set is a collection of similar types of entities. For example, the entity set of all Employees may contain all employees of all departments.

A weak entity depends on the existence of another entity. A weak entity cannot be identified by its own attributes. It uses a foreign key combined with its attributes to form the primary key. It is represented by means of double rectangles. For example, the details of an employee's spouse, or an order item, which depends on Order.

8.4.2.2 Attributes

An attribute is a property, trait or characteristic of an entity, relationship or another attribute. All attributes have values. A domain or range of values can be assigned to attributes. For example, name, class and age are attributes of the entity student. Attributes are represented by ovals, for example an oval labelled Name.

There are different types of attributes, listed as follows:

1. Simple attribute: Simple attributes contain atomic values. Atomic values cannot be divided into sub-parts. Examples are mobile number and roll number.
2. Composite attribute: Composite attributes are composed of many simple attributes. For example, address can be divided into house number, street number, locality and city; similarly, Name can be composed of First-name and Last-name.
3. Derived attribute: Derived attributes are those whose value is derived from some other attribute in the database. For example, the age of a person can be calculated from the date of birth. A dotted oval is used to represent derived attributes, for example Age derived from Date-of-birth.
4. Single-valued attribute: Single-valued attributes contain only one value for that attribute. Examples are the age or blood group of a person, or Roll-number.
5. Multi-value attribute: In this, an attribute may contain more than one value. Multi-value attributes are represented by double ovals. Examples are contact numbers (Phone-number) and email ids.

8.4.2.3 Relationships

A relationship represents the association among entities in a specified way. For example, the employee entity has the relationship "works at" with the department entity. Relationships are represented by diamond-shaped boxes.

A relationship is characterized mainly by its relationship set and its degree. These and the other basic terms related to relationships are defined below.

1. Relationship set: A set of relationships of similar type is called a relationship set. It can have attributes, which are called descriptive attributes.
2. Degree of relationship: It defines the number of participating entities in a relationship. It is of the following types:
   • Unary relationship (Degree 1): One entity participates. For example, Employee related to itself through Manages / Managed by.
   • Binary relationship (Degree 2): Two entities participate. For example, Customer and Invoice related through Received / Sent to.
   • Ternary relationship (Degree 3): Three entities participate. For example, Student, Course and Faculty participating in Registration.
   • n-ary relationship (Degree n): n entities participate.
3. Cardinality: It defines the number of instances of an entity that can be associated with instances of another entity via a relationship set.
   • One to one: One instance of an entity is associated with at most one instance of another entity through the relationship. It is represented as '1-1'. For example, Department (1) has (1) Manager.
   • One to many: One instance of an entity is associated with more than one instance of another entity through the relationship. It is represented as '1-N'. For example, Manager (1) manages (N) Employee.
   • Many to one: More than one instance of an entity is associated with one instance of another entity through the relationship. It is represented as 'N-1'. For example, Employee (N) works (1) Department.
   • Many to many: More than one instance of an entity is associated with more than one instance of another entity through the relationship. It is represented as 'N-N'. For example, Employee (N) handles (N) Projects.
4. Generalization: Entity sets having similar characteristics are brought together into one generalized entity. For example, salaried and contract employees are generalized as employee.
5. Specialization: It is the process of identifying subsets of an entity set. It is the reverse process of generalization. For example, employees are specialized as salaried and contract, as shown in Fig. 8.3.

Figure 8.3 | Specialization and Generalization. (Diagram labels: Employee; Salaried, Contract; Specialization; Generalization.)

6. Aggregation: It allows a relationship set to participate in another relationship set (see Fig. 8.4).

Figure 8.4 | Aggregation. (Diagram labels: Employee, works, Department, handles, Projects.)

8.5 DATABASE DESIGN

The design of a database consists of the following steps:

1. Identifying entities
2. Identifying relationships
3. Identifying attributes
4. Presenting entities and relationships: Entity relationship diagram (ERD)
5. Assigning keys
6. Defining the data type of each attribute
7. Normalization

8.5.1 Integrity Constraints

Integrity constraints are applied to maintain consistency in a database. This helps in providing a unique answer to a given query on the database. For example, if the answer to a particular query is 'x', then it should again be 'x' if the same query is carried out again on the same table (without any adding, deleting or modifying in between).

1. Domain integrity: It defines a valid set of values for an attribute, for example, length or size, data type, etc.


2. Entity integrity constraint: It defines that primary keys cannot be null. There must be a proper value in the primary key field.
3. Referential integrity constraint: It is specified between two tables. It is used to maintain consistency among the rows of the two tables.
4. Foreign key integrity constraint: There are two types of foreign key integrity constraints:
   • Cascade update related fields: Whenever the primary key of a row in the primary table is changed, the foreign key values are updated in the matching rows of the related table.
   • Cascade delete related rows: Whenever a row in the primary table is deleted, the matching rows in the related table are automatically deleted.

8.5.2 Normal Forms

Normalization is a technique of organizing the database tables such that they have minimum redundancy. The following normal forms are to be achieved in the process of normalization. The underlying concept is that if a table satisfies the normal form of level 'x', then it also satisfies the normal form of level 'x-1'.

1. First normal form: It defines that all the attributes in a relation must have atomic domains.
2. Second normal form: It defines that every non-prime attribute should be fully functionally dependent on the primary key. A prime attribute is an attribute that is part of a key. The relation must also be in the First normal form.
3. Third normal form: It defines that no non-prime attribute is transitively dependent on the primary key. The relation must also be in the Second normal form.
4. Boyce-Codd normal form: It defines that for any non-trivial FD X → A, X must be a superkey. It is a stricter extension of the Third normal form.
5. Fourth normal form: It defines that for every multivalued dependency X →→ Y that holds over R, one of the following statements is true: (a) Y is a subset of X or XY = R (the dependency is trivial), or (b) X is a superkey. It is also sometimes called Multi-valued Dependency Normal Form (MDNF).
6. Fifth normal form: It defines that for every join dependency ⋈{R1, …, Rn} that holds over R, one of the following statements is true: (a) Ri = R for some i, or (b) the join dependency is implied by the set of those FDs over R in which the left side is a key. It is also sometimes called Project-Join Normal Form (PJNF).

8.5.3 Attribute Closure

The set of all attributes functionally determined by X is called the closure of X. The closure of X is denoted by X+.

Problem 8.1: The set of functional dependencies F = {A → B, B → C, C → D} is given. Find the closure of A, B, C and D.

Solution:
Closure of A = A+ = (A, B, C, D)
Closure of B = B+ = (B, C, D)
Closure of C = C+ = (C, D)
Closure of D = D+ = (D)

8.5.4 Key

It is a minimal set of attributes used to differentiate all the tuples of the table.

8.5.4.1 Superkey

It is a set of attributes that uniquely identifies each record in a table. It is a superset of a candidate key. In other words, let R be the schema and X be a set of attributes over R. If the closure of X (X+) contains all the attributes of R, then X is called a superkey.

Problem 8.2: Consider a schema R(ABCDE) and FDs {AB → C, C → D, B → E}. Find the superkey.

Solution:
Find the closure of each of A, B, C and AB.
Closure of A = A+ = {A}
Closure of B = B+ = {B, E}
Closure of C = C+ = {C, D}
Closure of AB = AB+ = {A, B, C, D, E}
So, AB is the superkey.

8.5.4.2 Candidate Key

It is an attribute or set of attributes that can act as the primary key of a table, that is, it uniquely identifies each record in the table.
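One way to make this definition operational is to compute attribute closures (Section 8.5.3) and keep only the minimal superkeys. The sketch below assumes single-letter attribute names, as in the worked problems, and writes each functional dependency as a (left side, right side) pair of strings; the function names closure, is_superkey and candidate_keys are chosen for this example only. It enumerates subsets of attributes, which is acceptable for textbook-sized schemas but grows exponentially with the number of attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return X+, the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever the whole left side is already known.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """X is a superkey if X+ contains every attribute of the schema."""
    return closure(attrs, fds) >= set(schema)


def candidate_keys(schema, fds):
    """Minimal superkeys, found by trying attribute subsets from smallest to largest."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it cannot be minimal
            if is_superkey(combo, schema, fds):
                keys.append("".join(combo))
    return keys


# Problem 8.1: F = {A -> B, B -> C, C -> D}
f1 = [("A", "B"), ("B", "C"), ("C", "D")]
print(sorted(closure("A", f1)))          # ['A', 'B', 'C', 'D']
print(sorted(closure("C", f1)))          # ['C', 'D']

# Problem 8.2: R(ABCDE), F = {AB -> C, C -> D, B -> E}
f2 = [("AB", "C"), ("C", "D"), ("B", "E")]
print(is_superkey("AB", "ABCDE", f2))    # True
print(candidate_keys("ABCDE", f2))       # ['AB']
```

Trying subsets in order of increasing size means that the first superkeys found are automatically minimal, so any larger set containing one of them can be skipped.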


Every candidate key is a superkey, but not vice versa.

In other words, a candidate key is a minimal superkey. If X is a superkey and none of the proper subsets of X is a superkey, then X is called a minimal superkey or candidate key.

Problem 8.3: Consider a schema R(ABCDE) and FDs {AB → C, C → D, B → EA}. Find the superkey.

Solution:
Find the closure of each of A, B, C and AB.
Closure of A = A+ = {A}
Closure of B = B+ = {B, E, A, C, D}
Closure of C = C+ = {C, D}
Closure of AB = AB+ = {A, B, C, D, E}
So, AB and B are superkeys. But only B is the candidate key.

8.5.4.3 Primary Key

It is one or more data attributes that uniquely identify an entity. It does not allow null values.

1. Alternate key: A candidate key that is not selected as the primary key.
2. Composite key: A key that consists of two or more attributes.
3. Foreign key: An attribute (or set of attributes) that references the primary key of another entity.

8.5.5 Decomposition

Decomposition is required to eliminate redundancy from the schema. If a relational schema R has redundancy in the data, then R is decomposed into two schemas R1 and R2. Two properties should be maintained when we perform decomposition: it should be a lossless join as well as dependency preserving.

8.5.5.1 Lossless Join

Let R be a relation schema and let F be a set of functional dependencies (FDs) over R. R is decomposed into R1 and R2. The decomposition into R1 and R2 is called a lossless-join decomposition if R = R1 ⋈ R2, that is, if we can recover the original relation from the decomposed relations.

(a) Algorithm for finding whether a decomposition is lossless:

Step 1: The union of all decomposed sub-relations should be equal to relation R.
   R1 ∪ R2 ∪ R3 ∪ … ∪ Rn = R
Step 2: Any two sub-relations Ri and Rj can be merged into Rij = Ri ∪ Rj only if
   i. Ri ∩ Rj ≠ ∅ (not empty), and
   ii. with Ri ∩ Rj = X, the closure of X satisfies X+ ⊇ Ri or X+ ⊇ Rj.
Step 3: Repeat step 2 until the 'N' relations become one relation. If the 'N' relations become a single relation, then the decomposition is called lossless, otherwise not.

Problem 8.4: Consider a schema R(ABCDEFGHIJ) and functional dependencies FDs = {AB → C, A → D, B → F, F → GH, D → IJ} and decompositions
(a) D = (ABCDE, BFGH, DIJ)
(b) D = (ABCD, DE, BF, FGH, DIJ)
Check whether each decomposition is lossless or not.

Solution:
(a) Given
R1 = (ABCDE)
R2 = (BFGH)
R3 = (DIJ)
Apply the algorithm for lossless decomposition:

Step 1:
R1 ∪ R2 ∪ R3 = (ABCDE) ∪ (BFGH) ∪ (DIJ) = (ABCDEFGHIJ) = R
Step 1 satisfies the given condition, so it is true.

Step 2:
For R1 and R2:
R1 ∩ R2 = (ABCDE) ∩ (BFGH) = B
Find closure of B = B+ = {B, F, G, H}.
Condition (ii) is satisfied, so R1 and R2 can be merged together. After merging R1 and R2,
R12 = (ABCDEFGH)
Now merge R12 and R3.
R12 ∩ R3 = (ABCDEFGH) ∩ (DIJ) = D
Find closure of D = D+ = {D, I, J}.
Condition (ii) is satisfied, so R12 and R3 can be merged together. After merging R12 and R3,
R123 = (ABCDEFGHIJ)
So, R123 = R; therefore, it is a lossless decomposition.

(b) Given that
R1 = (ABCD)
R2 = (DE)
R3 = (BF)
R4 = (FGH)
R5 = (DIJ)
Apply the algorithm for lossless decomposition.

Step 1:
R1 ∪ R2 ∪ R3 ∪ R4 ∪ R5 = (ABCD) ∪ (DE) ∪ (BF) ∪ (FGH) ∪ (DIJ) = (ABCDEFGHIJ) = R
Step 1 satisfies the given condition, so it is true.

Step 2:
For R1 and R2:
R1 ∩ R2 = (ABCD) ∩ (DE) = D
Find closure of D = D+ = {D, I, J}.
Condition (ii) of step 2 is not satisfied, so R1 and R2 cannot be merged together.
For R1 and R3:
R1 ∩ R3 = (ABCD) ∩ (BF) = B
Find closure of B = B+ = {B, F, G, H}.
Condition (ii) of step 2 is satisfied, so R1 and R3 can be merged together. After merging R1 and R3,
R13 = (ABCDF)
For R13 and R4:
R13 ∩ R4 = (ABCDF) ∩ (FGH) = F
Find closure of F = F+ = {F, G, H}.
Condition (ii) of step 2 is satisfied, so R13 and R4 can be merged together. After merging R13 and R4,
R134 = (ABCDFGH)
For R134 and R5:
R134 ∩ R5 = (ABCDFGH) ∩ (DIJ) = D
Find closure of D = D+ = {D, I, J}.
Condition (ii) of step 2 is satisfied, so R134 and R5 can be merged together. After merging R134 and R5,
R1345 = (ABCDFGHIJ)
For R1345 and R2:
R1345 ∩ R2 = (ABCDFGHIJ) ∩ (DE) = D
Find closure of D = D+ = {D, I, J}.
Condition (ii) of step 2 is not satisfied, so R1345 and R2 cannot be merged together.
So, finally we are left with two sub-relations that cannot be merged into a single relation. The condition given in step 3 is not satisfied; therefore, it is not a lossless decomposition.

8.5.5.2 Dependency Preserving

All the dependencies should be preserved after the decomposition of schema R.
Let R be the relational schema with FD set F, decomposed into R1, R2, R3, …, Rn with FD sets F1, F2, F3, …, Fn, respectively. The decomposition is said to be dependency preserving if
F1 ∪ F2 ∪ F3 ∪ … ∪ Fn = F
(in the sense that the two sets imply the same dependencies), and the decomposition is called non-dependency preserving if
F1 ∪ F2 ∪ F3 ∪ … ∪ Fn ⊂ F

(a) Algorithm to check whether dependencies are preserved or not:
Step 1: Find all the FDs of the sub-relations.
Step 2: Check that every FD of F is covered, directly or indirectly, by the FDs of the sub-relations.

Problem 8.5: Consider a schema R(ABCD) and FD set F = (A → B, B → C, C → D, D → A) and decomposition D = (AB, BC, CD). Check whether the decomposition is dependency preserving or not.

Solution:
Step 1: Find all the FDs of the sub-relations.

                       R1 = (AB)    R2 = (BC)    R3 = (CD)
Direct dependency      A → B        B → C        C → D
Indirect dependency    B → A        C → B        D → C
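The direct and indirect dependencies tabulated above, and the lossless-join test of Section 8.5.5.1, can both be derived mechanically from attribute closures. The following is a minimal sketch under the same conventions as the worked problems (single-letter attribute names, FDs written as pairs of strings); the function names are assumptions made for illustration. is_lossless follows the pairwise merging procedure given in the text, and preserves_dependencies follows Steps 1 and 2 of the algorithm above by first projecting F onto each sub-relation.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return X+, the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(schema, parts, fds):
    """Merging test: merge Ri and Rj when (Ri ∩ Rj)+ covers one of them."""
    rels = [set(p) for p in parts]
    if set().union(*rels) != set(schema):            # Step 1
        return False
    merged = True
    while merged and len(rels) > 1:                  # Steps 2 and 3
        merged = False
        for i, j in combinations(range(len(rels)), 2):
            common = rels[i] & rels[j]
            if common:
                cl = closure(common, fds)
                if cl >= rels[i] or cl >= rels[j]:
                    rels[i] |= rels[j]
                    del rels[j]
                    merged = True
                    break                            # restart with the new list
    return len(rels) == 1


def project_fds(part, fds):
    """Step 1: the FDs X -> Y of F+ whose attributes all lie inside `part`."""
    projected = []
    attrs = sorted(part)
    for size in range(1, len(attrs)):
        for lhs in combinations(attrs, size):
            rhs = (closure(lhs, fds) & set(part)) - set(lhs)
            if rhs:
                projected.append(("".join(lhs), "".join(sorted(rhs))))
    return projected


def preserves_dependencies(parts, fds):
    """Step 2: every FD of F must follow from the union of the projected FDs."""
    union = [fd for p in parts for fd in project_fds(p, fds)]
    return all(set(rhs) <= closure(lhs, union) for lhs, rhs in fds)


# Problem 8.5
f = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
print(project_fds("AB", f))                             # [('A', 'B'), ('B', 'A')]
print(preserves_dependencies(["AB", "BC", "CD"], f))    # True

# Problem 8.4
g = [("AB", "C"), ("A", "D"), ("B", "F"), ("F", "GH"), ("D", "IJ")]
print(is_lossless("ABCDEFGHIJ", ["ABCDE", "BFGH", "DIJ"], g))             # True
print(is_lossless("ABCDEFGHIJ", ["ABCD", "DE", "BF", "FGH", "DIJ"], g))   # False
```

project_fds("AB", f) returns exactly the R1 column of the table: the direct dependency A → B and the indirect dependency B → A.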


To calculate indirect dependency, use closure property.

In relation R1, we have indirect dependency B → A. We first find the closure of each attribute in that decomposition (R1 has A and B) using FD set F, and then check whether attribute A appears in the closure of B. Here the closure of B is B+ = ABCD, so B indirectly determines A, C and D, that is, {B → A, B → C and B → D}. Only B → A is valid for R1, because R1 contains only the two attributes A and B.

Step 2: Check that every FD of F is covered, directly or indirectly, by the FDs of the sub-relations.

(a) A → B (Directly covered by sub-relation FD)
(b) B → C (Directly covered by sub-relation FD)
(c) C → D (Directly covered by sub-relation FD)
(d) D → A (Indirectly covered by sub-relation FD by using {D → C then C → B and then B → A}).
All the dependencies are preserved, so it is a dependency-preserving decomposition.

8.5.5.3 Relation between Two Functional Dependency (FD) Sets

Let F and G be two FD sets:

1. Sets F and G are equal if and only if closure of F (F+) = closure of G (G+).
2. Equivalently, sets F and G are equal if and only if both of the conditions below are satisfied:
••F covers G: Every FD in G is logically implied by F.
••G covers F: Every FD in F is logically implied by G.
3. If every FD in G is logically implied by F, but not every FD in F is logically implied by G, then G+ ⊂ F+.

Problem 8.6: Let there be two FD sets F and G, given as follows:

F = {A → B, B → C}
G = {A → B, AB → C, B → C, A → C}

Find the relation between both FD sets.

Solution:

(a) F covers G:

Functional Dependency | Covered by F
A → B                 | Direct covered
AB → C                | Indirect covered ((AB)+ = ABC), so AB can determine C.
B → C                 | Direct covered
A → C                 | Indirect covered (A+ = ABC), so A can determine B and C.

(b) G covers F

Functional Dependency | Covered by G
A → B                 | Direct covered
B → C                 | Direct covered

In our example, F covers G and vice versa. So F = G.

8.6 QUERY LANGUAGES (SQL)

SQL stands for Structured Query Language, which is a standard computer language for relational database management and data manipulation. It is used to query, insert, update and modify data in tables. Some common RDBMS that use SQL are Oracle, Microsoft SQL Server, Access, Ingres, Sybase, etc. Raymond Boyce and Donald Chamberlin developed it in the early 1970s at IBM, but it was first commercially released by Relational Software Inc. (now Oracle Corporation) in 1979. Among early PC database products, Vulcan was acquired by Ashton-Tate (and marketed as dBASE), while FoxPro was later purchased by Microsoft. Other popular products were Clipper and those of Gupta Technologies.

8.6.1 SQL Commands

The SQL commands are used to interact with relational databases. These commands can be classified into the following groups (see Tables 8.3 and 8.4):

1. Data Definition Language (DDL): It defines metadata, that is, data about data. All the integrity constraints and database schemas are defined through DDL. These commands are used to create, modify and delete database objects. CREATE, ALTER, TRUNCATE and DROP are part of DDL. The TRUNCATE command deletes all data from an existing table, while the DROP command removes a table definition together with all its data, indexes, triggers and constraints.
••Syntax of create command
CREATE TABLE table_name( column1 datatype, column2 datatype, column3


datatype, ..... columnN datatype,
PRIMARY KEY( one or more columns ) );
For example, create table instructor (INST_ID char(5), name varchar(20), dept_name varchar(20), salary numeric(8,2));
••Syntax of alter command
ALTER TABLE table_name ADD column_name datatype;
ALTER TABLE table_name DROP COLUMN column_name;
ALTER TABLE table_name MODIFY COLUMN column_name datatype;
ALTER TABLE table_name ADD CONSTRAINT constraint_name UNIQUE (column1, column2...);
ALTER TABLE table_name ADD CONSTRAINT constraint_name CHECK (CONDITION);
ALTER TABLE table_name ADD CONSTRAINT constraint_name PRIMARY KEY (column1, column2...);
For example, ALTER TABLE CUSTOMERS ADD GRADE char(1);
••Syntax of truncate command
TRUNCATE TABLE table_name;
For example, TRUNCATE TABLE emp;
••Syntax of drop command
DROP TABLE table_name;
For example, DROP TABLE emp;

Table 8.3 | SQL commands

Command | Description
CREATE  | Creates a new table, a view of a table or other objects in the database
ALTER   | Modifies an existing database object (such as a table)
DROP    | Deletes a table, a view of a table or other objects in the database
SELECT  | Retrieves data from one or more tables
INSERT  | Inserts a new data record in the database
UPDATE  | Modifies data in the database
DELETE  | Deletes data from the database
GRANT   | Gives a privilege to the user
REVOKE  | Takes back privileges granted to the user

Table 8.4 | Clauses in SQL

Clause   | Description
From     | Forms the cross product of the listed tables
Where    | Selects the tuples which satisfy the condition
Group by | Groups the table based on the specified attribute
Having   | Selects groups based on a condition

2. Data Manipulation Language (DML): DML is used to manipulate the data in a database. It allows data items to be inserted, updated and deleted. SELECT, INSERT, DELETE and UPDATE are part of DML. The control statements BEGIN TRANSACTION, SAVEPOINT, COMMIT and ROLLBACK are also treated as part of DML here.
••Syntax of select command
SELECT * FROM table_name;
SELECT column1, column2, columnN FROM table_name;
For example, SELECT empno, ename FROM emp;
••Syntax of insert command
INSERT INTO TABLE_NAME (column1, column2, column3,...columnN) VALUES (value1, value2, value3,...valueN);
INSERT INTO TABLE_NAME VALUES (value1,value2,value3,...valueN);
For example, INSERT INTO emp (empno, ename, sal, dept) VALUES (100, 'ABC', 10000, 'Accounts');
••Syntax of delete command
DELETE FROM table_name WHERE [condition];
For example, DELETE FROM emp WHERE empno = 8;
••Syntax of update command
UPDATE table_name SET column1 = value1, column2 = value2...., columnN = valueN WHERE [condition];
For example, UPDATE emp SET sal = 12000 WHERE empno = 8;

3. Data Control Language (DCL): Data control language is used to assign or revoke access rights. GRANT and REVOKE are the DCL commands.


••Syntax of grant command
GRANT privilege_name ON object_name
TO {user_name | PUBLIC | role_name}
[WITH GRANT OPTION];
For example, GRANT SELECT ON emp TO user1;
••Syntax of revoke command
REVOKE privilege_name ON object_name
FROM {user_name | PUBLIC | role_name};
For example, REVOKE SELECT ON emp FROM user1;

8.7 FILE STRUCTURES

File structure mainly deals with how files are stored on the disk. Various file organisations are described below.

8.7.1 Sequential Files

It is a file organisation in which every record contains an attribute that uniquely identifies it. Records are stored in sequential order of this unique key.

8.7.2 Indexing

Indexing is a data structure mechanism to efficiently retrieve records from the database, for example, a book index. An index is defined on its indexing attributes. Indexing is of three types:

1. Primary index: Indexing is based on the ordering key field of the file.
2. Secondary index: Indexing is based on a non-ordering field of the file.
3. Clustering index: Indexing is based on an ordering non-key field of the file.

The ordering field is the field on which the records of the file are ordered. Ordered indexing is of two types: dense index and sparse index.
1. Dense index: For every search key value in the database, there is an index record. An index record has two parts: the search key value and a pointer to the actual record.
2. Sparse index: Index records are created only for some of the search key values, not for every one.

8.7.3 B Tree

A B tree is a data structure that stores data in such a manner that search, insertions and deletions can be done in logarithmic time. B trees are a generalization of binary search trees in which a node can have more than two children. B trees are efficient for systems that read and write large blocks of data, such as databases and file systems.
B trees are self-balancing trees. All the leaf nodes are at the same level. A B tree with order p has:
1. Root node may contain minimum 1 key
2. Minimum number of children (non-root internal node) = ⌈p/2⌉, so minimum number of keys = ⌈p/2⌉ − 1
3. Maximum number of children = p
4. Maximum keys = p − 1
The order of a B tree can be found as follows:
p × P + (p − 1)(K + Pr) ≤ Block size
where p is the order of the tree, P is the size of a block pointer, Pr is the size of a record pointer and K is the size of a key (search field value).

Problem 8.7: The order of a B-tree index is the maximum number of children it can have. Suppose that a block pointer takes 6 bytes, the search field value takes 9 bytes, a record pointer takes 7 bytes and the block size is 512 bytes. What is the order of the B tree?

Solution:
Given that block size = 512 B; record pointer (Pr) = 7 B; block pointer (P) = 6 B; key (K) = 9 B. So, we have
p × P + (p − 1)(K + Pr) ≤ Block size
6p + (p − 1)(9 + 7) ≤ 512
22p ≤ 528
p = 24

8.7.4 B+ Trees

A B+ tree supports multilevel indexing. The leaf nodes of a B+ tree hold the actual data pointers. It ensures all leaf nodes are balanced, that is, at the same height. Leaf nodes are linked using a linked list.
A B+ tree has an order n, which is fixed for the whole tree.
The internal nodes contain at least ⌈n/2⌉ pointers, except the root node, and at most n pointers. Leaf nodes contain at least ⌈n/2⌉ record pointers and at least ⌈n/2⌉ key values, at most n record pointers and at most n key values, and every leaf node contains one block pointer P to the next leaf node, so that the leaves form a linked list.
Order of internal nodes can be calculated as:
p × PB + (p − 1) × k ≤ Block size


Order of leaf nodes:
Pleaf × [k + Pr] + PB ≤ Block size
where p is the order of internal nodes, Pr is the size of a record pointer, PB is the size of a block pointer, k is the size of a key and Pleaf is the order of leaf nodes.

Problem 8.8: The order of an internal node in a B+ tree index is the maximum number of children it can have. Suppose that a block pointer takes 6 bytes, the search field value takes 9 bytes, a record pointer takes 7 bytes and the block size is 512 bytes. What is the order of the internal node and leaf node?

Solution:
Given that block size = 512 B, record pointer (Pr) = 7 B, block pointer (PB) = 6 B and key (k) = 9 B.
Order of internal nodes:
p × PB + (p − 1) k ≤ Block size
6p + (p − 1)9 ≤ 512
15p ≤ 521
p = 34
Order of leaf nodes:
Pleaf × [k + Pr] + PB ≤ Block size
Pleaf [9 + 7] + 6 ≤ 512
16 Pleaf ≤ 506
Pleaf = 31

8.8 TRANSACTIONS AND CONCURRENCY CONTROL

Transactions are series of read and write operations on a database. When many users access the same database at the same time, some control mechanism is required to keep the database in a consistent state. The following sections describe transactions and their control mechanisms.

8.8.1 Transactions

A transaction is a series of reads and writes of database objects. It maintains the integrity of a database while running multiple concurrent operations. Transactions have four properties that a DBMS must ensure to maintain data. These are known as ACID properties:

1. Atomicity: Either all actions are carried out or none (no partial transaction).
2. Consistency: After the execution of a transaction (with no concurrent execution of other transactions), the database must be in a consistent state.
3. Isolation: Each transaction is carried out as if it were the only transaction in the system (no transaction is affected by any other concurrently running transaction).
4. Durability: Persistence of data even if the system fails and restarts.

8.8.2 Schedule

A chronological execution sequence of transactions is called a schedule. It is a list of actions (reading, writing, aborting or committing) from a set of transactions. A schedule can have many transactions. Schedules can be divided into two types: serial schedules and concurrent schedules.

8.8.2.1 Serial Schedule

A schedule is called a serial schedule if all transactions are executed in a non-interleaved manner. Let there be two transactions T1 and T2 in a schedule S; then S is a serial schedule if one transaction executes entirely after the other, as shown in Fig. 8.5.

T1          | T2
Read (X)    |
X = X − Z   |
Write (X)   |
Read (Y)    |
Y = Y + Z   |
Write (Y)   |
            | Read (X)
            | X = X + W
            | Write (X)

T1          | T2
            | Read (X)
            | X = X + W
            | Write (X)
Read (X)    |
X = X − Z   |
Write (X)   |
Read (Y)    |
Y = Y + Z   |
Write (Y)   |

Figure 8.5 | Serial schedule.

8.8.2.2 Concurrent Schedule

A schedule is called a concurrent schedule if the operations of two or more transactions are interleaved, that is, the transactions execute simultaneously. Let there be two transactions T1 and T2 in a schedule S; then S is a concurrent schedule if both transactions execute in parallel. One of the scenarios is shown in Fig. 8.6.

T1          | T2
Read (X)    |
X = X − Z   |
            | Read (X)
            | X = X + W
Write (X)   |
            | Write (X)
Read (Y)    |
Y = Y + Z   |
Write (Y)   |

Figure 8.6 | Concurrent schedule.
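The difference between the serial schedules of Fig. 8.5 and the interleaving of Fig. 8.6 can be checked with a small simulation. The sketch below is only illustrative: the helper `run`, the initial values of X and Y, and the amounts Z and W are hypothetical, and the interleaving assumes that T1 writes X before T2 does.

```python
def run(schedule, db):
    """Execute a schedule given as (txn, op, item[, delta]) steps.

    Each transaction works on private copies of the items it has read;
    only a "write" copies the private value back to the shared database.
    """
    db = dict(db)   # do not mutate the caller's database
    local = {}      # txn -> {item: private value}
    for txn, op, item, *rest in schedule:
        workspace = local.setdefault(txn, {})
        if op == "read":
            workspace[item] = db[item]
        elif op == "add":
            workspace[item] += rest[0]
        elif op == "write":
            db[item] = workspace[item]
        else:
            raise ValueError(f"unknown operation: {op}")
    return db


Z, W = 10, 20                 # hypothetical transfer amounts
db0 = {"X": 100, "Y": 50}     # hypothetical initial database state

T1 = [("T1", "read", "X"), ("T1", "add", "X", -Z), ("T1", "write", "X"),
      ("T1", "read", "Y"), ("T1", "add", "Y", Z), ("T1", "write", "Y")]
T2 = [("T2", "read", "X"), ("T2", "add", "X", W), ("T2", "write", "X")]

serial_12 = T1 + T2           # T1 followed by T2
serial_21 = T2 + T1           # T2 followed by T1
# Interleaving in the style of Fig. 8.6: both read X before either writes it.
interleaved = [T1[0], T1[1], T2[0], T2[1], T1[2], T2[2], T1[3], T1[4], T1[5]]

print(run(serial_12, db0))    # {'X': 110, 'Y': 60}
print(run(serial_21, db0))    # {'X': 110, 'Y': 60}
print(run(interleaved, db0))  # {'X': 120, 'Y': 60}
```

Both serial orders leave X = 110, while the interleaved run leaves X = 120 because T2 overwrites the value written by T1. This is why a concurrent schedule is not automatically consistent.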


8.8.2.3 Comparison between Serial and Concurrent Schedule

Table 8.5 shows the comparison between serial and concurrent schedule.

Table 8.5 | Serial schedule vs. concurrent schedule

Serial Schedule | Concurrent Schedule
All serial schedules are consistent schedules | Not all concurrent schedules are consistent schedules
Less throughput | More throughput
Poor resource utilization | Good resource utilization
More response time | Less response time
For the given transactions, the number of serial schedules is much smaller than the number of concurrent schedules | For the given transactions, the number of concurrent schedules is larger than the number of serial schedules

8.8.2.4 Problems Occurring due to Concurrent Schedule

In a concurrent schedule, more than one transaction is executed simultaneously, and due to this the following problems can arise:

1. WR problem (Read after Write): The WR problem is also known as the dirty read problem or uncommitted read problem. Let there be two transactions Ti and Tj of schedule S. If transaction Tj reads a data item which is written by Ti, but Ti has not yet committed, then the WR problem can occur. If a rollback or failure occurs in transaction Ti, then the database will be inconsistent due to the uncommitted read by transaction Tj (Fig. 8.7).

[Figure 8.7: Ti performs Read (A), A = A + X and Write (A); Tj then performs Read (A) (the dirty read), A = A + Y and Write (A), while Ti, which also operates on data item B, fails or rolls back.]

Figure 8.7 | Uncommitted read problem (WR problem).

Example: Let the value of data item A = 1000, X = 100 and Y = 200. Transaction Ti updates the database with value 1100. Transaction Tj reads the data item value as 1100 and updates the database with value 1300. As transaction Ti fails, the database value should be 1200 (the initial 1000 plus the 200 added by Tj), not 1300. This problem is known as the uncommitted read problem or dirty read problem.

2. RW problem (Write after Read): The RW problem is also known as the incorrect summary problem or unrepeatable read problem (Fig. 8.8). Let there be two transactions Ti and Tj of schedule S. Transaction Ti reads a data item and Tj also reads the same data item. Both Ti and Tj write that data item. If each transaction's read occurs before the other transaction has written the data item back to the database and committed, then the RW problem will arise.

Transaction Ti | Transaction Tj
Read (A)       |
If (A > 0)     |
A = A + 1      |
               | Read (A)
               | If (A > 0)
               | A = A + 1
Write (A)      |
Commit         | Write (A)
               | Commit

Figure 8.8 | Incorrect summary problem (RW problem)

Example: Let the value of data item A = 100. Transaction Ti reads the data item with value 100 and updates the database with value 101. Transaction Tj reads the data item value as 100, because it reads before Ti updates the value of A, and then also updates the database with value 101. But the value of data item A should be 102, because both transactions have committed successfully. This problem is known as the unrepeatable read problem.

3. WW Problem (Write after Write): The WW problem in a concurrent schedule is also known as the lost update problem (Fig. 8.9). Let there be two transactions Ti and Tj of schedule S. Transaction Ti reads a data item and updates it. Now, transaction Tj also writes the same data item with some other value and commits successfully. After the commit of Tj, transaction Ti rolls back. Thus, the value of the data item written by transaction Tj will be lost.


Transaction Ti       | Transaction Tj
Read (A)             |
Write (A)            |
                     | Write (A)
                     | Commit
Failure and Rollback |

Figure 8.9 | Lost update problem (WW problem).

Example: Let the value of data item A = 100. Transaction Ti reads the data item and updates the database with value 200. Transaction Tj then updates the database with value 250 and commits successfully. After Tj commits, transaction Ti rolls back, which sets the data item back to its initial value 100, so the update made by Tj is lost. This problem is known as the lost update problem.

8.8.3 Classification of Schedule Based on Recoverability

8.8.3.1 Irrecoverable Schedule

Let there be two transactions Ti and Tj of schedule S. If transaction Tj reads a data item which is updated by transaction Ti and transaction Tj commits before the commit (or rollback) of transaction Ti, then the given schedule S is called an irrecoverable schedule.

8.8.3.2 Recoverable Schedule

Let a schedule S have two transactions Ti and Tj. If transaction Tj reads a data item which is updated by transaction Ti, and Tj is not allowed to commit before the commit (or rollback) of Ti, then the given schedule S is called a recoverable schedule. A recoverable schedule may suffer from the uncommitted read, lost update and incorrect summary problems.

8.8.3.3 Cascading Rollback Recoverable Schedule

Let there be four transactions (T1, T2, T3, T4) in a given schedule S. In schedule S, if rollback of a transaction (T1) results in the rollback of the other transactions (T2, T3, T4) (because of their dependency on each other), then this is called cascading rollback.

If a schedule is recoverable and has no cascading rollback, then it is called a cascading rollback recoverable (cascadeless) schedule. Incorrect summary and lost update problems may exist in a cascading rollback recoverable schedule. The problems of irrecoverability, uncommitted read (WR problem) and cascading rollback do not occur in such a schedule.

8.8.3.4 Strict Recoverable Schedule

Let a schedule S have two transactions Ti and Tj. S is called a strict recoverable schedule if it satisfies these two conditions:
1. Schedule S should be a cascading rollback recoverable schedule.
2. If one transaction Ti writes a data item `A', then the other transaction Tj is not allowed to write that data item until Ti commits or rolls back.
Only the incorrect summary problem (RW problem) may arise in a strict recoverable schedule. The irrecoverability, uncommitted read (WR problem), lost update and cascading rollback problems do not occur in such a schedule.

8.8.3.5 Summary of Schedules Based on Recoverability

Schedule | Problem Exists | Problem Removed
Irrecoverable | All | None
Recoverable | Incorrect summary problem (RW), uncommitted read problem (WR), lost update (WW) and cascading rollback problem | Irrecoverability
Cascading rollback recoverable | Incorrect summary problem (RW) and lost update (WW) | Irrecoverability, uncommitted read problem (WR) and cascading rollback problem
Strict recoverable | Incorrect summary problem (RW) | Irrecoverability, uncommitted read problem (WR), lost update (WW) and cascading rollback problem
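The four classes of schedule can be told apart mechanically. The following is a minimal sketch, not a definitive implementation: the function name `classify_schedule` and the tuple encoding of operations are assumptions made for illustration, and the sketch handles only read, write and commit operations (rollbacks are not modelled).

```python
def classify_schedule(schedule):
    """Classify a schedule by recoverability.

    `schedule` is a list of (op, txn, item) tuples, where op is "R" (read),
    "W" (write) or "C" (commit, with item None).
    Returns "irrecoverable", "recoverable", "cascadeless" or "strict".
    """
    last_writer = {}   # item -> transaction that wrote it most recently
    committed = set()  # transactions that have committed so far
    reads_from = {}    # txn -> set of transactions whose writes it has read
    recoverable = cascadeless = strict = True

    for op, txn, item in schedule:
        if op == "C":
            # Recoverable: every transaction we read from committed first.
            if any(w not in committed for w in reads_from.get(txn, ())):
                recoverable = False
            committed.add(txn)
            continue

        writer = last_writer.get(item)
        # The item currently holds a value written by another, uncommitted txn.
        uncommitted = (writer is not None and writer != txn
                       and writer not in committed)
        if op == "R":
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
            if uncommitted:          # dirty read
                cascadeless = False
                strict = False
        elif op == "W":
            if uncommitted:          # overwrite of an uncommitted value
                strict = False
            last_writer[item] = txn
        else:
            raise ValueError(f"unknown operation: {op}")

    if not recoverable:
        return "irrecoverable"
    if not cascadeless:
        return "recoverable"
    if not strict:
        return "cascadeless"
    return "strict"


# Tj reads Ti's uncommitted write and commits first.
print(classify_schedule([("W", "Ti", "A"), ("R", "Tj", "A"),
                         ("C", "Tj", None), ("C", "Ti", None)]))
# Same dirty read, but Tj commits after Ti.
print(classify_schedule([("W", "Ti", "A"), ("R", "Tj", "A"),
                         ("C", "Ti", None), ("C", "Tj", None)]))
```

The first call reports an irrecoverable schedule and the second a recoverable one; here "cascadeless" corresponds to the cascading rollback recoverable schedule of the summary table.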


Problem 8.9: A concurrent schedule S has three transactions T1, T2, T3. The transactions execute read/write operations in the following sequence:

Read1(x), Read2(z), Read3(x), Read1(z), Read2(y), Read3(y), Write1(x), Commit1, Write2(z), Write3(y), Write2(y), Commit3, Commit2.

Find the type of recoverable schedule.

Solution:

Transaction T1 | Transaction T2 | Transaction T3
Read1(x)       | -----          | -----
-----          | Read2(z)       | -----
-----          | -----          | Read3(x)
Read1(z)       | -----          | -----
-----          | Read2(y)       | -----
-----          | -----          | Read3(y)
Write1(x)      | -----          | -----
Commit1        | -----          | -----
-----          | Write2(z)      | -----
-----          | -----          | Write3(y)
-----          | Write2(y)      | -----
-----          | -----          | Commit3
-----          | Commit2        | -----

Step 1: Check for recoverable. Using the definition of a recoverable schedule, we can find whether the given schedule is recoverable or not. (In this problem, no transaction reads a data item after that data item has been written by another transaction, so the given schedule is recoverable.)
Step 2: Check for cascading rollback recoverable. In a cascading rollback recoverable (cascadeless) schedule, no uncommitted read is allowed. In the given problem, there is no uncommitted read, so the given schedule is a cascadeless schedule.
Step 3: Check for strict recoverable. There should be a commit operation between write operations on the same data item by two different transactions. In the given problem, transactions T2 and T3 both write data item (y), but T3 does not commit before the data item is updated by T2. Hence, the given schedule is not strict recoverable.

8.8.4 Classification of Schedule Based on Serializability

A concurrent schedule is said to be a serializable schedule if its outcome (the resulting database state) is equal to the outcome of some serial execution of the transactions in the given schedule. Serializability is of two types: conflict serializability and view serializability.

8.8.4.1 Conflict Serializability

A concurrent schedule S is said to be a conflict serializable schedule if S is conflict equivalent to a serial schedule.
1. Conflict equivalent schedule: Let there be two schedules S1 and S2. If swapping consecutive non-conflict pairs of operations in schedule S1 results in schedule S2, then S1 and S2 are called conflict equivalent schedules.
2. Conflict and non-conflict pairs: A pair of operations in a schedule is called a conflict pair if it satisfies all three conditions:
•• At least one of the operations should be a write operation.
•• Both the operations should be performed on the same data item.
•• Both operations should be performed by different transactions.
If a pair of operations does not fulfill the above three conditions, then the pair is called a non-conflict pair.

Problem 8.10: A schedule S1 with two transactions T1 and T2 is given as:
Read1(x), Write1(x), Read2(x), Read1(y), Write1(y), Read2(y)
Find a conflict equivalent schedule to S1.


Solution:
Schedule S1:

Transaction T1 | Transaction T2 | Remark
Read1(x)       |                |
Write1(x)      |                | Conflict pair with Read2(x)
               | Read2(x)       | Non-conflict pair with Read1(y)
Read1(y)       |                |
Write1(y)      |                |
               | Read2(y)       |

Conflict and non-conflict pairs in S1.

After swapping the non-conflict pair in schedule S1 we get another schedule called S2. The schedules S1 and S2 are conflict equivalent schedules.
Schedule S2:

Transaction T1 | Transaction T2
Read1(x)       | ------
Write1(x)      | ------
Read1(y)       | ------
------         | Read2(x)
Write1(y)      | ------
------         | Read2(y)

Conflict equivalent schedules.

3. Testing method of conflict serializable schedule: To check whether a given schedule is conflict serializable or not, we use a precedence graph. A schedule is conflict serializable if and only if its precedence graph does not contain a cycle. A precedence graph G contains:
(a) Vertex set V: denotes the transactions of the schedule.
(b) An edge from Vi to Vj is drawn, where i ≠ j and Vi's operation occurs before Vj's operation, in the following cases:
(i) Vi: Read(x) operation and Vj: Write(x) operation
(ii) Vi: Write(x) operation and Vj: Read(x) operation
(iii) Vi: Write(x) operation and Vj: Write(x) operation

Problem 8.11: Consider the given schedule S and draw the precedence graph for S.

Transaction T1 | Transaction T2
Read1(x)       | ------
Write1(x)      | ------
------         | Read2(x)
Write1(y)      | ------
------         | Read2(y)

Solution:
Vertices V1 and V2 represent transactions T1 and T2.
Edge 1 for
V1: Write(x) operation and V2: Read(x) operation
Edge 2 for
V1: Write(y) operation and V2: Read(y) operation

[Precedence graph: edges 1 and 2 both run from V1 to V2.]

Note: If two schedules S1 and S2 are given, then they will be conflict equivalent if they satisfy the following conditions:
••The precedence graphs of S1 and S2 should be equal.
••Each transaction's operations should be equal in S1 and S2. (Two transactions are equal if they have the same number of operations and the types of operation are also the same.)

Problem 8.12: Consider the given schedule S and identify whether schedule S is a conflict serializable schedule or not by the use of a precedence graph.


Transaction T1 | Transaction T2 | Transaction T3
Read (x)       | ------         | ------
Write (x)      | ------         | ------
------         | Read (x)       | ------
------         | ------         | Write (x)
------         | Write (y)      | ------
Write (y)      | ------         | ------
------         | ------         | Read (y)

Solution:
Vertices V1, V2 and V3 represent transactions T1, T2 and T3.
Edge 1 for
V1: Write(x) operation and V2: Read(x) operation
Edge 2 for
V2: Read(x) operation and V3: Write(x) operation
Edge 3 for
V2: Write(y) operation and V1: Write(y) operation
Edge 4 for
V1: Write(y) operation and V3: Read(y) operation
Edge 5 for
V2: Write(y) operation and V3: Read(y) operation

[Precedence graph: edge 1 from V1 to V2, edge 3 from V2 to V1, edge 4 from V1 to V3, edges 2 and 5 from V2 to V3.]

The precedence graph drawn from the given schedule contains a cycle with vertices V1 and V2. So, the given schedule is not a conflict serializable schedule.

8.8.4.2 View Serializability

If a given concurrent schedule is view equivalent to one of the possible serial schedules, then it is called a view serializable schedule. Let there be two schedules S1 and S2 with the same set of transactions. Schedules S1 and S2 will be view equivalent if they satisfy the following three conditions:
1. If transaction Ti reads the initial value of data item (x) in schedule S1, then transaction Ti must read the initial value of (x) in schedule S2.
2. In schedule S1, if transaction Ti reads the value of data item (x) produced by transaction Tj, then Ti must read the value of (x) produced by transaction Tj in schedule S2.
3. If transaction Ti writes the final value of data item (x) in schedule S1, then transaction Ti must write the final value of (x) in schedule S2.

Problem 8.13: Identify whether the given concurrent schedule is a view serializable schedule or not.

T1        | T2        | T3
------    | ------    | Read (x)
------    | Read (x)  | ------
------    | ------    | Write (x)
Read (x)  | ------    | ------
Write (x) | ------    | ------

Solution: We have a concurrent schedule S1 with three transactions, so the number of possible serial schedules with three transactions is 6 (3! = 6). We pick one serial schedule S2; if the given concurrent schedule is view equivalent to that schedule, then the given schedule is view serializable.

T1        | T2        | T3
------    | ------    | Read (x)
------    | Read (x)  | ------
------    | ------    | Write (x)
Read (x)  | ------    | ------
Write (x) | ------    | ------

Schedule S1

T2        | T3        | T1
Read (x)  |           |
          | Read (x)  |
          | Write (x) |
          |           | Read (x)
          |           | Write (x)

Schedule S2


Step 1: Initial Reads. Transactions T3 and T2 read the initial value of data item (x) in schedule S1 (we do not consider T1 because it reads the value of x updated by T3). T3 and T2 also read the initial value of (x) in serial schedule S2. So, the first condition is true.
Step 2: W-R Conflicts. In the given schedule S1, there is only one W-R conflict (between T3 and T1). Transaction T1 reads the value of x produced by transaction T3 in schedule S1. In schedule S2, transaction T1 also reads the value of x produced by transaction T3. So, the second condition is also true.
Step 3: Final Write. Transaction T1 writes the final value of data item (x) in both schedules S1 and S2. So, the third condition is also satisfied.

All three conditions are satisfied, so the given concurrent schedule S1 is view equivalent to one of the possible serial schedules, S2. Therefore, the given concurrent schedule is a view serializable schedule.

8.8.5 Concurrency Control Protocol

Concurrency control protocols are used to ensure serializability of transactions in a concurrent transaction execution environment. They can be lock based protocols or time stamp ordering protocols.

8.8.5.1 Lock Based Protocol

Lock based protocols use a mechanism through which a transaction has to obtain a read lock (for a read operation) or a write lock (for a write operation) on a data item (x); without obtaining a lock, a transaction is not allowed to perform any operation. There are two types of lock supported by lock based protocols.
1. Shared lock: If a transaction T has a shared lock on a data item (x), then it can only perform Read operations on that data item.
2. Exclusive lock: If a transaction T has an exclusive lock on a data item (x), then it can perform both Read and Write operations on that data item.

The lock compatibility matrix is given in Table 8.6, which shows whether a requested lock can be granted when another transaction already holds a shared or exclusive lock on the data item.

Table 8.6 | Lock compatibility matrix

Lock Request \ Lock Hold | Shared Lock | Exclusive Lock
Shared Lock              | Yes         | No
Exclusive Lock           | No          | No

1. Two Phase Locking Protocol (2PL): Two phase locking protocol is used by a transaction to lock a data item before reading and writing, in order to maintain consistency in the database. The protocol applies the locking scheme in two phases:
•• Expanding phase: The transaction acquires locks in this phase and no locks are released.
•• Shrinking phase: The transaction releases locks in this phase and no locks are acquired.
Note: Two phase locking protocol ensures conflict serializability, but may not be free from irrecoverability, deadlock and starvation.
2. Two Phase Locking Protocol with Lock Up-gradation: This technique allows a transaction holding a shared lock on a data item to upgrade it to an exclusive lock on that data item without first unlocking the shared lock.
3. Strict Two Phase Locking Protocol: In this protocol, if a transaction has an exclusive lock on a data item, then it cannot release the lock until commit or rollback is performed by that transaction.
4. Rigorous Two Phase Locking Protocol: In this protocol, if a transaction has any lock (exclusive or shared) on a data item, then it cannot release the lock until commit or rollback is performed by that transaction.

8.8.5.2 Time Stamp Ordering Protocol

The time stamp ordering protocol uses the (unique) time stamp values assigned to transactions by the database to ensure serializability among transactions. A time stamp is a unique value assigned by the database in ascending order. Suppose we have three transactions T1, T2 and T3 arriving in the database in that order; then the assigned time stamp values satisfy T1 < T2 < T3. A data item has two types of time stamp.
1. Read Time Stamp (x): It is the highest time stamp among the transactions that have successfully performed a Read operation on data item (x). It is also denoted by RTS(x). The initial value of RTS(x) is zero.
2. Write Time Stamp (x): It is the highest time stamp among the transactions that have successfully performed a Write operation on data item (x). It is also denoted by WTS(x). The initial value of WTS(x) is zero.
3. Basic Time Ordering Protocol
•• Transaction Ti issues Read(x) operation
(a) If (WTS(x) > Time Stamp(Ti))
Rollback Ti
(b) Otherwise allow execution of Read(x) by transaction Ti
Set RTS(x) = Maximum (RTS(x), Time Stamp(Ti))


•• Transaction Ti issues a Write(x) operation:
(a) If (RTS(x) > Time Stamp(Ti))
Rollback Ti
(b) If (WTS(x) > Time Stamp(Ti))
Rollback Ti
Note: The Basic Time Ordering protocol ensures serializability; the equivalent serial schedule follows time stamp order. It is a deadlock-free protocol, but it can suffer from starvation and can produce irrecoverable schedules (because it does not enforce the order of commits).
4. Strict Time Ordering Protocol: This protocol adds one condition to the Basic Time Ordering protocol. Let a schedule have two transactions Ti and Tj, where the time stamp of Ti is less than the time stamp of Tj. If transaction Tj issues a Read(x)/Write(x) operation with WTS(x) < Time Stamp(Tj), then Tj has to delay its Read(x)/Write(x) operation until the transaction Ti that performed Write(x) commits or rolls back.
The Strict Time Ordering protocol ensures serializability and is deadlock free, but it may suffer from starvation.
5. Deadlock Prevention: To prevent deadlock in a database system, it is important to inspect all the operations performed by the transactions in a schedule. There are various deadlock prevention schemes that use the time stamp ordering mechanism.
6. Wait-Die Protocol: Assume TimeStamp(T1) < … < TimeStamp(Tn), and let Ti and Tj be any two transactions such that Ti is older than Tj. This implies
TimeStamp(Ti) < TimeStamp(Tj).
•• If transaction Ti requires a lock that is held by transaction Tj, then transaction Ti is allowed to wait.
•• If transaction Tj requires a lock that is held by transaction Ti, then transaction Tj is rolled back.
7. Wound-Wait Protocol: Let TimeStamp(T1) < … < TimeStamp(Tn), and let Ti and Tj be any two transactions such that Ti is older than Tj. This implies
TimeStamp(Ti) < TimeStamp(Tj).
•• If transaction Ti requires a lock that is held by transaction Tj, then transaction Tj is rolled back.
•• If transaction Tj requires a lock that is held by transaction Ti, then transaction Tj is allowed to wait.
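
The two deadlock prevention rules differ only in who is rolled back when an older and a younger transaction conflict. The following Python sketch is illustrative only: the function names and the returned strings are assumptions made for this example, and a smaller time stamp is taken to mean an older transaction.

```python
def wait_die(requester_ts: int, holder_ts: int) -> str:
    """Wait-die: an older requester waits, a younger requester is rolled back."""
    if requester_ts < holder_ts:      # requester is older than the lock holder
        return "requester waits"
    return "requester rolls back"     # younger requester "dies"


def wound_wait(requester_ts: int, holder_ts: int) -> str:
    """Wound-wait: an older requester rolls back the holder, a younger one waits."""
    if requester_ts < holder_ts:      # requester is older than the lock holder
        return "holder rolls back"    # older requester "wounds" the holder
    return "requester waits"


if __name__ == "__main__":
    # Ti has time stamp 1 (older), Tj has time stamp 2 (younger).
    print(wait_die(1, 2), "|", wait_die(2, 1))
    print(wound_wait(1, 2), "|", wound_wait(2, 1))
```

In both schemes it is always the younger transaction that is rolled back, which is why neither scheme can produce a cycle of waiting transactions.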

IMPORTANT FORMULAS

1. All the 3NF and BCNF decompositions guarantee lossless join.
2. A 3NF or BCNF decomposition is not guaranteed to preserve dependencies (although a lossless, dependency-preserving 3NF decomposition always exists).
3. Any relation with two attributes is in BCNF.
4. Functional dependency F: X → Y implies that for any two tuples, if t1[X] = t2[X], they must have t1[Y] = t2[Y].
5. CP: Child Pointer
6. KV: Key Value
7. The order PI of an internal node in a B+ tree satisfies
PI × CP + (PI − 1) × KV ≤ Block size

Inference Rules:
1. Reflexivity
If Y ⊆ X, then X → Y
2. Augmentation
If X → Y, then XZ → YZ for any Z
3. Transitivity
If X → Y and Y → Z, then X → Z
4. Union
If X → Y and X → Z, then X → YZ
5. Decomposition
If X → YZ, then X → Y and X → Z
6. Pseudo-transitivity
If A → B and BC → D, then AC → D
7. Number of rows in cross product = r1 × r2, where r1 and r2 are the numbers of rows in the two relations
8. Number of attributes in cross product = m + n, where m and n are the numbers of attributes in the two relations
9. The complete set of relational algebra operations is {σ, π, ∪, −, ×}


10. In a clustering index, the file is ordered on a non-key field.
11. Secondary indexes are created on an unordered file, on either a key or a non-key field.
12. Different kinds of keys: Primary key, Secondary key, Candidate key, Alternate key, Foreign key, Superkey, Compound key (also called Composite key or Concatenated key).
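
The inference rules listed above can be applied mechanically to compute the closure of an attribute set, which is the usual way to test whether a set of attributes is a key. The following Python sketch is illustrative only: the function name closure and the sample dependencies C → F, E → A, EC → D, A → B over R(A, B, C, D, E, F) are assumptions chosen for the demonstration.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds.

    attrs is an iterable of single-letter attribute names; fds is a list of
    (lhs, rhs) pairs, each side written as a string of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:                       # repeat until no dependency adds anything
        changed = False
        for lhs, rhs in fds:
            # If the closure already contains lhs, it must also contain rhs.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Assumed sample dependencies: C -> F, E -> A, EC -> D, A -> B
FDS = [("C", "F"), ("E", "A"), ("EC", "D"), ("A", "B")]
ALL_ATTRS = set("ABCDEF")

if __name__ == "__main__":
    for candidate in ("CD", "EC", "AE", "AC"):
        cl = closure(candidate, FDS)
        print(candidate, sorted(cl), "key" if cl == ALL_ATTRS else "not a key")
```

A set of attributes is a superkey exactly when its closure contains every attribute of the relation; it is a candidate key when no proper subset has the same property.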

SOLVED EXAMPLES

1. Which of the following operations does not modify the database?
(a) Sorting (b) Insertion at the beginning
(c) Append (d) Modify
Solution: Sorting does not involve addition, insertion or modification of any element.
Ans. (a)

2. Which of the following operations is based on relational algebra?
(a) rename and union
(b) select and union
(c) only union
(d) rename, select and union
Solution: Rename, select and union are some of the operations of relational algebra.
Ans. (d)

3. ACID properties of a transaction are
(a) atomicity, constant, integrity, durability.
(b) atomicity, consistency, isolation, durability.
(c) atomicity, compact, in-built, durability.
(d) automatically, consistency, isolation, database.
Solution: ACID properties are atomicity, consistency, isolation and durability.
Ans. (b)

4. Which normal form is concerned with removing transitive dependency?
(a) 1NF (b) 2NF
(c) 3NF (d) BCNF
Solution: 3NF removes transitive dependencies of non-prime attributes on a key.
Ans. (c)

5. The concept Database Lock is used to overcome the problem of
(a) lost update and inconsistent data.
(b) uncommitted dependency and inconsistent data.
(c) inconsistent data only.
(d) lost updates, inconsistent data and uncommitted dependency.
Solution: Lost update, inconsistent data and uncommitted dependency (WR) problems are overcome by database locks.
Ans. (d)

6. Database language may consist of the following?
(a) DDL and DML
(b) DML
(c) Query language
(d) DDL, DML, query language
Solution: DDL, DML and query language are all database languages.
Ans. (d)

7. Which of the following is not a function of DBA?
(a) Network maintenance
(b) Routine maintenance
(c) Defining the schema
(d) Data access through authorization
Solution: The role of the database administrator is to manage the DBMS; network maintenance is outside that role.
Ans. (a)

8. In a relational database the category type of information is represented in
(a) tuple. (b) field.
(c) primary key. (d) database name.
Solution: A field shows the type of information in a relational database.
Ans. (b)

9. Which key is used to identify a tuple uniquely?
(a) Primary key (b) Tuple key
(c) Unique key (d) Domain key
Solution: Primary key.
Ans. (a)

10. An association of the information is represented by
(a) an attribute. (b) a relationship.
(c) a normal form. (d) the records.
Solution: A relationship shows an association of information.
Ans. (b)


11. In which stage of database design, all the necessary Solution: It provides data storage and responsive-
fields and their types of a database are listed? ness to queries.
Ans. (a)
(a) Data definition (b) Data field definition
(c) E-R diagram  (d) User definition 17. If D1, D2, … Dn are domains in a relational
model, then the relation is a table, which is a
Solution: In data definition stage, all the fields
subset of
and their types are listed.
(a) D1 ⊕ D2 ⊕ … ⊕ Dn (b) D1 × D2 × … × Dn
(c) D1 ∪ D2 ∪ … ∪ Dn (d) D1 ∩ D2 ∩ … ∩ Dn
Ans. (a)

12. The maximum length of an attribute of type text is


Solution: D1 × D2 × …× Dn
(a) 127. (b) 255. Ans. (b)
(c) 256. (d) It is a variable.
18. Which normal form is considered adequate for
Solution: 255 normal relational database design?
Ans. (b)
(a) 2NF (b) 5NF
13. When a transaction has completed all its opera- (c) 4NF (d) 3NF
tions and is waiting for commit or rollback action,
the state of transaction is Solution: Decomposition in BCNF is not always
(a) partially commit. (b) ready commit. lossless and dependency preserving. So, 3NF is
(c) half commit. (d) commit and rollback. considered to be most adequate form in relational
database.
Solution: The state of that transaction is par- Ans. (d)
tially commit.
19. Let R = (A, B, C, D, E, F) be a relation scheme with
Ans. (a) the following dependencies C → F, E → A, EC → D,
14. Which of the following statement is not false? A → B. Which of the following is a key for R?

(a) Time stamp protocols avoid deadlock. (a) CD   (b) EC (c) AE   (d) AC
(b) Locking technique is used to avoid deadlock.
(c) 2Phase locking does not provide serializability. Solution: To find key, we try to find closure.
(d) 2Phase deals with input and storing phase. (CD)+ = {C, D, F}
Solution: Time stamp protocol is a deadlock free (EC)+ = {A, B, C, D, E, F}
protocol. (AE)+ = {A, B, E}
(AC)+ = {A, B, C, F}
Ans. (a)
15. Data integrity control makes use of ________ for
maintaining the integrity of database. Only (EC)+ contains every attribute of R, so EC is a key.
Ans. (b)
(a) specific alphabets (uppercase and lowercase
alphabets) 20. Give the following relation instance:
(b) passwords
(c) relational algebra x y z
(d) storing on a backup hard disk 1 4 2
1 5 3
Solution: It promotes and enforces some integ- 1 6 3
rity rules for the reducing data redundancy and 3 2 2
increasing data consistency.
Ans. (a) Which of the following functional dependencies are
16. Data warehouse provides satisfied by the instance?

(a) transaction responsiveness. (a) XY → Z and Z → Y


(b) storage, functionality responsiveness to queries. (b) YZ → X and Y → Z
(c) demand and supply responsiveness. (c) YZ → X and X → Z
(d) storage of transactions. (d) XZ → Y and Y → X


Solution: A functional dependency X → Y holds in an instance only if tuples that agree on X also agree on Y. Among the dependencies in the options, only the following hold in the given instance:
XY → Z, YZ → X, Y → Z and Y → X
Both dependencies of option (b) are in this list.
Ans. (b)

21. Relation R is decomposed using a set of functional dependencies F, and relation S is decomposed using another set of functional dependencies G. One decomposition is definitely BCNF, the other is definitely 3NF, but it is not known which is which. To make a guaranteed identification, which one of the following tests should be used on the decompositions? (Assume that the closures of F and G are available.)
(a) Dependency-preservation
(b) Lossless-join
(c) BCNF definition
(d) 3NF definition
Solution: All the 3NF and BCNF decompositions guarantee lossless join, and neither kind is guaranteed to preserve dependencies, so options (a) and (b) cannot tell the two apart. A BCNF decomposition also satisfies the 3NF definition, so option (d) cannot tell them apart either. Only the BCNF definition, option (c), distinguishes them.
Ans. (c)

22. Consider the join of a relation R with a relation S. If R has m tuples and S has n tuples then the maximum and minimum sizes of the join, respectively, are
(a) m + n and 0. (b) mn and 0.
(c) m + n and |m − n|. (d) mn and m + n.
Solution: In the extreme case every tuple of R joins with every tuple of S, which gives mn tuples. If no pair of tuples satisfies the join condition, the join is empty. So, the maximum and minimum sizes are mn and 0, respectively.
Ans. (b)

23. Given the relations
Employee(name, salary, deptno) and department(deptno, deptname, address), which of the following queries cannot be expressed using the basic relational algebra operations (σ, π, ×, ∪, ∩, −)?
(a) Department address of every employee
(b) Employees whose name is the same as their department name
(c) The sum of all employees' salaries
(d) All employees of a given department
Solution: Aggregate functions such as sum, avg, min and max cannot be expressed in terms of basic relational algebra operations. These require extended relational algebra.
Ans. (c)
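
The instance-level test used in solved example 20 above (tuples that agree on the left side must agree on the right side) is easy to automate. The following Python sketch is illustrative only: the function name holds and the dictionary representation of rows are assumptions made for this example, and the rows are the four tuples of the instance in example 20.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are strings of single-letter
    attribute names, e.g. "yz" and "x" for YZ -> X.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Two rows with the same lhs values must carry the same rhs values.
        if seen.setdefault(key, value) != value:
            return False
    return True


# The relation instance (x, y, z) from solved example 20.
ROWS = [
    {"x": 1, "y": 4, "z": 2},
    {"x": 1, "y": 5, "z": 3},
    {"x": 1, "y": 6, "z": 3},
    {"x": 3, "y": 2, "z": 2},
]

if __name__ == "__main__":
    for lhs, rhs in [("xy", "z"), ("z", "y"), ("yz", "x"), ("y", "z"),
                     ("x", "z"), ("xz", "y"), ("y", "x")]:
        print(f"{lhs.upper()} -> {rhs.upper()}: {holds(ROWS, lhs, rhs)}")
```

Note that an instance can only refute a functional dependency; a dependency that happens to hold in one instance is not thereby guaranteed by the schema.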

GATE PREVIOUS YEARS’ QUESTIONS

1. Which of the following scenarios may lead to an irrecoverable error in a database system?
(a) A transaction writes a data item after it is read by an uncommitted transaction.
(b) A transaction reads a data item after it is read by an uncommitted transaction.
(c) A transaction reads a data item after it is written by a committed transaction.
(d) A transaction reads a data item after it is written by an uncommitted transaction.
(GATE 2003: 1 Mark)
Solution: In option (d), a problem will occur. Suppose transaction A reads data written by transaction B, but B has not committed yet. If A acts on that data and eventually commits, but B rolls back, A will have "acted on" data that was never committed. So it will lead to an irrecoverable error in the database.
Ans. (d)

2. Consider the following SQL query:
SELECT DISTINCT a1, a2, …, an
FROM r1, r2, …, rm
WHERE P
For an arbitrary predicate P, this query is equivalent to which of the following relational algebra expressions?
(a) Π_{a1, a2, …, an} σ_P (r1 × r2 × … × rm)
(b) Π_{a1, a2, …, an} σ_P (r1 ⋈ r2 ⋈ … ⋈ rm)
(c) Π_{a1, a2, …, an} σ_P (r1 ∪ r2 ∪ … ∪ rm)
(d) Π_{a1, a2, …, an} σ_P (r1 ∩ r2 ∩ … ∩ rm)
(GATE 2003: 1 Mark)
Solution: Π is projection, which by default selects distinct values of the attributes. Cross product will


consider all the tables r1, r2, …, rm. Selection (σ) selects Grades: (Roll_number, Course_number, Grade)
all the rows satisfying predicate P. SELECT distinct Name
Ans. (a) FROM Students, Courses, Grades
3. Consider the following functional dependencies in a WHERE Students.Roll_number = Grades.
database: Roll_number
Date_of_birth → Age and Courses.Instructor = Shyam
Age → Eligibility and Courses.Course_number = Grades.
Name → Roll_number Course_number
Roll_number → Name and Grades.grade = A
Course_number → Course_name Which of the following sets is computed by the
above query?
Course_number → Instructor
(Roll_number, Course_number) → Grade (a) Names of students who have got an A grade in
all courses taught by Shyam.
The relation (Roll_number, Name, Date_of_
(b) Names of students who have got an A grade in
birth, Age) is
all courses.
(a) In Second normal form but not in Third normal (c) Names of students who have got an A grade in
form at least one of the courses taught by Shyam.
(b) In Third normal form but not in BCNF (d) None of the above.
(c) In BCNF (GATE 2003: 2 Marks)
(d) In none of the above
(GATE 2003: 2 Marks) Solution: Use of distinct selects the name only
once. Due to distinct selected names will be unique,
Solution: The applicable FDs are: so one person will be selected only once, in spite of
Date_of_birth → Age getting grade A in any number of courses taught
Age → Eligibility by Shyam.
Ans. (c)
Name → Roll_number
Roll_number → Name 5. Consider three data items D1, D2 and D3, and the fol-
The candidate keys are lowing execution schedule of transactions T1, T2 and
(Roll_number, Date_of_birth) and (Name, T3. In the diagram, R(D) and W(D) denote the actions
Date_of_birth) reading and writing the data item D, respectively.
For candidate key the FD Date_of_birth → Age
T1 T2 T3
is a partial dependency.
So, the relation is in 1NF. R(D3);
Candidate keys for given relation are: (date of R(D2);
birth, name) and (date of birth, roll_number)
W(D2);
Date of Birth → Age
R(D2);
Age (non-prime) is partially dependent upon Date
of Birth which is prime attribute. According to R(D3);
2NF definition, every non-prime attribute should R(D1);
be fully functional dependent on key, so it is not in W(D1);
2NF. Therefore, it will not be in 3NF and BCNF
W(D2);
also.
Ans. (d) W(D3);
R(D1);
4. Consider the set of relations shown below and the
SQL query that follows: R(D2);
W(D2);
Students: (Roll_number, Name, Date_of_
birth) W(D1);
Courses: (Course number, Course_name,
Instructor) Which of the following statements is correct?


(a) The schedule is serializable as T2; T3; T1. present in (Student * Enroll), where `*’ denotes
(b) The schedule is serializable as T2; T1; T3. natural join?
(c) The schedule is serializable as T3; T2; T1.
(a) 8, 8 (b) 120, 8
(d) The schedule is not serializable.
(c) 960, 8 (d) 960, 120
(GATE 2003: 2 Marks)
(GATE 2004: 1 Mark)
Solution: Let us draw a flow diagram between
the three schedules. An edge is drawn between two Solution: The maximum and minimum number
schedules if there exists read-write or write-write of tuples presented in (Student * Enroll) would be
dependency between them. represented by the minimum of these 120 and 8,
that is, min(120, 8) = 8
Natural join has inbuilt condition of equality. So in
T2 both the relations, minimum and maximum tuples
T3 will be 8.
Ans. (a)
8. It is desired to design an object-oriented employee
record system for a company. Each employee has
a name, unique id and salary. Employees belong
T1 to different categories and their salary is deter-
mined by their category. The functions getName,
getId and compute salary are required. Given the
The schedule has cycle between T1 and T3. So, the class hierarchy below, possible locations for these
schedule is not serializable. ­functions are:
Ans. (d)
(i) getId is implemented in the superclass
6. Let R1(A, B, C, D) and R2(D, E) be two ­relation (ii) getId is implemented in the subclass
schema where the primary keys are shown   (iii) getName is an abstract function in the
­underlined, and let C be a foreign key in R1 refer- superclass
ring to R2. Suppose there is no violation of the   (iv) getName is implemented in the superclass
above referential integrity constraint in the corre-   (v) getName is implemented in the subclass
sponding relation instances r1 and r2. Which one of   (vi) getSalary is an abstract function in the
the following relational algebra expressions would superclass
necessarily produce an empty relation?   (vii) getSalary is implemented in the superclass
(a) ΠD (r2 ) − ΠC (r1 ) (b) ΠC (r1 ) − ΠD (r2 ) (viii) getSalary is implemented in the subclass

(c) ΠD (r1  C ≠D r2 ) (d) ΠD (r1  C = D r2 )


(GATE 2004: 2 Marks)

(GATE 2004: 1 Mark)


Employee

Solution: C is foreign key in relation R1 refer-


ring to relation R2. So, C would not contain any
­additional value that is not present in D. So ΠC(r1) Manager Engineer Secretary
− ΠD(r2) will return no value.
Returns all the tuples which are in ΠC(r1) but not
in ΠD(r2). Choose the best design
Ans. (b)
(a) (i), (iv), (vi), (viii) (b) (i), (iv), (vii)
7. Consider the following relation schema pertaining (c) (i), (iii), (v), (vi), (viii) (d) (ii), (v), (viii)
to a student’s database:
Students(rollno, name, address) Solution: getid and getname can be placed in base
class. Because all the subclasses will have the same
Enroll(rollno, courseno, coursename)
implementation for these functions. But getsalary
where the primary keys are shown underlined. The is dependent on category of employee. So, abstract
number of tuples in the student and Enroll tables function can be placed in base class and its imple-
are 120 and 8, respectively. What are the maxi- mentation in the subclasses.
mum and minimum number of tuples that can be Ans. (a)
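
The design chosen in question 8 above, options (i), (iv), (vi) and (viii), can be sketched in code. The following Python example is illustrative only: the constructor arguments and the salary figures are hypothetical, and the method names are adapted to Python naming conventions.

```python
from abc import ABC, abstractmethod


class Employee(ABC):
    """Superclass: getId and getName are implemented once here."""

    def __init__(self, name: str, emp_id: int) -> None:
        self._name = name
        self._emp_id = emp_id

    def get_name(self) -> str:
        return self._name

    def get_id(self) -> int:
        return self._emp_id

    @abstractmethod
    def get_salary(self) -> float:
        """Abstract here; each category supplies its own rule."""


class Manager(Employee):
    def get_salary(self) -> float:
        return 90000.0  # hypothetical figure for this category


class Engineer(Employee):
    def get_salary(self) -> float:
        return 70000.0  # hypothetical figure for this category


class Secretary(Employee):
    def get_salary(self) -> float:
        return 40000.0  # hypothetical figure for this category


if __name__ == "__main__":
    for person in (Manager("Asha", 1), Engineer("Ravi", 2), Secretary("Meena", 3)):
        print(person.get_id(), person.get_name(), person.get_salary())
```

Because get_salary is abstract, the superclass cannot be instantiated directly, and every subclass is forced to provide its own salary computation.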


9. The relation scheme Student Performance (Name, CourseNo, RollNo, Grade) has the following functional dependencies:
Name, CourseNo → Grade
RollNo, CourseNo → Grade
Name → RollNo
RollNo → Name
The highest normal form of this relation scheme is
(a) 2NF (b) 3NF
(c) BCNF (d) 4NF
(GATE 2004: 2 Marks)
Solution: The candidate keys are (RollNo, CourseNo) and (Name, CourseNo). The attributes are not repeated, so the relation schema is in 1NF. The only non-prime attribute, Grade, is fully dependent on each candidate key, so there is no partial dependency and the relation is in 2NF. No non-prime attribute is transitively dependent on a key, so the relation is in 3NF. Name → RollNo and RollNo → Name violate the condition of BCNF, because their left-hand sides are not superkeys. So the highest normal form is 3NF.
Ans. (b)

10. Consider the relation Student (name, sex, marks), where the primary key is shown underlined, pertaining to students in a class that has at least one boy and one girl. What does the following relational algebra expression produce?
(Note: ρ is the rename operator).
Π_name(σ_{sex = female}(Student)) − Π_name(Student ⋈_{sex = female ∧ x = male ∧ marks ≤ m} ρ_{n, x, m}(Student))
(a) Names of girl students with the highest marks
(b) Names of girl students with more marks than some boy student
(c) Names of girl students with marks not less than some boy student
(d) Names of girl students with more marks than all the boy students
(GATE 2004: 2 Marks)
Solution: The query is Part 1 − Part 2. Part 1 selects the names of all the girls in the class. Part 2 results in the names of girls whose marks are less than or equal to the marks of some boy. So, Part 1 − Part 2 results in the names of girl students having more marks than all the boy students.
Ans. (d)

11. The order of an internal node in a B+ tree index is the maximum number of children it can have. Suppose that a child pointer takes 6 bytes, the search field value takes 14 bytes, and the block size is 512 bytes. What is the order of the internal node?
(a) 24 (b) 25
(c) 26 (d) 27
(GATE 2004: 2 Marks)
Solution: The order PI of an internal node satisfies PI × CP + (PI − 1) × KV ≤ Block size
Child pointer (CP) = 6 B
Key value (KV) = 14 B
Block size = 512 B
6 × PI + (PI − 1) × 14 ≤ 512
20 × PI ≤ 526
PI = 26
Ans. (c)

12. The employee information in a company is stored in the following relation:
Employee(name, sex, salary, deptName)
Consider the following SQL query:
SELECT deptName
FROM Employee
WHERE sex = 'M'
GROUP BY deptName
HAVING avg(salary) >
(SELECT avg(salary) FROM Employee)
It returns the names of the department in which
(a) the average salary is more than the average salary in the company.
(b) the average salary of male employees is more than the average salary of all male employees in the company.
(c) the average salary of male employees is more than the average salary of employees in the same department.
(d) the average salary of male employees is more than the average salary in the company.
(GATE 2004: 2 Marks)
Solution: The inner query computes the average salary of all employees in the company. The outer query computes the average salary of male employees in each department. So, the whole query finds the names of the departments whose average salary of male employees is more than the average salary in the company.
Ans. (d)
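
The B+ tree order computation of question 11 above can be checked numerically. The following Python sketch is illustrative only: the function name internal_order and its parameter names are assumptions made for this example. It returns the largest p for which p child pointers and p − 1 key values fit in one block.

```python
def internal_order(block_size: int, child_ptr: int, key_size: int) -> int:
    """Largest p with p * child_ptr + (p - 1) * key_size <= block_size."""
    # Rearranging gives p * (child_ptr + key_size) <= block_size + key_size.
    return (block_size + key_size) // (child_ptr + key_size)


if __name__ == "__main__":
    p = internal_order(block_size=512, child_ptr=6, key_size=14)
    used = p * 6 + (p - 1) * 14
    print("order:", p, "bytes used:", used)
```

With the values of question 11, an order of 26 uses 506 of the 512 bytes, while an order of 27 would need 526 bytes and does not fit.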


13. Which one of the following is a key factor for pre- Table r1 Table r2
ferring B+ trees to binary search trees for indexing
A B C A D
database relations?
1 10 100 1 1000
(a) Database relations have a large number of
records. 1 20 200 1 2000
(b) Database relations are sorted on the primary key.
(c) B+ trees require less memory than binary Table s (natural join of r1 and r2)
search trees.
(d) Data transfer from disks is in blocks. A B C D
(GATE 2005: 1 Mark)
1 10 100 1000
Solution: Indexing performs well for large data 1 20 200 1000
blocks. 1 20 100 200
B+ trees are preferred over the binary search
1 20 200 2000
trees.
B+ trees transfer data from disk to primary Ans. (c)
memory in form of data blocks.
In case of B+ trees, data moves in terms of blocks. 16. Let E1 and E2 be two entities in an E/R diagram
Ans. (d) with simple single-valued attributes. R1 and R2 are
two relationships between E1 and E2, where R1 is
14. Which one of the following statements about one-to-many and R2 is many-to-many. R1 and R2
normal forms is FALSE? do not have any attributes of their own. What is the
(a) BCNF is stricter than 3NF. minimum number of tables required to represent
(b) Lossless, dependency-preserving decomposition this situation in the relational model?
into 3NF is always possible. (a) 2 (b) 3
(c) Lossless, dependency-preserving decomposition (c) 4 (d) 5
into BCNF is always possible. (GATE 2005: 2 Marks)
(d) Any relation with two attributes is in BCNF.
(GATE 2005: 1 Mark) Solution:
R2
Solution: It is not always possible to decompose E1 E2 E1 E2
a table in BCNF and preserve dependencies. For
l x L x
example, a set of functional dependencies {AB → C,
C → B} cannot be decomposed in BCNF. m y L y
Ans. (c) n z M y

15. Let r be a relation instance with schema R = The one-to-many relationships are represented
(A, B, C, D). We define r1 = PA, B, C(R) and r2 = with entity set from one side. In one-to-many rela-
PA, D(r). Let s = r1 * r2 where * denotes natural tionships, each entity in the entity set can be asso-
join. Given that the decomposition of r into r1 and ciated with at most one entity of the other. So, the
r2 is lossy, which one of the following is TRUE? table is not formed for R1. Hence, the tables are
formed for R2, E1 and E2.
(a) s ⊂ r (b) r ∪ s = r
Using cross-reference technique, minimum of 3
(c) r ⊂ s (d) r * s = s
tables are required.
(GATE 2005: 1 Mark)
Ans. (b)
Solution: 17. The following table has two attributes A and C
where A is the primary key and C is the foreign key
Table r
referencing A with on-delete cascade.
A B C D
A C
1 10 100 1000
2 4
1 20 200 1000
3 4
20 200 2000
(Continued)


Continued {A → B, BC → D, E → C, D → A}.
A C What are the candidate keys of R?
4 3 (a) AE, BE (b) AE, BE, DE
5 2 (c) AEH, BEH, BCH (d) AEH, BEH, DEH
7 2 (GATE 2005: 2 Marks)
9 5
Solution: Let S be a candidate key of relation R
6 4 if the closure of S is all attributes of R and there
is no subset of S whose closure is all attributes
The set of all tuples that must be additionally of R.
deleted to preserve referential integrity when the Closure of AEH: AEH+ = {ABCDEH}
tuple (2, 4) is deleted is
Closure of BEH: BEH+ = {ABCDEH}
(a) (3, 4) and (6, 4)
(b) (5, 2) and (7, 2) Closure of DEH: DEH+ = {ABCDEH}
(c) (5, 2), (7, 2) and (9, 5) Ans. (d)
(d) (3, 4), (4, 3) and (6, 4)
20. Consider the following log sequence of two transac-
(GATE 2005: 2 Marks) tions on a bank account, with initial balance 12000,
that transfer 2000 to a mortgage payment and then
Solution: When (2,4) is deleted, as C is a foreign apply a 5% interest.
key referring A with delete on cascade, all entries
with value 2 in C must be deleted. 1. T1 start
So, (5, 2) and (7, 2) are deleted. As a result of 2. T1 B old = 12000 new = 10000
this, 5 and 7 are deleted from A, which causes
(9, 5) to be deleted. 3. T1 M old = 0 new = 2000
Ans. (c) 4. T1 commit
5. T2 start
18. The relation book (title, price) contains the titles
and prices of different books. Assuming that no 6. T2 B old = 10000 new = 10500
two books have the same price, what does the fol- 7. T2 commit
lowing SQL query list?
Suppose the database system crashes just before log
SELECT title record 7 is written. When the system is restarted,
FROM book as B which one statement is true of the recovery procedure?
WHERE (SELECT count(*) (a) We must redo log record 6 to set B to 10500.
FROM book as T (b) We must undo log record 6 to set B to 10000
and then redo log records 2 and 3.
WHERE T.price > B.price) < 5
(c) We need not redo log records 2 and 3 because
(a) Titles of the four most expensive books transaction T1 has committed.
(b) Title of the fifth most inexpensive book (d) We can apply redo and undo operations in arbi-
(c) Title of the fifth most expensive book trary order because they are idempotent.
(d) Titles of the five most expensive books (GATE 2006: 1 Mark)
(GATE 2005: 2 Marks)
Solution: When a transaction is committed, no
Solution: The outer query selects all titles from need to redo or undo operations.
book table. For every selected book, the sub-query Ans. (c)
returns count of those books, which are more expen-
sive than the selected book. The where clause of outer 21. Consider the relation account (customer, balance)
query will be true for five most expensive books. where customer is a primary key and there are
Ans. (d) no null values. We would like to rank customers
according to decreasing balance. The customer with
19. Consider a relation scheme R = (A, B, C, D, E, H ) the largest balance gets rank 1. Ties are not broken
on which the following functional dependencies hold: but ranks are skipped: if exactly two customers


have the largest balance they each get rank 1, and (a) All queries return identical row sets for any database
rank 2 is not assigned. (b) Query 2 and Query 4 return identical row sets for
all databases but there exist databases for which
Query 1: SELECT A.customer, count(B. Query 1 and Query 2 return different row sets.
customer) FROM account A, account B (c) There exist databases for which Query 3 returns
WHERE A.balance <= B.balance group by
A.customer (d) There exist databases for which Query 4 will
Query 2: SELECT A.customer, 1+count(B. encounter an integrity violation at runtime.
customer) FROM account A, account B
WHERE A.balance < B.balance group by (GATE 2006: 2 Marks)
A.customer Solution: The output of Query 2, Query 3 and
Consider these statements about Query 1 and Query 2. Query 4 will be identical. Query 1 produces dupli-
cate rows. But row set produced by all the queries
I. Query 1 will produce the same row set as Query will be same. For example,
2 for some but not all databases.
II. Both Query 1 and Query 2 are correct implementations Table: Enrolled Table: Paid
of the specification. Student Course Student Amount
III. Query 1 is a correct implementation of the
specification but Query 2 is not. A C1 A 100
IV. Neither Query 1 nor Query 2 is a correct imple- B C1 B 100
mentation of the specification. A C2 D 200
V. Assigning rank with a pure relational query
C C1
takes less time than scanning in decreasing
­balance order assigning ranks using ODBC.
Output of Query 1: A B
Which two of the above statements are correct? Output of Query 2: A B
(a) II and V (b) I and III Output of Query 3: A B
(c) I and IV (d) III and V
Output of Query 4: A B
(GATE 2006: 2 Marks) Ans. (a)

Solution: Both query1 and query2 perform the 23. Consider the relation enrolled (student, course) in
same task, but none of them sort the customer which (student, course) is the primary key, and the
according to the decreasing balance. So, option (c) relation paid (student, amount) in which student
is correct. is the primary key. Assume no null values and no
Ans. (c) foreign keys or integrity constraints. Assume that
amounts 6000, 7000, 8000, 9000 and 10000 were
22. Consider the relation enrolled (student, course) in
each paid by 20% of the students. Consider these
which (student, course) is the primary key, and
query plans (Plan 1 on left, Plan 2 on right) to “list
the relation paid (student, amount) where student
all courses taken by students who have paid more
is the primary key. Assume no null values and no
than x”.
foreign keys or integrity constraints. Given the fol-
lowing four queries: A disk seek takes 4 ms, disk data transfer band-
Query  1: SELECT student FROM enrolled WHERE width is 300 MB/s and checking a tuple to see if
student in (SELECT student FROM paid) amount is greater than x takes 10 µs. Which of the
following statements is correct?
Query  2: SELECT student FROM paid WHERE
student in (SELECT student FROM enrolled) (a) Plan 1 and Plan 2 will not output identical row
Query  3: SELECT E.student FROM enrolled sets for all databases.
E, paid P WHERE E.student = P.student (b) A course may be listed more than once in the
Query  4: SELECT student FROM paid WHERE output of Plan 1 for some databases.
exists (c) For x = 5000, Plan 1 executes faster than Plan
2 for all databases.
(SELECT * FROM enrolled WHERE enrolled. (d) For x = 9000, Plan 1 executes slower than Plan
student = paid.student) 2 for all databases.
Which one of the following statements is correct? (GATE 2006: 2 Marks)


Enrolled Paid Enrolled Paid

Probe index Sequential Probe index Sequential scan


on student scan select on student

amount >x

Indexed nested loop join Indexed nested loop join

Project on course Select on amount > x

Project on course

Solution: Plans need to load both tables’ courses The closure of {AF}+ is {A F D E}. It cannot
and enrolled. So, disk access time is the same for derive all the members of set given in option (c). So,
both plans. option (c) is false.
Plan 2 does lesser number of comparisons com- Ans. (c)
pared to Plan 1.
25. Information about a collection of students is given
So, join operation will require more comparisons
by the relation studinfo(studId, name, sex). The
as the second table will have more rows in Plan 2
relation enroll(studId, courseId) gives which stu-
compared to Plan 1.
dent has enrolled for (or taken) what course(s).
The joined table of two tables will have more rows,
Assume that every course is taken by at least
so, more comparisons are needed to find amounts
one male and at least one female student. What
greater than x.
does the following relational algebra expression
Plan 1 executes faster than Plan 2. First tuples are
represent?
filtered then joined, whereas in plan 2 first tuples
are joined and then filtered, which takes more time. Π courseId (Π studId (s sex ="female"(studInfo)) × Π courseId (enroll))
Π courseId (Π studId (s sex ="female"(studInfo)) × Π courseId (enroll))
Ans. (c)
(a) Courses in which all the female students are
24. The following functional dependencies are given:
enrolled.
AB → CD, AF → D, DE → F, C → G, F → E, G (b) Courses in which a proper subset of female
→A ­students are enrolled.
(c) Courses in which only male students are
Which one of the following options is false? enrolled.
(a) {CF}+ = {ACDEFG} (d) None of the above
(b) {BG}+ = {ABCDG} (GATE 2007: 2 Marks)
(c) {AF}+ = {ACDEFG} Solution:
(d) {AB}+ = {ABCDFG}
  (i) Select studId of all female students and select
(GATE 2006: 2 Marks) all courseId of all courses.
  (ii) The query performs a Cartesian product of the
Solution: Closure of AF or AF+ = {ADEF}, above select two columns in Step 1.
­closure of AF does not contain C and G. (iii)  It subtracts enroll table from the result of Step 2.


This will remove all the (studId, courseId) pairs, Solution: Q1 selects an employee but does not
which are present in enrol table. compute the result as after the `Where not exists’,
If all female students have registered in courses, it does not have statement to produce the result.
then this course will not be there in the subtracted Q2 selects an employee who gets higher salary
result. than anyone in the department 5. It correctly finds
So, the complete expression returns courses in employees who get higher salary than anyone in
which a proper subset of female students is enrolled. the department 5.
Ans. (b) Ans. (b)

26. Consider the relation employee (name, sex, 28. Which one of the following statements is FALSE?
­supervisorName) with name as the key. super-
(a) Any relation with two attributes is in BCNF.
visorName gives the name of the supervisor of
(b) A relation in which every key has only one
the employee under consideration. What does
attribute is in 2NF.
the ­following Tuple Relational Calculus query
(c) A prime attribute can be transitively dependent
produce?
on a key in a 3NF relation.
{e.name | employee(e) ∧ (∀x)[¬employee(x) ∨ x.supervisorName ≠ e.name ∨ x.sex = “male”]}
(d) A prime attribute can be transitively dependent on a key in a BCNF relation.
(GATE 2007: 2 Marks)
Solution: According to the definition of 3NF, a
(a) Names of employees with a male supervisor.
prime attribute can be transitively dependent on a
(b) Names of employees with no immediate male
key. Option (d) is incorrect because this does not
subordinates.
satisfy the condition of BCNF.
(c) Names of employees with no immediate female
Ans. (d)
subordinates.
(d) Names of employees with a female supervisor. 29. The order of a leaf node in a B+ tree is the maxi-
(GATE 2007: 2 Marks) mum number of (value, data record pointer) pairs
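Solution: The condition holds for e only if every employee x supervised by e is male. The query therefore selects the employees all of whose immediate subordinates are male, that is, employees with no immediate female subordinates.
Ans. (c)
it can hold. Given that the block size is 1 KB, data record pointer is 7 bytes long, the value field is 9 bytes long and a block pointer is 6 bytes long,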
what is the order of the leaf node?
(a) 63 (b) 64
(c) 67 (d) 68
(GATE 2007: 2 Marks)

Solution: Let the order of the leaf node be n.
Block size = 1 KB = 1024 bytes
A leaf node holds n (value, data record pointer) pairs and one block pointer, so
n(9 + 7) + 6 ≤ 1024
16n ≤ 1018, which gives n = 63
Ans. (a)

27. Consider the table employee (empId, name, department, salary) and the two queries Q1, Q2 below. Assuming that department 5 has more than one employee, and we want to find the employees who get higher salary than anyone in the department 5, which one of the statements is TRUE for any arbitrary employee table?
Q1: SELECT e.empId
FROM employee e
WHERE not exists (SELECT * FROM
employee s WHERE s.department = “5”
­ 30. Consider the following schedules involving two
and s.salary >= e.salary) transactions. Which one of the following ­statements
is TRUE?
Q2: SELECT e.empId
FROM employee e S1: r1(X ); r1(Y ); r2(X ); r2(Y ); w2(Y ); w1(X )
WHERE e.salary > Any S2: r1(X ); r2(X ); r2(Y ); w2(Y ); r1(Y ); w1(X )
(SELECT distinct salary FROM employee s WHERE s.department = “5”)
(a) Both S1 and S2 are conflict serializable.
(b) S1 is conflict serializable and S2 is not conflict
(a)     Q1 is the correct query. serializable.
(b)   Q2 is the correct query. (c) S1 is not conflict serializable and S2 is conflict
(c) Both Q1 and Q2 produce the same answer. serializable.
(d) Neither Q1 nor Q2 is the correct query. (d) Both S1 and S2 are not conflict serializable.
(GATE 2007: 2 Marks) (GATE 2007: 2 Marks)


Solution:

Schedule S1

T1        T2
r1(X)
r1(Y)
          r2(X)
          r2(Y)
          w2(Y)
w1(X)

r1(Y) precedes w2(Y) and r2(X) precedes w1(X), so the precedence graph has a cycle.
The schedule is not conflict serializable.

Schedule S2

T1        T2
r1(X)
          r2(X)
          r2(Y)
          w2(Y)
r1(Y)
w1(X)

The schedule is conflict serializable to T2T1.
Ans. (c)

31. A clustering index is defined on the fields which are of type
(a) Non-key and ordering
(b) Non-key and non-ordering
(c) Key and ordering
(d) Key and non-ordering
(GATE 2008: 1 Mark)

Solution: A clustering index is defined on the field on which the records of the file are physically ordered. If the records are ordered on a non-key field, that field does not have a distinct value for each record. So, the clustering index is defined on fields of type non-key and ordering.
Ans. (a)

32. Which of the following tuple relational calculus expression(s) is/are equivalent to ∀t ∈ r (P(t))?
I. ¬∃t ∈ r (P(t))
II. ∃t ∉ r (P(t))
III. ¬∃t ∈ r (¬P(t))
IV. ∃t ∉ r (¬P(t))
(a) I only (b) II only
(c) III only (d) III and IV only
(GATE 2008: 1 Mark)

Solution: Using negation theorem, we find option (c) is true.
Ans. (c)

33. Let R and S be two relations with the following schema:
R(P, Q, R1, R2, R3)
S(P, Q, S1, S2)
where {P, Q} is the key for both schemas. Which of the following queries are equivalent?
I. Π_P(R ⋈ S)
II. Π_P(R) ⋈ Π_P(S)
III. Π_P(Π_{P,Q}(R) ∩ Π_{P,Q}(S))
IV. Π_P(Π_{P,Q}(R) − (Π_{P,Q}(R) − Π_{P,Q}(S)))
(a) Only I and II (b) Only I and III
(c) Only I, II and III (d) Only I, III and IV
(GATE 2008: 2 Marks)

Solution:
In I, P from natural join of R and S are selected.
In III, all P from intersection of (P, Q) pairs present in R and S.
IV is also equivalent to III because (R − (R − S)) = R ∩ S.
II is not equivalent as it may also include P, where Q is not same in R and S.
Queries I, III and IV therefore return the same set of P values.
Ans. (d)

34. Consider the following relational schemes for a library database:
Book(Title, Author, Catalog_no, Publisher, Year, Price)
Collection(Title, Author, Catalog_no)
within the following functional dependencies:
I. Title Author → Catalog_no
II. Catalog_no → Title Author Publisher Year
III. Publisher Title Year → Price
Assume {Author, Title} is the key for both schemes. Which of the following statements is true?
(a) Both Book and Collection are in BCNF.
(b) Both Book and Collection are in 3NF only.
(c) Book is in 2NF and Collection is in 3NF.
(d) Both Book and Collection are in 2NF only.
(GATE 2008: 2 Marks)

Solution: Collection is in BCNF as there is only one functional dependency Title Author → Catalog_no and {Author, Title} is key for collection.


Book is not in BCNF because Catalog_no is not a key and there is a functional dependency Catalog_no → Title Author Publisher Year.
Book is not in 3NF because non-prime attributes (Publisher Year) are transitively dependent on key [Title, Author].
Book is in 2NF because every non-prime attribute of the table is dependent on either the key [Title, Author] or another non-prime attribute.
Ans. (c)

Linked Answer Questions 35 and 36: Consider the following ER diagram:

[Figure: ER diagram with entity sets M (attributes M1, M2, M3), P (attributes P1, P2) and N (attributes N1, N2); relationship R1 connects M and P, and relationship R2 connects P and N.]

35. The minimum number of tables needed to represent M, N, P, R1, R2 is
(a) 2 (b) 3
(c) 4 (d) 5
(GATE 2008: 2 Marks)

Solution: Many-to-one and one-to-many relationship sets that are total on the many-side can be represented by adding an extra attribute to the “many” side, containing the primary key of the “one” side.
Since R1 is many to one and participation of M is total, M and R1 can be combined to form the table {M1, M2, M3, P1}. N is a weak entity set, so it can be combined with P.
Ans. (a)

36. Which of the following is a correct attribute set for one of the tables for the correct answer to the above question?
(a) {M1, M2, M3, P1}
(b) {M1, P1, N1, N2}
(c) {M1, P1, N1}
(d) {M1, P1}
(GATE 2008: 2 Marks)

Solution:
For R1 correct attribute set: M1, M2, M3, P1
For R2 correct attribute set: N1, N2, P1, P2 with P1 as primary key and N1 as weak entity set.
Ans. (a)

37. Consider two transactions T1 and T2, and four schedules S1, S2, S3, S4 of T1 and T2 as given below:

T1: R1[x]W1[x]W1[y]
T2: R2[x]R2[y]W2[y]

S1: R1[x]R2[x]R2[y]W1[x]W1[y]W2[y]
S2: R1[x]R2[x]R2[y]W1[x]W2[y]W1[y]
S3: R1[x]W1[x]R2[x]W1[y]R2[y]W2[y]
S4: R2[x]R2[y]R1[x]W1[x]W1[y]W2[y]

Which of the above schedules are conflict-serializable?
(a) S1 and S2 (b) S2 and S3
(c) S3 only (d) S4 only
(GATE 2009: 2 Marks)

Solution: The serial schedule T1T2 has the following sequence of operations:
R1[x]W1[x]W1[y]R2[x]R2[y]W2[y]
And the schedule T2T1 has the following sequence of operations:
R2[x]R2[y]W2[y]R1[x]W1[x]W1[y]
The schedule S2 is conflict-equivalent to T2T1 and S3 is conflict-equivalent to T1T2.
Only S2 and S3 are conflict serializable schedules.
To test for conflict serializability, draw the precedence graph. If there is a cycle in the graph, the schedule is not conflict serializable.
Ans. (b)

38. The following key values are inserted into a B+ tree in which order of the internal nodes is 3, and that of the leaf nodes is 2, in the sequence given below. The order of internal nodes is the maximum number of tree pointers in each node, and the order of leaf nodes is the maximum number of data items that can be stored in it. The B+ tree is initially empty.
10, 3, 6, 8, 4, 2, 1


The maximum number of times leaf nodes would FROM Catalog C


get split up as a result of these insertions is WHERE C.pid NOT IN (SELECT P.pid
(a) 2 (b) 3 FROM Parts P
(c) 4 (d) 5 WHERE P.color <> ‘blue’))
(GATE 2009: 2 Marks)
Assume that relations corresponding to the above
Solution: Total number of splits here are 5. schema are not empty. Which one of the following
is the correct interpretation of the above query?
4 (a) Find the names of all suppliers who have supplied a non-blue part.
2 3 6 (b) Find the names of all suppliers who have not
supplied a non-blue part.
(c) Find the names of all suppliers who have sup-
1 2 3 4 6 8 10 plied only blue parts.
(d) Find the names of all suppliers who have not
Ans. (a) supplied only blue parts.
(GATE 2009: 2 Marks)
39. Let R and S be relational schemes such that R =
{a, b, c} and S = {c}. Now consider the following queries on the database:
I. Π_{R−S}(r) − Π_{R−S}(Π_{R−S}(r) × s − Π_{R−S,S}(r))
II. {t | t ∈ Π_{R−S}(r) ∧ ∀u ∈ s (∃v ∈ r (u = v[s] ∧ t = v[R − S]))}
III. {t | t ∈ Π_{R−S}(r) ∧ ∀v ∈ r (∃u ∈ s (u = v[s] ∧ t = v[R − S]))}
IV. SELECT R.a, R.b FROM R, S WHERE R.c = S.c
Solution: The sub-query “SELECT P.pid FROM Parts P WHERE P.color <> ‘blue’” returns pids of parts, which are not blue.
The sub-query “SELECT C.sid FROM Catalog C WHERE C.pid NOT IN (SELECT P.pid FROM Parts P WHERE P.color <> ‘blue’)” returns sid of all those suppliers who have supplied blue parts.
The complete query returns the names of all suppliers who have supplied a non-blue part.
Inner query SELECT P. pid FROM Parts P
Which of the above queries are equivalent? WHERE P. color <> ‘blue’ will select pid
(a) I and II (b) I and III of non-blue parts. Outer query of this will pro-
(c) II and IV (d) III and IV duce sid of those suppliers whose pid is not in
the result of inner query. The query will result
(GATE 2009: 2 Marks) the supplier names who have supplied non-blue
Solution: I and II describe the division operator parts
in relational algebra and tuple relational calculus, Ans. (a)
respectively.
Ans. (a) 41. Assume that, in the suppliers relation above, each
supplier and each street within a city has a unique
Common Data Questions 40 and 41: Consider the name, and (sname, city) forms a candidate key.
following relational schema: No other functional dependencies are implied other
Suppliers(sid: integer, sname: string, than those implied by primary and candidate keys.
city: string, street: string) Which one of the following is TRUE about the
above schema?
Parts(pid: integer, pname: string,
color: string) (a) The schema is in BCNF.
Catalog(sid: integer, pid: integer, (b) The schema is in 3NF but not in BCNF.
cost: real) (c) The schema is in 2NF but not in 3NF.
(d) The schema is not in 2NF.
40. Consider the following relational query on the (GATE 2009: 2 Marks)
above database:
SELECT S.sname Solution: Candidate key: sname, city
FROM Suppliers S Primary key: sname
WHERE S.sid NOT IN (SELECT C.sid Alternate key: sid, sname


city → street EXISTS(SELECT * FROM Passenger


sname → city   WHERE age > 65 AND Passenger.pid =
Transitivity: sname → city → street Reservation.pid)
So, it is in 3NF.
(a) 1, 0 (b) 1, 2
Ans. (b) (c) 1, 3 (d) 1, 5
42. Consider a B+ tree in which the maximum number (GATE 2010: 1 Marks)
of keys in a node is 5. What is the minimum number
of keys in any non-root node? Solution: The outer query selects four entries
(a) 1 (b) 2 (with pid as 0, 1, 5, 3) from Reservation table. Out
(c) 3 (d) 4 of these selected entries, the sub-query returns non-
(GATE 2010: 1 Mark) null values only for 1 and 3.
Ans. (c)
Solution: In B+ tree, root node has minimum
two block pointers and maximum p block pointer, 44. Which of the following concurrency control proto-
where p = order and key = order − 1. cols ensure both conflict serializability and freedom
In the non-root node, the minimum number of keys from deadlock?
= p/2 − 1 I. Two-phase locking
Here, key = 5, order = 6 II. Timestamp ordering
So, minimum number of keys in non-root node (a) I only (b) II only
= 6/2 − 1 = 2 (c) Both I and II (d) Neither I nor II
Ans. (b) (GATE 2010: 1 Mark)
43. A relational schema for a train reservation data-
base is given below: Solution: Two-phase locking is a concurrency
control method, which guarantees serializability.
Passenger(pid, pname, age) The protocol utilizes locks, applied by a transac-
Reservation(pid, class, tid) tion to data, which may block other transactions
from accessing the same data during the transaction’s life. 2PL may lead to deadlocks which
Table: Passenger
pid pname Age result from the mutual blocking of two or more
transactions.
0 `Sachin’ 65
Timestamp ordering algorithm is a non-lock con-
1 `Rahul’ 66 currency control method. In Timestamp based
2 `Saurav’ 67 method, deadlock cannot occur as no transaction
3 `Anil’ 69 ever waits.
Ans. (b)
Table: Reservation 45. Consider the following schedule for transactions
pid Class Tid T1, T2 and T3:

0 `AC’ 8200
T1 T2 T3
1 `AC’ 8201
Read(X )
2 `SC’ 8201
Read(Y )
5 `AC’ 8203
Read(Y )
1 `SC’ 8204
Write(Y )
3 `AC’ 8202
Write(X )
What pids are returned by the following SQL query Write(X )
for the above instance of the tables?
Read(X )
SELECT pid
FROM Reservation WHERE class = ‘AC’ AND Write(X )


Which one of the schedules below is the correct multiple accounts or joint accounts. This attri-
serialization of the above? bute stores the primary account number.
(a) T1 → T3 → T2 (b) T2 → T1 → T3 IV. Name: Name of the Student.
(c) T2 → T3 → T1 (d) T3 → T1 → T2 V. Hostel_Room: Room number of the hostel.
(GATE 2010: 2 Marks) Which of the following options is INCORRECT?
(a) BankAccount_Number is a candidate key.
Solution: T1 can complete before T2 and T3 as
(b) Registration_Number can be a primary key.
there is no conflict between Write(X) of T1 and
(c) UID is a candidate key if all students are from
the operations in T2 and T3 which occur before
the same country.
Write(X) of T1.
(d) If S is a superkey such that S ∩ UID is NULL
then S ∪ UID is also a superkey.
T3 can complete before T2 as the Read(Y) of T3
does not conflict with Read(Y) of T2. Similarly,
(GATE 2011: 1 Mark)
Write(X) of T3 does not conflict with Read(Y) and
Write(Y) operations of T2. Solution: When two students hold joint account
After topologically sorting, the sequence is T1 → in that case BankAccount_Num will not uniquely
T3 → T2. determine other attributes.
Ans. (a) Ans. (a)
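The precedence-graph test used in Questions 30 and 45 can be sketched in a few lines. The code below is an illustrative sketch, not a reference implementation: it adds an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj, then sorts the graph topologically. The schedules S1 and S2 are those of Question 30. For Question 45, the assignment of each operation to T1, T2 or T3 is an assumption reconstructed from the solution text, because the column layout of the schedule table is not visible here.

```python
def precedence_edges(schedule):
    """Return edges (Ti, Tj): an operation of Ti conflicts with a later one of Tj."""
    edges = set()
    for i, (t1, op1, item1) in enumerate(schedule):
        for t2, op2, item2 in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if t1 != t2 and item1 == item2 and "W" in (op1, op2):
                edges.add((t1, t2))
    return edges


def serial_order(schedule):
    """Topologically sort the precedence graph; return None if it has a cycle."""
    txns = sorted({t for t, _, _ in schedule})
    edges = precedence_edges(schedule)
    indegree = {t: 0 for t in txns}
    for _, dst in edges:
        indegree[dst] += 1
    ready = [t for t in txns if indegree[t] == 0]
    order = []
    while ready:
        t = ready.pop(0)
        order.append(t)
        for src, dst in sorted(edges):
            if src == t:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return order if len(order) == len(txns) else None


# Question 30 (schedules as given in the text).
S1 = [("T1", "R", "X"), ("T1", "R", "Y"), ("T2", "R", "X"),
      ("T2", "R", "Y"), ("T2", "W", "Y"), ("T1", "W", "X")]
S2 = [("T1", "R", "X"), ("T2", "R", "X"), ("T2", "R", "Y"),
      ("T2", "W", "Y"), ("T1", "R", "Y"), ("T1", "W", "X")]

# Question 45 (transaction of each operation is an assumption, see above).
Q45 = [("T1", "R", "X"), ("T2", "R", "Y"), ("T3", "R", "Y"), ("T2", "W", "Y"),
       ("T1", "W", "X"), ("T3", "W", "X"), ("T2", "R", "X"), ("T2", "W", "X")]

print(serial_order(S1))   # None (cycle T1 -> T2 -> T1)
print(serial_order(S2))   # ['T2', 'T1']
print(serial_order(Q45))  # ['T1', 'T3', 'T2']
```

A return value of `None` means the graph has a cycle, so the schedule is not conflict serializable. Otherwise the returned list is an equivalent serial order.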
46. The following functional dependencies hold for 48. Database table by name Loan_Records is given
relations R(A, B, C) and S(B, D, E) below:
B → A,
Borrower Bank_Manager Loan_Amount
A→C
Ramesh Sunderajan 10000.00
The relation R contains 200 tuples and the rela-
tion S contains 100 tuples. What is the maximum Suresh Ramgopal 5000.00
number of tuples possible in the natural join R ⋈ S? Mahesh Sunderajan 7000.00
(a) 100 (b) 200
(c) 300 (d) 2000 What is the output of the following SQL query?
(GATE 2010: 2 Marks) SELECT count(*) FROM(
 (SELECT Borrower, Bank_Manager FROM
Solution: B is a candidate key of R.
Loan_Records) AS S NATURAL JOIN
So, all 200 values of B must be unique in R.  (SELECT Bank_Manager, Loan_Amount
There is no functional dependency given for S. FROM Loan_Records) AS T );
For the maximum number of tuples in output,
(a) 3 (b) 9
there are two ways.
(c) 5 (d) 6
(i) All 100 values of B in S are same and there is (GATE 2011: 2 Marks)
an entry in R, which matches with this value.
So, we are having 100 tuples in output.
Solution: Output as Table S:
(ii) All 100 values of B in S are different and these
values are present in R also. So, we are having Borrower Bank_Manager
100 tuples.
Ans. (a) Ramesh Sunderajan
Suresh Ramgopal
47. Consider a relational table with a single record
for each registered student with the following Mahesh Sunderajan
attributes:
Output as Table T:
  I. Registration_Number: Unique registration
number for each registered student. Bank_Manager Loan_Amount
II. UID: Unique Identity number, unique at the Sunderajan 10000.00
national level for each citizen.
III. BankAccount_Number: Unique account Ramgopal 5000.00
number at the bank. A student can have Sunderajan 7000.00


Natural Join of S and T: 50. Which of the following statements are TRUE
about an SQL query?
Borrower Bank_Manager Loan_Amount
I. An SQL query can contain a HAVING clause
Ramesh Sunderajan 10000.00 even if it does not have a GROUP BY clause.
Ramesh Sunderajan 7000.00 II. An SQL query can contain a HAVING clause
only if it has GROUP BY clause.
Suresh Ramgopal 5000.00 III. All attributes used in the GROUP BY clause
Mahesh Sunderajan 10000.00 must appear in the SELECT clause.
IV. Not all attributes used in the GROUP BY
Mahesh Sunderajan 7000.00 clause need to appear in the SELECT clause.
(a) I and III (b) I and IV
Ans. (c)
(c) II and III (d) II and IV
49. Consider a database table T containing two col- (GATE 2012: 1 Mark)
umns X and Y each of type integer.
Solution:
After the creation of the table, one record (X = 1,
Y = 1) is inserted in the table. I. HAVING clause can also be used with aggregate
function. If we use a HAVING clause without
Let MX and MY denote the respective maximum
a GROUP BY clause, the HAVING condi-
values of X and Y among all records in the table at
tion applies to all rows that satisfy the search
any point in time. Using MX and MY, new records
condition.
are inserted in the table 128 times with X and Y
values being MX + 1, 2 * MY + 1, respectively. II. To verify S,
It may be noted that each time after the insertion, CREATE TABLE temp (id INT, name
values of MX and MY change. VARCHAR(100));
What will be the output of the following SQL INSERT INTO temp VALUES (1, "abc");
query after the steps mentioned above are carried INSERT INTO temp VALUES (2, "abc");
out?
INSERT INTO temp VALUES (3, "bcd");
SELECT Y FROM T WHERE X = 7;
INSERT INTO temp VALUES (4, "cde");
(a) 127 (b) 255 SELECT Count(*) FROM temp GROUP BY name;
(c) 129 (d) 257
(GATE 2011: 2 Marks) Result:
count(*)
Solution: 2
The value of X is calculated using MX + 1 and Y
1
is calculated using 2 × MY + 1
1
For example, X = 1 and Y = 1
Ans. (b)
at the next step, X = MX + 1 = 1 + 1 = 2,
Y = 2 × MY + 1 = 2 × 1 + 1 = 3 51. Given the basic ER and relational models, which of
the following is INCORRECT?

X Y (a) An attribute of an entity can have more than


one value.
1 1 (b) An attribute of an entity can be composite.
2 3 (c) In a row of a relational table, an attribute can
have more than one value.
3 7 (d) In a row of a relational table, an attribute can
4 15 have exactly one value or a NULL value.
5 31 (GATE 2012: 1 Mark)
6 63 Solution: In relational table, more than one value
7 127 for an attribute will violate the condition of first
normal form. So, option (c) is incorrect.
Ans. (a) Ans. (c)
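The insertion rule of Question 49 is easy to replay. The sketch below keeps the table as a plain Python list of (X, Y) pairs (an illustrative stand-in for table T, not actual SQL) and applies the rule X = MX + 1, Y = 2 × MY + 1 for 128 insertions.

```python
# Table T after the first record (X = 1, Y = 1) is inserted.
rows = [(1, 1)]

# 128 further insertions, each based on the current maxima MX and MY.
for _ in range(128):
    mx = max(x for x, _ in rows)
    my = max(y for _, y in rows)
    rows.append((mx + 1, 2 * my + 1))

# Equivalent of: SELECT Y FROM T WHERE X = 7;
answer = [y for x, y in rows if x == 7]
print(answer)  # [127]
```

Each step doubles Y and adds 1, so the row with X = k holds Y = 2^k − 1, which gives 127 for X = 7.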


52. Which of the following is TRUE?
(a) Every relation in 3NF is also in BCNF.
(b) A relation R is in 3NF if every non-prime attribute of R is fully functionally dependent on every key of R.
(c) Every relation in BCNF is also in 3NF.
(d) No relation can be in both BCNF and 3NF.
(GATE 2012: 1 Mark)

Solution: Any relation that is in ‘x’NF will automatically be in ‘x−1’NF. The order of the normal forms, as discussed in the text, is 1NF, 2NF, 3NF, BCNF, 4NF and 5NF. Therefore, option (c) is correct.
Ans. (c)

53. Suppose R1(A, B) and R2(C, D) are two relation schemas. Let r1 and r2 be the corresponding relation instances. B is a foreign key that refers to C in R2. If data in r1 and r2 satisfy referential integrity constraints, which of the following is ALWAYS TRUE?
(a) ΠB(r1) − ΠC(r2) = ∅
(b) ΠC(r2) − ΠB(r1) = ∅
(c) ΠB(r1) = ΠC(r2)
(d) ΠB(r1) − ΠC(r2) ≠ ∅
(GATE 2012: 2 Marks)

Solution: B is a foreign key in r1, which refers to C in r2.
r1 and r2 satisfy referential integrity constraints.
So, every value that exists in column B of r1 must also exist in column C of r2.
Ans. (a)

54. Consider the following transactions with data items P and Q initialized to zero:
T1: read (P);
   read (Q);
   if P=0 then Q:=Q+1;
   write (Q).
T2: read (Q);
   read (P)
   if Q=0 then P:=P+1;
   write (P).
Any non-serial interleaving of T1 and T2 for concurrent execution leads to
(a) a serializable schedule
(b) a schedule that is not conflict-serializable
(c) a conflict-serializable schedule
(d) a schedule for which precedence graph cannot be drawn
(GATE 2012: 2 Marks)

Solution: Two actions conflict if they belong to different transactions, at least one of them is a write operation, and they access the same object.
Two schedules are conflict-equivalent if both schedules involve the same set of transactions, and the order of each pair of conflicting actions in both schedules is the same.
Take the following schedule:
S: r1(P); r1(Q); r2(Q); r2(P); w1(Q); w2(P)

[Figure: precedence graph with edges T1 → T2 and T2 → T1.]

A cycle exists between the two transactions, so the schedule is not conflict serializable.
A schedule is conflict-serializable when the schedule is conflict-equivalent to one or more serial schedules. There are two possible serial schedules: T1 followed by T2 and T2 followed by T1. In both, one of the transactions reads the value written by another transaction as a first step.
Ans. (b)

Common Data Questions 55 and 56: Consider the following relations A, B and C:

A
Id   Name    Age
12   Arun    60
15   Shreya  24
99   Rohit   11

B
Id   Name    Age
15   Shreya  24
25   Hari    40
98   Rohit   20
99   Rohit   11

C
Id   Phone   Area
10   2200    02
99   2100    01


55. How many tuples does the result of the following 57. An index is clustered, if
relational algebra expression contain? Assume that
(a) It is on a set of fields that form a candidate key.
the schema of A ∪ B is the same as that of A.
(b) It is on a set of fields that include the primary key.
(A ∪ B) ⋈_{A.Id > 40 ∨ C.Id < 15} C (c) The data records of the file are organized in the
same order as the data entries of the index.
(a) 7 (b) 4 (d) The data records of the file are organized not in
(c) 5 (d) 9 the same order as the data entries of the index.
(GATE 2012: 2 Marks)
(GATE 2013: 1 Mark)

Solution: Solution: In clustered index, the leaf nodes


A∪B ­contain the data page of the table concerned. Each
index row contains a key value and a pointer to
Id Name Age either the intermediate level (other than root and
12 Arun 60 leaf nodes) or data row in leaf level. Therefore, they
are organised in the same order.
15 Shreya 24
Ans. (c)
99 Rohit 11
58. Consider the following relational schema:
25 Hari 40
Students(rollno: integer, sname: string)
98 Rohit 20
Courses(courseno: integer, cname: string)
(A ∪ B) ⋈_{A.Id > 40 ∨ C.Id < 15} C Registration(rollno: integer, courseno:
integer, percent: real)
Id Name Age Id Phone Area Which of the following queries are equivalent to
12 Arun 60 10 2200 02 this query in English?
15 Shreya 24 10 2200 02 “Find the distinct names of all students who score
99 Rohit 11 10 2200 02 more than 90% in the course numbered 107”
25 Hari 40 10 2200 02
I. SELECT DISTINCT S.sname FROM Students
98 Rohit 20 10 2200 02 as S, Registration as R WHERE
R.rollno=S.rollno AND R.Courseno=107
AND R.percent>90
99 Rohit 11 99 2100 01
98 Rohit 20 99 2100 01
II. Π_sname(σ_{courseno = 107 ∧ percent > 90}(Registration ⋈ Students))
III. {T | ∃S ∈ Students, ∃R ∈ Registration (S.rollno = R.rollno ∧ R.courseno = 107 ∧ R.percent > 90 ∧ T.sname = S.sname)}
IV. {<SN> | ∃SR ∃RP (<SR, SN> ∈ Students ∧ <SR, 107, RP> ∈ Registration ∧ RP > 90)}
Ans. (a)
56. How many tuples does the result of the following SQL query contain?
SELECT A.Id
FROM A
WHERE A.Age > ALL (SELECT B.Age

FROM B (a) I, II, III and IV (b) I, II and III only


(c) I, II and IV only (d) II, III and IV only
WHERE B.Name = ‘Arun’)
(GATE 2013: 2 Marks)
(a) 4    (b) 3    (c) 0    (d) 1
Solution: All of the query statements are equiva-
(GATE 2012: 2 Marks)
lent to the given query.
Ans. (a)
Solution: A comparison with ALL is true when it holds for every value returned by the sub-query. There is no entry with the name Arun in Table B, so the sub-query returns an empty set, and the condition is then true for all rows of A.
Linked Answer Questions 59 and 60: Relation R has eight attributes ABCDEFGH. Fields of R contain only atomic values. F = {CH → G, A → BC, B → CFH, E → A, F → EG} is a set of functional dependencies (FDs) so that F+ is exactly the set of
Ans. (b) FDs that hold for R.
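The behaviour of `> ALL` in Question 56 when the sub-query yields no rows can be imitated in Python, whose built-in `all()` is likewise true for an empty sequence. This is only an analogy under simplifying assumptions: the lists below copy tables A and B from the question, and the sketch does not model SQL's handling of NULL values.

```python
# Tables A and B from the question, as (Id, Name, Age) tuples.
A = [(12, "Arun", 60), (15, "Shreya", 24), (99, "Rohit", 11)]
B = [(15, "Shreya", 24), (25, "Hari", 40), (98, "Rohit", 20), (99, "Rohit", 11)]

# Sub-query: SELECT B.Age FROM B WHERE B.Name = 'Arun'
sub_query = [age for _, name, age in B if name == "Arun"]

# Outer query: SELECT A.Id FROM A WHERE A.Age > ALL (sub-query)
result = [a_id for a_id, _, age in A if all(age > b_age for b_age in sub_query)]

print(sub_query)  # []
print(result)     # [12, 15, 99]
```

Because B has no row named Arun, the comparison is vacuously true for every row of A, and all three Ids are returned.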


59. How many candidate keys does the relation R have? (c) In 3NF, but not in BCNF
(d) In BCNF
(a) 3 (b) 4
(c) 5 (d) 6 (GATE 2013: 2 Marks)
(GATE 2013: 2 Marks)
Solution: A → BC, B → CFH and F → EG are
Solution: Candidate keys AD, BD, ED and FD
partial dependencies.
Ans. (b)
So, they are in 1NF but not in 2NF.
60. The relation R is Ans. (a)
(a) In 1NF, but not in 2NF
(b) In 2NF, but not in 3NF
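The candidate keys in Question 59 can be checked mechanically with an attribute-closure routine. The sketch below is one straightforward way to do it (a brute-force search over attribute subsets, adequate for eight attributes). The function names `closure` and `candidate_keys` are illustrative choices, not taken from the text.

```python
from itertools import combinations

ATTRS = "ABCDEFGH"
# F = {CH -> G, A -> BC, B -> CFH, E -> A, F -> EG}
FDS = [("CH", "G"), ("A", "BC"), ("B", "CFH"), ("E", "A"), ("F", "EG")]


def closure(attrs, fds):
    """Return the closure of the attribute set attrs under the FDs fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Enumerate minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return ["".join(sorted(key)) for key in keys]


print(candidate_keys(ATTRS, FDS))        # ['AD', 'BD', 'DE', 'DF']
print("".join(sorted(closure("A", FDS))))  # ABCEFGH
```

The four keys match AD, BD, ED and FD from the solution. The closure of A alone already contains the non-prime attribute C, so C depends on a proper part of the key AD, which is the partial dependency behind the answer to Question 60. The same `closure` function reproduces Question 24: with the dependencies given there, the closure of AF is {A, D, E, F}.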

PRACTICE EXERCISES

Set 1

1. A relational database can be defined as 7. Which of the following is a unary operation?


(a) a place to store relational information. (a) Join (b) Project
(b) a database that relates to different operations. (c) Intersection (d) Cartesian product
(c) a database to relate the relationship of different
databases. 8. What is the least and ceiling number of rows in the
(d) a database related to storing of information join selection of two tables having m rows and n
about humans. rows, respectively?
2. The minimal superkeys are called (a) 1 and m × n (b) 0 and m + n
(c) 1 and infinite (d) 0 and mn
(a) minimum keys. (b) candidate keys.
(c) domain keys. (d) optimal keys. 9. An attribute of one table having the same value of
a primary key of another table is termed as
3. Which of the following combinations represents a
valid data model? (a) foreign key. (b) alternate key.
(c) candidate key. (d) transitive key.
(a) E-R model, network data model, normal form
data model 10. A database has five attributes defined as {A, B, C,
(b) Relational data model, hierarchical data model, D, E}. The functional dependency is defined by
SQL data model {A → BC, E → FG}
(c) Object-based data model, table data model
The candidate key of this table is
(d) E-R model, relational data model, object-based
data model (a) A. (b) AE.
(c) E. (d) BC and FG.
4. Which of the following is not a database model?
11. Consider the following set of functional dependency
(a) Network database model
(b) Relational database model {AB → CD, CD → E, E → A} AB, CD are
(c) Object-oriented database model ­candidate keys.
(d) Normal form data model It is in which normal form?
5. Which one of the following allows accessing or (a) 2NF   (b) BCNF   (c) 3NF   (d) 4NF
maintaining data in a database?
12. The term physical data independence can be
(a) DCL  (b) DML
defined as
(c) DDL  (d) None of the above
(a) If we modify the conceptual schema, then there
6. Which of the following is a binary operation? is no need to modify the physical schema.
(a) Select (b) Project (b) If we modify the physical schema, then there is
(c) Rename (d) Union no need to modify the conceptual schema.


(c) If we modify the conceptual schema, then there 20. The value of particular field should be less than 50
is no need to modify the external schema. if it is
(d) If we modify the middle level schema, then
(a) An integrity constraint.
any one schema (internal/external) may be
(b) A referential constraint.
modified.
(c) A primary key constraint.
13. Relational algebra can be defined as (d) A normalization constraint.

(a) Procedural. (b) Non-procedural. 21. The ER model of a database includes


(c) User dependent. (d) Domain dependent.
I. Different entity types.
14. Which of the following is true? II. Attributes for each entity type.
III. Relationships among entity types.
(a) A relation in BCNF is always in 3NF. IV. Semantic integrity constraints are not defined.
(b) A relation in 3NF is always in BCNF.
(c) PJNF and 3NF are same. Which of the above statements is correct?
(d) A relation in PJNF is not in 3NF.
(a) All of the above (b) Only I, II and III
15. RAD stands for . (c) III and IV (d) II and III

(a) Read and destroy 22. The cardinality ratio and participation
(b) Read and develop of a table helps us to implement either the cross
(c) Rapid application development referencing design or mutual referencing design.
(d) Rapid application design
(a) Functionality (b) Constraints
16. In DML, RECONNECT command cannot be used (c) SQL queries (d) Domains
with
23. Consider the following schemas:
(a) CLEAN set. (b) FIXED set.
(c) OPTIONAL set. (d) USER set. Branch = (Branch-name, Assets, Branch-city)
Customer = (Customer-name, Bank name,
17. The UWA is used to communicate the contents of
Customer-city)
individual records between
Borrow = (Branch-name, loan number,
(a) Host program and present record. ­customer account-number)
(b) Host program and host record. Deposit = (Branch-name, Accountnumber,
(c) Host program and DBMS. Customer-name, Balance)
(d) Host program and host language.
Using relational algebra, the query that finds cus-
18. Referential integrity is concerned with tomers having deposits have accountnumber >
23456 is
(a) Primary key. (b) Foreign key.
(c) Alternate key. (d) Project_Join key. (a) Πcustomer-name (s account-number > 23456 (Deposit)
(b) s customer-name (s account-number > 23456 (Deposit)
(c) Πcustomer-name (s account-number > 23456 (Borrow)
19. Match the following:
(d) s customer-name (Π account-number > 23456 (Borrow)
List — I List — II
I. DDL A. LOCK TABLE 24. What deletes the entire file except the file structure?
II. DML B. COMMIT (a) DELETE
III. TCL C. Natural Difference (b) DELETE RECORDS
(c) ZAP
IV. BINARY D. REVOKE
(d) RETAIN STRUCT
Operation
25. The function of a Transaction Manager is to
Codes:
I. Maintain a log of transactions.
I    II     III IV II. Maintain database images before and after a
(a) A D B C transaction.
(b) A B D C III. Maintain appropriate concurrency control.
(c) D B A C IV. Avoid deadlocks.
(d) D A B C


(a) I, II, III and IV (b) II and III 32. A network schema is used to
(c) I, III and IV (d) I, II and III
(a)   restrict to many-to-many relationship.
(b) permit many-to-many relationship.
26. The term granularity relates to
(c)  permit to store data in a database.
(a) the size of table. (d) help in storing one-to-many relationship.
(b) the size of data item.
(c) the size of tuple. 33. Storing under unified scheme at a single site is
(d) the size of primary key called as
(a) data mining. (b) universal database.
27. Match the following: (c) data warehousing. (d) advanced database.
34. The task of correcting and cleaning data is called as
I. OLAP A. Back propagation
(a) data organization. (b) pre-processing.
II. OLTP B. Data warehouse
(c) data mining. (d) data structuring.
III. Decision tree C. RDBMS
35. A clustering index consists of .
IV. Neural network D. Classification
(a) declared and ordered primary key
(b) both grouped and ordered primary key and for-
I II III IV eign key
(c) ordered foreign key
(a) D C A B
(d) grouped and ordered candidate key and alter-
(b) B C D A nate key
(c) A B C D
36. Consider the following schema:
(d) D B A C
Emp(Empcode, Name, Sex, Salary, Deptt)
A simple SQL query is executed as follows:
28. Which of the following statements is TRUE?
SELECT Deptt FROM Emp
(a) A prime attribute can be transitively depen-
dent on a key in 5NF relation. WHERE sex = ‘M’
(b) A prime attribute can be transitively depen- GROUP by Dept
dent on a key in 3NF relation.
HAVING avg (Salary) > (select avg (Salary) from Emp)
(c) A prime attribute can be transitively dependent on a key in PJNF relation.
(d) A prime attribute cannot be transitively depen- The output will be
dent on a key in BCNF relation.
(a) a verage salary of male employee is the average
29. Which normal form is considered as adequate for a salary of the organization.
simple database design? (b) average salary of male employee is less than the
average salary of the organization.
(a) 2NF (b) 3NF
(c) average salary of male employee is equal to the
(c) BCNF (d) PJNF
average salary of the organization.
30. Which of the following is not a type of DBMS? (d) average salary of male employees in a depart-
ment is more than the average salary of the
(a) Hierarchical (b) Network organization.
(c) Relational (d) Graphical
37. Consider a relation X(p, q, r), and the domains are
31. Manager’s salary details are to be hidden from represented by their atomic values. Further, the
Employee Table. This technique is called as functional dependency p → q, q → r is observed,
the relation is in
(a) internal level datahiding.
(b) physical level datahiding. (a) 1NF, not in 2NF
(c) external level datahiding. (b) 2NF, not in 3NF
(d) logical level datahiding. (c) 3NF
(d) 1NF, not in 3NF


38. Match the following:

I. Foreign keys                   A. Domain constraint
II. Private key                   B. Referential integrity
III. Event control action model   C. Password
IV. Data security                 D. Trigger

Codes:
      I   II  III  IV
(a)   C   B   A    D
(b)   B   A   D    C
(c)   C   D   A    B
(d)   A   B   C    D

Set 2

1. For a database relation R(a, b, c, d), where the domains of a, b, c, d include only atomic values, only the following functional dependencies and those that can be inferred from them hold.
a → c
b → d
This relation is
(a) In First normal form but not in Second normal form.
(b) In Second normal form but not in Third normal form.
(c) In Third normal form.
(d) None of the above.

2. Let R(a, b, c) and S(d, e, f) be two relations in which d is the foreign key of S that refers to the primary key of R. Consider the following four operations on R and S:
I. Insert into R II. Insert into S
III. Delete from R IV. Delete from S
Which of the following is true about the referential integrity constraint above?
(a) None of I, II and IV can cause its violation.
(b) All of I, II, III and IV can cause its violation.
(c) Both I and IV can cause its violation.
(d) Both II and III can cause its violation.

3. There are five records in a database

Name      Age   Occupation   Category
Rama      27    CON          A
Abdul     22    ENG          A
Jenifer   28    DOC          B
Maya      32    SER          D
Dev       24    MUS          C

There is an index file associated with this and it contains the values 1, 3, 2, 5 and 4. Which one of the fields is the index built from?
(a) Age (b) Name
(c) Occupation (d) Category

4. Consider the following database relation containing the attributes
Book_id
Subject_Category_of_books
Name_of_Author
Nationality_of_Author
With Book_id as the primary key
(a) The highest normal form satisfied by this relation is _____ NF.
(b) Suppose the attributes Book_title and Author_address are added to the relation, and the primary key is changed to {Name_of_Author, Book_title}, the highest normal form satisfied by the relation is _____ NF.

5. Consider the scheme R = (S T U V) and the dependencies S → T, T → U, U → V and V → S. Let R = (R1 and R2) be a decomposition such that R1 ∩ R2 = φ. The decomposition is
(a) Not in 2NF.
(b) In 2NF but not 3NF.
(c) In 3NF but not 2NF.
(d) In both 2NF and 3NF.

6. Consider a schema R(A, B, C, D) and functional dependencies A → B and C → D. Then the decomposition of R into R1(AB) and R2(CD) is
(a) Dependency preserving and lossless join.
(b) Lossless join but not dependency preserving.
(c) Dependency preserving but not lossless join.
(d) Not dependency preserving and not lossless join.

7. Relation R with an associated set of functional dependencies, F, is decomposed into BCNF. The redundancy (arising out of functional dependencies) in the resulting set of relations is
(a) Zero.
(b) More than zero but less than that of an equivalent 3NF decomposition.
(c) Proportional to the size of F+.
(d) Indeterminate.


8. From the following instance of a relation scheme R(A, B, C), we can conclude that

A   B   C
1   1   1
1   1   0
2   3   2
2   3   2

(a) A functionally determines B and B functionally determines C.
(b) A functionally determines B and B does not functionally determine C.
(c) B does not functionally determine C.
(d) A does not functionally determine B and B does not functionally determine C.

9. Which of the following is/are correct?
(a) An SQL query automatically eliminates duplicates.
(b) An SQL query will not work if there are no indexes on the relations.
(c) SQL permits attribute names to be repeated in the same relation.
(d) None of the above.

10. Given relation r(w, x) and s(y, z), the result of
select distinct w, x from r, s
is guaranteed to be same as r, provided
(a) r has no duplicates and s is non-empty.
(b) r and s have no duplicates.
(c) s has no duplicates and r is non-empty.
(d) r and s have the same number of tuples.

11. Which of the following query transformations (i.e. replacing the LHS expression by the RHS expression) is incorrect? R1 and R2 are relations, c1 and c2 are selection conditions and A1 and A2 are attributes of R1.
(a) σ_c1(σ_c2(R1)) → σ_c2(σ_c1(R1))
(b) σ_c1(Π_A1(R1)) → Π_A1(σ_c1(R1))
(c) σ_c1(R1 ∪ R2) → σ_c1(R1) ∪ σ_c1(R2)
(d) Π_A2(σ_c1(R1)) → σ_c1(Π_A2(R1))

12. The relational algebra expression equivalent to the following tuple calculus expression:
{t | t ∈ r ∧ (t[A] = 10 ∧ t[B] = 20)} is
(a) σ_(A = 10 ∨ B = 20)(r)
(b) σ_(A = 10)(r) ∪ σ_(B = 20)(r)
(c) σ_(A = 10)(r) ∩ σ_(B = 20)(r)
(d) σ_(A = 10)(r) − σ_(B = 20)(r)

13. Which of the following relational calculus expressions is not safe?
(a) {t | ∃u ∈ R1 (t[A] = u[A]) ∧ ¬∃s ∈ R2 (t[A] = s[A])}
(b) {t | ∀u ∈ R1 (u[A] = "x" ⇒ ∃s ∈ R2 (t[A] = s[A] ∧ s[A] = u[A]))}
(c) {t | ¬(t ∈ R1)}
(d) {t | ∃u ∈ R1 (t[A] = u[A]) ∧ ∃s ∈ R2 (t[A] = s[A])}

14. With regard to the expressive power of the formal relational query languages, which of the following statements is true?
(a) Relational algebra is more powerful than relational calculus.
(b) Relational algebra has the same power as relational calculus.
(c) Relational algebra has the same power as safe relational calculus.
(d) None of the above.

15. For the schedule given below, which of the following is correct?
1 Read A
2 Read B
3 Write A
4 Read A
5 Write A
6 Write B
7 Read B
8 Write B
(a) This schedule is serializable and can occur in a scheme using 2PL protocol.
(b) This schedule is serializable but cannot occur in a scheme using 2PL protocol.
(c) This schedule is not serializable but can occur in a scheme using 2PL protocol.
(d) This schedule is not serializable and cannot occur in a scheme using 2PL protocol.

16. Which of the following is correct?
(a) B trees are for storing data on disk and B+ trees are for main memory.
(b) Range queries are faster on B+ trees.
(c) B trees are for primary indexes and B+ trees are for secondary indexes.
(d) The height of a B+ tree is independent of the number of records.

17. B+ trees are preferred to binary trees in database because
(a) disk capacities are greater than memory capacities.


(b) disk access is much slower than memory access.
(c) disk data transfer rates are much less than memory data transfer rates.
(d) disks are more reliable than memory.

18. A B+ tree index is to be built on the name attribute of the relation STUDENT. Assume that all student names are of length 8 bytes, disk blocks are of size 512 bytes and index pointers are of size 4 bytes. Given this scenario, what would be the best choice of the degree (i.e., the number of pointers per node) of the B+ tree?

ANSWERS TO PRACTICE EXERCISES

Set 1

1. (a) 8. (d) 15. (c) 22. (b) 29. (b) 36. (d)
2. (b) 9. (a) 16. (b) 23. (a) 30. (d) 37. (b)
3. (d) 10. (b) 17. (c) 24. (c) 31. (c) 38. (b)
4. (d) 11. (c) 18. (b) 25. (d) 32. (b)
5. (a) 12. (b) 19. (d) 26. (b) 33. (c)
6. (d) 13. (a) 20. (a) 27. (b) 34. (b)
7. (b) 14. (a) 21. (a) 28. (d) 35. (a)
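
Answer 36 of Set 1 can be checked on a small table. The sketch below is only an illustration: it assumes SQLite (through Python's standard sqlite3 module), and the sample rows, the function name departments_above_org_average and the ORDER BY clause (added to make the output order predictable) are not part of the exercise.

```python
import sqlite3

# Hypothetical sample data: (Empcode, Name, Sex, Salary, Deptt).
SAMPLE_ROWS = [
    (1, "Arun", "M", 900.0, "CS"),
    (2, "Bina", "F", 300.0, "CS"),
    (3, "Chet", "M", 200.0, "EE"),
    (4, "Dia", "F", 800.0, "EE"),
    (5, "Eli", "M", 700.0, "ME"),
]


def departments_above_org_average(rows):
    """Run the query of exercise 36 on an in-memory Emp table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Emp (Empcode INTEGER PRIMARY KEY, Name TEXT, "
        "Sex TEXT, Salary REAL, Deptt TEXT)"
    )
    conn.executemany("INSERT INTO Emp VALUES (?, ?, ?, ?, ?)", rows)
    # WHERE keeps male employees, GROUP BY forms one group per department,
    # HAVING compares each group's average with the organization-wide average.
    result = conn.execute(
        "SELECT Deptt FROM Emp "
        "WHERE Sex = 'M' "
        "GROUP BY Deptt "
        "HAVING avg(Salary) > (SELECT avg(Salary) FROM Emp) "
        "ORDER BY Deptt"
    ).fetchall()
    conn.close()
    return [row[0] for row in result]


print(departments_above_org_average(SAMPLE_ROWS))
```

With these rows the organization-wide average salary is 580. The male averages per department are 900 (CS), 200 (EE) and 700 (ME), so only CS and ME are returned, which matches option (d): departments in which the average salary of male employees is more than the average salary of the organization.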

Set 2

1. (a) The relation has the following functional dependencies:
a → c
b → d
{a, b} is the key for the given relation. Here the prime attributes are a and b. c and d each depend on only part of the key, which is not allowed in 2NF. So, the given relation is in 1NF.

2. (d) Insertion into S and deletion from R can cause inconsistency. So, II and III can cause violation.

3. (c) The index is built from the field Occupation, whose values in alphabetic order are CON, DOC, ENG, MUS and SER. Each record is assigned the alphabetic position of its occupation.

4. (a) 2NF, (b) 3NF

5. (d) The relation schema R = (S T U V) has the following functional dependencies:
S → T
T → U
U → V
V → S
Each of S, T, U and V is a candidate key for this relation, so every attribute is prime. As given, R1 ∩ R2 = φ. So, R1 and R2 might have two attributes each. A relation with two attributes satisfies both 2NF and 3NF conditions.

6. (c) Dependency is preserved because there is no loss of functional dependencies. R1(AB) and R2(CD) do not have any common attribute, so the join is lossy.

7. (a) Redundancy arising out of functional dependencies is zero when we decompose into BCNF.

8. (b, c) F: X → Y implies that for any two tuples, if t1[X] = t2[X], they must have t1[Y] = t2[Y]. Here the tuples (1, 1, 1) and (1, 1, 0) agree on B but differ on C, so B does not determine C, while no two tuples agree on A and differ on B.

9. (d) The DISTINCT keyword is used along with the SELECT keyword to remove duplicate rows. SQL does not permit attribute names to be repeated in the same relation.

10. (a) A cross join with an empty relation produces an empty result. Therefore, s must not be empty. r should not have any duplicates for the result to have the same rows as r.

11. (d) Π_A2(σ_c1(R1)) and σ_c1(Π_A2(R1)) are not equivalent in general: the condition c1 may refer to attributes that are not in A2, and those attributes are no longer available after the projection. Transformation (a) is valid because selections commute for any conditions c1 and c2.
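
The key and decomposition arguments in answers 1, 5 and 6 all rest on attribute closure. The following is a minimal sketch, assuming single-letter attribute names so that a string such as "ab" can stand for the set {a, b}; the function names closure, is_superkey and is_lossless_binary are illustrative and do not come from the text. The lossless test used is the standard one for a decomposition into two relations: the common attributes must functionally determine all of R1 or all of R2.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs when lhs is covered and rhs adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """attrs is a superkey when its closure contains every attribute."""
    return closure(attrs, fds) >= set(all_attrs)


def is_lossless_binary(r1, r2, fds):
    """Binary decomposition test: (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


FDS_Q1 = [("a", "c"), ("b", "d")]                          # exercise 1
FDS_Q5 = [("S", "T"), ("T", "U"), ("U", "V"), ("V", "S")]  # exercise 5
FDS_Q6 = [("A", "B"), ("C", "D")]                          # exercise 6

print(sorted(closure("a", FDS_Q1)))               # closure of a alone
print(is_superkey("ab", "abcd", FDS_Q1))          # {a, b} is a key
print([is_superkey(x, "STUV", FDS_Q5) for x in "STUV"])
print(is_lossless_binary("AB", "CD", FDS_Q6))     # no common attribute
```

For exercise 1, a alone only reaches {a, c}, so c depends on part of the key {a, b}, which is the partial dependency that rules out 2NF. For exercise 5, every single attribute is a superkey. For exercise 6, the empty intersection determines nothing, so the test reports a lossy join.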


12. (c) The tuple calculus expression selects the tuples of relation r where A = 10 and B = 20. So, the equivalent relational algebra expression is σ_(A = 10)(r) ∩ σ_(B = 20)(r).

13. (c) The query {t | ¬(t ∈ R1)} is syntactically correct. But it requests all tuples that are not in R1. The set of such tuples is infinite when the domain is infinite, for example the set of all integers. Therefore, this is an unsafe query.

14. (c) Relational algebra and safe relational calculus have the same expressive power. Every query written in relational algebra can be expressed in safe relational calculus, and vice versa.

15. (d) The given schedule has the conflicting pairs W1(A), R2(A) and W2(B), R1(B). The first pair requires T1 to precede T2 and the second requires T2 to precede T1, so the schedule is not serializable. Hence, it cannot occur in a scheme using 2PL protocol.

16. (b) Most database systems use indexes built on some form of a B+ tree due to its many advantages, in particular its support for range queries. Leaf nodes are linked together in B+ trees, hence range queries are faster.

17. (b) Disk access is much slower than memory access, so the number of disk accesses must be kept small. B+ trees have very high fan-out (typically on the order of 100 or more), which reduces the number of I/O operations required to find an element in the tree.

18. (43) Let n be the degree.
Key size, k = 8 bytes (length of the name attribute of student)
Disk block size, B = 512 bytes
Index pointer size, b = 4 bytes
The degree of the B+ tree is the largest n for which an internal node with n pointers and n − 1 keys fits in one block:
(n − 1)k + n × b ≤ B
(n − 1)8 + n × 4 ≤ 512
12n ≤ 520
n ≤ 520/12 = 43.33, so n = 43
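
The arithmetic in answer 18 can be written as a one-line calculation. This is a small sketch under the same model as the answer (an internal node holds n pointers and n − 1 keys and must fit in one block); the function name max_degree is illustrative.

```python
def max_degree(block_size, key_size, pointer_size):
    """Largest n with (n - 1) * key_size + n * pointer_size <= block_size."""
    # Rearranged: n * (key_size + pointer_size) <= block_size + key_size.
    return (block_size + key_size) // (key_size + pointer_size)


n = max_degree(block_size=512, key_size=8, pointer_size=4)
print(n)  # 520 // 12
```

With n = 43 the node uses 42 × 8 + 43 × 4 = 508 bytes, while n = 44 would need 520 bytes and no longer fit in a 512-byte block.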

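The serializability argument in answer 15 can be checked with a precedence graph: add an edge Ti → Tj whenever an operation of Ti conflicts with a later operation of Tj, and test the graph for a cycle. The sketch below is illustrative only. In particular, the assignment of the eight steps to T1 and T2 is an assumption chosen to be consistent with the conflicting pairs W1(A), R2(A) and W2(B), R1(B) named in the answer; the exercise as printed here does not show which transaction issues each step.

```python
def precedence_edges(schedule):
    """schedule: list of (transaction, 'R' or 'W', item) in execution order."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Two operations conflict when they come from different
            # transactions, touch the same item, and at least one is a write.
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    visiting, done = set(), set()

    def dfs(node):
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: node is already on the current path
        visiting.add(node)
        if any(dfs(nxt) for nxt in graph[node]):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(node) for node in graph)


def is_conflict_serializable(schedule):
    return not has_cycle(precedence_edges(schedule))


# Assumed transaction assignment for the schedule of exercise 15.
SCHEDULE_Q15 = [
    ("T1", "R", "A"), ("T2", "R", "B"), ("T1", "W", "A"), ("T2", "R", "A"),
    ("T2", "W", "A"), ("T2", "W", "B"), ("T1", "R", "B"), ("T1", "W", "B"),
]

print(sorted(precedence_edges(SCHEDULE_Q15)))
print(is_conflict_serializable(SCHEDULE_Q15))
```

Under this assumption the graph contains both T1 → T2 and T2 → T1, so it has a cycle and the schedule is not conflict serializable. Since every schedule produced by 2PL is conflict serializable, such a schedule cannot arise under 2PL.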