
UNIT 1
RELATIONAL MODEL ISSUES
ER Model - Normalization - Query Processing - Query Optimization - Transaction Processing - Concurrency Control - Recovery - Database Tuning

ER MODEL

ER modeling is a top-down approach to database design that begins by identifying the important data, called entities, and the relationships between the data that must be represented in the model.
ER modeling is an important technique for any database designer to master and forms the
basis of the methodology.
There are a number of different notations that can be used to represent each concept diagrammatically; one of the most widely used is the Unified Modeling Language (UML).
UML is the successor to a number of object-oriented analysis and design methods
introduced in the 1980s and 1990s.
The Object Management Group (OMG) is currently looking at the standardization of UML
and it is anticipated that UML will be the de facto standard modeling language.
Entity Types
o A group of objects with the same properties, which are identified by the enterprise as having an independent existence.
o The basic concept of the ER model is the entity type, which represents a group of objects in the real world with the same properties.
o An entity type has an independent existence and can comprise objects with a physical (or real) existence or objects with a conceptual (or abstract) existence.
Physical existence: Staff, Part, Property, Supplier, Customer, Product
Conceptual existence: Viewing, Sale, Inspection, Work experience
o Each uniquely identifiable object of an entity type is referred to simply as an entity occurrence.
o Examples of entity types: Staff, Branch, PropertyForRent, and PrivateOwner.
o Diagrammatic representation of entity types
Each entity type is shown as a rectangle labeled with the name of the entity, which is normally a singular noun.
In UML, the first letter of each word in the entity name is upper case (for example, Staff and PropertyForRent).

Relationship types
o A relationship type is a set of associations between one or more participating entity types.
o Each relationship type is given a name that describes its function.
o Example of a relationship type: POwns, which associates the PrivateOwner and PropertyForRent entities.
o A relationship occurrence indicates the particular entity occurrences that are related.
o Consider a relationship type called Has, which represents an association between Branch and Staff entities, that is, Branch Has Staff. Each occurrence of the Has relationship associates one Branch entity occurrence with one Staff entity occurrence.
o A semantic net is an object-level model, which uses one symbol to represent entities and another symbol to represent relationships.

o Each relationship describes an association of a single Branch entity occurrence with a single Staff entity occurrence.
o Relationships are represented by lines that join each participating Branch entity with the associated Staff entity.

o The ER model uses a higher level of abstraction than the semantic net by combining
sets of entity occurrences into entity types and sets of relationship occurrences into
relationship types.
o Diagrammatic representation of relationship types
Each relationship type is shown as a line connecting the associated entity types, labeled with the name of the relationship. Normally, a relationship is named using a verb (for example, Supervises or Manages) or a short phrase including a verb (for example, LeasedBy).
Again, the first letter of each word in the relationship name is shown in upper case. Whenever possible, a relationship name should be unique for a given ER model.
A relationship is only labeled in one direction, which normally means that the name of the relationship only makes sense in one direction (for example, Branch Has Staff makes more sense than Staff Has Branch).
So once the relationship name is chosen, an arrow symbol is placed beside the name indicating the correct direction for a reader to interpret the relationship name (for example, Branch Has Staff).
o Degree of Relationship Type
The entities involved in a particular relationship type are referred to as participants in that relationship.
The number of participants in a relationship type is called the degree of that relationship.
Therefore, the degree of a relationship indicates the number of entity types involved in a relationship.
A relationship of degree two is called binary.
Example: the POwns relationship, with its two participating entities PrivateOwner and PropertyForRent.
A relationship of degree three is called ternary.
Example: Registers, with three participating entity types, namely Staff, Branch, and Client.
The term complex relationship is used to describe relationships with degrees higher than binary.
A relationship of degree four is called quaternary.
Example: Arranges, with four participating entity types, namely Buyer, Solicitor, Financial_Institution, and Bid.

Diagrammatic representation of complex relationships
The UML notation uses a diamond to represent relationships with degrees higher than binary (complex relationships).
The name of the relationship is displayed inside the diamond, and in this case the directional arrow normally associated with the name is omitted.

o Recursive Relationship
A relationship type where the same entity type participates more than
once in different roles.

Attributes
o The particular properties of entity types are called attributes.
Example:
a Staff entity type may be described by the staffNo, name,
position, and salary attributes.
o The attributes hold values that describe each entity occurrence and represent the
main part of the data stored in the database.
o Each attribute is associated with a set of values called a domain.
o The domain defines the potential values that an attribute may hold and is similar to the domain concept in the relational model.
Example: the number of rooms associated with a property is between 1 and 15 for each entity occurrence. We can therefore define the set of values for the number of rooms (rooms) attribute of the PropertyForRent entity type as the set of integers between 1 and 15.
o Attributes may share a domain. For example, the address attributes of the Branch, PrivateOwner, and BusinessOwner entity types share the same domain of all possible addresses. Domains can also be composed of domains.
Example: the domain for the address attribute of the Branch entity is made up of sub-domains: street, city, and postcode.
o Attributes can be classified as:

Simple Attribute
o Simple attributes cannot be further subdivided into smaller components.
o Simple attributes are sometimes called atomic attributes.
Examples: position and salary of the Staff entity.
Composite Attribute
o Some attributes can be further divided to yield smaller components with an independent existence of their own.
Example: the address attribute may be treated as a simple attribute or subdivided into its components street, city, and postcode.
Single-valued Attribute
o An attribute that holds a single value for each occurrence of an entity type.
Example: each occurrence of the Branch entity type has a single value for the branch number (branchNo) attribute (for example, B003), and therefore the branchNo attribute is referred to as being single-valued.
Multi-valued Attribute
o An attribute that holds multiple values for each occurrence of an entity type.
Example: each occurrence of the Branch entity type can have multiple values for the telNo attribute (for example, branch number B003 has telephone numbers 0141-339-2178 and 0141-339-4439), and therefore the telNo attribute in this case is multi-valued.
o A multi-valued attribute may have a set of values with upper and lower limits.
Example: a branch may have a minimum of a single telephone number and a maximum of three telephone numbers.
Derived Attribute
o An attribute that represents a value that is derivable from the value of a related attribute or set of attributes, not necessarily in the same entity type.
Example:
o The value for the duration attribute of the Lease entity is calculated from the rentStart and rentFinish attributes, also of the Lease entity type. We refer to the duration attribute as a derived attribute, the value of which is derived from the rentStart and rentFinish attributes.
Derived attributes may also involve the association of attributes of different entity types.
Example:
o Consider an attribute called deposit of the Lease entity type. The value of the deposit attribute is calculated as twice the monthly rent for a property. Therefore, the value of the deposit attribute of the Lease entity type is derived from the rent attribute of the PropertyForRent entity type.
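As an illustration of how these attribute classifications typically map to a relational schema, the sketch below (in SQL, with assumed column types; leaseNo is a hypothetical key for Lease) stores the multi-valued telNo attribute in a separate table and computes the derived duration attribute instead of storing it:

-- Branch with its single-valued attributes; street, city, and postcode are
-- the components of the composite address attribute
CREATE TABLE Branch (
    branchNo CHAR(4) PRIMARY KEY,
    street   VARCHAR(50),
    city     VARCHAR(30),
    postcode VARCHAR(10)
);

-- The multi-valued telNo attribute becomes a separate table,
-- allowing one or more telephone numbers per branch
CREATE TABLE BranchTelephone (
    branchNo CHAR(4) REFERENCES Branch,
    telNo    VARCHAR(15),
    PRIMARY KEY (branchNo, telNo)
);

-- The derived attribute duration is computed from rentStart and rentFinish
-- rather than stored (date arithmetic syntax varies by DBMS)
SELECT leaseNo, rentFinish - rentStart AS duration FROM Lease;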

o Keys
Candidate Key: The minimal set of attributes that uniquely identifies each occurrence of an entity type.
o A candidate key is the minimal number of attributes whose value(s) uniquely identify each entity occurrence.
o Example: the branch number (branchNo) attribute is the candidate key for the Branch entity type, and has a distinct value for each branch entity occurrence.
o The candidate key must hold values that are unique for every occurrence of an entity type.
o This implies that a candidate key cannot contain a null.
o Example: each branch has a unique branch number (for example, B003), and there will never be more than one branch with the same branch number.
Primary Key: The candidate key that is selected to uniquely identify each occurrence of an entity type. An entity type may have more than one candidate key.
o Example: consider that a member of staff has a unique company-defined staff number (staffNo) and also a unique National Insurance Number (NIN) that is used by the Government. We therefore have two candidate keys for the Staff entity, one of which must be selected as the primary key. If we select staffNo as the primary key of the Staff entity type, NIN is then referred to as an alternate key.
Composite Key: A candidate key that consists of two or more attributes.
o Example: consider an entity called Advert with propertyNo (property number), newspaperName, dateAdvert, and cost attributes. Many properties are advertised in many newspapers on a given date. To uniquely identify each occurrence of the Advert entity type requires values for the propertyNo, newspaperName, and dateAdvert attributes. Thus, the Advert entity type has a composite primary key made up of the propertyNo, newspaperName, and dateAdvert attributes.
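A minimal SQL sketch of these key concepts, using the Staff and Advert examples above (column types are assumptions):

CREATE TABLE Staff (
    staffNo  CHAR(5) PRIMARY KEY,      -- candidate key chosen as primary key
    NIN      CHAR(9) NOT NULL UNIQUE,  -- remaining candidate key: the alternate key
    name     VARCHAR(50),
    position VARCHAR(20),
    salary   DECIMAL(8,2)
);

CREATE TABLE Advert (
    propertyNo    CHAR(5),
    newspaperName VARCHAR(30),
    dateAdvert    DATE,
    cost          DECIMAL(7,2),
    -- composite primary key: no single attribute identifies an advert
    PRIMARY KEY (propertyNo, newspaperName, dateAdvert)
);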

o Diagrammatic representation of attributes:
If an entity type is to be displayed with its attributes, we divide the rectangle representing the entity in two.
The upper part of the rectangle displays the name of the entity and the lower part lists the names of the attributes.
The first attribute(s) to be listed is the primary key for the entity type, if known. The name(s) of the primary key attribute(s) can be labeled with the tag {PK}.
Additional tags that can be used include partial primary key {PPK}, when an attribute forms part of a composite primary key, and alternate key {AK}.
For simple, single-valued attributes, there is no need to use tags, and so we simply display the attribute names in a list below the entity name.
For composite attributes, we list the name of the composite attribute followed below, and indented to the right, by the names of its simple component attributes.
o For example, the composite attribute address of the Branch entity is shown followed below by the names of its component attributes: street, city, and postcode.
For multi-valued attributes, we label the attribute name with an indication of the range of values available for the attribute.
o For example, if we label the telNo attribute with the range [1..*], this means that the telNo attribute holds one or more values.
o If we know the precise maximum number of values, we can label the attribute with an exact range. For example, if the telNo attribute holds from one to a maximum of three values, we can label the attribute with [1..3].
For derived attributes, we prefix the attribute name with a /.
o For example, the derived attribute of the Staff entity type is shown as /totalStaff.
Strong and weak entity types
o We can classify entity types as strong or weak.
Strong Entity Type: An entity type that is not existence-dependent on some other entity type.
An entity type is referred to as being strong if its existence does not depend upon the existence of another entity type.
o Examples: the Staff, Branch, PropertyForRent, and Client entities.
A characteristic of a strong entity type is that each entity occurrence is uniquely identifiable using the primary key attribute(s) of that entity type.
Example: we can uniquely identify each member of staff using the staffNo attribute, which is the primary key for the Staff entity type.
Weak Entity Type: An entity type that is existence-dependent on some other entity type.
A weak entity type is dependent on the existence of another entity type.
Example: a weak entity type called Preference.
A characteristic of a weak entity is that each entity occurrence cannot be uniquely identified using only the attributes associated with that entity type.
Example: there is no primary key for the Preference entity. This means that we cannot identify each occurrence of the Preference entity type using only the attributes of this entity.
Weak entity types are sometimes referred to as child, dependent, or subordinate entities, and strong entity types as parent, owner, or dominant entities.
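A sketch of how a weak entity type is commonly realized in SQL: Preference borrows the key of the strong entity it depends on, so a Preference row cannot exist without its owning Client (the Client schema and the prefType partial identifier are assumptions for illustration):

CREATE TABLE Client (
    clientNo CHAR(5) PRIMARY KEY,
    cName    VARCHAR(50)
);

-- Weak entity: identifiable only in combination with its owning Client,
-- and removed automatically when that Client is removed
CREATE TABLE Preference (
    clientNo CHAR(5) REFERENCES Client ON DELETE CASCADE,
    prefType VARCHAR(20),
    maxRent  DECIMAL(7,2),
    PRIMARY KEY (clientNo, prefType)
);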
Structural Constraints
o The main type of constraint on relationships is called multiplicity.
o The number (or range) of possible occurrences of an entity type that may relate to a
single occurrence of an associated entity type through a particular relationship.
o Multiplicity constrains the way that entities are related. It is a representation of the
policies (or business rules) established by the user or enterprise.

o Ensuring that all appropriate constraints are identified and represented is an important
part of modeling an enterprise.
o Binary relationships are generally referred to as being one-to-one (1:1), one-to-many (1:*), or many-to-many (*:*).
o We examine these three types of relationships using the following integrity constraints:
a member of staff manages a branch (1:1)
a member of staff oversees properties for rent (1:*)
newspapers advertise properties for rent (*:*)
o One-to-One (1:1) Relationships
Each relationship (rn) represents the association between a single Staff entity occurrence and a single Branch entity occurrence.

Determining the multiplicity
o Determining the multiplicity normally requires examining the precise relationships between the data given in an enterprise constraint using sample data.
o The sample data may be obtained by examining filled-in forms or reports and, if possible, from discussion with users.
Diagrammatic representation of 1:1 relationships
o An ER diagram shows the Staff Manages Branch relationship.
o To represent that a member of staff can manage zero or one branch, we place a 0..1 beside the Branch entity.

o One-to-Many (1:*) Relationships
Each relationship (rn) represents the association between a single Staff entity occurrence and a single PropertyForRent entity occurrence.
Determining the multiplicity
o A member of staff can oversee zero or more properties for rent, and a property for rent is overseen by zero or one member of staff.
o Therefore, for members of staff participating in this relationship there are many properties for rent, and for properties participating in this relationship there is a maximum of one member of staff.
o We refer to this type of relationship as one-to-many, which we usually abbreviate as (1:*).
Diagrammatic representation of 1:* relationships
o To represent that a member of staff can oversee zero or more properties for rent, we place a 0..* beside the PropertyForRent entity.
o To represent that each property for rent is overseen by zero or one member of staff, we place a 0..1 beside the Staff entity.
o If we know the actual minimum and maximum values for the multiplicity, we can display these instead.
o For example, if a member of staff oversees a minimum of zero and a maximum of 100 properties for rent, we can replace the 0..* with 0..100.
o Many-to-Many (*:*) Relationships
Each relationship (rn) represents the association between a single Newspaper entity occurrence and a single PropertyForRent entity occurrence.
Determining the multiplicity
A newspaper advertises one or more properties for rent, and one property for rent is advertised in zero or more newspapers.
Therefore, for newspapers there are many properties for rent, and for each property for rent participating in this relationship there are many newspapers.
Diagrammatic representation of *:* relationships
To represent that each newspaper can advertise one or more properties for rent, we place a 1..* beside the PropertyForRent entity type.
To represent that each property for rent can be advertised by zero or more newspapers, we place a 0..* beside the Newspaper entity.

Cardinality and Participation Constraints
o Multiplicity actually consists of two separate constraints known as cardinality and participation.
o Cardinality
Describes the maximum number of possible relationship occurrences for an entity participating in a given relationship type.
The cardinality of a binary relationship is what we previously referred to as one-to-one (1:1), one-to-many (1:*), and many-to-many (*:*).
The cardinality of a relationship appears as the maximum values for the multiplicity ranges on either side of the relationship.
Example: the Manages relationship has one-to-one (1:1) cardinality, and this is represented by multiplicity ranges with a maximum value of 1 on both sides of the relationship.


o Participation
Determines whether all or only some entity occurrences participate in a relationship.
The participation constraint represents whether all entity occurrences are involved in a particular relationship (referred to as mandatory participation) or only some (referred to as optional participation).
The participation of entities in a relationship appears as the minimum values for the multiplicity ranges on either side of the relationship.
Optional participation is represented as a minimum value of 0, while mandatory participation is shown as a minimum value of 1.
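As a sketch of how multiplicity constraints carry over to SQL, the 1:* Staff Oversees PropertyForRent relationship can be enforced with a foreign key; the minimum (participation) value determines whether the column is nullable (the schema fragment is an assumption based on the relations used in this unit):

-- 0..1 staff per property: optional participation, so the FK may be NULL
CREATE TABLE PropertyForRent (
    propertyNo CHAR(5) PRIMARY KEY,
    pAddress   VARCHAR(60),
    staffNo    CHAR(5) REFERENCES Staff  -- nullable: minimum 0, maximum 1
);

-- Mandatory participation (minimum 1) would instead be declared NOT NULL:
--     staffNo CHAR(5) NOT NULL REFERENCES Staff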
Problems with ER Models
o The problems in ER models are referred to as connection traps, and normally occur due to a misinterpretation of the meaning of certain relationships.
o Two main types of connection traps:
Fan traps
Chasm traps
o Fan traps
Where a model represents a relationship between entity types, but the pathway between certain entity occurrences is ambiguous.
A fan trap may exist where two or more 1:* relationships fan out from the same entity.
A potential fan trap shows two 1:* relationships (Has and Operates) emanating from the same entity called Division.
This model represents the facts that a single division operates one or more branches and has one or more staff.
However, a problem arises when we want to know which members of staff work at a particular branch.

To appreciate the problem, we examine some occurrences of the Has and Operates relationships using values for the primary key attributes of the Staff, Division, and Branch entity types.

Fan trap

Removing the fan trap: the model is restructured to represent the correct association between the entities.
o Chasm Traps
Where a model suggests the existence of a relationship between entity types, but the pathway does not exist between certain entity occurrences.

Chasm trap

A chasm trap may occur where there are one or more relationships with a minimum multiplicity of zero (that is, optional participation) forming part of the pathway between related entities.
Removing the chasm trap: the missing relationship is added to the model to provide the required pathway between the related entities.


Enhanced ER Model
o As the basic concepts of ER modeling are often not sufficient to represent the requirements of newer, more complex applications, this stimulated the need to develop additional semantic modeling concepts.
o Many different semantic data models have been proposed, and some of the most important semantic concepts have been successfully incorporated into the original ER model.
o The ER model supported with additional semantic concepts is called the Enhanced Entity-Relationship (EER) model.
o There are three important and useful additional concepts of the EER model, namely:
Specialization/Generalization
Aggregation
Composition
o Specialization/Generalization:
The concept of specialization/generalization is associated with special types of entities known as superclasses and subclasses, and the process of attribute inheritance.


There are two main types of constraints on superclass/subclass relationships, called participation and disjoint constraints.
Superclasses and Subclasses:
Superclass: An entity type that includes one or more distinct subgroupings of its occurrences, which need to be represented in a data model.
Subclass: A distinct subgrouping of occurrences of an entity type, which needs to be represented in a data model.
Entity types that have distinct subclasses are called superclasses.
For example, the entities that are members of the Staff entity type may be classified as Manager, SalesPersonnel, and Secretary.
In other words, the Staff entity is referred to as the superclass of the Manager, SalesPersonnel, and Secretary subclasses.
The relationship between a superclass and any one of its subclasses is called a superclass/subclass relationship.
For example, Staff/Manager is a superclass/subclass relationship.
Superclass/Subclass Relationships
o Each member of a subclass is also a member of the superclass. In other words, the entity in the subclass is the same entity in the superclass, but has a distinct role.
o The relationship between a superclass and a subclass is one-to-one (1:1) and is called a superclass/subclass relationship.
o Some superclasses may contain overlapping subclasses, as illustrated by a member of staff who is both a Manager and a member of Sales Personnel. In this example, Manager and SalesPersonnel are overlapping subclasses of the Staff superclass.

Attribute Inheritance
o An entity in a subclass represents the same real-world object as in the superclass, and so it inherits the attributes of the superclass; a subclass may also have its own additional attributes.
o A subclass is an entity in its own right, and so it may also have one or more subclasses.
o An entity and its subclasses and their subclasses, and so on, is called a type hierarchy. Type hierarchies are known by a variety of names, including:
Specialization hierarchy (for example, Manager is a specialization of Staff),
Generalization hierarchy (for example, Staff is a generalization of Manager),
IS-A hierarchy (for example, Manager IS-A (member of) Staff).
o A subclass with more than one superclass is called a shared subclass. In other words, a member of a shared subclass must be a member of the associated superclasses.
o As a consequence, the attributes of the superclasses are inherited by the shared subclass, which may also have its own additional attributes. This process is referred to as multiple inheritance.
o Specialization Process:
The process of maximizing the differences between members of an entity by identifying their distinguishing characteristics is known as the specialization process.
Specialization is a top-down approach to defining a set of superclasses and
their related subclasses.
The set of subclasses is defined on the basis of some distinguishing
characteristics of the entities in the superclass.
When we identify a set of subclasses of an entity type, we then associate
attributes specific to each subclass (where necessary), and also identify any
relationships between each subclass and other entity types or subclasses
(where necessary).
For example, consider a model where all members of staff are represented as
an entity called Staff.
If we apply the process of specialization on the Staff entity, we attempt to
identify differences between members of this entity such as members with
distinctive attributes and/or relationships.
o Generalization Process:
The process of minimizing the differences between entities by identifying their common characteristics is known as the generalization process.
The process of generalization is a bottom-up approach, which results in the
identification of a generalized superclass from the original entity types.
For example, consider a model where Manager, SalesPersonnel, and Secretary
are represented as distinct entity types.
If we apply the process of generalization on these entities, we attempt to
identify similarities between them such as common attributes and
relationships.
o Diagrammatic representation of specialization/generalization:


Constraints on Specialization/Generalization
o There are two constraints that may apply to a specialization/generalization, called participation constraints and disjoint constraints.
o Participation constraints
Determine whether every member in the superclass must participate as a member of a subclass.
A participation constraint may be mandatory or optional.
A superclass/subclass relationship with mandatory participation specifies that every member in the superclass must also be a member of a subclass.
To represent mandatory participation, Mandatory is placed in curly brackets below the triangle that points towards the superclass.
A superclass/subclass relationship with optional participation specifies that a member of a superclass need not belong to any of its subclasses.
To represent optional participation, Optional is placed in curly brackets below the triangle that points towards the superclass.
o Disjoint constraints
Describe the relationship between members of the subclasses and indicate whether it is possible for a member of a superclass to be a member of one, or more than one, subclass.
A disjoint constraint only applies when a superclass has more than one subclass. If the subclasses are disjoint, then an entity occurrence can be a member of only one of the subclasses.
To represent a disjoint superclass/subclass relationship, Or is placed next to the participation constraint within the curly brackets.
If subclasses of a specialization/generalization are not disjoint (called nondisjoint), then an entity occurrence may be a member of more than one subclass. To represent a nondisjoint superclass/subclass relationship, And is placed next to the participation constraint within the curly brackets.


o Aggregation:
Represents a has-a or is-part-of relationship between entity types, where one represents the whole and the other the part.
A relationship represents an association between two entity types that are conceptually at the same level.
Sometimes we want to model a has-a or is-part-of relationship, in which one entity represents a larger entity (the whole), consisting of smaller entities (the parts). This special kind of relationship is called an aggregation (Booch et al., 1998).
Aggregation does not change the meaning of navigation across the relationship between the whole and its parts, nor does it link the lifetimes of the whole and its parts.


An example of an aggregation is the Has relationship, which relates the Branch entity
(the whole) to the Staff entity (the part).
Diagrammatic representation of aggregation

o Composition
A specific form of aggregation that represents an association between entities, where there is a strong ownership and coincident lifetime between the whole and the part.
There is a variation of aggregation called composition that represents a strong ownership and coincident lifetime between the whole and the part (Booch et al., 1998).
In a composite, the whole is responsible for the disposition of the parts, which means that the composition must manage the creation and destruction of its parts.

Diagrammatic representation of composition
o UML represents composition by placing a filled-in diamond shape at one end of the relationship line, next to the entity that represents the whole in the relationship.
o For example, to represent the Newspaper DisplaysAdvert composition, the filled-in diamond shape is placed next to the Newspaper entity, which is the whole in this relationship.


NORMALIZATION

Normalization is a database design technique that begins by examining the relationships (called functional dependencies) between attributes. Attributes describe some property of the data or of the relationships between the data that is important to the enterprise.
Normalization uses a series of tests (described as normal forms) to help identify the optimal grouping for these attributes, to ultimately identify a set of suitable relations that supports the data requirements of the enterprise.
Normalization is thus a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise.
PURPOSE OF NORMALIZATION
To identify a suitable set of relations that support the data requirements of an enterprise.
The characteristics of a suitable set of relations include the following:
The minimal number of attributes necessary to support the data requirements of the enterprise;
Attributes with a close logical relationship (described as functional dependency) are found in the same relation;
Minimal redundancy, with each attribute represented only once, with the important exception of attributes that form all or part of foreign keys, which are essential for the joining of related relations.
How Normalization Supports Database Design:
Approach 1 shows how normalization can be used as a bottom-up standalone database design technique.
Approach 2 shows how normalization can be used as a validation technique to check the structure of relations, which may have been created using a top-down approach such as ER modeling.

The relations have the form:
o Staff (staffNo, sName, position, salary, branchNo)
o Branch (branchNo, bAddress)
o StaffBranch (staffNo, sName, position, salary, branchNo, bAddress)
The primary key for each relation is underlined.
Relations that have redundant data may have problems called update anomalies, which are classified as insertion, deletion, or modification anomalies.
Insertion Anomalies
o There are two main types of insertion anomaly:
To insert the details of new members of staff into the StaffBranch relation, we must include the details of the branch at which the staff are to be located.
To insert details of a new branch that currently has no members of staff into the StaffBranch relation, it is necessary to enter nulls into the attributes for staff, such as staffNo.
Deletion Anomalies
o If we delete a tuple from the StaffBranch relation that represents the last member of staff located at a branch, the details about that branch are also lost from the database.
o For example, if we delete the tuple for staff number SA9 (Mary Howe) from the StaffBranch relation, the details relating to branch number B007 are lost from the database.
Modification Anomalies
o If we want to change the value of one of the attributes of a particular branch in the StaffBranch relation, for example the address for branch number B003, we must update the tuples of all staff located at that branch.
o If this modification is not carried out on all the appropriate tuples of the StaffBranch relation, the database will become inconsistent.
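A small SQL illustration of the modification anomaly: because bAddress is stored redundantly in every StaffBranch tuple for a branch, a change of address must touch all of those tuples at once (the new address value here is hypothetical):

UPDATE StaffBranch
SET bAddress = '165 Main St, Glasgow'
WHERE branchNo = 'B003';

If any tuple for B003 were updated individually instead, the relation would hold two different addresses for the same branch, and the database would be inconsistent.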
Functional Dependencies:
o An important concept associated with normalization is functional dependency, which describes the relationship between attributes.
o Definition: A functional dependency describes the relationship between attributes in a relation. For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A → B) if each value of A is associated with exactly one value of B. (A and B may each consist of one or more attributes.)
o Characteristics of Functional Dependencies
Assume that a relational schema has attributes (A, B, C, ..., Z) and that the database is described by a single universal relation called R = (A, B, C, ..., Z). This assumption means that every attribute in the database has a unique name.
o Functional dependency is a property of the meaning or semantics of the attributes in a relation.
The semantics indicate how attributes relate to one another, and specify the functional dependencies between attributes.
When a functional dependency is present, the dependency is specified as a constraint between the attributes.
o Consider a relation with attributes A and B, where attribute B is functionally dependent on attribute A. If we know the value of A and we examine the relation that holds this dependency, we find only one value of B in all the tuples that have a given value of A, at any moment in time. Thus, when two tuples have the same value of A, they also have the same value of B. However, for a given value of B there may be several different values of A.

o An alternative way to describe the relationship between attributes A and B is to say that A functionally determines B.
o When a functional dependency exists, the attribute or group of attributes on the left-hand side of the arrow is called the determinant.
o An additional characteristic of functional dependencies that is useful for normalization is that their determinants should have the minimal number of attributes necessary to maintain the functional dependency with the attribute(s) on the right-hand side. This requirement is called full functional dependency.
o A functional dependency A → B is a full functional dependency if removal of any attribute from A results in the dependency no longer existing.
o A functional dependency A → B is a partial dependency if there is some attribute that can be removed from A and yet the dependency still holds.
The functional dependencies that we use in normalization have the following characteristics:
o There is a one-to-one relationship between the attribute(s) on the left-hand side (determinant) and those on the right-hand side of a functional dependency.
o They hold for all time.
o The determinant has the minimal number of attributes necessary to maintain the dependency with the attribute(s) on the right-hand side. In other words, there must be a full functional dependency between the attribute(s) on the left- and right-hand sides of the dependency.
There is an additional type of functional dependency called a transitive dependency that we need to recognize, because its existence in a relation can potentially cause update anomalies: if A → B and B → C, then C is transitively dependent on A via B.

The Process of Normalization:
o Normalization is a formal technique for analyzing relations based on their primary key (or candidate keys) and functional dependencies (Codd, 1972b).
o The technique involves a series of rules that can be used to test individual relations so that a database can be normalized to any degree.
o When a requirement is not met, the relation violating the requirement must be decomposed into relations that individually meet the requirements of normalization.
o Six normal forms have been proposed:
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)


o First Normal Form (1NF):
Unnormalized Form (UNF): A table that contains one or more repeating groups.
First Normal Form (1NF): A relation in which the intersection of each row and column contains one and only one value.
The process of normalization begins by first transferring the data from the source (for example, a standard data entry form) into table format with rows and columns.
In this format, the table is in Unnormalized Form and is referred to as an unnormalized table.
To transform the unnormalized table to First Normal Form, we identify and remove repeating groups within the table.
A repeating group is an attribute, or group of attributes, within a table that occurs with multiple values for a single occurrence of the nominated key attribute(s) for that table.
There are two common approaches to removing repeating groups from unnormalized tables:
1. By entering appropriate data in the empty columns of rows containing the repeating data. In other words, we fill in the blanks by duplicating the nonrepeating data where required. This approach is commonly referred to as flattening the table.
2. By placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. Sometimes the unnormalized table may contain more than one repeating group, or repeating groups within repeating groups. In such cases, this approach is applied repeatedly until no repeating groups remain. A set of relations is in 1NF if it contains no repeating groups.
For both approaches, the resulting tables are now referred to as 1NF relations containing atomic (or single) values at the intersection of each row and column.
Although both approaches are correct, approach 1 introduces more redundancy into the original UNF table as part of the flattening process, whereas approach 2 creates two or more relations with less redundancy than in the original UNF table.

Repeating Group = (propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)

For example, there are two values for propertyNo (PG4 and PG16) for the client named John Kay. To transform an unnormalized table into 1NF, we ensure that there is a single value at the intersection of each row and column. This is achieved by removing the repeating group.
With the first approach, we remove the repeating group (property rented details) by entering the appropriate client data into each row. The resulting first normal form ClientRental relation is shown below.
First Normal Form:

We use the functional dependencies to identify candidate keys for the ClientRental relation as being composite keys comprising (clientNo, propertyNo), (clientNo, rentStart), and (propertyNo, rentStart). We select (clientNo, propertyNo) as the primary key for the relation, and for clarity we place the attributes that make up the primary key together at the left-hand side of the relation.
The ClientRental relation is defined as follows:
ClientRental (clientNo, propertyNo, cName, pAddress, rentStart, rentFinish, rent, ownerNo, oName)
The ClientRental relation is in 1NF as there is a single value at the intersection of each row and column. The relation contains data describing clients, property rented, and property owners, which is repeated several times. As a result, the ClientRental relation contains significant data redundancy.
With the second approach, the format of the resulting 1NF relations is as follows:
Client (clientNo, cName)
PropertyRentalOwner (clientNo, propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)
o Second Normal Form (2NF):
Second Normal Form (2NF) is based on the concept of full functional dependency.
Second Normal Form applies to relations with composite keys, that is, relations with a primary key composed of two or more attributes. A relation with a single-attribute primary key is automatically in at least 2NF. A relation that is not in 2NF may suffer from update anomalies.
DEFINITION: A relation that is in First Normal Form and in which every non-primary-key attribute is fully functionally dependent on the primary key.
The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the partially dependent attribute(s) from the relation by placing them in a new relation along with a copy of their determinant.
The ClientRental relation has the following functional dependencies:
fd1: clientNo, propertyNo → rentStart, rentFinish (Primary key)
fd2: clientNo → cName (Partial dependency)
fd3: propertyNo → pAddress, rent, ownerNo, oName (Partial dependency)
fd4: ownerNo → oName (Transitive dependency)
fd5: clientNo, rentStart → propertyNo, pAddress, rentFinish, rent, ownerNo, oName (Candidate key)
fd6: propertyNo, rentStart → clientNo, cName, rentFinish (Candidate key)
Removing the partial dependencies (fd2 and fd3) results in the creation of three new relations called Client, Rental, and PropertyOwner.
These three relations are in Second Normal Form, as every non-primary-key attribute is fully functionally dependent on the primary key of the relation.
The relations have the following form:
Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)

o Third Normal Form (3NF):
Although 2NF relations have less redundancy than those in 1NF, they may still suffer from update anomalies.
If we update only one tuple and not the other, the database would be in an inconsistent state. This update anomaly is caused by a transitive dependency; we need to remove such dependencies by progressing to Third Normal Form.
DEFINITION of Third Normal Form (3NF): A relation that is in First and Second Normal Form and in which no non-primary-key attribute is transitively dependent on the primary key.
The normalization of 2NF relations to 3NF involves the removal of transitive dependencies. If a transitive dependency exists, we remove the transitively dependent attribute(s) from the relation by placing the attribute(s) in a new relation along with a copy of the determinant.
The functional dependencies for the Client, Rental, and PropertyOwner relations are:
Client
fd2: clientNo → cName (Primary key)
Rental
fd1: clientNo, propertyNo → rentStart, rentFinish (Primary key)
fd5: clientNo, rentStart → propertyNo, rentFinish (Candidate key)
fd6: propertyNo, rentStart → clientNo, rentFinish (Candidate key)
PropertyOwner
fd3: propertyNo → pAddress, rent, ownerNo, oName (Primary key)
fd4: ownerNo → oName (Transitive dependency)
All the non-primary-key attributes within the Client and Rental relations are functionally dependent on only their primary keys. The Client and Rental relations have no transitive dependencies and are therefore already in 3NF.
To transform the PropertyOwner relation into 3NF, we must first remove the transitive dependency (fd4) by creating two new relations called PropertyForRent and Owner, as shown in Figure 13.15. The new relations have the form:
PropertyForRent (propertyNo, pAddress, rent, ownerNo)
Owner (ownerNo, oName)
The PropertyForRent and Owner relations are in 3NF, as there are no further transitive dependencies on the primary key.
3NF relations derived from the PropertyOwner relation.

Decomposition of the ClientRental 1NF relation into 3NF:
The resulting 3NF relations have the form:
Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyForRent (propertyNo, pAddress, rent, ownerNo)
Owner (ownerNo, oName)
The normalization process has decomposed the original ClientRental relation using a series of relational algebra projections. This results in a lossless-join (also called nonloss- or nonadditive-join) decomposition, which is reversible using the natural join operation.
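As a concrete sketch, the final 3NF relations could be declared in SQL as follows (column types are assumptions); the foreign keys preserve the joining attributes, so the original ClientRental relation can be reconstructed by natural joins:

CREATE TABLE Client (
    clientNo CHAR(5) PRIMARY KEY,
    cName    VARCHAR(50)
);

CREATE TABLE Owner (
    ownerNo CHAR(5) PRIMARY KEY,
    oName   VARCHAR(50)
);

CREATE TABLE PropertyForRent (
    propertyNo CHAR(5) PRIMARY KEY,
    pAddress   VARCHAR(60),
    rent       DECIMAL(7,2),
    ownerNo    CHAR(5) REFERENCES Owner
);

CREATE TABLE Rental (
    clientNo   CHAR(5) REFERENCES Client,
    propertyNo CHAR(5) REFERENCES PropertyForRent,
    rentStart  DATE,
    rentFinish DATE,
    PRIMARY KEY (clientNo, propertyNo)
);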


More on Functional Dependencies:

One of the main concepts associated with normalization is functional dependency, which describes the relationship between attributes.
Inference Rules for Functional Dependencies:
o Ideally, we want to identify a set of functional dependencies (represented as X) for a relation that is smaller than the complete set of functional dependencies (represented as Y) for that relation and has the property that every functional dependency in Y is implied by the functional dependencies in X.
o Hence, if we enforce the integrity constraints defined by the functional dependencies in X, we automatically enforce the integrity constraints defined in the larger set of functional dependencies in Y.
o This requirement suggests that there must be functional dependencies that can be inferred from other functional dependencies.
o For example, functional dependencies A → B and B → C in a relation imply that the functional dependency A → C also holds in that relation. A → C is an example of a transitive functional dependency.
o The set of all functional dependencies that are implied by a given set of functional dependencies X is called the closure of X, written X+. We clearly need a set of rules to help compute X+ from X. A set of inference rules, called Armstrong's axioms, specifies how new functional dependencies can be inferred from given ones (Armstrong, 1974).
o Let A, B, and C be subsets of the attributes of the relation R. Armstrong's axioms are as follows:
Reflexivity: If B is a subset of A, then A → B
Augmentation: If A → B, then A,C → B,C
Transitivity: If A → B and B → C, then A → C
Further rules can be derived from the first three axioms:
Self-determination: A → A
Decomposition: If A → B,C, then A → B and A → C
Union: If A → B and A → C, then A → B,C
Composition: If A → B and C → D, then A,C → B,D
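As an illustration of how the derived rules follow from the first three axioms, here is the standard derivation of the Union rule (if A → B and A → C, then A → B,C):

1. A → B (given)
2. A → C (given)
3. A → A,B (augmenting 1 with A)
4. A,B → B,C (augmenting 2 with B)
5. A → B,C (transitivity applied to 3 and 4)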


Minimal Sets of Functional Dependencies:

o A set of functional dependencies Y is covered by a set of functional dependencies X if every functional dependency in Y is also in X+; that is, every dependency in Y can be inferred from X.
o A set of functional dependencies X is minimal if it satisfies the following conditions:
Every dependency in X has a single attribute on its right-hand side.
We cannot replace any dependency A → B in X with a dependency C → B, where C is a proper subset of A, and still have a set of dependencies that is equivalent to X.
We cannot remove any dependency from X and still have a set of dependencies that is equivalent to X.
o A minimal set of dependencies should be in a standard form with no redundancies. A minimal cover of a set of functional dependencies X is a minimal set of dependencies Xmin that is equivalent to X.
Boyce-Codd Normal Form (BCNF):
Definition of Boyce-Codd Normal Form:
o A relation is in BCNF if, and only if, every determinant is a candidate key.
o To test whether a relation is in BCNF, we identify all the determinants and make sure that they are candidate keys. Recall that a determinant is an attribute, or a group of attributes, on which some other attribute is fully functionally dependent.
o For example, we re-examine the Client, Rental, PropertyForRent, and Owner relations. The Client, PropertyForRent, and Owner relations are all in BCNF, as each relation only has a single determinant, which is the candidate key.
o However, recall that the Rental relation contains the three determinants (clientNo, propertyNo), (clientNo, rentStart), and (propertyNo, rentStart), originally identified as shown below:
fd1: clientNo, propertyNo → rentStart, rentFinish
fd5: clientNo, rentStart → propertyNo, rentFinish
fd6: propertyNo, rentStart → clientNo, rentFinish
As these three determinants are also candidate keys of Rental, the Rental relation is also in BCNF.
Fourth Normal Form (4NF):
o Fourth Normal Form is based on the identification of another type of dependency called a Multi-Valued Dependency (MVD), which can also cause data redundancy.
o Multi-Valued Dependency:
The possible existence of multi-valued dependencies in a relation is due to First Normal Form, which disallows an attribute in a tuple from having a set of values.
For example, if we have two multi-valued attributes in a relation, we have to repeat each value of one of the attributes with every value of the other attribute, to ensure that tuples of the relation are consistent.
This type of constraint is referred to as a multi-valued dependency and results in data redundancy.
We represent an MVD between attributes A, B, and C in a relation using the following notation:
A →→ B
A →→ C
For example, we specify the MVD in the BranchStaffOwner relation shown in Figure 14.8(a) as follows:
branchNo →→ sName
branchNo →→ oName
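The usual resolution is to decompose the relation so that each multi-valued fact is recorded separately, giving two 4NF relations; a sketch in SQL (column types are assumptions):

-- BranchStaffOwner(branchNo, sName, oName) decomposes into two 4NF relations,
-- one per multi-valued dependency
CREATE TABLE BranchStaff (
    branchNo CHAR(4),
    sName    VARCHAR(50),
    PRIMARY KEY (branchNo, sName)
);

CREATE TABLE BranchOwner (
    branchNo CHAR(4),
    oName    VARCHAR(50),
    PRIMARY KEY (branchNo, oName)
);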

Fifth Normal Form (5NF):

o Whenever we decompose a relation into two relations, the resulting relations have the lossless-join property.
o This property refers to the fact that we can rejoin the resulting relations to produce the original relation.
o However, there are cases where there is the requirement to decompose a relation into more than two relations.
o Although rare, these cases are managed by join dependency and Fifth Normal Form (5NF).
o Lossless-Join Dependency:
A property of decomposition, which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
o Definition of Fifth Normal Form (5NF): A relation that has no join dependency.
o Fifth Normal Form (5NF), also called Project-Join Normal Form (PJNF), specifies that a 5NF relation has no join dependency.


QUERY PROCESSING:

The activities involved in parsing, validating, optimizing, and executing a query are known as query processing.
The aims of query processing are to transform a query written in a high-level language, typically SQL, into a correct and efficient execution strategy expressed in a low-level language (implementing the relational algebra), and to execute the strategy to retrieve the required data.
The activity of choosing an efficient execution strategy for processing a query is known as query optimization.
An important aspect of query processing is query optimization.
o As there are many equivalent transformations of the same high-level query, the aim of query optimization is to choose the one that minimizes resource usage.
o Generally, we try to reduce the total execution time of the query, which is the sum of the execution times of all individual operations that make up the query.
o However, resource usage may also be viewed as the response time of the query, in which case we concentrate on maximizing the number of parallel operations.
Comparing different processing strategies
Find all Managers who work at a London branch:
SELECT * FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo
AND s.position = 'Manager' AND b.city = 'London';
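The cost comparison below considers three equivalent relational algebra formulations of this query; these are the standard formulations for this example, written here in the notation used for the transformation rules later in this unit:

(1) σ position='Manager' ∧ city='London' (Staff × Branch)
(2) σ position='Manager' ∧ city='London' (Staff ⋈ branchNo Branch)
(3) (σ position='Manager' (Staff)) ⋈ branchNo (σ city='London' (Branch))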


For the purposes of this example, we assume that there are 1000 tuples in Staff, 50 tuples in Branch, 50 Managers (one for each branch), and 5 London branches.
We compare these three strategies based on the number of disk accesses required.
For simplicity, we assume that there are no indexes or sort keys on either relation, and that the results of any intermediate operations are stored on disk.
We further assume that tuples are accessed one at a time and that main memory is large enough to process entire relations for each relational algebra operation.
o The first query calculates the Cartesian product of Staff and Branch, which requires (1000 + 50) disk accesses to read the relations, and creates a relation with (1000 * 50) tuples. We then have to read each of these tuples again to test them against the selection predicate, at a cost of another (1000 * 50) disk accesses, giving a total cost of:
(1000 + 50) + 2*(1000 * 50) = 101 050 disk accesses
o The second query joins Staff and Branch on the branch number branchNo, which again requires (1000 + 50) disk accesses to read each of the relations. We know that the join of the two relations has 1000 tuples, one for each member of staff (a member of staff can only work at one branch). Consequently, the Selection operation requires 1000 disk accesses to read the result of the join, giving a total cost of:
2*1000 + (1000 + 50) = 3050 disk accesses
o The final query first reads each Staff tuple to determine the Manager tuples, which requires 1000 disk accesses and produces a relation with 50 tuples. The second Selection operation reads each Branch tuple to determine the London branches, which requires 50 disk accesses and produces a relation with 5 tuples. The final operation is the join of the reduced Staff and Branch relations, which requires (50 + 5) disk accesses, giving a total cost of:
1000 + 2*50 + 5 + (50 + 5) = 1160 disk accesses
Phases of Query Processing:


o Dynamic versus static optimization
There are two choices for when the first three phases of query processing can be carried out.
One option is to dynamically carry out decomposition and optimization every time the query is run.
Dynamic Query Optimization:
Advantage:
All information required to select an optimum strategy is up to date.
Disadvantages:
The performance of the query is affected, because the query has to be parsed, validated, and optimized before it can be executed.
It may be necessary to reduce the number of execution strategies to be analyzed to achieve an acceptable overhead, which may have the effect of selecting a less than optimum strategy.
Static Query Optimization: where the query is parsed, validated, and optimized once.


Advantages:
Runtime overhead is removed.
More time is available to evaluate a larger number of execution strategies, which increases the chances of finding an optimum strategy.
Disadvantage:
The execution strategy that is chosen as being optimal when the query is compiled may no longer be optimal when the query is run.
o Query Decomposition
Query decomposition is the first phase of query processing.
Aims:
To transform a high-level query into a relational algebra query, and to check that the query is syntactically and semantically correct.
The typical stages of query decomposition are analysis, normalization, semantic analysis, simplification, and query restructuring.
Analysis
In this stage, the query is lexically and syntactically analyzed using the techniques of programming language compilers.
This stage also verifies that the relations and attributes specified in the query are defined in the system catalog.
It also verifies that any operations applied to database objects are appropriate for the object type.
For example, consider the following query:
SELECT staffNumber FROM Staff WHERE position > 10;
This query would be rejected on two grounds:
(1) In the select list, the attribute staffNumber is not defined for the Staff relation (it should be staffNo).
(2) In the WHERE clause, the comparison >10 is incompatible with the data type of position, which is a variable character string.
On completion of this stage, the high-level query has been transformed into some internal representation that is more suitable for processing. The internal form that is typically chosen is some kind of query tree, which is constructed as follows:
A leaf node is created for each base relation in the query.
A non-leaf node is created for each intermediate relation produced by a relational algebra operation.
The root of the tree represents the result of the query.
The sequence of operations is directed from the leaves to the root.


Normalization
The normalization stage of query processing converts the query into a normalized form that can be more easily manipulated.
The predicate (in SQL, the WHERE condition), which may be arbitrarily complex, can be converted into one of two forms by applying a few transformation rules.
Conjunctive normal form: a sequence of conjuncts that are connected with the ∧ (AND) operator. Each conjunct contains one or more terms connected by the ∨ (OR) operator.
For example:
(position='Manager' ∨ salary>20000) ∧ branchNo='B003'
A conjunctive selection contains only those tuples that satisfy all conjuncts.
Disjunctive normal form: a sequence of disjuncts that are connected with the ∨ (OR) operator. Each disjunct contains one or more terms connected by the ∧ (AND) operator.
For example, we could rewrite the above conjunctive normal form as:
(position='Manager' ∧ branchNo='B003') ∨ (salary>20000 ∧ branchNo='B003')
A disjunctive selection contains those tuples formed by the union of all tuples that satisfy the disjuncts.
Semantic analysis
The objective of semantic analysis is to reject normalized queries that are incorrectly formulated or contradictory. A query is incorrectly formulated if components do not contribute to the generation of the result, which may happen if some join specifications are missing. A query is contradictory if its predicate cannot be satisfied by any tuple.
For example, the predicate (position='Manager' ∧ position='Assistant') on the Staff relation is contradictory, as a member of staff cannot be both a Manager and an Assistant simultaneously.
However, the predicate ((position='Manager' ∧ position='Assistant') ∨ salary>20000) could be simplified to (salary>20000) by interpreting the contradictory clause as the Boolean value FALSE.
Simplification
The objectives of the simplification stage are to detect redundant
qualifications, eliminate common subexpressions, and transform the
query to a semantically equivalent but more easily and efficiently
computed form.
Typically, access restrictions, view definitions, and integrity
constraints are considered at this stage, some of which may also
introduce redundancy. If the user does not have the appropriate access to
all the components of the query, the query must be rejected.
Query restructuring
In the final stage of query decomposition, the query is restructured to
provide a more efficient implementation.


QUERY OPTIMIZATION:
Transformation Rules for the Relational Algebra Operations:
We use three relations R, S, and T, with R defined over the attributes A = {A1, A2, ..., An} and S defined over B = {B1, B2, ..., Bn}; p, q, and r denote predicates, and L, L1, L2, M, M1, M2, and N denote sets of attributes.
(1) Conjunctive Selection operations can cascade into individual Selection operations (and vice versa):
σ p∧q (R) = σ p (σ q (R))
This transformation is sometimes referred to as cascade of selection. For example:
σ branchNo='B003' ∧ salary>15000 (Staff) = σ branchNo='B003' (σ salary>15000 (Staff))
(2) Commutativity of Selection operations:
σp(σq(R)) = σq(σp(R))
(3) In a sequence of Projection operations, only the last in the sequence is required:
ΠL(ΠM(... (ΠN(R)) ...)) = ΠL(R)
(4) Commutativity of Selection and Projection.
If the predicate p involves only the attributes in the projection list, then the Selection and
Projection operations commute:
ΠA1,...,Am(σp(R)) = σp(ΠA1,...,Am(R))   where p ∈ {A1, A2, ..., Am}
(5) Commutativity of Theta join (and Cartesian product):
R ⋈p S = S ⋈p R
R × S = S × R
For example:
Staff ⋈Staff.branchNo=Branch.branchNo Branch = Branch ⋈Staff.branchNo=Branch.branchNo Staff
(6) Commutativity of Selection and Theta join (or Cartesian product).
If the selection predicate involves only attributes of one of the relations being joined, then
the Selection and Join (or Cartesian product) operations commute:
σp(R ⋈r S) = σp(R) ⋈r S   where p involves only attributes of R
Alternatively, if the selection predicate is a conjunctive predicate of the form (p ∧ q),
where p involves only attributes of R, and q involves only attributes of S, then the
Selection and Theta join operations commute as:
σp∧q(R ⋈r S) = σp(R) ⋈r σq(S)
(7) Commutativity of Projection and Theta join (or Cartesian product).
If the projection list is of the form L = L1 ∪ L2, where L1 involves only attributes of R, and
L2 involves only attributes of S, then provided the join condition only contains attributes
of L, the Projection and Theta join operations commute as:
ΠL1∪L2(R ⋈r S) = ΠL1(R) ⋈r ΠL2(S)
(8) Commutativity of Union and Intersection (but not Set difference):
R ∪ S = S ∪ R
R ∩ S = S ∩ R
(9) Commutativity of Selection and set operations (Union, Intersection, and Set difference):
σp(R ∪ S) = σp(R) ∪ σp(S)
σp(R ∩ S) = σp(R) ∩ σp(S)
σp(R − S) = σp(R) − σp(S)
(10) Commutativity of Projection and Union:
ΠL(R ∪ S) = ΠL(R) ∪ ΠL(S)
(11) Associativity of Theta join (and Cartesian product):
(R ⋈p S) ⋈q T = R ⋈p (S ⋈q T)   provided q involves only attributes of S and T
(R × S) × T = R × (S × T)
(12) Associativity of Union and Intersection (but not Set difference):
(R ∪ S) ∪ T = R ∪ (S ∪ T)
(R ∩ S) ∩ T = R ∩ (S ∩ T)
TRANSACTION PROCESSING:
Definition for transaction:
An action, or series of actions, carried out by a single user or application program,
which reads or updates the contents of the database.
A transaction is a logical unit of work on the database. It may be an entire program, a part
of a program, or a single command (for example, the SQL command INSERT or UPDATE), and
it may involve any number of operations on the database.
Example: (a transaction that deletes a member of staff and updates the properties
that the staff member managed so that another member of staff manages them)
If all these updates are not made, referential integrity will be lost and the database will be in
an inconsistent state: a property will be managed by a member of staff who no longer
exists in the database.
A transaction should always transform the database from one consistent state to another,
although we accept that consistency may be violated while the transaction is in progress.

Besides the ACTIVE, COMMITTED, and ABORTED states, the transaction state diagram
includes two other states:
o PARTIALLY COMMITTED, which occurs after the final statement has been
executed. At this point, it may be found that the transaction has violated
serializability or has violated an integrity constraint and the transaction has to be
aborted. Alternatively, the system may fail and any data updated by the transaction
may not have been safely recorded on secondary storage. In such cases, the
transaction would go into the FAILED state and would have to be aborted. If the
transaction has been successful, any updates can be safely recorded and the
transaction can go to the COMMITTED state.
o FAILED, which occurs if the transaction cannot be committed or the transaction is
aborted while in the ACTIVE state, perhaps due to the user aborting the transaction or
as a result of the concurrency control protocol aborting the transaction to ensure
serializability.
Properties of Transactions:
o Atomicity
The all or nothing property. A transaction is an indivisible unit that is either
performed in its entirety or is not performed at all. It is the responsibility of
the recovery subsystem of the DBMS to ensure atomicity.
o Consistency
A transaction must transform the database from one consistent state to another
consistent state. It is the responsibility of both the DBMS and the application
developers to ensure consistency. The DBMS can ensure consistency by
enforcing all the constraints that have been specified on the database schema,
such as integrity and enterprise constraints.
o Isolation
Transactions execute independently of one another. In other words, the partial
effects of incomplete transactions should not be visible to other transactions. It
is the responsibility of the concurrency control subsystem to ensure isolation.
o Durability
The effects of a successfully completed (committed) transaction are
permanently recorded in the database and must not be lost because of a
subsequent failure. It is the responsibility of the recovery subsystem to ensure
durability.
CONCURRENCY CONTROL:
The process of managing simultaneous operations on the database without having
them interfere with one another is known as concurrency control.
The Need for Concurrency Control:
A major objective in developing a database is to enable many users to access shared data
concurrently. Concurrent access is relatively easy if all users are only reading data, as there
is no way that they can interfere with one another.
However, when two or more users are accessing the database simultaneously and at
least one is updating data, there may be interference that can result in inconsistencies.
Three examples of potential problems caused by concurrency:

o The Lost Update Problem
o The Uncommitted Dependency Problem
o The Inconsistent Analysis Problem
Serializability and Recoverability:
The objective of a concurrency control protocol is to schedule transactions in such a way as to
avoid any interference between them, and hence prevent the types of problem described above.
o One obvious solution is to allow only one transaction to execute at a time: one transaction is
committed before the next transaction is allowed to begin.
Definitions
o A sequence of the operations by a set of concurrent transactions that preserves the
order of the operations in each of the individual transactions is called a schedule.
o Serial schedule: A schedule where the operations of each transaction are
executed consecutively without any interleaved operations from other transactions.
o Nonserial schedule: A schedule where the operations from a set of concurrent
transactions are interleaved.
Serializability:

o The objective of serializability is to find nonserial schedules that allow transactions
to execute concurrently without interfering with one another, and thereby produce a
database state that could be produced by a serial execution.
o If a set of transactions executes concurrently, we say that the (nonserial) schedule is
correct if it produces the same results as some serial execution. Such a schedule is
called serializable.
o To prevent inconsistency from transactions interfering with one another, it is essential
to guarantee serializability of concurrent transactions. In serializability, the ordering
of read and write operations is important:
If two transactions only read a data item, they do not conflict and order is not
important.
If two transactions either read or write completely separate data items, they do
not conflict and order is not important.
If one transaction writes a data item and another either reads or writes the
same data item, the order of execution is important.

(a) nonserial schedule S1; (b) nonserial schedule S2; (c) serial schedule S3, equivalent to S1 and S2
o Schedule S3 is a serial schedule and, since S1 and S2 are equivalent to S3, S1 and S2
are serializable schedules. This type of serializability is known as conflict
serializability. A conflict serializable schedule orders any conflicting operations in the
same way as some serial execution.
o Testing for conflict serializability
Under the constrained write rule (that is, a transaction updates a data item
based on its old value, which is first read by the transaction), a precedence (or
serialization) graph can be produced to test for conflict serializability. For a
schedule S, a precedence graph is a directed graph G = (N, E) that consists of a
set of nodes N and a set of directed edges E, which is constructed as follows:
Create a node for each transaction.

Create a directed edge Ti → Tj, if Tj reads the value of an item written by Ti.
Create a directed edge Ti → Tj, if Tj writes a value into an item after it
has been read by Ti.
Create a directed edge Ti → Tj, if Tj writes a value into an item after it
has been written by Ti.
If an edge Ti → Tj exists in the precedence graph for S, then in any serial
schedule S′ equivalent to S, Ti must appear before Tj. If the precedence graph
contains a cycle, the schedule is not conflict serializable, as the sketch below
illustrates.
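This is a minimal Python sketch of the test, assuming a schedule is represented as (transaction, operation, item) triples; the representation and names are illustrative, not a prescribed format.

def precedence_graph(schedule):
    """Build the set of directed edges Ti -> Tj from conflicting operations."""
    edges = set()
    for i, (ti, op1, x1) in enumerate(schedule):
        for tj, op2, x2 in schedule[i + 1:]:
            # Operations conflict when different transactions touch the same
            # item and at least one of the operations is a write.
            if ti != tj and x1 == x2 and 'W' in (op1, op2):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle; a cycle means not conflict serializable."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visiting, done = set(), set()
    def dfs(node):
        if node in visiting:
            return True            # back edge found: cycle
        if node in done:
            return False
        visiting.add(node)
        found = any(dfs(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found
    return any(dfs(n) for n in list(graph))

# T2 reads x after T1 writes it, and T1 writes y after T2 writes it: a cycle.
s = [('T1', 'W', 'x'), ('T2', 'R', 'x'), ('T2', 'W', 'y'), ('T1', 'W', 'y')]
edges = precedence_graph(s)
print(edges)              # {('T1', 'T2'), ('T2', 'T1')}
print(has_cycle(edges))   # True -> not conflict serializable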
o View serializability
There are several other types of serializability that offer less stringent
definitions of schedule equivalence than that offered by conflict serializability.
One less restrictive definition is called view serializability. Two schedules S1
and S2 consisting of the same operations from n transactions T1, T2, . . . , Tn
are view equivalent if the following three conditions hold:
For each data item x, if transaction Ti reads the initial value of x in
schedule S1, then transaction Ti must also read the initial value of x in
schedule S2.
For each read operation on data item x by transaction Ti in schedule S1,
if the value of x read by Ti has been written by transaction Tj, then
transaction Ti must also read the value of x produced by transaction Tj
in schedule S2.
For each data item x, if the last write operation on x was performed by
transaction Ti in schedule S1, the same transaction must perform the
final write on data item x in schedule S2.
A schedule is view serializable if it is view equivalent to a serial schedule.
Every conflict serializable schedule is view serializable, although the converse
is not true.
Recoverability
o Serializability identifies schedules that maintain the consistency of the database,
assuming that none of the transactions in the schedule fails.
o An alternative perspective examines the recoverability of transactions within a
schedule. If a transaction fails, the atomicity property requires that we undo the
effects of the transaction.
o In addition, the durability property states that once a transaction commits, its
changes cannot be undone.
o Concurrency control techniques
Serializability can be achieved in several ways. There are two main
concurrency control techniques that allow transactions to execute safely in
parallel subject to certain constraints: locking and timestamp methods.
Locking and timestamping are essentially conservative (or pessimistic)
approaches in that they cause transactions to be delayed in case they conflict
with other transactions at some time in the future.
Locking Methods

A procedure used to control concurrent access to data. When one
transaction is accessing the database, a lock may deny access to other
transactions to prevent incorrect results.
Locking methods are the most widely used approach to ensure
serializability of concurrent transactions. Transaction must claim a
shared (read) or exclusive (write) lock on a data item before the
corresponding database read or write operation.
The lock prevents another transaction from modifying the item or
even reading it, in the case of an exclusive lock.
Data items of various sizes, ranging from the entire database down to
a field, may be locked. The size of the item determines the fineness, or
granularity, of the lock.
o Shared lock: If a transaction has a shared lock on a data item,
it can read the item but not update it.
o Exclusive lock: If a transaction has an exclusive lock on a data
item, it can both read and update the item.
Locks are used in the following way:
Any transaction that needs to access a data item must first
lock the item, requesting a shared lock for read-only access or
an exclusive lock for both read and write access.
If the item is not already locked by another transaction, the
lock will be granted.
If the item is currently locked, the DBMS determines
whether the request is compatible with the existing lock. If
a shared lock is requested on an item that already has a shared
lock on it, the request will be granted; otherwise, the
transaction must wait until the existing lock is released.
A transaction continues to hold a lock until it explicitly
releases it, either during execution or when it terminates
(aborts or commits).
Two-phase locking (2PL)
o A transaction follows the two-phase locking (2PL) protocol if all locking
operations precede the first unlock operation in the transaction.
o According to the rules of this protocol, every transaction can be divided into two
phases:
first a growing phase, in which it acquires all the locks needed but cannot
release any locks, and
then a shrinking phase, in which it releases its locks but cannot acquire any
new locks. There is no requirement that all locks be obtained simultaneously.
A minimal sketch of the protocol follows.
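The sketch below enforces the two phases for a single transaction in Python; the class and method names are illustrative assumptions, not a real lock manager.

class TwoPhaseLockError(Exception):
    pass

class Transaction2PL:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False    # becomes True after the first unlock

    def lock(self, item):
        # Growing phase: acquiring a lock after any release violates 2PL.
        if self.shrinking:
            raise TwoPhaseLockError(f"{self.name}: lock after unlock breaks 2PL")
        self.locks.add(item)

    def unlock(self, item):
        # Shrinking phase: from the first unlock on, no new locks are allowed.
        self.shrinking = True
        self.locks.discard(item)

t = Transaction2PL('T1')
t.lock('x'); t.lock('y')          # growing phase
t.unlock('x')                     # shrinking phase begins
try:
    t.lock('z')                   # violates the protocol
except TwoPhaseLockError as e:
    print(e)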
o The technique of locking a child node and releasing the lock on the parent node if
possible is known as lock-coupling or crabbing.
o Latches
DBMSs also support another type of lock called a latch, which is held for a
much shorter duration than a normal lock.

A latch can be used before a page is read from, or written to, disk to ensure
that the operation is atomic.
For example, a latch would be obtained to write a page from the database
buffers to disk, the page would then be written to disk, and the latch
immediately unset.

Deadlock
o An impasse that may result when two (or more) transactions are each waiting for
locks to be released that are held by the other.

o There are three general techniques for handling deadlock: timeouts, deadlock
prevention, and deadlock detection and recovery.
With timeouts, the transaction that has requested a lock waits for at most a
specified period of time.
Using deadlock prevention, the DBMS looks ahead to determine if a
transaction would cause deadlock, and never allows deadlock to occur.
Using deadlock detection and recovery, the DBMS allows deadlock to
occur but recognizes occurrences of deadlock and breaks them, as in the
sketch below.
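Detection is commonly described in terms of a wait-for graph. This is a minimal sketch under the simplifying assumption that each transaction waits for at most one other; the representation is illustrative.

def find_deadlock(waits_for):
    """waits_for maps a transaction to the transaction holding the lock it
    wants. Follow the chain from each transaction; revisiting a node on the
    way means the wait-for graph contains a cycle, i.e. a deadlock."""
    for start in waits_for:
        seen = set()
        node = start
        while node in waits_for:
            if node in seen:
                return True       # cycle: T1 waits for T2 waits for ... T1
            seen.add(node)
            node = waits_for[node]
    return False

# T1 waits for a lock held by T2, and T2 waits for a lock held by T1.
print(find_deadlock({'T1': 'T2', 'T2': 'T1'}))   # True
print(find_deadlock({'T1': 'T2'}))               # False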
Timestamping Methods
o A different approach that also guarantees serializability uses transaction timestamps to
order transaction execution for an equivalent serial schedule.
o Timestamp methods for concurrency control are quite different from locking
methods. No locks are involved, and therefore there can be no deadlock.
o Timestamp: A unique identifier created by the DBMS that indicates the relative
starting time of a transaction.
o Timestamping: A concurrency control protocol that orders transactions in such a
way that older transactions, transactions with smaller timestamps, get priority in the
event of conflict.
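This is a minimal sketch of basic timestamp ordering, assuming each data item tracks the largest timestamps that have read and written it; the function names and global tables are illustrative assumptions.

class TimestampOrderingError(Exception):
    pass

read_ts, write_ts = {}, {}        # item -> largest read/write timestamp so far

def read(ts, item):
    # A read arrives too late if a younger transaction already wrote the item.
    if ts < write_ts.get(item, 0):
        raise TimestampOrderingError(f"read({item}) by ts={ts} rejected: restart")
    read_ts[item] = max(read_ts.get(item, 0), ts)

def write(ts, item):
    # A write arrives too late if a younger transaction already read or wrote it.
    if ts < read_ts.get(item, 0) or ts < write_ts.get(item, 0):
        raise TimestampOrderingError(f"write({item}) by ts={ts} rejected: restart")
    write_ts[item] = ts

write(2, 'x')                     # transaction with timestamp 2 writes x
try:
    read(1, 'x')                  # older transaction arrives too late
except TimestampOrderingError as e:
    print(e)                      # the older transaction would be restarted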
DATABASE RECOVERY:
The process of restoring the database to a correct state in the event of a failure is called
database recovery.

The Need for Recovery

The storage of data generally includes four different types of media with an increasing degree
of reliability: main memory, magnetic disk, magnetic tape, and optical disk.
There are many different types of failure that can affect database processing, each of which
has to be dealt with in a different manner. Some failures affect main memory only, while
others involve non-volatile (secondary) storage. Among the causes of failure are:
System crashes due to hardware or software errors, resulting in loss of main memory;
Media failures, such as head crashes or unreadable media, resulting in the loss of parts of
secondary storage;
Application software errors, such as logical errors in the program that is accessing the
database, which cause one or more transactions to fail;
Natural physical disasters, such as fires, floods, earthquakes, or power failures;
Carelessness or unintentional destruction of data or facilities by operators or users;
Sabotage, or intentional corruption or destruction of data, hardware, or software facilities.

Transactions and Recovery

Transactions represent the basic unit of recovery in a database system.


o It is the role of the recovery manager to guarantee two of the four ACID properties of
transactions, namely atomicity and durability, in the presence of failures.
The recovery manager has to ensure that, on recovery from failure, either all the effects of a
given transaction are permanently recorded in the database or none of them are.
The database buffers occupy an area in main memory from which data is transferred to and
from secondary storage.
It is only once the buffers have been flushed to secondary storage that any update operations
can be regarded as permanent.
This flushing of the buffers to the database can be triggered by a specific command (for
example, transaction commit) or automatically when the buffers become full.
The explicit writing of the buffers to secondary storage is known as force-writing.
If a failure occurs between writing to the buffers and flushing the buffers to secondary
storage, the recovery manager must determine the status of the transaction that
performed the write at the time of failure.
If the transaction had issued its commit, then to ensure durability the recovery manager
would have to redo that transaction's updates to the database (also known as rollforward).
On the other hand, if the transaction had not committed at the time of failure, then the
recovery manager would have to undo (rollback) any effects of that transaction on the
database to guarantee transaction atomicity.
If only one transaction has to be undone, this is referred to as partial undo. A partial undo
can be triggered by the scheduler when a transaction is rolled back and restarted as a result of
the concurrency control protocol, as described in the previous section.
A transaction can also be aborted unilaterally, for example, by the user or by an exception
condition in the application program. When all active transactions have to be undone, this is
referred to as global undo.

The following terminology is used in database recovery when pages are written back to disk:
o A steal policy allows the buffer manager to write a buffer to disk before a transaction
commits (the buffer is unpinned). In other words, the buffer manager steals a page
from the transaction. The alternative policy is no-steal.
o A force policy ensures that all pages updated by a transaction are immediately written
to disk when the transaction commits. The alternative policy is no-force.
Recovery Facilities
A DBMS should provide the following facilities to assist with recovery:
o A backup mechanism, which makes periodic backup copies of the database;
o Logging facilities, which keep track of the current state of transactions and
database changes;
o A checkpoint facility, which enables updates to the database that are in progress to be
made permanent;
o A recovery manager, which allows the system to restore the database to a consistent state
following a failure.
Backup mechanism
The DBMS should provide a mechanism to allow backup copies of the database and the log
file to be made at regular intervals without necessarily having to stop the system first.
The backup copy of the database can be used in the event that the database has been damaged
or destroyed.
A backup can be a complete copy of the entire database or an incremental backup, consisting
only of modifications made since the last complete or incremental backup.
Typically, the backup is stored on offline storage, such as magnetic tape.
Log file
To keep track of database transactions, the DBMS maintains a special file called a log (or
journal) that contains information about all updates to the database. The log may contain the
following data:
o Transaction records, containing:
Transaction identifier;
Type of log record (transaction start, insert, update, delete, abort, commit);
Identifier of data item affected by the database action (insert, delete, and
update operations);
Before-image of the data item, that is, its value before change (update and
delete operations only);
After-image of the data item, that is, its value after change (insert and update
operations only);
Log management information, such as a pointer to previous and next log records
for that transaction (all operations).
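This is a minimal sketch of appending an update record with before- and after-images to a log; the record layout, names, and the in-memory LOG list are illustrative assumptions standing in for a real log file on stable storage.

import json, time

LOG = []                          # stand-in for the log file on stable storage

def log_update(txn_id, item, before, after):
    """Append an update record so the item can be undone or redone later."""
    LOG.append({
        'txn': txn_id,
        'type': 'update',
        'item': item,
        'before_image': before,   # value prior to the change (for undo)
        'after_image': after,     # value after the change (for redo)
        'time': time.time(),
    })

log_update('T1', 'staff.SG37.salary', 18000, 24000)
print(json.dumps(LOG[-1], indent=2))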
Checkpointing
o The point of synchronization between the database and the transaction log file.
o All buffers are force-written to secondary storage.
o Checkpoints are scheduled at predetermined intervals and involve the
following operations:

writing all log records in main memory to secondary storage;
writing the modified blocks in the database buffers to secondary storage;
writing a checkpoint record to the log file. This record contains the identifiers
of all transactions that are active at the time of the checkpoint.
Recovery Techniques:
o There are two cases:
If the database has been extensively damaged, for example a disk head
crash has occurred and destroyed the database, then it is necessary to
restore the last backup copy of the database and reapply the update
operations of committed transactions using the log file. (For this reason,
the log file should be kept on a disk separate from the main database
files; this reduces the risk of both the database files and the log file
being damaged at the same time.)
If the database has not been physically damaged but has become
inconsistent, for example the system crashed while transactions were
executing, then it is necessary to undo the changes that caused the
inconsistency. It may also be necessary to redo some transactions to ensure
that the updates they performed have reached secondary storage. Here, we
do not need to use the backup copy of the database but can restore the
database to a consistent state using the before- and after-images held in the
log file.
o Recovery techniques using deferred update
Using the deferred update recovery protocol, updates are not written to the
database until after a transaction has reached its commit point. If a
transaction fails before it reaches this point, it will not have modified the
database and so no undoing of changes will be necessary. However, it may be
necessary to redo the updates of committed transactions as their effect
may not have reached the database.
o Recovery techniques using immediate update
Using the immediate update recovery protocol, updates are applied to the
database as they occur without waiting to reach the commit point. As well
as having to redo the updates of committed transactions following a failure, it
may now be necessary to undo the effects of transactions that had not
committed at the time of failure.
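A minimal sketch of restart recovery under immediate update, redoing committed transactions and undoing uncommitted ones; the log-record layout is the same illustrative assumption used in the log-file sketch above.

def recover(log, database):
    committed = {r['txn'] for r in log if r.get('type') == 'commit'}
    # Redo forwards: reapply after-images of committed transactions.
    for r in log:
        if r.get('type') == 'update' and r['txn'] in committed:
            database[r['item']] = r['after_image']
    # Undo backwards: restore before-images of uncommitted transactions.
    for r in reversed(log):
        if r.get('type') == 'update' and r['txn'] not in committed:
            database[r['item']] = r['before_image']
    return database

log = [
    {'txn': 'T1', 'type': 'update', 'item': 'x', 'before_image': 1, 'after_image': 2},
    {'txn': 'T2', 'type': 'update', 'item': 'y', 'before_image': 5, 'after_image': 9},
    {'txn': 'T1', 'type': 'commit'},
]                                 # T2 never committed before the crash
# y=9 had already been stolen to disk, so T2's update must be undone:
print(recover(log, {'x': 1, 'y': 9}))   # {'x': 2, 'y': 5}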
o Shadow paging
The shadow paging scheme maintains two page tables during the life of a
transaction: a current page table and a shadow page table.
When the transaction starts, the two page tables are the same. The shadow
page table is never changed thereafter, and is used to restore the database
in the event of a system failure.
During the transaction, the current page table is used to record all updates
to the database. When the transaction completes, the current page table
becomes the shadow page table.
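This is a minimal sketch of the idea: the shadow page table is frozen at transaction start, updates go to fresh pages referenced by the current page table, and commit is a single table swap. The names and structures are illustrative assumptions.

pages = {1: 'old-A', 2: 'old-B'}          # page id -> contents on disk
shadow_table = {'A': 1, 'B': 2}           # frozen at transaction start
current_table = dict(shadow_table)        # updated during the transaction
next_page = 3

def write_page(name, contents):
    """Write to a fresh page and point the current table at it; the shadow
    table still references the old page, so abort/crash recovery is free."""
    global next_page
    pages[next_page] = contents
    current_table[name] = next_page
    next_page += 1

write_page('A', 'new-A')
# On commit the current table becomes the new shadow table atomically.
shadow_table = dict(current_table)
print(shadow_table)                       # {'A': 3, 'B': 2}
print(pages[shadow_table['A']])           # 'new-A'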

DATABASE TUNING:
Database tuning is the activity of making a database application run more quickly. "More
quickly" usually means higher throughput, though it may mean lower response time for
time-critical applications.

Goal:
To make application run faster
To lower the response time of queries/transactions
To improve the overall throughput of transactions
There are two types of Tuning
Schema tuning:
A relation schema is a relation name and a set of attributes.
Schema tuning is the activity of organizing a set of table designs in order
to improve overall query and update performance.
Index tuning:
Index Tuning is concerned with when and how to construct an index.
Two data structures are most often used in practice for indexes: B+-trees
and hash structures.
Tuning Indexes
Reasons to tuning indexes
Certain queries may take too long to run for lack of an index;
Certain indexes may not get utilized at all;

Certain indexes may be causing excessive overhead because the index is
on an attribute that undergoes frequent changes.
Options to tuning indexes
Drop and/or build new indexes (see the sketch after this list)
Change a non-clustered index to a clustered index (and vice versa)
Rebuild the index
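As a rough illustration of building, inspecting, and dropping an index, the following sketch uses SQLite via Python's standard sqlite3 module; the table and index names are illustrative assumptions.

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE staff (staffNo TEXT PRIMARY KEY, position TEXT, "
             "salary INTEGER, branchNo TEXT)")
conn.execute("INSERT INTO staff VALUES ('SG37', 'Assistant', 12000, 'B003')")

# Build an index to support frequent lookups on branchNo; drop it again if
# profiling shows it is unused or its maintenance cost outweighs its benefit.
conn.execute("CREATE INDEX idx_staff_branch ON staff(branchNo)")
plan = conn.execute("EXPLAIN QUERY PLAN "
                    "SELECT * FROM staff WHERE branchNo = 'B003'").fetchall()
print(plan)        # the plan should mention idx_staff_branch
conn.execute("DROP INDEX idx_staff_branch")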
Tuning the Database Design
Dynamically changed processing requirements need to be addressed by
making changes to the conceptual schema if necessary and to reflect those
changes into the logical schema and physical design.
Possible changes to the database design
Existing tables may be joined (denormalized) because certain attributes
from two or more tables are frequently needed together.
For the given set of tables, there may be alternative design choices, all of
which achieve 3NF or BCNF. One may be replaced by the other.
A relation of the form R(K, A, B, C, D, ...) that is in BCNF can be stored
in multiple tables that are also in BCNF by replicating the key K in each
table.
Attribute(s) from one table may be repeated in another even though this
creates redundancy and potential anomalies.
Apply horizontal partitioning as well as vertical partitioning if
necessary.
Indications for tuning queries
A query issues too many disk accesses
The query plan shows that relevant indexes are not being used.
Additional Query Tuning Guidelines
A query with multiple selection conditions that are connected via OR may
not prompt the query optimizer to use any index.
Such a query may be split up and expressed as a union of queries, each
with a condition on an attribute that causes an index to be used; a sketch
of this rewrite follows the list.
Apply the following transformations
NOT condition may be transformed into a positive expression.
Embedded SELECT blocks may be replaced by joins.
If an equality join is set up between two tables, the range predicate on the
joining attribute set up in one table may be repeated for the other table
WHERE conditions may be rewritten to utilize the indexes on multiple
columns.
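The OR-to-UNION rewrite from the guideline above can be tried out with SQLite; the table and index names are illustrative assumptions, and whether the disjunctive form defeats index use varies by optimizer, so comparing the two plans shows what a given system chooses.

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE staff (staffNo TEXT, position TEXT, salary INTEGER)")
conn.execute("CREATE INDEX idx_position ON staff(position)")
conn.execute("CREATE INDEX idx_salary ON staff(salary)")

# A disjunctive condition that, on some systems, prevents index use:
q_or = "SELECT * FROM staff WHERE position = 'Manager' OR salary > 20000"

# Rewritten as a union, each branch has a condition matching one index:
q_union = ("SELECT * FROM staff WHERE position = 'Manager' "
           "UNION "
           "SELECT * FROM staff WHERE salary > 20000")

for q in (q_or, q_union):
    print(conn.execute("EXPLAIN QUERY PLAN " + q).fetchall())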
