Unit 1completed
UNIT 1
RELATIONAL MODEL ISSUES
ER Model - Normalization - Query Processing - Query Optimization - Transaction Processing - Concurrency Control - Recovery - Database Tuning
ER MODEL
Relationship types
o A relationship type is a set of associations between one or more participating entity
types.
o Each relationship type is given a name that describes its function.
o Example of a relationship type: POwns, which associates the PrivateOwner and
PropertyForRent entities.
o A relationship occurrence indicates the particular entity occurrences that are related.
o Consider a relationship type called Has, which represents an association between
Branch and Staff entities, that is, Branch Has Staff. Each occurrence of the Has
relationship associates one Branch entity occurrence with one Staff entity occurrence.
o A semantic net is an object-level model, which uses one symbol to represent entities
and another symbol to represent relationships.
o The ER model uses a higher level of abstraction than the semantic net by combining
sets of entity occurrences into entity types and sets of relationship occurrences into
relationship types.
o Diagrammatic representation of relationship types
Each relationship type is shown as a line connecting the associated entity
types, labeled with the name of the relationship. Normally, a relationship is
named using a verb (for example, Supervises or Manages) or a short phrase
including a verb (for example, LeasedBy).
Again, the first letter of each word in the relationship name is shown in upper
case. Whenever possible, a relationship name should be unique for a given
ER model.
A relationship is only labeled in one direction, which normally means that
the name of the relationship only makes sense in one direction (for example,
Branch Has Staff makes more sense than Staff Has Branch).
So once the relationship name is chosen, an arrow symbol is placed beside
the name indicating the correct direction for a reader to interpret
the relationship name (for example, Branch Has Staff).
o Degree of Relationship Type
o Recursive Relationship
A relationship type where the same entity type participates more than
once in different roles.
Attributes
o The particular properties of entity types are called attributes.
Example:
a Staff entity type may be described by the staffNo, name,
position, and salary attributes.
o The attributes hold values that describe each entity occurrence and represent the
main part of the data stored in the database.
o Each attribute is associated with a set of values called a domain.
o The domain defines the potential values that an attribute may hold and is similar
to the domain concept in the relational model
Example: the number of rooms associated with a property is between 1 and 15
for each entity occurrence. We can therefore define the set of values for the
number of rooms (rooms) attribute of the PropertyForRent entity type as the
set of integers between 1 and 15.
o Attributes may share a domain. For example, the address attributes of the Branch,
PrivateOwner, and BusinessOwner entity types share the same domain of all possible
addresses. Domains can also be composed of domains.
Example: the domain for the address attribute of the Branch entity is made up of
sub-domains: street, city, and postcode.
o Attributes can be classified as:
Simple Attribute
o Simple attributes cannot be further subdivided into smaller
components.
o Simple attributes are sometimes called atomic attributes.
Examples: position and salary of the Staff entity
Composite
o Some attributes can be further divided to yield smaller
components with an independent existence of their own.
Example: the address attribute can be modeled as a simple attribute or
subdivided into the street, city, and postcode attributes.
Single Valued
o An attribute that holds a single value for each occurrence of an entity
type
Example: each occurrence of the Branch entity type has a single value for
the branch number (branchNo) attribute (for example B003), and therefore the
branchNo attribute is referred to as being single-valued.
Multi-Valued
o An attribute that holds multiple values for each occurrence of an
entity type
Example: each occurrence of the Branch entity type can have multiple
values for the telNo attribute (for example, branch number B003 has telephone
numbers 0141-339-2178 and 0141-339-4439) and therefore the telNo attribute in
this case is multi-valued.
o A multi-valued attribute may have a set of numbers with upper and
lower limits.
Example: a branch may have a minimum of one telephone number and
a maximum of three telephone numbers.
Derived
o An attribute that represents a value that is derivable from the value
of a related attribute or set of attributes, not necessarily in the same
entity type.
Example:
o The value for the duration attribute of the Lease entity is calculated
from the rentStart and rentFinish attributes also of the Lease entity
type. We refer to the duration attribute as a derived attribute, the
value of which is derived from the rentStart and rentFinish attributes.
Derived attributes may also involve the association of attributes of different
entity types.
Example:
o Consider an attribute called deposit of the Lease entity type. The
value of the deposit attribute is calculated as twice the monthly rent
for a property. Therefore, the value of the deposit attribute of the Lease
entity type is derived from the rent attribute of the PropertyForRent entity type.
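The two derived attributes above can be sketched in code. This is a minimal illustration only: the class layout, the field name rented_property, the choice of days as the unit of duration, and the sample dates and rent are all assumptions made for the example, not part of the notes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PropertyForRent:
    propertyNo: str
    rent: float  # monthly rent (sample value is hypothetical)


@dataclass
class Lease:
    rentStart: date
    rentFinish: date
    rented_property: PropertyForRent  # hypothetical link to the rented property

    @property
    def duration(self) -> int:
        """Derived from rentStart and rentFinish of the same entity (in days, by assumption)."""
        return (self.rentFinish - self.rentStart).days

    @property
    def deposit(self) -> float:
        """Derived from an attribute of a different entity type: twice the monthly rent."""
        return 2 * self.rented_property.rent


lease = Lease(date(2024, 1, 1), date(2024, 7, 1), PropertyForRent("PG4", 350.0))
print(lease.duration, lease.deposit)
```

Neither duration nor deposit is stored; both are recomputed from the attributes they depend on, which is the point of marking them as derived.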
o Keys
Candidate Key: The minimal set of attributes that uniquely identifies each
occurrence of an entity type.
o A candidate key is the minimal number of attributes, whose
value(s) uniquely identify each entity occurrence.
o Example:
The branch number (branchNo) attribute is the candidate key
for the Branch entity type, and has a distinct value for each
branch entity occurrence.
o The candidate key must hold values that are unique for every
occurrence of an entity type.
o This implies that a candidate key cannot contain a null.
o Example:
Each branch has a unique branch number (for example, B003),
and there will never be more than one branch with the same
branch number.
Primary Key: The candidate key that is selected to uniquely identify each
occurrence of an entity type. An entity type may have more than one candidate
key.
o Example:
Consider that a member of staff has a unique company-defined
staff number (staffNo) and also a unique National
Insurance Number (NIN) that is used by the Government.
We therefore have two candidate keys for the Staff entity, one
of which must be selected as the primary key.
If we select staffNo as the primary key of the Staff entity type,
NIN is then referred to as the alternate key.
Composite Key: A candidate key that consists of two or more attributes.
o Example:
Consider an entity called Advert with propertyNo (property
number), newspaperName, dateAdvert, and cost attributes.
Many properties are advertised in many newspapers on a given
date.
To uniquely identify each occurrence of the Advert entity type
requires values for the propertyNo, newspaperName, and
dateAdvert attributes.
Thus, the Advert entity type has a composite primary key
made up of the propertyNo, newspaperName, and dateAdvert
attributes.
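The Advert example can be checked mechanically on sample data. The sketch below is an illustration under assumptions: the helper names and the four sample rows (newspaper names, dates, costs) are hypothetical. A check on sample rows can only show that a key is consistent with that data; whether it really is a candidate key depends on the meaning of the attributes.

```python
from itertools import combinations


def uniquely_identifies(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_candidate_key(rows, attrs):
    """Unique on the sample rows, and minimal (no proper subset is unique)."""
    if not uniquely_identifies(rows, attrs):
        return False
    return not any(
        uniquely_identifies(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


# Hypothetical Advert occurrences.
adverts = [
    {"propertyNo": "PG4", "newspaperName": "Daily Post", "dateAdvert": "2024-03-01", "cost": 50},
    {"propertyNo": "PG4", "newspaperName": "Evening News", "dateAdvert": "2024-03-01", "cost": 60},
    {"propertyNo": "PG16", "newspaperName": "Daily Post", "dateAdvert": "2024-03-01", "cost": 50},
    {"propertyNo": "PG4", "newspaperName": "Daily Post", "dateAdvert": "2024-03-08", "cost": 55},
]

composite = ("propertyNo", "newspaperName", "dateAdvert")
print(is_candidate_key(adverts, composite))
```

With these rows, no single attribute and no pair of attributes is unique, so all three attributes are needed, matching the composite primary key described above.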
The first attribute(s) to be listed is the primary key for the entity type, if
known.
The name(s) of the primary key attribute(s) can be labeled with the tag {PK}.
Additional tags that can be used include partial primary key {PPK} when an
attribute forms part of a composite primary key, and alternate key {AK}.
For simple, single-valued attributes, there is no need to use tags and so we
simply display the attribute names in a list below the entity name.
For composite attributes, we list the name of the composite attribute
followed below and indented to the right by the names of its simple component
attributes.
o For example, the composite attribute address of the Branch entity is
shown followed below by the names of its component attributes, street,
city, and postcode.
For multi-valued attributes, we label the attribute name with an indication of
the range of values available for the attribute.
o For example, if we label the telNo attribute with the range [1..*], this
means that the telNo attribute holds one or more values.
If we know the precise maximum number of values, we can label the
attribute with an exact range.
o For example, if the telNo attribute holds one to a maximum of three
values, we can label the attribute with [1..3].
For derived attributes, we prefix the attribute name with a /.
For example, the derived attribute totalStaff of the Staff entity type is shown as /totalStaff.
Strong and weak entity types
o We can classify entity types as strong or weak
o Ensuring that all appropriate constraints are identified and represented is an important
part of modeling an enterprise.
o Binary relationships are generally referred to as being one-to-one (1:1), one-to-many
(1:*), or many-to-many (*:*).
o We examine these three types of relationships using the following integrity
constraints:
a member of staff manages a branch (1:1)
a member of staff oversees properties for rent (1:*)
newspapers advertise properties for rent (*:*)
o One-to-One (1:1) Relationships
Each relationship (rn) represents the association between a single Staff entity
occurrence and a single Branch entity occurrence.
o Participation
Determines whether all or only some entity occurrences participate in a
relationship.
The participation constraint represents whether all entity occurrences are
involved in a particular relationship (referred to as mandatory participation)
or only some (referred to as optional participation).
The participation of entities in a relationship appears as the minimum values
for the multiplicity ranges on either side of the relationship.
Optional participation is represented as a minimum value of 0 while
mandatory participation is shown as a minimum value of 1.
Problems with ER Models
o The problems in ER Models are referred to as connection traps, and normally occur
due to a misinterpretation of the meaning of certain relationships.
o Two main types of connection traps:
Fan traps
Chasm traps
o Fan traps
Where a model represents a relationship between entity types, but the pathway
between certain entity occurrences is ambiguous.
A fan trap may exist where two or more 1:* relationships fan out from the
same entity.
A potential fan trap shows two 1:* relationships (Has and Operates)
emanating from the same entity called Division.
This model represents the facts that a single division operates one or more
branches and has one or more staff.
However, a problem arises when we want to know which members of staff
work at a particular branch.
Fan trap
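The ambiguity of the fan trap can be made concrete with a small sketch. The staff numbers, the division number, and the helper name possible_branches are hypothetical; the only structure taken from the notes is that Division has Staff and Division operates Branch, both 1:*.

```python
# Division Has Staff (1:*): each member of staff belongs to one division.
staff_division = {"X1": "D1", "X2": "D1"}

# Division Operates Branch (1:*): each branch belongs to one division.
branch_division = {"B003": "D1", "B007": "D1"}


def possible_branches(staff_no):
    """All branches reachable from a member of staff by going through Division."""
    division = staff_division[staff_no]
    return sorted(b for b, d in branch_division.items() if d == division)


# The model cannot say which single branch X1 works at:
# every branch of division D1 is an equally valid answer.
print(possible_branches("X1"))
```

Because the pathway from Staff to Branch passes through the "one" side of both relationships, the query returns every branch of the division rather than the branch where the member of staff actually works.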
o Chasm Traps
Where a model suggests the existence of a relationship between entity types,
but the pathway does not exist between certain entity occurrences.
Chasm trap
A chasm trap may occur where there are one or more relationships with a
minimum multiplicity of zero (that is optional participation) forming part of
the pathway between related entities.
Removing the chasm trap:
Enhanced ER Model
o As the basic concepts of ER modeling are often not sufficient to represent the
requirements of the newer, more complex applications, this stimulated the need to
develop additional semantic modeling concepts.
o Many different semantic data models have been proposed and some of the most
important semantic concepts have been successfully incorporated into the original ER
model.
o The ER model supported with additional semantic concepts is called the Enhanced
Entity-Relationship (EER) model.
o There are three important and useful additional concepts of the EER model, namely
Specialization/Generalization
Aggregation
Composition
o Specialization/Generalization:
The concept of specialization/generalization is associated with special types of
entities known as super-classes and subclasses, and the process of attribute
inheritance.
Attribute Inheritance
o A subclass is an entity in its own right and so it may also have one or
more subclasses.
o An entity and its subclasses and their subclasses, and so on, is called a
type hierarchy. Type hierarchies are known by a variety of names
including:
Constraints on Specialization/Generalization
o There are two constraints that may apply to a specialization/generalization
called participation constraints and disjoint constraints.
o Participation constraints
Determines whether every member in the superclass must participate
as a member of a subclass.
A participation constraint may be mandatory or optional.
A superclass/subclass relationship with mandatory participation
specifies that every member in the superclass must also be a member
of a subclass.
To represent mandatory participation, Mandatory is placed in curly
brackets below the triangle that points towards the superclass.
A superclass/subclass relationship with optional participation specifies
that a member of a superclass need not belong to any of its subclasses.
To represent optional participation, Optional is placed in curly
brackets below the triangle that points towards the superclass.
o Disjoint constraints
Describes the relationship between members of the subclasses and
indicates whether it is possible for a member of a superclass to be a
member of one, or more than one, subclass.
The disjoint constraint only applies when a superclass has more than one
subclass. If the subclasses are disjoint, then an entity occurrence can
be a member of only one of the subclasses.
To represent a disjoint superclass/subclass relationship, Or is placed
next to the participation constraint within the curly brackets.
If subclasses of a specialization/generalization are not disjoint (called
nondisjoint), then an entity occurrence may be a member of more than
one subclass. To represent a nondisjoint superclass/subclass
relationship, And is placed next to the participation constraint within
the curly brackets.
o Aggregation:
Represents a has-a or is-part-of relationship between entity types, where one
represents the whole and the other the part.
A relationship represents an association between two entity types that are
conceptually at the same level.
Sometimes we want to model a has-a or is-part-of relationship, in which one
entity represents a larger entity (the whole), consisting of smaller entities
(the parts). This special kind of relationship is called an aggregation (Booch et al.,
1998).
Aggregation does not change the meaning of navigation across the relationship
between the whole and its parts, nor does it link the lifetimes of the whole and its
parts.
An example of an aggregation is the Has relationship, which relates the Branch entity
(the whole) to the Staff entity (the part).
Diagrammatic representation of aggregation
o Composition
A specific form of aggregation that represents an association between entities,
where there is a strong ownership and coincidental lifetime between the whole
and the part.
There is a variation of aggregation called composition that represents a strong
ownership and coincidental lifetime between the whole and the part (Booch et
al., 1998).
In a composite, the whole is responsible for the disposition of the parts,
which means that the composition must manage the creation and destruction of its
parts.
NORMALIZATION
Approach 2 shows how normalization can be used as a validation technique to check the structure
of relations, which may have been created using a top-down approach such as ER modeling.
o If we delete a tuple from the StaffBranch relation that represents the last member
of staff located at a branch, the details about that branch are also lost from the
database.
o For example, if we delete the tuple for staff number SA9 (Mary Howe) from the
StaffBranch relation, the details relating to branch number B007 are lost from the
database.
Modification Anomalies
o If we want to change the value of one of the attributes of a particular branch in the
StaffBranch relation, for example the address for branch number B003, we must
update the tuples of all staff located at that branch.
o If this modification is not carried out on all the appropriate tuples of the
StaffBranch relation, the database will become inconsistent.
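The deletion and modification anomalies can be reproduced on a tiny StaffBranch relation. This is a sketch under assumptions: only SA9 (Mary Howe), B007, and B003 come from the notes; the other staff numbers, the attribute names sName and bAddress, and the address strings are placeholders invented for the illustration.

```python
# A small StaffBranch relation: branch details are repeated in every staff tuple.
staff_branch = [
    {"staffNo": "SA9", "sName": "Mary Howe", "branchNo": "B007", "bAddress": "address-of-B007"},
    {"staffNo": "X1", "sName": "Staff One", "branchNo": "B003", "bAddress": "address-of-B003"},
    {"staffNo": "X2", "sName": "Staff Two", "branchNo": "B003", "bAddress": "address-of-B003"},
]


def known_branches(relation):
    """Branches the database still knows about."""
    return {row["branchNo"] for row in relation}


def inconsistent_branches(relation):
    """Branches recorded with more than one address."""
    addresses = {}
    for row in relation:
        addresses.setdefault(row["branchNo"], set()).add(row["bAddress"])
    return {branch for branch, found in addresses.items() if len(found) > 1}


# Deletion anomaly: removing the last member of staff at B007 loses the branch.
after_delete = [row for row in staff_branch if row["staffNo"] != "SA9"]

# Modification anomaly: updating the address in only one of the B003 tuples.
partial_update = [dict(row) for row in staff_branch]
partial_update[1]["bAddress"] = "new-address-of-B003"

print(known_branches(after_delete), inconsistent_branches(partial_update))
```

Splitting the relation into separate Staff and Branch relations would store each branch address once, so neither anomaly could occur.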
Functional Dependencies:
o An important concept associated with normalization is functional dependency,
which describes the relationship between attributes.
o Definition: Functional dependency describes the relationship between attributes in a
relation. For example, if A and B are attributes of relation R, B is functionally
dependent on A (denoted A → B), if each value of A is associated with exactly
one value of B. (A and B may each consist of one or more attributes.)
o Characteristics of Functional Dependencies
Assume that a relational schema has attributes (A, B, C,..., Z) and that the
database is described by a single universal relation called R = (A, B, C,..., Z).
This assumption means that every attribute in the database has a unique
name.
o Functional dependency is a property of the meaning or semantics of the
attributes in a relation.
The semantics indicate how attributes relate to one another, and specify
the functional dependencies between attributes.
When a functional dependency is present, the dependency is specified as a
constraint between the attributes.
o Consider a relation with attributes A and B, where attribute B is functionally
dependent on attribute A. If we know the value of A and we examine the relation
that holds this dependency, we find only one value of B in all the tuples that have
a given value of A, at any moment in time. Thus, when two tuples have the same
value of A, they also have the same value of B. However, for a given value of B
there may be several different values of A.
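The property described above (tuples that agree on A also agree on B) can be tested on a set of tuples. The function name holds and the sample rows are hypothetical. Since a functional dependency is a property of the semantics of the attributes, such a test can refute a dependency on given data but cannot prove that it holds in general.

```python
def holds(rows, lhs, rhs):
    """True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        a = tuple(row[x] for x in lhs)
        b = tuple(row[x] for x in rhs)
        # Remember the first rhs value seen for this lhs value and compare later ones.
        if seen.setdefault(a, b) != b:
            return False
    return True


# Hypothetical Staff tuples.
staff = [
    {"staffNo": "X1", "position": "Manager"},
    {"staffNo": "X2", "position": "Assistant"},
    {"staffNo": "X3", "position": "Assistant"},
]

print(holds(staff, ["staffNo"], ["position"]))   # staffNo -> position
print(holds(staff, ["position"], ["staffNo"]))   # position -> staffNo
```

The second check fails because one position value (Assistant) is associated with two staff numbers, illustrating that for a given value of B there may be several values of A.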
By placing the repeating data, along with a copy of the original key
attribute(s), in a separate relation. Sometimes the unnormalized table
may contain more than one repeating group, or repeating groups within
repeating groups. In such cases, this approach is applied repeatedly until no
repeating groups remain. A set of relations is in 1NF if it contains no
repeating groups.
For both approaches, the resulting tables are now referred to as 1NF relations
containing atomic (or single) values at the intersection of each row and column.
Although both approaches are correct, approach 1 introduces more redundancy
into the original UNF table as part of the flattening process, whereas approach
2 creates two or more relations with less redundancy than in the original UNF
table.
For example, there are two values for propertyNo (PG4 and PG16) for the
client named John Kay. To transform an unnormalized table into 1NF, we ensure that
there is a single value at the intersection of each row and column. This is
achieved by removing the repeating group.
With the first approach, we remove the repeating group (property rented details) by
entering the appropriate client data into each row. The resulting first normal form
ClientRental relation is shown.
First Normal Form :
fd4 ownerNo → oName (Transitive dependency)
fd5 clientNo, rentStart → propertyNo, pAddress, rentFinish, rent, ownerNo, oName (Candidate key)
fd6 propertyNo, rentStart → clientNo, cName, rentFinish (Candidate key)
This results in the creation of three new relations called Client, Rental, and
PropertyOwner
These three relations are in Second Normal Form as every non-primary-key attribute is
fully functionally dependent on the primary key of the relation.
The relations have the following form:
Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)
Rental
fd1 clientNo, propertyNo → rentStart, rentFinish (Primary key)
fd5 clientNo, rentStart → propertyNo, rentFinish (Candidate key)
fd6 propertyNo, rentStart → clientNo, rentFinish (Candidate key)
PropertyOwner
fd3 propertyNo → pAddress, rent, ownerNo, oName (Primary key)
fd4 ownerNo → oName (Transitive dependency)
All the non-primary-key attributes within the Client and Rental relations are functionally
dependent on only their primary keys. The Client and Rental relations have no transitive
dependencies and are therefore already in 3NF.
To transform the PropertyOwner relation into 3NF we must first remove this transitive
dependency by creating two new relations called PropertyForRent and Owner, as shown in
Figure 13.15. The new relations have the form:
PropertyForRent (propertyNo, pAddress, rent, ownerNo)
Owner (ownerNo, oName)
The PropertyForRent and Owner relations are in 3NF as there are no further transitive
dependencies on the primary key.
3NF derived from the PropertyOwner relation
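The decomposition of PropertyOwner can be sketched with two small relational helpers. The helper names and all attribute values except the property numbers PG4 and PG16 are hypothetical. The sketch projects PropertyOwner onto the two 3NF schemas and checks that joining them on ownerNo gives back the original tuples.

```python
# Hypothetical PropertyOwner tuples; oName depends on ownerNo (transitive dependency).
property_owner = [
    {"propertyNo": "PG4", "pAddress": "addr-A", "rent": 350, "ownerNo": "O1", "oName": "Owner One"},
    {"propertyNo": "PG16", "pAddress": "addr-B", "rent": 450, "ownerNo": "O2", "oName": "Owner Two"},
    {"propertyNo": "PX1", "pAddress": "addr-C", "rent": 375, "ownerNo": "O2", "oName": "Owner Two"},
]


def project(rows, attrs):
    """Relational projection with duplicate elimination."""
    result = []
    for row in rows:
        tup = {a: row[a] for a in attrs}
        if tup not in result:
            result.append(tup)
    return result


def natural_join(left, right, on):
    """Join two relations on a single shared attribute."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


property_for_rent = project(property_owner, ["propertyNo", "pAddress", "rent", "ownerNo"])
owner = project(property_owner, ["ownerNo", "oName"])
rejoined = natural_join(property_for_rent, owner, "ownerNo")

print(len(owner), len(rejoined))
```

Each owner name is now stored once in Owner, and no information is lost because ownerNo remains in PropertyForRent as the link between the two relations.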
QUERY PROCESSING:
The activities involved in parsing, validating, optimizing, and executing a query are known as
query processing.
The aims of query processing are to transform a query written in a high-level language,
typically SQL, into a correct and efficient execution strategy expressed in a low-level
language (implementing the relational algebra), and to execute the strategy to retrieve the
required data.
The activity of choosing an efficient execution strategy for processing a query is known as
query optimization.
An important aspect of query processing is query optimization.
o As there are many equivalent transformations of the same high-level query, the aim of
query optimization is to choose the one that minimizes resource usage.
o Generally, we try to reduce the total execution time of the query, which is the sum of
the execution times of all individual operations that make up the query.
o However, resource usage may also be viewed as the response time of the query, in
which case we concentrate on maximizing the number of parallel operations.
Compare different processing strategies
Find all managers who work at a London branch
SELECT * FROM Staff s, Branch b WHERE s.branchNo = b.branchNo AND
(s.position = 'Manager' AND b.city = 'London');
For the purposes of this example, we assume that there are 1000 tuples in
Staff, 50 tuples in Branch, 50 Managers (one for each branch), and 5 London
branches.
We compare these three queries based on the number of disk accesses
required.
For simplicity, we assume that there are no indexes or sort keys on either
relation, and that the results of any intermediate operations are stored on
disk.
We further assume that tuples are accessed one at a time and main memory
is large enough to process entire relations for each relational algebra
operation.
o The first query calculates the Cartesian product of Staff and Branch, which requires
(1000 +50) disk accesses to read the relations, and creates a relation with (1000 * 50)
tuples. We then have to read each of these tuples again to test them against the selection
predicate at a cost of another (1000 * 50) disk accesses, giving a total cost of:
(1000 +50) +2*(1000 * 50) =101 050 disk accesses
o The second query joins Staff and Branch on the branch number branchNo, which again
requires (1000 +50) disk accesses to read each of the relations. We know that the join of
the two relations has 1000 tuples, one for each member of staff (a member of staff can
only work at one branch). Consequently, the Selection operation requires 1000 disk
accesses to read the result of the join, giving a total cost of:
2*1000 +(1000 +50) =3050 disk accesses
o The final query first reads each Staff tuple to determine the Manager tuples, which
requires 1000 disk accesses and produces a relation with 50 tuples. The second Selection
operation reads each Branch tuple to determine the London branches, which requires 50
disk accesses and produces a relation with 5 tuples. The final operation is the join of the
reduced Staff and Branch relations, which requires (50 +5) disk accesses, giving a total
cost of:
1000 +2*50 +5 +(50 +5) =1160 disk accesses
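The three cost figures above can be recomputed from the stated assumptions (1000 Staff tuples, 50 Branch tuples, 50 Managers, 5 London branches, one disk access per tuple, intermediate results written to and read back from disk). The function names below are hypothetical labels for the three strategies.

```python
STAFF, BRANCH, MANAGERS, LONDON = 1000, 50, 50, 5


def cost_product_then_select():
    read_inputs = STAFF + BRANCH
    product = STAFF * BRANCH          # tuples in the Cartesian product
    return read_inputs + 2 * product  # write the product, then read it for the selection


def cost_join_then_select():
    read_inputs = STAFF + BRANCH
    joined = STAFF                    # one joined tuple per member of staff
    return 2 * joined + read_inputs   # write the join result, then read it for the selection


def cost_select_then_join():
    select_staff = STAFF + MANAGERS   # read Staff, write the 50 Manager tuples
    select_branch = BRANCH + LONDON   # read Branch, write the 5 London tuples
    join = MANAGERS + LONDON          # read both reduced relations
    return select_staff + select_branch + join


for strategy in (cost_product_then_select, cost_join_then_select, cost_select_then_join):
    print(strategy.__name__, strategy())
```

Under these assumptions the first strategy costs roughly 87 times as many disk accesses as the third, which is why performing selections as early as possible is a common heuristic.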
Phases of Query Processing:
Advantage:
Runtime overhead is removed.
More time available to evaluate a larger number of execution
strategies
Increases the chances of finding a more optimum strategy
Disadvantages:
The execution strategy that is chosen as being optimal when the
query is compiled may no longer be optimal when the query is run.
o Query Decomposition
Query decomposition is the first phase of query processing.
Aims
To transform a high-level query into a relational algebra query, and to
check that the query is syntactically and semantically correct.
The typical stages of query decomposition are analysis, normalization, semantic
analysis, simplification, and query restructuring.
Analysis
In this stage, the query is lexically and syntactically analyzed using the
techniques of programming language compilers.
Also verifies that the relations and attributes specified in the query are
defined in the system catalog.
It also verifies that any operations applied to database objects are
appropriate for the object type.
For example, consider the following query:
SELECT staffNumber FROM Staff WHERE position > 10;
This query would be rejected on two grounds:
(1) In the select list, the attribute staffNumber is not defined for the
Staff relation (should be staffNo).
(2) In the WHERE clause, the comparison >10 is incompatible
with the data type position, which is a variable character string.
On completion of this stage, the high-level query has been transformed
into some internal representation that is more suitable for
processing. The internal form that is typically chosen is some kind of
query tree, which is constructed as follows:
A leaf node is created for each base relation in the query.
A non-leaf node is created for each intermediate relation produced
by a relational algebra operation.
The root of the tree represents the result of the query.
The sequence of operations is directed from the leaves to the root.
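The construction rules for a query tree can be sketched as a small data structure. The Node class, the node labels, and the particular tree shape (selections pushed below the join for the London managers query) are assumptions chosen for illustration; a real DBMS would use its own internal representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def leaves(node):
    """Base relations of the query: the leaf nodes, left to right."""
    if not node.children:
        return [node.label]
    return [leaf for child in node.children for leaf in leaves(child)]


def evaluation_order(node):
    """Post-order walk: operations run from the leaves towards the root."""
    order = []
    for child in node.children:
        order.extend(evaluation_order(child))
    order.append(node.label)
    return order


# Root = result of the query; non-leaf nodes = intermediate relations.
tree = Node("join on branchNo", [
    Node("select position = 'Manager'", [Node("Staff")]),
    Node("select city = 'London'", [Node("Branch")]),
])

print(leaves(tree))
print(evaluation_order(tree))
```

The post-order traversal visits each base relation before the operation that consumes it and finishes at the root, matching the leaves-to-root direction described above.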
Normalization
The normalization stage of query processing converts the query into a
normalized form that can be more easily manipulated.
The predicate (in SQL, the WHERE condition), which may be arbitrarily
complex, can be converted into one of two forms by applying a few
transformation rules.
Conjunctive normal form: A sequence of conjuncts that are connected
with the ∧ (AND) operator. Each conjunct contains one or more terms
connected by the ∨ (OR) operator.
For example:
(position = 'Manager' ∨ salary > 20000) ∧ branchNo = 'B003'
A conjunctive selection contains only those tuples that satisfy all
conjuncts.
Disjunctive normal form: A sequence of disjuncts that are connected
with the ∨ (OR) operator. Each disjunct contains one or more terms
connected by the ∧ (AND) operator.
For example, we could rewrite the above conjunctive normal
form as:
(position = 'Manager' ∧ branchNo = 'B003') ∨ (salary > 20000 ∧
branchNo = 'B003')
A disjunctive selection contains those tuples formed by the union
of all tuples that satisfy the disjuncts.
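That the two normal forms of the example predicate select the same tuples can be checked with a truth table. In this sketch the three Boolean parameters stand for the terms of the example predicate: m for the position test, s for the salary test, and b for the branchNo test; the function names are hypothetical.

```python
from itertools import product


def cnf(m, s, b):
    """Conjunctive normal form: (m OR s) AND b."""
    return (m or s) and b


def dnf(m, s, b):
    """Disjunctive normal form: (m AND b) OR (s AND b)."""
    return (m and b) or (s and b)


# Compare the two forms on all 8 combinations of truth values.
equivalent = all(
    cnf(m, s, b) == dnf(m, s, b)
    for m, s, b in product([False, True], repeat=3)
)
print(equivalent)
```

The equivalence is an instance of the distributive law of AND over OR, which is one of the transformation rules used to normalize predicates.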
Semantic analysis
The objective of semantic analysis is to reject normalized queries that
are incorrectly formulated or contradictory. A query is incorrectly
formulated if components do not contribute to the generation of the
result, which may happen if some join specifications are missing. A
query is contradictory if its predicate cannot be satisfied by any tuple.
For example,
the predicate (position = 'Manager' ∧ position = 'Assistant') on
the Staff relation is contradictory, as a member of staff cannot be
both a Manager and an Assistant simultaneously.
However, the predicate ((position = 'Manager' ∧
position = 'Assistant') ∨ salary > 20000) could be simplified to
(salary > 20000) by interpreting the contradictory clause as the
Boolean value FALSE.
Simplification
The objectives of the simplification stage are to detect redundant
qualifications, eliminate common subexpressions, and transform the
query to a semantically equivalent but more easily and efficiently
computed form.
Typically, access restrictions, view definitions, and integrity
constraints are considered at this stage, some of which may also
introduce redundancy. If the user does not have the appropriate access to
all the components of the query, the query must be rejected.
Query restructuring
In the final stage of query decomposition, the query is restructured to
provide a more efficient implementation.
QUERY OPTIMIZATION:
Transformation Rules for the Relational Algebra Operations:
We use three relations R, S, and T, with R defined over the attributes A = {A1, A2,..., An},
and S defined over B = {B1, B2,..., Bn}; p, q, and r denote predicates, and L, L1, L2, M, M1, M2,
and N denote sets of attributes.
(1) Conjunctive Selection operations can cascade into individual Selection operations (and
vice versa).
This transformation is sometimes referred to as cascade of selection. For example:
σ_{branchNo='B003' ∧ salary>15000}(Staff) = σ_{branchNo='B003'}(σ_{salary>15000}(Staff))
(2) Commutativity of Selection operations.
(3) In a sequence of Projection operations, only the last in the sequence is required.
Examples:
(9) Commutativity of Selection and set operations (Union, Intersection, and Set difference).
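Rules (1) and (2) can be tried out on in-memory tuples. The select helper and the Staff rows below are hypothetical; the predicates are the ones from the cascade of selection example.

```python
def select(predicate, relation):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


# Hypothetical Staff tuples.
staff = [
    {"staffNo": "X1", "branchNo": "B003", "salary": 12000},
    {"staffNo": "X2", "branchNo": "B003", "salary": 24000},
    {"staffNo": "X3", "branchNo": "B007", "salary": 30000},
    {"staffNo": "X4", "branchNo": "B003", "salary": 18000},
]

in_b003 = lambda t: t["branchNo"] == "B003"
well_paid = lambda t: t["salary"] > 15000

# Rule (1): one conjunctive selection equals a cascade of two selections.
conjunctive = select(lambda t: in_b003(t) and well_paid(t), staff)
cascaded = select(in_b003, select(well_paid, staff))

# Rule (2): the order of the two selections does not matter.
commuted = select(well_paid, select(in_b003, staff))

print([t["staffNo"] for t in conjunctive])
```

Although the results are identical, the orderings differ in how many tuples each intermediate step has to examine, which is what the optimizer exploits.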
TRANSACTION PROCESSING:
Definition for transaction:
An action, or series of actions, carried out by a single user or application program,
which reads or updates the contents of the database.
A transaction is a logical unit of work on the database. It may be an entire program, a part
of a program, or a single command (for example, the SQL command INSERT or UPDATE), and
it may involve any number of operations on the database.
Example:
If all these updates are not made, referential integrity will be lost and the database will be in
an inconsistent state: a property will be managed by a member of staff who no longer
exists in the database.
A transaction should always transform the database from one consistent state to another,
although we accept that consistency may be violated while the transaction is in progress.
A latch can be used before a page is read from, or written to, disk to ensure
that the operation is atomic.
For example, a latch would be obtained to write a page from the database
buffers to disk, the page would then be written to disk, and the latch
immediately unset.
Deadlock
o An impasse that may result when two (or more) transactions are each waiting for
locks to be released that are held by the other.
o There are three general techniques for handling deadlock: timeouts, deadlock
prevention, and deadlock detection and recovery.
With timeouts, the transaction that has requested a lock waits for at most a
specified period of time.
Using deadlock prevention, the DBMS looks ahead to determine if a
transaction would cause deadlock, and never allows deadlock to occur.
Using deadlock detection and recovery, the DBMS allows deadlock to
occur but recognizes occurrences of deadlock and breaks them.
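Deadlock detection is commonly implemented by looking for a cycle in a wait-for graph, in which an edge from T1 to T2 means that T1 is waiting for a lock held by T2. The notes do not describe this structure, so the sketch below is an assumption about one usual way to do it; the function name and the transaction names are hypothetical.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle, or None if there is no deadlock.

    waits_for maps each transaction to the set of transactions it is waiting on.
    """
    state = {}  # txn -> "active" (on the current path) or "done" (fully explored)
    path = []

    def visit(txn):
        state[txn] = "active"
        path.append(txn)
        for other in waits_for.get(txn, ()):
            if state.get(other) == "active":
                # Reached a transaction already on the current path: a cycle.
                return path[path.index(other):]
            if other not in state:
                cycle = visit(other)
                if cycle:
                    return cycle
        state[txn] = "done"
        path.pop()
        return None

    for txn in list(waits_for):
        if txn not in state:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


print(find_deadlock({"T1": {"T2"}, "T2": {"T1"}}))
print(find_deadlock({"T1": {"T2"}, "T2": set()}))
```

Once a cycle is found, the DBMS breaks the deadlock by aborting one of the transactions in it; how the victim is chosen is a separate policy decision not covered here.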
Timestamping Methods
o A different approach that also guarantees serializability uses transaction timestamps to
order transaction execution for an equivalent serial schedule.
o Timestamp methods for concurrency control are quite different from locking
methods. No locks are involved, and therefore there can be no deadlock.
o Timestamp: A unique identifier created by the DBMS that indicates the relative
starting time of a transaction.
o Timestamping: A concurrency control protocol that orders transactions in such a
way that older transactions, that is, transactions with smaller timestamps, get priority in the
event of conflict.
DATABASE RECOVERY:
The process of restoring the database to a correct state in the event of a failure is called
database recovery.
The storage of data generally includes four different types of media with an increasing degree
of reliability: main memory, magnetic disk, magnetic tape, and optical disk.
There are many different types of failure that can affect database processing, each of which
has to be dealt with in a different manner. Some failures affect main memory only, while
others involve non-volatile (secondary) storage. Among the causes of failure are:
System crashes due to hardware or software errors, resulting in loss of main memory;
Media failures, such as head crashes or unreadable media, resulting in the loss of parts of
secondary storage;
Application software errors, such as logical errors in the program that is accessing the
database, which cause one or more transactions to fail;
Natural physical disasters, such as fires, floods, earthquakes, or power failures;
Carelessness or unintentional destruction of data or facilities by operators or users;
Sabotage, or intentional corruption or destruction of data, hardware, or software facilities.
The following terminology is used in database recovery when pages are written back to disk:
o A steal policy allows the buffer manager to write a buffer to disk before a transaction
commits (the buffer is unpinned). In other words, the buffer manager steals a page
from the transaction. The alternative policy is no-steal.
o A force policy ensures that all pages updated by a transaction are immediately written
to disk when the transaction commits. The alternative policy is no-force.
Recovery Facilities
A DBMS should provide the following facilities to assist with recovery:
o A backup mechanism, which makes periodic backup copies of the database;
o Logging facilities, which keep track of the current state of transactions and
database changes;
o A checkpoint facility, which enables updates to the database that are in progress to be
made permanent;
o A recovery manager, which allows the system to restore the database to a consistent state
following a failure.
Backup mechanism
The DBMS should provide a mechanism to allow backup copies of the database and the log
file to be made at regular intervals without necessarily having to stop the system first.
The backup copy of the database can be used in the event that the database has been damaged
or destroyed.
A backup can be a complete copy of the entire database or an incremental backup, consisting
only of modifications made since the last complete or incremental backup.
Typically, the backup is stored on offline storage, such as magnetic tape.
Log file
To keep track of database transactions, the DBMS maintains a special file called a log (or
journal) that contains information about all updates to the database. The log may contain the
following data:
o Transaction records, containing:
Transaction identifier;
Type of log record (transaction start, insert, update, delete, abort, commit);
Identifier of data item affected by the database action (insert, delete, and
update operations);
Before-image of the data item, that is, its value before change (update and
delete operations only);
After-image of the data item, that is, its value after change (insert and update
operations only);
Log management information, such as a pointer to previous and next log records
for that transaction (all operations).
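The transaction record described above can be sketched as a small data structure, together with how the before-images and after-images are used. The class name, field names, and the undo and redo helpers are hypothetical, the database is modeled as a plain dictionary, and the log management pointers are left out for brevity.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class LogRecord:
    txn_id: str
    kind: str                    # "start", "insert", "update", "delete", "abort", "commit"
    item: Optional[str] = None   # identifier of the affected data item
    before: Any = None           # before-image (update and delete only)
    after: Any = None            # after-image (insert and update only)


def undo(db, log, txn_id):
    """Roll a transaction back by restoring before-images, newest record first."""
    for rec in reversed(log):
        if rec.txn_id != txn_id:
            continue
        if rec.kind == "insert":
            db.pop(rec.item, None)
        elif rec.kind in ("update", "delete"):
            db[rec.item] = rec.before


def redo(db, log, txn_id):
    """Reapply a transaction by installing after-images, oldest record first."""
    for rec in log:
        if rec.txn_id != txn_id:
            continue
        if rec.kind in ("insert", "update"):
            db[rec.item] = rec.after
        elif rec.kind == "delete":
            db.pop(rec.item, None)


log = [
    LogRecord("T1", "start"),
    LogRecord("T1", "update", "x", before=1, after=2),
    LogRecord("T1", "insert", "y", after=5),
]

db = {"x": 2, "y": 5}   # state after T1's changes reached the database
undo(db, log, "T1")
print(db)
```

Undo walks the log backwards so that the oldest before-image is the last one restored, while redo walks it forwards so that the newest after-image wins.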
Checkpointing
o The point of synchronization between the database and the transaction log file.
o All buffers are force-written to secondary storage.
o Checkpoints are scheduled at predetermined intervals and involve the
following operations:
DATABASE TUNING:
Database Tuning is the activity of making a database application run more quickly. More
quickly usually means higher throughput, though it may mean lower response time for
time-critical applications.
Goal:
To make applications run faster
To lower the response time of queries/transactions
To improve the overall throughput of transactions
There are two types of Tuning
Schema tuning:
A relation schema is a relation name and a set of attributes.
Schema tuning is the activity of organizing a set of table designs in order
to improve overall query and update performance.
Index tuning:
Index Tuning is concerned with when and how to construct an index.
Two data structures are most often used in practice for indexes: B+-trees
and hash structures.
Tuning Indexes
Reasons for tuning indexes
Certain queries may take too long to run for lack of an index;
Certain indexes may not get utilized at all;
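Both reasons can be investigated by asking the DBMS how it plans to run a query. The sketch below uses SQLite (through Python's standard sqlite3 module) purely as a convenient stand-in; the table contents, the index name idx_staff_branch, and the helper plan are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, position TEXT, branchNo TEXT)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [(f"X{i}", "Assistant", f"B{i % 50:03d}") for i in range(1000)],
)

QUERY = "SELECT * FROM Staff WHERE branchNo = 'B003'"


def plan(connection, sql):
    """Return the planner's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # the last column holds the description


plan_without_index = plan(conn, QUERY)   # expected to be a scan of the whole table
conn.execute("CREATE INDEX idx_staff_branch ON Staff (branchNo)")
plan_with_index = plan(conn, QUERY)      # expected to mention idx_staff_branch

print(plan_without_index)
print(plan_with_index)
```

A query whose plan still scans the whole table may be slow for lack of an index, and an index that never appears in any plan of the workload is a candidate for removal, since it still has to be maintained on every update.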