Dbms 1
Chapter
Database
File Systems and Associated Problems
Benefits of Database Approach
Database Management System
DBMS Functions
Database System
Functions of a Database Administrator (DBA)
Components of DBMS
Data Model
Database Architecture
Schema
Types of Database Models
Evolution of RDBMS
What is a Relational Database?
What is an RDBMS?
Features of RDBMS
Basic Relational Database Terminology
Keys and their Use
Referential Integrity
Normalization
Why Normalization?
What is a Normal Form?
Types of Normal Forms
First Normal Form (1NF)
Functional Dependencies
Second Normal Form (2NF)
Transitive Dependency
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Multivalued Dependency
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)
Supertype
Subtype
Inheritance
Relationships and Subtypes
Supertype/Subtype Notation
Generalization and Specialization
Constraints in Supertype
Constraints in Supertype/Subtype
Supertype/Subtype Hierarchy
Domains
Domain Integrity Constraints
Exercises
Chapter 1
Database
File Systems and Associated Problems
Benefits of Database Approach
Database Management System
DBMS Functions
Database System
Users
Functions of a Database Administrator (DBA)
Components of DBMS
Data Model
Database Architecture
Schema
Types of Database Models
Objectives
• What is DBMS
• Various functions of a DBMS
• Database system
• Role of a DBA
• Components of DBMS
• Database Architecture
• Types of Database Models
Data
Data are known facts that can be recorded and that have implicit meaning.
Database
A database is a logical collection of related data. It is designed to offer an organized
mechanism for storing, managing and retrieving stored information. A ledger, a
telephone directory or an address book can be called a database because they all
store related data in a structured way.
Traditionally, data accessed through computers has been stored on different storage
media in the form of individual files. Files proved to be quite satisfactory as long as
computerization was limited to a few application areas and the use of computers
restricted to a privileged few. However, as actual users grew in number, especially
with the advent of online time-sharing systems, the file systems gave rise to many
serious problems. The discipline of database systems evolved in response to these
problems. Let us first consider what these problems are so as to understand the
different features of database systems more clearly.
As has been already stated, Database Systems provide an effective solution to the
above problems. Let us see how.
As you have just seen, files give rise to several problems because they are application
specific. Consequently, the applications become data-dependent, that is, they
depend upon the organization and access method for the data on the secondary
storage. This happens because, with conventional application development tools such
as COBOL, the application logic incorporates the knowledge of data organization and
access methods. Therefore, most changes in data organization or access methods
affect the application logic substantially. If problems arising due to this fact are to be
avoided, the data organization (and the access method) and the application logic
have to be made independent of each other. Database Systems do precisely this.
The first step towards this goal is to distinguish between data as it is actually stored
(called the physical representation of data) and data as it is presented to an
individual user (called the logical representation of data).
• Data can be shared - Multiple users can log in to the database to access
information, each with the access rights granted to them. They can
manipulate the database in a controlled environment.
DBMS Functions
• Data Definition - The DBMS allows us to define our own data in the simplest
possible way.
• Data Security and Integrity - The DBMS allows us to secure the data, and a
true picture of the data is given to the users accessing it.
• Data Recovery and Concurrency - We can always get back to a previously
defined consistent state of the database in case of a crash, and multiple users
can access the database at the same time.
• Data Dictionary - The DBMS maintains meta-information in its dictionaries.
This helps the DBMS locate information when answering user queries.
• Performance - The DBMS aims to maintain performance as the load grows
in terms of the number of users accessing the database.
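Some of the functions listed above can be tried out on a small scale. The following is a minimal sketch, assuming Python's built-in sqlite3 module and a hypothetical account table (neither comes from this text). It shows data definition, an integrity rule enforced by the DBMS, and a return to the last committed state.

```python
import sqlite3

# In-memory SQLite database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# Data definition: describe the data once, including an integrity rule.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT NOT NULL,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 'Smith', 100)")
conn.commit()  # a consistent state that can be returned to

# Integrity: the DBMS itself rejects a value that breaks the rule.
try:
    conn.execute("UPDATE account SET balance = -50 WHERE id = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Recovery: undo uncommitted work and fall back to the committed state.
conn.execute("UPDATE account SET balance = 40 WHERE id = 1")
conn.rollback()
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(rejected, balance)
```

A rollback is only a simple stand-in for crash recovery, which in a real DBMS involves logs and backups.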
Database System
A DBMS is a complex piece of software that usually consists of a number of modules.
It may be considered as an agent that allows communication between the various
types of users on one side and the physical database and the operating system on the
other, without the users being aware of every detail of how it is done. To fulfill its
tasks, the DBMS must maintain information about the data
that is stored in the system. This information would normally include what data
is stored, how it is stored, who has access to what parts of it, and so on.
The information about the data in a database is called the metadata (data about
data). In addition to information listed above, some information regarding the use of
a database is often collected to monitor the system's performance. This metadata
helps management in maintaining an effective and efficient database system.
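Most systems let you query the metadata just like ordinary data. The sketch below assumes SQLite, accessed through Python's sqlite3 module, and a hypothetical employee table. SQLite keeps its catalog in a table called sqlite_master and describes columns through PRAGMA table_info; other DBMS products expose their metadata differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, created only so that there is something to describe.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# What data is stored: SQLite lists every schema object in sqlite_master.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# How it is described: PRAGMA table_info returns one row per column
# in the order (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")]

print(tables)
print(columns)
```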
[Figure: Database system, showing users and application programs/queries]
Users
The three broad classes of users are as follows:
The database administrator is responsible for the overall planning of the company’s
data resources, for the design of data, and for the day-to-day operational aspects of
data management.
The overall planning of corporate data is the strategic aspect of the database
administration function and involves company-wide planning of existing data and
assessment of organization-wide data standards.
• Designing schema
To carry out all these functions, it is crucial that the DBA has accurate
information about the company’s data readily on hand. For this purpose the DBA maintains
a data dictionary. The data dictionary contains definitions of all data items and
structures, the various schemas, the relevant authorization and validation checks
and the different mapping definitions. It should also have information about the
source and destination of a data item and the flow of a data item as it is used by a
system. This type of information is a great help to the DBA in maintaining centralized
control of data.
Components of DBMS
The main components of DBMS are:
A Query Language and a Data Description Language (DDL) to provide users the
access to the database.
Database manager - provides interface between the low level data stored in the
database and the application programs and queries submitted to the system.
File manager - manages the allocation of space on disk storage and the data
structures used to represent information stored on the disk.
The metadata - describes the data stored in the database so that the DBMS can
provide users with access to it.
The above listing of DBMS components is not exhaustive. A complete DBMS also includes some
very important components like the concurrency controller and the recovery manager. These
components have not been shown (to keep the architecture relatively simple).
Data Model
One fundamental characteristic of the database approach is that it provides some
level of data abstraction by hiding details of data storage that are not needed by
most database users. A data model is the main tool for providing this abstraction.
A data model is a set of concepts that can be used to describe the structure of a
database. It is a collection of high-level data description constructs that hide many
low-level storage details.
High Level or conceptual data models: Provide concepts that are close to the
way many users perceive data. Use concepts, such as entities, attributes and
relationships, where:
• Entity represents a real-world object (e.g., student, employee) or concept
(e.g., course, company);
• Attribute represents a property that describes an object (e.g., color, name);
• Relationship represents an interaction or link among entities (e.g., works-on,
is-a, has, etc.).
Architecture
The goal of the three-schema architecture is to separate the user applications from
the physical database. The three levels of architecture are:
Internal Level
The internal level is the one closest to the physical storage, i.e., it is the one
concerned with the way data is physically stored. The internal (or physical) database
is stored on secondary storage devices, mainly the magnetic disk. The internal database
itself can be conceptually viewed at different levels of abstraction.
At its lowest level, it is stored in the form of bits with the associated physical
addresses on the secondary storage device.
At its highest level, it can be viewed in the form of files and simple data structures. It
is this level that we shall study when we discuss the physical organization for
databases in later chapters.
The physical database is described by means of a physical schema or an internal
schema. It essentially describes the various types of stored records, the different
indexes that are employed for accessing these and the representations for different
stored fields. It is also called the “storage structure definition”.
External Level
© SQL Star International Ltd. 11
Database Management System (DBMS)
The external level is the one closest to the user, i.e., it is the one concerned with
the way data is viewed by individual users. The external model (or view) is
application-specific. Therefore, the user views the database through an external
model, and there are as many external views as there are applications. External
views are the proper interface between the user and the database as an individual
user can hardly be expected to be interested in the entire database.
The external model is derived from the conceptual model. For this purpose, the
correspondence between a particular external model and the conceptual model has to
be defined. An external/conceptual mapping, similar to the conceptual/physical
mapping, does this. However, a separate mapping has to be defined for each external view.
The user interacts with the database through a high-level language such as COBOL,
PL/I or some special purpose language. This language is known as the host
language. It includes a data sub-language (DSL). The user carries out the retrieval
and storage operations on the database through the DSL.
In fact, the Database Task Group (DBTG) report, published in April 1971, contains
proposals for three distinct languages, two of which relate very closely to the concept
of a DSL. These are the subschema Data Description Language (DDL) and the Data
Manipulation Language (DML). The subschema DDL is used for defining the external
views while the DML is used for carrying out operations on the database.
Conceptual Level
The conceptual level is a level of indirection between the other two. The conceptual
model, also called the data model, represents the information content of the database
in its entirety, but is abstract with respect to the physical database. Broadly
speaking, the conceptual model provides a view of the data as it really is.
However, it is not the same as the external record. It contains all the information to
build relevant external records. For example, a conceptual stock record may consist
of the quantity of material and the buying rate but not its value; still the user’s
external record may consist of the value of the stock. A conceptual model may
consist of occurrences of such stock records, a collection of supplier record
occurrences and a collection of assembly records.
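The stock example can be sketched with a view. This is only an illustration, assuming SQLite through Python's sqlite3 module; the table, view and column names are hypothetical. The stored (conceptual) record holds the quantity and the buying rate, and the user's external record derives the value from them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual record: quantity and buying rate are stored, value is not.
conn.execute(
    "CREATE TABLE stock (material TEXT PRIMARY KEY, quantity INTEGER, rate REAL)")
conn.execute("INSERT INTO stock VALUES ('steel', 10, 2.5)")

# External record: one user's view, with the value derived on demand.
conn.execute(
    "CREATE VIEW stock_value AS "
    "SELECT material, quantity * rate AS value FROM stock"
)
external_record = conn.execute(
    "SELECT material, value FROM stock_value").fetchone()
print(external_record)
```

Because the value is computed from the stored fields, it cannot drift out of step with them.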
Obviously, the conceptual model is derived from the physical model. For this, the
database needs a conceptual/physical mapping, which specifies how conceptual
records and fields map into their counterparts in the physical database. The
conceptual database is described by means of a conceptual schema. Needless to say,
the conceptual schema is independent of the physical characteristics of data, such as
storage structures, physical sequences, stored field representations etc. Ideally the
conceptual schema should include many features in addition to just the definitions of
conceptual records. These may include relevant authorization checks and validation
procedures, the uses of data, the source and destination of data etc.
The conceptual database is a real-world view of data from the organization's point of
view. As the real world changes, changes have to be made to the conceptual
database and schema as well. In such a case, it is usually possible to limit the
corresponding changes to only those external schemas that use the conceptual
elements that are changed.
There will be many distinct external views, each consisting of a more or less abstract
representation of some portion of the total database, and there will be one
conceptual view, consisting of a similarly abstract representation of the database in
its entirety. Likewise, there will be precisely one internal view, representing the total
database as physically stored.
[Figure: Database Architecture. External level (individual user views), conceptual level (community user view), internal level (storage view), database.]
Schema
A description of data in terms of a data model is called a schema. The description of
a database is called the database schema, which is specified during database design
and is not expected to change frequently.
Data Independence
Data independence refers to changing the schema at one level of a database system
without the need to change the schema at the next higher level.
Logical data independence: Separating the external views from the conceptual
view enables us to change the conceptual view without affecting the external views.
This separation is sometimes called logical data independence.
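Logical data independence can be illustrated with a small experiment. The sketch below assumes SQLite through Python's sqlite3 module; the tables employee and staff and the view employee_view are hypothetical. The application reads only the view, the base tables are then restructured, and only the view definition (the external/conceptual mapping) is rewritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def application_query(connection):
    # The application knows only the external view, never the base tables.
    return connection.execute(
        "SELECT name FROM employee_view ORDER BY name").fetchall()

# Original conceptual schema and the external view built on it.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Smith'), (2, 'Jones')")
conn.execute("CREATE VIEW employee_view AS SELECT id, name FROM employee")
before = application_query(conn)

# The conceptual schema changes: data moves to a new table with new columns.
conn.execute(
    "CREATE TABLE staff (id INTEGER PRIMARY KEY, full_name TEXT, dept TEXT)")
conn.execute("INSERT INTO staff (id, full_name) SELECT id, name FROM employee")
conn.execute("DROP VIEW employee_view")
conn.execute("DROP TABLE employee")

# Only the mapping is redefined; application_query itself is untouched.
conn.execute(
    "CREATE VIEW employee_view AS SELECT id, full_name AS name FROM staff")
after = application_query(conn)
print(before == after)
```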
Hierarchical Model
Network model: This model represents data as record types. Here, we have
explicit linkages (expressed in the form of pointers), which relate various records.
Each record has a link field corresponding to every relationship that it participates in.
IDS (Integrated Data Store) is one of the DBMS products based on the network model.
Network model
Relational model: In this model, each database item is viewed as a record with
attributes. A set of records with similar attributes is called a table. Most of the
popular commercial DBMS products like Oracle, Sybase, MySQL, etc. are based on
the relational model.
Relational model
Object Relational model
Hierarchical, network and relational database models have been quite successful in
storing data for traditional business applications. But object oriented databases
evolved to handle more complex applications such as databases for scientific
experiments, geographic information systems, engineering design and manufacturing.
An object oriented database stores data, their relationships and the way they
interact with other data. This model draws its concepts from real world objects.
Compared to the relational database approach, which deals with data at the lowest
level, that is, columns and rows, the object oriented approach deals with data at a
higher level, that is, with the objects surrounding the data. This model represents the
database in terms of objects, their attributes and their behaviors.
Summary
Evolution of RDBMS
What is an RDBMS?
Features of RDBMS
Referential Integrity
Objectives
In this chapter, we will discuss:
• Evolution of RDBMS
• Relational Database
• Relational Database Management System (RDBMS)
• Features of an RDBMS
• Important terms related to RDBMS
• Different types of keys and their use
• Referential integrity
RDBMS
Dr. E. F. Codd outlined the principles of the relational model, which formed the basis
for the evolution of the Relational Database Management System.
A relational database is defined as a collection of tables related to each other
through common values; a Relational Database Management System is the software
that manages such a database.
Evolution of RDBMS
Before the acceptance of Codd’s Relational Model, database management systems
were just ad hoc collections of data designed to solve a particular type of problem,
later extended to solve more basic purposes. This led to complex systems, which
were difficult to understand, install, maintain and use. These database systems were
plagued with the following problems:
• They required large budgets and staffs of people with special skills that were
in short supply.
• These database systems did not support the implementation of business logic
as a DBMS responsibility.
Hence, the objective of developing a relational model was to address each of the
shortcomings that plagued the systems that existed at the end of the
1960s, and to make DBMS products more widely appealing to all kinds of users.
The existing relational database management systems offer powerful, yet simple
solutions for a wide variety of commercial and scientific application problems. Almost
every industry uses relational systems to store, update and retrieve data for
operational, transaction, as well as decision support systems.
It is a tool that can help you store, manage and disseminate information of various
kinds. It is a collection of inter-related objects (tables, queries, forms, reports and
macros), all stored in a computer program.
Features of an RDBMS
Catalog:
A catalog consists of all the information of the various schemas (external, conceptual
and internal) and also all of the corresponding mappings (external/conceptual,
conceptual/internal).
It contains detailed information regarding the various objects that are of interest to
the system itself; e.g., tables, views, indexes, users, integrity rules, security rules,
etc.
In a relational database, the entities of the ERD are represented as tables and their
attributes as the columns of their respective tables in a database schema.
• Table: Tables are the basic storage structures of a database where data about
something in the real world is stored. It is also called a relation or an entity.
For example, the set of values that the attribute EMPLOYEE.id can assume are a
Key: An attribute or set of attributes whose values uniquely identify each entity in
an entity set is called a key for that entity set.
It is a unique identifier for the table (a column or a column combination with the
property that at any given time no two rows of the table contain the same value in
that column or column combination).
In such a case, we must decide which of the candidate keys will be used as the
primary key. The remaining candidate keys would be considered alternate keys.
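One way to explore candidate keys is to test which minimal column combinations are unique in some sample data. The following is a rough sketch in plain Python; the EMPLOYEE rows and the column names are invented for illustration. Note that sample data can only rule a key out. Whether a column combination really is a key depends on the meaning of the data, so it remains a design decision.

```python
from itertools import combinations

def is_unique(rows, columns):
    """True if no two rows share the same values in the given columns."""
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in columns)
        if value in seen:
            return False
        seen.add(value)
    return True

def candidate_keys(rows, all_columns):
    """Minimal column sets that are unique in this particular set of rows."""
    keys = []
    for size in range(1, len(all_columns) + 1):
        for combo in combinations(all_columns, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_unique(rows, combo):
                keys.append(combo)
    return keys

# Hypothetical EMPLOYEE rows, invented for illustration.
employees = [
    {"id": 1, "email": "a@example.com", "department": "Sales"},
    {"id": 2, "email": "b@example.com", "department": "Sales"},
    {"id": 3, "email": "c@example.com", "department": "Accounts"},
]
keys = candidate_keys(employees, ["id", "email", "department"])
print(keys)
```

Here both id and email qualify. If id is chosen as the primary key, email becomes an alternate key.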
A case in point is the entity set EMPLOYEE having the attribute department, which
identifies by its value all instances of EMPLOYEE who belong to a given department.
Any key consisting of a single attribute is called a simple key, while one consisting
of a combination of attributes is called a composite key.
Referential Integrity
Referential Integrity can be defined as an integrity constraint that specifies that the
value (or existence) of an attribute in one relation depends on the value (or
existence) of an attribute in the same or another relation.
For referential integrity to hold, any field in a table that is declared a foreign key can
contain only values from a parent table's primary key field. For instance, deleting a
record that contains a value referred to by a foreign key in another table would
break referential integrity.
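[Figure: the primary key of the Course table is referenced by a foreign key in the Student table]
Course (primary key)
E01 ELECTRONICS
M02 MATHS
A03 ACCOUNTS
B04 BIOLOGY
Student (foreign key)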
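The rule can be seen in action with a small experiment. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; the column names code, roll_no and course_code are hypothetical, while the course values come from the Course table above. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " course_code TEXT REFERENCES course(code))"
)
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [("E01", "ELECTRONICS"), ("M02", "MATHS")])
conn.execute("INSERT INTO student VALUES (1, 'E01')")

def violates(sql):
    """Run a statement and report whether the DBMS refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# A foreign key may hold only values present in the parent's primary key.
bad_insert = violates("INSERT INTO student VALUES (2, 'Z99')")
# Deleting a parent row that is still referenced would leave an orphan.
bad_delete = violates("DELETE FROM course WHERE code = 'E01'")
# A parent row that nobody references can be deleted freely.
ok_delete = not violates("DELETE FROM course WHERE code = 'M02'")
print(bad_insert, bad_delete, ok_delete)
```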
Summary
• Evolution of RDBMS
• Relational Database
• Relational Database Management System (RDBMS)
• Features of an RDBMS
• Important terms related to RDBMS
• Different types of keys and their use
• Referential Integrity
Objectives
• Requirement Analysis
• Conceptual Design (ER Model is used at this stage)
• Schema Refinement (Normalization)
• Logical Design
• Physical Database Design and Tuning
• Physical Design: Here, the internal storage structures, access paths and file
organizations for the database files are specified. These activities and application
programs are designed and implemented as database transactions corresponding
to the high level specifications.
E-R Modeling
The Entity-Relationship model (ER Model in short) is a graphical design tool for
the implementation of database systems. It provides a common, informal and convenient
model for communication between users and the DBA for the purpose of modeling
the structure of data.
• Entity
• Entity Set
• Instance
• Attribute
• Relationship
• Cardinality
• Keys
Entity: An entity is anything that exists and is distinguishable. For example, each
chair is an entity. So is each person and each automobile. Entities can have concrete
existence or constitute ideas or concepts. Concepts like love and hate are entities.
A regular (independent) entity does not depend on any other entity for its
existence. For example, Employee is a regular entity. A regular entity is depicted
using a rectangle.
[Figure: the Employees entity drawn as a rectangle, in two alternative forms]
1. All persons
2. All automobiles
3. All emotions
An attribute is denoted by an ellipse, with the attribute name written inside, attached to
its respective entity.
[Figure: the attribute Name attached to the entity Employees]
The same entity set can participate in different relationship sets, or in different “roles”
in the same set.
It is shown using -
Type
It is shown using -
or
Degree of Relationship:
• Unary Relationship
• Binary Relationship
• Ternary Relationship
• N-ary Relationship
Unary Relationship:
A relationship where only one entity participates, in more than one role, is called a Unary
Relationship.
manages
Employee
Binary Relationship:
A relationship where two entity types are involved is called a Binary relationship.
Example:
Employee
Manager manages
Ternary relationship:
A relationship where three entity types are involved is called a ternary relationship.
Example:
Customer
N-ary Relationship:
1 Issued 1
Student Card
• One-to-many (or many-to-one): One student can enroll for only one course, but
one course can be offered to many students.
Chen notation
1 m
Student enroll Course
• Many-to-many: One student can take many tests, and one test can be taken by
many students.
Keys:
Data items used to uniquely identify individual occurrences of an entity type.
Candidate Keys:
Composite key:
A candidate key with more than one attribute is called a composite key.
Let us now see how the E-R model is implemented using the above discussed
notations.
Consider that an employee works in a department and that the employee details stored in the
database include id, name, department name, department id etc.
In the above figure, we show the relationship set Works_in, in which each
relationship indicates a department in which an employee works. The entities are
described by a set of attributes and identified by primary keys denoted as ‘__’.
The entity sets that participate in a relationship set need not be distinct; sometimes
a relationship might involve two entities in the same entity set. For example, in
Reports_To relationship set, every relationship is of the form (emp1, emp2).
The Works_In relationship shows that an employee can work in many departments and a
department can have many employees.
Relationship sets can also have descriptive attributes (e.g., the since attribute of
Works_In).
• Keys for each participating entity set (as foreign keys). This set of attributes
forms a superkey for the relation.
Look at an example:
Consider the relationship Manages: Each dept has at most one manager,
according to the key constraint on ‘Manages’ relationship. The arrow from
Department to Manages indicates that each Department entity appears in at most
one ‘Manages’ relationship in any allowable instance of ‘Manages’. Thus given a
Department entity, we can uniquely determine the ‘Manages’ relationship in which it
appears.
Map relationship to a table: Note that did is the key now. Since each department has
a unique manager, we could instead combine ‘Manages’ and Departments.
REFERENCES Employees,
FOREIGN KEY (did)
REFERENCES Departments)
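Only the tail of the table definition survives above. As a rough sketch of how such a table could look, the example below uses SQLite through Python's sqlite3 module. The columns ssn and since, their types, and the name and dname columns of the parent tables are assumptions, not taken from the original definition. Declaring did as the key means a department can appear in at most one Manages row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT)")

# Assumed shape of the Manages table: did alone is the key.
conn.execute(
    "CREATE TABLE Manages ("
    " ssn TEXT,"
    " did INTEGER,"
    " since TEXT,"
    " PRIMARY KEY (did),"
    " FOREIGN KEY (ssn) REFERENCES Employees,"
    " FOREIGN KEY (did) REFERENCES Departments)"
)
conn.execute("INSERT INTO Employees VALUES ('111', 'Smith'), ('222', 'Jones')")
conn.execute("INSERT INTO Departments VALUES (10, 'Sales')")
conn.execute("INSERT INTO Manages VALUES ('111', 10, '2001-01-01')")

try:
    # A second manager for department 10 violates the key constraint.
    conn.execute("INSERT INTO Manages VALUES ('222', 10, '2002-01-01')")
    second_manager_allowed = True
except sqlite3.IntegrityError:
    second_manager_allowed = False
print(second_manager_allowed)
```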
The following figures show the relationship between employee, department and
locations. Since three entities are involved in the relationship, with a key constraint on
the employee entity, it is known as a key constraint for a ternary relationship.
In the above figure, SSn, Did and Address are the primary keys of the Employee
entity, Department entity and Location entity respectively.
An arrow drawn from the employee entity indicates that an employee can work in
at most one department at a single location.
Participation Constraints:
The key constraint on ‘Manages’ tells us that a Department has at most one Manager
(indicated by arrow).
• Total participation
• Partial participation
Weak entity
A weak entity’s existence is dependent on another (owner) entity. Hence, a weak
entity will not have its own key. It can be identified uniquely only by considering the
primary key of its owner entity.
• Owner entity set and weak entity set must participate in a one-to-many
relationship set (1 owner, many weak entities).
• Weak entity set must have total participation in this identifying relationship set.
Translating Weak Entity Sets: Weak entity set and identifying relationship set are
translated into a single table.
• When the owner entity is deleted, all owned weak entities must also be deleted.
For example: If the employee quits, any policy owned by the employee is
terminated. All the relevant policy and dependent information is also deleted from
the database.
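The translation described above can be sketched as follows, assuming SQLite through Python's sqlite3 module. The table name Dep_Policy and the columns pname and cost are hypothetical. The weak entity set and its identifying relationship share one table, the key combines the dependent's name with the owner's ssn, and ON DELETE CASCADE removes the owned rows along with the owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT)")

# Weak entity set and identifying relationship folded into one table.
# pname is only a partial key; the full key borrows the owner's ssn.
conn.execute(
    "CREATE TABLE Dep_Policy ("
    " pname TEXT,"
    " cost REAL,"
    " ssn TEXT NOT NULL,"
    " PRIMARY KEY (pname, ssn),"
    " FOREIGN KEY (ssn) REFERENCES Employees ON DELETE CASCADE)"
)
conn.execute("INSERT INTO Employees VALUES ('111', 'Smith')")
conn.executemany("INSERT INTO Dep_Policy VALUES (?, ?, '111')",
                 [("Ann", 50.0), ("Bob", 75.0)])

# The employee quits: deleting the owner removes the owned rows as well.
conn.execute("DELETE FROM Employees WHERE ssn = '111'")
remaining = conn.execute("SELECT COUNT(*) FROM Dep_Policy").fetchone()[0]
print(remaining)
```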
To indicate that Dependent is a weak entity and policy is its identifying relationship,
we draw both with dark lines.
ISA Constraints:
There are two types of ISA constraints:
Queries involving all employees are easy; those involving just Hourly_Emp require a join
to get some attributes.
Alternative:
• Just Hourly_Emp and Contract_Emp.
• Hourly_Emp: ssn, name, lot, hourly_wages, hours_worked.
Aggregation
Aggregation is meant to represent a relationship between a whole object and its
component parts. It is used when we have to model a relationship involving (entity
sets and a relationship set).
Monitors is mapped to a table like any other relationship set.
The use of aggregation vs. ternary relationship may be guided by certain integrity
constraints. For example: we can impose a constraint that each sponsorship is
monitored by at most one employee (not possible without aggregation).
Works_In does not allow an employee to work in a department for two or more
periods. Why?
Consider that an employee works in a given department over more than one period.
This possibility is ruled out by the semantics of the ER diagram. The
problem is that we want to record several values for the descriptive attributes of each
instance of the Works_In relationship. We can address this problem by introducing an
entity set called Duration, with attributes from and to.
But, what if a manager gets a discretionary budget that covers all managed
departments? The following factors follow:
One of the possible designs to resolve the two issues of the previous ER diagram:
We model the appointment as an entity set, say Mgr_appt, and use a ternary
relationship, say manages, to relate a manager, an appointment, and a department.
The budget is now associated with the appointment of the employee.
The figure below models a situation in which an employee can own several policies,
each policy can be owned by several employees, and each dependent can be covered
by several policies.
Each policy is owned by just 1 employee. Key constraint on Policy would mean policy
can only cover 1 dependent!
Hence, there is a further need for refining the schema. Relational schema obtained
from ER diagram is a good first step. But, the ER design is subjective and can’t
express certain constraints; so this relational schema may need refinement.
Functional Dependencies
For example, a department can’t order two distinct parts from the same supplier. We
cannot express this with respect to ternary Contracts relationship.
Normalization refines ER design by considering FDs.
The next chapter will deal with Normalization to refine the Entity Relationship Design.
Summary
Normalization
Why Normalization?
What is a Normal Form?
Types of Normal Forms
First Normal Form
Functional Dependencies
Second Normal Form
Transitive Dependency
Third Normal Form
Boyce-Codd Normal Form
Multivalued Dependency
Fourth Normal Form
Fifth Normal Form
Objectives
• Normalization
• Reasons for Normalization
• Refining a database
• Defining Normal Form
• Types of Normal Forms
Normalization
Normalization is a process of designing a consistent Database by minimizing
redundancy and ensuring Data Integrity through the principle of Non-loss
decomposition.
Why Normalization?
a. Does the design ensure that all database operations will be efficiently
performed and that the design does not make the DBMS perform expensive
consistency checks, which could be avoided?
Unless these issues are properly handled, several difficulties like redundancy and loss
of information may arise. There are several methods to avoid the above-mentioned
difficulties.
Database normalization:
Data integrity ensures the correctness of data stored within the database.
It is achieved by imposing integrity constraints.
An integrity constraint is a rule that restricts the values present in the
database.
♦ Entity constraints:
The entity integrity rule states that the value of the primary key can never be a null
value (a null value is one that has no value and is not the same as a blank). Because
a primary key is used to identify a unique row in a relational table, its value must
always be specified and should never be unknown. The integrity rule requires that
insert, update and delete operations maintain the uniqueness and existence of all
primary keys.
♦ Domain Constraints:
Only permissible values of an attribute are allowed in a relation.
♦ Direct Redundancy:
Direct redundancy results from the presence of the same data in two
different locations, leading to anomalies when reading, writing,
updating and deleting.
♦ Indirect redundancy:
Indirect Redundancy results due to storing information that can be
computed from the other data items stored within the database.
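The entity and domain constraints described earlier in this section can be declared when a table is created. The following is a small sketch, assuming SQLite through Python's sqlite3 module; the subject table and the permitted range of 1 to 40 hours are hypothetical. NOT NULL is written out on the key because SQLite, unlike most systems, otherwise tolerates NULL in a primary key that is not an INTEGER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Entity constraint: the primary key must be present and unique.
# Domain constraint: hours may take only permissible values
# (the range 1..40 is an assumption made for this example).
conn.execute(
    "CREATE TABLE subject ("
    " name TEXT NOT NULL PRIMARY KEY,"
    " hours INTEGER NOT NULL CHECK (hours BETWEEN 1 AND 40))"
)
conn.execute("INSERT INTO subject VALUES ('Java', 16)")

def rejected(sql):
    """Run a statement and report whether the DBMS refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

null_key = rejected("INSERT INTO subject VALUES (NULL, 8)")
duplicate_key = rejected("INSERT INTO subject VALUES ('Java', 8)")
out_of_domain = rejected("INSERT INTO subject VALUES ('Linux', 400)")
print(null_key, duplicate_key, out_of_domain)
```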
Normalized databases have a design that reflects the true dependencies between
tracked quantities, allowing quick updates to data with little risk of introducing
inconsistencies. There are formal methods for quantifying "how normalized" a
relational database is, and these classifications are called Normal Forms (or NF).
A database is said to be in one of the Normal Forms if it satisfies the rules required
by that Form as well as those of the previous Forms; it will then not suffer from any of
the problems addressed by the Form.
A relation is said to be in a particular Normal Form only if it also satisfies the previous Normal Forms.
A Relation is in 1NF, if every row contains exactly one value for each attribute.
Consider a table ‘Faculty’ which has information about the faculty, the subjects they
teach and the number of hours allotted to each subject.
Faculty:
Anomalies: -
The above table does not have atomic values in the ‘Subject’ column. Hence, it
is called an un-normalized table. Inserting, updating and deleting would be a problem in
such a table.
For the above table to be in first normal form, each row should have atomic values.
Hence let us re-construct the data in the table. An ‘S.No’ column is included in the
table to uniquely identify each row.
This table shows the same data as the previous table but we have eliminated the
repeating groups.
Hence the table is now said to be in First Normal form (1NF). But we have
introduced Redundancy into the table now. This can be eliminated using Second
Normal Form (2NF).
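The step from an un-normalized table to 1NF can be sketched in a few lines of plain Python. The dictionary keys below are hypothetical, and the two sample rows follow the faculty data used in this chapter. Each row whose Subject cell holds a group of values is split into one row per value, which is also where the repeated faculty details come from.

```python
def to_first_normal_form(rows, repeating_column):
    """Split each multi-valued cell so every row holds one atomic value."""
    flat = []
    for row in rows:
        for value in row[repeating_column]:
            atomic = dict(row)          # copy the non-repeating columns
            atomic[repeating_column] = value
            flat.append(atomic)
    return flat

# Un-normalized rows: the subject column holds a group of values.
faculty = [
    {"faculty_code": 100, "name": "Smith", "subject": ["Java", "PL/SQL", "Linux"]},
    {"faculty_code": 101, "name": "Jones", "subject": ["Java", "Forms", "Reports"]},
]
flat = to_first_normal_form(faculty, "subject")

# Number the rows, as the S.No column does in the text.
for s_no, row in enumerate(flat, start=1):
    print(s_no, row["faculty_code"], row["name"], row["subject"])
```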
It is denoted by A -> B, read as: A determines B.
In the above example, Hours is fully functionally dependent on both Empno and
Project_no.
Partial Dependency:
Partial dependency:
Empno -> Ename holds.
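A functional dependency can be tested against sample rows with a few lines of Python. The sketch below assumes hypothetical data (only the column names Empno, Ename, Project_no and Hours come from the text), and the helper `holds` is an illustrative name. Note that sample data can only refute a dependency; whether it truly holds is a property of the business rules.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same left-hand value must always map to the same right-hand value.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample rows.
works_on = [
    {"Empno": 1, "Ename": "Asha", "Project_no": "P1", "Hours": 10},
    {"Empno": 1, "Ename": "Asha", "Project_no": "P2", "Hours": 5},
    {"Empno": 2, "Ename": "Ravi", "Project_no": "P1", "Hours": 8},
]

full = holds(works_on, ["Empno", "Project_no"], ["Hours"])  # whole key determines Hours
by_emp = holds(works_on, ["Empno"], ["Hours"])              # part of the key does not
partial = holds(works_on, ["Empno"], ["Ename"])             # Ename depends on Empno alone
```

Here `full` and `partial` are true while `by_emp` is false: Hours needs the whole key, whereas Ename depends on only a part of it.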
• Find and remove attributes that are related to only a part of the key.
• Group the removed items in another table.
• Assign the new table a key that consists of that part of the old composite key.
While eliminating the repeating groups, we have introduced redundancy into the table.
Faculty Code, Faculty Name and Date of Birth are repeated because the same faculty
member teaches more than one subject.
To eliminate this, let us split the table into 2 parts; one with the non-repeating
groups and the other for repeating groups.
Faculty:
Faculty code   Faculty Name   Date of Birth
100            Smith          17/07/64
101            Jones          24/12/72
102            Fred           03/02/80
103            Robert         28/11/66
Subject:
SNO   Faculty code   Subject   Hours
1     100            Java      16
2     100            PL/SQL    8
3     100            Linux     8
4     101            Java      16
5     101            Forms     8
6     101            Reports   12
7     102            SQL       10
8     102            Linux     8
9     102            Java      16
10    103            SQL       10
11    103            PL/SQL    8
12    103            Forms     8
Faculty Code is the only key to identify the faculty name and the date of birth.
Hence, Faculty code is the primary key in the first table and foreign key in the
second table.
Faculty code is repeated in the Subject table. Hence, we have to take ‘SNO’ into
account to form a composite key in the Subject table. Now, SNO + Faculty code can
uniquely identify each row in this table.
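The primary key / foreign key arrangement described above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the lower-case table and column names are assumptions chosen for the example, and only a few of the rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("""CREATE TABLE faculty (
    faculty_code  INTEGER PRIMARY KEY,
    faculty_name  TEXT NOT NULL,
    date_of_birth TEXT NOT NULL)""")
conn.execute("""CREATE TABLE subject (
    sno          INTEGER NOT NULL,
    faculty_code INTEGER NOT NULL REFERENCES faculty(faculty_code),
    subject      TEXT NOT NULL,
    hours        INTEGER NOT NULL,
    PRIMARY KEY (sno, faculty_code))""")

conn.executemany("INSERT INTO faculty VALUES (?, ?, ?)",
                 [(100, "Smith", "17/07/64"), (101, "Jones", "24/12/72")])
conn.executemany("INSERT INTO subject VALUES (?, ?, ?, ?)",
                 [(1, 100, "Java", 16), (2, 100, "PL/SQL", 8), (4, 101, "Java", 16)])

# Each faculty name is stored once; the join recovers the combined view.
joined = conn.execute("""SELECT s.sno, f.faculty_name, s.subject, s.hours
                         FROM subject s JOIN faculty f USING (faculty_code)
                         ORDER BY s.sno""").fetchall()

# A subject row for an unknown faculty code violates the foreign key.
try:
    conn.execute("INSERT INTO subject VALUES (13, 999, 'Java', 16)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

The rejected insert shows the foreign key doing its job: a subject cannot be recorded for a faculty code that does not exist in the first table.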
This Subject table should therefore be further decomposed without any loss of information as:
Subject Hours
Transitive Dependency
Transitive dependencies arise:
In order to remove the anomalies that arose in Second Normal Form and to remove
transitive dependencies, if any, we have to perform third normalization.
Now let us see how to normalize the second table obtained after 2NF.
Subject:
In this table, Hours depends on Subject, and Subject depends on the Faculty
code and SNO. But Hours depends directly on neither the Faculty code nor the SNO.
Hence, there exists a transitive dependency between SNO, Subject and Hours.
Fac_Sub:
Sub_Hrs:
Subject Hours
Java 16
PL/SQL 8
Linux 8
Forms 8
Reports 12
SQL 10
After decomposing the ‘Subject’ table we now have the ‘Fac_Sub’ and ‘Sub_Hrs’ tables.
This addresses the following anomalies:
Insertion: The hours for a subject are stored once, so inserting records no longer
duplicates subject and hours data.
Update: Subject and hours are stored in a separate table, so the hours for a subject
are changed in exactly one row.
Deletion: Even if a faculty member leaves the organization, the hours allotted to a
particular subject can still be retrieved from the Sub_Hrs table.
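The decomposition into third normal form can be checked with a short Python sketch. The rows are those of the 2NF Subject table; the column layout of Fac_Sub (SNO, Faculty code, Subject) is an assumption, since only its name is given here.

```python
# Rows of the 2NF 'Subject' table: (SNO, Faculty code, Subject, Hours).
subject = [
    (1, 100, "Java", 16), (2, 100, "PL/SQL", 8), (3, 100, "Linux", 8),
    (4, 101, "Java", 16), (5, 101, "Forms", 8), (6, 101, "Reports", 12),
    (7, 102, "SQL", 10), (8, 102, "Linux", 8), (9, 102, "Java", 16),
    (10, 103, "SQL", 10), (11, 103, "PL/SQL", 8), (12, 103, "Forms", 8),
]

# Project onto the two smaller tables.
fac_sub = [(sno, code, subj) for sno, code, subj, _ in subject]
sub_hrs = {}
for _, _, subj, hours in subject:
    # Subject -> Hours must hold, otherwise the split would lose information.
    assert sub_hrs.setdefault(subj, hours) == hours

# Joining on Subject rebuilds the original rows exactly (lossless decomposition).
rebuilt = [(sno, code, subj, sub_hrs[subj]) for sno, code, subj in fac_sub]

# Deletion: removing faculty 101 from Fac_Sub keeps the hours for 'Reports'.
fac_sub_after = [row for row in fac_sub if row[1] != 101]
```

After faculty 101 is removed, `sub_hrs["Reports"]` still returns 12, which is the deletion anomaly being avoided.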
Boyce-Codd Normal Form (BCNF) was introduced because 3NF does not
satisfactorily handle the case of a relation possessing two or more composite or
overlapping candidate keys.
In most cases, third normal form is a sufficient level of decomposition. But some
cases require the design to be normalized further, up to the fourth or even the fifth
normal form. These are based on the concept of Multivalued Dependency. Let us get
an idea about it now.
Multivalued Dependency:
A multivalued dependency X ->> Y is said to hold for a relation R(X,Y,Z) if,
for a given value of X, there is a set of associated values of attribute Y, and
these Y values depend only on the X value and have no dependence on the
set of attributes Z.
In the above example, the same topic is taught in a seminar by more than one
faculty member, and each faculty member takes up different topics in the same
seminar. Hence, topic names are repeated several times. This is an example of
multivalued dependency. For a table to be in Fourth Normal Form, multivalued
dependencies must be avoided.
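One way to make the definition concrete is to test it on sample rows: X ->> Y holds in R(X, Y, Z) when, for every X value, the (Y, Z) pairs form the full cross product of that X value's Y values and Z values. The Python sketch below assumes hypothetical topic names and a hypothetical helper `mvd_holds`.

```python
from itertools import product

def mvd_holds(rows, x, y, z):
    """Check X ->> Y on rows of R(X, Y, Z): for every X value the (Y, Z)
    pairs must be the full cross product of its Y values and Z values."""
    groups = {}
    for row in rows:
        groups.setdefault(row[x], set()).add((row[y], row[z]))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True

# Hypothetical rows: topics and faculty vary independently within a seminar.
seminar = [
    {"Seminar": "DBP-1", "Topic": "Keys", "Faculty": "Brown"},
    {"Seminar": "DBP-1", "Topic": "Keys", "Faculty": "Robert"},
    {"Seminar": "DBP-1", "Topic": "Joins", "Faculty": "Brown"},
    {"Seminar": "DBP-1", "Topic": "Joins", "Faculty": "Robert"},
]
independent = mvd_holds(seminar, "Seminar", "Topic", "Faculty")
dependent = mvd_holds(seminar[:3], "Seminar", "Topic", "Faculty")
```

With all four rows the dependency holds, and each topic is repeated once per faculty member. Dropping one row breaks the cross product, so the check fails.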
If we were to add the seminar DAT-2 to New York, we would have to add a line to
the table for each instructor located in New York.
After adding the above information, the table would look as shown below:
From the above table, we observe that there is a redundancy of data stored for
Brown’s information. So to eliminate this redundancy, we have to do a ‘Non-Loss
decomposition’ of the table.
Consider the following decomposition of the above table into fifth normal form:
Faculty   Seminar
Brown     DBP-1
Brown     DAT-2
Robert    DBP-1
Robert    DAT-2

Seminar   Location
DBP-1     New York
DAT-2     Chicago
DBP-1     Chicago
DAT-2     New York

Faculty   Location
Brown     New York
Brown     Chicago
Robert    Chicago
Robert    New York
Generally, a table is in fifth normal form when its information content cannot be
reconstructed from several smaller tables, i.e., from tables having fewer fields than
the original table, each table having different keys.
In the normalized form, the fact that ‘Brown’ travels to ‘New York’ is recorded only
once, whereas in the unnormalized form it may be repeated many times.
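The three small tables can be joined back together in Python to see the effect. This is a sketch under the assumption that the combined table is the natural join of the three projections; the variable names are illustrative.

```python
faculty_seminar = {("Brown", "DBP-1"), ("Brown", "DAT-2"),
                   ("Robert", "DBP-1"), ("Robert", "DAT-2")}
seminar_location = {("DBP-1", "New York"), ("DAT-2", "Chicago"),
                    ("DBP-1", "Chicago"), ("DAT-2", "New York")}
faculty_location = {("Brown", "New York"), ("Brown", "Chicago"),
                    ("Robert", "Chicago"), ("Robert", "New York")}

# Natural join of the three projections on their shared columns.
rebuilt = {
    (fac, sem, loc)
    for fac, sem in faculty_seminar
    for sem2, loc in seminar_location
    if sem == sem2 and (fac, loc) in faculty_location
}

# Rows of the joined table that repeat the fact "Brown travels to New York".
brown_new_york = [row for row in rebuilt if row[0] == "Brown" and row[2] == "New York"]
```

The joined table repeats the Brown / New York fact once per seminar, while the Faculty-Location table stores it in a single row.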
An attempt has been made to explain Normal forms in a simple yet understandable
manner.
Some redundancies are unavoidable. One should take care while normalizing a table
so that data integrity is not compromised for removing redundancies.
Summary
• Normalization
• Reasons for Normalization
• Refining a database
• Normal Form
• Types of Normal Forms
Supertype
Subtype
Inheritance
Relationships and Subtypes
Supertype/Subtype Notation
Generalization and Specialization
Constraints in Supertype
Constraints in Supertype/Subtype
Supertype/Subtype Hierarchy
Domains
Domain Integrity Constraints
Objectives
Basics
Supertype
Supertype is a generic parent entity that contains generalized attributes and key. It
is a generic entity type that has a relationship with one or more subtypes.
Subtype
A subtype is a subgrouping of the entities in an entity type, which has attributes that
are distinct from those in other subgroupings. Subtypes are category entities that
inherit the attributes, keys, and relationships of the supertype entity. Each subtype
entity will contain the migrated foreign key and only those attributes that pertain to
the category type.
Inheritance
Subtype entities inherit values of all attributes of the supertype. An instance of a
subtype is also an instance of the supertype.
This important property makes it unnecessary to include supertype attributes
redundantly with the subtypes.
[Figure: a SUPERTYPE holding the attributes shared by all entities, with Subtype 1,
Subtype 2 and so forth below it as specialized versions of the supertype, each holding
the attributes unique to that subtype.]
Example:
The following figure shows an Employee supertype with three subtypes.
[Figure: EMPLOYEE supertype with attributes Employee_no, Employee name, Address and
Date_hired; the subtype attributes shown include Hourly_rate, annual_salary,
Stock_option, contact_number and Billing_rate.]
All Employee subtypes will have Emp name, number, date_hired and address.
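Attribute inheritance in a supertype/subtype structure resembles class inheritance in a programming language. The Python sketch below is only an analogy: the subtype names HourlyEmployee and SalariedEmployee, and the assignment of attributes to them, are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Employee:
    """Supertype: attributes shared by all employees."""
    employee_no: int
    employee_name: str
    address: str
    date_hired: str

@dataclass
class HourlyEmployee(Employee):
    """Hypothetical subtype: adds only the attribute unique to it."""
    hourly_rate: float

@dataclass
class SalariedEmployee(Employee):
    """Hypothetical subtype: adds only the attribute unique to it."""
    annual_salary: float

worker = HourlyEmployee(1, "Smith", "12 Main St", "2001-04-01", hourly_rate=12.5)
```

The `worker` object is an instance of both HourlyEmployee and Employee, and it carries the shared attributes without their being declared again in the subtype.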
Example:
Supertype/Subtype Notation
• SUPERTYPE
• Attributes shared by all entities
• SUBTYPE 1
• Attributes unique to subtype 1
• SUBTYPE 2
• Attributes unique to subtype 2
• SUBTYPE 3
• Attributes unique to subtype 3
Generalization
The primary rule of generalization hierarchies is that each instance of the supertype
entity must appear in at least one subtype; likewise, an instance of the subtype must
appear in the supertype.
Subtypes can be a part of only one generalization hierarchy. That is, a subtype
cannot be related to more than one supertype. However, generalization hierarchies
may be nested by having the subtype of one hierarchy be the supertype for another.
Subtypes may be the parent entity in a relationship, but not the child. If this were
allowed, the subtype would inherit two primary keys.
The following figure shows three entity types: CAR, TRUCK and MOTORCYCLE.
Specialization
It is the process of defining one or more subtypes of the supertype, and forming
supertype/subtype relationships TOP-DOWN.
Constraints in Supertype
Completeness Constraints
The total specialization rule specifies that each entity instance of the supertype must
be a member of some subtype in the relationship. For example: all STUDENTS are
either UNDERGRADUATE or GRADUATE students.
The partial specialization rule specifies that an entity instance of the supertype is
allowed to not belong to any subtype. For example: FACULTY and STAFF are not the
only possible members of the entity EMPLOYEE.
Disjointness Constraints
The disjointness constraint addresses the question of whether an instance of a
supertype may simultaneously be a member of two (or more) subtypes.
a) Disjoint Rule
The disjoint rule specifies that if an entity instance is a member of one subtype, it
cannot simultaneously be a member of any other subtype. For example: all
PERSONS are either MALE or FEMALE.
b) Overlap Rule
The overlap rule specifies that an entity instance can simultaneously be a member
of two (or more) subtypes. For example: an ATHLETE can be both a RUNNER and a
JUMPER. It is denoted by the letter “O”.
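The completeness and disjointness rules can be expressed as simple set checks. The following Python sketch assumes that instances are represented by ids and that `check_specialization` is a hypothetical helper written for this illustration.

```python
def check_specialization(supertype, subtypes, total, disjoint):
    """Return a list of violations of the completeness and disjointness
    rules. 'supertype' is a set of instance ids; 'subtypes' maps each
    subtype name to its set of instance ids."""
    problems = []
    members = set().union(*subtypes.values())
    if not members <= supertype:
        problems.append("subtype instance missing from supertype")
    if total and supertype - members:
        problems.append("total specialization violated")
    if disjoint:
        names = sorted(subtypes)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if subtypes[a] & subtypes[b]:
                    problems.append(f"{a} and {b} overlap")
    return problems

students = {1, 2, 3}
ok = check_specialization(
    students, {"UNDERGRADUATE": {1, 2}, "GRADUATE": {3}}, total=True, disjoint=True)
athletes = check_specialization(
    {1, 2}, {"RUNNER": {1, 2}, "JUMPER": {2}}, total=False, disjoint=False)
bad = check_specialization(
    students, {"UNDERGRADUATE": {1, 2}, "GRADUATE": {2}}, total=True, disjoint=True)
```

The first two calls report no violations. The third reports two: student 3 belongs to no subtype, and student 2 belongs to both.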
Constraints in Supertype/Subtype
Discriminators
Subtype Discriminator
The subtype discriminator is “an attribute of the supertype whose values determine
the target subtype(s)”. It is used to direct into which of the subtypes (if any) a new
instance of the supertype should be inserted.
The following figure introduces subtype discriminators - disjoint rule and overlap
rule.
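The role of a subtype discriminator can be sketched in Python as a lookup from the discriminator value to the target subtype(s). The codes "H", "S" and "C", the subtype names, and the encoding of the overlap case as a string of codes are all assumptions made for this example.

```python
# Hypothetical discriminator values and subtype names.
SUBTYPE_BY_DISCRIMINATOR = {"H": "HOURLY", "S": "SALARIED", "C": "CONSULTANT"}

def target_subtypes(employee_type, disjoint=True):
    """Map a discriminator value to the subtype(s) that receive the instance.
    Under the disjoint rule the value is a single code; under the overlap
    rule it is treated here as a string of codes, one per subtype."""
    codes = [employee_type] if disjoint else list(employee_type)
    return [SUBTYPE_BY_DISCRIMINATOR[c] for c in codes if c in SUBTYPE_BY_DISCRIMINATOR]

only_hourly = target_subtypes("H")
both = target_subtypes("HC", disjoint=False)
none = target_subtypes("")
```

An empty or unknown value maps to no subtype, which corresponds to the "if any" case under partial specialization.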
Supertype/Subtype Hierarchy
A supertype/subtype hierarchy is “a hierarchical arrangement of supertypes and
subtypes, where each subtype has only one supertype”.
In this hierarchy, attributes are assigned at the highest logical level that is possible
in the hierarchy. Subtypes that are lower in the hierarchy inherit attributes not only
from their immediate supertype, but also from all supertypes higher in the
hierarchy, up to the root.
Domains
A domain is a conceptual pool of values from which one or more attributes draw their
actual values.
Examples:
Two values can only be compared if they come from the same domain.
Defining a Domain
The syntax to create a domain in a database is as follows:
Example:
DOMAIN GENDER
Domains are used in the relational model to define the characteristics of the columns
of a table. The domain specifies its own name, data type and logical size. The logical
size represents the size as perceived by the user, not how it is implemented
internally.
For example, for an integer, the logical size represents the number of digits used to
display the integer, not the number of bytes used to store it. The domain integrity
constraints are used to specify the valid values that a column defined over the
domain can take. You can define the valid values by listing them as a set of values
• The ability to specify the complete set of domains that apply to a given
database (the result of any operation on any column defined over any domain
must then yield a result in one of the specified domains).
• The ability to specify - for every domain, pair of domains, triplet of domains,
and so on - which operators can be applied to the values taken from the
domains, as well as what the domain of the result must be.
• The ability to specify an ordering of the values in the domain.
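The ideas of a value pool, same-domain comparison and value ordering can be illustrated with a small Python sketch. The `Domain` class, the `less_than` helper and the GRADE and HOURS domains are hypothetical; this is not the domain syntax of any particular DBMS.

```python
class Domain:
    """A named pool of ordered values; a sketch, not any DBMS's API."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)  # list order defines the value ordering

    def value(self, raw):
        """Return a (domain, value) pair, rejecting values outside the pool."""
        if raw not in self.values:
            raise ValueError(f"{raw!r} is not in domain {self.name}")
        return (self, raw)

def less_than(a, b):
    """Compare two domain values; only values of the same domain compare."""
    (dom_a, raw_a), (dom_b, raw_b) = a, b
    if dom_a is not dom_b:
        raise TypeError("values come from different domains")
    return dom_a.values.index(raw_a) < dom_a.values.index(raw_b)

grade = Domain("GRADE", ["C", "B", "A"])  # hypothetical domain, lowest first
hours = Domain("HOURS", range(0, 41))     # hypothetical domain
```

Calling `grade.value("Z")` raises an error because "Z" is outside the pool, and comparing a GRADE value with an HOURS value is refused because the two come from different domains.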
Summary
Exercises
Chapter 6
E-R Diagrams
1. Construct an E-R Diagram for a hospital with a set of patients and a set of medical
doctors. A log of the various conducted tests is associated with each patient. Construct
the normalized relations from this ER diagram.
2. Construct an E-R Diagram for a car insurance company with a set of customers, each of
whom owns a number of cars. Each car has a number of accidents associated with it.
Construct the normalized relations from this ER diagram.
3. Consider the following E-R Diagram: Represent the diagram in the relational model by
relations (tables).
4. Suppose we have a database consisting of the following 3 relations:
FREQUENTS ( DRINKER, BAR )
SERVES ( BAR, BEER )
LIKES ( DRINKER, BEER )
The first relation indicates the bars each drinker visits, the second tells what beers
each bar serves, and the last indicates which beers each drinker likes to drink.
Draw an E-R Diagram for the given relations.
5. An education database contains information about an in-house company education-
training scheme. For each training course, the database contains details of all
prerequisite courses and all offerings for that course; and for each offering it contains
details of all teachers and all student enrollments for that offering. The database also
contains information about employees. The relevant relations in outline are as follows:
COURSE ( COURSE#, TITLE )
PREREQ (SUP_COURSE#, SUB_COURSE# )
OFFERING ( COURSE#, OFF#, OFFDATE, LOCATION )
TEACHER ( COURSE#, OFF#, EMP# )
ENROLLMENT ( COURSE#, OFF#, EMP#, GRADE )
EMPLOYEE ( EMP#, ENAME, JOB )
The meaning of the PREREQ relation is that the superior course (SUP_COURSE#) has the
subordinate course (SUB_COURSE#) as an immediate prerequisite.
Draw an E-R Diagram for this education database.
Normalization
6. Consider the tables:
CUSTOMERS(Customer_Number, Customer_Name)
ORDERS(Order_Number, Customer_Number*, Order_Date)
ORDER_PARTS(Part_Number, Order_Number*, Part_Quantity)
PARTS(Part_Number,Part_Description,Part_Price,Supplier_Number,
Supplier_Name)
The structures are now in 2NF, since every non-primary key attribute depends on the
whole of the key. The next step is to convert the structure into 3NF by ensuring that each
non-primary key attribute depends on nothing but the key.
The CUSTOMERS table is patently in 3NF, because there is no non-primary key attribute
for Customer_Name to depend on. The ORDERS table is in 3NF, because there is no
dependency between Order_Date and Customer_Number (a customer can place different
orders on different dates). The ORDER_PARTS table is in 3NF, because the quantity
ordered is dependent on both the order number and the part number. Looking however at
the PARTS table it can be seen that the Supplier_Name attribute depends on the
Supplier_Number and has nothing to do with the part number. To convert the structure
into 3NF, a separate table is created containing supplier details.
CUSTOMERS(Customer_Number, Customer_Name)
ORDERS(Order_Number, Customer_Number*, Order_Date)
ORDER_PARTS(Part_Number, Order_Number*, Part_Quantity)
PARTS(Part_Number, Supplier_Number*, Part_Description, Part_Price)
SUPPLIERS(Supplier_Number, Supplier_Name)
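The split of PARTS into PARTS and SUPPLIERS can be checked on sample data. The rows in this Python sketch are hypothetical; only the column names come from the structures above.

```python
# Hypothetical PARTS rows in 2NF:
# (Part_Number, Part_Description, Part_Price, Supplier_Number, Supplier_Name)
parts_2nf = [
    ("P1", "Bolt", 0.25, "S1", "Acme"),
    ("P2", "Nut", 0.10, "S1", "Acme"),
    ("P3", "Washer", 0.05, "S2", "Globex"),
]

# 3NF: PARTS keeps Supplier_Number as a foreign key; names move to SUPPLIERS.
parts = [(p, s, d, price) for p, d, price, s, _ in parts_2nf]
suppliers = sorted({(s, name) for _, _, _, s, name in parts_2nf})

# Joining on Supplier_Number rebuilds the 2NF rows, so nothing is lost.
supplier_name = dict(suppliers)
rebuilt = [(p, d, price, s, supplier_name[s]) for p, s, d, price in parts]
```

The supplier name "Acme" appears twice in the 2NF rows but only once in `suppliers`, so a change of name is made in a single place.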
Example:
Manage:
Department and Employee
Partial Participation
Relation Attribute : StartDate.
Works For:
Department and Employee
Total Participation
Control :
Department , Project
Partial Participation from Department
Total Participation from Project
Control Department is a RKA.
Supervisor :
Employee, Employee
Partial and Recursive
Works–On:
Project , Employee
Total Participation
Hours Worked is a RKA.
Dependants of:
Employee , Dependant
Dependant is a weak entity.
Dependant is Total , Employee is Partial.
Summary of ER
Several kinds of integrity constraints can be expressed in the ER model: Key constraints,
Participation constraints, and Overlap/Covering constraints for ISA hierarchies. Some
Foreign key constraints are also implicit in the definition of a relationship set.
• Some of these constraints can be expressed in SQL only if we use general
CHECK constraints or assertions.
Case Studies
1. Prescriptions-R-X chain
The Prescriptions-R-X chain of pharmacies has offered to give you a free lifetime supply of
medicines if you design its database. Given the rising cost of health care, you agree.
Here's the information that you gather:
Patients are identified by an SSN, and their names, addresses, and ages must be
recorded.
Doctors are identified by an SSN. For each doctor, the name, specialty, and years of
experience must be recorded.
Each pharmaceutical company is identified by name and has a phone number.
For each drug, the trade name and formula must be recorded. Each drug is sold by a given
pharmaceutical company, and the trade name identifies a drug uniquely from among the
products of that company. If a pharmaceutical company is deleted, you no longer need to
keep track of its products.
Each pharmacy has a name, address, and phone number.
Every patient has a primary physician. Every doctor has at least one patient.
Each pharmacy sells several drugs and has a price for each. A drug could be sold at
several pharmacies, and the price could vary from one pharmacy to another.
Doctors prescribe drugs for patients. A doctor could prescribe one or more drugs for
several patients, and a patient could obtain prescriptions from several doctors. Each
prescription has a date and a quantity associated with it. You can assume that if a doctor
prescribes the same drug for the same patient more than once, only the last such
prescription needs to be stored.
Pharmaceutical companies have long-term contracts with pharmacies. A pharmaceutical
company can contract with several pharmacies, and a pharmacy can contract with several
pharmaceutical companies. For each contract, you have to store a start date, an end date,
and the text of the contract.
Pharmacies appoint a supervisor for each contract. There must always be a supervisor for
each contract, but the contract supervisor can change over the lifetime of the contract.
1. Draw an ER diagram that captures the above information. Identify any constraints that
are not captured by the ER diagram.
2. How would your design change if each drug must be sold at a fixed price by all
pharmacies?
3. How would your design change if the design requirements change as follows: If a
doctor prescribes the same drug for the same patient more than once, several such
prescriptions may have to be stored.
The Computer Sciences Department has been frequently complaining to Dane County Airport
officials about the poor organization at the airport. As a result, the officials have decided
that all information related to the airport should be organized using a DBMS, and you've
been hired to design the database. Your first task is to organize the information about all
the airplanes that are stationed and maintained at the airport.
The relevant information is as follows:
• Every airplane has a registration number, and each airplane is of a specific model.
• The airport accommodates a number of airplane models, and each model is
identified by a model number (e.g., DC-10) and has a capacity and a weight.
• A number of technicians work at the airport. You need to store the name, SSN,
address, phone number, and salary of each technician.
• Each technician is an expert on one or more plane model(s), and his or her
expertise may overlap with that of other technicians. This information about
technicians must also be recorded.
• Traffic controllers must have an annual medical examination. For each Traffic
controller, you must store the date of the most recent exam.
• All airport employees (including technicians) belong to a union. You must store the
union membership number of each employee. You can assume that each employee
is uniquely identified by the social security number.
• The airport has a number of tests that are used periodically to ensure that airplanes
are still airworthy. Each test has a Federal Aviation Administration (FAA) test
number, a name, and a maximum possible score.
• The FAA requires the airport to keep track of each time that a given airplane is
tested by a given technician using a given test. For each testing event, the
information needed is the date, the number of hours the technician spent doing the
test, and the score that the airplane received on the test.
1. Draw an ER diagram for the airport database. Be sure to indicate the various attributes
of each entity and relationship set; also specify the key and participation constraints
for each relationship set. Specify any necessary overlap and covering constraints as
well (in English).
2. The FAA passes a regulation that tests on a plane must be conducted by a technician
who is an expert on that model. How would you express this constraint in the ER
diagram? If you cannot express it, explain briefly.
3. University Database