Introduction To RDBMS Day - 1

Introduction to RDBMS
Day - 1
What is Data?
Data (plural of the word Datum) - Data is a factual
information used as a basis for reasoning, discussion, or
calculation
Data may be numerical data which may be integers or
floating point numbers, and non-numerical data such as
characters, date and etc. Data by itself normally doesnt
have a meaning associated with it.
e.g:Krishnan
01-Jan-71
15-Jun-05
50000
Information
Related data is often called as information. Information will
always have a meaning and context attached to the data
element. Let us consider the same example that we gave
for data. When we add meaning and context to the data it
becomes information:
Employee Name: Krishnan

Date of Birth: 01-Jan-71
Data of Joining: 15-Jun-05
Salary: 50000
Department Number: 10
Database
A logically coherent collection of related data
(information) with inherent meaning, built for a certain
application, and representing a subset of the "real-world".
For e.g. a customer database in your bank, details of the
insurance policies that we hold etc.
Evolution of Databases
File Systems (1950s)

Basic Constructs
Sequential records
A record contains sequential fields
Relies on indexes for random access
ISAM (Index Sequential Access Method)
VSAM (Virtual Storage Access Method)
Basic operation
Open, close, or reset a file
Read, write or delete a record
Disadvantages of File processing

system
Data redundancy and inconsistency
Difficulty in accessing data
Data isolation
Integrity problems
Atomicity problem
Concurrent access anomalies
Security problems
Hierarchical Model (1960s)

Data are represented by collection of records and the
relationship among data are represented by links, which
can be viewed as pointers.
A record is a collection of fields (attributes), each of which
contains only one data value.
A link is an association between precisely two records.
Records are organized as collection of rooted trees.
Hierarchical Model
Again consider the database representing a customeraccount relationship in a banking system. There are two
record types, customer and account.
type customer = record
customer name: string;
customer street: string;
customer city: string;
end
The account record type can be defined as

type account = record
account number: string;
balance: integer;
end
Hierarchical Model
Network Model (1960s)

Data are represented by collection of records and the
relationship among data are represented by links, which
can be viewed as pointers.
Record is a collection of fields (attributes), each of which
contains only one data
Link is an association between precisely two records.
Records are organized as arbitrary graphs.
Network Model
consider a database representing a customer-account
relationship in a banking system. There are two record
types, customer and account.
type customer = record
customer name: string;
customer street: string;
customer city: string;
end
The account record type can be defined as

type account = record
account number: string;
balance: integer;
end
Network Model
Relational Model (1980s)

Data is presented as a collection of relations
Each relation is depicted as a table
Columns are attributes
Rows ("Tuples") represent entities
Every table has one or more set of attributes that taken
together as a "key" uniquely identifies each entity
Customer
Account
emp_no
name
street
city
emp_no
ac_no
balance
E1
Hayes
Main
harrison
E1
A-102
400
E2
Jhonson
Alma
Palo alto
E2
A-101
500
E3
Turner
Putnam
stamford
E2
A-201
900
E3
A-304
300
Relational Model
Relationship is established with the help of keys
emp_no
name
street
city
E1
Hayes
Main
harrison
E2
Jhonson
Alma
Palo alto
E3
Turner
Putnam
stamford
emp_no
ac_no
balance
E1
A-102
400
E2
A-101
500
E2
A-201
900
E3
A-304
300
Database Management System

(DBMS)
A database management system (DBMS) is software that
allows databases to be defined, constructed, and
manipulated.
A collection of programs that enables you to store, modify,
and extract information from a database.
The general purpose of a DBMS is to provide for the
definition, storage, and management of data in a
centralized area that can be shared by many users.
Purpose of a DBMS
DBMS developed to handle the following difficulties:
Data redundancy and inconsistency

Difficulty in accessing data
Data isolation - multiple files and formats
Integrity problems
Atomicity of updates
Concurrent access by multiple users
Security problems
Functionality of a DBMS
Specifying the database structure
data definition language
Manipulation of the database

query processing and query optimization
Integrity enforcement
integrity constraints
Concurrent control
multiple user environment
Crash recovery
Security and authorization
DBMS Approach
Data independence
capability of changing a database scheme without having
to change the scheme at the next higher level
two level of data independence
Data Abstraction
Physical Level describes how data are actually stored
Logical Level describes what data are stored
View Level
describes only a part of the database
useful for a particular user
View 1
View 2
View n
Logical level
Physical level
Database Languages
data definition language (DDL) :
specify the conceptual database scheme
data manipulation language (DML): query language
used to retrieve and update information in a database
host language:
a conventional high level language used to write
application programs
DBMS Structure
Relations
Relations
Relations
Codds Rule for RDBMS

Rule 1: The Information Rule
All data should be presented to the user in table form
Rule 2: Guaranteed Access Rule

All data should be accessible without ambiguity.
This can be accomplished through a combination
of the table name, primary key, and column name.
Rule 3: Systematic Treatment of Null Values

A field should be allowed to remain empty. This
involves the support of a null value, which is distinct
from an empty string or a number with a value of
zero. Of course, this can't apply to primary keys. In
addition, most database implementations support the
concept of a not null field constraint that prevents null
values in a specific table column.
Rule 4: Dynamic On-Line Catalog Based on the Relational Model
A relational database must provide access to its structure
through the same tools that are used to access the data.
This is usually accomplished by storing the structure
definition within special system tables.
Rule 5: Comprehensive Data Sub-language Rule

The database must support at least one clearly defined
language that includes functionality for data definition, data
manipulation, data integrity, and database transaction
control. All commercial relational databases use forms of
the standard SQL (Structured Query Language) as their
supported comprehensive language.
Rule 6: View Updating Rule
Data can be presented to the user in different logical
combinations, called views. Each view should support the
same full range of data manipulation that direct-access to a
table has available. In practice, providing update and delete
access to logical views is difficult and is not fully supported
by any current database.
Rule 7: High-level Insert, Update, and Delete

Data can be retrieved from a relational database in sets
constructed of data from multiple rows and/or multiple
tables. This rule states that insert, update, and delete
operations should be supported for any retrievable set
rather than just for a single row in a single table.
Rule 8: Physical Data Independence
The user is isolated from the physical method of storing and
retrieving information from the database. Changes can be
made to the underlying architecture ( hardware, disk
storage methods ) without affecting how the user accesses
it.
Rule 9: Logical Data Independence

How a user views data should not change when the logical
structure (tables structure) of the database changes. This
rule is particularly difficult to satisfy. Most databases rely on
strong ties between the user view of the data and the actual
structure of the underlying tables.
Rule 10: Integrity Independence
The database language (like SQL) should support
constraints on user input that maintain database integrity.
This rule is not fully implemented by most major vendors.
At a minimum, all databases do preserve two constraints
through SQL.
-- No component of a primary key can have a null value
-- If a foreign key is defined in one table, any value in it must exist as
a primary key in another table
Rule 11: Distribution Independence

A user should be totally unaware of whether or not the
database is distributed (whether parts of the database exist
in multiple locations). A variety of reasons make this rule
difficult to implement; I will spend time addressing these
reasons when we discuss distributed databases.
Rule 12: Non-subversion Rule
There should be no way to modify the database structure
other than through the multiple row database language (like
SQL). Most databases today support administrative tools
that allow some direct manipulation of the data Structure.
Functional dependency
Given a relation R, a set of attributes X in R is said to

functionally determine another attribute Y, also in R, (written
X Y) if and only if each X value is associated with at most
one Y value. Customarily we call X the determinant set and Y
the dependent attribute.
Properties of functional dependencies
Given that X, Y, and Z are sets of attributes in a relation R,

one can derive several properties of functional dependencies.
Among the most important are Armstrong's axioms
Reflexivity: If Y is a subset of X, then X Y
Augmentation: If X Y, then XZ YZ
Transitivity: If X Y and Y Z, then X Z
Closure of a set of Functional

Dependencies
Given a set F of functional dependencies, there are certain
other functional dependencies that are logically implied by
F.
The set of all functional dependencies logically implied by
F is the closure of F denoted by F+.
F+ can be found by applying Armstrongs Axioms:

Dependencies
We can further simplify the computation of F+ by using

additional
rules:

Dependencies(example)
R=(A, B, C, G, H, I)
F= { A ---- > B
A ---- > C
CG ---- > H
CG ---- > I
B ---- > H }
Some members of F+:
A ---- > H (transitivity A ---- > B, B ---- > H)
AG ---- > I (pseudotransitivity A ---- > C, CG ---- > I)
CG ---- > HI (union CG ---- > H, CG ---- > I)
NORMALIZATION
Normalization
A company obtains parts from a number of suppliers. Each supplier is
located in one city. A city can have more than one supplier located there
and each city has a status code associated with it. Each supplier may
provide many parts. The company creates a simple relational table to
store this information that can be expressed in relational notation as:
FIRST (s#, status, city, p#, qty)
S#
Status
City
P#
qty
S1
20
London
p1,p2,p3
100,200,50
S2
10
Paris
p1,p2
200,100
S3
10
Paris
p1
200
S4
20
London
p4,p5
100,50
First Normal Form (1 NF)

A relational table, by definition, is in first normal form if all values of
the columns are atomic. That is, they contain no repeating values
1 NF Anomalies
contains redundant data. For example, information about the
supplier's location and the location's status have to be
repeated for every part supplied.
Redundancy causes what are called update anomalies.
INSERT. The fact that a certain supplier (s5) is located in a

particular city (Athens) cannot be added until they supplied a part.
DELETE. If a row is deleted, then not only is the information
about quantity and part lost but also information about the
supplier.
UPDATE. If supplier s1 moved from London to New York, then six
rows would have to be updated with this new information.
Second Normal Form (2 NF)
A relational table is in second normal form 2NF if it is in 1NF and

every non-key column is fully dependent upon the primary key.
2 NF
functional dependencies in the First table:

s# > city, status
city > status
(s#,p#) >qty
2 NF
The process for transforming a 1NF table to 2NF is:
1. Identify any determinants other than the composite key, and the
columns they determine.
2. Create and name a new table for each determinant and the
unique columns it determines.
3. Move the determined columns from the original table to the new
table. The determinate becomes the primary key of the new
table.
4. Delete the columns you just moved from the original table except
for the determinate which will serve as a foreign key.
5. The original table may be renamed to maintain semantic
meaning.
2 NF
To transform FIRST into 2NF we move the columns s#,
status, and city to a new table called SECOND. The column
s# becomes the primary key of this new table
2 NF Anomalies
Tables in 2NF but not in 3NF still contain modification
anomalies. In the example of SECOND, they are:
INSERT. The fact that a particular city has a certain status

(Rome has a status of 50) cannot be inserted until there is
a supplier in the city.
DELETE. Deleting any row in SUPPLIER destroys the

status information about the city as well as the association
between supplier and city.
Third Normal Form (3 NF)
A relational table is in third normal form (3NF) if it is already in 2NF

and every non-key column is non transitively dependent upon its
primary key. In other words, all nonkey attributes are functionally
dependent only upon the primary key.
3 NF
Table PARTS is already in 3NF. The non-key column, qty,

is fully dependent upon the primary key (s#, p#).
SUPPLIER is in 2NF but not in 3NF because it contains

the following transitive dependency.
s# > status
s# > city
city > status
3 NF
The process of transforming a table into 3NF is:
1. Identify any determinants, other the primary key, and the columns
they determine.
2. Create and name a new table for each determinant and the
unique columns it determines.
3. Move the determined columns from the original table to the new
table. The determinate becomes the primary key of the new table.
4. Delete the columns you just moved from the original table except
for the determinate which will serve as a foreign key.
5. The original table may be renamed to maintain semantic
meaning.
3 NF
create a new table called CITY_STATUS and move the columns city
and status into it. Status is deleted from the original table, city is left
behind to serve as a foreign key to CITY_STATUS, and the original
table is renamed to SUPPLIER_CITY to reflect its semantic meaning
3 NF
The results of putting the original table into 3NF has
created three tables. These can be represented in
"psuedo-SQL" as:
PARTS (#s, p#, qty)
Primary Key (s#,#p)
Foreign Key (s#) references SUPPLIER_CITY.s#
SUPPLIER_CITY(s#, city)
Primary Key (s#)
Foreign Key (city) references CITY_STATUS.city
CITY_STATUS (city, status)
Primary Key (city)
Advantages of Third Normal Form

The advantage of having relational tables in 3NF is that it
eliminates redundant data which in turn saves space and
reduces manipulation anomalies. For example, the
improvements to our sample database are:
INSERT. Facts about the status of a city, Rome has a status of 50,
can be added even though there is not supplier in that city.
Likewise, facts about new suppliers can be added even though
they have not yet supplied parts.
DELETE. Information about parts supplied can be deleted without

destroying information about a supplier or a city. UPDATE.
Changing the location of a supplier or the status of a city requires
modifying only one row.
Boyce/Codd Normal Form
Determinant: an attribute or a group of attributes on which

some other attribute is fully functionally dependent.
Boyce/Codd Normal Form: a relation is in BCNF if and

only if every determinant is a candidate key.
www.igate.com

Introduction To RDBMS Day - 1

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Introduction To RDBMS Day - 1

Uploaded by

Copyright:

Available Formats

Introduction to RDBMS

Employee Name: Krishnan

File Systems (1950s)

Disadvantages of File processing

Hierarchical Model (1960s)

The account record type can be defined as

Network Model (1960s)

The account record type can be defined as

Relational Model (1980s)

Database Management System

Data redundancy and inconsistency

Manipulation of the database

Codds Rule for RDBMS

Rule 2: Guaranteed Access Rule

Rule 3: Systematic Treatment of Null Values

Rule 5: Comprehensive Data Sub-language Rule

Rule 7: High-level Insert, Update, and Delete

Rule 9: Logical Data Independence

Rule 11: Distribution Independence

Given a relation R, a set of attributes X in R is said to

Properties of functional dependencies

Given that X, Y, and Z are sets of attributes in a relation R,

Closure of a set of Functional

Closure of a set of Functional

We can further simplify the computation of F+ by using

Closure of a set of Functional

First Normal Form (1 NF)

INSERT. The fact that a certain supplier (s5) is located in a

Second Normal Form (2 NF)

A relational table is in second normal form 2NF if it is in 1NF and

functional dependencies in the First table:

INSERT. The fact that a particular city has a certain status

DELETE. Deleting any row in SUPPLIER destroys the

Third Normal Form (3 NF)

A relational table is in third normal form (3NF) if it is already in 2NF

Table PARTS is already in 3NF. The non-key column, qty,

SUPPLIER is in 2NF but not in 3NF because it contains

Advantages of Third Normal Form

DELETE. Information about parts supplied can be deleted without

Boyce/Codd Normal Form

Determinant: an attribute or a group of attributes on which

Boyce/Codd Normal Form: a relation is in BCNF if and

You might also like