
RDBMS CONCEPTS

• What is DBMS?
• The DataBase Models
• What is RDBMS?

What is DBMS?
• A brief definition of a database might be:
  - A STORE OF INFORMATION,
  - HELD OVER A PERIOD OF TIME,
  - IN COMPUTER-READABLE FORM.
• (A store of information) Typical examples:
  - Information collected for the sake of making a statistical
    analysis, e.g. the national census, or a survey of cracks in a
    stretch of motorway.
  - Operational and administrative information required for
    running an organisation. In a commercial concern this will take
    the form of stock records, personnel records and customer
    records, among others.

• (Held over a period of time) Because of the investment involved in
setting up a database, the expectation must be that it will
continue to be useful, over years rather than months. But the
relationship with time varies from one type of information to
another.
  - Census information is collected on a particular date and stored
    as a snapshot of the state of affairs when the survey was
    taken. Information from later observations will be kept quite
    separately, but appropriate comparisons may be made
    provided that the framework remains consistent.
  - Bibliographic or other textual databases are accumulated over
    time: new material is added periodically but probably very
    little will be removed. When designing such a database it will
    be important to estimate and allow for the expected rate of
    growth, and perhaps to ensure that the more recent
    information is given some priority.

• (In computer-readable form)
• Information (often referred to in this context as data) has been
processed by computer for over 30 years, using a variety of
storage media. Some form of magnetic disc is likely to be used,
since discs currently provide the most cost-effective way of
holding large quantities of data while allowing fast access to any
individual item.
• Database handling techniques grew out of earlier and simpler file
processing techniques.
• A file consists of an ordered collection of records; a
database consists of two or more related files which we
may wish to process together in various different ways.

• Computer storage and processing implies the use of software: in
the current context a DATABASE MANAGEMENT SYSTEM (DBMS).
  - The function of the DBMS is to store and retrieve information
    as required by applications programs or users sitting at
    terminals, using the facilities provided by the computer
    operating system.
  - It is one of a number of software layers making computer
    facilities available to users with perhaps comparatively little
    technical expertise.
• Data definition.
This includes describing:
  - FILES
  - RECORD STRUCTURES
  - FIELD NAMES, TYPES and SIZES
  - RELATIONSHIPS between records of different types
  - Extra information to make searching efficient, e.g. INDEXES.
• Data entry and validation.

• Validation may include:

• TYPE CHECKING
• RANGE CHECKING
• CONSISTENCY CHECKING

• In an interactive data entry system, errors should be detected
immediately (some can be prevented altogether by keyboard
monitoring), and recovery and re-entry permitted.

• Updating involves:

• Record INSERTION
• Record MODIFICATION
• Record DELETION.
• At the same time any background data such as indexes or
pointers from one record to another must be changed to maintain
consistency. Updating may take place interactively, or by
submission of a file of transaction records; handling these may
require a program of some kind to be written, either in a
conventional programming language (a host language, e.g.
COBOL or C) or in a language supplied by the DBMS for
constructing command files.

• Data retrieval on the basis of selection criteria.
• For this purpose most systems provide a QUERY LANGUAGE with
which the characteristics of the required records may be specified.
Query languages differ enormously in power and sophistication
but a standard which is becoming increasingly common is based
on the so-called RELATIONAL operations. These allow:

• selection of records on the basis of particular field values.
• selection of particular fields from records to be displayed.
• linking together records from two different files on the basis of
matching field values.
• Arbitrary combinations of these operators on the files making up a
database can answer a very large number of queries without
requiring users to resort to record-at-a-time processing.
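
The three relational operations listed above can be sketched with SQLite from Python. This is only an illustration: the customer and orders tables, their columns and their data are invented here and are not part of the original material.

```python
import sqlite3

# Hypothetical two-table database; all names and data are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders   (order_id INTEGER PRIMARY KEY, cust_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Smith', 'Leeds'), (2, 'Grant', 'York');
    INSERT INTO orders   VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Selection: choose records on the basis of a field value.
selected = conn.execute(
    "SELECT * FROM customer WHERE city = 'Leeds'").fetchall()

# Projection: choose particular fields from the records.
projected = conn.execute(
    "SELECT name FROM customer ORDER BY name").fetchall()

# Join: link records from two tables on matching field values.
joined = conn.execute("""
    SELECT c.name, o.order_id
    FROM customer c JOIN orders o ON c.cust_id = o.cust_id
    ORDER BY o.order_id
""").fetchall()

print(selected)
print(projected)
print(joined)
```

Each query describes the whole set of records wanted, so no record-at-a-time loop is written by the user.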

• Report definition.
• Most systems provide facilities for describing how summary
reports from the database are to be created and laid out on paper.
These may include obtaining:

• COUNTS
• TOTALS
• AVERAGES
• MAXIMUM and MINIMUM values
• These values are computed over particular CONTROL FIELDS. A report
definition also specifies PAGE and LINE LAYOUT, HEADINGS,
PAGE-NUMBERING, and other narrative to make the report comprehensible.
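
As a rough sketch of such summary values, the query below computes counts, totals, averages, maximum and minimum per control field using SQLite from Python. The sale table and its region control field are assumptions made for this example.

```python
import sqlite3

# Hypothetical sales data; 'region' plays the role of the control field.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale (region TEXT, amount REAL);
    INSERT INTO sale VALUES ('North', 10.0), ('North', 30.0), ('South', 5.0);
""")

# One summary line per control-field value.
report = conn.execute("""
    SELECT region, COUNT(*), SUM(amount), AVG(amount), MAX(amount), MIN(amount)
    FROM sale
    GROUP BY region
    ORDER BY region
""").fetchall()

for row in report:
    print(row)
```

Page layout, headings and page numbering would be handled by a report writer on top of a query like this one.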

• Security.
• This has several aspects:

• Ensuring that only those authorised to do so can see and modify
the data, generally by some extension of the password principle.
• Ensuring the consistency of the database where many users are
accessing and updating it simultaneously.
• Ensuring the existence and INTEGRITY of the database after
hardware or software failure. At the very least this involves
making provision for back-up and re-loading.

• A database is a collection of information that is organised
so that it can easily be accessed, managed, and updated.
• A database engine may comply with a combination of any of the
following:
  - The database is a collection of tables, files or datasets.
  - Each table is a collection of fields, columns or data items.
  - One or more columns in each table may be selected as the
    primary key.
  - There may be additional unique keys or non-unique indexes to
    assist in data retrieval.
  - Columns may be fixed length or variable length.
  - Records may be fixed length or variable length.
  - Table and column names may be restricted in length (8, 16 or
    32 characters).
  - Table and column names may be case-sensitive.

• Why have a database (and a DBMS)?
  - An organisation uses a computer to store and process
    information because it hopes for speed, accuracy, efficiency,
    economy etc. beyond what could be achieved using clerical
    methods.
  - The computer's processing speed gave a potential for
    RELATING data from different sources to produce valuable
    management information, provided that some standardisation
    could be imposed over departmental boundaries.
  - The idea emerged of the integrated database as a central
    resource. Data is captured as close as possible to its point of
    origin and transmitted to the database, then extracted by
    anyone within the organisation who requires it.

  - The idea is that any piece of information is entered and
    stored just once, eliminating duplications of effort and
    the possibility of inconsistency between different
    departmental records.
  - Organisational requirements change over time, and
    applications programs laboriously developed need to be
    periodically adjusted.
  - A DBMS gives some protection against change by taking
    care of basic storage and retrieval functions in a
    standard way, leaving the applications developer to
    concentrate on specific organisational requirements.
    Changes in one of these areas need not have
    repercussions elsewhere.

What is RDBMS?
• RDBMS stands for Relational Database Management System.
• RDBMS data is structured in database tables, fields and records.
• Each RDBMS table consists of database table rows.
• Each database table row consists of one or more database table
fields.
• An RDBMS stores data in a collection of tables, which may be
related by common fields (database table columns).
• An RDBMS also provides relational operators to manipulate the data
stored in the database tables.
• Most RDBMSs use SQL as the database query language.
• Edgar Codd introduced the relational database model. Many
modern DBMSs do not conform to Codd's definition of an
RDBMS, but they are nonetheless still considered to be RDBMSs.

• Popular RDBMSs include MS SQL Server, DB2, Oracle and
MySQL.

• For example, a table called Users might store information about
many persons, and each entry in this table will represent one
unique user. Even though all user entries in the Users table are
unique, they are related in the sense that they describe similar
objects.

• Table Users

FirstName   LastName    DateOfBirth
John        Smith       12/12/1969
David       Stonewall   01/03/1954
Susan       Grant       03/03/1970
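
A minimal sketch of the same Users table in SQLite, driven from Python. The column types are an assumption (the slide gives only the column names), and the dates are stored as plain text exactly as shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column types are assumed; the slide lists only the column names.
conn.execute(
    "CREATE TABLE Users (FirstName TEXT, LastName TEXT, DateOfBirth TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        ("John", "Smith", "12/12/1969"),
        ("David", "Stonewall", "01/03/1954"),
        ("Susan", "Grant", "03/03/1970"),
    ],
)

# Every row describes one user; a query picks out the rows of interest.
row_count = conn.execute("SELECT COUNT(*) FROM Users").fetchone()[0]
grant = conn.execute(
    "SELECT FirstName FROM Users WHERE LastName = ?", ("Grant",)).fetchall()

print(row_count, grant)
```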

Data Models
• In order to provide a general and powerful set of facilities for its
users any DBMS imposes restraints on the way information can be
described and accessed, and demands familiarity with the DATA
MODEL which it supports and the command language which it
provides to define and manipulate data.
• Data models still in use are HIERARCHICAL (tree-structured),
NETWORK and RELATIONAL (tabular).
• Over the years there have been several different ways of
constructing databases, amongst which have been the following:
• The Hierarchical Data Model
• The Network Data Model
• The Relational Data Model

Hierarchical Data Model
• The Hierarchical Data Model structures data in a tree of records,
with each record having one parent record and many children. It
can be represented as follows:
A hierarchical database consists of the following:
• It contains nodes connected by branches.
• The top node is called the root.
• If multiple nodes appear at the top level, the nodes are called
root segments.
• The parent of node nx is a node directly above nx and connected
to nx by a branch.
• Each node (with the exception of the root) has exactly one parent.
• The child of node nx is the node directly below nx and connected
to nx by a branch.
• One parent may have many children.

The Network Data Model

• The Network Data Model uses a lattice structure in which a record
can have many parents as well as many children. It can be
represented as follows:
• Like the Hierarchical Data Model, the Network Data Model also
consists of nodes and branches, but a child may have multiple
parents within the network structure instead of being restricted
to just one.

• Both hierarchical and network databases suffered from the
following deficiencies (when compared with relational databases):
• Access to the database was not via SQL query strings, but by a
specific set of APIs.
• It was not possible to provide a variable WHERE clause. The only
selection mechanism was to read entries from a child table for a
specific entry on a related parent table with any filtering being
done within the application code.
• It was not possible to provide an ORDER BY clause. Data was
presented in the order in which it existed in the database. This
mechanism could be tuned by specifying sort criteria to be used
when each record was inserted, but this had several
disadvantages:
  - Only a single sort sequence could be defined for each path
    (link to a parent), so all records retrieved on that path would
    be provided in that sequence.
  - It could make inserts rather slow when attempting to insert
    into the middle of a large collection, or where a table had
    multiple paths each with its own set of sort criteria.

The Relational Data Model

• The Relational Data Model has the relation at its heart, together
with a whole series of rules governing keys, relationships, joins,
functional dependencies, transitive dependencies, multi-valued
dependencies, and modification anomalies.
• The Relation
• The Relation is the basic element in a relational data model.

Figure 3 - Relations in the Relational Data Model

• A relation is subject to the following rules:
• Relation (file, table) is a two-dimensional table.
• Attribute (i.e. field or data item) is a column in the table.
• Each column in the table has a unique name within that table.
• Each column is homogeneous. Thus the entries in any column are
all of the same type (e.g. age, name, employee-number, etc).
• Each column has a domain, the set of possible values that can
appear in that column.
• A Tuple (i.e. record) is a row in the table.
• The order of the rows and columns is not important.
• Values of a row all relate to some thing or portion of a thing.
• Repeating groups (collections of logically related attributes that
occur multiple times within one record occurrence) are not
allowed.

• Duplicate rows are not allowed (candidate keys are designed to
prevent this).
• Cells must be single-valued (but can be variable length). Single
valued means the following:
  - Cannot contain multiple values such as 'A1,B2,C3'.
  - Cannot contain combined values such as 'ABC-XYZ' where
    'ABC' means one thing and 'XYZ' another.
• A relation may be expressed using the notation R(A,B,C, ...)
where:
• R = the name of the relation.
• (A,B,C, ...) = the attributes within the relation.
• A = the attribute(s) which form the primary key (conventionally
underlined in this notation).

• There are three levels of database design:
• Conceptual: producing a data model which accounts for the relevant
entities and relationships within the target application domain;
• Logical: ensuring, via normalisation procedures and the definition of
integrity rules, that the stored database will be non-redundant and
properly connected;
• Physical: specifying how database records are stored, accessed and related
to ensure adequate performance.
• It is considered desirable to keep these three levels quite separate: one
of Codd's requirements for an RDBMS is that it should maintain
logical-physical data independence. The generality of the relational model means
that RDBMSs are potentially less efficient than those based on one of the
older data models where access paths were specified once and for all at
the design stage. However the relational data model does not preclude the
use of traditional techniques for accessing data - it is still essential to
exploit them to achieve adequate performance with a database of any size.

Keys

• A simple key contains a single attribute.
• A composite key is a key that contains more than one attribute.
• A candidate key is an attribute (or set of attributes) that
uniquely identifies a row. A candidate key must possess the
following properties:
  - Unique identification: for every row the value of the key must
    uniquely identify that row.
  - Non-redundancy: no attribute in the key can be discarded
    without destroying the property of unique identification.
• A primary key is the candidate key which is selected as the
principal unique identifier. Every relation must contain a primary
key. The primary key is usually the key selected to identify a row
when the database is physically implemented. For example, a part
number is selected instead of a part description.

• A superkey is any set of attributes that uniquely identifies a row.
A superkey differs from a candidate key in that it does not require
the non redundancy property.
• A foreign key is an attribute (or set of attributes) that appears
(usually) as a non key attribute in one relation and as a primary
key attribute in another relation. I say usually because it is
possible for a foreign key to also be the whole or part of a primary
key:
  - A many-to-many relationship can only be implemented by
    introducing an intersection or link table which then becomes
    the child in two one-to-many relationships. The intersection
    table therefore has a foreign key for each of its parents, and
    its primary key is a composite of both foreign keys.
  - A one-to-one relationship requires that the child table has no
    more than one occurrence for each parent, which can only be
    enforced by letting the foreign key also serve as the primary
    key.
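
The intersection-table idea can be sketched as follows, using SQLite from Python. The student, course and enrolment tables are hypothetical names chosen for this example; the point is that enrolment carries one foreign key per parent and uses the pair as its composite primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);

    -- Intersection (link) table: child in two one-to-many relationships.
    CREATE TABLE enrolment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO course  VALUES (100, 'SQL'), (200, 'Design');
    INSERT INTO enrolment VALUES (1, 100), (1, 200), (2, 100);
""")

# The composite primary key rejects a second copy of the same pairing.
try:
    conn.execute("INSERT INTO enrolment VALUES (1, 100)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Many-to-many lookup: which students take course 100?
sql_students = conn.execute("""
    SELECT s.name FROM student s
    JOIN enrolment e ON e.student_id = s.student_id
    WHERE e.course_id = 100
    ORDER BY s.name
""").fetchall()

print(duplicate_rejected, sql_students)
```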

• A semantic or natural key is a key for which the possible values
have an obvious meaning to the user or the data. For example, a
semantic primary key for a COUNTRY entity might contain the
value 'USA' for the occurrence describing the United States of
America. The value 'USA' has meaning to the user.
• A technical or surrogate or artificial key is a key for which the
possible values have no obvious meaning to the user or the data.
These are used instead of semantic keys for any of the following
reasons:
  - When the value in a semantic key is likely to be changed by
    the user, or can have duplicates. For example, on a PERSON
    table it is unwise to use PERSON_NAME as the key as it is
    possible to have more than one person with the same name,
    or the name may change such as through marriage.
  - When none of the existing attributes can be used to guarantee
    uniqueness. In this case adding an attribute whose value is
    generated by the system, e.g. from a sequence of numbers, is
    the only way to provide a unique value. Typical examples
    would be ORDER_ID and INVOICE_ID. The value '12345' has
    no meaning to the user as it conveys nothing about the entity
    to which it relates.

• A key functionally determines the other attributes in the row, thus
it is always a determinant.
• Note that in most DBMS engines a 'key' is implemented as
an index which does not allow duplicate entries.

Relationships

• One table (relation) may be linked with another in what is known
as a relationship. Relationships may be built into the database
structure to facilitate the operation of relational joins at runtime.
• A relationship exists between two tables in what is known as a
one-to-many or parent-child or master-detail relationship, where
an occurrence on the 'one' or 'parent' or 'master' table may have
any number of associated occurrences on the 'many' or 'child' or
'detail' table. To achieve this the child table must contain fields
which link back to the primary key on the parent table. These
fields on the child table are known as a foreign key, and the
parent table is referred to as the foreign table (from the
viewpoint of the child).
• It is possible for a record on the parent table to exist without
corresponding records on the child table, but it should not be
possible for an entry on the child table to exist without a
corresponding entry on the parent table.

• A child record without a corresponding parent record is known as
an orphan.
• It is possible for a table to be related to itself. For this to be
possible it needs a foreign key which points back to the primary
key. Note that these two keys cannot be comprised of exactly the
same fields otherwise the record could only ever point to itself.
• A table may be the subject of any number of relationships, and it
may be the parent in some and the child in others.
• Some database engines allow a parent table to be linked via a
candidate key, but if this were changed it could result in the link
to the child table being broken.
• Some database engines allow relationships to be managed by
rules known as referential integrity or foreign key constraints.
These will prevent entries on child tables from being created if
the foreign key does not exist on the parent table, or will deal
with entries on child tables when the entry on the parent table is
updated or deleted.
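
As an illustration of such rules, the sketch below uses SQLite from Python with hypothetical parent and child tables. It assumes an ON DELETE CASCADE rule; other engines and other rules (such as restricting the delete) behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE parent (parent_id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        child_id  INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL
                  REFERENCES parent(parent_id) ON DELETE CASCADE
    );
    INSERT INTO parent VALUES (1);
    INSERT INTO child  VALUES (10, 1);
""")

# Creating an orphan (no parent 99 exists) is refused.
try:
    conn.execute("INSERT INTO child VALUES (11, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the parent removes its children under the CASCADE rule.
conn.execute("DELETE FROM parent WHERE parent_id = 1")
remaining_children = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]

print(orphan_rejected, remaining_children)
```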

Relational Joins

• The join operator is used to combine data from two or more
relations (tables) in order to satisfy a particular query. Two
relations may be joined when they share at least one common
attribute. The join is implemented by considering each row in an
instance of each relation. A row in relation R1 is joined to a row in
relation R2 when the value of the common attribute(s) is equal in
the two relations. The join of two relations is often called a binary
join.
• The join of two relations creates a new relation.
• The join of relations R1 and R2 is possible if there is a common
attribute.

• Relations may share multiple common attributes. All of these
common attributes must be used in creating a join.
• The join operation provides a method for reconstructing a relation
that was decomposed into two relations during the normalisation
process.
• The join of two rows, however, can create a new row that was not
a member of the original relation.
• Thus invalid information can be created during the join process.

• Lossless Joins
• A set of relations satisfies the lossless join property if the
instances can be joined without creating invalid data (i.e. new
rows). The term lossless join may be somewhat confusing. A join
that is not lossless will contain extra, invalid rows.
• A join that is lossless will not contain extra, invalid rows. Thus the
term gainless join might be more appropriate.
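
A small worked example may help here. The sketch below, in plain Python, decomposes a hypothetical relation R(emp, dept, project) in two ways: joining on dept creates extra, invalid rows, while joining on emp reproduces the original exactly. The relation and its data are invented for illustration.

```python
# Hypothetical relation R(emp, dept, project), held as a set of tuples.
original = {("Ann", "Sales", "P1"), ("Bob", "Sales", "P2")}

# Decomposition 1: R1(emp, dept) and R2(dept, project), joined on dept.
r1 = {(emp, dept) for emp, dept, _ in original}
r2 = {(dept, project) for _, dept, project in original}
rejoined = {(emp, dept, project)
            for emp, dept in r1
            for dept2, project in r2
            if dept == dept2}
spurious = rejoined - original  # rows that were never in the original

# Decomposition 2: R1(emp, dept) and R3(emp, project), joined on emp.
r3 = {(emp, project) for emp, _, project in original}
lossless_rejoin = {(emp, dept, project)
                   for emp, dept in r1
                   for emp2, project in r3
                   if emp == emp2}

print(sorted(spurious))
print(lossless_rejoin == original)
```

The first join is not lossless because dept does not determine either of the other attributes; the second is lossless because emp determines both.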

Determinant and Dependent

• The terms determinant and dependent can be described as
follows:
• The expression X → Y means 'if I know the value of X, then I can
obtain the value of Y' (in a table or somewhere).
• In the expression X → Y, X is the determinant and Y is the
dependent attribute.
• The value X determines the value of Y.
• The value Y depends on the value of X.

Functional Dependencies (FD)

• A functional dependency can be described as follows:
• An attribute is functionally dependent if its value is determined by
another attribute.
• That is, if we know the value of one (or several) data items, then
we can find the value of another (or several).
• Functional dependencies are expressed as X → Y, where X is the
determinant and Y is the functionally dependent attribute.
• If A → (B,C) then A → B and A → C.
• If (A,B) → C, then it is not necessarily true that A → C and
B → C.
• If A → B and B → A, then A and B are in a 1-1 relationship.
• If A → B then for A there can only ever be one value for B.
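
The last rule suggests a simple check, sketched here in plain Python: X → Y is violated as soon as one value of X appears with two different values of Y. The function name fd_holds and the sample parts data are invented for this example, and a sample of rows can only show that a dependency is not contradicted, not that it holds in general.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if no row in `rows` contradicts determinant -> dependent."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data.
parts = [
    {"part_no": 1, "description": "bolt", "colour": "grey"},
    {"part_no": 2, "description": "nut", "colour": "grey"},
    {"part_no": 3, "description": "bolt", "colour": "black"},
]

key_determines_description = fd_holds(parts, ["part_no"], ["description"])
description_determines_colour = fd_holds(parts, ["description"], ["colour"])
pair_determines_part = fd_holds(parts, ["description", "colour"], ["part_no"])

print(key_determines_description,
      description_determines_colour,
      pair_determines_part)
```

Here the pair (description, colour) determines part_no even though description alone does not determine colour: a composite determinant says nothing about what its parts determine individually.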

Transitive Dependencies (TD)

• A transitive dependency can be described as follows:
• An attribute is transitively dependent if its value is determined by
another attribute which is not a key.
• If X → Y and X is not a key then this is a transitive dependency.
• A transitive dependency exists when A → B → C but NOT B → A,
so that C depends on the key A only by way of the non-key
attribute B.

Multi-Valued Dependencies (MVD)

• A multi-valued dependency can be described as follows:
• A table involves a multi-valued dependency if it may contain
multiple values for an entity.
• A multi-valued dependency may arise as a result of enforcing 1st
normal form.
• X →→ Y, i.e. X multi-determines Y, when for each value of X we
can have more than one value of Y.
• If A →→ B and A →→ C then we have a single attribute A which
multi-determines two other independent attributes, B and C.
• If A →→ (B,C) then we have an attribute A which
multi-determines a set of associated attributes, B and C.

Join Dependencies (JD)

• A join dependency can be described as follows:


• If a table can be decomposed into three or more smaller tables, it
must be capable of being joined again on common keys to form
the original table.

Modification Anomalies

• A major objective of data normalisation is to avoid modification
anomalies. These come in two flavours:

• An insertion anomaly is a failure to place information about a
new database entry into all the places in the database where
information about that new entry needs to be stored. In a
properly normalized database, information about a new entry
needs to be inserted into only one place in the database. In an
inadequately normalized database, information about a new entry
may need to be inserted into more than one place, and, human
fallibility being what it is, some of the needed additional insertions
may be missed.

• A deletion anomaly is a failure to remove information about an
existing database entry when it is time to remove that entry. In a
properly normalized database, information about an old, to-be-
gotten-rid-of entry needs to be deleted from only one place in the
database. In an inadequately normalized database, information
about that old entry may need to be deleted from more than one
place, and, human fallibility being what it is, some of the needed
additional deletions may be missed.
• An update of a database involves modifications that may be
additions, deletions, or both. Thus 'update anomalies' can be
either of the kinds of anomalies discussed above.

• All of these anomalies are highly undesirable, since their
occurrence constitutes corruption of the database. Properly
normalised databases are much less susceptible to corruption than
are unnormalised databases.

Types of Relational Join

• A JOIN is a method of creating a result set that combines rows
from two or more tables (relations). When comparing the contents
of two tables the following conditions may occur:

• Every row in one relation has a match in the other relation.
• Relation R1 contains rows that have no match in relation R2.
• Relation R2 contains rows that have no match in relation R1.

• INNER joins contain only matches. OUTER joins may contain
mismatches as well.

Inner Join

• This is sometimes known as a simple join. It returns all rows from
both tables where there is a match. If there are rows in R1 which
do not have matches in R2, those rows will not be listed. There are
two possible ways of specifying this type of join:

• SELECT * FROM R1, R2 WHERE R1.r1_field = R2.r2_field;
• SELECT * FROM R1 INNER JOIN R2 ON R1.r1_field = R2.r2_field;

• If the fields to be matched have the same names in both tables
then the ON condition, as in:

• ON R1.fieldname = R2.fieldname
• ON (R1.field1 = R2.field1 AND R1.field2 = R2.field2)

can be replaced by the shorter USING condition, as in:

• USING (fieldname)
• USING (field1, field2)
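
The equivalence of these forms can be checked with a small sketch using SQLite from Python. The columns id, label and note and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R1 (id INTEGER, label TEXT);
    CREATE TABLE R2 (id INTEGER, note TEXT);
    INSERT INTO R1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO R2 VALUES (2, 'x'), (3, 'y'), (4, 'z');
""")

# Join condition in the WHERE clause.
where_form = conn.execute("""
    SELECT R1.id, label, note FROM R1, R2
    WHERE R1.id = R2.id ORDER BY R1.id
""").fetchall()

# Explicit INNER JOIN with an ON condition.
on_form = conn.execute("""
    SELECT R1.id, label, note FROM R1 INNER JOIN R2
    ON R1.id = R2.id ORDER BY R1.id
""").fetchall()

# USING works because the matching column has the same name in both tables.
using_form = conn.execute("""
    SELECT id, label, note FROM R1 INNER JOIN R2
    USING (id) ORDER BY id
""").fetchall()

print(where_form)
print(where_form == on_form == using_form)
```

Rows 1 (only in R1) and 4 (only in R2) have no match and so do not appear in any of the three results.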

Natural Join

• A natural join is based on all columns in the two tables that have
the same name. NATURAL JOIN is semantically equivalent to an
INNER JOIN, and NATURAL LEFT JOIN to a LEFT JOIN, with a USING
clause that names all columns that exist in both tables.

• SELECT * FROM R1 NATURAL JOIN R2

• The alternative is a keyed join which includes an ON or USING
condition.

• Left [Outer] Join
• Returns all the rows from R1 even if there are no matches in R2.
If there are no matches in R2 then the R2 values will be shown as
null.
• SELECT * FROM R1 LEFT [OUTER] JOIN R2 ON R1.field = R2.field

• Right [Outer] Join
• Returns all the rows from R2 even if there are no matches in R1.
If there are no matches in R1 then the R1 values will be shown as
null.
• SELECT * FROM R1 RIGHT [OUTER] JOIN R2 ON R1.field = R2.field

• Full [Outer] Join
• Returns all the rows from both tables even if there are no matches
in one of the tables. If there are no matches in one of the tables
then its values will be shown as null.
• SELECT * FROM R1 FULL [OUTER] JOIN R2 ON R1.field = R2.field
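
The three outer joins can be compared on a small example using SQLite from Python. Because not every SQLite build supports RIGHT and FULL joins, this sketch emulates them: the right join is written as a left join with the tables swapped, and the full join as the union of the two results. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R1 (field INTEGER);
    CREATE TABLE R2 (field INTEGER);
    INSERT INTO R1 VALUES (1), (2), (3);
    INSERT INTO R2 VALUES (2), (3), (4);
""")

# Left join: every R1 row, with NULL (None) where R2 has no match.
left = conn.execute("""
    SELECT R1.field, R2.field FROM R1 LEFT JOIN R2
    ON R1.field = R2.field ORDER BY R1.field
""").fetchall()

# Right join emulated by swapping the tables of a left join.
right = conn.execute("""
    SELECT R1.field, R2.field FROM R2 LEFT JOIN R1
    ON R1.field = R2.field ORDER BY R2.field
""").fetchall()

# Full join emulated as the union of both (a set, so duplicates collapse).
full = set(left) | set(right)

print(left)
print(right)
print(full)
```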

Self Join

• This joins a table to itself. The table appears twice in the FROM
clause, each time followed by a table alias that qualifies column
names in the join condition.

• SELECT a.field1, b.field2 FROM R1 a, R1 b WHERE a.field = b.field

Cross Join

• This type of join is rarely used as it does not have a join condition,
so every row of R1 is joined to every row of R2. For example, if
both tables contain 100 rows the result will be 10,000 rows. This
is sometimes known as a cartesian product and can be specified
in either one of the following ways:

• SELECT * FROM R1 CROSS JOIN R2
• SELECT * FROM R1, R2
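
Both joins can be tried on one small table using SQLite from Python. The staff table, in which manager_id points back to id in the same table, is a hypothetical example rather than part of the original material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO staff VALUES (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cy', 1);
""")

# Self join: the same table under two aliases, employee (e) and manager (m).
pairs = conn.execute("""
    SELECT e.name, m.name FROM staff e, staff m
    WHERE e.manager_id = m.id
    ORDER BY e.id
""").fetchall()

# Cross join: no join condition, so 3 rows x 3 rows gives 9 rows.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM staff a CROSS JOIN staff b").fetchone()[0]

print(pairs)
print(cross_count)
```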

• Entity-Relationship Diagram (ERD)
• An entity-relationship diagram (ERD) is a data modeling technique
that creates a graphical representation of the entities, and the
relationships between entities, within an information system. Any
ER diagram has an equivalent relational table, and any relational
table has an equivalent ER diagram. ER diagramming is an
invaluable aid to engineers in the design, optimization, and
debugging of database programs.
• The entity is a person, object, place or event for which data is
collected. It is equivalent to a database table. An entity can be
defined by means of its properties, called attributes. For example,
the CUSTOMER entity may have attributes for such things as
name, address and telephone number.

• The relationship is the interaction between the entities. It can be
described using a verb such as:
  - A customer places an order.
  - A sales rep serves a customer.
  - An order contains a product.
  - A warehouse stores a product.
• In an entity-relationship diagram entities are rendered as
rectangles, and relationships are portrayed as lines connecting the
rectangles.
• One way of indicating which is the 'one' or 'parent' and which is
the 'many' or 'child' in the relationship is to use an arrowhead.
• Figure 4 - One-to-Many relationship using arrowhead notation

• This can produce an ERD as shown:
• Figure - ERD with arrowhead notation

• Another method is to replace the arrowhead with a crow's-foot, as
shown
• Figure 6 - One-to-Many relationship using crow's-foot notation
• The relating line can be enhanced to indicate cardinality, which defines
the relationship between the entities in terms of numbers. An entity
may be optional (zero or more) or it may be mandatory (one or more).
• A single bar indicates one.
• A double bar indicates one and only one.
• A circle indicates zero.
• A crow's-foot or arrowhead indicates many.

• As well as using lines and circles the cardinality can be expressed
using numbers, as in:
• One-to-One expressed as 1:1
• Zero-to-Many expressed as 0:M
• One-to-Many expressed as 1:M
• Many-to-Many expressed as N:M
• This can produce an ERD as shown:
• Figure - ERD with crow's-foot notation and cardinality

• In plain language the relationships can be expressed as follows:
• 1 instance of a SALES REP serves 1 to many CUSTOMERS
• 1 instance of a CUSTOMER places 1 to many ORDERS
• 1 instance of an ORDER lists 1 to many PRODUCTS
• 1 instance of a WAREHOUSE stores 0 to many PRODUCTS

Data Normalisation
• Relational database theory, and the principles of normalisation,
were first constructed by people with a strong mathematical
background. They wrote about databases using terminology which
was not easily understood outside those mathematical circles.
Below is an attempt to provide understandable explanations.
• Data normalisation is a set of rules and techniques concerned
with:
• Identifying relationships among attributes.
• Combining attributes to form relations.
• Combining relations to form a database.
• It follows a set of rules worked out by E F Codd in 1970. A
normalised relational database provides several benefits:
• Elimination of redundant data storage.
• Close modeling of real world entities, processes, and their
relationships.
• Structuring of data so that the model is flexible.

• The guidelines for developing relations in 3rd Normal Form can be
summarised as follows:
• Define the attributes.
• Group logically related attributes into relations.
• Identify candidate keys for each relation.
• Select a primary key for each relation.
• Identify and remove repeating groups.
• Combine relations with identical keys (1st normal form).
• Identify all functional dependencies.
• Decompose relations such that each non key attribute is
dependent on all the attributes in the key.
• Combine relations with identical primary keys (2nd normal form).
• Identify all transitive dependencies.
  - Check relations for dependencies of one non key attribute with
    another non key attribute.
  - Check for dependencies within each primary key (i.e.
    dependencies of one attribute in the key on other attributes
    within the key).
• Decompose relations such that there are no transitive
dependencies.
• Combine relations with identical primary keys (3rd normal
form) if there are no transitive dependencies.

RDBMS CONCEPTS
• 1st Normal Form
• A table is in first normal form if all the key attributes have been
defined and it contains no repeating groups. Taking the ORDER
entity in the figure below as an example, we could end up with a
set of attributes like this:

• order_id   customer_id   product1   product2   product3
• 123        456           abc1       def1       ghi1
• 456        789           abc2       (empty)    (empty)
• This structure creates the following problems:
• Order 123 has no room for more than 3 products.
• Order 456 has wasted space for product2 and product3.
• In order to create a table that is in first normal form we must
extract the repeating groups and place them in a separate table,
which I shall call ORDER_LINE.
• ORDER
• order_id customer_id
• 123 456
• 456 789

RDBMS CONCEPTS
• I have removed 'product1', 'product2' and 'product3', so there are
no repeating groups.
• ORDER_LINE
• order_id product
• 123 abc1
• 123 def1
• 123 ghi1
• 456 abc2

• Each row contains one product for one order, so this allows an
order to contain any number of products.
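The split into ORDER and ORDER_LINE can be sketched in a few lines of Python. This is only an illustration using the sample rows above; the function name `to_first_normal_form` and the use of `None` for an empty product slot are assumptions made for the example.

```python
# Wide ORDER rows from the example: (order_id, customer_id, product1..product3).
# None stands for an unused product slot (an assumption for this sketch).
wide_orders = [
    (123, 456, "abc1", "def1", "ghi1"),
    (456, 789, "abc2", None, None),
]


def to_first_normal_form(rows):
    """Split wide rows into ORDER rows and one ORDER_LINE row per product."""
    orders = []
    order_lines = []
    for order_id, customer_id, *products in rows:
        orders.append((order_id, customer_id))
        for product in products:
            if product is not None:  # skip the empty product slots
                order_lines.append((order_id, product))
    return orders, order_lines


orders, order_lines = to_first_normal_form(wide_orders)
```

Because ORDER_LINE holds one row per product, adding a fourth product to order 123 means adding a row rather than a new column.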

RDBMS CONCEPTS
• This results in a new version of the ERD, as shown in the figure:
• Figure - ERD with ORDER and ORDER_LINE

• The new relationships can be expressed as follows:


• 1 instance of an ORDER has 1 to many ORDER LINES
• 1 instance of a PRODUCT has 0 to many ORDER LINES

RDBMS CONCEPTS
• 2nd Normal Form
• A table is in second normal form (2NF) if and only if it is in 1NF
and every non key attribute is fully functionally dependent on the
whole of the primary key (i.e. there are no partial dependencies).
• Anomalies can occur when attributes are dependent on only part
of a multi-attribute (composite) key.
• A relation is in second normal form when all non-key attributes
are dependent on the whole key. That is, no attribute is
dependent on only a part of the key.
• Any relation having a key with a single attribute is in second
normal form.
• Take the following table structure as an example:
• order(order_id, cust, cust_address, cust_contact,
order_date, order_total)
• Here, taking the primary key to be the composite (order_id, cust),
we should realise that cust_address and cust_contact are
functionally dependent on cust but not on order_id, therefore
they are dependent on only part of the key. To make this table 2NF
these attributes must be removed and placed in a separate relation
keyed on cust.
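A partial dependency can be looked for in sample data with a small check like the one below. It is a sketch only: the helper `fd_holds` and the sample rows are hypothetical, the key is assumed to be the composite (order_id, cust), and sample data can refute a dependency but never prove that it holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the sample rows never contradict lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key must match every later one.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows for the order table (some columns omitted).
rows = [
    {"order_id": 1, "cust": "C1", "cust_address": "1 High St", "order_total": 50},
    {"order_id": 2, "cust": "C1", "cust_address": "1 High St", "order_total": 75},
    {"order_id": 3, "cust": "C2", "cust_address": "9 Low Rd", "order_total": 20},
]

# cust alone (part of the key) determines cust_address: a partial dependency.
partial = fd_holds(rows, ["cust"], ["cust_address"])
# cust alone does not determine order_total, so that attribute needs the order.
total_by_cust = fd_holds(rows, ["cust"], ["order_total"])
```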

RDBMS CONCEPTS
• 3rd Normal Form
• A table is in third normal form (3NF) if and only if it is in 2NF and
every non key attribute is non transitively dependent on the
primary key (i.e. there are no transitive dependencies).
• Anomalies can occur when a relation contains one or more
transitive dependencies.
• A relation is in 3NF when it is in 2NF and has no transitive
dependencies.
• A relation is in 3NF when 'All non-key attributes are dependent on
the key, the whole key and nothing but the key'.
• Take the following table structure as an example:
• order(order_id, cust, cust_address, cust_contact,
order_date, order_total)
• Here we should realise that cust_address and cust_contact are
functionally dependent on cust which is not a key. To make this
table 3NF these attributes must be removed and placed in a
separate relation keyed on cust.
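One possible result of removing the transitive dependency is sketched below with Python's built-in sqlite3 module. The table name `orders` (chosen because ORDER is a reserved word in SQL), the column types, and the sample values are assumptions for the illustration, not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Customer details now live in their own relation, keyed on cust.
    CREATE TABLE customer (
        cust         TEXT PRIMARY KEY,
        cust_address TEXT,
        cust_contact TEXT
    );
    -- The order relation keeps only attributes that depend on order_id.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        cust        TEXT REFERENCES customer(cust),
        order_date  TEXT,
        order_total REAL
    );
    INSERT INTO customer VALUES ('C1', '1 High St', 'Ann');
    INSERT INTO orders VALUES (1, 'C1', '2024-01-05', 50.0);
    INSERT INTO orders VALUES (2, 'C1', '2024-01-09', 75.0);
""")

# A change of address is now a single-row update.
conn.execute("UPDATE customer SET cust_address = '2 New Rd' WHERE cust = 'C1'")

rows = conn.execute("""
    SELECT o.order_id, c.cust_address
    FROM orders AS o JOIN customer AS c ON c.cust = o.cust
    ORDER BY o.order_id
""").fetchall()
```

Every order for the customer sees the new address through the join, so there is no risk of two orders holding different addresses for the same customer.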

RDBMS CONCEPTS
• You must also note the use of calculated or derived fields. Take
the example where a table contains PRICE, QUANTITY and
EXTENDED_PRICE where EXTENDED_PRICE is calculated as
QUANTITY multiplied by PRICE. As one of these values can be
calculated from the other two then it need not be held in the
database table. Do not assume that it is safe to drop any one of
the three fields as a difference in the number of decimal places
between the various fields could lead to different results due to
rounding errors. For example, take the following fields:
• AMOUNT - a monetary value in home currency, to 2 decimal
places.
• EXCH_RATE - exchange rate, to 9 decimal places.
• CURRENCY_AMOUNT - amount expressed in foreign currency,
calculated as AMOUNT multiplied by EXCH_RATE.
• If you were to drop EXCH_RATE could it be calculated back to its
original 9 decimal places?
• Reaching 3NF is adequate for most practical needs, but there
may be circumstances which would benefit from further
normalisation.
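The rounding concern raised for AMOUNT, EXCH_RATE and CURRENCY_AMOUNT can be made concrete with Python's decimal module. The figures are invented for the illustration, and the sketch assumes that CURRENCY_AMOUNT is stored to 2 decimal places, which the text does not state.

```python
from decimal import Decimal, ROUND_HALF_UP

TWO_DP = Decimal("0.01")
NINE_DP = Decimal("0.000000001")

amount = Decimal("100.00")          # AMOUNT, 2 decimal places
exch_rate = Decimal("1.234567891")  # EXCH_RATE, 9 decimal places

# CURRENCY_AMOUNT as it would be stored (assumed: rounded to 2 decimal places).
currency_amount = (amount * exch_rate).quantize(TWO_DP, rounding=ROUND_HALF_UP)

# Attempt to rebuild EXCH_RATE from the two stored values.
recovered_rate = (currency_amount / amount).quantize(NINE_DP, rounding=ROUND_HALF_UP)

rate_lost = recovered_rate != exch_rate
```

Here the stored amount is 123.46, so the rebuilt rate is 1.234600000 rather than the original 1.234567891: the digits lost to rounding cannot be recovered.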

RDBMS CONCEPTS
• Boyce-Codd Normal Form
• A table is in Boyce-Codd normal form (BCNF) if and only if it is in
3NF and every determinant is a candidate key.
• Anomalies can occur in relations in 3NF if there is a composite key
in which part of that key has a determinant which is not itself a
candidate key.
• This can be expressed as R(A,B,C), C → A where:
 The relation contains attributes A, B and C.
 A and B form a candidate key.
 C is the determinant for A (A is functionally dependent on C).
 C is not part of any key.
• Anomalies can also occur where a relation contains several
candidate keys where:
 The keys contain more than one attribute (they are composite
keys).
 An attribute is common to more than one key.

RDBMS CONCEPTS
• Take the following table structure as an example:
• schedule(campus, course, class, time, room/bldg)
• Take the following sample data:
• campus   course        class   time          room/bldg
• East     English 101   1       8:00-9:00     212AYE
• East     English 101   2       10:00-11:00   305RFK
• West     English 101   3       8:00-9:00     102PPR

• Note that no two buildings on any of the university campuses
have the same name, thus ROOM/BLDG → CAMPUS. As the
determinant is not a candidate key this table is NOT in Boyce-
Codd normal form.
• This table should be decomposed into the following relations:
• R1(course, class, room/bldg, time)
• R2(room/bldg, campus)
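The decomposition into R1 and R2 can be checked against the sample data with a short Python sketch. The variable names are chosen for the illustration; the check shows only that, for these three rows, joining R1 and R2 on room/bldg gives back the original schedule.

```python
# Sample schedule rows: (campus, course, class, time, room/bldg).
schedule = [
    ("East", "English 101", 1, "8:00-9:00", "212AYE"),
    ("East", "English 101", 2, "10:00-11:00", "305RFK"),
    ("West", "English 101", 3, "8:00-9:00", "102PPR"),
]

# R1(course, class, room/bldg, time)
r1 = {(course, cls, room, time) for campus, course, cls, time, room in schedule}
# R2(room/bldg, campus)
r2 = {(room, campus) for campus, course, cls, time, room in schedule}

# room/bldg -> campus holds if each room maps to exactly one campus.
campus_of = dict(r2)
fd_room_campus = len(campus_of) == len(r2)

# Join R1 and R2 on room/bldg and compare with the original relation.
rejoined = {
    (campus_of[room], course, cls, time, room) for course, cls, room, time in r1
}
lossless = rejoined == set(schedule)
```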

RDBMS CONCEPTS
• As another example take the following structure:
• enrol(student#, s_name, course#, c_name, date_enrolled)
• This table has the following candidate keys:
• (student#, course#)
• (student#, c_name)
• (s_name, course#) - this assumes that s_name is a unique
identifier
• (s_name, c_name) - this assumes that c_name is a unique
identifier
• The relation is in 3NF but not in BCNF because of the following
dependencies:
• student# → s_name
• course# → c_name

RDBMS CONCEPTS
• 4th Normal Form
• A table is in fourth normal form (4NF) if and only if it is in BCNF
and contains no more than one multi-valued dependency.
• Anomalies can occur in relations in BCNF if there is more than one
multi-valued dependency.
• If A →→ B and A →→ C but B and C are unrelated, i.e. A →→ (B,C) is
false, then we have more than one multi-valued dependency.
• A relation is in 4NF when it is in BCNF and has no more than one
multi-valued dependency.
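The effect of two independent multi-valued dependencies can be sketched in Python. The employee, skill and language data below is entirely hypothetical: the employee plays the role of A, and skills and languages are two unrelated multi-valued facts about that employee.

```python
from itertools import product

# Hypothetical data: two unrelated multi-valued facts about one employee.
skills = {"ann": ["sql", "python", "java"]}
languages = {"ann": ["english", "french"]}

# A single table holding both facts must store every combination.
combined = {
    (emp, skill, lang)
    for emp in skills
    for skill, lang in product(skills[emp], languages[emp])
}

# Decomposition: one relation per multi-valued dependency.
emp_skill = {(emp, skill) for emp, skill, _ in combined}
emp_lang = {(emp, lang) for emp, _, lang in combined}

# Joining the two relations on the employee rebuilds the combined table.
rejoined = {
    (emp, skill, lang)
    for emp, skill in emp_skill
    for emp2, lang in emp_lang
    if emp == emp2
}
```

The combined table needs 6 rows for 3 skills and 2 languages, while the two separate relations need 5 rows in total and lose no information.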

RDBMS CONCEPTS
De-Normalisation

• Denormalisation is the process of modifying a perfectly normalised
database design for performance reasons. Denormalisation is a
natural and necessary part of database design, but must follow
proper normalisation.
