RDBMS CONCEPTS
• What is DBMS?
• The Database Models
• What is RDBMS?
What is DBMS?
• A brief definition might be:
A STORE OF INFORMATION,
HELD OVER A PERIOD OF TIME,
IN COMPUTER-READABLE FORM.
• Typical examples (a store of information)
Information collected for the sake of making a statistical
analysis,
e.g. the national census,
a survey of cracks in a stretch of motorway.
Operational and administrative information required for
running an organisation.
In a commercial concern this will take the form of stock
records, personnel records, customer records, among others.
• (held over a period of time) Because of the investment involved in
setting up a database, the expectation must be that it will
continue to be useful, over years rather than months. But the
relationship with time varies from one type of information to
another.
Census information is collected on a particular date and stored
as a snapshot of the state of affairs when the survey was
taken. Information from later observations will be kept quite
separately, but appropriate comparisons may be made
provided that the framework remains consistent.
Bibliographic or other textual databases are accumulated over
time - new material is added periodically but probably very
little will be removed. When designing such a database it will
be important to estimate and allow for the expected rate of
growth, and perhaps to ensure that the more recent
information is given some priority.
• (in computer-readable form)
• Information (often referred to in this context as data) has been
processed by computer for over 30 years, using a variety of
storage media. Some form of magnetic disc is likely to be used,
since discs currently provide the most cost-effective way of
holding large quantities of data while allowing fast access to any
individual item.
• Database handling techniques grew out of earlier and simpler file
processing techniques.
• A file consists of an ordered collection of records; a
database consists of two or more related files which we
may wish to process together in various different ways.
• Computer storage and processing implies the use of software: in
the current context a DATABASE MANAGEMENT SYSTEM (DBMS).
The function of the DBMS is to store and retrieve information
as required by applications programs or users sitting at
terminals, using the facilities provided by the computer
operating system.
It is one of a number of software layers making computer
facilities available to users with perhaps comparatively little
technical expertise.
• Data definition.
This includes describing:
FILES
RECORD STRUCTURES
FIELD NAMES, TYPES and SIZES
RELATIONSHIPS between records of different types
Extra information to make searching efficient, e.g. INDEXES.
• Data entry and validation.
• Validation may include:
• TYPE CHECKING
• RANGE CHECKING
• CONSISTENCY CHECKING
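The three kinds of validation listed above can be sketched in a few lines of Python. This is only an illustration: the employee record, its field names and the age limits are hypothetical and do not come from any particular DBMS.

```python
from datetime import date

def validate_employee(record):
    """Return a list of validation errors for a hypothetical employee record."""
    errors = []
    # TYPE CHECKING: the field must hold a value of the expected type.
    if not isinstance(record.get("age"), int):
        errors.append("age must be an integer")
    # RANGE CHECKING: the value must fall within sensible limits (assumed here).
    elif not 16 <= record["age"] <= 75:
        errors.append("age out of range")
    # CONSISTENCY CHECKING: related fields must agree with each other.
    if record.get("leave_date") and record["leave_date"] < record["start_date"]:
        errors.append("leave_date precedes start_date")
    return errors

good = {"age": 30, "start_date": date(2020, 1, 1), "leave_date": None}
bad = {"age": 130, "start_date": date(2020, 1, 1), "leave_date": date(2019, 1, 1)}
print(validate_employee(good))
print(validate_employee(bad))
```

In a real system these checks would normally be declared to the DBMS (column types and constraints) rather than hand-coded in the application.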
• Updating involves:
• Record INSERTION
• Record MODIFICATION
• Record DELETION.
• At the same time any background data such as indexes or
pointers from one record to another must be changed to maintain
consistency. Updating may take place interactively, or by
submission of a file of transaction records; handling these may
require a program of some kind to be written, either in a
conventional programming language (a host language, e.g.
COBOL or C) or in a language supplied by the DBMS for
constructing command files.
• Data retrieval on the basis of selection criteria.
• For this purpose most systems provide a QUERY LANGUAGE with
which the characteristics of the required records may be specified.
Query languages differ enormously in power and sophistication
but a standard which is becoming increasingly common is based
on the so-called RELATIONAL operations. These allow:
• Report definition.
• Most systems provide facilities for describing how summary
reports from the database are to be created and laid out on paper.
These may include obtaining:
• COUNTS
• TOTALS
• AVERAGES
• MAXIMUM and MINIMUM values
over particular CONTROL FIELDS. A report definition may also specify PAGE
and LINE LAYOUT, HEADINGS, PAGE-NUMBERING, and other narrative
to make the report comprehensible.
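As a minimal sketch of the summary values described above, the following uses Python's built-in sqlite3 module. The sales table, its columns and its data are hypothetical; region plays the part of the control field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100.0), ("North", 300.0), ("South", 50.0)],
)

# One summary line per value of the control field (region).
report = conn.execute(
    """
    SELECT region, COUNT(*), SUM(amount), AVG(amount), MAX(amount), MIN(amount)
    FROM sales
    GROUP BY region
    ORDER BY region
    """
).fetchall()

for row in report:
    print(row)
```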
• Security.
• This has several aspects:
• A database is a collection of information that is organised
so that it can easily be accessed, managed, and updated.
• A database engine may comply with a combination of any of the
following:
The database is a collection of tables, files or datasets.
Each table is a collection of fields, columns or data items.
One or more columns in each table may be selected as the
primary key.
There may be additional unique keys or non-unique indexes to
assist in data retrieval.
Columns may be fixed length or variable length.
Records may be fixed length or variable length.
Table and column names may be restricted in length (8, 16 or
32 characters).
Table and column names may be case-sensitive.
• Why have a database (and a DBMS)?
An organisation uses a computer to store and process
information because it hopes for speed, accuracy, efficiency,
economy etc. beyond what could be achieved using clerical
methods.
The idea is that any piece of information is entered and
stored just once, eliminating duplications of effort and
the possibility of inconsistency between different
departmental records.
What is RDBMS?
• RDBMS stands for Relational Database Management System.
• RDBMS data is structured in database tables, fields and records.
• Each RDBMS table consists of database table rows.
• Each database table row consists of one or more database table
fields.
• An RDBMS stores data in a collection of tables, which may be
related by common fields (database table columns).
• An RDBMS also provides relational operators to manipulate the data
stored in the database tables.
• Most RDBMSs use SQL as their database query language.
• Edgar Codd introduced the relational database model. Many
modern DBMSs do not conform to Codd's definition of an
RDBMS, but they are nonetheless still considered to be RDBMSs.
• The most popular RDBMS are MS SQL Server, DB2, Oracle and
MySQL.
• For example, a table called Users might store information about many
persons, and each entry in this table will represent one unique user.
Even though all user entries in the Users table are unique, they are
related in the sense that they describe similar objects.
Data Models
• In order to provide a general and powerful set of facilities for its
users any DBMS imposes constraints on the way information can be
described and accessed, and demands familiarity with the DATA
MODEL which it supports and the command language which it
provides to define and manipulate data.
• Data models still in use are HIERARCHICAL (tree-structured),
NETWORK and RELATIONAL (tabular).
• Over the years there have been several different ways of
constructing databases, amongst which have been the following:
• The Hierarchical Data Model
• The Network Data Model
• The Relational Data Model
Hierarchical Data Model
• The Hierarchical Data Model structures data in a tree of records,
with each record having one parent record and many children. It
can be represented as follows:
• A hierarchical database consists of the following:
It contains nodes connected by branches.
The top node is called the root. If multiple nodes appear at the
top level, the nodes are called root segments.
The parent of node nx is the node directly above nx and connected
to nx by a branch.
Each node (with the exception of the root) has exactly one parent.
A child of node nx is a node directly below nx and connected to nx
by a branch.
One parent may have many children.
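The parent and child rules above can be illustrated with a small Python sketch. The node names are hypothetical; the structure is simply a mapping from each node to its single parent, with the root mapped to None.

```python
# Hypothetical tree: each node maps to its single parent; the root has none.
parent = {
    "company": None,
    "sales": "company",
    "research": "company",
    "alice": "sales",
    "bob": "sales",
}

def children(node):
    """All nodes directly below `node` (one parent may have many children)."""
    return sorted(n for n, p in parent.items() if p == node)

def path_to_root(node):
    """Follow the single parent link upwards until the root is reached."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

print(children("company"))
print(path_to_root("bob"))
```

Because every node stores exactly one parent, the only way to reach a record is along its path from the root, which is the access restriction the hierarchical model is known for.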
The Network Data Model
• Both hierarchical and network databases suffered from the
following deficiencies (when compared with relational databases):
• Access to the database was not via SQL query strings, but by a
specific set of APIs.
• It was not possible to provide a variable WHERE clause. The only
selection mechanism was to read entries from a child table for a
specific entry on a related parent table with any filtering being
done within the application code.
• It was not possible to provide an ORDER BY clause. Data was
presented in the order in which it existed in the database. This
mechanism could be tuned by specifying sort criteria to be used
when each record was inserted, but this had several
disadvantages:
Only a single sort sequence could be defined for each path
(link to a parent), so all records retrieved on that path would
be provided in that sequence.
It could make inserts rather slow when attempting to insert
into the middle of a large collection, or where a table had
multiple paths each with its own set of sort criteria.
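For contrast, a relational engine lets the selection criteria and the sort sequence be chosen afresh for each query. The sketch below uses Python's built-in sqlite3 module; the order_line table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_line (order_id INTEGER, product TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?)",
    [(1, "bolt", 5), (1, "nut", 20), (2, "bolt", 1)],
)

# Same table, two different WHERE clauses and two different sort sequences,
# neither of which had to be fixed when the rows were inserted.
by_qty = conn.execute(
    "SELECT product, qty FROM order_line WHERE order_id = ? ORDER BY qty DESC", (1,)
).fetchall()
by_order = conn.execute(
    "SELECT order_id, qty FROM order_line WHERE product = ? ORDER BY order_id", ("bolt",)
).fetchall()

print(by_qty)
print(by_order)
```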
The Relational Data Model
• The Relational Data Model has the relation at its heart, together
with a whole series of rules governing keys, relationships, joins,
functional dependencies, transitive dependencies, multi-valued
dependencies, and modification anomalies.
• The Relation
• The Relation is the basic element in a relational data model.
• A relation is subject to the following rules:
• Relation (file, table) is a two-dimensional table.
• Attribute (i.e. field or data item) is a column in the table.
• Each column in the table has a unique name within that table.
• Each column is homogeneous. Thus the entries in any column are
all of the same type (e.g. age, name, employee-number, etc).
• Each column has a domain, the set of possible values that can
appear in that column.
• A Tuple (i.e. record) is a row in the table.
• The order of the rows and columns is not important.
• Values of a row all relate to some thing or portion of a thing.
• Repeating groups (collections of logically related attributes that
occur multiple times within one record occurrence) are not
allowed.
• Duplicate rows are not allowed (candidate keys are designed to
prevent this).
• Cells must be single-valued (but can be variable length). Single
valued means the following:
Cannot contain multiple values such as 'A1,B2,C3'.
Cannot contain combined values such as 'ABC-XYZ' where
'ABC' means one thing and 'XYZ' another.
• A relation may be expressed using the notation R(A,B,C, ...)
where:
• R = the name of the relation.
• (A,B,C, ...) = the attributes within the relation.
• A = the attribute(s) which form the primary key.
• There are three levels of database design:
• Conceptual: producing a data model which accounts for the relevant
entities and relationships within the target application domain;
• Logical: ensuring, via normalisation procedures and the definition of
integrity rules, that the stored database will be non-redundant and
properly connected;
• Physical: specifying how database records are stored, accessed and related
to ensure adequate performance.
• It is considered desirable to keep these three levels quite separate: one
of Codd's requirements for an RDBMS is that it should maintain
logical-physical data independence. The generality of the relational model means
that RDBMSs are potentially less efficient than those based on one of the
older data models where access paths were specified once and for all at
the design stage. However the relational data model does not preclude the
use of traditional techniques for accessing data - it is still essential to
exploit them to achieve adequate performance with a database of any size.
Keys
• A superkey is any set of attributes that uniquely identifies a row.
A superkey differs from a candidate key in that it does not require
the non-redundancy property: it may contain attributes that are not
needed to guarantee uniqueness.
• A foreign key is an attribute (or set of attributes) that appears
(usually) as a non key attribute in one relation and as a primary
key attribute in another relation. I say usually because it is
possible for a foreign key to also be the whole or part of a primary
key:
A many-to-many relationship can only be implemented by
introducing an intersection or link table which then becomes
the child in two one-to-many relationships. The intersection
table therefore has a foreign key for each of its parents, and
its primary key is a composite of both foreign keys.
A one-to-one relationship requires that the child table has no
more than one occurrence for each parent, which can only be
enforced by letting the foreign key also serve as the primary
key.
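The intersection-table arrangement described above can be sketched with Python's built-in sqlite3 module. The student, course and enrolment tables are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);
    -- Intersection table: one foreign key per parent, and a primary key
    -- that is the composite of both foreign keys.
    CREATE TABLE enrolment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO course  VALUES (10, 'SQL'), (20, 'Design');
    INSERT INTO enrolment VALUES (1, 10), (1, 20), (2, 10);
    """
)

# The composite primary key stops the same pairing being recorded twice.
try:
    conn.execute("INSERT INTO enrolment VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

courses_for_ann = conn.execute(
    "SELECT COUNT(*) FROM enrolment WHERE student_id = 1"
).fetchone()[0]
print(duplicate_rejected, courses_for_ann)
```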
• A semantic or natural key is a key for which the possible values
have an obvious meaning to the user or the data. For example, a
semantic primary key for a COUNTRY entity might contain the
value 'USA' for the occurrence describing the United States of
America. The value 'USA' has meaning to the user.
• A technical or surrogate or artificial key is a key for which the
possible values have no obvious meaning to the user or the data.
These are used instead of semantic keys for any of the following
reasons:
When the value in a semantic key is likely to be changed by
the user, or can have duplicates. For example, on a PERSON
table it is unwise to use PERSON_NAME as the key as it is
possible to have more than one person with the same name,
or the name may change such as through marriage.
When none of the existing attributes can be used to guarantee
uniqueness. In this case adding an attribute whose value is
generated by the system, e.g. from a sequence of numbers, is
the only way to provide a unique value. Typical examples
would be ORDER_ID and INVOICE_ID. The value '12345' has
no meaning to the user as it conveys nothing about the entity
to which it relates.
• A key functionally determines the other attributes in the row, thus
it is always a determinant.
• Note that in most DBMS engines a 'key' is implemented as
an index which does not allow duplicate entries.
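The difference between a superkey and a candidate key described earlier can be sketched as follows. The sample rows are hypothetical, and the check only shows uniqueness within this sample; in a real design, keys are declared from the business rules rather than inferred from data.

```python
from itertools import combinations

# Hypothetical sample rows.
rows = [
    {"emp_no": 1, "email": "a@x.org", "dept": "HR"},
    {"emp_no": 2, "email": "b@x.org", "dept": "HR"},
    {"emp_no": 3, "email": "c@x.org", "dept": "IT"},
]

def is_superkey(attrs, rows):
    """True if the attribute set has a distinct value combination in every row."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return len(seen) == len(rows)

def is_candidate_key(attrs, rows):
    """A candidate key is a superkey from which no attribute can be removed."""
    if not is_superkey(attrs, rows):
        return False
    smaller = (s for s in combinations(attrs, len(attrs) - 1) if s)
    return not any(is_superkey(s, rows) for s in smaller)

print(is_superkey(("emp_no", "dept"), rows))       # unique, but dept is redundant
print(is_candidate_key(("emp_no", "dept"), rows))
print(is_candidate_key(("emp_no",), rows))
```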
Relationships
• A child record without a corresponding parent record is known as
an orphan.
• It is possible for a table to be related to itself. For this to be
possible it needs a foreign key which points back to the primary
key. Note that these two keys cannot consist of exactly the
same fields otherwise the record could only ever point to itself.
• A table may be the subject of any number of relationships, and it
may be the parent in some and the child in others.
• Some database engines allow a parent table to be linked via a
candidate key, but if this were changed it could result in the link
to the child table being broken.
• Some database engines allow relationships to be managed by
rules known as referential integrity or foreign key constraints.
These will prevent entries on child tables from being created if
the foreign key does not exist on the parent table, or will deal
with entries on child tables when the entry on the parent table is
updated or deleted.
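A minimal sketch of referential integrity, using Python's built-in sqlite3 module. The customer and orders tables are hypothetical, and ON DELETE CASCADE is just one of several possible rules for dealing with child entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER NOT NULL REFERENCES customer(cust_id) ON DELETE CASCADE
    )
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO orders VALUES (100, 1)")

# Creating a child whose foreign key has no parent (an orphan) is refused.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the parent removes its children, so no orphans are left behind.
conn.execute("DELETE FROM customer WHERE cust_id = 1")
remaining_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(orphan_rejected, remaining_orders)
```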
Relational Joins
• Relations may share multiple common attributes. All of these
common attributes must be used in creating a join.
• The join operation provides a method for reconstructing a relation
that was decomposed into two relations during the normalisation
process.
• The join of two rows, however, can create a new row that was not
a member of the original relation.
• Thus invalid information can be created during the join process.
• Lossless Joins
• A set of relations satisfies the lossless join property if the
instances can be joined without creating invalid data (i.e. new
rows). The term lossless join may be somewhat confusing. A join
that is not lossless will contain extra, invalid rows.
• A join that is lossless will not contain extra, invalid rows. Thus the
term gainless join might be more appropriate.
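The effect can be seen with a small Python sketch. The relation R(employee, project, location) and its two rows are hypothetical. Splitting it on project produces extra rows when rejoined, while splitting it on employee (unique in this sample) does not.

```python
# Hypothetical relation R(employee, project, location).
original = {("ann", "p1", "leeds"), ("bob", "p1", "york")}

# Three projections of the original relation.
emp_proj = {(e, p) for e, p, loc in original}
proj_loc = {(p, loc) for e, p, loc in original}
emp_loc = {(e, loc) for e, p, loc in original}

# Rejoining on project: p1 matches both locations, so invalid rows appear.
lossy = {(e, p, loc) for e, p in emp_proj for p2, loc in proj_loc if p == p2}

# Rejoining on employee: each employee matches exactly one location.
lossless = {(e, p, loc) for e, p in emp_proj for e2, loc in emp_loc if e == e2}

print(sorted(lossy - original))   # the extra, invalid rows
print(lossless == original)
```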
Determinant and Dependent
Functional Dependencies (FD)
Transitive Dependencies (TD)
Multi-Valued Dependencies (MVD)
Join Dependencies (JD)
Modification Anomalies
• A deletion anomaly is a failure to remove information about an
existing database entry when it is time to remove that entry. In a
properly normalized database, information about an old, to-be-
gotten-rid-of entry needs to be deleted from only one place in the
database. In an inadequately normalized database, information
about that old entry may need to be deleted from more than one
place, and, human fallibility being what it is, some of the needed
additional deletions may be missed.
• An update of a database involves modifications that may be
additions, deletions, or both. Thus 'update anomalies' can be
either of the kinds of anomalies discussed above.
Types of Relational Join
Inner Join
Natural Join
• A natural join is based on all columns in the two tables that have
the same name. A NATURAL JOIN is semantically equivalent to an
INNER JOIN, and a NATURAL LEFT JOIN to a LEFT JOIN, with a
USING clause that names all columns that exist in both tables.
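The equivalence can be checked with Python's built-in sqlite3 module. The tables r1 and r2 and their data are hypothetical; the only column name they share is id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE r1 (id INTEGER, name TEXT);
    CREATE TABLE r2 (id INTEGER, city TEXT);
    INSERT INTO r1 VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO r2 VALUES (1, 'Leeds'), (3, 'York');
    """
)

# NATURAL JOIN matches on every column name the two tables share (here: id).
natural = conn.execute("SELECT * FROM r1 NATURAL JOIN r2").fetchall()

# The same join with the shared column named explicitly.
using = conn.execute("SELECT * FROM r1 INNER JOIN r2 USING (id)").fetchall()

print(natural)
print(using)
```

Naming the columns explicitly is usually the safer choice, since a natural join silently changes meaning if a column with a matching name is later added to either table.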
• Left [Outer] Join
• Returns all the rows from R1 even if there are no matches in R2.
If there are no matches in R2 then the R2 values will be shown as
null.
• SELECT * FROM R1 LEFT [OUTER] JOIN R2 ON R1.field = R2.field
• Right [Outer] Join
• Returns all the rows from R2 even if there are no matches in R1.
If there are no matches in R1 then the R1 values will be shown as
null.
• SELECT * FROM R1 RIGHT [OUTER] JOIN R2 ON R1.field = R2.field
• Full [Outer] Join
• Returns all the rows from both tables even if there are no matches
in one of the tables. If there are no matches in one of the tables
then its values will be shown as null.
• SELECT * FROM R1 FULL [OUTER] JOIN R2 ON R1.field = R2.field
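A short sketch of the outer-join behaviour, using Python's built-in sqlite3 module with hypothetical tables r1 and r2. Only LEFT JOIN is used, because older SQLite versions do not support RIGHT or FULL joins; the right join is shown by swapping the operands of a left join, which returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE r1 (id INTEGER, name TEXT);
    CREATE TABLE r2 (id INTEGER, city TEXT);
    INSERT INTO r1 VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO r2 VALUES (1, 'Leeds'), (3, 'York');
    """
)

# Every row of r1 is kept; r2 columns are NULL (None) where nothing matches.
left = conn.execute(
    "SELECT r1.id, r1.name, r2.city FROM r1 LEFT JOIN r2 ON r1.id = r2.id ORDER BY r1.id"
).fetchall()

# Equivalent of r1 RIGHT JOIN r2: every row of r2 is kept instead.
right = conn.execute(
    "SELECT r1.name, r2.id, r2.city FROM r2 LEFT JOIN r1 ON r1.id = r2.id ORDER BY r2.id"
).fetchall()

print(left)
print(right)
```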
Self Join
• This joins a table to itself. This table appears twice in the FROM
clause and is followed by table aliases that qualify column names
in the join condition.
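A self join can be sketched with Python's built-in sqlite3 module. The employee table is hypothetical: manager_id is a foreign key pointing back at emp_id in the same table, and the aliases e and m distinguish the two roles the table plays.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT,
        manager_id INTEGER REFERENCES employee(emp_id)
    );
    INSERT INTO employee VALUES (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cy', 1);
    """
)

# The table appears twice in FROM: once as the employee, once as the manager.
pairs = conn.execute(
    """
    SELECT e.name, m.name
    FROM employee AS e
    JOIN employee AS m ON e.manager_id = m.emp_id
    ORDER BY e.emp_id
    """
).fetchall()
print(pairs)
```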
Cross Join
• This type of join is rarely used as it does not have a join condition,
so every row of R1 is joined to every row of R2. For example, if
both tables contain 100 rows the result will be 10,000 rows. This
is sometimes known as a cartesian product and can be specified
in either one of the following ways:
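As a sketch, the two usual SQL spellings of a cartesian product are an explicit CROSS JOIN and a comma-separated FROM list with no join condition. The example uses Python's built-in sqlite3 module with hypothetical tables of 3 and 2 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE r1 (a INTEGER);
    CREATE TABLE r2 (b TEXT);
    INSERT INTO r1 VALUES (1), (2), (3);
    INSERT INTO r2 VALUES ('x'), ('y');
    """
)

explicit = conn.execute("SELECT * FROM r1 CROSS JOIN r2").fetchall()
implicit = conn.execute("SELECT * FROM r1, r2").fetchall()

# 3 rows x 2 rows: every row of r1 is paired with every row of r2.
print(len(explicit), len(implicit))
```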
• Entity-Relationship Diagram (ERD)
• An entity-relationship diagram (ERD) is a data modeling technique
that creates a graphical representation of the entities, and the
relationships between entities, within an information system. Any
ER diagram has an equivalent relational table, and any relational
table has an equivalent ER diagram. ER diagramming is an
invaluable aid to engineers in the design, optimization, and
debugging of database programs.
• The entity is a person, object, place or event for which data is
collected. It is equivalent to a database table. An entity can be
defined by means of its properties, called attributes. For example,
the CUSTOMER entity may have attributes for such things as
name, address and telephone number.
• The relationship is the interaction between the entities. It can be
described using a verb such as:
A customer places an order.
A sales rep serves a customer.
An order contains a product.
A warehouse stores a product.
• In an entity-relationship diagram entities are rendered as
rectangles, and relationships are portrayed as lines connecting the
rectangles.
• One way of indicating which is the 'one' or 'parent' and which is
the 'many' or 'child' in the relationship is to use an arrowhead
• Figure 4 - One-to-Many relationship using arrowhead notation
• This can produce an ERD as shown
• Another method is to replace the arrowhead with a crow's-foot, as
shown
• Figure 6 - One-to-Many relationship using crow's-foot notation
• As well as using lines and circles the cardinality can be expressed
using numbers, as in:
• One-to-One expressed as 1:1
• Zero-to-Many expressed as 0:M
• One-to-Many expressed as 1:M
• Many-to-Many expressed as N:M
• This can produce an ERD as shown
ERD with crow's-foot notation and cardinality
• In plain language the relationships can be expressed as follows:
• 1 instance of a SALES REP serves 1 to many CUSTOMERS
• 1 instance of a CUSTOMER places 1 to many ORDERS
• 1 instance of an ORDER lists 1 to many PRODUCTS
• 1 instance of a WAREHOUSE stores 0 to many PRODUCTS
Data Normalisation
• Relational database theory, and the principles of normalisation,
were first constructed by people with a strong mathematical
background. They wrote about databases using terminology which
was not easily understood outside those mathematical circles.
Below is an attempt to provide understandable explanations.
• Data normalisation is a set of rules and techniques concerned
with:
• Identifying relationships among attributes.
• Combining attributes to form relations.
• Combining relations to form a database.
• It follows a set of rules worked out by E F Codd in 1970. A
normalised relational database provides several benefits:
• Elimination of redundant data storage.
• Close modeling of real world entities, processes, and their
relationships.
• Structuring of data so that the model is flexible.
• The guidelines for developing relations in 3rd Normal Form can be
summarised as follows:
• Define the attributes.
• Group logically related attributes into relations.
• Identify candidate keys for each relation.
• Select a primary key for each relation.
• Identify and remove repeating groups.
• Combine relations with identical keys (1st normal form).
• Identify all functional dependencies.
• Decompose relations such that each non key attribute is
dependent on all the attributes in the key.
• Combine relations with identical primary keys (2nd normal form).
• Identify all transitive dependencies.
Check relations for dependencies of one non key attribute with
another non key attribute.
Check for dependencies within each primary key (i.e.
dependencies of one attribute in the key on other attributes
within the key).
• Decompose relations such that there are no transitive
dependencies.
• Combine relations with identical primary keys (3rd normal
form) if there are no transitive dependencies.
• 1st Normal Form
• A table is in first normal form if all the key attributes have been
defined and it contains no repeating groups. Taking the ORDER
entity in figure below as an example we could end up with a set of
attributes like this:
• I have removed 'product1', 'product2' and 'product3', so there are
no repeating groups.
• ORDER_LINE
order_id   product
123        abc1
123        def1
123        ghi1
456        abc2
• Each row contains one product for one order, so this allows an
order to contain any number of products.
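The ORDER_LINE rows above can be loaded into a table with Python's built-in sqlite3 module to show that the number of products per order is no longer fixed by the table layout. The composite primary key (order_id, product) is an assumption made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE order_line (
        order_id INTEGER,
        product  TEXT,
        PRIMARY KEY (order_id, product)  -- assumed key for this sketch
    )
    """
)
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?)",
    [(123, "abc1"), (123, "def1"), (123, "ghi1"), (456, "abc2")],
)

# One row per product per order, so each order can hold any number of products.
counts = conn.execute(
    "SELECT order_id, COUNT(*) FROM order_line GROUP BY order_id ORDER BY order_id"
).fetchall()
print(counts)
```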
• This results in a new version of the ERD, as shown in the figure:
• Figure - ERD with ORDER and ORDER_LINE
• 2nd Normal Form
• A table is in second normal form (2NF) if and only if it is in 1NF
and every non key attribute is fully functionally dependent on the
whole of the primary key (i.e. there are no partial dependencies).
• Anomalies can occur when attributes are dependent on only part
of a multi-attribute (composite) key.
• A relation is in second normal form when all non-key attributes
are dependent on the whole key. That is, no attribute is
dependent on only a part of the key.
• Any relation having a key with a single attribute is in second
normal form.
• Take the following table structure as an example:
• order(order_id, cust, cust_address, cust_contact,
order_date, order_total)
• Here we should realise that, taking the key to be the composite
(order_id, cust), cust_address and cust_contact are
functionally dependent on cust but not on order_id, therefore
they are not dependent on the whole key. To make this table 2NF
these attributes must be removed and placed somewhere else.
• 3rd Normal Form
• A table is in third normal form (3NF) if and only if it is in 2NF and
every non-key attribute is non-transitively dependent on the
primary key (i.e. there are no transitive dependencies).
• Anomalies can occur when a relation contains one or more
transitive dependencies.
• A relation is in 3NF when it is in 2NF and has no transitive
dependencies.
• A relation is in 3NF when 'All non-key attributes are dependent on
the key, the whole key and nothing but the key'.
• Take the following table structure as an example:
• order(order_id, cust, cust_address, cust_contact,
order_date, order_total)
• Here we should realise that cust_address and cust_contact are
functionally dependent on cust which is not a key. To make this
table 3NF these attributes must be removed and placed
somewhere else.
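One way of placing those attributes "somewhere else" is a separate customer table, sketched here with Python's built-in sqlite3 module. The table names and sample data are hypothetical; the order table is called orders because ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Customer details are stored once, keyed on cust.
    CREATE TABLE customer (
        cust         TEXT PRIMARY KEY,
        cust_address TEXT,
        cust_contact TEXT
    );
    -- The order keeps only the attributes that depend on order_id.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        cust        TEXT REFERENCES customer(cust),
        order_date  TEXT,
        order_total REAL
    );
    INSERT INTO customer VALUES ('acme', '1 High St', 'Jo');
    INSERT INTO orders VALUES (1, 'acme', '2024-01-05', 10.0),
                              (2, 'acme', '2024-02-09', 25.0);
    """
)

# A change of address is made in exactly one place...
conn.execute("UPDATE customer SET cust_address = '2 Low Rd' WHERE cust = 'acme'")

# ...and every order sees it when the tables are joined back together.
addresses = conn.execute(
    "SELECT DISTINCT c.cust_address FROM orders AS o JOIN customer AS c ON o.cust = c.cust"
).fetchall()
print(addresses)
```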
• You must also note the use of calculated or derived fields. Take
the example where a table contains PRICE, QUANTITY and
EXTENDED_PRICE where EXTENDED_PRICE is calculated as
QUANTITY multiplied by PRICE. As one of these values can be
calculated from the other two then it need not be held in the
database table. Do not assume that it is safe to drop any one of
the three fields as a difference in the number of decimal places
between the various fields could lead to different results due to
rounding errors. For example, take the following fields:
• AMOUNT - a monetary value in home currency, to 2 decimal
places.
• EXCH_RATE - exchange rate, to 9 decimal places.
• CURRENCY_AMOUNT - amount expressed in foreign currency,
calculated as AMOUNT multiplied by EXCH_RATE.
• If you were to drop EXCH_RATE could it be calculated back to its
original 9 decimal places?
• Reaching 3NF is adequate for most practical needs, but there
may be circumstances which would benefit from further
normalisation.
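The rounding question posed above can be tried out with Python's decimal module. The figures are hypothetical, and the sketch assumes that CURRENCY_AMOUNT is stored to 2 decimal places, which the text does not state.

```python
from decimal import Decimal, ROUND_HALF_UP

TWO_DP = Decimal("0.01")
NINE_DP = Decimal("0.000000001")

amount = Decimal("3.00")              # AMOUNT, 2 decimal places
exch_rate = Decimal("1.123456789")    # EXCH_RATE, 9 decimal places

# CURRENCY_AMOUNT as it would be stored (assumed: rounded to 2 decimal places).
currency_amount = (amount * exch_rate).quantize(TWO_DP, rounding=ROUND_HALF_UP)

# Attempt to recover the dropped EXCH_RATE from the two remaining fields.
recovered_rate = (currency_amount / amount).quantize(NINE_DP, rounding=ROUND_HALF_UP)

print(currency_amount)
print(recovered_rate, recovered_rate == exch_rate)
```

With these figures the recovered rate differs from the original from the fourth decimal place onwards, so EXCH_RATE is not a field that can safely be dropped.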
• Boyce-Codd Normal Form
• A table is in Boyce-Codd normal form (BCNF) if and only if it is in
3NF and every determinant is a candidate key.
• Anomalies can occur in relations in 3NF if there is a composite key
in which part of that key has a determinant which is not itself a
candidate key.
• This can be expressed as R(A,B,C), C → A where:
The relation contains attributes A, B and C.
A and B form a candidate key.
C is the determinant for A (A is functionally dependent on C).
C is not part of any key.
• Anomalies can also occur where a relation contains several
candidate keys where:
The keys contain more than one attribute (they are composite
keys).
An attribute is common to more than one key.
• Take the following table structure as an example:
• schedule(campus, course, class, time, room/bldg)
• Take the following sample data:
campus  course       class  time         room/bldg
East    English 101  1      8:00-9:00    212AYE
East    English 101  2      10:00-11:00  305RFK
West    English 101  3      8:00-9:00    102PPR
• As another example take the following structure:
• enrol(student#, s_name, course#, c_name, date_enrolled)
• This table has the following candidate keys:
• (student#, course#)
• (student#, c_name)
• (s_name, course#) - this assumes that s_name is a unique
identifier
• (s_name, c_name) - this assumes that c_name is a unique
identifier
• The relation is in 3NF but not in BCNF because of the following
dependencies:
• student# → s_name
• course# → c_name
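Whether a functional dependency holds in a given set of rows can be checked mechanically, as in the sketch below. The enrol rows are hypothetical sample data, and a check against sample data can only disprove a dependency, not prove that it holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """True if equal values of the lhs attributes always give equal rhs values."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical sample rows for the enrol relation.
enrol = [
    {"student#": 1, "s_name": "Ann", "course#": "C1", "c_name": "SQL"},
    {"student#": 1, "s_name": "Ann", "course#": "C2", "c_name": "Design"},
    {"student#": 2, "s_name": "Bob", "course#": "C1", "c_name": "SQL"},
]

print(fd_holds(enrol, ["student#"], ["s_name"]))   # student# determines s_name
print(fd_holds(enrol, ["course#"], ["c_name"]))    # course# determines c_name
print(fd_holds(enrol, ["student#"], ["course#"]))  # but student# alone is not a key
```

The determinants student# and course# each determine another attribute without being candidate keys on their own, which is the situation BCNF rules out.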
• 4th Normal Form
• A table is in fourth normal form (4NF) if and only if it is in BCNF
and contains no more than one multi-valued dependency.
• Anomalies can occur in relations in BCNF if there is more than one
multi-valued dependency.
• If A →→ B and A →→ C but B and C are unrelated, i.e. A →→ (B,C) is
false, then we have more than one multi-valued dependency.
• A relation is in 4NF when it is in BCNF and has no more than one
multi-valued dependency.
De-Normalisation