Relational Data Manipulation: CXB 3104 Advanced Database Systems

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

CXB 3104 Advanced Database Systems

Relational Data Manipulation


Relational tables are sets. Rows of the tables can be considered as elements of the set Operations that can be performed on sets can be done on relational tables.

Lecture 3
Muthu

muthma@sunway.edu.my

Union Operator
The union operation of two relational tables is formed by appending rows from one table to those of a second table to produce a third. Duplicate rows are eliminated. Tables that are union compatible must have the same number of columns and corresponding columns must come from the same domain.

Union Operator

Difference Operator

Difference Operator

The difference of two relational tables is a third that contains those rows that occur in the first table but not in the second Requires that the tables be union compatible.

Intersection Operator
The intersection of two relational tables is a

Intersection Operator

third table that contains common rows.


Requires that the tables be union compatible.

Product Operator

Product Operator

The product of two relational tables is called the Cartesian product. It is the concatenation of every row in one table with every row in the second. The product of table A (having m rows) and table B (having n rows) is the table C (having m x n rows).
9 10

Projection & Selection Operators

Join Operator

Projection
The project operator retrieves a subset of columns from a table, removing duplicate rows from the result. Yields vertical subset of a table

combines the product, selection, and projection combines data from one row of a table with rows from another or the same table when certain criteria are met. criteria involves a relationship among the columns in the join relational table. If the join criterion is based on equality of column value, the result is called an equijoin. A natural join removes redundant columns.
12

Selection
The select operator retrieves subsets of rows from a relational table based on a value(s) in a column or columns. Yields a horizontal subset of a table

11

Division Operator Join Operator

Results in columns values in one table for which there are other matching column values corresponding to every row in another table.

13

14

Data Dictionary
It provides details of all tables found within the

Data Dictionary

database.

It contains all the attribute name and characteristics

for each table in the system. data

The data dictionary contains metadata - data about

15

16

Database Tables and Normalization

Normalization
Process for evaluating and correcting table structures to minimize data redundancies

Normalization

helps eliminate data anomalies

Works through a series of stages called normal forms:


Normal form (1NF) Second normal form (2NF) Third normal form (3NF)
18

Database Tables and Normalization


2NF is better than 1NF; 3NF is better than 2NF For most business database design purposes, 3NF is highest we need to go in the normalization process Highest level of normalization is not always most desirable

What is Normalisation?

In a RDB normalisation is crucial for: retaining data consistency on updates Minimizing data redundancy and therefore reducing file space required in the database Minimize data storage

Key Concepts in normalization are Functional Dependency and keys


19 20

Update Anomalies

Deletion Anomalies

Tables that have redundant data may have problems called update anomalies. Consider the following table of data on products and required manufacturing equipment:
Product name Ethylene Ethylene Styrene Styrene Equipment name cooler heater pump heater Product manager Smith Smith Jones Jones Equipment supplier Westwood Westwood Davison Westwood Supplier address Derby Derby Rugby Derby

If a row is deleted that represents the last product with a particular piece of equipment, the equipment details are also lost - this is a deletion anomaly.

21

22

Deletion Anomalies example

Insert Anomalies

Insert anomalies New rows that are entered must always have consistent sets of product and/or equipment - human error may lead to inconsistencies Product and equipment data cannot be entered separately without using null values - this might violate primary keys

Product name Ethylene Ethylene Styrene Styrene

Equipment name cooler heater pump heater

Product manager Smith Smith Jones Jones

Equipment supplier Westwood Westwood Davison Westwood

Supplier address Derby Derby Rugby Derby

Loss of equipment Supplier Details

23

24

Insert Anomalies example

Modification Anomalies

An update on the values of product or equipment in one row must also be performed on all the other rows that have the same product or equipment, or inconsistencies will occur in the data modification anomaly

Potential for inconsistent data sets

25

26

Modification Anomalies example

Normal Forms

Normal Forms are rules developed to avoid logical inconsistencies from table update operations. Each normal form prohibits a form of redundancy in table organisation that could yield meaningless results if one table were updated independently of other tables or other rows in the table.

27

28

Normal Forms

Normal Forms

There are multiple levels of normal forms. Each higher level adds in an additional constraint to the level preceding it. As the database design satisfies higher level normal forms the tables become more fragmented. This means that:
As data consistency is improved database navigation and hence queries become slower The tables become less like the real-world system they represent.

The six normal form levels are:


1st Normal Form (1NF) 2nd Normal Form (2NF) 3rd Normal Form (3NF) Boyce-Codd Normal Form (BCNF) 4th Normal Form (4NF) 5th Normal Form (5NF)

Usually ensuring that the database satisfies the third normal form is sufficient for data consistency.

29

30

First Normal Form- 1

First Normal Form 2

A table is in first normal form when a primary key can be defined for each combination of data. In the example below, a row has two entries in column Equipment name.
Table violates INF

This can be remedied by using two rows for the dual entry:
Table satisfies INF Table violates 2NF

Product name Ethylene Ethylene Styrene Styrene

Equipment name cooler heater pump heater

Product manager Smith Smith Jones Jones

Equipment supplier Westwood Westwood Davison Westwood

Supplier address Derby Derby Rugby Derby

Product name Ethylene Styrene Styrene

Equipment name cooler, heater pump heater

Product manager Smith Jones Jones

Equipment supplier Westworld Davison Westworld

Supplier address Derby Rugby Derby

Primary key: (Product name, Equipment name)

31

32

First Normal Form Example Rules


First Normal Form Determinancy Diagram


Product Name Product Manager

Manufacture of each product uses a set of equipment. A piece of equipment can be used in the manufacture different products. Each equipment used in the manufacture of each product has a supplier. Each product has a product manager Each supplier has an address.

Equipment Name

Equipment Supplier

Supplier Address

33

34

Second Normal Form - 1

Second Normal Form 2

A table is in second normal form when it satisfies first normal form and each row has a primary key. Each non-primary key must fully depend on the primary key. In the example, Equipment supplier and Supplier address depend on the chosen primary key (Product name, Equipment name) but Product Manager only depends on part of the primary key.

This can be remedied by splitting off this partial dependency into another table:
Tables satisfy 2NF Tables violate 3NF

Product name Ethylene Styrene

Product manager Smith Jones

Product name Ethylene Ethylene Styrene Styrene

Equipment name cooler heater pump heater

Equipment supplier Westwood Westwood Davison Westwood

Supplier address Derby Derby Rugby Derby

Primary key: (Product name)

Primary key: (Product name, Equipment name)

35

36

Third Normal Form - 1

Third Normal Form 2

A table is in third normal form when it satisfies second normal form and each nonprimary key column directly depends on the primary key. In the example, 3NF is violated because there is transitive dependency. Supplier address depends on Equipment supplier which in turn depends on the primary key.

This can be remedied by splitting off this indirect dependence into a further table:
Tables satisfy 3NF

Product name Ethylene Styrene

Product manager Smith Jones

Equipment supplier Westwood Davison

Supplier address Derby Rugby

Product name Ethylene Ethylene Styrene Styrene

Equipment Equipment name supplier cooler heater pump heater Westwood Westwood Davison Westwood

Primary key: (Product name)

Primary key: (Equipment supplier)

Primary key: (Product name, Equipment name)

37

38

Other Normal Forms


The data in a RDB is free of redundancy when it is in the fifth normal form. In this state an update on a column in any table should not lead to data inconsistencies occurring. In practice it is adequate to normalise data into the 3NF.

39

You might also like