Module 5: Database Management Systems

Database Management Systems 1 Module 5

Normalization
• It is used to remove design flaws from a database. We describe a number of normal forms, which are sets of rules describing
what we should not do in our tables.
• It is a design technique that is widely used as a guide in designing relational databases. Its theory is based on the concept of
normal forms. A relational table is said to be in a particular normal form if it satisfies a certain set of constraints.

Normalization Process
• It consists of breaking tables into smaller tables that form a better design.
• It is the process of decomposing relations with anomalies to produce smaller, well-structured relations.

Well-Structured Relations
It is a relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data
inconsistencies.

The goal of normalization is to create a set of relational tables that are free of redundant data, that can be consistently and
correctly modified, and that avoid anomalies. This means that all tables in a relational database should be in the third normal form
(3NF). A relational table is in 3NF if and only if all non-key columns are:
(a) mutually independent; and
(b) fully dependent upon the primary key.

Mutual independence means that no non-key column is dependent upon any combination of the other columns. The first two
normal forms are intermediate steps to achieve the goal of having all tables in 3NF.

Different Normal Forms


First Normal Form (1NF)
It states that each attribute must be atomic: that is, each attribute must contain a single value, not a set of values or another
database row.
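
As a minimal sketch of the atomicity rule, the Python snippet below (using the standard sqlite3 module) takes a hypothetical Student_raw table whose Phones column holds a comma-separated set of values and rewrites it with one phone value per row. The table names, the Phones column, and the second phone number are assumptions made up for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Not in 1NF: Phones holds a set of values in one column (hypothetical data).
conn.execute("CREATE TABLE Student_raw (Sid INTEGER PRIMARY KEY, Sname TEXT, Phones TEXT)")
conn.execute("INSERT INTO Student_raw VALUES (100, 'Jake Maggay', '487 2454, 555 0100')")

# 1NF version: each row carries exactly one atomic phone value.
conn.execute("CREATE TABLE Student_phone (Sid INTEGER, Phone TEXT, PRIMARY KEY (Sid, Phone))")
for sid, phones in conn.execute("SELECT Sid, Phones FROM Student_raw").fetchall():
    for phone in phones.split(","):
        conn.execute("INSERT INTO Student_phone VALUES (?, ?)", (sid, phone.strip()))

atomic_rows = conn.execute("SELECT Sid, Phone FROM Student_phone ORDER BY Phone").fetchall()
print(atomic_rows)
```

Each phone number can now be searched, updated, or deleted on its own, which is not possible while the values are packed into one column.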

Second Normal Form (2NF)


It states that the schema is already in First Normal Form and that all attributes that are not part of the primary key are fully
functionally dependent on the primary key.

Third Normal Form (3NF)


• Meet all the requirements of the second normal form.
• Remove columns that are not directly dependent upon the primary key, that is, columns that depend on another non-key column.
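
The sketch below illustrates one way a table could be brought to 3NF, assuming a hypothetical Student_advisor table in which Advisor_office depends on Advisor_id (a non-key column) rather than directly on the key Sid. The advisor columns and their values are invented for this illustration; the code uses Python's standard sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: Advisor_office depends on Advisor_id, not directly on Sid.
    CREATE TABLE Student_advisor (
        Sid INTEGER PRIMARY KEY, Sname TEXT, Advisor_id TEXT, Advisor_office TEXT);
    INSERT INTO Student_advisor VALUES (100, 'Jake Maggay', 'A1', 'Room 12');
    INSERT INTO Student_advisor VALUES (200, 'Josephine Maribbay', 'A1', 'Room 12');
    INSERT INTO Student_advisor VALUES (300, 'Jinky Maggay', 'A2', 'Room 07');

    -- 3NF: move the column that depends on a non-key column into its own table.
    -- (CREATE TABLE ... AS SELECT copies data only; key constraints are omitted here.)
    CREATE TABLE Advisor AS SELECT DISTINCT Advisor_id, Advisor_office FROM Student_advisor;
    CREATE TABLE Student AS SELECT Sid, Sname, Advisor_id FROM Student_advisor;
""")

advisors = conn.execute(
    "SELECT Advisor_id, Advisor_office FROM Advisor ORDER BY Advisor_id"
).fetchall()
student_count = conn.execute("SELECT COUNT(*) FROM Student").fetchone()[0]
print(advisors, student_count)
```

After the split, each advisor's office is stored once, so moving an advisor to another room needs a change in only one row.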

Types of Anomalies
Insertion Anomaly
• means that some data cannot be inserted into the database
• adding new rows forces the user to create duplicate data
Deletion Anomaly
• deleting some data causes other information to be lost
• deleting rows may cause a loss of data that would be needed for other future rows
Modification Anomaly
• means we have data redundancy in the database, so to make any modification we have to change all copies of the redundant
data or else the database will contain incorrect data
• changing data in a row forces changes to other rows because of duplication

Example:
Sid  Sname               Phone     Course_id  Course_description     Credit_hours  Grade
100  Jake Maggay         487 2454  IS380      Database Concepts      3             A
100  Jake Maggay         487 2454  IS416      Unix Operating System  3             B
200  Josephine Maribbay  671 8120  IS380      Database Concepts      3             B
200  Josephine Maribbay  671 8120  IS416      Unix Operating System  3             B
200  Josephine Maribbay  671 8120  IS420      Data Net Work          3             C
300  Jinky Maggay        871 2356  IS417      System Analysis        3             A

Prepared by: J. D. Maribbay | CSU Carig – College of Information and Computer Science

Attribute Grade is fully functionally dependent on the primary key (Sid, Course_id) because both parts of the primary key
are needed to determine Grade. On the other hand, attributes Sname and Phone are not fully functionally dependent on the
primary key, because only a part of the primary key, namely Sid, is needed to determine both Sname and Phone. Likewise, attributes
Credit_hours and Course_description are not fully functionally dependent on the primary key because only Course_id is needed to
determine their values.
The relation Student-courses above suffers from all three anomalies for the following reasons:
1. The relation contains redundant data (note that Database Concepts, the course description for IS380, appears in more
than one place).
2. The relation contains information about two entities, Student and Course.
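
The dependencies described above can be checked against the sample rows with a short Python sketch. The helper name determines is an assumption introduced here, not part of the module. Note that a check on sample data can only refute a functional dependency; a dependency that holds for these six rows is still a design decision, not something the data proves.

```python
def determines(rows, lhs, rhs):
    """Return True if equal values in the lhs columns always imply equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for a key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


COLUMNS = ("Sid", "Sname", "Phone", "Course_id", "Course_description", "Credit_hours", "Grade")
DATA = [
    (100, "Jake Maggay", "487 2454", "IS380", "Database Concepts", 3, "A"),
    (100, "Jake Maggay", "487 2454", "IS416", "Unix Operating System", 3, "B"),
    (200, "Josephine Maribbay", "671 8120", "IS380", "Database Concepts", 3, "B"),
    (200, "Josephine Maribbay", "671 8120", "IS416", "Unix Operating System", 3, "B"),
    (200, "Josephine Maribbay", "671 8120", "IS420", "Data Net Work", 3, "C"),
    (300, "Jinky Maggay", "871 2356", "IS417", "System Analysis", 3, "A"),
]
rows = [dict(zip(COLUMNS, r)) for r in DATA]

# Partial dependencies: only one part of the key is needed.
print(determines(rows, ["Sid"], ["Sname", "Phone"]))
print(determines(rows, ["Course_id"], ["Course_description", "Credit_hours"]))
# Full dependency: Grade needs both parts of the key.
print(determines(rows, ["Sid", "Course_id"], ["Grade"]))
print(determines(rows, ["Sid"], ["Grade"]))
print(determines(rows, ["Course_id"], ["Grade"]))
```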

The following describes in detail the anomalies that relation Student-courses suffers from.
Insertion anomaly: We cannot add a new course such as IS247, with course description Programming Techniques, to the database
unless we also add a student who takes the course.

Update anomaly: If we change the course description for IS380 from Database Concepts to New Database Concepts, we have to make
the change in more than one place or else the database will be inconsistent. In other words, in some places the course description will
be New Database Concepts, and in any place where we forgot to make the change the description will still be Database Concepts.

Deletion anomaly: If student Jinky Maggay is deleted from the database, we also lose the information that we had on course IS417 with
description System Analysis.
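
The three anomalies can be reproduced with a small sketch in Python's standard sqlite3 module. The table is named Student_courses (with an underscore) because a hyphen is not valid in an unquoted SQL identifier, and the NOT NULL constraints on the key columns are an assumption added so that SQLite enforces the primary key strictly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student_courses (
        Sid INTEGER NOT NULL, Sname TEXT, Phone TEXT,
        Course_id TEXT NOT NULL, Course_description TEXT,
        Credit_hours INTEGER, Grade TEXT,
        PRIMARY KEY (Sid, Course_id)
    )
""")
conn.executemany(
    "INSERT INTO Student_courses VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (100, "Jake Maggay", "487 2454", "IS380", "Database Concepts", 3, "A"),
        (100, "Jake Maggay", "487 2454", "IS416", "Unix Operating System", 3, "B"),
        (200, "Josephine Maribbay", "671 8120", "IS380", "Database Concepts", 3, "B"),
        (200, "Josephine Maribbay", "671 8120", "IS416", "Unix Operating System", 3, "B"),
        (200, "Josephine Maribbay", "671 8120", "IS420", "Data Net Work", 3, "C"),
        (300, "Jinky Maggay", "871 2356", "IS417", "System Analysis", 3, "A"),
    ],
)

# Insertion anomaly: a course with no student has no Sid for the primary key.
try:
    conn.execute(
        "INSERT INTO Student_courses (Course_id, Course_description) "
        "VALUES ('IS247', 'Programming Techniques')"
    )
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Update anomaly: changing only one of the IS380 rows leaves two descriptions.
conn.execute(
    "UPDATE Student_courses SET Course_description = 'New Database Concepts' "
    "WHERE Course_id = 'IS380' AND Sid = 100"
)
descriptions = conn.execute(
    "SELECT DISTINCT Course_description FROM Student_courses "
    "WHERE Course_id = 'IS380' ORDER BY Course_description"
).fetchall()

# Deletion anomaly: removing student 300 also removes all knowledge of IS417.
conn.execute("DELETE FROM Student_courses WHERE Sid = 300")
is417_rows = conn.execute(
    "SELECT COUNT(*) FROM Student_courses WHERE Course_id = 'IS417'"
).fetchone()[0]

print(insert_rejected, descriptions, is417_rows)
```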

The above discussion indicates that having a single table Student-courses for our database causes problems (anomalies).
Therefore we break the table into smaller tables to get relations in a higher normal form.

Second Normal Form: A first normal form relation is in second normal form if all its non-primary attributes are fully functionally
dependent on the primary key.
Primary attributes are those attributes that are part of the primary key, and non-primary attributes are those that do not participate
in the primary key. In the Student-courses relation both Sid and Course_id are primary attributes because they are components of the
primary key. However, attributes Sname, Phone, Course_description, Credit_hours and Grade are all non-primary attributes because
none of them is a component of the primary key.
To convert Student-courses to second normal form, we have to make all non-primary attributes fully functionally
dependent on the primary key. To do that, we project the Student-courses table (that is, we break it down) into two or more
tables.
The following are the three resulting relations and their contents:

Student (Sid:pk, Sname, Phone)
Sid  Sname               Phone
100  Jake Maggay         487 2454
200  Josephine Maribbay  671 8120
300  Jinky Maggay        871 2356

Courses (Course_id:pk, Course_description, Credit_hours)
Course_id  Course_description     Credit_hours
IS380      Database Concepts      3
IS416      Unix Operating System  3
IS420      Data Net Work          3
IS417      System Analysis        3

Student-grade (Sid:fk:Student, Course_id:fk:Courses, Grade)
Sid  Course_id  Grade
100  IS380      A
100  IS416      B
200  IS380      B
200  IS416      B
200  IS420      C
300  IS417      A

All three of these relations are in second normal form. Examination of these relations shows that we have eliminated the
redundancy in the database. Relation Student now contains information related only to the entity Student, relation Courses contains
information related only to the entity Courses, and relation Student-grade contains information related to the relationship between
these two entities.
Furthermore, these three relations are free from all anomalies. Let us clarify this in more detail:
Insertion anomaly: A new course with Course_id IS247 and Course_description Programming Techniques can now be inserted into the
table Courses. Equally, we can add any new student to the database by adding their id, name and phone to the Student table. Therefore
our database, which is made up of these three tables, does not suffer from the insertion anomaly.

Update anomaly: Since the redundancy of the data was eliminated, no update anomaly can occur. To change the Course_description for
IS380, only one change is needed, in table Courses.

Deletion anomaly: The deletion of student Jinky Maggay from the database is achieved by deleting that student's records from both the
Student and Student-grade relations. This has no side effects because the course IS417 remains untouched in the table Courses.
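
The decomposition can be tried out with the following sketch, which uses Python's standard sqlite3 module. It is one possible rendering of the three relations, not a definitive schema: the table is named Student_grade (with an underscore) because a hyphen is not valid in an unquoted SQL identifier, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE Student (Sid INTEGER PRIMARY KEY, Sname TEXT, Phone TEXT);
    CREATE TABLE Courses (
        Course_id TEXT PRIMARY KEY, Course_description TEXT, Credit_hours INTEGER);
    CREATE TABLE Student_grade (
        Sid INTEGER REFERENCES Student (Sid),
        Course_id TEXT REFERENCES Courses (Course_id),
        Grade TEXT,
        PRIMARY KEY (Sid, Course_id));
""")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", [
    (100, "Jake Maggay", "487 2454"),
    (200, "Josephine Maribbay", "671 8120"),
    (300, "Jinky Maggay", "871 2356"),
])
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)", [
    ("IS380", "Database Concepts", 3),
    ("IS416", "Unix Operating System", 3),
    ("IS420", "Data Net Work", 3),
    ("IS417", "System Analysis", 3),
])
conn.executemany("INSERT INTO Student_grade VALUES (?, ?, ?)", [
    (100, "IS380", "A"), (100, "IS416", "B"), (200, "IS380", "B"),
    (200, "IS416", "B"), (200, "IS420", "C"), (300, "IS417", "A"),
])

# Insertion: a course can now exist before any student takes it.
conn.execute(
    "INSERT INTO Courses (Course_id, Course_description) "
    "VALUES ('IS247', 'Programming Techniques')"
)
# Update: the description is stored once, so exactly one row changes.
updated = conn.execute(
    "UPDATE Courses SET Course_description = 'New Database Concepts' "
    "WHERE Course_id = 'IS380'"
).rowcount
# Deletion: removing student 300 leaves course IS417 in place.
conn.execute("DELETE FROM Student_grade WHERE Sid = 300")
conn.execute("DELETE FROM Student WHERE Sid = 300")
is417 = conn.execute(
    "SELECT Course_description FROM Courses WHERE Course_id = 'IS417'"
).fetchone()

# The original wide rows can still be rebuilt with a join.
joined = conn.execute("""
    SELECT s.Sname, c.Course_description, g.Grade
    FROM Student_grade AS g
    JOIN Student AS s ON s.Sid = g.Sid
    JOIN Courses AS c ON c.Course_id = g.Course_id
    ORDER BY g.Sid, g.Course_id
""").fetchall()
print(updated, is417, len(joined))
```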

Structured Query Language (SQL)

SQL was developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. This
version, initially called SEQUEL (Structured English Query Language), was designed to manipulate and retrieve data stored in IBM's
original relational database product, System R, which a group at the IBM San Jose Research Laboratory developed during the 1970s.
The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley aircraft company.
Ingres implemented a query language known as QUEL, which was later supplanted in the marketplace by SQL.

Language elements
The SQL language is subdivided into several language elements.
Clauses - These are constituent (basic) components of statements and queries; in some cases they are optional.
Expressions - These can produce either scalar values or tables consisting of columns and rows of data.
Predicates - These specify conditions that can be evaluated to SQL three-valued logic (3VL) Boolean truth values and which are used
to limit the effects of statements and queries, or to change program flow.
Queries - These retrieve data based on specific criteria.
Statements - These may have a persistent effect on schemas and data, or may control transactions, program flow,
connections, sessions, or diagnostics.

SQL statements also include the semicolon (";") statement terminator. Though not required on every platform, it is defined as a
standard part of the SQL grammar.
Insignificant whitespace is generally ignored in SQL statements and queries, making it easier to format SQL code for readability.

Queries
The most common operation in SQL is the query, which is performed with the declarative SELECT statement. Queries allow
the user to describe desired data.
A query includes a list of columns to be included in the final result immediately following the SELECT keyword.
An asterisk ("*") can also be used to specify that the query should return all columns of the queried tables.

SELECT retrieves data from one or more tables, or expressions. It is the most complex statement in SQL, with optional keywords and
clauses that include:
FROM clause – it indicates the table(s) from which data is to be retrieved.


WHERE clause – it includes a comparison predicate, which restricts the rows returned by the query. It eliminates all rows from the
result set for which the comparison predicate does not evaluate to True. The WHERE clause is applied before the GROUP BY clause.

GROUP BY clause – it is used to project rows having common values into a smaller set of rows. It is often used in conjunction with
SQL aggregation functions or to eliminate duplicate rows from a result set.

HAVING clause – it includes a predicate used to filter the rows resulting from the GROUP BY clause. Because it acts on the results
of the GROUP BY clause, aggregation functions can be used in the HAVING predicate.
ORDER BY clause – it identifies which columns are used to sort the resulting data, and in which direction they should be sorted
(options are ascending or descending). Without an ORDER BY clause, the order of rows returned by an SQL query is undefined.
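
To see how these clauses work together, the sketch below runs one query that uses all of them, using Python's standard sqlite3 module. The Book table layout (title, category, price) and every row in it are hypothetical values chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (title TEXT, category TEXT, price REAL)")
conn.executemany("INSERT INTO Book VALUES (?, ?, ?)", [
    ("SQL Basics", "Databases", 120.00),
    ("Modeling Data", "Databases", 150.00),
    ("Pocket Notes", "Databases", 20.00),
    ("Unix Tools", "Systems", 110.00),
    ("Networks", "Systems", 90.00),
    ("Algorithms", "Theory", 130.00),
    ("Automata", "Theory", 140.00),
])

query = """
    SELECT category, COUNT(*) AS expensive_books  -- one aggregate value per group
    FROM Book                                     -- source table
    WHERE price > 100.00                          -- row filter, applied before grouping
    GROUP BY category                             -- one result row per category
    HAVING COUNT(*) >= 2                          -- group filter, applied after grouping
    ORDER BY category DESC                        -- explicit sort direction
"""
result = conn.execute(query).fetchall()
print(result)
```

WHERE removes the two cheap books first, GROUP BY then forms one group per category, and HAVING drops the Systems group because only one of its books passed the WHERE filter.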

The following is an example of a SELECT query that returns a list of expensive books. The query retrieves all rows from the
Book table in which the price column contains a value greater than 100.00. The result is sorted in ascending order by title. The asterisk
(*) in the select list indicates that all columns of the Book table should be included in the result set.

SELECT * FROM Book WHERE price > 100.00 ORDER BY title;

or

SELECT * FROM Book WHERE price > 100.00 ORDER BY title ASC;

The example below demonstrates a query of multiple tables, grouping, and aggregation, by returning a list of books and the number of
authors associated with each book.

SELECT Book.Title, COUNT(*) AS Authors
FROM Book, Author
WHERE Book.Author_ID = Author.Author_ID
GROUP BY Book.Title;
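
The query above can be run against a small in-memory database with Python's standard sqlite3 module, as sketched below. The sketch assumes that Book holds one row per title and author pairing; the module does not give the table layouts, so the columns and sample rows here are hypothetical. An ORDER BY clause is added only to make the printed order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Author (Author_ID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Book (Title TEXT, Author_ID INTEGER REFERENCES Author (Author_ID))")
conn.executemany("INSERT INTO Author VALUES (?, ?)", [(1, "Author A"), (2, "Author B")])
conn.executemany("INSERT INTO Book VALUES (?, ?)", [
    ("Database Concepts", 1),
    ("Database Concepts", 2),
    ("Unix Operating System", 1),
])

# The join condition written in the WHERE clause, as in the text.
counts = conn.execute("""
    SELECT Book.Title, COUNT(*) AS Authors
    FROM Book, Author
    WHERE Book.Author_ID = Author.Author_ID
    GROUP BY Book.Title
    ORDER BY Book.Title
""").fetchall()

# The same query with an explicit JOIN ... ON, which returns the same rows.
counts_join = conn.execute("""
    SELECT Book.Title, COUNT(*) AS Authors
    FROM Book
    JOIN Author ON Book.Author_ID = Author.Author_ID
    GROUP BY Book.Title
    ORDER BY Book.Title
""").fetchall()
print(counts)
```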

SQL includes operators and functions for calculating values from stored values. SQL allows the use of expressions in the select
list to project data, as in the following example, which returns a list of books that cost more than 100.00 with an additional Sales_Tax
column containing a sales tax figure calculated at 6% of the price. The result is sorted in descending order by Title.

SELECT Book_ID, Title, Price, Price * 0.06 AS Sales_Tax FROM Book WHERE Price > 100.00
ORDER BY Title DESC;

Null
The idea of Null was introduced into SQL to handle missing information in the relational model. It does not have a value (and is not a
member of any data domain) but is rather a placeholder or “mark” for missing information. Therefore comparisons with Null can
never result in either True or False but always in the third logical result, Unknown.
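
The behaviour of Null can be observed with the short sketch below, which uses Python's standard sqlite3 module. In sqlite3 the Unknown result of a comparison comes back as Python None. Student 400 with a missing phone number is a hypothetical row added for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparing Null with anything, even Null, is Unknown (returned as None).
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
# IS NULL is the predicate that actually tests for a missing value.
null_is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]

conn.execute("CREATE TABLE Student (Sid INTEGER PRIMARY KEY, Phone TEXT)")
conn.executemany("INSERT INTO Student VALUES (?, ?)", [(100, "487 2454"), (400, None)])

# WHERE keeps only rows whose predicate is True, so the row with a Null phone
# appears in neither the "equal" nor the "not equal" result.
equal = conn.execute("SELECT Sid FROM Student WHERE Phone = '487 2454'").fetchall()
not_equal = conn.execute("SELECT Sid FROM Student WHERE Phone <> '487 2454'").fetchall()
missing = conn.execute("SELECT Sid FROM Student WHERE Phone IS NULL").fetchall()
print(null_equals_null, null_is_null, equal, not_equal, missing)
```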

Data Manipulation Language (DML)


It is the subset of SQL used to add, update and delete data.
1. INSERT – adds rows (formally tuples) to an existing table, example:

INSERT INTO Table_Name (Field1, Field2, Field3) VALUES ('Value 1', 'Value 2', 'Value 3');

2. UPDATE – modifies a set of existing table rows, example:

UPDATE Table_Name SET Field1 = 'New Value' WHERE Field2 = 'Value 2';

3. DELETE – removes existing rows from a table, example:

DELETE FROM Table_Name WHERE Field2 = 'Value 2';
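
The three DML statements can be tried together in the sketch below, which uses Python's standard sqlite3 module and the Student table from the normalization example. The replacement phone number is a made-up value, and the ? placeholders are sqlite3's way of passing values separately from the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Sid INTEGER PRIMARY KEY, Sname TEXT, Phone TEXT)")

# INSERT adds rows to the existing table.
insert_sql = "INSERT INTO Student (Sid, Sname, Phone) VALUES (?, ?, ?)"
conn.execute(insert_sql, (100, "Jake Maggay", "487 2454"))
conn.execute(insert_sql, (300, "Jinky Maggay", "871 2356"))

# UPDATE modifies the rows selected by the WHERE clause.
conn.execute("UPDATE Student SET Phone = ? WHERE Sid = ?", ("555 0100", 100))

# DELETE removes the rows selected by the WHERE clause.
deleted = conn.execute("DELETE FROM Student WHERE Sid = ?", (300,)).rowcount

students = conn.execute("SELECT Sid, Sname, Phone FROM Student ORDER BY Sid").fetchall()
print(deleted, students)
```

Without a WHERE clause, UPDATE and DELETE would act on every row of the table, so the condition deserves a second look before running either statement.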

Data Definition Language (DDL)


It manages table and index structure. The most basic items of DDL are the CREATE, ALTER, RENAME, DROP and
TRUNCATE statements:
• CREATE creates an object (a table, for example) in the database.
• DROP deletes an object in the database, usually irretrievably.
• ALTER modifies the structure of an existing object in various ways, for example, adding a column to an existing table.
• TRUNCATE deletes all data from a table in a very fast way.

Example:
This statement will create the table myFirst_Table.
CREATE TABLE myFirst_Table
(
    my_Field1 INT,
    my_Field2 VARCHAR(50),
    my_Field3 DATE NOT NULL,
    PRIMARY KEY (my_Field1, my_Field2)
);

This statement will delete column my_Field3.


ALTER TABLE myFirst_Table DROP COLUMN my_Field3;

This statement will add a column my_Field3 of type VARCHAR(50).

ALTER TABLE myFirst_Table ADD COLUMN my_Field3 VARCHAR(50);

This statement will delete all rows in Table_Name faster than a DELETE statement.
TRUNCATE TABLE Table_Name;
* TRUNCATE cannot be applied in MS Access.
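
The DDL statements can be exercised in the sketch below with Python's standard sqlite3 module. It is an illustration under stated assumptions, not a portable script: DDL support differs between database products, and SQLite, like MS Access, has no TRUNCATE statement, so a DELETE without a WHERE clause is used to empty the table instead. The sketch creates the table without my_Field3 and then adds that column with ALTER TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: define the table with a two-column primary key.
conn.execute("""
    CREATE TABLE myFirst_Table
    (
        my_Field1 INT,
        my_Field2 VARCHAR(50),
        PRIMARY KEY (my_Field1, my_Field2)
    )
""")

# ALTER: add a column to the existing table.
conn.execute("ALTER TABLE myFirst_Table ADD COLUMN my_Field3 VARCHAR(50)")
columns = [row[1] for row in conn.execute("PRAGMA table_info(myFirst_Table)")]

# Emptying the table: SQLite has no TRUNCATE, so DELETE without WHERE is used.
conn.execute("INSERT INTO myFirst_Table VALUES (1, 'a', 'b')")
conn.execute("DELETE FROM myFirst_Table")
row_count = conn.execute("SELECT COUNT(*) FROM myFirst_Table").fetchone()[0]

# DROP: remove the table itself, not just its rows.
conn.execute("DROP TABLE myFirst_Table")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(columns, row_count, tables)
```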
